🌟 Keras Data Processor (KDP)

Transform your raw data into powerful ML-ready features

A high-performance preprocessing library for tabular data built on TensorFlow. KDP combines traditional preprocessing (normalization, rescaling, categorical encoding, text vectorization) with neural components such as tabular attention and feature-wise Mixture of Experts to turn raw columns into model-ready features.

📈 Key Features

  • ✓ Smart distribution detection
  • ✓ Neural feature interactions
  • ✓ Feature-wise Mixture of Experts
  • ✓ Memory-efficient processing
  • ✓ Single-pass optimization
  • ✓ Production-ready scaling

🏆 Why Choose KDP?

| Challenge | Traditional Approach | KDP's Solution |
|---|---|---|
| Complex Distributions | Fixed binning strategies | 📊 Distribution-Aware Encoding that adapts to your specific data |
| Interaction Discovery | Manual feature crosses | 👁️ Tabular Attention that automatically finds important relationships |
| Heterogeneous Features | Uniform processing | 🧩 Feature-wise Mixture of Experts that specializes processing per feature |
| Feature Importance | Post-hoc analysis | 🎯 Built-in Feature Selection during training |
| Performance at Scale | Memory issues with large datasets | ⚡ Optimized Processing Pipeline with batching and caching |
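
To make the "Complex Distributions" row concrete, here is a minimal sketch of the idea behind distribution-aware encoding: inspect a numeric column and pick a transformation based on its shape. This is a plain-Python illustration, not KDP's actual implementation. The `choose_transform` helper, the skewness threshold of 1.0, and the two candidate transforms are all assumptions made for the example.

```python
import math
import statistics


def skewness(values):
    """Population skewness: the mean of the cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def choose_transform(values, skew_threshold=1.0):
    """Return (name, transformed_values) for a numeric column.

    Hypothetical rule: strongly right-skewed, non-negative data gets a
    log1p transform; everything else gets z-score normalization.
    """
    if min(values) >= 0 and skewness(values) > skew_threshold:
        return "log1p", [math.log1p(v) for v in values]
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0  # avoid division by zero
    return "zscore", [(v - mean) / std for v in values]


incomes = [20, 22, 25, 27, 30, 31, 35, 40, 900]  # long right tail
ages = [23, 31, 35, 38, 42, 47, 51, 58, 64]      # roughly symmetric

income_choice, _ = choose_transform(incomes)
age_choice, age_scaled = choose_transform(ages)
print(income_choice, age_choice)
```

A fixed binning strategy would treat both columns the same way; the point of the sketch is only that the choice of encoding can depend on statistics measured from the data.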

🚀 Quick Example

from kdp import PreprocessingModel, FeatureType

# Define your features
features = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT
}

# Create and build your preprocessor
preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_distribution_aware=True,  # Smart distribution handling
    tabular_attention=True,       # Automatic feature interactions
    use_feature_moe=True,         # Specialized processing per feature
    feature_moe_num_experts=4     # Number of specialized experts
)

# Build and use
result = preprocessor.build_preprocessor()
model = result["model"]
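
The snippet above reads `data.csv`, so the file presumably needs one column per key in `features`. As a rough sketch for trying the example locally, the following writes a tiny CSV with matching column names using only the standard library. The row values are made up for illustration, the `write_sample_csv` helper is hypothetical, and the assumption that the CSV header must match the feature names is inferred from the example rather than taken from the KDP reference.

```python
import csv

# Column names mirror the keys of the `features` dict in the example above.
COLUMNS = ["age", "income", "occupation", "description"]

# Made-up rows, purely for illustration.
ROWS = [
    [34, 52000.0, "engineer", "Builds data pipelines for a retailer"],
    [51, 87000.0, "teacher", "Teaches high school mathematics"],
    [27, 31000.0, "designer", "Freelance illustrator and web designer"],
]


def write_sample_csv(path="data.csv"):
    """Write a header row plus the sample rows to `path`."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)
        writer.writerows(ROWS)


write_sample_csv()
```

Three rows are enough to check that the column names line up; a realistic dataset would be needed before the learned statistics mean anything.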

🔄 Architecture Diagram

%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#f0f7ff', 'primaryTextColor': '#333', 'primaryBorderColor': '#4a86e8', 'lineColor': '#4a86e8', 'secondaryColor': '#fff0f7', 'tertiaryColor': '#f7fff0' }}}%%
graph TD
    A[Raw Data] --> B[PreprocessingModel]
    B --> |Numerical Features| C1[Distribution-Aware Encoding]
    B --> |Categorical Features| C2[Smart Encoding]
    B --> |Text Features| C3[Text Vectorization]
    B --> |Date Features| C4[Date Preprocessing]

    C1 --> D[Feature-wise MoE]
    C2 --> D
    C3 --> D
    C4 --> D

    D --> E[Tabular Attention]
    E --> F[Feature Selection]
    F --> G[ML-Ready Features]

    classDef input fill:#e6f3ff,stroke:#4a86e8,stroke-width:2px,rx:8px,ry:8px;
    classDef process fill:#fff9e6,stroke:#ffb74d,stroke-width:2px,rx:8px,ry:8px;
    classDef feature fill:#e6fff9,stroke:#26a69a,stroke-width:2px,rx:8px,ry:8px;
    classDef output fill:#e8f5e9,stroke:#66bb6a,stroke-width:2px,rx:8px,ry:8px;

    class A input;
    class B,C1,C2,C3,C4 process;
    class D,E,F feature;
    class G output;
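
The Feature-wise MoE stage in the diagram can be pictured as a gate that mixes the outputs of several expert transformations. The following is a toy sketch in plain Python, not KDP's layer. The expert functions, the hand-picked gate scores, and the `mix_experts` helper are all hypothetical; a real implementation would learn both the experts and the gate as neural network weights.

```python
import math


def softmax(scores):
    """Turn raw gate scores into weights that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mix_experts(x, experts, gate_scores):
    """Weighted sum of expert outputs for a single feature value."""
    weights = softmax(gate_scores)
    return sum(w * expert(x) for w, expert in zip(weights, experts))


# Four toy experts, matching feature_moe_num_experts=4 in the Quick Example.
experts = [
    lambda x: x,             # identity
    lambda x: math.tanh(x),  # squashing
    lambda x: x * x,         # quadratic
    lambda x: max(0.0, x),   # ReLU-like
]

# Hypothetical gate scores for one feature; a trained gate would produce these.
gate_scores = [2.0, 0.5, -1.0, 0.0]

weights = softmax(gate_scores)
mixed = mix_experts(0.8, experts, gate_scores)
print(weights, mixed)
```

Because the gate weights sum to 1, the mixed value always lies between the smallest and largest expert output, which is what lets different features lean on different experts without changing the output scale.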

🔍 Find What You Need

🔰 New to KDP? Start with the Quick Start Guide
🔍 Specific feature type? Check the Feature Processing section
⚡ Performance issues? See the Optimization guides
🔌 Integration help? Visit the Integration Overview section
📝 Practical examples? Browse our Examples
📚 API details? Refer to the API Reference documentation

📣 Community & Support

🐙 GitHub Repository
🐛 Issue Tracker
📜 MIT License - Open source and free to use