Keras Data Processor (KDP)

Transform your raw data into powerful ML-ready features
A high-performance preprocessing library for tabular data built on TensorFlow. KDP combines traditional preprocessing (normalization, rescaling, categorical encoding, text vectorization) with neural components such as tabular attention and a feature-wise Mixture of Experts.
- Getting Started
- Feature Processing
- Optimization
- Integrations
- Examples
- Reference
- Contributing
Key Features
- Smart distribution detection
- Neural feature interactions
- Feature-wise Mixture of Experts
- Memory-efficient processing
- Single-pass optimization
- Production-ready scaling
Why Choose KDP?
| Challenge | Traditional Approach | KDP's Solution |
|---|---|---|
| Complex Distributions | Fixed binning strategies | Distribution-Aware Encoding that adapts to your specific data |
| Interaction Discovery | Manual feature crosses | Tabular Attention that automatically finds important relationships |
| Heterogeneous Features | Uniform processing | Feature-wise Mixture of Experts that specializes processing per feature |
| Feature Importance | Post-hoc analysis | Built-in Feature Selection during training |
| Performance at Scale | Memory issues with large datasets | Optimized Processing Pipeline with batching and caching |
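To make the first row of the table concrete, the sketch below contrasts fixed-width binning with quantile-based binning on a skewed sample. It uses only the Python standard library and is not KDP's implementation; the sample data and the helper names `fixed_width_bin` and `quantile_bin` are illustrative assumptions.

```python
import bisect
import random
import statistics
from collections import Counter

random.seed(0)
# Skewed, income-like sample drawn from a log-normal distribution (made-up data).
values = [random.lognormvariate(0.0, 1.0) for _ in range(1000)]

NUM_BINS = 4


def fixed_width_bin(x, lo, hi, num_bins=NUM_BINS):
    """Assign x to one of num_bins equal-width intervals spanning [lo, hi]."""
    width = (hi - lo) / num_bins
    return min(int((x - lo) / width), num_bins - 1)


def quantile_bin(x, cut_points):
    """Assign x to the bin delimited by sorted quantile cut points."""
    return bisect.bisect_right(cut_points, x)


lo, hi = min(values), max(values)
# Three cut points split the sample into four roughly equal-sized groups.
cut_points = statistics.quantiles(values, n=NUM_BINS)

fixed_counts = Counter(fixed_width_bin(v, lo, hi) for v in values)
quantile_counts = Counter(quantile_bin(v, cut_points) for v in values)

print("fixed-width:", sorted(fixed_counts.items()))
print("quantile:   ", sorted(quantile_counts.items()))
```

On a long-tailed sample like this one, the equal-width bins put most values in the first bin, while bins derived from the data's own quantiles stay balanced. That is the general motivation for encodings that adapt to the observed distribution.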
Quick Example
from kdp import PreprocessingModel, FeatureType

# Define your features
features = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT,
}

# Create and build your preprocessor
preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_distribution_aware=True,  # Smart distribution handling
    tabular_attention=True,       # Automatic feature interactions
    use_feature_moe=True,         # Specialized processing per feature
    feature_moe_num_experts=4,    # Number of specialized experts
)

# Build and use
result = preprocessor.build_preprocessor()
model = result["model"]
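The quick example reads `data.csv`, which is not shown here. The following is a minimal sketch for generating a toy file whose columns match the four feature names used above. The row values and the helper name `write_sample_csv` are made up for illustration and imply nothing about the data KDP expects beyond those column names.

```python
import csv
import random


def write_sample_csv(path="data.csv", num_rows=100, seed=42):
    """Write a toy CSV with the columns used in the quick example."""
    rng = random.Random(seed)
    occupations = ["engineer", "teacher", "nurse", "artist"]  # illustrative values
    fieldnames = ["age", "income", "occupation", "description"]
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for _ in range(num_rows):
            occupation = rng.choice(occupations)
            writer.writerow(
                {
                    "age": rng.randint(18, 80),
                    "income": round(rng.lognormvariate(10.5, 0.6), 2),
                    "occupation": occupation,
                    "description": f"Works as a {occupation}",
                }
            )
    return path


if __name__ == "__main__":
    print("wrote", write_sample_csv())
```

Running the script once in the working directory produces a `data.csv` that the `path_data="data.csv"` argument can point to.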
Architecture Diagram
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#f0f7ff', 'primaryTextColor': '#333', 'primaryBorderColor': '#4a86e8', 'lineColor': '#4a86e8', 'secondaryColor': '#fff0f7', 'tertiaryColor': '#f7fff0' }}}%%
graph TD
A[Raw Data] --> B[PreprocessingModel]
B --> |Numerical Features| C1[Distribution-Aware Encoding]
B --> |Categorical Features| C2[Smart Encoding]
B --> |Text Features| C3[Text Vectorization]
B --> |Date Features| C4[Date Preprocessing]
C1 --> D[Feature-wise MoE]
C2 --> D
C3 --> D
C4 --> D
D --> E[Tabular Attention]
E --> F[Feature Selection]
F --> G[ML-Ready Features]
classDef input fill:#e6f3ff,stroke:#4a86e8,stroke-width:2px,rx:8px,ry:8px;
classDef process fill:#fff9e6,stroke:#ffb74d,stroke-width:2px,rx:8px,ry:8px;
classDef feature fill:#e6fff9,stroke:#26a69a,stroke-width:2px,rx:8px,ry:8px;
classDef output fill:#e8f5e9,stroke:#66bb6a,stroke-width:2px,rx:8px,ry:8px;
class A input;
class B,C1,C2,C3,C4 process;
class D,E,F feature;
class G output;
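Read as a data flow, the diagram routes each feature to a type-specific encoder and then sends every feature through the same three shared stages. The sketch below encodes only that ordering in plain Python. The `FeatureKind` enum and the `stages_for` helper are hypothetical names for illustration and are not part of KDP's API.

```python
from enum import Enum


class FeatureKind(Enum):
    """Hypothetical labels for the four branches shown in the diagram."""
    NUMERICAL = "numerical"
    CATEGORICAL = "categorical"
    TEXT = "text"
    DATE = "date"


# First stage depends on the feature type (nodes C1..C4 in the diagram).
TYPE_SPECIFIC_STAGE = {
    FeatureKind.NUMERICAL: "Distribution-Aware Encoding",
    FeatureKind.CATEGORICAL: "Smart Encoding",
    FeatureKind.TEXT: "Text Vectorization",
    FeatureKind.DATE: "Date Preprocessing",
}

# Every branch then passes through the same stages (nodes D, E, F).
SHARED_STAGES = ["Feature-wise MoE", "Tabular Attention", "Feature Selection"]


def stages_for(kind):
    """Return the ordered stage names a feature of this kind passes through."""
    return [TYPE_SPECIFIC_STAGE[kind], *SHARED_STAGES]


for kind in FeatureKind:
    print(f"{kind.value:>11}: " + " -> ".join(stages_for(kind)))
```

The printout lists one path per feature kind, which makes it easy to see that only the first stage differs between branches.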
Find What You Need
- New to KDP? Start with the Getting Started section
- Specific feature type? Check the Feature Processing section
- Performance issues? See the Optimization guides
- Integration help? Visit the Integration Overview section
- Practical examples? Browse our Examples
- API details? Refer to the API Reference documentation