🌟 Keras Data Processor (KDP)

📈 Key Features

✓ Smart distribution detection
✓ Neural feature interactions
✓ Feature-wise Mixture of Experts
✓ Memory-efficient processing
✓ Single-pass optimization
✓ Production-ready scaling
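To make "smart distribution detection" more concrete, here is a minimal, hypothetical heuristic that labels a numeric column by its sample skewness and excess kurtosis. It is not KDP's detection logic. The function names, the labels, and the thresholds (`skew_tol`, `kurt_tol`) are assumptions chosen for illustration.

```python
import statistics


def shape_stats(values):
    """Return (skewness, excess kurtosis) from population moments."""
    n = len(values)
    mean = statistics.fmean(values)
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return 0.0, 0.0  # constant column: no spread to describe
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3.0


def guess_distribution(values, skew_tol=0.5, kurt_tol=1.0):
    """Hypothetical heuristic: label a numeric column by its shape."""
    if len(set(values)) == 1:
        return "constant"
    skew, kurt = shape_stats(values)
    if skew > skew_tol:
        return "right_skewed"   # long right tail, e.g. incomes
    if skew < -skew_tol:
        return "left_skewed"
    if kurt > kurt_tol:
        return "heavy_tailed"   # symmetric but with outliers
    if kurt < -kurt_tol:
        return "uniform_like"   # flatter than a normal distribution
    return "normal_like"


print(guess_distribution([1, 2, 3, 4, 5, 6, 7, 8, 9]))        # uniform_like
print(guess_distribution([-2, -1, -1, 0, 0, 0, 0, 1, 1, 2]))  # normal_like
print(guess_distribution([1, 1, 1, 2, 2, 3, 5, 10, 50]))      # right_skewed
```

A distribution-aware encoder can then choose a different transformation for each label. For example, it might apply a log-style transform to a right-skewed column instead of binning it with fixed edges.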

🏆 Why Choose KDP?

| Challenge | Traditional Approach | KDP's Solution |
|---|---|---|
| Complex Distributions | Fixed binning strategies | 📊 Distribution-Aware Encoding that adapts to your specific data |
| Interaction Discovery | Manual feature crosses | 👁️ Tabular Attention that automatically finds important relationships |
| Heterogeneous Features | Uniform processing | 🧩 Feature-wise Mixture of Experts that specializes processing per feature |
| Feature Importance | Post-hoc analysis | 🎯 Built-in Feature Selection during training |
| Performance at Scale | Memory issues with large datasets | ⚡ Optimized Processing Pipeline with batching and caching |
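The Tabular Attention row refers to attention across features rather than across time steps. The idea behind it can be sketched as scaled dot-product self-attention in which each feature's embedding is one token. This is an illustration only. The function name is hypothetical, the embeddings are hand-made, and the learned query/key/value projections that a real layer would use are replaced by the identity to keep the sketch short.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def feature_self_attention(embeddings):
    """Scaled dot-product self-attention with one token per feature.

    embeddings: dict of feature name -> vector, all the same length.
    Queries, keys and values are the embeddings themselves (no learned
    projections), so this shows only the mixing step.
    Returns (outputs, weights), both keyed by feature name.
    """
    names = list(embeddings)
    dim = len(embeddings[names[0]])
    scale = math.sqrt(dim)
    outputs, weights = {}, {}
    for q_name in names:
        query = embeddings[q_name]
        # Similarity of this feature to every feature, scaled by sqrt(dim).
        scores = [
            sum(q * k for q, k in zip(query, embeddings[k_name])) / scale
            for k_name in names
        ]
        attn = softmax(scores)
        weights[q_name] = dict(zip(names, attn))
        # Output = attention-weighted sum of all feature vectors.
        outputs[q_name] = [
            sum(w * embeddings[v_name][i] for w, v_name in zip(attn, names))
            for i in range(dim)
        ]
    return outputs, weights


embeddings = {
    "age": [1.0, 0.0],
    "income": [0.9, 0.1],
    "occupation": [0.0, 1.0],
}
outputs, weights = feature_self_attention(embeddings)
for name, row in weights.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```

Because `age` and `income` point in similar directions, `age` places more weight on `income` than on `occupation`. A trained attention layer learns relationships of this kind from data, so they do not have to be written out as manual feature crosses.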

🚀 Quick Example

```python
from kdp import PreprocessingModel, FeatureType

# Define your features
features = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "description": FeatureType.TEXT
}

# Create the preprocessor
preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_distribution_aware=True,  # Smart distribution handling
    tabular_attention=True,       # Automatic feature interactions
    use_feature_moe=True,         # Specialized processing per feature
    feature_moe_num_experts=4     # Number of specialized experts
)

# Build the preprocessor; the returned dict holds the Keras model under "model"
result = preprocessor.build_preprocessor()
model = result["model"]
```
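The Quick Example reads `data.csv`. If you want a file to try it on, this sketch writes a small synthetic CSV with the four column names used in `features`. The values are random placeholders, and the helper `write_toy_csv` is hypothetical, not part of KDP. The sketch assumes that the CSV column names need to match the keys of `features_specs`.

```python
import csv
import random


def write_toy_csv(path="data.csv", n_rows=200, seed=42):
    """Write a synthetic CSV whose columns match the Quick Example's features."""
    rng = random.Random(seed)  # seeded so the file is reproducible
    occupations = ["engineer", "teacher", "nurse", "designer"]
    words = ["enjoys", "hiking", "reading", "cooking", "travel", "music", "data"]
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(
            f, fieldnames=["age", "income", "occupation", "description"]
        )
        writer.writeheader()
        for _ in range(n_rows):
            writer.writerow({
                "age": rng.randint(18, 80),
                # Log-normal draws give a right-skewed, income-like column.
                "income": round(rng.lognormvariate(10.5, 0.6), 2),
                "occupation": rng.choice(occupations),
                "description": " ".join(rng.sample(words, 3)),
            })
    return path


write_toy_csv()  # creates ./data.csv in the current working directory
```

Once `data.csv` exists, you can run the Quick Example unchanged.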

🔄 Architecture Diagram

```mermaid
%%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#f0f7ff', 'primaryTextColor': '#333', 'primaryBorderColor': '#4a86e8', 'lineColor': '#4a86e8', 'secondaryColor': '#fff0f7', 'tertiaryColor': '#f7fff0' }}}%%
graph TD
    A[Raw Data] --> B[PreprocessingModel]
    B --> |Numerical Features| C1[Distribution-Aware Encoding]
    B --> |Categorical Features| C2[Smart Encoding]
    B --> |Text Features| C3[Text Vectorization]
    B --> |Date Features| C4[Date Preprocessing]

    C1 --> D[Feature-wise MoE]
    C2 --> D
    C3 --> D
    C4 --> D

    D --> E[Tabular Attention]
    E --> F[Feature Selection]
    F --> G[ML-Ready Features]

    classDef input fill:#e6f3ff,stroke:#4a86e8,stroke-width:2px,rx:8px,ry:8px;
    classDef process fill:#fff9e6,stroke:#ffb74d,stroke-width:2px,rx:8px,ry:8px;
    classDef feature fill:#e6fff9,stroke:#26a69a,stroke-width:2px,rx:8px,ry:8px;
    classDef output fill:#e8f5e9,stroke:#66bb6a,stroke-width:2px,rx:8px,ry:8px;

    class A input;
    class B,C1,C2,C3,C4 process;
    class D,E,F feature;
    class G output;
```
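In the diagram, the Feature-wise MoE stage routes each feature through its own soft mixture of experts. The sketch below shows only the mixing arithmetic in plain Python: three hand-written toy experts and one softmax gate per feature. Everything in it (`EXPERTS`, `feature_moe`, the fixed per-feature gate logits) is a hypothetical illustration, not KDP's implementation. In a trained model, the experts and gates are learned, and the gate is often computed from the input itself.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    top = max(logits)
    exps = [math.exp(x - top) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Hypothetical toy experts: each maps an encoded scalar to a 2-d vector.
EXPERTS = [
    lambda x: [x, 0.0],                    # linear expert
    lambda x: [math.tanh(x), 1.0],         # saturating expert
    lambda x: [math.log1p(abs(x)), -1.0],  # log-compressing expert
]


def feature_moe(features, gate_logits):
    """Soft mixture of experts applied independently to each feature.

    features: dict of feature name -> encoded scalar value.
    gate_logits: dict of feature name -> one logit per expert.
    Returns dict of feature name -> gate-weighted sum of expert outputs.
    """
    mixed = {}
    for name, x in features.items():
        gates = softmax(gate_logits[name])  # this feature's own routing weights
        expert_outs = [expert(x) for expert in EXPERTS]
        mixed[name] = [
            sum(g * out[i] for g, out in zip(gates, expert_outs))
            for i in range(2)
        ]
    return mixed


features = {"age": 0.5, "income": 3.2}
gate_logits = {
    "age": [2.0, 0.5, 0.0],     # age leans on the linear expert
    "income": [0.0, 0.5, 2.0],  # income leans on the log-compressing expert
}
print(feature_moe(features, gate_logits))
```

Each feature has its own gate, so a skewed column such as `income` can lean on a different expert than `age`. This is what "specialized processing per feature" means, as opposed to sending every feature through one shared transformation.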

🔍 Find What You Need

🔰 New to KDP? Start with the Quick Start Guide

🔍 Specific feature type? Check the Feature Processing section

⚡ Performance issues? See the Optimization guides

🔌 Integration help? Visit the Integration Overview section

📝 Practical examples? Browse our Examples

📚 API details? Refer to the API Reference documentation

📣 Community & Support

🐙 GitHub Repository 🐛 Issue Tracker

📜 MIT License - Open source and free to use