🧙‍♂️ Auto-Configuration: Analytics and Recommendations


Let KDP analyze your data and suggest a suitable starting point for preprocessing

Intelligent Data Analysis

Auto-Configuration examines your dataset, computes statistics for each feature, and recommends feature types and preprocessing steps, giving you a sensible starting configuration to refine.

🚀 Getting Started

Basic Usage

from kdp import auto_configure

# Analyze the dataset and get recommendations
config = auto_configure("customer_data.csv")

# Review the recommendations and the generated code snippet
recommendations = config["recommendations"]
code_snippet = config["code_snippet"]
print(code_snippet)

# Build your preprocessor (e.g. with PreprocessingModel) using the snippet as a guide.
# Note: the recommendations are not applied automatically; you implement them yourself.

✨ What Auto-Configuration Provides

🔍 Distribution Analysis: identifies patterns in your numeric data to suggest suitable transformations.

📊 Feature Statistics: calculates per-feature statistics such as mean, variance, skewness, and min/max values to guide preprocessing.

💡 Preprocessing Recommendations: suggests feature types and transformations based on the analysis.

📝 Example Code: generates a code snippet reflecting the recommendations, which you can adapt into your own configuration.

🔍 What It Analyzes

Data Characteristic	Example	What It Detects
Distribution Types	Log-normal income, bimodal age	Statistical distribution patterns
Feature Statistics	Mean, variance, skewness	Basic statistical properties
Data Ranges	Min/max values, outliers	Value boundaries and extremes
Value Patterns	Discrete vs continuous	How values are distributed
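To give a feel for how statistics like skewness and the number of distinct values can point to a distribution type, here is a minimal sketch. It is a hypothetical heuristic, not KDP's detection logic: the function names, the thresholds, and the labels it returns are assumptions for illustration, and it does not attempt to detect multimodal shapes such as a bimodal age column.

```python
import numpy as np


def skewness(values: np.ndarray) -> float:
    """Biased sample skewness: the mean of cubed z-scores."""
    std = values.std()
    if std == 0:
        return 0.0
    return float(np.mean((values - values.mean()) ** 3) / std ** 3)


def guess_distribution(values, skew_threshold: float = 1.0, max_discrete: int = 10) -> str:
    """Hypothetical heuristic that labels a numeric column by its shape.

    Not KDP's detection logic; it only distinguishes a few coarse cases.
    """
    values = np.asarray(values, dtype=float)
    # Few distinct values: treat the column as discrete rather than continuous.
    if np.unique(values).size <= max_discrete:
        return "discrete"
    skew = skewness(values)
    if abs(skew) < skew_threshold:
        return "roughly_symmetric"
    # Strictly positive, right-skewed data that becomes symmetric after a log.
    if values.min() > 0 and skew > 0 and abs(skewness(np.log(values))) < skew_threshold:
        return "log_normal"
    return "skewed"


rng = np.random.default_rng(0)
print(guess_distribution(rng.normal(50, 10, 5_000)))    # roughly_symmetric
print(guess_distribution(rng.lognormal(10, 1, 5_000)))  # log_normal
print(guess_distribution(rng.integers(0, 5, 1_000)))    # discrete
```

A log-normal column like income is strongly right-skewed, but its logarithm is close to symmetric, which is why a log-style transformation is a natural suggestion for it.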

💼 Examples

🔎 Basic Analysis

from kdp import auto_configure

# Basic auto-configuration analysis
config = auto_configure(
    "customer_data.csv",  # Your dataset
    batch_size=50000,     # Process in batches of this size
    save_stats=True       # Save computed statistics
)

# Review the recommendations
for feature_name, recommendation in config["recommendations"].items():
    print(f"Feature: {feature_name}")
    print(f"  Type: {recommendation['feature_type']}")
    print(f"  Preprocessing: {recommendation['preprocessing']}")

# Get the suggested code snippet
print(config["code_snippet"])
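The `batch_size` option means statistics are accumulated over chunks rather than computed on the whole file at once. The sketch below shows one standard way to do that for mean and variance, merging per-batch summaries with the pairwise update of Chan et al. It is an illustration of the technique under that assumption; the helper names are hypothetical and this is not KDP's internal code.

```python
import numpy as np


def merge_stats(a, b):
    """Combine two (count, mean, M2) summaries, where M2 is the sum of squared deviations.

    Uses the pairwise update of Chan et al., so batches can be folded in one at a time.
    """
    n_a, mean_a, m2_a = a
    n_b, mean_b, m2_b = b
    n = n_a + n_b
    if n == 0:
        return (0, 0.0, 0.0)
    delta = mean_b - mean_a
    mean = mean_a + delta * n_b / n
    m2 = m2_a + m2_b + delta ** 2 * n_a * n_b / n
    return (n, mean, m2)


def batched_mean_var(values, batch_size: int = 50_000):
    """Mean and population variance computed one batch at a time."""
    values = np.asarray(values, dtype=float)
    total = (0, 0.0, 0.0)
    for start in range(0, len(values), batch_size):
        batch = values[start:start + batch_size]
        batch_mean = float(batch.mean())
        batch_m2 = float(((batch - batch_mean) ** 2).sum())
        total = merge_stats(total, (batch.size, batch_mean, batch_m2))
    n, mean, m2 = total
    return mean, (m2 / n if n else 0.0)


data = np.random.default_rng(42).normal(100, 15, 123_457)
mean, var = batched_mean_var(data, batch_size=50_000)
print(np.isclose(mean, data.mean()), np.isclose(var, data.var()))  # True True
```

Only one batch needs to be in memory at a time, yet the result matches the full-data mean and variance up to floating-point rounding.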

📊 Understanding the Results

📊 Results Structure

# Example results structure
config = {
    "recommendations": {
        "income": {
            "feature_type": "NumericalFeature",
            "preprocessing": ["NORMALIZATION"],
            "detected_distribution": "log_normal",
            "config": {
                # Specific configuration recommendations
            }
        },
        # More features...
    },
    "code_snippet": "# Python code with recommended configuration",
    "statistics": {
        # If save_stats=True, contains computed statistics
    }
}
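Because the results are a plain dictionary, it is easy to post-process them. The minimal sketch below groups feature names by their detected distribution so you can review similar features together. The helper name and the example feature `account_age` are hypothetical, and the code falls back to `"unknown"` in case a recommendation lacks a `detected_distribution` key.

```python
from collections import defaultdict


def group_by_distribution(config: dict) -> dict:
    """Map each detected distribution to the sorted names of features that have it."""
    groups = defaultdict(list)
    for name, rec in config.get("recommendations", {}).items():
        # Not every feature is guaranteed to carry a detected distribution.
        groups[rec.get("detected_distribution", "unknown")].append(name)
    return {dist: sorted(names) for dist, names in groups.items()}


# Hypothetical config mirroring the results structure; "account_age" is made up.
example_config = {
    "recommendations": {
        "income": {
            "feature_type": "NumericalFeature",
            "preprocessing": ["NORMALIZATION"],
            "detected_distribution": "log_normal",
        },
        "account_age": {
            "feature_type": "NumericalFeature",
            "preprocessing": ["NORMALIZATION"],
        },
    }
}
print(group_by_distribution(example_config))
# {'log_normal': ['income'], 'unknown': ['account_age']}
```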

🛠️ Available Options

⚙️ Configuration Options

# Auto-configuration with options
config = auto_configure(
    data_path="customer_data.csv",      # Path to your dataset
    features_specs=None,                # Optional: provide existing features specs
    batch_size=50000,                   # Batch size for processing
    save_stats=True,                    # Whether to include statistics in results
    stats_path="features_stats.json",   # Where to save/load statistics
    overwrite_stats=False               # Whether to recalculate existing stats
)
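The `stats_path` and `overwrite_stats` options describe a load-or-compute cache: existing statistics are reused unless you ask for a recalculation. The sketch below illustrates that behavior with a hypothetical helper and a stand-in compute function; it mirrors the documented meaning of the options but is not KDP's implementation, and the JSON layout is an assumption.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute_stats(stats_path, compute_fn, overwrite_stats: bool = False) -> dict:
    """Reuse cached statistics unless asked to recompute them.

    Hypothetical helper mirroring the documented meaning of stats_path and
    overwrite_stats; it is not KDP's internal code.
    """
    path = Path(stats_path)
    if path.exists() and not overwrite_stats:
        return json.loads(path.read_text())
    stats = compute_fn()
    path.write_text(json.dumps(stats, indent=2))
    return stats


calls = []


def fake_compute() -> dict:
    """Stand-in for an expensive pass over the dataset; records each call."""
    calls.append(1)
    return {"income": {"mean": 1.0}}


with tempfile.TemporaryDirectory() as tmp:
    stats_file = Path(tmp) / "features_stats.json"
    first = load_or_compute_stats(stats_file, fake_compute)   # computes and saves
    second = load_or_compute_stats(stats_file, fake_compute)  # loads from disk
    third = load_or_compute_stats(stats_file, fake_compute, overwrite_stats=True)  # recomputes

print(len(calls))  # 2
```

The second call never touches the data, which is what makes reruns cheap until the underlying dataset changes.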

💡 Pro Tips

👀 Review Before Implementing

Always review the recommendations before applying them.

from kdp import auto_configure

# Inspect the recommendations first
config = auto_configure("data.csv")

# Print each feature's detected distribution (not every feature may report one)
for feature, recommendation in config["recommendations"].items():
    print(f"{feature}: {recommendation.get('detected_distribution', 'n/a')}")

🧠 Combine with Domain Knowledge

Use the recommendations alongside your domain expertise.

from kdp import auto_configure, FeatureType

# Get recommendations
config = auto_configure("data.csv")

# Create your features dictionary, informed by the recommendations
features = {
    "income": FeatureType.FLOAT_RESCALED,  # Based on the recommendation
    "age": FeatureType.FLOAT_NORMALIZED,   # Based on domain knowledge
}
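One way to keep that split explicit is to start from the recommended types and let your manual choices override them, while recording which recommendations you overrode. The sketch below assumes a hypothetical `build_feature_map` helper and uses plain strings as placeholders; in real code you would map the final values to `FeatureType` members.

```python
def build_feature_map(recommendations: dict, overrides: dict):
    """Start from the recommended feature types and let manual choices win.

    Returns the merged mapping plus the names whose recommendation was overridden.
    """
    features = {name: rec["feature_type"] for name, rec in recommendations.items()}
    overridden = sorted(
        name for name in overrides if name in features and features[name] != overrides[name]
    )
    features.update(overrides)
    return features, overridden


# Placeholder strings; substitute FeatureType members in a real pipeline.
recommendations = {
    "income": {"feature_type": "NumericalFeature"},
    "age": {"feature_type": "NumericalFeature"},
}
overrides = {"age": "FLOAT_NORMALIZED"}  # domain-knowledge decision

features, overridden = build_feature_map(recommendations, overrides)
print(features)    # {'income': 'NumericalFeature', 'age': 'FLOAT_NORMALIZED'}
print(overridden)  # ['age']
```

Keeping the list of overridden features makes it easy to revisit those decisions the next time the recommendations change.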

🔄 Update When Data Changes

Rerun auto-configuration whenever your data distribution changes.

# Update statistics with new data
new_config = auto_configure(
    "updated_data.csv",
    overwrite_stats=True  # Force recalculation with new data
)
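Deciding when the distribution has changed enough to rerun is up to you. The sketch below shows one simple, hypothetical heuristic: flag a feature when its mean has moved by more than a chosen number of the old standard deviations. The `{"mean", "std"}` layout and the 0.5 threshold are assumptions for illustration, not the format of KDP's statistics file, and the numbers are made up.

```python
def drifted_features(old_stats: dict, new_stats: dict, threshold: float = 0.5) -> list:
    """Hypothetical heuristic: flag features whose mean moved by more than
    `threshold` standard deviations of the old data."""
    drifted = []
    for name, old in old_stats.items():
        new = new_stats.get(name)
        if new is None:
            continue  # feature disappeared; handle separately if needed
        scale = old["std"] if old["std"] > 0 else 1e-12
        if abs(new["mean"] - old["mean"]) / scale > threshold:
            drifted.append(name)
    return sorted(drifted)


# Illustrative numbers only; the {"mean", "std"} layout is an assumption,
# not necessarily the format of KDP's features_stats.json.
old_stats = {"income": {"mean": 50_000, "std": 10_000}, "age": {"mean": 40, "std": 12}}
new_stats = {"income": {"mean": 62_000, "std": 11_000}, "age": {"mean": 41, "std": 12}}

if drifted_features(old_stats, new_stats):
    print("Distribution shift detected; rerun auto_configure with overwrite_stats=True")
print(drifted_features(old_stats, new_stats))  # ['income']
```

A mean shift is only one signal; changes in spread or in the share of outliers can matter just as much, so treat this as a starting point.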

Related topics:

📊 Distribution-Aware Encoding: apply recommendations for numerical features

🎯 Feature Selection: improve model performance

📚 Feature Types Overview: learn about all available feature types
