
Auto-Configuration: Analytics and Recommendations

Let KDP analyze your data and suggest the optimal preprocessing

Intelligent Data Analysis

Auto-Configuration examines your dataset and recommends a feature type and preprocessing steps for each feature. It returns the recommendations together with a code snippet; it does not build the preprocessor for you.

Getting Started

Basic Usage

from kdp import auto_configure, PreprocessingModel

# Analyze data and get recommendations
config = auto_configure("customer_data.csv")

# Review the recommendations
recommendations = config["recommendations"]
code_snippet = config["code_snippet"]

# Create your preprocessor using the code snippet as a guide
# Note: You'll need to manually implement the suggestions

What Auto-Configuration Provides

Distribution Analysis

Identifies patterns in your numeric data to suggest optimal transformations

Feature Statistics

Calculates important statistics about your features to guide preprocessing

Preprocessing Recommendations

Suggests appropriate feature types and transformations based on data analysis

Example Code

Generates a code snippet from the analysis that you can adapt when defining your own features
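To make the idea of distribution analysis concrete, here is a minimal sketch of one way a right-skewed, log-normal column could be told apart from a roughly symmetric one. It is not KDP's detection logic: `sample_skewness`, `guess_distribution`, the thresholds, and the returned labels are all hypothetical, and the data is synthetic.

```python
import math
import random
import statistics


def sample_skewness(values):
    """Return the (biased) sample skewness: the mean of cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


def guess_distribution(values, skew_threshold=1.0, log_skew_tolerance=0.5):
    """Label a numeric column with a simple skewness rule (hypothetical)."""
    raw_skew = sample_skewness(values)
    if abs(raw_skew) < skew_threshold:
        return "roughly_symmetric"
    # A log-normal column is strictly positive and looks symmetric after log().
    if min(values) > 0:
        log_skew = sample_skewness([math.log(v) for v in values])
        if abs(log_skew) < log_skew_tolerance:
            return "log_normal"
    return "skewed_other"


rng = random.Random(0)  # fixed seed so the synthetic columns are reproducible
income = [rng.lognormvariate(10, 1) for _ in range(5000)]  # right-skewed
age = [rng.gauss(40, 8) for _ in range(5000)]              # symmetric

print(guess_distribution(income))
print(guess_distribution(age))
```

A rule like this only hints at a transformation (for example, taking logs before scaling); the recommendation still deserves a look at the actual histogram.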

What It Analyzes

| Data Characteristic | Example | What It Detects |
|---|---|---|
| Distribution Types | Log-normal income, bimodal age | Statistical distribution patterns |
| Feature Statistics | Mean, variance, skewness | Basic statistical properties |
| Data Ranges | Min/max values, outliers | Value boundaries and extremes |
| Value Patterns | Discrete vs continuous | How values are distributed |
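The characteristics listed above can be illustrated with a small standard-library profile of a single numeric column. This is only a sketch under stated assumptions: `profile_column`, the Tukey-fence outlier rule, and the "few distinct whole numbers means discrete" rule are hypothetical choices, not what KDP computes internally.

```python
import statistics


def profile_column(values, discrete_max_unique=20):
    """Summarise one numeric column with standard-library statistics."""
    ordered = sorted(values)
    q1, _, q3 = statistics.quantiles(ordered, n=4)  # quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr      # Tukey fences
    all_integral = all(float(v).is_integer() for v in ordered)
    is_discrete = all_integral and len(set(ordered)) <= discrete_max_unique
    return {
        "mean": statistics.fmean(ordered),
        "variance": statistics.pvariance(ordered),
        "min": ordered[0],
        "max": ordered[-1],
        "outliers": [v for v in ordered if v < low or v > high],
        # Hypothetical rule: few distinct whole numbers -> treat as discrete.
        "kind": "discrete" if is_discrete else "continuous",
    }


children = [0, 1, 2, 2, 3, 1, 0, 2, 1, 12]  # made-up sample column
profile = profile_column(children)
print(profile["kind"], profile["min"], profile["max"], profile["outliers"])
```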

Examples

Basic Analysis

# Basic auto-configuration analysis
config = auto_configure(
    "customer_data.csv",  # Your dataset
    batch_size=50000,     # Process in batches of this size
    save_stats=True       # Save computed statistics
)

# Review the recommendations
for feature_name, recommendation in config["recommendations"].items():
    print(f"Feature: {feature_name}")
    print(f"  Type: {recommendation['feature_type']}")
    print(f"  Preprocessing: {recommendation['preprocessing']}")

# Get the suggested code snippet
print(config["code_snippet"])
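When a dataset has many columns, it can help to group the recommendations by feature type before reading them one by one. The sketch below works on a hand-written stand-in for `config["recommendations"]`; the helper `group_by_feature_type` and the sample values (including `"CategoricalFeature"`) are assumptions for illustration, and real output may differ.

```python
from collections import defaultdict

# Hand-written stand-in for config["recommendations"]; real output will differ.
recommendations = {
    "income": {"feature_type": "NumericalFeature", "preprocessing": ["NORMALIZATION"]},
    "age": {"feature_type": "NumericalFeature", "preprocessing": ["NORMALIZATION"]},
    "city": {"feature_type": "CategoricalFeature", "preprocessing": []},
}


def group_by_feature_type(recommendations):
    """Map each recommended feature type to the sorted feature names using it."""
    groups = defaultdict(list)
    for name, rec in recommendations.items():
        groups[rec.get("feature_type", "unknown")].append(name)
    return {ftype: sorted(names) for ftype, names in groups.items()}


for ftype, names in group_by_feature_type(recommendations).items():
    print(f"{ftype}: {', '.join(names)}")
```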

Understanding the Results

Results Structure

# Example results structure
config = {
    "recommendations": {
        "income": {
            "feature_type": "NumericalFeature",
            "preprocessing": ["NORMALIZATION"],
            "detected_distribution": "log_normal",
            "config": {
                # Specific configuration recommendations
            }
        },
        # More features...
    },
    "code_snippet": "# Python code with recommended configuration",
    "statistics": {
        # If save_stats=True, contains computed statistics
    }
}
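Before relying on a result in a script, a quick shape check can catch surprises early. The following sketch assumes only the keys shown in the example structure above; `missing_keys` is a hypothetical helper, and which keys are truly guaranteed is not stated here, so treat the list of required keys as an assumption.

```python
from typing import Any, Dict, List


def missing_keys(result: Dict[str, Any]) -> List[str]:
    """List keys shown in the example structure that `result` lacks."""
    # Assumed top-level keys; "statistics" is optional per the example.
    problems = [key for key in ("recommendations", "code_snippet") if key not in result]
    for name, rec in result.get("recommendations", {}).items():
        # Assumed per-feature keys, taken from the example structure.
        for key in ("feature_type", "preprocessing"):
            if key not in rec:
                problems.append(f"recommendations.{name}.{key}")
    return problems


example = {
    "recommendations": {
        "income": {"feature_type": "NumericalFeature", "preprocessing": ["NORMALIZATION"]},
        "age": {"feature_type": "NumericalFeature"},  # deliberately incomplete
    },
    "code_snippet": "# ...",
}
print(missing_keys(example))
```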

Available Options

Configuration Options

# Auto-configuration with options
config = auto_configure(
    data_path="customer_data.csv",      # Path to your dataset
    features_specs=None,                # Optional: provide existing features specs
    batch_size=50000,                   # Batch size for processing
    save_stats=True,                    # Whether to include statistics in results
    stats_path="features_stats.json",   # Where to save/load statistics
    overwrite_stats=False               # Whether to recalculate existing stats
)
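The comments on `stats_path` and `overwrite_stats` suggest a compute-once, reuse-later pattern. The sketch below shows that pattern in isolation, under the assumption that this reading is correct; `load_or_compute_stats` and `compute_stats` are hypothetical stand-ins and do not reflect how KDP stores its statistics.

```python
import json
import tempfile
from pathlib import Path


def load_or_compute_stats(stats_path, compute, overwrite_stats=False):
    """Reuse cached statistics unless they are missing or a refresh is forced."""
    path = Path(stats_path)
    if path.exists() and not overwrite_stats:
        return json.loads(path.read_text())
    stats = compute()
    path.write_text(json.dumps(stats, indent=2))
    return stats


calls = []  # records how often the expensive computation actually runs


def compute_stats():
    """Stand-in for an expensive pass over the dataset."""
    calls.append(1)
    return {"income": {"mean": 52000.0, "run": len(calls)}}


with tempfile.TemporaryDirectory() as tmp:
    stats_file = Path(tmp) / "features_stats.json"
    first = load_or_compute_stats(stats_file, compute_stats)    # computes, saves
    second = load_or_compute_stats(stats_file, compute_stats)   # reads the file
    third = load_or_compute_stats(stats_file, compute_stats, overwrite_stats=True)

print(len(calls))
```

The design point is that a stale statistics file silently wins unless the refresh flag is set, which is why rerunning on new data needs `overwrite_stats=True`.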

Pro Tips

Review Before Implementing

Review the recommendations before applying them

# Inspect the recommendations first
config = auto_configure("data.csv")

# Review before implementing
for feature, recommendation in config["recommendations"].items():
    # .get() avoids a KeyError if a feature has no detected distribution
    print(f"{feature}: {recommendation.get('detected_distribution', 'n/a')}")

Combine with Domain Knowledge

Use the recommendations alongside your domain expertise

from kdp import FeatureType, auto_configure

# Get recommendations
config = auto_configure("data.csv")

# Create your features dictionary, informed by recommendations
features = {
    "income": FeatureType.FLOAT_RESCALED,  # Based on recommendation
    "age": FeatureType.FLOAT_NORMALIZED,   # Based on domain knowledge
}
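One way to keep the two sources of decisions separate is to hold the recommendation-derived choices and your own overrides in different dictionaries and merge them at the end. This is a sketch only: `merge_feature_choices` is a hypothetical helper, and plain strings stand in for `FeatureType` members so the example runs without KDP installed.

```python
def merge_feature_choices(recommended, overrides):
    """Start from recommended choices and let explicit overrides win."""
    unknown = set(overrides) - set(recommended)
    if unknown:
        # An override for a column that was never analyzed is probably a typo.
        raise KeyError(f"Overrides for unknown features: {sorted(unknown)}")
    return {**recommended, **overrides}


# Strings stand in for FeatureType members in this sketch.
recommended = {"income": "FLOAT_RESCALED", "age": "FLOAT_RESCALED"}
overrides = {"age": "FLOAT_NORMALIZED"}  # domain knowledge

features = merge_feature_choices(recommended, overrides)
print(features)
```

Keeping the overrides in their own dictionary also documents which choices were deliberate, which is useful when the analysis is rerun later.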

Update When Data Changes

Rerun auto-configuration when your data distribution changes

# Update statistics with new data
new_config = auto_configure(
    "updated_data.csv",
    overwrite_stats=True  # Force recalculation with new data
)
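After rerunning on new data, comparing the old and new recommendations shows which features actually need attention. The sketch below uses hand-written dictionaries in the shape of `config["recommendations"]`; `changed_recommendations` and the sample values are hypothetical.

```python
def changed_recommendations(
    old, new, keys=("feature_type", "preprocessing", "detected_distribution")
):
    """Return {feature: {key: (old_value, new_value)}} for every differing key."""
    changes = {}
    for name in sorted(set(old) | set(new)):
        before, after = old.get(name, {}), new.get(name, {})
        diff = {
            key: (before.get(key), after.get(key))
            for key in keys
            if before.get(key) != after.get(key)
        }
        if diff:
            changes[name] = diff
    return changes


# Hand-written stand-ins for two runs' config["recommendations"].
old = {"income": {"feature_type": "NumericalFeature", "detected_distribution": "normal"}}
new = {
    "income": {"feature_type": "NumericalFeature", "detected_distribution": "log_normal"},
    "tenure": {"feature_type": "NumericalFeature"},  # column added in the new data
}

for feature, diff in changed_recommendations(old, new).items():
    print(feature, diff)
```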

Distribution-Aware Encoding

Apply recommendations for numerical features

Feature Selection

Improve model performance

Feature Types Overview

Learn about all available feature types
