
🔢 Numerical Features

Transform your continuous data like age, income, or prices into powerful feature representations

📋 Quick Overview

Numerical features are the backbone of most machine learning models. KDP provides multiple ways to handle them, from simple normalization to advanced neural embeddings.

🎯 Types and Use Cases

| Feature Type | Best For | Example Values | When to Use |
|---|---|---|---|
| FLOAT_NORMALIZED | Data with clear bounds | 🧓 Age: 18-65, ⭐ Score: 0-100 | When you know your data falls in a specific range |
| FLOAT_RESCALED | Unbounded, varied data | 💰 Income: $0-$1M+, 📊 Revenue | When data has outliers or unknown bounds |
| FLOAT_DISCRETIZED | Values that form groups | 📅 Years: 1-50, ⭐ Ratings: 1-5 | When groups of values have special meaning |
| FLOAT | Default normalization | 🔢 General numeric values | When you want standard normalization (identical to FLOAT_NORMALIZED) |

🚀 Basic Usage

The simplest way to define numerical features is with the FeatureType enum:

from kdp import PreprocessingModel, FeatureType

# ✨ Quick numerical feature definition
features = {
    "age": FeatureType.FLOAT_NORMALIZED,          # 🧓 Age gets 0-1 normalization
    "income": FeatureType.FLOAT_RESCALED,         # 💰 Income gets rescaled
    "transaction_count": FeatureType.FLOAT,       # 🔢 Default normalization
    "rating": FeatureType.FLOAT_DISCRETIZED       # ⭐ Discretized into bins
}

# 🏗️ Create your preprocessor
preprocessor = PreprocessingModel(
    path_data="customer_data.csv",
    features_specs=features
)

🧠 Advanced Configuration

For more control, use the NumericalFeature class:

from kdp import FeatureType
from kdp.features import NumericalFeature

features = {
    # 🧓 Simple example with enhanced configuration
    "age": NumericalFeature(
        name="age",
        feature_type=FeatureType.FLOAT_NORMALIZED,
        use_embedding=True,                 # 🔄 Create neural embeddings
        embedding_dim=16,                   # 📏 Size of embedding
        preferred_distribution="normal"     # 📊 Hint about distribution
    ),

    # 💰 Financial data example
    "transaction_amount": NumericalFeature(
        name="transaction_amount",
        feature_type=FeatureType.FLOAT_RESCALED,
        use_embedding=True,
        embedding_dim=32,
        preferred_distribution="heavy_tailed"
    ),

    # โณ Custom binning example
    "years_experience": NumericalFeature(
        name="years_experience",
        feature_type=FeatureType.FLOAT_DISCRETIZED,
        num_bins=5                          # 📏 Number of bins
    )
}

⚙️ Key Configuration Options

| Parameter | Description | Default | Suggested Range |
|---|---|---|---|
| feature_type | 🏷️ Base feature type | FLOAT_NORMALIZED | One of the 4 types above |
| use_embedding | 🧠 Enable neural embeddings | False | True/False |
| embedding_dim | 📏 Dimensionality of the embedding | 8 | 4-64 |
| preferred_distribution | 📊 Hint about the data distribution | None | "normal", "log_normal", "heavy_tailed", "bounded", ... |
| num_bins | 🔢 Number of bins for discretization | 10 | 5-100 |

🔥 Power Features

📊 Distribution-Aware Processing

Let KDP automatically detect and handle distributions:

# ✨ Enable distribution-aware processing for all numerical features
preprocessor = PreprocessingModel(
    features_specs=features,
    use_distribution_aware=True      # 🔍 Enable distribution detection
)
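What does "detecting a distribution" involve? The plain-Python sketch below is a deliberately simple, hypothetical heuristic that classifies a sample by its range and skewness. It is not KDP's detector: the function names, the labels, and the threshold of 1.0 are illustrative assumptions. Note that symmetric data is labelled "normal" here even though symmetry alone does not prove normality.

```python
import math
import statistics


def sample_skewness(values):
    """Population (Fisher-Pearson) skewness: mean cubed deviation / std**3."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


def guess_distribution(values, skew_threshold=1.0):
    """Hypothetical heuristic: labels and thresholds are illustrative only."""
    lo, hi = min(values), max(values)
    if lo >= 0.0 and hi <= 1.0:
        return "bounded"              # everything already sits in [0, 1]
    skew = sample_skewness(values)
    if lo > 0 and skew > skew_threshold:
        # Right-skewed positive data: does taking the log make it symmetric?
        log_skew = sample_skewness([math.log(v) for v in values])
        return "log_normal" if abs(log_skew) < skew_threshold else "heavy_tailed"
    if abs(skew) <= skew_threshold:
        return "normal"
    return "skewed"


print(guess_distribution([0.1, 0.25, 0.4, 0.8]))
print(guess_distribution([1, 10, 100, 1000, 10000]))
```

A production detector would typically use more robust statistics and more candidate shapes (multi-modal, periodic, sparse), but the idea is the same: summarize the sample, then pick a transformation that suits its shape.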

🧠 Advanced Numerical Embeddings

To learn a dense vector representation for a single feature, set use_embedding=True on its NumericalFeature and choose an embedding_dim:

from kdp import PreprocessingModel, FeatureType
from kdp.features import NumericalFeature

# Configure numerical embeddings
preprocessor = PreprocessingModel(
    features_specs={
        "income": NumericalFeature(
            name="income",
            feature_type=FeatureType.FLOAT_RESCALED,
            use_embedding=True,
            embedding_dim=32,
            preferred_distribution="log_normal"
        )
    }
)
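Conceptually, a numerical embedding maps one scalar to a vector of embedding_dim numbers through learned weights. The sketch below shows one common design, a small two-layer projection with a ReLU, in plain Python with fixed random (untrained) weights. It is an assumption for illustration only: ScalarEmbedding is a hypothetical class, not KDP's embedding layer, and the second layer omits a bias for brevity.

```python
import random


class ScalarEmbedding:
    """Hypothetical sketch: scalar -> hidden layer (ReLU) -> embedding_dim vector."""

    def __init__(self, embedding_dim=8, hidden_dim=16, seed=0):
        rng = random.Random(seed)
        # In a real model these weights would be learned by backpropagation.
        self.w1 = [rng.gauss(0, 1) for _ in range(hidden_dim)]
        self.b1 = [rng.gauss(0, 1) for _ in range(hidden_dim)]
        self.w2 = [[rng.gauss(0, 1) / hidden_dim ** 0.5 for _ in range(hidden_dim)]
                   for _ in range(embedding_dim)]
        self.embedding_dim = embedding_dim

    def __call__(self, x):
        # Piecewise-linear hidden features let the output bend with x.
        hidden = [max(0.0, w * x + b) for w, b in zip(self.w1, self.b1)]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]

    def num_parameters(self):
        hidden_dim = len(self.w1)
        return 2 * hidden_dim + self.embedding_dim * hidden_dim


embed = ScalarEmbedding(embedding_dim=32)
vector = embed(0.7)            # a 32-number representation of the scalar 0.7
print(len(vector), embed.num_parameters())
```

Because the ReLU makes the mapping piecewise linear, the output can capture patterns that a single scale-and-shift cannot. The output layer's parameter count grows linearly with embedding_dim, which is the cost of choosing a larger dimension.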

💼 Real-World Examples

💰 Financial Analysis

from kdp import PreprocessingModel, FeatureType
from kdp.features import NumericalFeature

# 📈 Financial metrics with appropriate processing
preprocessor = PreprocessingModel(
    features_specs={
        "income": NumericalFeature(
            name="income",
            feature_type=FeatureType.FLOAT_RESCALED,
            preferred_distribution="log_normal"   # 📉 Log-normal distribution
        ),
        "credit_score": NumericalFeature(
            name="credit_score",
            feature_type=FeatureType.FLOAT_NORMALIZED,
            use_embedding=True,
            embedding_dim=16
        ),
        "debt_ratio": NumericalFeature(
            name="debt_ratio",
            feature_type=FeatureType.FLOAT_NORMALIZED,
            preferred_distribution="bounded"      # 📊 Bounded between 0 and 1
        )
    },
    use_distribution_aware=True                   # 🧠 Smart distribution handling
)
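Why does the log_normal hint on income matter? Standardizing shifts and scales values but cannot change their shape, so a log-normal variable keeps its long right tail; taking the logarithm first makes it roughly symmetric. The plain-Python sketch below compares both on a synthetic sample. It illustrates the statistics only: whether and how KDP applies such a transform for preferred_distribution="log_normal" is not shown here, and the helper names are hypothetical.

```python
import math
import random
import statistics


def standardize(values):
    """Subtract the mean and divide by the (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std for v in values]


def skewness(values):
    """Population skewness; 0 for perfectly symmetric data."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return sum((v - mean) ** 3 for v in values) / (len(values) * std ** 3)


rng = random.Random(42)
# Synthetic incomes from a log-normal distribution with a median near 40,000
incomes = [rng.lognormvariate(math.log(40_000), 0.8) for _ in range(5_000)]

plain = standardize(incomes)                           # shape unchanged
logged = standardize([math.log1p(v) for v in incomes])  # log first, then scale

print(f"skew without log: {skewness(plain):.2f}")
print(f"skew with log1p:  {skewness(logged):.2f}")
```

log1p is used instead of log so that a zero income would not raise an error.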

🔌 Sensor Data

from kdp import PreprocessingModel, FeatureType
from kdp.features import NumericalFeature

# 📡 Processing sensor readings
preprocessor = PreprocessingModel(
    features_specs={
        "temperature": NumericalFeature(
            name="temperature",
            feature_type=FeatureType.FLOAT_RESCALED,
            use_embedding=True,
            embedding_dim=16
        ),
        "humidity": NumericalFeature(
            name="humidity",
            feature_type=FeatureType.FLOAT_NORMALIZED,
            preferred_distribution="bounded"      # 💧 Bounded between 0 and 100
        ),
        "pressure": NumericalFeature(
            name="pressure",
            feature_type=FeatureType.FLOAT_RESCALED,
            use_embedding=True,
            embedding_dim=16
        )
    }
)

💡 Pro Tips

📊 Understand Your Data Distribution

  • Use FLOAT_NORMALIZED when your data has clear bounds (e.g., 0-100%)
  • Use FLOAT_RESCALED when your data has outliers (e.g., income, prices)
  • Use FLOAT_DISCRETIZED when your values naturally form groups (e.g., age groups)
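The three rules of thumb above can be written down as a small decision function. This is a hypothetical helper for illustration: suggest_feature_type, its arguments, the "at most 10 integer levels" rule for groups, and the use of Tukey's IQR rule to flag outliers are all assumptions, not part of KDP. It returns the feature type name as a string.

```python
import statistics


def has_outliers(values, k=1.5):
    """Tukey's rule: flag points more than k * IQR outside the quartiles."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    return any(v < q1 - k * iqr or v > q3 + k * iqr for v in values)


def suggest_feature_type(values, has_known_bounds=False, max_levels=10):
    """Hypothetical helper mirroring the rules of thumb above."""
    levels = set(values)
    if len(levels) <= max_levels and all(float(v).is_integer() for v in levels):
        return "FLOAT_DISCRETIZED"   # a few integer levels, e.g. 1-5 star ratings
    if has_known_bounds:
        return "FLOAT_NORMALIZED"    # clear bounds, e.g. 0-100 %
    if has_outliers(values):
        return "FLOAT_RESCALED"      # outliers, e.g. income or prices
    return "FLOAT_NORMALIZED"


print(suggest_feature_type([1, 2, 3, 4, 5, 5, 4, 3]))
print(suggest_feature_type([30_000 + 1_000 * i for i in range(20)] + [1_000_000]))
```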

🧠 Consider Neural Embeddings for Complex Relationships

  • Enable them when simple scaling doesn't capture the pattern
  • Increase embedding dimensions for more complex patterns (16 → 32 → 64)

🔍 Let KDP Handle Distribution Detection

  • Enable use_distribution_aware=True and let KDP choose a suitable transformation for each numerical feature automatically
  • This is especially important for skewed or multi-modal distributions

📏 Custom Bin Boundaries

  • Use the num_bins parameter to control discretization granularity
  • More bins give finer granularity but add parameters to learn and leave fewer samples in each bin
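To see what num_bins trades off, the sketch below computes equal-frequency (quantile) bin boundaries in plain Python and assigns each value to a bin with bisect. Quantile binning is an assumption chosen for illustration; KDP may place boundaries differently, and quantile_boundaries and to_bin are hypothetical names. If each bin were then one-hot encoded or embedded, the downstream parameter count would grow linearly with the number of bins.

```python
import bisect
import statistics


def quantile_boundaries(values, num_bins):
    """Return num_bins - 1 cut points that split values into equal-count bins."""
    return statistics.quantiles(values, n=num_bins)


def to_bin(value, boundaries):
    """Bin index from 0 to len(boundaries); a value equal to a cut goes up."""
    return bisect.bisect_right(boundaries, value)


years_experience = [0, 1, 1, 2, 3, 3, 4, 5, 6, 8, 10, 12, 15, 20, 25, 30]
for num_bins in (2, 4):
    cuts = quantile_boundaries(years_experience, num_bins)
    bins = [to_bin(v, cuts) for v in years_experience]
    print(num_bins, cuts, bins)
```

With 2 bins the data is split at the median; with 4 bins each bin holds about a quarter of the samples, so finer granularity means fewer examples per bin for the model to learn from.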

🧮 Types of Numerical Features

KDP supports different types of numerical features, each with specialized processing:

  • 🔄 FLOAT: Basic floating-point features with default normalization
  • 📏 FLOAT_NORMALIZED: Values normalized to the [0,1] range using min-max scaling
  • ⚖️ FLOAT_RESCALED: Values rescaled using standardization (mean=0, std=1)
  • 📊 FLOAT_DISCRETIZED: Continuous values binned into discrete buckets
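The per-type descriptions above map to familiar formulas. Here is a plain-Python sketch of each as described on this page: min-max scaling for FLOAT_NORMALIZED, standardization for FLOAT_RESCALED, and bucketing for FLOAT_DISCRETIZED. It illustrates the math only and is not KDP's code; the function names and the bucket boundaries are hypothetical.

```python
import bisect
import statistics


def min_max_scale(values):
    """Map values linearly onto [0, 1] using the observed min and max."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0          # avoid division by zero for constant columns
    return [(v - lo) / span for v in values]


def standardize(values):
    """Shift to mean 0 and scale to (population) standard deviation 1."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values) or 1.0
    return [(v - mean) / std for v in values]


def discretize(values, boundaries):
    """Replace each value with its bucket index; values equal to a cut go up."""
    return [bisect.bisect_right(boundaries, v) for v in values]


ages = [18, 25, 40, 65]
print(min_max_scale(ages))
print(standardize(ages))
print(discretize(ages, boundaries=[30, 50]))
```

In a real pipeline the statistics (min, max, mean, standard deviation, boundaries) would be computed once on the training data and reused unchanged at inference time, so that new values are transformed consistently.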

📊 Architecture Diagrams

📏 Normalized Numerical Feature

Below is a visualization of a model with a normalized numerical feature:

Normalized Numerical Feature

⚖️ Rescaled Numerical Feature

Below is a visualization of a model with a rescaled numerical feature:

Rescaled Numerical Feature

📊 Discretized Numerical Feature

Below is a visualization of a model with a discretized numerical feature:

Discretized Numerical Feature

🧠 Advanced Numerical Embeddings

When using advanced numerical embeddings, the model architecture looks like this:

Advanced Numerical Embeddings