Feature Types Overview

Making Data ML-Ready

KDP transforms raw data into model-ready inputs by applying type-specific preprocessing to each feature.

Feature Types at a Glance

| Feature Type | What It's For | Processing Magic |
|---|---|---|
| Numerical | Continuous values like age, income, scores | Normalization, scaling, embeddings, distribution analysis |
| Categorical | Discrete values like occupation, product type | Embeddings, one-hot encoding, vocabulary management |
| Text | Free-form text like reviews, descriptions | Tokenization, embeddings, sequence handling |
| Date | Temporal data like signup dates, transactions | Component extraction, cyclical encoding, seasonality |
| Cross Features | Feature interactions | Combined embeddings, interaction modeling |
| Passthrough | Pre-processed data, custom vectors | No modification, type casting only |
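The table lists cyclical encoding among the date transformations. As a minimal sketch of that general technique (not KDP's internal implementation; the helper name `cyclical_encode` is hypothetical), a periodic component such as the month can be mapped onto the unit circle so that December and January end up close together:

```python
import math
from datetime import date
from typing import Tuple


def cyclical_encode(value: int, period: int) -> Tuple[float, float]:
    """Map a periodic integer (e.g. month index 0-11) to (sin, cos) on the unit circle."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


d = date(2024, 12, 15)
month_sin, month_cos = cyclical_encode(d.month - 1, 12)  # months as 0..11
dow_sin, dow_cos = cyclical_encode(d.weekday(), 7)       # Monday = 0

# Compare distances on the circle: December vs. January, January vs. July
december = cyclical_encode(11, 12)
january = cyclical_encode(0, 12)
july = cyclical_encode(6, 12)
print(math.dist(december, january) < math.dist(january, july))
```

A plain integer month would place December (12) and January (1) at opposite ends of the range; the sine/cosine pair removes that artificial jump.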

Getting Started

The simplest way to define features is with the FeatureType enum:

from kdp import PreprocessingModel, FeatureType

# Quick and easy feature definition
features = {
    # Numerical features - different processing strategies
    "age": FeatureType.FLOAT_NORMALIZED,        # [0,1] range normalization
    "income": FeatureType.FLOAT_RESCALED,       # Standard scaling
    "transaction_count": FeatureType.FLOAT,     # Default normalization (same as FLOAT_NORMALIZED)

    # Categorical features - automatic encoding
    "occupation": FeatureType.STRING_CATEGORICAL,       # Job titles, roles
    "education_level": FeatureType.INTEGER_CATEGORICAL, # Education codes

    # Text and dates - specialized processing
    "product_review": FeatureType.TEXT,         # Customer feedback
    "signup_date": FeatureType.DATE,            # User registration date

    # Passthrough feature - use without any processing
    "embedding_vector": FeatureType.PASSTHROUGH # Pre-processed data passes directly to output
}
}

# Create your preprocessor
preprocessor = PreprocessingModel(
    path_data="customer_data.csv",
    features_specs=features
)
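The comments in the example mention "[0,1] range normalization" and "standard scaling". The sketch below shows what those two terms usually mean, in plain Python. It assumes min-max scaling and z-scoring respectively; the layers KDP actually applies may differ, and the helper names are hypothetical.

```python
import statistics


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0.0 and the largest to 1.0."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def standard_scale(values):
    """Shift to zero mean and divide by the population standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    return [(v - mean) / std if std else 0.0 for v in values]


ages = [18, 25, 40, 62]
incomes = [20_000, 35_000, 50_000, 95_000]

normalized_ages = min_max_normalize(ages)
scaled_incomes = standard_scale(incomes)
print(normalized_ages)
print(scaled_incomes)
```

Min-max output is bounded but sensitive to outliers; standard scaling is unbounded but keeps the bulk of the data near zero.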

Why Strong Feature Types Matter

Optimized Processing: Each feature type gets specialized handling for better ML performance.

Reduced Errors: Catch type mismatches early in development, not during training.

Clearer Code: Self-documenting feature definitions make your code more maintainable.

Enhanced Performance: Type-specific optimizations improve preprocessing speed.
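To illustrate the "catch type mismatches early" point, here is a small sketch using a stand-in enum. `FeatureKind` and `validate_specs` are hypothetical names invented for this example, not part of KDP. The idea is only that enum members fail fast on a typo, whereas free-form strings can slip through until much later.

```python
from enum import Enum


class FeatureKind(Enum):
    """Hypothetical stand-in for a feature type enum (not KDP's FeatureType)."""
    FLOAT = "float"
    STRING_CATEGORICAL = "string_categorical"
    DATE = "date"


def validate_specs(specs):
    """Raise immediately if any spec value is not a FeatureKind member."""
    bad = {name: kind for name, kind in specs.items()
           if not isinstance(kind, FeatureKind)}
    if bad:
        raise TypeError(f"Unsupported feature types: {bad}")
    return specs


specs = validate_specs({"age": FeatureKind.FLOAT, "signup_date": FeatureKind.DATE})

try:
    validate_specs({"age": "flaot"})  # typo hidden inside a plain string
    caught = False
except TypeError:
    caught = True

print(caught)
```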

Feature Type Documentation

Advanced Feature Configuration

For more control, use specialized feature classes:

from kdp import FeatureType  # needed for the feature_type arguments below
from kdp.features import NumericalFeature, CategoricalFeature, TextFeature, DateFeature, PassthroughFeature
import tensorflow as tf

# Advanced feature configuration
features = {
    # Numerical with advanced embedding
    "income": NumericalFeature(
        name="income",
        feature_type=FeatureType.FLOAT_RESCALED,
        use_embedding=True,
        embedding_dim=32
    ),

    # Categorical with hashing
    "product_id": CategoricalFeature(
        name="product_id",
        feature_type=FeatureType.STRING_CATEGORICAL,
        max_tokens=10000,
        category_encoding="hashing"
    ),

    # Text with custom tokenization
    "description": TextFeature(
        name="description",
        max_tokens=5000,
        embedding_dim=64,
        sequence_length=128,
        ngrams=2
    ),

    # Date with cyclical encoding
    "purchase_date": DateFeature(
        name="purchase_date",
        add_day_of_week=True,
        add_month=True,
        cyclical_encoding=True
    ),

    # Passthrough feature
    "embedding": PassthroughFeature(
        name="embedding",
        dtype=tf.float32
    )
}
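The `product_id` feature above uses hashing as its category encoding. As a rough sketch of the general hashing trick (not KDP's implementation; `hash_bucket` and `num_bins` are hypothetical names, and 10,000 is reused from the example purely for illustration), each string is mapped to a fixed bucket index without building a vocabulary:

```python
import hashlib


def hash_bucket(value: str, num_bins: int) -> int:
    """Map a string to a stable bucket index in [0, num_bins)."""
    digest = hashlib.sha256(value.encode("utf-8")).digest()
    # Use the first 8 bytes as an unsigned integer, then fold into the bin range
    return int.from_bytes(digest[:8], "big") % num_bins


product_ids = ["sku-001", "sku-002", "sku-999999"]
buckets = [hash_bucket(p, num_bins=10_000) for p in product_ids]
print(buckets)
```

The trade-off is that unseen values never fail, but distinct values can collide in the same bucket, so the bin count is typically chosen well above the expected number of categories.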

Pro Tips for Feature Definition

1. Start Simple: Begin with basic FeatureType definitions.

2. Add Complexity Gradually: Refactor to specialized feature classes when needed.

3. Combine Approaches: Mix distribution-aware encoding, attention, and embeddings for best results.

4. Check Distributions: Review your data distribution before choosing feature types.

5. Experiment with Types: Sometimes a different encoding provides better results.

6. Consider Passthrough: Use passthrough features for pre-processed data or custom vectors.
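For the "Check Distributions" tip, one simple check is the skewness of a numeric column. The sketch below is a plain-Python illustration with made-up sample values; the `skewness` helper is hypothetical and nothing here is a KDP API.

```python
import statistics


def skewness(values):
    """Population skewness: the mean of cubed z-scores (0.0 for symmetric data)."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return sum(((v - mean) / std) ** 3 for v in values) / len(values)


incomes = [20, 22, 25, 27, 30, 31, 35, 400]  # one extreme value creates a long right tail
symmetric = [1, 2, 3, 4, 5]

print(skewness(incomes))
print(skewness(symmetric))
```

A strongly positive value like the one for `incomes` suggests the column may benefit from a different treatment than a roughly symmetric one, which is the kind of signal to look at before picking a feature type.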

Model Architecture Diagrams

KDP creates optimized preprocessing architectures based on your feature definitions. Here are examples of different model configurations:

Basic Feature Combinations

When combining numerical and categorical features:

[Diagram: Numeric and Categorical Features]

All Feature Types Combined

KDP can handle all feature types in a single model:

[Diagram: All Feature Types Combined]

Advanced Configurations

Tabular Attention

Enhance feature interactions with tabular attention:

[Diagram: Tabular Attention]
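As a rough intuition for attention across features, the sketch below applies scaled dot-product self-attention to a few feature embeddings in plain Python. It is a simplification under stated assumptions: queries, keys, and values are the raw embeddings with no learned projections, and it does not reflect how KDP builds its tabular attention layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(embeddings):
    """Scaled dot-product self-attention with Q = K = V = embeddings."""
    dim = len(embeddings[0])
    attended = []
    for query in embeddings:
        scores = [sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
                  for key in embeddings]
        weights = softmax(scores)
        # Each output row is a weighted average of all feature embeddings
        attended.append([sum(w * value[i] for w, value in zip(weights, embeddings))
                         for i in range(dim)])
    return attended


# Three features, each represented by a 2-dimensional embedding (made-up values)
feature_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended = self_attention(feature_embeddings)
print(attended)
```

Each output row mixes information from every feature, weighted by similarity, which is the sense in which attention models feature interactions.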

Transformer Blocks

Process categorical features with transformer blocks:

[Diagram: Transformer Blocks]

Feature MoE (Mixture of Experts)

Specialized feature processing with Mixture of Experts:

[Diagram: Feature MoE]
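The core idea behind a Mixture of Experts can be sketched in a few lines: several expert functions process the same input, and a gate decides how much each one contributes. This is a generic illustration only. The experts and gate logits below are made up (in a real model both would be learned), and it says nothing about KDP's actual Feature MoE layers.

```python
import math


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate_logits):
    """Combine expert outputs using softmax gate weights."""
    weights = softmax(gate_logits)
    outputs = [expert(x) for expert in experts]
    combined = sum(w * o for w, o in zip(weights, outputs))
    return combined, weights


# Hypothetical experts; real experts would be small learned networks
experts = [lambda v: v * 2.0, lambda v: v + 1.0, lambda v: -v]

output, weights = mixture_of_experts(3.0, experts, gate_logits=[2.0, 0.5, -1.0])
uniform_output, _ = mixture_of_experts(3.0, experts, gate_logits=[0.0, 0.0, 0.0])
print(output, weights)
```

With equal gate logits the result is the plain average of the expert outputs; unequal logits let the gate favor whichever expert suits the input.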

Output Modes

KDP supports different output modes for your preprocessed features:

Concatenated Output

[Diagram: Concat Output Mode]

Dictionary Output

[Diagram: Dict Output Mode]
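The difference between the two output modes can be pictured with plain Python data. The sketch below is an illustration only: the feature values are made up, and `as_dict` and `as_concat` are hypothetical helpers rather than KDP functions.

```python
# Made-up per-feature outputs after preprocessing
processed = {
    "age": [0.42],
    "occupation": [0.0, 1.0, 0.0],
    "signup_date": [0.5, -0.87],
}


def as_dict(features):
    """Dictionary mode: each feature keeps its own named vector."""
    return {name: list(vector) for name, vector in features.items()}


def as_concat(features):
    """Concatenated mode: all vectors joined into one flat vector, in insertion order."""
    flat = []
    for vector in features.values():
        flat.extend(vector)
    return flat


dict_output = as_dict(processed)
concat_output = as_concat(processed)
print(dict_output["occupation"])
print(concat_output)
```

A single concatenated vector is convenient for feeding one dense input to a model, while a dictionary keeps features separate for models that route each one differently.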