Feature Types Overview
Making Data ML-Ready
KDP makes feature processing intuitive by turning raw data into model-ready inputs, applying type-specific preprocessing to each feature.
Feature Types at a Glance
Feature Type | What It's For | Typical Processing |
---|---|---|
Numerical | Continuous values such as age, income, and scores | Normalization, scaling, embeddings, distribution analysis |
Categorical | Discrete values such as occupation or product type | Embeddings, one-hot encoding, vocabulary management |
Text | Free-form text such as reviews and descriptions | Tokenization, embeddings, sequence handling |
Date | Temporal data such as signup or transaction dates | Component extraction, cyclical encoding, seasonality |
Cross Features | Interactions between features | Combined embeddings, interaction modeling |
Passthrough | Pre-processed data, custom vectors | No modification; type casting only |
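As a concrete picture of the Categorical row, the sketch below builds a frequency-ranked vocabulary and one-hot encodes values against it. It is a library-free illustration of the general technique, not KDP's internal code; `build_vocabulary` and `one_hot` are hypothetical helpers, and reserving index 0 for out-of-vocabulary values is an assumption borrowed from common practice.

```python
from collections import Counter


def build_vocabulary(values, max_tokens=None):
    """Hypothetical helper: map each value to an index, most frequent first.

    Index 0 is reserved for out-of-vocabulary (OOV) values (an assumption).
    """
    counts = Counter(values)
    # Sort by descending frequency, then by value so ties are deterministic
    ranked = sorted(counts, key=lambda v: (-counts[v], v))
    if max_tokens is not None:
        ranked = ranked[: max_tokens - 1]  # keep one slot for OOV
    return {value: i + 1 for i, value in enumerate(ranked)}


def one_hot(value, vocab):
    """Hypothetical helper: one-hot encode a value; unseen values hit slot 0."""
    vector = [0] * (len(vocab) + 1)
    vector[vocab.get(value, 0)] = 1
    return vector


occupations = ["engineer", "teacher", "engineer", "nurse", "teacher", "engineer"]
vocab = build_vocabulary(occupations)
print(vocab)                    # {'engineer': 1, 'teacher': 2, 'nurse': 3}
print(one_hot("nurse", vocab))  # [0, 0, 0, 1]
print(one_hot("pilot", vocab))  # [1, 0, 0, 0]
```

Capping `max_tokens` keeps the encoded vector length bounded; anything outside the vocabulary shares the out-of-vocabulary slot.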
Getting Started
The simplest way to define features is with the FeatureType enum:
from kdp import PreprocessingModel, FeatureType
# Quick and easy feature definition
features = {
# Numerical features - different processing strategies
"age": FeatureType.FLOAT_NORMALIZED, # [0,1] range normalization
"income": FeatureType.FLOAT_RESCALED, # Standard scaling
"transaction_count": FeatureType.FLOAT, # Default normalization (same as FLOAT_NORMALIZED)
# Categorical features - automatic encoding
"occupation": FeatureType.STRING_CATEGORICAL, # Job titles, roles
"education_level": FeatureType.INTEGER_CATEGORICAL, # Education codes
# Text and dates - specialized processing
"product_review": FeatureType.TEXT, # Customer feedback
"signup_date": FeatureType.DATE, # User registration date
# Passthrough feature - use without any processing
"embedding_vector": FeatureType.PASSTHROUGH # Pre-processed data passes directly to output
}
# Create your preprocessor
preprocessor = PreprocessingModel(
path_data="customer_data.csv",
features_specs=features
)
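The comments above contrast [0,1] range normalization with standard scaling. The generic formulas can be sketched in plain Python as follows; this shows the math only, not how KDP computes or stores its statistics, and `min_max_normalize` and `standard_scale` are hypothetical names.

```python
import statistics


def min_max_normalize(values):
    """Map values linearly onto [0, 1] using the observed min and max."""
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]


def standard_scale(values):
    """Shift to zero mean and scale to unit (population) standard deviation."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:  # constant column: avoid division by zero
        return [0.0 for _ in values]
    return [(v - mean) / std for v in values]


ages = [20, 30, 40, 50, 60]
print(min_max_normalize(ages))  # [0.0, 0.25, 0.5, 0.75, 1.0]
print(standard_scale(ages))     # centered at 0, unit standard deviation
```

Min-max output is bounded but is pulled around by extreme values through the min and max; standard scaling is unbounded and centers the column at zero.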
Why Strong Feature Types Matter
Optimized Processing
Each feature type gets specialized handling for better ML performance
Reduced Errors
Catch type mismatches early in development, not during training
Clearer Code
Self-documenting feature definitions make your code more maintainable
Enhanced Performance
Type-specific optimizations improve preprocessing speed
Feature Type Documentation
Numerical Features
Handle continuous values with advanced normalization and distribution-aware processing
Categorical Features
Process discrete categories with smart embedding techniques and vocabulary management
Text Features
Work with free-form text using tokenization, embeddings, and sequence handling
Date Features
Extract temporal patterns from dates with component extraction and cyclical encoding
Cross Features
Model feature interactions with combined embeddings and interaction modeling
Passthrough Features
Include unmodified data or pre-computed features directly in your model
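Cyclical encoding, listed under Date Features, maps a periodic component onto a circle so that the end of one cycle sits next to the start of the next. A minimal sketch of the general technique follows; it is not KDP's implementation, and the sin/cos pairing and the `encode_date` helper are assumptions for illustration.

```python
import math
from datetime import date


def cyclical_encode(value, period):
    """Project a periodic value onto the unit circle as (sin, cos)."""
    angle = 2 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_date(d):
    """Hypothetical helper: day-of-week and month as cyclical pairs."""
    dow_sin, dow_cos = cyclical_encode(d.weekday(), 7)        # Monday=0 .. Sunday=6
    month_sin, month_cos = cyclical_encode(d.month - 1, 12)   # January=0 .. December=11
    return {"dow_sin": dow_sin, "dow_cos": dow_cos,
            "month_sin": month_sin, "month_cos": month_cos}


# December and January end up close together, unlike raw month numbers 12 and 1
dec = encode_date(date(2024, 12, 15))
jan = encode_date(date(2025, 1, 15))
distance = math.dist((dec["month_sin"], dec["month_cos"]),
                     (jan["month_sin"], jan["month_cos"]))
print(round(distance, 3))  # 0.518
```

Two adjacent months are always 30 degrees apart on the circle, so the December-January gap equals every other month-to-month gap.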
Advanced Feature Configuration
For more control, use specialized feature classes:
from kdp import FeatureType
from kdp.features import NumericalFeature, CategoricalFeature, TextFeature, DateFeature, PassthroughFeature
import tensorflow as tf
# Advanced feature configuration
features = {
# Numerical with advanced embedding
"income": NumericalFeature(
name="income",
feature_type=FeatureType.FLOAT_RESCALED,
use_embedding=True,
embedding_dim=32
),
# Categorical with hashing
"product_id": CategoricalFeature(
name="product_id",
feature_type=FeatureType.STRING_CATEGORICAL,
max_tokens=10000,
category_encoding="hashing"
),
# Text with custom tokenization
"description": TextFeature(
name="description",
max_tokens=5000,
embedding_dim=64,
sequence_length=128,
ngrams=2
),
# Date with cyclical encoding
"purchase_date": DateFeature(
name="purchase_date",
add_day_of_week=True,
add_month=True,
cyclical_encoding=True
),
# Passthrough feature
"embedding": PassthroughFeature(
name="embedding",
dtype=tf.float32
)
}
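Hashing a categorical feature such as `product_id` generally means applying the hashing trick: each ID is mapped into a fixed number of buckets instead of a learned vocabulary, so unseen IDs still get a slot. A minimal sketch of that general technique follows; the choice of MD5, the bucket count, and the `hash_bucket` helper are assumptions, and KDP's actual hash function is not specified here.

```python
import hashlib


def hash_bucket(value, num_bins):
    """Hypothetical helper: map a string to a stable bucket in [0, num_bins).

    MD5 is used (an assumption) because it gives the same result in every
    process, unlike Python's built-in hash(), which is randomized for strings.
    """
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "little") % num_bins


product_ids = ["P-1001", "P-1002", "P-999999"]
buckets = [hash_bucket(pid, 10_000) for pid in product_ids]
print(buckets)  # every bucket lies in range(10000); unseen IDs still get one
```

The trade-off is that distinct IDs can collide in the same bucket; in exchange, memory stays bounded no matter how many IDs appear.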
Pro Tips for Feature Definition
Start Simple
Begin with basic FeatureType definitions
Add Complexity Gradually
Refactor to specialized feature classes when needed
Combine Approaches
Mix distribution-aware encoding, attention, and embeddings where they improve results
Check Distributions
Review your data distribution before choosing feature types
Experiment with Types
Sometimes a different encoding provides better results
Consider Passthrough
Use passthrough features for pre-processed data or custom vectors
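Following the Check Distributions tip, a quick standard-library summary can flag skewed columns before you pick a type. The skewness formula is the standard biased (Fisher-Pearson) estimator; the |skew| > 1 threshold is a common rule of thumb, not a KDP rule, and `describe` is a hypothetical helper.

```python
import statistics


def skewness(values):
    """Biased (Fisher-Pearson) sample skewness: mean of cubed z-scores."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)
    if std == 0:
        return 0.0
    return statistics.fmean(((v - mean) / std) ** 3 for v in values)


def describe(name, values):
    """Hypothetical helper: summary statistics plus a rough shape hint."""
    skew = skewness(values)
    # Rule of thumb (an assumption, not a KDP rule): |skew| > 1 is heavily skewed
    hint = "heavily skewed" if abs(skew) > 1 else "roughly symmetric"
    return {"feature": name, "min": min(values), "max": max(values),
            "median": statistics.median(values), "skew": round(skew, 2),
            "hint": hint}


print(describe("age", [22, 25, 31, 35, 38, 41, 47, 52]))
print(describe("income", [28_000, 31_000, 35_000, 40_000,
                          42_000, 55_000, 90_000, 450_000]))
```

A column like `income` above, dominated by one large value, is the kind of case where distribution-aware processing is worth trying.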
Model Architecture Diagrams
KDP creates optimized preprocessing architectures based on your feature definitions. Here are examples of different model configurations:
Basic Feature Combinations
When combining numerical and categorical features:

All Feature Types Combined
KDP can handle all feature types in a single model:

Advanced Configurations
Tabular Attention
Enhance feature interactions with tabular attention:

Transformer Blocks
Process categorical features with transformer blocks:

Feature MoE (Mixture of Experts)
Specialized feature processing with Mixture of Experts:

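For intuition about Mixture of Experts, the sketch below shows a generic soft-gated mixture: a gate scores each expert from the input, a softmax turns the scores into weights, and the output is the weighted sum of expert outputs. The experts here are fixed toy functions and the gate is a plain dot product; in a trained model both are learned, and how KDP assigns experts to features is not shown here.

```python
import math


def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def mixture_of_experts(x, experts, gate_weights):
    """Combine expert outputs with a softmax gate computed from input x.

    experts: callables mapping a vector to a vector of the same length
    gate_weights: one weight vector per expert; gate logit = dot(w, x)
    """
    logits = [sum(w_i * x_i for w_i, x_i in zip(w, x)) for w in gate_weights]
    gates = softmax(logits)
    outputs = [expert(x) for expert in experts]
    # Element-wise weighted sum of the expert outputs
    mixed = [sum(g * out[j] for g, out in zip(gates, outputs)) for j in range(len(x))]
    return mixed, gates


# Two toy experts (hypothetical): identity and doubling
experts = [lambda v: list(v), lambda v: [2 * t for t in v]]
gate_weights = [[0.0, 0.0], [0.0, 0.0]]  # zero weights give equal gating
mixed, gates = mixture_of_experts([1.0, 3.0], experts, gate_weights)
print(gates)  # [0.5, 0.5]
print(mixed)  # [1.5, 4.5]
```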
Output Modes
KDP supports different output modes for your preprocessed features:
Concatenated Output

Dictionary Output

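Conceptually, the two output modes differ in whether processed features are merged into one vector or kept apart by name. The sketch below assumes exactly that distinction and uses plain lists; the helper names are hypothetical, and the way KDP selects a mode is not shown.

```python
def concatenate_outputs(processed, order):
    """Concatenated mode (assumed): one flat vector, features in a fixed order."""
    flat = []
    for name in order:
        flat.extend(processed[name])
    return flat


def dictionary_outputs(processed):
    """Dictionary mode (assumed): each feature's vector kept under its own name."""
    return {name: list(vec) for name, vec in processed.items()}


processed = {
    "age": [0.25],                  # e.g. a scaled numerical value
    "occupation": [0.0, 1.0, 0.0],  # e.g. a one-hot vector
}
print(concatenate_outputs(processed, ["age", "occupation"]))  # [0.25, 0.0, 1.0, 0.0]
print(dictionary_outputs(processed))  # {'age': [0.25], 'occupation': [0.0, 1.0, 0.0]}
```

A single concatenated vector feeds directly into a dense layer, while a dictionary lets downstream code route individual features to different parts of a model.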