Cross Features in KDP
Capture powerful interactions between features to uncover hidden patterns in your data.
Overview
Cross features model the interactions between input features, unlocking patterns that individual features alone might miss. They're especially powerful for capturing relationships like "product category × user location" or "day of week × hour of day" that drive important outcomes in your data.
- Feature Interaction: Capture complex relationships between features
- Pattern Discovery: Uncover hidden correlations in your data
- Efficient Processing: Optimized for large-scale feature crosses
- Smart Embeddings: Learn meaningful feature combinations
How Cross Features Work

KDP builds a cross feature by combining the values of two input features into a single key and mapping that key to a dense embedding, which gives the model a learned representation of the interaction.
- Feature Combination: Merging values from different features
- Vocabulary Creation: Building a vocabulary of meaningful combinations
- Embedding Generation: Creating dense representations of combined features
- Pattern Discovery: Finding non-linear relationships between features
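The first three steps above can be sketched in plain Python. This is an illustration of the general technique, not KDP's internal code: the `_X_` separator, the reserved index 0 for unseen combinations, and the randomly initialised table are all assumptions made for the example.

```python
import random

# Toy rows; the column names mirror the examples in this guide.
rows = [
    {"product_category": "books", "user_country": "DE"},
    {"product_category": "books", "user_country": "US"},
    {"product_category": "games", "user_country": "DE"},
    {"product_category": "books", "user_country": "DE"},
]


def cross_key(row, feature1, feature2, sep="_X_"):
    """Step 1, feature combination: merge two values into a single key."""
    return f"{row[feature1]}{sep}{row[feature2]}"


keys = [cross_key(r, "product_category", "user_country") for r in rows]

# Step 2, vocabulary creation: one integer id per distinct combination.
# Index 0 is reserved (an assumption) for combinations not seen here.
vocab = {key: i + 1 for i, key in enumerate(sorted(set(keys)))}

# Step 3, embedding generation: one dense vector per vocabulary entry.
# The vectors are random here; in a trained model they are learned weights.
embedding_dim = 4
rng = random.Random(0)
embedding_table = [
    [rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
    for _ in range(len(vocab) + 1)
]


def embed(row):
    """Look up the embedding of a row's cross, falling back to index 0."""
    index = vocab.get(cross_key(row, "product_category", "user_country"), 0)
    return embedding_table[index]
```

Rows that share the same combination share the same vector, which is what lets a downstream model learn something specific about, say, books bought in Germany.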
Basic Usage
from kdp import PreprocessingModel, FeatureType
# Define your features
features = {
    "product_category": FeatureType.STRING_CATEGORICAL,
    "user_country": FeatureType.STRING_CATEGORICAL,
    "age_group": FeatureType.STRING_CATEGORICAL,
}
# Create a preprocessor with cross features
preprocessor = PreprocessingModel(
    path_data="customer_data.csv",
    features_specs=features,
    # Define crosses as (feature1, feature2, embedding_dim)
    feature_crosses=[
        ("product_category", "user_country", 32),  # Cross with 32-dim embedding
        ("age_group", "user_country", 16),  # Cross with 16-dim embedding
    ],
)
Key Configuration Parameters
Parameter | Description | Default | Suggested Range |
---|---|---|---|
feature1 | First feature to cross | - | Any feature name |
feature2 | Second feature to cross | - | Any feature name |
embedding_dim | Dimensionality of cross embedding | 16 | 8-64 |
hash_bucket_size | Size of hash space for combinations | 10000 | 1000-100000 |
use_attention | Apply attention to cross embeddings | False | Boolean |
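To make `hash_bucket_size` concrete, here is a minimal sketch of the hashing trick it refers to: a combination is hashed to an integer and reduced modulo the bucket count. The hash function KDP uses is not stated in this guide, so the SHA-256 choice, the `_X_` separator, and the `hashed_cross_index` helper are assumptions for illustration only.

```python
import hashlib


def hashed_cross_index(value1, value2, hash_bucket_size=10000):
    """Map a pair of feature values to a bucket in [0, hash_bucket_size)."""
    key = f"{value1}_X_{value2}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    # Use the first 8 bytes as an unsigned integer, then fold into the buckets.
    return int.from_bytes(digest[:8], "big") % hash_bucket_size


index = hashed_cross_index("electronics", "FR")
```

A smaller bucket count saves memory but makes it more likely that unrelated combinations collide in the same bucket and share an embedding.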
Cross Feature Types
Categorical × Categorical
The most common type, capturing relationships between discrete features:
# Creating categorical crosses
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "user_country": FeatureType.STRING_CATEGORICAL,
    },
    feature_crosses=[
        ("product_category", "user_country", 32),
    ],
)
Categorical × Numerical
Capture how numerical relationships change across categories:
# Creating categorical × numerical crosses
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "price": FeatureType.FLOAT_RESCALED,
    },
    feature_crosses=[
        ("product_category", "price", 32),
    ],
)
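This guide does not spell out how a continuous value such as `price` enters a cross. One common way to do it by hand, shown here purely as an assumption and not as KDP's behaviour, is to discretize the number into buckets first and then cross the bucket id with the category. The bucket edges and helper names below are hypothetical.

```python
from bisect import bisect_right

# Hypothetical bucket edges: <10, 10-50, 50-200, >=200
price_boundaries = [10.0, 50.0, 200.0]


def price_bucket(price):
    """Return the index of the bucket that `price` falls into."""
    return bisect_right(price_boundaries, price)


def category_price_key(category, price):
    """Cross a category with the discretized price."""
    return f"{category}_X_price_bucket_{price_bucket(price)}"


key = category_price_key("books", 25.0)
```

With three edges there are four buckets, so a cross with 20 categories has at most 80 distinct combinations.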
Date Component Crosses
Useful for temporal patterns that depend on multiple time components:
# Creating date component crosses
from kdp.features import DateFeature
preprocessor = PreprocessingModel(
    features_specs={
        "transaction_time": DateFeature(
            name="transaction_time",
            add_day_of_week=True,
            add_hour=True,
        ),
    },
    # Cross day of week with hour of day
    feature_crosses=[
        ("transaction_time_day_of_week", "transaction_time_hour", 16),
    ],
)
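As a rough picture of what a day-of-week × hour cross contains, the sketch below derives both components from a timestamp with the standard library and joins them into one key. The dictionary keys reuse the feature names from the example above; the Monday = 0 numbering comes from Python's `datetime.weekday()` and may differ from the convention KDP uses.

```python
from datetime import datetime


def date_components(timestamp):
    """Extract the two components that the cross combines."""
    dt = datetime.fromisoformat(timestamp)
    return {
        "transaction_time_day_of_week": dt.weekday(),  # Monday = 0 in Python
        "transaction_time_hour": dt.hour,
    }


def day_hour_key(timestamp):
    """Join day of week and hour into a single cross key."""
    parts = date_components(timestamp)
    day = parts["transaction_time_day_of_week"]
    hour = parts["transaction_time_hour"]
    return f"{day}_X_{hour}"


monday_morning = day_hour_key("2024-01-01T09:30:00")
saturday_night = day_hour_key("2024-01-06T23:15:00")
```

This cross has at most 7 × 24 = 168 combinations, which is why a small embedding such as 16 dimensions is a reasonable fit.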
Multiple Crosses
Combine multiple cross features to capture complex interactions:
# Creating multiple crosses
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "user_country": FeatureType.STRING_CATEGORICAL,
        "device_type": FeatureType.STRING_CATEGORICAL,
        "user_age": FeatureType.FLOAT_NORMALIZED,
    },
    # Define multiple crosses to capture different interactions
    feature_crosses=[
        ("product_category", "user_country", 32),
        ("device_type", "user_country", 16),
        ("product_category", "user_age", 24),
    ],
)
Advanced Cross Feature Techniques
Attention-Enhanced Crosses
Apply attention mechanisms to learn which interactions matter most:
# Creating cross features with attention
from kdp import PreprocessingModel, FeatureType
from kdp.features import CrossFeature
preprocessor = PreprocessingModel(
    features_specs={
        "product_id": FeatureType.STRING_CATEGORICAL,
        "user_id": FeatureType.STRING_CATEGORICAL,
    },
    feature_crosses=[
        # Define cross with attention
        CrossFeature(
            feature1="product_id",
            feature2="user_id",
            embedding_dim=32,
            use_attention=True,
            attention_heads=4,
        ),
    ],
)
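How KDP wires attention into a cross is not described here, so the following is only a generic sketch of the underlying idea: scaled dot-product self-attention over a handful of cross embeddings. It is single-head and omits the learned query, key, and value projections a real layer would have; the `cross_embeddings` values are made up.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(vectors):
    """Single-head scaled dot-product self-attention, no learned projections."""
    dim = len(vectors[0])
    outputs, weights = [], []
    for query in vectors:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(dim)
            for key in vectors
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is a weighted average of all input vectors.
        outputs.append(
            [sum(w * v[i] for w, v in zip(row, vectors)) for i in range(dim)]
        )
    return outputs, weights


# Three made-up cross embeddings of dimension 2.
cross_embeddings = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
attended, attention_weights = self_attention(cross_embeddings)
```

Each row of `attention_weights` sums to 1 and shows how much every other cross contributes to the refined representation of a given cross.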
Multi-way Crosses
Create complex interactions between three or more features:
# Creating multi-way crosses (3+ features)
from kdp.features import CrossFeature
# First create a cross of two features
product_location_cross = CrossFeature(
    name="product_location_cross",
    feature1="product_category",
    feature2="user_location",
    embedding_dim=32,
)
# Then cross the result with a third feature
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "user_location": FeatureType.STRING_CATEGORICAL,
        "time_of_day": FeatureType.STRING_CATEGORICAL,
        # Add the intermediate cross
        "product_location_cross": product_location_cross,
    },
    # Cross the intermediate with a third feature
    feature_crosses=[
        ("product_location_cross", "time_of_day", 48),
    ],
)
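Conceptually, a three-way cross is just a cross applied twice. The sketch below shows that chaining on raw values; the `cross` helper, the separator, and the sample row are hypothetical and only illustrate how the combined key grows.

```python
def cross(left, right, sep="_X_"):
    """Combine two values (or an earlier cross and a value) into one key."""
    return f"{left}{sep}{right}"


row = {
    "product_category": "books",
    "user_location": "berlin",
    "time_of_day": "evening",
}

# First cross two features, then cross the result with the third.
product_location = cross(row["product_category"], row["user_location"])
three_way = cross(product_location, row["time_of_day"])

# The number of possible combinations multiplies with every added feature.
possible_combinations = 20 * 50 * 4  # e.g. 20 categories, 50 locations, 4 day parts
```

Because the combination count multiplies, three-way crosses become sparse quickly and are best reserved for features with modest cardinality.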
Real-World Examples
E-commerce Recommendations
# Cross features for e-commerce recommendations
from kdp import PreprocessingModel, FeatureType
from kdp.features import CategoricalFeature, DateFeature
preprocessor = PreprocessingModel(
    path_data="ecommerce_data.csv",
    features_specs={
        # User features
        "user_segment": FeatureType.STRING_CATEGORICAL,
        "user_device": FeatureType.STRING_CATEGORICAL,
        # Product features
        "product_category": CategoricalFeature(
            name="product_category",
            feature_type=FeatureType.STRING_CATEGORICAL,
            embedding_dim=32,
        ),
        "product_price_range": FeatureType.STRING_CATEGORICAL,
        # Temporal features
        "browse_time": DateFeature(
            name="browse_time",
            add_day_of_week=True,
            add_hour=True,
            add_is_weekend=True,
        ),
    },
    # Define crosses for recommendation patterns
    feature_crosses=[
        # User segment × product category (what segments like what categories)
        ("user_segment", "product_category", 48),
        # Device × price range (mobile users prefer different price points)
        ("user_device", "product_price_range", 16),
        # Temporal × product (weekend browsing patterns)
        ("browse_time_is_weekend", "product_category", 32),
        # Time of day × product (morning vs evening preferences)
        ("browse_time_hour", "product_category", 32),
    ],
)
Fraud Detection
# Cross features for fraud detection
from kdp import PreprocessingModel, FeatureType
from kdp.features import NumericalFeature, DateFeature
preprocessor = PreprocessingModel(
    path_data="transactions.csv",
    features_specs={
        # Transaction features
        "transaction_amount": NumericalFeature(
            name="transaction_amount",
            feature_type=FeatureType.FLOAT_RESCALED,
            use_distribution_aware=True,
        ),
        "merchant_category": FeatureType.STRING_CATEGORICAL,
        "payment_method": FeatureType.STRING_CATEGORICAL,
        # User features
        "user_country": FeatureType.STRING_CATEGORICAL,
        "account_age_days": FeatureType.FLOAT_NORMALIZED,
        # Time features
        "transaction_time": DateFeature(
            name="transaction_time",
            add_hour=True,
            add_day_of_week=True,
            add_is_weekend=True,
        ),
    },
    # Cross features for fraud patterns
    feature_crosses=[
        # Country × merchant (unusual combinations)
        ("user_country", "merchant_category", 32),
        # Payment method × amount (unusual payment methods for large amounts)
        ("payment_method", "transaction_amount", 24),
        # Time × amount (unusual times for large transactions)
        ("transaction_time_hour", "transaction_amount", 24),
        # Country × time (transactions from unusual locations at odd hours)
        ("user_country", "transaction_time_hour", 32),
    ],
    # Enable tabular attention for additional interaction discovery
    tabular_attention=True,
)
Model Architecture
KDP combines features, creates a vocabulary or hash space for combinations, and embeds these into dense representations to capture meaningful interactions.
Pro Tips
Choose Meaningful Crosses
Focus on feature pairs with likely interactions based on domain knowledge:
- Product × location (regional preferences)
- Time × event (temporal patterns)
- User × item (personalization)
- Price × category (price sensitivity)
Beware of Sparsity
Crosses between high-cardinality features can create sparse combinations:
- Use embeddings (default in KDP) rather than one-hot encoding
- Consider hashing for very high cardinality crosses
- Use category_encoding="hashing" for feature types with many values
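Before adding a cross, it can help to measure how sparse it would be. The helper below is a hypothetical diagnostic (not part of KDP) that compares the combinations actually observed in two columns against the number that are theoretically possible.

```python
from collections import Counter


def cross_sparsity(values1, values2):
    """Summarise how densely two columns cover their possible combinations."""
    observed = Counter(zip(values1, values2))
    possible = len(set(values1)) * len(set(values2))
    return {
        "observed": len(observed),
        "possible": possible,
        "coverage": len(observed) / possible,
        # Combinations seen only once give the embedding very little signal.
        "singletons": sum(1 for count in observed.values() if count == 1),
    }


products = ["a", "a", "b", "c"]
countries = ["x", "x", "y", "z"]
report = cross_sparsity(products, countries)
```

Low coverage or a large share of singletons suggests that hashing or a smaller embedding is a better fit than a full vocabulary for that cross.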
Cross Dimensionality
Choose embedding dimension based on cross importance and complexity:
- More important crosses deserve higher dimensionality
- Simple crosses: 8-16 dimensions
- Complex crosses: 32-64 dimensions
- Rule of thumb: ⁴√(possible combinations), that is, the fourth root of the number of possible combinations
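The fourth-root rule of thumb is easy to compute. The sketch below applies it to the cardinalities of the two crossed features and clamps the result to the 8-64 range suggested in the parameter table; the function name and the clamping are choices made for this example, not KDP behaviour.

```python
def suggested_embedding_dim(cardinality1, cardinality2, minimum=8, maximum=64):
    """Fourth root of the possible combinations, clamped to [minimum, maximum]."""
    possible_combinations = cardinality1 * cardinality2
    raw = round(possible_combinations ** 0.25)
    return max(minimum, min(maximum, raw))


# 100 product categories crossed with 100 countries: 10,000 combinations.
dim = suggested_embedding_dim(100, 100)
```

Treat the result as a starting point and adjust it according to how important the cross is, as described in the list above.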
Alternative Approaches
Consider other interaction modeling techniques alongside crosses:
- Enable tabular_attention=True to automatically discover interactions
- Use transformer_blocks for more sophisticated feature relationships
- Try dot-product interactions for numerical features
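For the dot-product option, the idea is to represent each numerical feature as a vector and use the dot product of two vectors as their interaction score. The sketch below is a generic illustration; how the vectors are produced is left open, and the feature names and values are invented.

```python
from itertools import combinations


def dot(u, v):
    """Dot product of two equal-length vectors."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    return sum(a * b for a, b in zip(u, v))


def pairwise_interactions(embeddings):
    """Return the dot product for every unordered pair of features."""
    names = sorted(embeddings)
    return {
        (a, b): dot(embeddings[a], embeddings[b])
        for a, b in combinations(names, 2)
    }


# Invented 2-dimensional representations of three numerical features.
feature_vectors = {
    "price": [1.0, 2.0],
    "quantity": [3.0, 4.0],
    "discount": [0.5, -1.0],
}
interactions = pairwise_interactions(feature_vectors)
```

Unlike an explicit cross, this needs no vocabulary or hash space, but it yields only one scalar per feature pair.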
Comparing With Alternatives
Approach | Pros | Cons | When to Use |
---|---|---|---|
Cross Features | Explicit modeling of specific interactions | Need to specify each interaction | When you know which interactions matter |
Tabular Attention | Automatic discovery of interactions | Less control | When you're unsure which interactions matter |
Transformer Blocks | Most powerful interaction modeling | Most computationally expensive | For complex interaction patterns |
Feature MoE | Adaptive feature processing | Higher complexity | For heterogeneous feature sets |