➕ Cross Features

Cross Features in KDP

Capture powerful interactions between features to uncover hidden patterns in your data.

📋 Overview

Cross features model the interactions between input features, unlocking patterns that individual features alone might miss. They're especially powerful for capturing relationships like "product category × user location" or "day of week × hour of day" that drive important outcomes in your data.

  • 🔗 Feature Interaction: Capture complex relationships between features
  • 🎯 Pattern Discovery: Uncover hidden correlations in your data
  • ⚡ Efficient Processing: Optimized for large-scale feature crosses
  • 🧠 Smart Embeddings: Learn meaningful feature combinations

🧠 How Cross Features Work

Cross Features Architecture

KDP's cross features combine the values of two input features and embed each combination, producing a dense representation of the interaction. The process has four stages:

  • 🔄 Feature Combination: Merging values from different features
  • 📊 Vocabulary Creation: Building a vocabulary of meaningful combinations
  • 🧮 Embedding Generation: Creating dense representations of combined features
  • 🔍 Pattern Discovery: Finding non-linear relationships between features
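
The stages above can be illustrated without KDP at all. The following is a minimal sketch in plain Python, not KDP's implementation: the helper names (`build_cross_vocabulary`, `make_embedding_table`, `embed_cross`) are hypothetical, and the embedding table is filled with random values where a real model would learn them during training.

```python
import random

def build_cross_vocabulary(rows, feature1, feature2):
    """Assign an integer id to every observed (feature1, feature2) pair.

    Id 0 is reserved for combinations that were never seen.
    """
    vocab = {}
    for row in rows:
        key = (row[feature1], row[feature2])
        if key not in vocab:
            vocab[key] = len(vocab) + 1
    return vocab

def make_embedding_table(num_ids, embedding_dim, seed=0):
    """Create a randomly initialised table; a trained model would learn these values."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.05, 0.05) for _ in range(embedding_dim)]
            for _ in range(num_ids)]

rows = [
    {"product_category": "books", "user_country": "DE"},
    {"product_category": "books", "user_country": "US"},
    {"product_category": "toys", "user_country": "DE"},
    {"product_category": "books", "user_country": "DE"},
]

# Stage 1 and 2: combine values and build the vocabulary of combinations
vocab = build_cross_vocabulary(rows, "product_category", "user_country")

# Stage 3: one dense vector per combination (plus one for unseen pairs)
table = make_embedding_table(len(vocab) + 1, embedding_dim=4)

def embed_cross(row):
    """Look up the dense vector for a row's combination (index 0 if unseen)."""
    idx = vocab.get((row["product_category"], row["user_country"]), 0)
    return table[idx]

print(vocab)
print(embed_cross(rows[0]))
```

Rows that share the same combination receive the same vector, which is what lets a downstream model treat "books in DE" as a unit distinct from "books" and "DE" taken separately.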

๐Ÿ“ Basic Usage

from kdp import PreprocessingModel, FeatureType

# Define your features
features = {
    "product_category": FeatureType.STRING_CATEGORICAL,
    "user_country": FeatureType.STRING_CATEGORICAL,
    "age_group": FeatureType.STRING_CATEGORICAL
}

# Create a preprocessor with cross features
preprocessor = PreprocessingModel(
    path_data="customer_data.csv",
    features_specs=features,

    # Define crosses as (feature1, feature2, embedding_dim)
    feature_crosses=[
        ("product_category", "user_country", 32),  # Cross with 32-dim embedding
        ("age_group", "user_country", 16)          # Cross with 16-dim embedding
    ]
)

โš™๏ธ Key Configuration Parameters

Parameter Description Default Suggested Range
feature1 First feature to cross - Any feature name
feature2 Second feature to cross - Any feature name
embedding_dim Dimensionality of cross embedding 16 8-64
hash_bucket_size Size of hash space for combinations 10000 1000-100000
use_attention Apply attention to cross embeddings False Boolean
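
To get a feel for how `hash_bucket_size` trades memory against collisions, the sketch below hashes every combination of two synthetic features into buckets of different sizes and counts how many combinations end up sharing a bucket. It is an illustration only: the hashing scheme (SHA-256 over a joined string) and the function names are assumptions made for this example and say nothing about how KDP hashes internally.

```python
import hashlib
from itertools import product

def hashed_cross_index(value1, value2, hash_bucket_size=10000):
    """Map a pair of values to a bucket index in [0, hash_bucket_size)."""
    key = f"{value1}_X_{value2}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    return int(digest, 16) % hash_bucket_size

def count_collisions(values1, values2, hash_bucket_size):
    """Number of combinations that share a bucket with an earlier combination."""
    pairs = list(product(values1, values2))
    buckets = {hashed_cross_index(a, b, hash_bucket_size) for a, b in pairs}
    return len(pairs) - len(buckets)

# Synthetic features: 50 x 40 = 2000 possible combinations
categories = [f"cat_{i}" for i in range(50)]
countries = [f"country_{i}" for i in range(40)]

for size in (1000, 10000, 100000):
    print(size, count_collisions(categories, countries, size))
```

With 2000 combinations and only 1000 buckets, at least half of the combinations must collide; larger bucket sizes reduce collisions at the cost of a larger embedding table.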

๐Ÿ› ๏ธ Cross Feature Types

Categorical ร— Categorical

The most common type, capturing relationships between discrete features:

# Creating categorical crosses
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "user_country": FeatureType.STRING_CATEGORICAL
    },
    feature_crosses=[
        ("product_category", "user_country", 32)
    ]
)

Categorical × Numerical

Capture how numerical relationships change across categories:

# Creating categorical × numerical crosses
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "price": FeatureType.FLOAT_RESCALED
    },
    feature_crosses=[
        ("product_category", "price", 32)
    ]
)
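
One common way to cross a numerical value with a category is to discretise the number first and then cross the resulting bucket. This page does not say how KDP treats the numerical side of a cross, so the following is a generic sketch under that assumption; the bucket boundaries and function names are hypothetical.

```python
from bisect import bisect_right

# Hypothetical price bucket edges: <10, 10-50, 50-200, >=200
PRICE_BOUNDARIES = [10.0, 50.0, 200.0]

def bucketize(value, boundaries):
    """Return the index of the bucket that `value` falls into."""
    return bisect_right(boundaries, value)

def cross_category_with_price(category, price):
    """Build a combined key from a category and the price's bucket index."""
    bucket = bucketize(price, PRICE_BOUNDARIES)
    return f"{category}_X_price_bucket_{bucket}"

print(cross_category_with_price("books", 25.0))
print(cross_category_with_price("electronics", 25.0))
```

The same price lands in the same bucket for both categories, but the combined keys differ, so each category can get its own representation of "mid-priced".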

Date Component Crosses

Useful for temporal patterns that depend on multiple time components:

# Creating date component crosses
from kdp.features import DateFeature

preprocessor = PreprocessingModel(
    features_specs={
        "transaction_time": DateFeature(
            name="transaction_time",
            add_day_of_week=True,
            add_hour=True
        )
    },
    # Cross day of week with hour of day
    feature_crosses=[
        ("transaction_time_day_of_week", "transaction_time_hour", 16)
    ]
)
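
To make the day-of-week × hour cross concrete, the sketch below derives both components from a timestamp using only the standard library and maps each pair to one of 7 × 24 = 168 slots. The function names and the slot numbering are illustrative assumptions, not KDP's encoding.

```python
from datetime import datetime

def date_components(timestamp):
    """Extract day of week (Monday=0) and hour from an ISO 8601 timestamp."""
    dt = datetime.fromisoformat(timestamp)
    return {"day_of_week": dt.weekday(), "hour": dt.hour}

def day_hour_cross(timestamp):
    """Map a timestamp to a single index in [0, 168) for the day x hour cross."""
    parts = date_components(timestamp)
    return parts["day_of_week"] * 24 + parts["hour"]

# Monday 09:30 and Sunday 23:00
print(day_hour_cross("2024-01-01T09:30:00"))
print(day_hour_cross("2024-01-07T23:00:00"))
```

Each slot can then receive its own embedding, which lets a model distinguish "Monday morning" from "Sunday night" rather than learning separate effects for day and hour.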

Multiple Crosses

Combine multiple cross features to capture complex interactions:

# Creating multiple crosses
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "user_country": FeatureType.STRING_CATEGORICAL,
        "device_type": FeatureType.STRING_CATEGORICAL,
        "user_age": FeatureType.FLOAT_NORMALIZED
    },
    # Define multiple crosses to capture different interactions
    feature_crosses=[
        ("product_category", "user_country", 32),
        ("device_type", "user_country", 16),
        ("product_category", "user_age", 24)
    ]
)

💡 Advanced Cross Feature Techniques

🔍 Attention-Enhanced Crosses

Apply attention mechanisms to learn which interactions matter most:

# Creating cross features with attention
from kdp import PreprocessingModel, FeatureType
from kdp.features import CrossFeature

preprocessor = PreprocessingModel(
    features_specs={
        "product_id": FeatureType.STRING_CATEGORICAL,
        "user_id": FeatureType.STRING_CATEGORICAL
    },
    feature_crosses=[
        # Define cross with attention
        CrossFeature(
            feature1="product_id",
            feature2="user_id",
            embedding_dim=32,
            use_attention=True,
            attention_heads=4
        )
    ]
)
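
As a rough intuition for what attention adds, the sketch below applies single-query scaled dot-product attention to a handful of cross embeddings: each embedding is scored against a query vector, the scores are normalised with softmax, and the embeddings are pooled by weight. This is a generic illustration with made-up vectors; it is not a description of what `use_attention` or `attention_heads` do inside KDP.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attend(query, cross_embeddings):
    """Score each cross embedding against the query and pool them by weight."""
    dim = len(query)
    scores = [
        sum(q * k for q, k in zip(query, emb)) / math.sqrt(dim)
        for emb in cross_embeddings
    ]
    weights = softmax(scores)
    pooled = [
        sum(w * emb[i] for w, emb in zip(weights, cross_embeddings))
        for i in range(dim)
    ]
    return weights, pooled

# Three hypothetical 2-dimensional cross embeddings and one query vector
weights, pooled = attend([1.0, 0.0], [[2.0, 0.0], [0.0, 2.0], [-2.0, 0.0]])
print(weights)
print(pooled)
```

The embedding most aligned with the query receives the largest weight, so it dominates the pooled representation.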

🧠 Multi-way Crosses

Create complex interactions between three or more features:

# Creating multi-way crosses (3+ features)
from kdp import PreprocessingModel, FeatureType
from kdp.features import CrossFeature

# First create a cross of two features
product_location_cross = CrossFeature(
    name="product_location_cross",
    feature1="product_category",
    feature2="user_location",
    embedding_dim=32
)

# Then cross the result with a third feature
preprocessor = PreprocessingModel(
    features_specs={
        "product_category": FeatureType.STRING_CATEGORICAL,
        "user_location": FeatureType.STRING_CATEGORICAL,
        "time_of_day": FeatureType.STRING_CATEGORICAL,
        # Add the intermediate cross
        "product_location_cross": product_location_cross
    },
    # Cross the intermediate with a third feature
    feature_crosses=[
        ("product_location_cross", "time_of_day", 48)
    ]
)
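
Chaining crosses this way is equivalent to building one key from all three values, and the number of possible combinations grows as the product of the individual cardinalities. The plain-Python sketch below shows both points; `cross_key`, `possible_combinations`, and the cardinalities used are hypothetical.

```python
import math

def cross_key(*values, separator="_X_"):
    """Join any number of feature values into a single combination key."""
    return separator.join(str(v) for v in values)

def possible_combinations(*cardinalities):
    """Upper bound on distinct combinations for features with these cardinalities."""
    return math.prod(cardinalities)

# Cross two features first, then cross the result with a third
intermediate = cross_key("electronics", "berlin")
three_way = cross_key(intermediate, "evening")

print(three_way)
print(possible_combinations(50, 200, 24))
```

Because the combination space multiplies with every added feature, multi-way crosses become sparse quickly and are worth reserving for interactions you have a concrete reason to expect.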

🔧 Real-World Examples

E-commerce Recommendations

# Cross features for e-commerce recommendations
from kdp import PreprocessingModel, FeatureType
from kdp.features import CategoricalFeature, DateFeature

preprocessor = PreprocessingModel(
    path_data="ecommerce_data.csv",
    features_specs={
        # User features
        "user_segment": FeatureType.STRING_CATEGORICAL,
        "user_device": FeatureType.STRING_CATEGORICAL,

        # Product features
        "product_category": CategoricalFeature(
            name="product_category",
            feature_type=FeatureType.STRING_CATEGORICAL,
            embedding_dim=32
        ),
        "product_price_range": FeatureType.STRING_CATEGORICAL,

        # Temporal features
        "browse_time": DateFeature(
            name="browse_time",
            add_day_of_week=True,
            add_hour=True,
            add_is_weekend=True
        )
    },

    # Define crosses for recommendation patterns
    feature_crosses=[
        # User segment × product category (what segments like what categories)
        ("user_segment", "product_category", 48),

        # Device × price range (mobile users prefer different price points)
        ("user_device", "product_price_range", 16),

        # Temporal × product (weekend browsing patterns)
        ("browse_time_is_weekend", "product_category", 32),

        # Time of day × product (morning vs evening preferences)
        ("browse_time_hour", "product_category", 32)
    ]
)

Fraud Detection

# Cross features for fraud detection
from kdp import PreprocessingModel, FeatureType
from kdp.features import NumericalFeature, DateFeature

preprocessor = PreprocessingModel(
    path_data="transactions.csv",
    features_specs={
        # Transaction features
        "transaction_amount": NumericalFeature(
            name="transaction_amount",
            feature_type=FeatureType.FLOAT_RESCALED,
            use_distribution_aware=True
        ),
        "merchant_category": FeatureType.STRING_CATEGORICAL,
        "payment_method": FeatureType.STRING_CATEGORICAL,

        # User features
        "user_country": FeatureType.STRING_CATEGORICAL,
        "account_age_days": FeatureType.FLOAT_NORMALIZED,

        # Time features
        "transaction_time": DateFeature(
            name="transaction_time",
            add_hour=True,
            add_day_of_week=True,
            add_is_weekend=True
        )
    },

    # Cross features for fraud patterns
    feature_crosses=[
        # Country × merchant (unusual combinations)
        ("user_country", "merchant_category", 32),

        # Payment method × amount (unusual payment methods for large amounts)
        ("payment_method", "transaction_amount", 24),

        # Time × amount (unusual times for large transactions)
        ("transaction_time_hour", "transaction_amount", 24),

        # Country × time (transactions from unusual locations at odd hours)
        ("user_country", "transaction_time_hour", 32)
    ],

    # Enable tabular attention for additional interaction discovery
    tabular_attention=True
)

📊 Model Architecture

graph TD
    A1[Feature 1] --> C[Feature Combination]
    A2[Feature 2] --> C
    C --> D[Hash/Lookup]
    D --> E[Embedding Layer]
    E --> F[Cross Representation]
    style A1 fill:#e3f2fd,stroke:#64b5f6,stroke-width:2px
    style A2 fill:#e3f2fd,stroke:#64b5f6,stroke-width:2px
    style C fill:#e8f5e9,stroke:#66bb6a,stroke-width:2px
    style D fill:#fff8e1,stroke:#ffd54f,stroke-width:2px
    style E fill:#f3e5f5,stroke:#ce93d8,stroke-width:2px
    style F fill:#e8eaf6,stroke:#7986cb,stroke-width:2px

KDP combines features, creates a vocabulary or hash space for combinations, and embeds these into dense representations to capture meaningful interactions.

💎 Pro Tips

🎯 Choose Meaningful Crosses

Focus on feature pairs with likely interactions based on domain knowledge:

  • Product × location (regional preferences)
  • Time × event (temporal patterns)
  • User × item (personalization)
  • Price × category (price sensitivity)

โš ๏ธ Beware of Sparsity

Crosses between high-cardinality features can create sparse combinations:

  • Use embeddings (default in KDP) rather than one-hot encoding
  • Consider hashing for very high cardinality crosses
  • Use category_encoding="hashing" for feature types with many values
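
Before adding a cross, it can help to measure how sparse it would be on your own data. The sketch below counts how many of the possible combinations actually occur and how many occur only once; the function name and the toy rows are hypothetical, and the check is independent of KDP.

```python
from collections import Counter

def cross_sparsity(rows, feature1, feature2):
    """Summarise how densely the observed data covers the combination space."""
    observed = Counter((row[feature1], row[feature2]) for row in rows)
    cardinality1 = len({row[feature1] for row in rows})
    cardinality2 = len({row[feature2] for row in rows})
    possible = cardinality1 * cardinality2
    return {
        "observed": len(observed),
        "possible": possible,
        "coverage": len(observed) / possible,
        # Combinations seen exactly once carry very little signal
        "singletons": sum(1 for count in observed.values() if count == 1),
    }

rows = [
    {"product_id": "p1", "user_id": "u1"},
    {"product_id": "p1", "user_id": "u1"},
    {"product_id": "p2", "user_id": "u2"},
    {"product_id": "p3", "user_id": "u3"},
    {"product_id": "p1", "user_id": "u2"},
]

stats = cross_sparsity(rows, "product_id", "user_id")
print(stats)
```

Low coverage combined with many singletons suggests the cross is a candidate for hashing or a smaller embedding dimension.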

๐Ÿ“ Cross Dimensionality

Choose embedding dimension based on cross importance and complexity:

  • More important crosses deserve higher dimensionality
  • Simple crosses: 8-16 dimensions
  • Complex crosses: 32-64 dimensions
  • Rule of thumb: โดโˆš(possible combinations)
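
The rule of thumb above (the fourth root of the number of possible combinations) is easy to turn into a helper. In this sketch the result is clamped to the 8-64 range suggested in the configuration table; the clamping and the function name are choices made for this example, not KDP behaviour.

```python
def suggest_embedding_dim(cardinality1, cardinality2, min_dim=8, max_dim=64):
    """Fourth root of the possible combinations, clamped to [min_dim, max_dim]."""
    combinations = cardinality1 * cardinality2
    raw = combinations ** 0.25
    return max(min_dim, min(max_dim, round(raw)))

# 100 categories x 100 countries = 10,000 combinations, fourth root is 10
print(suggest_embedding_dim(100, 100))

# Tiny cross: the raw value is about 2, so the lower bound applies
print(suggest_embedding_dim(5, 4))

# Huge cross: the raw value is about 178, so the upper bound applies
print(suggest_embedding_dim(100000, 10000))
```

Treat the result as a starting point and adjust it for how important the cross is to your task.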

🔄 Alternative Approaches

Consider other interaction modeling techniques alongside crosses:

  • Enable tabular_attention=True to automatically discover interactions
  • Use transformer_blocks for more sophisticated feature relationships
  • Try dot-product interactions for numerical features

🔄 Comparing With Alternatives

| Approach | Pros | Cons | When to Use |
|---|---|---|---|
| Cross Features | Explicit modeling of specific interactions | Need to specify each interaction | When you know which interactions matter |
| Tabular Attention | Automatic discovery of interactions | Less control | When you're unsure which interactions matter |
| Transformer Blocks | Most powerful interaction modeling | Most computationally expensive | For complex interaction patterns |
| Feature MoE | Adaptive feature processing | Higher complexity | For heterogeneous feature sets |