Skip to content

๐Ÿ” Passthrough Features

Passthrough Features in KDP

Handle IDs, metadata, and pre-processed data without unwanted transformations.

๐Ÿ“‹ Overview

Passthrough features allow you to include data in your model inputs without any preprocessing modifications. They're perfect for IDs, metadata, pre-processed data, and scenarios where you need to preserve exact values. With KDP v1.11.1+, you can now choose whether passthrough features are included in the main model output or kept separately for manual use.

๐Ÿ”‘

ID Preservation

Keep product IDs, user IDs for result mapping

๐Ÿท๏ธ

Metadata Handling

Include metadata without processing it

โšก

Direct Integration

Include pre-processed data without modifications

๐Ÿ”„

Flexible Output

Choose between processed inclusion or separate access

๐Ÿš€ When to Use Passthrough Features

๐Ÿ”‘

IDs & Identifiers

Product IDs, user IDs, transaction IDs that you need for mapping results but shouldn't influence the model

๐Ÿท๏ธ

Metadata

Timestamps, source information, or other metadata needed for post-processing but not for ML

๐Ÿ”ข

Pre-computed Features

Pre-computed embeddings or features that should be included in model processing

๐Ÿ“Š

Raw Values

Exact original values that need to be preserved without any transformations

๐Ÿ”

Feature Testing

Compare raw vs processed feature performance in experiments

๐Ÿ’ก Two Modes of Operation

KDP v1.11.1+ introduces two distinct modes for passthrough features:

๐Ÿ”„ Legacy Mode (Processed Output)

When: include_passthrough_in_output=True (default for backwards compatibility) Use case: Pre-computed features that should be part of model processing

preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    include_passthrough_in_output=True  # Default - backwards compatible
)

When: include_passthrough_in_output=False Use case: IDs, metadata that should be preserved but not processed

preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    include_passthrough_in_output=False  # Recommended for IDs/metadata
)

๐Ÿ’ก Defining Passthrough Features

from kdp import PreprocessingModel, FeatureType
from kdp.features import PassthroughFeature
import tensorflow as tf

# Simple approach using enum
features = {
    "product_id": FeatureType.PASSTHROUGH,  # Will use default tf.float32
    "price": FeatureType.FLOAT_NORMALIZED,
    "category": FeatureType.STRING_CATEGORICAL
}

# Advanced configuration with explicit dtype
features = {
    "product_id": PassthroughFeature(
        name="product_id",
        dtype=tf.string  # Specify string for IDs
    ),
    "user_id": PassthroughFeature(
        name="user_id",
        dtype=tf.int64   # Specify int for numeric IDs
    ),
    "embedding_vector": PassthroughFeature(
        name="embedding_vector",
        dtype=tf.float32  # For pre-computed features
    ),
    "price": FeatureType.FLOAT_NORMALIZED,
    "category": FeatureType.STRING_CATEGORICAL
}

๐Ÿ—๏ธ Real-World Example: E-commerce Recommendation

import pandas as pd
from kdp import PreprocessingModel, FeatureType
from kdp.features import PassthroughFeature
import tensorflow as tf

# Sample e-commerce data
data = pd.DataFrame({
    'product_id': ['P001', 'P002', 'P003'],      # String ID - for mapping
    'user_id': [1001, 1002, 1003],              # Numeric ID - for mapping
    'price': [29.99, 49.99, 19.99],             # ML feature
    'category': ['electronics', 'books', 'clothing'],  # ML feature
    'rating': [4.5, 3.8, 4.2]                   # ML feature
})

# Define features with proper separation
features = {
    # IDs for mapping - should NOT influence the model
    'product_id': PassthroughFeature(name='product_id', dtype=tf.string),
    'user_id': PassthroughFeature(name='user_id', dtype=tf.int64),

    # Actual ML features - should be processed
    'price': FeatureType.FLOAT_NORMALIZED,
    'category': FeatureType.STRING_CATEGORICAL,
    'rating': FeatureType.FLOAT_NORMALIZED
}

# Create preprocessor with separate passthrough access
preprocessor = PreprocessingModel(
    path_data="ecommerce_data.csv",
    features_specs=features,
    output_mode='dict',
    include_passthrough_in_output=False  # Keep IDs separate
)

model = preprocessor.build_preprocessor()

# Now you can:
# 1. Use the model for ML predictions (price, category, rating)
# 2. Access product_id and user_id separately for mapping results
# 3. No dtype concatenation issues between string IDs and numeric features
## ๐Ÿ“Š How Passthrough Features Work
Passthrough Feature Model

Passthrough features create input signatures but can be processed or kept separate based on your configuration.

### Legacy Mode Flow (include_passthrough_in_output=True)
โž•

Input Signature

Added to model inputs with proper dtype

๐Ÿ”„

Minimal Processing

Type casting and optional reshaping only

๐Ÿ”—

Concatenated

Included in main model output (grouped by dtype)

### Recommended Mode Flow (include_passthrough_in_output=False)
โž•

Input Signature

Added to model inputs with proper dtype

๐Ÿšซ

No Processing

Completely bypasses KDP transformations

๐Ÿ“ฆ

Separate Access

Available separately for manual use

## ๐Ÿ”ง Configuration Options
Parameter Type Description
name str The name of the feature
feature_type FeatureType Set to FeatureType.PASSTHROUGH by default
dtype tf.DType The data type of the feature (default: tf.float32)
include_passthrough_in_output bool Whether to include in main output (True) or keep separate (False)
## ๐ŸŽฏ Advanced Examples ### Example 1: Mixed Dtype IDs
# Handles both string and numeric IDs without concatenation errors
features = {
    'product_id': PassthroughFeature(name='product_id', dtype=tf.string),
    'user_id': PassthroughFeature(name='user_id', dtype=tf.int64),
    'session_id': PassthroughFeature(name='session_id', dtype=tf.string),
    'price': FeatureType.FLOAT_NORMALIZED
}

preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    include_passthrough_in_output=False  # No dtype mixing issues
)
### Example 2: Pre-computed Embeddings
# Include pre-computed features in model processing
features = {
    'text_embedding': PassthroughFeature(
        name='text_embedding',
        dtype=tf.float32
    ),
    'image_embedding': PassthroughFeature(
        name='image_embedding',
        dtype=tf.float32
    ),
    'user_age': FeatureType.FLOAT_NORMALIZED
}

preprocessor = PreprocessingModel(
    path_data="embeddings.csv",
    features_specs=features,
    include_passthrough_in_output=True  # Include in model processing
)
### Example 3: Metadata Preservation
# Keep metadata for post-processing without affecting the model
features = {
    'timestamp': PassthroughFeature(name='timestamp', dtype=tf.string),
    'source_system': PassthroughFeature(name='source_system', dtype=tf.string),
    'batch_id': PassthroughFeature(name='batch_id', dtype=tf.int64),
    'sales_amount': FeatureType.FLOAT_NORMALIZED,
    'product_category': FeatureType.STRING_CATEGORICAL
}

preprocessor = PreprocessingModel(
    path_data="sales_data.csv",
    features_specs=features,
    include_passthrough_in_output=False  # Metadata separate from ML
)
## โš ๏ธ Important Considerations
โš ๏ธ

Dtype Compatibility

When using include_passthrough_in_output=True, passthrough features are grouped by dtype to prevent concatenation errors. String and numeric passthrough features are handled separately.

๐ŸŽฏ

Recommended Usage

Use include_passthrough_in_output=False for IDs and metadata that shouldn't influence your model. Use True only for pre-processed features that should be part of model processing.

๐Ÿ”„

Backwards Compatibility

The default is True to maintain backwards compatibility with existing code. New projects should explicitly choose the appropriate mode.

## ๐Ÿ” Troubleshooting

โŒ "Cannot concatenate tensors of different dtypes"

Solution: Set include_passthrough_in_output=False for ID/metadata features, or ensure all passthrough features have compatible dtypes.

โŒ "inputs not connected to outputs"

Solution: This can happen with passthrough-only models. Ensure you have at least one processed feature, or use dict mode for passthrough-only scenarios.

โŒ String features showing as tf.float32

Solution: Explicitly specify dtype in PassthroughFeature: dtype=tf.string

## ๐Ÿ“ˆ Performance Tips
โšก

Reduce Model Complexity

Use include_passthrough_in_output=False for IDs to keep your model focused on actual ML features

๐ŸŽฏ

Clear Separation

Separate concerns: IDs for mapping, features for ML, metadata for analysis

๐Ÿ”„

Choose the Right Mode

Legacy mode for pre-computed features, recommended mode for identifiers and metadata