# 🧩 Feature-wise Mixture of Experts (MoE)

*Specialized processing for heterogeneous tabular features*
## 📋 Overview
Feature-wise Mixture of Experts (MoE) is a powerful technique that applies different processing strategies to different features based on their characteristics. This approach allows for more specialized handling of each feature, improving model performance on complex, heterogeneous datasets.
## 🚀 Basic Usage
```python
from kdp import PreprocessingModel, FeatureType

# Define features
features = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "purchase_history": FeatureType.FLOAT_ARRAY,
}

# Create preprocessor with Feature MoE
preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_feature_moe=True,        # Turn on the magic
    feature_moe_num_experts=4,   # Four specialized experts
    feature_moe_expert_dim=64,   # Size of expert representations
)

# Build and use
result = preprocessor.build_preprocessor()
model = result["model"]
```
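
Once built, the returned Keras model can be applied to a batch of raw features. Below is a minimal, hedged sketch; the dict-of-tensors input format and the example values are assumptions for illustration, not taken from this page:

```python
import tensorflow as tf

# Assumed input format: a dict of batched tensors keyed by feature name.
raw_batch = {
    "age": tf.constant([[35.0], [52.0]]),
    "income": tf.constant([[48000.0], [91000.0]]),
    "occupation": tf.constant([["engineer"], ["teacher"]]),
    "purchase_history": tf.constant([[0.2, 0.4, 0.1], [0.7, 0.3, 0.9]]),
}

processed = model(raw_batch)  # MoE-encoded feature representation
print(processed.shape)
```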
## 🧩 How Feature MoE Works

KDP's Feature MoE uses a "divide and conquer" approach with smart routing:

- Each expert is a specialized neural network
- A router determines which experts should process each feature
- Features can use multiple experts with different weights
- Residual connections preserve the original feature information
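
To make the routing idea concrete, here is a minimal, self-contained Keras sketch of a feature-wise MoE layer. It is an illustrative simplification under stated assumptions (scalar input features, one softmax routing distribution per feature), not KDP's actual implementation; the class and weight names are invented for this example.

```python
import tensorflow as tf


class SimpleFeatureMoE(tf.keras.layers.Layer):
    """Toy feature-wise MoE: experts + learned per-feature routing + residual."""

    def __init__(self, num_experts=4, expert_dim=64, **kwargs):
        super().__init__(**kwargs)
        self.num_experts = num_experts
        self.expert_dim = expert_dim

    def build(self, input_shape):
        num_features = input_shape[-1]
        # Each expert is a small specialized network (here a single Dense layer).
        self.experts = [
            tf.keras.layers.Dense(self.expert_dim, activation="relu")
            for _ in range(self.num_experts)
        ]
        # Learned router: one distribution over experts per feature.
        self.router_logits = self.add_weight(
            name="router_logits",
            shape=(num_features, self.num_experts),
            initializer="zeros",
            trainable=True,
        )
        # Projection so the residual connection matches the expert dimension.
        self.residual_proj = tf.keras.layers.Dense(self.expert_dim)

    def call(self, inputs):
        # inputs: (batch, num_features) scalar features.
        feature_cols = tf.expand_dims(inputs, axis=-1)            # (B, F, 1)
        expert_out = tf.stack(
            [expert(feature_cols) for expert in self.experts], axis=-2
        )                                                         # (B, F, E, D)
        routing = tf.nn.softmax(self.router_logits, axis=-1)      # (F, E)
        routing = routing[tf.newaxis, :, :, tf.newaxis]           # broadcastable
        mixed = tf.reduce_sum(expert_out * routing, axis=-2)      # (B, F, D)
        # Residual connection preserves the original feature information.
        return mixed + self.residual_proj(feature_cols)


# Example: 4 scalar features routed through 4 experts.
layer = SimpleFeatureMoE(num_experts=4, expert_dim=8)
print(layer(tf.random.uniform((2, 4))).shape)  # (2, 4, 8)
```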
## ⚙️ Configuration Options
| Parameter | Description | Default | Recommended Range |
|-----------|-------------|---------|-------------------|
| `feature_moe_num_experts` | Number of specialists | `4` | 3-5 for most tasks, 6-8 for very complex data |
| `feature_moe_expert_dim` | Size of expert output | `64` | Larger (96-128) for complex patterns |
| `feature_moe_routing` | How to assign experts | `"learned"` | `"learned"` for automatic, `"predefined"` for control |
| `feature_moe_sparsity` | Use only the top-k experts | `2` | 1-3 (lower = faster, higher = more accurate) |
| `feature_moe_hidden_dims` | Expert network size | `[64, 32]` | Deeper for complex relationships |
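
Putting the table together, here is a hedged configuration sketch. `use_feature_moe`, `feature_moe_num_experts`, and `feature_moe_expert_dim` appear in the basic usage above; passing the remaining options as `PreprocessingModel` keyword arguments in the same way is an assumption.

```python
preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_feature_moe=True,
    feature_moe_num_experts=6,         # 6-8 suggested for very complex data
    feature_moe_expert_dim=96,         # larger outputs for complex patterns
    feature_moe_routing="learned",     # automatic expert assignment
    feature_moe_sparsity=2,            # only the top-2 experts per feature
    feature_moe_hidden_dims=[64, 32],  # expert hidden layer sizes
)
```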
## 💡 Pro Tips for Feature MoE
**Group Similar Features**

Assign related features to the same expert for consistent processing: for example, give demographic, financial, product, and temporal features each their own expert.
**Visualize Expert Assignments**

Examine which experts handle which features by plotting the assignments as a heatmap to understand your model's internal decisions.
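
One hedged way to plot such a heatmap with matplotlib, assuming you have already extracted a `(num_features, num_experts)` routing-weight matrix from the trained model (the extraction step is not shown; the random matrix below is only a placeholder):

```python
import matplotlib.pyplot as plt
import numpy as np

feature_names = ["age", "income", "occupation", "purchase_history"]
num_experts = 4

# Placeholder routing weights; substitute the matrix taken from your model.
routing_weights = np.random.dirichlet(np.ones(num_experts), size=len(feature_names))

plt.figure(figsize=(6, 3))
plt.imshow(routing_weights, cmap="viridis", aspect="auto")
plt.colorbar(label="Routing weight")
plt.xticks(range(num_experts), [f"Expert {i}" for i in range(num_experts)])
plt.yticks(range(len(feature_names)), feature_names)
plt.title("Feature-to-expert assignments")
plt.tight_layout()
plt.show()
```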
**Progressive Training**

Start with frozen experts, then fine-tune, allowing the model to learn basic patterns before the experts specialize.
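
A self-contained sketch of the freeze-then-fine-tune pattern on a toy Keras model; the `expert_*` layer names and the two-phase schedule are stand-ins, since this page does not document how KDP exposes its expert layers:

```python
import numpy as np
import tensorflow as tf

# Toy stand-in model: the "expert_*" Dense layers play the role of MoE experts.
inputs = tf.keras.Input(shape=(8,))
x = tf.keras.layers.Dense(16, activation="relu", name="expert_0")(inputs)
x = tf.keras.layers.Dense(16, activation="relu", name="expert_1")(x)
outputs = tf.keras.layers.Dense(1, activation="sigmoid")(x)
toy_model = tf.keras.Model(inputs, outputs)

X = np.random.rand(256, 8).astype("float32")
y = np.random.randint(0, 2, size=(256, 1))

# Phase 1: freeze the experts so the rest of the model learns basic patterns.
for layer in toy_model.layers:
    if layer.name.startswith("expert_"):
        layer.trainable = False
toy_model.compile(optimizer="adam", loss="binary_crossentropy")
toy_model.fit(X, y, epochs=2, verbose=0)

# Phase 2: unfreeze and fine-tune with a smaller learning rate so the experts specialize.
for layer in toy_model.layers:
    layer.trainable = True
toy_model.compile(optimizer=tf.keras.optimizers.Adam(1e-4), loss="binary_crossentropy")
toy_model.fit(X, y, epochs=2, verbose=0)
```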
## 🎯 When to Use Feature MoE
**Heterogeneous Features**

When your features have very different statistical properties (categorical, text, numerical, temporal).

**Complex Multi-Modal Data**

When features come from different sources or modalities (user features, item features, interaction features).

**Transfer Learning**

When adapting a model to new features, with domain-specific experts for different feature groups.