
Feature-wise Mixture of Experts (MoE)

Specialized processing for heterogeneous tabular features

📋 Overview

Feature-wise Mixture of Experts (MoE) applies different processing strategies to different features based on their characteristics. Instead of pushing every feature through the same network, each feature is routed to the experts best suited to it, which improves model performance on complex, heterogeneous datasets.

🚀 Basic Usage

```python
from kdp import PreprocessingModel, FeatureType

# Define features
features = {
    "age": FeatureType.FLOAT_NORMALIZED,
    "income": FeatureType.FLOAT_RESCALED,
    "occupation": FeatureType.STRING_CATEGORICAL,
    "purchase_history": FeatureType.FLOAT_ARRAY,
}

# Create preprocessor with Feature MoE
preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_feature_moe=True,               # Enable feature-wise MoE
    feature_moe_num_experts=4,          # Four specialized experts
    feature_moe_expert_dim=64,          # Size of expert representations
)

# Build and use
result = preprocessor.build_preprocessor()
model = result["model"]
```

🧩 How Feature MoE Works

(Figure: Feature MoE architecture)

KDP's Feature MoE uses a "divide and conquer" approach with smart routing:

- Each expert is a specialized neural network
- A router determines which experts should process each feature
- Features can use multiple experts with different weights
- Residual connections preserve the original feature information
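To make the mechanism concrete, here is a minimal, self-contained Keras sketch of a feature-wise MoE layer. It illustrates the four points above but is not KDP's internal implementation; the `FeatureMoE` name and layer structure are assumptions chosen for the example.

```python
import tensorflow as tf
from tensorflow import keras


class FeatureMoE(keras.layers.Layer):
    """Illustrative feature-wise MoE: soft per-feature routing over shared experts."""

    def __init__(self, num_experts=4, expert_dim=64, **kwargs):
        super().__init__(**kwargs)
        # Each expert is a small specialized network.
        self.experts = [
            keras.Sequential([
                keras.layers.Dense(expert_dim, activation="relu"),
                keras.layers.Dense(expert_dim),
            ])
            for _ in range(num_experts)
        ]
        # The router produces one logit per expert for every feature.
        self.router = keras.layers.Dense(num_experts)
        # Projection so the residual connection matches the expert output size.
        self.residual_proj = keras.layers.Dense(expert_dim)

    def call(self, feature_embeddings):
        # feature_embeddings: (batch, num_features, input_dim)
        weights = tf.nn.softmax(self.router(feature_embeddings), axis=-1)
        # expert_outputs: (batch, num_features, num_experts, expert_dim)
        expert_outputs = tf.stack(
            [expert(feature_embeddings) for expert in self.experts], axis=-2
        )
        # Weighted mixture: each feature can use multiple experts with different weights.
        mixed = tf.reduce_sum(weights[..., tf.newaxis] * expert_outputs, axis=-2)
        # Residual connection preserves the original feature information.
        return mixed + self.residual_proj(feature_embeddings)
```

Calling `FeatureMoE()(tf.random.normal((32, 4, 16)))` returns a `(32, 4, 64)` tensor: one 64-dimensional mixed-expert representation per feature.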

⚙️ Configuration Options

| Parameter | Description | Default | Recommended Range |
|---|---|---|---|
| `feature_moe_num_experts` | Number of specialist experts | 4 | 3-5 for most tasks, 6-8 for very complex data |
| `feature_moe_expert_dim` | Size of each expert's output | 64 | Larger (96-128) for complex patterns |
| `feature_moe_routing` | How experts are assigned | `"learned"` | `"learned"` for automatic routing, `"predefined"` for manual control |
| `feature_moe_sparsity` | Route each feature to only the top-k experts | 2 | 1-3 (lower = faster, higher = more accurate) |
| `feature_moe_hidden_dims` | Hidden layer sizes of each expert network | `[64, 32]` | Deeper for complex relationships |
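As a concrete example, the options might be combined like this for a large, complex dataset. This is a sketch reusing the `features` dict from the basic usage example; every parameter comes straight from the table above:

```python
preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_feature_moe=True,
    feature_moe_num_experts=6,          # 6-8 suits very complex data
    feature_moe_expert_dim=128,         # larger outputs for complex patterns
    feature_moe_routing="learned",      # let the router learn assignments
    feature_moe_sparsity=2,             # route each feature to its top-2 experts
    feature_moe_hidden_dims=[128, 64],  # deeper expert networks
)
```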

💡 Pro Tips for Feature MoE

Group Similar Features

Assign related features to the same expert so they are processed consistently: for example, route demographic, financial, product, and temporal features to four separate experts (see the sketch below).
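With `feature_moe_routing="predefined"` you choose the assignments yourself. The routing mode comes from the configuration table above, but the exact KDP argument for supplying the groups is not documented here, so the `feature_moe_assignments` name below is a hypothetical placeholder:

```python
# Hypothetical sketch: "predefined" routing is a real option (see the table),
# but the feature_moe_assignments argument name is an assumption.
feature_groups = {
    "age": 0,               # expert 0: demographic features
    "occupation": 0,        # demographic as well
    "income": 1,            # expert 1: financial features
    "purchase_history": 2,  # expert 2: behavioral/product features
}

preprocessor = PreprocessingModel(
    path_data="data.csv",
    features_specs=features,
    use_feature_moe=True,
    feature_moe_num_experts=3,
    feature_moe_routing="predefined",
    feature_moe_assignments=feature_groups,  # hypothetical parameter
)
```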

Visualize Expert Assignments

Examine which experts handle which features by plotting the assignments as a heatmap to understand your model's internal decisions.
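Here is a sketch of such a heatmap with matplotlib. The random matrix stands in for the real (num_features × num_experts) routing weights, since how you extract them depends on your model's internals:

```python
import matplotlib.pyplot as plt
import numpy as np

feature_names = ["age", "income", "occupation", "purchase_history"]
num_experts = 4

# Stand-in data: replace with the routing-weight matrix from your trained model.
weights = np.random.dirichlet(np.ones(num_experts), size=len(feature_names))

fig, ax = plt.subplots(figsize=(6, 3))
im = ax.imshow(weights, cmap="viridis", aspect="auto")
ax.set_xticks(range(num_experts), [f"expert {i}" for i in range(num_experts)])
ax.set_yticks(range(len(feature_names)), feature_names)
ax.set_title("Expert assignment per feature")
fig.colorbar(im, ax=ax, label="routing weight")
plt.tight_layout()
plt.show()
```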

Progressive Training

Train first with the experts frozen so the rest of the model learns basic patterns, then unfreeze and fine-tune so the experts can specialize.
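A sketch of this two-phase schedule in plain Keras; selecting the MoE layers by a name substring is an assumption you would adapt to your model's actual layer names:

```python
from tensorflow import keras

# Phase 1: freeze the expert layers so the rest of the model learns basic patterns.
for layer in model.layers:
    if "moe" in layer.name.lower():  # assumed naming convention
        layer.trainable = False
model.compile(optimizer="adam", loss="mse")
# model.fit(train_ds, epochs=5)

# Phase 2: unfreeze everything and fine-tune at a lower learning rate
# so the experts can specialize.
for layer in model.layers:
    layer.trainable = True
model.compile(optimizer=keras.optimizers.Adam(1e-4), loss="mse")
# model.fit(train_ds, epochs=10)
```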

🔍 When to Use Feature MoE

Heterogeneous Features

When your features have very different statistical properties (categorical, text, numerical, temporal).

Complex Multi-Modal Data

When features come from different sources or modalities (user features, item features, interaction features).

Transfer Learning

When adapting a model to new features, since domain-specific experts can be assigned to each feature group.