🏭 Preprocessing Layers Factory

The PreprocessorLayerFactory class provides a convenient way to create and manage preprocessing layers for your machine learning models. It supports both standard Keras preprocessing layers and custom layers defined within the KDP framework.

🎡 Using Keras Preprocessing Layers

All preprocessing layers available in Keras can be created through the PreprocessorLayerFactory by passing the layer's class name to `create_layer`. For example, to create a Normalization layer:

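from kdp.layers_factory import PreprocessorLayerFactory  # import path assumed; check your KDP version

normalization_layer = PreprocessorLayerFactory.create_layer(
    "Normalization",
    axis=-1,
    mean=None,  # with mean and variance left as None, call normalization_layer.adapt(data) before use
    variance=None,
)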

Available layers:

Normalization - Standardizes numerical features
Discretization - Bins continuous features into discrete intervals
CategoryEncoding - Encodes integer category indices as one-hot, multi-hot, or count vectors
Hashing - Performs feature hashing for categorical variables
HashedCrossing - Creates feature crosses using hashing
StringLookup - Converts string inputs to integer indices
IntegerLookup - Maps integer inputs to integer indices through a vocabulary
TextVectorization - Processes raw text into encoded representations
... and more

🏗️ Custom KDP Preprocessing Layers

In addition to Keras layers, the PreprocessorLayerFactory provides static methods for creating custom layers specific to the KDP framework, along with the generic `create_layer` method. The available methods are:

`cast_to_float32_layer(name='cast_to_float32', **kwargs)` `staticmethod`

Create a CastToFloat32Layer layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the layer.	`'cast_to_float32'`
`**kwargs`	`dict`	Additional keyword arguments to pass to the layer constructor.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	An instance of the CastToFloat32Layer layer.

`create_layer(layer_class, name=None, **kwargs)` `staticmethod`

Create a layer from its class name or class object, passing through only the keyword arguments that the layer class accepts.

Parameters:

Name	Type	Description	Default
`layer_class`	`str | type`	The name of the layer class to create (e.g., 'Normalization', 'Rescaling') or the class object itself.	required
`name`	`str`	The name of the layer. Optional.	`None`
`**kwargs`	`dict`	Additional keyword arguments to pass to the layer constructor.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	An instance of the specified layer class.
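The kwargs filtering described above can be sketched with `inspect.signature`. This is a minimal illustration under stated assumptions and is not KDP's actual implementation. `HypotheticalScaler`, `LAYER_REGISTRY`, `filter_kwargs`, and `create_layer_sketch` are hypothetical names. The sketch also assumes that "filtering" means dropping any keyword that does not appear by name in the constructor's signature.

```python
import inspect


class HypotheticalScaler:
    """Hypothetical stand-in for a layer class, used only for illustration."""

    def __init__(self, factor=1.0, name=None):
        self.factor = factor
        self.name = name


# Hypothetical name -> class lookup; the real factory resolves names its own way.
LAYER_REGISTRY = {"HypotheticalScaler": HypotheticalScaler}


def filter_kwargs(layer_cls, **kwargs):
    """Keep only the keyword arguments named in layer_cls.__init__'s signature."""
    accepted = inspect.signature(layer_cls.__init__).parameters
    return {key: value for key, value in kwargs.items() if key in accepted}


def create_layer_sketch(layer_class, name=None, **kwargs):
    """Resolve a class name (or use the class directly), filter kwargs, then instantiate."""
    if isinstance(layer_class, str):
        layer_class = LAYER_REGISTRY[layer_class]
    filtered = filter_kwargs(layer_class, **kwargs)
    filtered["name"] = name
    return layer_class(**filtered)


layer = create_layer_sketch("HypotheticalScaler", name="scaler", factor=2.0, axis=-1)
print(layer.factor, layer.name)  # 2.0 scaler ("axis" was dropped)
```

A purely name-based filter like this one also drops arguments that a constructor would accept only through its own `**kwargs`. Keras layers take common options such as `dtype` that way.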

`date_encoding_layer(name='date_encoding_layer', **kwargs)` `staticmethod`

Create a DateEncodingLayer layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the layer.	`'date_encoding_layer'`
`**kwargs`	`dict`	Additional keyword arguments to pass to the layer constructor.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	An instance of the DateEncodingLayer layer.

`date_parsing_layer(name='date_parsing_layer', **kwargs)` `staticmethod`

Create a DateParsingLayer layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the layer.	`'date_parsing_layer'`
`**kwargs`	`dict`	Additional keyword arguments to pass to the layer constructor.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	An instance of the DateParsingLayer layer.

`date_season_layer(name='date_season_layer', **kwargs)` `staticmethod`

Create a SeasonLayer layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the layer.	`'date_season_layer'`
`**kwargs`	`dict`	Additional keyword arguments to pass to the layer constructor.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	An instance of the SeasonLayer layer.

`differencing_layer(name='differencing', order=1, fill_value=0.0, drop_na=True, **kwargs)` `staticmethod`

Create a DifferencingLayer for differencing time series data to make it stationary.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the layer.	`'differencing'`
`order`	`int`	Order of differencing. Default is 1.	`1`
`fill_value`	`float`	Value to use for filling initial values. Default is 0.0.	`0.0`
`drop_na`	`bool`	Whether to drop rows with NaN values. Default is True.	`True`
`**kwargs`	`dict`	Additional keyword arguments.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	DifferencingLayer instance.
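The effect of `order`, `fill_value`, and `drop_na` can be illustrated on a plain Python list. This sketch rests on two assumptions. First, it treats differencing as repeated first-order differencing. Second, it assumes the leading positions that have no predecessor are either dropped (`drop_na=True`) or filled with `fill_value`. The real DifferencingLayer operates on tensors and may handle these edge positions differently. `difference` is a hypothetical helper.

```python
def difference(series, order=1, fill_value=0.0, drop_na=True):
    """Apply first-order differencing `order` times to a list of numbers.

    Each pass computes y[t] = x[t] - x[t - 1], so it shortens the series by one.
    With drop_na=True those missing leading positions are simply dropped; otherwise
    they are filled with fill_value so the output keeps the input length.
    """
    values = list(series)
    for _ in range(order):
        values = [values[t] - values[t - 1] for t in range(1, len(values))]
    if drop_na:
        return values
    return [fill_value] * min(order, len(series)) + values


print(difference([1, 4, 9, 16]))                 # [3, 5, 7]
print(difference([1, 4, 9, 16], order=2))        # [2, 2]
print(difference([1, 4, 9, 16], drop_na=False))  # [0.0, 3, 5, 7]
```

Second-order differencing of the squares 1, 4, 9, 16 yields a constant. This is the sense in which repeated differencing removes polynomial trends.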

`distribution_aware_encoder(name='distribution_aware', num_bins=1000, epsilon=1e-06, detect_periodicity=True, handle_sparsity=True, adaptive_binning=True, mixture_components=3, prefered_distribution=None, **kwargs)` `staticmethod`

Create a DistributionAwareEncoder layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the layer	`'distribution_aware'`
`num_bins`	`int`	Number of bins for quantile encoding	`1000`
`epsilon`	`float`	Small value for numerical stability	`1e-06`
`detect_periodicity`	`bool`	Whether to detect and handle periodic patterns	`True`
`handle_sparsity`	`bool`	Whether to handle sparse data specially	`True`
`adaptive_binning`	`bool`	Whether to use adaptive binning	`True`
`mixture_components`	`int`	Number of components for mixture modeling	`3`
`prefered_distribution`	`DistributionType`	Optional specific distribution type to use	`None`
`**kwargs`		Additional keyword arguments	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	DistributionAwareEncoder layer

`distribution_transform_layer(name='distribution_transform', transform_type='none', lambda_param=0.0, epsilon=1e-10, min_value=0.0, max_value=1.0, clip_values=True, auto_candidates=None, **kwargs)` `staticmethod`

Create a DistributionTransformLayer layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the layer	`'distribution_transform'`
`transform_type`	`str`	Type of transformation to apply	`'none'`
`lambda_param`	`float`	Parameter for parameterized transformations	`0.0`
`epsilon`	`float`	Small value for numerical stability	`1e-10`
`min_value`	`float`	Minimum value for min-max scaling	`0.0`
`max_value`	`float`	Maximum value for min-max scaling	`1.0`
`clip_values`	`bool`	Whether to clip values to the specified range	`True`
`auto_candidates`	`list[str]`	List of transformations to consider in auto mode	`None`
`**kwargs`		Additional keyword arguments	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	DistributionTransformLayer layer
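The `lambda_param` and `epsilon` parameters point to parameterized transformations. The set of values that `transform_type` accepts is not listed here, so the following is only an assumption-based sketch of one common parameterized transform, Box-Cox, in plain Python. `box_cox` is a hypothetical helper and not part of KDP.

```python
import math


def box_cox(values, lambda_param=0.0, epsilon=1e-10):
    """Box-Cox transform of non-negative values (hypothetical helper).

    epsilon is added to every input so that zeros stay strictly positive;
    lambda_param == 0 reduces the transform to the natural logarithm.
    """
    transformed = []
    for x in values:
        shifted = x + epsilon  # assumes x >= 0
        if lambda_param == 0.0:
            transformed.append(math.log(shifted))
        else:
            transformed.append((shifted ** lambda_param - 1.0) / lambda_param)
    return transformed


print(box_cox([1.0, 4.0, 9.0], lambda_param=0.5, epsilon=0.0))  # [0.0, 2.0, 4.0]
```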

`gated_linear_unit_layer(units, name='gated_linear_unit', **kwargs)` `staticmethod`

Create a GatedLinearUnit layer.

Parameters:

Name	Type	Description	Default
`units`	`int`	Dimensionality of the output space	required
`name`	`str`	Name of the layer	`'gated_linear_unit'`
`**kwargs`	`dict`	Additional arguments to pass to the layer	`{}`

Returns:

Name	Type	Description
`GatedLinearUnit`	`tf.keras.layers.Layer`	A GatedLinearUnit layer instance
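A gated linear unit is commonly defined as GLU(x) = (xW + b) ⊙ σ(xV + c). One affine projection to `units` outputs is multiplied element-wise by a sigmoid gate, which is computed from a second projection. The plain-Python sketch below assumes that KDP's GatedLinearUnit follows this standard formulation. Here the weights are passed in explicitly, whereas a Keras layer would learn them. `dense`, `sigmoid`, and `glu` are hypothetical helpers.

```python
import math


def dense(x, weights, bias):
    """Affine map: x has length d_in, weights is d_in x units, bias has length units."""
    return [
        sum(xi * row[j] for xi, row in zip(x, weights)) + bias[j]
        for j in range(len(bias))
    ]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def glu(x, w_linear, b_linear, w_gate, b_gate):
    """GLU(x) = (x W + b) * sigmoid(x V + c), applied element-wise over units."""
    linear = dense(x, w_linear, b_linear)
    gate = [sigmoid(z) for z in dense(x, w_gate, b_gate)]
    return [a * g for a, g in zip(linear, gate)]


# units = 1: the linear branch gives 3.0 and a zero gate input gives sigmoid(0) = 0.5
print(glu([1.0, 2.0], [[1.0], [1.0]], [0.0], [[0.0], [0.0]], [0.0]))  # [1.5]
```

The gate lets the layer scale each output unit between fully suppressed (gate near 0) and fully passed through (gate near 1).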

`gated_residual_network_layer(units, dropout_rate=0.2, name='gated_residual_network', **kwargs)` `staticmethod`

Create a GatedResidualNetwork layer.

Parameters:

Name	Type	Description	Default
`units`	`int`	Dimensionality of the output space	required
`dropout_rate`	`float`	Fraction of the input units to drop	`0.2`
`name`	`str`	Name of the layer	`'gated_residual_network'`
`**kwargs`	`dict`	Additional arguments to pass to the layer	`{}`

Returns:

Name	Type	Description
`GatedResidualNetwork`	`tf.keras.layers.Layer`	A GatedResidualNetwork layer instance

`global_numerical_embedding_layer(global_embedding_dim=8, global_mlp_hidden_units=16, global_num_bins=10, global_init_min=-3.0, global_init_max=3.0, global_dropout_rate=0.1, global_use_batch_norm=True, global_pooling='average', name='global_numerical_embedding', **kwargs)` `staticmethod`

Create a GlobalNumericalEmbedding layer.

Parameters:

Name	Type	Description	Default
`global_embedding_dim`	`int`	Dimension of the final global embedding	`8`
`global_mlp_hidden_units`	`int`	Number of hidden units in the global MLP	`16`
`global_num_bins`	`int`	Number of bins for discretization	`10`
`global_init_min`	`float`	Minimum value for initialization	`-3.0`
`global_init_max`	`float`	Maximum value for initialization	`3.0`
`global_dropout_rate`	`float`	Dropout rate for regularization	`0.1`
`global_use_batch_norm`	`bool`	Whether to use batch normalization	`True`
`global_pooling`	`str`	Pooling method to use ("average" or "max")	`'average'`
`name`	`str`	Name of the layer	`'global_numerical_embedding'`
`**kwargs`	`dict`	Additional arguments to pass to the layer	`{}`

Returns:

Name	Type	Description
`GlobalNumericalEmbedding`	`tf.keras.layers.Layer`	A GlobalNumericalEmbedding layer instance

`lag_feature_layer(name='lag_feature', lags=None, fill_value=0.0, drop_na=True, **kwargs)` `staticmethod`

Create a LagFeatureLayer for generating lag features from time series data.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the layer.	`'lag_feature'`
`lags`	`list[int]`	List of lag values to create. Default is [1] (one step back).	`None`
`fill_value`	`float`	Value to use for filling NaN values. Default is 0.0.	`0.0`
`drop_na`	`bool`	Whether to drop rows with NaN values. Default is True.	`True`
`**kwargs`	`dict`	Additional keyword arguments.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	LagFeatureLayer instance.
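The sketch below illustrates lag features on a plain Python list. It assumes three behaviors:

- Each lag k produces a column whose value at step t is the series value at step t - k.
- Missing early values are filled with `fill_value`.
- `drop_na=True` drops the first max(lags) rows.

`lag_features` is a hypothetical helper. The real LagFeatureLayer works on tensors and may order or drop outputs differently.

```python
def lag_features(series, lags=None, fill_value=0.0, drop_na=True):
    """Return one row per time step with one lagged value per entry in lags."""
    lags = [1] if lags is None else lags
    columns = [
        [series[t - lag] if t >= lag else fill_value for t in range(len(series))]
        for lag in lags
    ]
    rows = [list(row) for row in zip(*columns)]
    if drop_na:
        rows = rows[max(lags):]  # drop steps whose lags reach before the series start
    return rows


print(lag_features([10, 20, 30, 40], lags=[1, 2], drop_na=False))
# [[0.0, 0.0], [10, 0.0], [20, 10], [30, 20]]
```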

`moving_average_layer(name='moving_average', periods=None, pad_value=0.0, keep_original=True, **kwargs)` `staticmethod`

Create a MovingAverageLayer for computing moving averages to smooth time series data.

Parameters:

Name	Type	Description	Default
`name`	`str`	Name of the layer.	`'moving_average'`
`periods`	`list[int]`	List of periods (window sizes) for moving averages. Default is [7] (7-period MA).	`None`
`pad_value`	`float`	Value to use for padding. Default is 0.0.	`0.0`
`keep_original`	`bool`	Whether to keep the original series alongside MAs. Default is True.	`True`
`**kwargs`	`dict`	Additional keyword arguments.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	MovingAverageLayer instance.
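A plain-Python sketch of multiple moving averages follows. It assumes three behaviors:

- The window is trailing: the average at step t covers the last `period` values up to and including t.
- Steps without a full window receive `pad_value`.
- `keep_original=True` places the original series in the first column.

`moving_averages` is a hypothetical helper, not KDP's implementation.

```python
def moving_averages(series, periods=None, pad_value=0.0, keep_original=True):
    """Return one row per time step: [original (optional), MA(period_1), MA(period_2), ...]."""
    periods = [7] if periods is None else periods
    columns = [list(series)] if keep_original else []
    for period in periods:
        column = []
        for t in range(len(series)):
            if t + 1 < period:
                column.append(pad_value)  # not enough history for a full window yet
            else:
                window = series[t - period + 1 : t + 1]
                column.append(sum(window) / period)
        columns.append(column)
    return [list(row) for row in zip(*columns)]


print(moving_averages([1, 2, 3, 4], periods=[2]))
# [[1, 0.0], [2, 1.5], [3, 2.5], [4, 3.5]]
```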

`multi_resolution_attention_layer(num_heads, d_model, embedding_dim=32, name='multi_resolution_attention', **kwargs)` `staticmethod`

Create a MultiResolutionTabularAttention layer.

Parameters:

Name	Type	Description	Default
`num_heads`	`int`	Number of attention heads	required
`d_model`	`int`	Dimensionality of the attention model	required
`embedding_dim`	`int`	Dimension for categorical embeddings	`32`
`name`	`str`	Name of the layer	`'multi_resolution_attention'`
`**kwargs`	`dict`	Additional arguments to pass to the layer	`{}`

Returns:

Name	Type	Description
`MultiResolutionTabularAttention`	`tf.keras.layers.Layer`	A MultiResolutionTabularAttention layer instance

`numerical_embedding_layer(embedding_dim=8, mlp_hidden_units=16, num_bins=10, init_min=-3.0, init_max=3.0, dropout_rate=0.1, use_batch_norm=True, name='numerical_embedding', **kwargs)` `staticmethod`

Create a NumericalEmbedding layer.

Parameters:

Name	Type	Description	Default
`embedding_dim`	`int`	Dimension of the output embedding	`8`
`mlp_hidden_units`	`int`	Number of hidden units in the MLP	`16`
`num_bins`	`int`	Number of bins for discretization	`10`
`init_min`	`float`	Minimum value for initialization	`-3.0`
`init_max`	`float`	Maximum value for initialization	`3.0`
`dropout_rate`	`float`	Dropout rate for regularization	`0.1`
`use_batch_norm`	`bool`	Whether to use batch normalization	`True`
`name`	`str`	Name of the layer	`'numerical_embedding'`
`**kwargs`	`dict`	Additional arguments to pass to the layer	`{}`

Returns:

Name	Type	Description
`NumericalEmbedding`	`tf.keras.layers.Layer`	A NumericalEmbedding layer instance

`rolling_stats_layer(window_size, name='rolling_stats', statistics=None, window_stride=1, pad_value=0.0, **kwargs)` `staticmethod`

Create a RollingStatsLayer for computing rolling statistics over a sliding window.

Parameters:

Name	Type	Description	Default
`window_size`	`int`	Size of the sliding window.	required
`name`	`str`	Name of the layer.	`'rolling_stats'`
`statistics`	`list[str]`	List of statistics to compute. Options: 'mean', 'std', 'min', 'max', 'sum', 'median', 'range', 'variance'. Default is ['mean'].	`None`
`window_stride`	`int`	Stride of the sliding window. Default is 1.	`1`
`pad_value`	`float`	Value to use for padding. Default is 0.0.	`0.0`
`**kwargs`	`dict`	Additional keyword arguments.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	RollingStatsLayer instance.
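A plain-Python sketch of rolling statistics follows. It assumes trailing windows that end every `window_stride` steps, and it fills positions without a full window with `pad_value`. Standard deviation and variance are computed here as population statistics, which is also an assumption. `rolling_stats` and `STAT_FNS` are hypothetical names.

```python
import statistics as st

# Hypothetical mapping from the statistic names listed above to functions.
STAT_FNS = {
    "mean": st.fmean,
    "std": st.pstdev,  # population standard deviation (assumption)
    "min": min,
    "max": max,
    "sum": sum,
    "median": st.median,
    "range": lambda window: max(window) - min(window),
    "variance": st.pvariance,  # population variance (assumption)
}


def rolling_stats(series, window_size, statistics=None, window_stride=1, pad_value=0.0):
    """Return one row per evaluated step, with one value per requested statistic."""
    names = ["mean"] if statistics is None else statistics
    rows = []
    for t in range(0, len(series), window_stride):
        if t + 1 < window_size:
            rows.append([pad_value] * len(names))  # incomplete window at the start
        else:
            window = series[t - window_size + 1 : t + 1]
            rows.append([STAT_FNS[name](window) for name in names])
    return rows


print(rolling_stats([1, 2, 3, 4, 5], window_size=3, statistics=["mean", "max"]))
# [[0.0, 0.0], [0.0, 0.0], [2.0, 3], [3.0, 4], [4.0, 5]]
```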

`tabular_attention_layer(num_heads, d_model, name='tabular_attention', **kwargs)` `staticmethod`

Create a TabularAttention layer.

Parameters:

Name	Type	Description	Default
`num_heads`	`int`	Number of attention heads	required
`d_model`	`int`	Dimensionality of the attention model	required
`name`	`str`	Name of the layer	`'tabular_attention'`
`**kwargs`	`dict`	Additional arguments to pass to the layer	`{}`

Returns:

Name	Type	Description
`TabularAttention`	`tf.keras.layers.Layer`	A TabularAttention layer instance

`text_preprocessing_layer(name='text_preprocessing', **kwargs)` `staticmethod`

Create a TextPreprocessingLayer layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the layer.	`'text_preprocessing'`
`**kwargs`	`dict`	Additional keyword arguments to pass to the layer constructor.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	An instance of the TextPreprocessingLayer layer.

`transformer_block_layer(name='transformer', **kwargs)` `staticmethod`

Create a TransformerBlock layer.

Parameters:

Name	Type	Description	Default
`name`	`str`	The name of the layer.	`'transformer'`
`**kwargs`	`dict`	Additional keyword arguments to pass to the layer constructor.	`{}`

Returns:

Type	Description
`tf.keras.layers.Layer`	An instance of the TransformerBlock layer.

`variable_selection_layer(nr_features=None, units=16, dropout_rate=0.2, name='variable_selection', **kwargs)` `staticmethod`

Create a VariableSelection layer.

Parameters:

Name	Type	Description	Default
`nr_features`	`int`	Number of input features	`None`
`units`	`int`	Dimensionality of the output space	`16`
`dropout_rate`	`float`	Fraction of the input units to drop	`0.2`
`name`	`str`	Name of the layer	`'variable_selection'`
`**kwargs`	`dict`	Additional arguments to pass to the layer	`{}`

Returns:

Name	Type	Description
`VariableSelection`	`tf.keras.layers.Layer`	A VariableSelection layer instance
