Preprocessing Layers Factory
The `PreprocessorLayerFactory` class provides a convenient way to create and manage preprocessing layers for your machine learning models. It supports both standard Keras preprocessing layers and custom layers defined within the KDP framework.
Using Keras Preprocessing Layers
All preprocessing layers available in Keras can be used within the `PreprocessorLayerFactory`. You can access these layers by their class names. Here's an example of how to use a Keras preprocessing layer:
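```python
# Import path is an assumption; adjust it to match your KDP installation.
from kdp.layers_factory import PreprocessorLayerFactory

# Create a Keras Normalization layer by its class name.
normalization_layer = PreprocessorLayerFactory.create_layer(
    "Normalization",
    axis=-1,
    mean=None,
    variance=None,
)
```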
Commonly used Keras preprocessing layers include:
- Normalization - Standardizes numerical features
- Discretization - Bins continuous features into discrete intervals
- CategoryEncoding - Converts categorical data into numeric representations
- Hashing - Performs feature hashing for categorical variables
- HashedCrossing - Creates feature crosses using hashing
- StringLookup - Converts string inputs to integer indices
- IntegerLookup - Maps integer inputs to indexed array positions
- TextVectorization - Processes raw text into encoded representations
- ... and more
Custom KDP Preprocessing Layers
In addition to Keras layers, the `PreprocessorLayerFactory` includes several custom layers specific to the KDP framework. Here's a list of available custom layers:
cast_to_float32_layer(name='cast_to_float32', **kwargs)
staticmethod
Create a CastToFloat32Layer layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the layer. | `'cast_to_float32'` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the layer constructor. | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | An instance of the CastToFloat32Layer layer. |
create_layer(layer_class, name=None, **kwargs)
staticmethod
Create a layer using the layer class name, automatically filtering kwargs based on the layer class.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`layer_class` | `str` or class object | The name of the layer class to be created (e.g., 'Normalization', 'Rescaling') or the class object itself. | required |
`name` | `str` | The name of the layer. Optional. | `None` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the layer constructor. | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | An instance of the specified layer class. |
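The kwargs filtering described for `create_layer` can be illustrated with a minimal, standard-library sketch. This is not KDP's implementation: `FakeNormalization` and `create_layer_sketch` are hypothetical names, and the sketch assumes that "filtering" means dropping keyword arguments the target constructor does not declare.

```python
import inspect


class FakeNormalization:
    """Hypothetical stand-in for a layer class; not a real Keras layer."""

    def __init__(self, axis=-1, mean=None, variance=None, name=None):
        self.axis = axis
        self.mean = mean
        self.variance = variance
        self.name = name


def create_layer_sketch(layer_class, name=None, **kwargs):
    """Instantiate layer_class, dropping kwargs its constructor does not declare."""
    params = inspect.signature(layer_class.__init__).parameters
    # A constructor that itself takes **kwargs accepts everything, so keep all.
    takes_var_kwargs = any(
        p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()
    )
    if takes_var_kwargs:
        accepted = dict(kwargs)
    else:
        accepted = {k: v for k, v in kwargs.items() if k in params and k != "self"}
    return layer_class(name=name, **accepted)


# "not_a_real_arg" is silently dropped instead of raising a TypeError.
layer = create_layer_sketch(FakeNormalization, name="norm", axis=-1, not_a_real_arg=123)
print(layer.name, layer.axis)
```

Filtering of this kind lets one shared configuration dictionary feed several layer types, at the cost of hiding misspelled argument names.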
date_encoding_layer(name='date_encoding_layer', **kwargs)
staticmethod
Create a DateEncodingLayer layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the layer. | `'date_encoding_layer'` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the layer constructor. | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | An instance of the DateEncodingLayer layer. |
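This reference does not state what `DateEncodingLayer` computes. One common way to encode date components is cyclical sine/cosine encoding, sketched below in plain Python as an assumption rather than a description of the layer. The helper names `cyclical` and `encode_date` are hypothetical.

```python
import math


def cyclical(value, period):
    """Map a periodic value onto the unit circle as a (sin, cos) pair."""
    angle = 2.0 * math.pi * value / period
    return math.sin(angle), math.cos(angle)


def encode_date(month, day, weekday):
    """Encode month (1-12), day (1-31) and weekday (0-6) as sin/cos pairs."""
    features = []
    features.extend(cyclical(month - 1, 12))
    features.extend(cyclical(day - 1, 31))
    features.extend(cyclical(weekday, 7))
    return features


december = encode_date(month=12, day=15, weekday=2)
january = encode_date(month=1, day=15, weekday=2)
june = encode_date(month=6, day=15, weekday=2)

# December sits next to January on the circle, while June is far away.
print(math.dist(december[:2], january[:2]), math.dist(june[:2], january[:2]))
```

The point of the circular representation is that the end of one period stays close to the start of the next, which a raw month number (12 versus 1) does not capture.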
date_parsing_layer(name='date_parsing_layer', **kwargs)
staticmethod
Create a DateParsingLayer layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the layer. | `'date_parsing_layer'` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the layer constructor. | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | An instance of the DateParsingLayer layer. |
date_season_layer(name='date_season_layer', **kwargs)
staticmethod
Create a SeasonLayer layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the layer. | `'date_season_layer'` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the layer constructor. | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | An instance of the SeasonLayer layer. |
distribution_aware_encoder(name='distribution_aware', num_bins=1000, epsilon=1e-06, detect_periodicity=True, handle_sparsity=True, adaptive_binning=True, mixture_components=3, prefered_distribution=None, **kwargs)
staticmethod
Create a DistributionAwareEncoder layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | Name of the layer | `'distribution_aware'` |
`num_bins` | `int` | Number of bins for quantile encoding | `1000` |
`epsilon` | `float` | Small value for numerical stability | `1e-06` |
`detect_periodicity` | `bool` | Whether to detect and handle periodic patterns | `True` |
`handle_sparsity` | `bool` | Whether to handle sparse data specially | `True` |
`adaptive_binning` | `bool` | Whether to use adaptive binning | `True` |
`mixture_components` | `int` | Number of components for mixture modeling | `3` |
`prefered_distribution` | `DistributionType` | Optional specific distribution type to use | `None` |
`**kwargs` | `dict` | Additional keyword arguments | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | DistributionAwareEncoder layer |
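The `num_bins` parameter refers to quantile encoding. As a rough illustration of that idea only (the encoder's internals are not documented here), the sketch below fits equal-mass bin edges with the standard library and maps values to bin indices. The function names are hypothetical.

```python
import bisect
import statistics


def fit_quantile_edges(values, num_bins):
    """Return num_bins - 1 interior cut points that split values into equal-mass bins."""
    return statistics.quantiles(values, n=num_bins, method="inclusive")


def quantile_encode(value, edges):
    """Return the index of the bin that value falls into (0 .. len(edges))."""
    return bisect.bisect_right(edges, value)


# Skewed toy data: most values are small, a few are very large.
data = [1, 1, 2, 2, 3, 3, 4, 5, 50, 500]
edges = fit_quantile_edges(data, num_bins=4)
codes = [quantile_encode(v, edges) for v in data]
print(edges, codes)
```

Because the edges follow the data's quantiles, the extreme values 50 and 500 land in the same top bin as other large values instead of stretching the scale.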
distribution_transform_layer(name='distribution_transform', transform_type='none', lambda_param=0.0, epsilon=1e-10, min_value=0.0, max_value=1.0, clip_values=True, auto_candidates=None, **kwargs)
staticmethod
Create a DistributionTransformLayer layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | Name of the layer | `'distribution_transform'` |
`transform_type` | `str` | Type of transformation to apply | `'none'` |
`lambda_param` | `float` | Parameter for parameterized transformations | `0.0` |
`epsilon` | `float` | Small value for numerical stability | `1e-10` |
`min_value` | `float` | Minimum value for min-max scaling | `0.0` |
`max_value` | `float` | Maximum value for min-max scaling | `1.0` |
`clip_values` | `bool` | Whether to clip values to the specified range | `True` |
`auto_candidates` | `list[str]` | List of transformations to consider in auto mode | `None` |
`**kwargs` | `dict` | Additional keyword arguments | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | DistributionTransformLayer layer |
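To make the roles of `min_value`, `max_value`, `clip_values`, `lambda_param` and `epsilon` concrete, here is a plain-Python sketch of two transformations such a layer might offer: min-max scaling and the Box-Cox power transform. Which `transform_type` strings the layer accepts is not listed in this reference, so the function names below are illustrative assumptions, and Box-Cox is chosen only as a typical example of a parameterized transformation.

```python
import math


def min_max_scale(x, data_min, data_max, min_value=0.0, max_value=1.0,
                  clip_values=True, epsilon=1e-10):
    """Linearly map [data_min, data_max] onto [min_value, max_value]."""
    # epsilon keeps the division finite when data_min == data_max.
    unit = (x - data_min) / (data_max - data_min + epsilon)
    scaled = min_value + unit * (max_value - min_value)
    if clip_values:
        scaled = max(min_value, min(max_value, scaled))
    return scaled


def box_cox(x, lambda_param=0.0, epsilon=1e-10):
    """Box-Cox power transform for non-negative x; lambda_param = 0 gives log."""
    x = x + epsilon  # guard against log(0)
    if lambda_param == 0.0:
        return math.log(x)
    return (x ** lambda_param - 1.0) / lambda_param


print(min_max_scale(15.0, 0.0, 10.0))                     # out-of-range input, clipped
print(min_max_scale(15.0, 0.0, 10.0, clip_values=False))  # same input, not clipped
print(box_cox(4.0, lambda_param=0.5))
```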
gated_linear_unit_layer(units, name='gated_linear_unit', **kwargs)
staticmethod
Create a GatedLinearUnit layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`units` | `int` | Dimensionality of the output space | required |
`name` | `str` | Name of the layer | `'gated_linear_unit'` |
`**kwargs` | `dict` | Additional arguments to pass to the layer | `{}` |
Returns:
Name | Type | Description |
---|---|---|
`GatedLinearUnit` | `tf.keras.layers.Layer` | A GatedLinearUnit layer instance |
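A gated linear unit is usually defined as the element-wise product of a linear projection and a sigmoid gate, each with `units` outputs. The plain-Python sketch below shows that computation with hand-picked weights. It assumes the standard GLU formulation and makes no claim about the exact internals of KDP's `GatedLinearUnit`.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(x, weights, bias):
    """Plain linear layer: one output per row of weights."""
    return [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, bias)]


def gated_linear_unit(x, w_value, b_value, w_gate, b_gate):
    """GLU: element-wise product of a linear projection and a sigmoid gate."""
    value = dense(x, w_value, b_value)
    gate = [sigmoid(g) for g in dense(x, w_gate, b_gate)]
    return [v * g for v, g in zip(value, gate)]


x = [1.0, 2.0]
# units = 2, so each weight matrix has two rows.
w_value = [[1.0, 0.0], [0.0, 1.0]]  # identity projection
w_gate = [[0.0, 0.0], [0.0, 0.0]]   # zero weights: the gate depends on the bias only
out = gated_linear_unit(x, w_value, [0.0, 0.0], w_gate, [0.0, -50.0])
print(out)  # first unit is half open, second unit is almost fully closed
```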
gated_residual_network_layer(units, dropout_rate=0.2, name='gated_residual_network', **kwargs)
staticmethod
Create a GatedResidualNetwork layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`units` | `int` | Dimensionality of the output space | required |
`dropout_rate` | `float` | Fraction of the input units to drop | `0.2` |
`name` | `str` | Name of the layer | `'gated_residual_network'` |
`**kwargs` | `dict` | Additional arguments to pass to the layer | `{}` |
Returns:
Name | Type | Description |
---|---|---|
`GatedResidualNetwork` | `tf.keras.layers.Layer` | A GatedResidualNetwork layer instance |
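In the commonly published formulation of a gated residual network, a candidate update is gated, added back to the input through a skip connection, and layer-normalized. The sketch below shows only that final combination step in plain Python. It is an assumption about the general technique, not a description of KDP's layer: the dense sublayers that would produce `candidate` and `gate_logits` are left out, and dropout is omitted.

```python
import math


def layer_norm(x, epsilon=1e-6):
    """Normalize a vector to zero mean and (approximately) unit variance."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + epsilon) for v in x]


def gated_residual(x, candidate, gate_logits):
    """Add a sigmoid-gated candidate update to x, then layer-normalize."""
    gate = [1.0 / (1.0 + math.exp(-g)) for g in gate_logits]
    mixed = [xi + c * g for xi, c, g in zip(x, candidate, gate)]
    return layer_norm(mixed)


x = [1.0, 2.0, 3.0]
candidate = [10.0, -10.0, 5.0]
closed = gated_residual(x, candidate, gate_logits=[-50.0] * 3)  # gate near 0
opened = gated_residual(x, candidate, gate_logits=[50.0] * 3)   # gate near 1
print(closed, opened)
```

With the gate closed, the block reduces to a normalized copy of its input, which is what lets such a network skip a transformation it does not need.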
global_numerical_embedding_layer(global_embedding_dim=8, global_mlp_hidden_units=16, global_num_bins=10, global_init_min=-3.0, global_init_max=3.0, global_dropout_rate=0.1, global_use_batch_norm=True, global_pooling='average', name='global_numerical_embedding', **kwargs)
staticmethod
Create a GlobalNumericalEmbedding layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`global_embedding_dim` | `int` | Dimension of the final global embedding | `8` |
`global_mlp_hidden_units` | `int` | Number of hidden units in the global MLP | `16` |
`global_num_bins` | `int` | Number of bins for discretization | `10` |
`global_init_min` | `float` | Minimum value for initialization | `-3.0` |
`global_init_max` | `float` | Maximum value for initialization | `3.0` |
`global_dropout_rate` | `float` | Dropout rate for regularization | `0.1` |
`global_use_batch_norm` | `bool` | Whether to use batch normalization | `True` |
`global_pooling` | `str` | Pooling method to use ("average" or "max") | `'average'` |
`name` | `str` | Name of the layer | `'global_numerical_embedding'` |
`**kwargs` | `dict` | Additional arguments to pass to the layer | `{}` |
Returns:
Name | Type | Description |
---|---|---|
`GlobalNumericalEmbedding` | `tf.keras.layers.Layer` | A GlobalNumericalEmbedding layer instance |
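The `global_pooling` option chooses between "average" and "max". Assuming pooling is applied across per-feature embedding vectors to produce one global vector (an assumption; the reference does not spell out the axis), the two modes behave as in this small sketch. The function name `global_pool` is hypothetical.

```python
def global_pool(feature_embeddings, pooling="average"):
    """Collapse a list of per-feature embedding vectors into one vector."""
    columns = list(zip(*feature_embeddings))  # group values by embedding dimension
    if pooling == "average":
        return [sum(col) / len(col) for col in columns]
    if pooling == "max":
        return [max(col) for col in columns]
    raise ValueError(f"Unsupported pooling: {pooling!r}")


# Three numeric features, each already embedded into 2 dimensions.
embeddings = [[1.0, 4.0], [3.0, 0.0], [2.0, 2.0]]
avg = global_pool(embeddings, "average")
mx = global_pool(embeddings, "max")
print(avg, mx)
```

Average pooling lets every feature contribute, while max pooling keeps only the strongest activation per dimension.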
multi_resolution_attention_layer(num_heads, d_model, embedding_dim=32, name='multi_resolution_attention', **kwargs)
staticmethod
Create a MultiResolutionTabularAttention layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`num_heads` | `int` | Number of attention heads | required |
`d_model` | `int` | Dimensionality of the attention model | required |
`embedding_dim` | `int` | Dimension for categorical embeddings | `32` |
`name` | `str` | Name of the layer | `'multi_resolution_attention'` |
`**kwargs` | `dict` | Additional arguments to pass to the layer | `{}` |
Returns:
Name | Type | Description |
---|---|---|
`MultiResolutionTabularAttention` | `tf.keras.layers.Layer` | A MultiResolutionTabularAttention layer instance |
numerical_embedding_layer(embedding_dim=8, mlp_hidden_units=16, num_bins=10, init_min=-3.0, init_max=3.0, dropout_rate=0.1, use_batch_norm=True, name='numerical_embedding', **kwargs)
staticmethod
Create a NumericalEmbedding layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`embedding_dim` | `int` | Dimension of the output embedding | `8` |
`mlp_hidden_units` | `int` | Number of hidden units in the MLP | `16` |
`num_bins` | `int` | Number of bins for discretization | `10` |
`init_min` | `float` | Minimum value for initialization | `-3.0` |
`init_max` | `float` | Maximum value for initialization | `3.0` |
`dropout_rate` | `float` | Dropout rate for regularization | `0.1` |
`use_batch_norm` | `bool` | Whether to use batch normalization | `True` |
`name` | `str` | Name of the layer | `'numerical_embedding'` |
`**kwargs` | `dict` | Additional arguments to pass to the layer | `{}` |
Returns:
Name | Type | Description |
---|---|---|
`NumericalEmbedding` | `tf.keras.layers.Layer` | A NumericalEmbedding layer instance |
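One plausible reading of `num_bins`, `init_min` and `init_max` is that a numeric value is discretized into equal-width bins over `[init_min, init_max]` and the bin index selects a row of an embedding table. The sketch below illustrates that reading only. It is an assumption, the real layer may initialize or learn its boundaries differently, and `bin_index` and the toy table are hypothetical.

```python
def bin_index(value, num_bins=10, init_min=-3.0, init_max=3.0):
    """Map a value to one of num_bins equal-width bins over [init_min, init_max]."""
    scaled = (value - init_min) / (init_max - init_min)
    index = int(scaled * num_bins)
    # Values outside the range fall into the first or last bin.
    return max(0, min(num_bins - 1, index))


num_bins, embedding_dim = 10, 8
# Toy table: in a trained layer these rows would be learned weights.
table = [[float(i)] * embedding_dim for i in range(num_bins)]

embedding = table[bin_index(0.7, num_bins)]
print(bin_index(0.7, num_bins), embedding)
```

The default range of -3.0 to 3.0 suits standardized inputs, where almost all values lie within three standard deviations of the mean.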
tabular_attention_layer(num_heads, d_model, name='tabular_attention', **kwargs)
staticmethod
Create a TabularAttention layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`num_heads` | `int` | Number of attention heads | required |
`d_model` | `int` | Dimensionality of the attention model | required |
`name` | `str` | Name of the layer | `'tabular_attention'` |
`**kwargs` | `dict` | Additional arguments to pass to the layer | `{}` |
Returns:
Name | Type | Description |
---|---|---|
`TabularAttention` | `tf.keras.layers.Layer` | A TabularAttention layer instance |
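Attention layers of this kind are typically built on scaled dot-product attention, where `d_model` is the width of each vector being attended over. The following plain-Python sketch applies single-head self-attention across a handful of feature vectors. It is a simplification under stated assumptions: the learned query, key and value projections and the split into `num_heads` heads are omitted, and nothing here is taken from KDP's source.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(features):
    """Scaled dot-product self-attention over a list of feature vectors."""
    d_model = len(features[0])
    outputs, weights = [], []
    for query in features:
        scores = [
            sum(q * k for q, k in zip(query, key)) / math.sqrt(d_model)
            for key in features
        ]
        row = softmax(scores)
        weights.append(row)
        # Each output is the attention-weighted average of all feature vectors.
        outputs.append(
            [sum(w * value[i] for w, value in zip(row, features)) for i in range(d_model)]
        )
    return outputs, weights


features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = self_attention(features)
print(weights[2])
```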
text_preprocessing_layer(name='text_preprocessing', **kwargs)
staticmethod
Create a TextPreprocessingLayer layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the layer. | `'text_preprocessing'` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the layer constructor. | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | An instance of the TextPreprocessingLayer layer. |
transformer_block_layer(name='transformer', **kwargs)
staticmethod
Create a TransformerBlock layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`name` | `str` | The name of the layer. | `'transformer'` |
`**kwargs` | `dict` | Additional keyword arguments to pass to the layer constructor. | `{}` |
Returns:
Type | Description |
---|---|
`tf.keras.layers.Layer` | An instance of the TransformerBlock layer. |
variable_selection_layer(nr_features=None, units=16, dropout_rate=0.2, name='variable_selection', **kwargs)
staticmethod
Create a VariableSelection layer.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
`nr_features` | `int` | Number of input features | `None` |
`units` | `int` | Dimensionality of the output space | `16` |
`dropout_rate` | `float` | Fraction of the input units to drop | `0.2` |
`name` | `str` | Name of the layer | `'variable_selection'` |
`**kwargs` | `dict` | Additional arguments to pass to the layer | `{}` |
Returns:
Name | Type | Description |
---|---|---|
`VariableSelection` | `tf.keras.layers.Layer` | A VariableSelection layer instance |
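Variable selection is commonly implemented as a softmax over one score per input feature, followed by a weighted sum of the per-feature representations (each of width `units`). The sketch below shows that weighting step in plain Python. It assumes the common formulation and is not KDP's code: in a real layer the selection scores and the per-feature vectors would come from learned sublayers, whereas here they are supplied by hand.

```python
import math


def softmax(scores):
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def variable_selection(feature_vectors, selection_logits):
    """Combine per-feature vectors using softmax weights, one weight per feature."""
    weights = softmax(selection_logits)
    units = len(feature_vectors[0])
    combined = [
        sum(w * vec[i] for w, vec in zip(weights, feature_vectors))
        for i in range(units)
    ]
    return combined, weights


# nr_features = 3, units = 2; the third feature receives a dominant score.
feature_vectors = [[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
combined, weights = variable_selection(feature_vectors, [0.0, 0.0, 20.0])
print(weights, combined)
```

The weights also serve as a rough per-feature importance signal, since they show how much each input contributed to the combined output.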