Transformers module
The transformers module contains the definitions of our scikit-learn compliant preprocessing steps, i.e. transformers. Transformers are estimators that support the transform and/or fit_transform methods. See Dataset transformations, scikit-lego and feature-engine for collections of transformers.
-
class pyro_risks.models.transformers.CategorySelector(variable: str, category: Union[str, list])
Bases: sklearn.base.BaseEstimator
Select feature and target rows.
The CategorySelector transformer selects the feature and target rows that belong to the given categories of a variable.
- Parameters
variable – variable to be used for selection.
category – modalities to be selected.
-
fit_resample(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series]
Select feature and target rows.
The fit_resample method selects the feature and target rows. It does not resample the dataset; the naming convention ensures that the transformer is compatible with the imbalanced-learn Pipeline object.
- Parameters
X – Training dataset features
y – Training dataset target
- Returns
Training dataset features and target tuple.
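The row selection described above can be sketched with plain pandas. This is a minimal illustration, not the library's implementation: the class name CategorySelectorSketch, the column names and the sample values are all hypothetical, and the sketch does not subclass sklearn.base.BaseEstimator.

```python
from typing import List, Optional, Tuple, Union

import pandas as pd


class CategorySelectorSketch:
    """Hypothetical stand-in for CategorySelector, written for illustration only."""

    def __init__(self, variable: str, category: Union[str, List[str]]) -> None:
        self.variable = variable
        # Accept either a single category or a list of categories.
        self.category = [category] if isinstance(category, str) else list(category)

    def fit_resample(
        self, X: pd.DataFrame, y: Optional[pd.Series] = None
    ) -> Tuple[pd.DataFrame, Optional[pd.Series]]:
        # Boolean mask of the rows whose `variable` value is a selected category.
        mask = X[self.variable].isin(self.category)
        # Apply the same mask to the target so rows stay aligned.
        y_selected = y[mask] if y is not None else None
        return X[mask], y_selected


# Hypothetical sample data.
X = pd.DataFrame(
    {"departement": ["Aisne", "Cantal", "Aisne"], "ffmc": [80.0, 85.5, 90.1]}
)
y = pd.Series([0, 1, 1])

X_sel, y_sel = CategorySelectorSketch("departement", "Aisne").fit_resample(X, y)
```

Only the rows where departement equals "Aisne" remain in X_sel and y_sel; no rows are duplicated or synthesized.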
-
class pyro_risks.models.transformers.FeatureSelector(exclude: List[str], method: str = 'pearson', threshold: float = 0.15)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Select features correlated to the target.
Select features with correlation to the target above the threshold.
- Parameters
exclude – columns to exclude from the correlation calculation.
method – correlation matrix calculation method.
threshold – correlation threshold above which a feature is selected.
-
fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.FeatureSelector
Fit the FeatureSelector on X.
Compute the correlation matrix.
- Parameters
X – Training dataset features.
y – Training dataset target.
- Returns
Transformer.
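Correlation-based selection of this kind might look like the sketch below. It is an illustration under assumptions, not the library's code: the class name FeatureSelectorSketch and the selected_ attribute are hypothetical, the target y is required here, and comparing the absolute correlation to the threshold is an assumption (the documentation only says "above the threshold").

```python
from typing import List

import pandas as pd


class FeatureSelectorSketch:
    """Hypothetical stand-in for FeatureSelector, written for illustration only."""

    def __init__(
        self, exclude: List[str], method: str = "pearson", threshold: float = 0.15
    ) -> None:
        self.exclude = exclude
        self.method = method
        self.threshold = threshold

    def fit(self, X: pd.DataFrame, y: pd.Series) -> "FeatureSelectorSketch":
        # Leave the excluded columns out of the correlation calculation.
        candidates = X.drop(columns=self.exclude)
        # Correlation of each remaining column with the target.
        corr = candidates.corrwith(y, method=self.method)
        # Assumption: compare the absolute correlation to the threshold.
        self.selected_ = corr[corr.abs() > self.threshold].index.tolist()
        return self


# Hypothetical sample data: `temp` follows the target, `noise` does not.
X = pd.DataFrame(
    {
        "date": ["d1", "d2", "d3", "d4", "d5"],
        "temp": [10.0, 12.0, 15.0, 19.0, 24.0],
        "noise": [1.0, -1.0, 0.0, -1.0, 1.0],
    }
)
y = pd.Series([1.0, 2.0, 3.0, 4.0, 5.0])

selector = FeatureSelectorSketch(exclude=["date"]).fit(X, y)
```

After fitting, selector.selected_ lists the columns whose correlation with the target passed the threshold.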
-
class pyro_risks.models.transformers.FeatureSubsetter(columns: List[str])
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Subset the dataframe’s columns.
Subset any given columns of the dataframe.
- Parameters
columns – columns to keep in the subset.
-
fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.FeatureSubsetter
Comply with pipeline requirements.
The method does not fit the dataset; the naming convention ensures that the transformer is compatible with the scikit-learn Pipeline object.
- Parameters
X – Training dataset features.
y – Training dataset target.
- Returns
Transformer.
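A no-op fit followed by a column subset can be sketched as follows. The class name FeatureSubsetterSketch and the sample data are hypothetical, and the transform method shown here is an assumption, since only fit is documented above.

```python
from typing import List, Optional

import pandas as pd


class FeatureSubsetterSketch:
    """Hypothetical stand-in for FeatureSubsetter, written for illustration only."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def fit(
        self, X: pd.DataFrame, y: Optional[pd.Series] = None
    ) -> "FeatureSubsetterSketch":
        # Nothing is learned from the data; returning self keeps the pipeline API.
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Assumed behavior: keep only the requested columns.
        return X[self.columns]


# Hypothetical sample data.
X = pd.DataFrame({"temp": [10.0, 12.0], "wind": [3.0, 5.0], "id": [1, 2]})
subset = FeatureSubsetterSketch(columns=["temp", "wind"]).fit(X).transform(X)
```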
-
class pyro_risks.models.transformers.Imputer(columns: list, missing_values: Union[int, float, str] = nan, strategy: str = 'mean', fill_value: Optional[float] = None, verbose: int = 0, copy: bool = True, add_indicator: bool = False)
Bases: sklearn.impute._base.SimpleImputer
Impute missing values.
The Imputer transformer wraps the scikit-learn SimpleImputer transformer.
- Parameters
missing_values – the placeholder for the missing values.
strategy – the imputation strategy (mean, median, most_frequent, constant).
fill_value – when strategy is constant, fill_value is used to replace all occurrences of missing_values (defaults to 0).
verbose – controls the verbosity of the imputer.
copy – If True, a copy of X will be created.
add_indicator – If True, a MissingIndicator transform will stack onto output of the imputer’s transform.
-
fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.Imputer
Fit the imputer on X.
- Parameters
X – Training dataset features.
y – Training dataset target.
- Returns
Transformer.
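Because Imputer wraps SimpleImputer and takes a columns argument, its effect is presumably close to applying scikit-learn's SimpleImputer to a subset of dataframe columns. The sketch below shows that idea with scikit-learn directly; it assumes that columns designates the columns to impute, and the column names and values are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical sample data with missing values in two numeric columns.
X = pd.DataFrame(
    {
        "temperature": [10.0, np.nan, 14.0],
        "wind": [3.0, 5.0, np.nan],
        "zone": ["a", "b", "c"],
    }
)

# Assumption: `columns` lists the columns on which imputation is performed.
columns = ["temperature", "wind"]

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Work on a copy so the original dataframe is left untouched.
X_imputed = X.copy()
# Learn each column mean, then replace the missing values with it.
X_imputed[columns] = imputer.fit_transform(X[columns])
```

With the mean strategy, each missing value is replaced by the mean of the observed values in its column; the zone column is not touched.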
-
class pyro_risks.models.transformers.LagTransformer(date_column: str, zone_column: str, columns: List[str])
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Add lag features for the selected columns.
The lags added correspond to day -1, -3 and -7 and are computed for each department separately.
- Parameters
date_column – date column.
zone_column – geographical zoning column.
columns – columns for which to add lags.
-
fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.LagTransformer
Fit the LagTransformer on X.
- Parameters
X – Training dataset features.
y – Training dataset target.
- Returns
Transformer.
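One way to build per-zone lags of 1, 3 and 7 days is to shift the dates forward and merge the shifted values back on the date and zone keys. The sketch below illustrates that technique; it is not the library's code, and the function name add_lags, the _lagN column suffix and the sample data are hypothetical.

```python
from typing import List, Sequence

import pandas as pd


def add_lags(
    X: pd.DataFrame,
    date_column: str,
    zone_column: str,
    columns: List[str],
    lags: Sequence[int] = (1, 3, 7),
) -> pd.DataFrame:
    """Add lagged copies of `columns`, matched on date and zone."""
    out = X.copy()
    for lag in lags:
        shifted = X[[date_column, zone_column] + columns].copy()
        # A value observed on day d becomes the lag value of day d + lag.
        shifted[date_column] = shifted[date_column] + pd.Timedelta(days=lag)
        shifted = shifted.rename(columns={c: f"{c}_lag{lag}" for c in columns})
        # Merging on both keys keeps the lags separate for each zone.
        out = out.merge(shifted, on=[date_column, zone_column], how="left")
    return out


# Hypothetical sample data: eight consecutive days for two zones.
dates = pd.date_range("2020-01-01", periods=8, freq="D")
X = pd.DataFrame(
    {
        "day": list(dates) * 2,
        "departement": ["Aisne"] * 8 + ["Cantal"] * 8,
        "ffmc": [float(v) for v in range(8)] + [float(v) for v in range(100, 108)],
    }
)

X_lagged = add_lags(X, "day", "departement", ["ffmc"])
```

Merging on dates instead of shifting rows by position keeps the lags correct when some days are missing; rows with no observation at the lagged date get NaN.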
-
class pyro_risks.models.transformers.TargetDiscretizer(discretizer: Callable)
Bases: sklearn.base.BaseEstimator
Discretize numerical target variable.
The TargetDiscretizer transformer maps target variable values to discrete values using a user-defined function.
- Parameters
discretizer – user-defined function.
-
fit_resample(X: pandas.core.frame.DataFrame, y: pandas.core.series.Series) → Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series]
Discretize the target variable.
The fit_resample method discretizes the target variable. It does not resample the dataset; the naming convention ensures that the transformer is compatible with the imbalanced-learn Pipeline object.
- Parameters
X – Training dataset features
y – Training dataset target
- Returns
Training dataset features and target tuple.
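The mapping of a numerical target to discrete values can be sketched as below. The class name TargetDiscretizerSketch, the has_fire function and the sample data are hypothetical, and applying the function element-wise with Series.apply is an assumption about how the user-defined function is used.

```python
from typing import Callable, Tuple

import pandas as pd


class TargetDiscretizerSketch:
    """Hypothetical stand-in for TargetDiscretizer, written for illustration only."""

    def __init__(self, discretizer: Callable) -> None:
        self.discretizer = discretizer

    def fit_resample(
        self, X: pd.DataFrame, y: pd.Series
    ) -> Tuple[pd.DataFrame, pd.Series]:
        # Assumption: the function is applied to each target value in turn.
        y_discrete = y.apply(self.discretizer)
        # The features are returned unchanged; no rows are added or removed.
        return X, y_discrete


def has_fire(count: float) -> int:
    """Hypothetical discretizer: 1 if at least one event was counted, else 0."""
    return 1 if count > 0 else 0


# Hypothetical sample data.
X = pd.DataFrame({"ffmc": [80.0, 85.5, 90.1]})
y = pd.Series([0.0, 3.0, 1.0])

X_out, y_out = TargetDiscretizerSketch(has_fire).fit_resample(X, y)
```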