Transformers module

The transformers module contains the definitions of our scikit-learn compliant preprocessing steps, i.e. transformers. Transformers are estimators that support the transform and/or fit_transform methods. See Dataset transformations, scikit-lego and feature-engine for collections of transformers.

class pyro_risks.models.transformers.CategorySelector(variable: str, category: Union[str, list])

Bases: sklearn.base.BaseEstimator

Select feature and target rows.

The CategorySelector transformer selects the feature and target rows that belong to the given categories of a variable.

Parameters

variable – variable to be used for selection.
category – modalities to be selected.

fit_resample(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series]

Select feature and target rows.

The fit_resample method selects the feature and target rows. It does not resample the dataset; the naming convention ensures that the transformer is compatible with the imbalanced-learn Pipeline object.

Parameters

X – Training dataset features
y – Training dataset target

Returns

Training dataset features and target tuple.
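
The row selection described above can be sketched with pandas alone. The class below is a hypothetical stand-in written for illustration, not the pyro_risks implementation; the column names and values are made up.

```python
from typing import List, Optional, Tuple, Union

import pandas as pd


class CategorySelectorSketch:
    """Hypothetical stand-in mirroring the documented CategorySelector behaviour."""

    def __init__(self, variable: str, category: Union[str, List[str]]) -> None:
        self.variable = variable
        self.category = category

    def fit_resample(
        self, X: pd.DataFrame, y: Optional[pd.Series] = None
    ) -> Tuple[pd.DataFrame, Optional[pd.Series]]:
        # Accept either a single category or a list of categories.
        if isinstance(self.category, str):
            categories = [self.category]
        else:
            categories = list(self.category)
        mask = X[self.variable].isin(categories)
        # Apply the same boolean mask to features and target so rows stay aligned.
        X_selected = X[mask]
        y_selected = y[mask] if y is not None else None
        return X_selected, y_selected


# Made-up example data.
X = pd.DataFrame(
    {
        "departement": ["Aisne", "Cantal", "Aisne", "Sarthe"],
        "temperature": [12.0, 18.5, 14.2, 16.1],
    }
)
y = pd.Series([0, 1, 1, 0], name="fires")

X_sel, y_sel = CategorySelectorSketch("departement", "Aisne").fit_resample(X, y)
```

Only the rows whose departement is Aisne remain in both X_sel and y_sel, and the number of rows changes, which is why the method is exposed as fit_resample rather than transform.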

class pyro_risks.models.transformers.FeatureSelector(exclude: List[str], method: str = 'pearson', threshold: float = 0.15)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Select features correlated to the target.

Select features with correlation to the target above the threshold.

Parameters

exclude – columns to exclude from the correlation calculation.
method – correlation matrix calculation method.
threshold – correlation to the target above which a feature is selected.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.FeatureSelector

Fit the FeatureSelector on X.

Compute the correlation matrix.

Parameters

X – Training dataset features.
y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Select the features whose correlation to the target is above the threshold.

Parameters: X – Training dataset features.
Returns: Transformed training dataset.
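
As a rough illustration of correlation-based selection, the sketch below computes each feature's Pearson correlation with the target and keeps those above the threshold. It uses plain pandas rather than the FeatureSelector class. Taking the absolute value of the correlation is an assumption, as are the column names and data.

```python
import pandas as pd

# Made-up example data: "temperature" tracks the target, "noise" does not.
X = pd.DataFrame(
    {
        "day_of_year": [1, 2, 3, 4, 5, 6],
        "temperature": [10.0, 12.0, 15.0, 19.0, 22.0, 25.0],
        "noise": [1.0, -1.0, -1.0, 1.0, 1.0, -1.0],
    }
)
y = pd.Series([0.0, 1.0, 2.0, 4.0, 5.0, 6.0], name="fires")

exclude = ["day_of_year"]  # columns left out of the correlation calculation
threshold = 0.15

# Correlation of every remaining column with the target.
# Assumption: the sign is ignored, so strong negative correlations also qualify.
correlation = X.drop(columns=exclude).corrwith(y, method="pearson").abs()

# Keep the features whose correlation exceeds the threshold.
selected = correlation[correlation > threshold].index.tolist()
X_selected = X[selected]
```

Here only temperature passes the threshold. Whether the real transformer keeps or drops the excluded columns in its output is not stated above, so this sketch simply leaves them out.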

class pyro_risks.models.transformers.FeatureSubsetter(columns: List[str])

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Subset dataframe columns.

Subset any given columns of the dataframe.

Parameters: columns – columns to keep.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.FeatureSubsetter

Comply with pipeline requirements.

The method does not fit anything on the dataset; the naming convention ensures that the transformer is compatible with the scikit-learn Pipeline object.

Parameters

X – Training dataset features.
y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Select columns.

Parameters: X – Training dataset features.
Returns: Training dataset features subset.
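
The no-op fit pattern described above can be sketched as follows. ColumnSubsetSketch is a hypothetical class written for this example, not the pyro_risks source; it shows why a stateless transformer still needs a fit method to sit inside a scikit-learn Pipeline.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnSubsetSketch(BaseEstimator, TransformerMixin):
    """Hypothetical stateless transformer that keeps only the given columns."""

    def __init__(self, columns):
        self.columns = columns

    def fit(self, X, y=None):
        # Nothing to learn; returning self satisfies the Pipeline contract.
        return self

    def transform(self, X):
        # Column selection on a DataFrame returns a new DataFrame.
        return X[self.columns]


# Made-up example data.
X = pd.DataFrame(
    {
        "temperature": [12.0, 18.5, 14.2],
        "wind": [3.0, 5.5, 4.1],
        "station_id": [101, 102, 103],
    }
)

pipeline = Pipeline([("subset", ColumnSubsetSketch(columns=["temperature", "wind"]))])
subset = pipeline.fit_transform(X)
```

The Pipeline calls fit and then transform on the step, so the returned frame contains only the temperature and wind columns.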

class pyro_risks.models.transformers.Imputer(columns: list, missing_values: Union[int, float, str] = nan, strategy: str = 'mean', fill_value: Optional[float] = None, verbose: int = 0, copy: bool = True, add_indicator: bool = False)

Bases: sklearn.impute._base.SimpleImputer

Impute missing values.

The Imputer transformer wraps the scikit-learn SimpleImputer transformer.

Parameters

columns – columns to impute.
missing_values – the placeholder for the missing values.
strategy – the imputation strategy (mean, median, most_frequent, constant).
fill_value – value used to replace all occurrences of missing_values when strategy is constant (defaults to 0).
verbose – controls the verbosity of the imputer.
copy – If True, a copy of X will be created.
add_indicator – If True, a MissingIndicator transform will stack onto output of the imputer’s transform.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.Imputer

Fit the imputer on X.

Parameters

X – Training dataset features.
y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Impute X missing values.

Parameters: X – Training dataset features.
Returns: Transformed training dataset.
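
Since the Imputer wraps SimpleImputer, the underlying idea can be shown with scikit-learn directly. The sketch below applies mean imputation to a chosen list of columns and leaves the others untouched. Restricting the imputation to those columns is an assumption about how the columns argument is used, and the data is made up.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Made-up example data with one missing value per numeric column.
X = pd.DataFrame(
    {
        "temperature": [10.0, np.nan, 14.0],
        "wind": [3.0, 5.0, np.nan],
        "departement": ["Aisne", "Cantal", "Sarthe"],
    }
)

columns = ["temperature", "wind"]  # assumed meaning: the columns to impute

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Work on a copy so the input frame is left unchanged.
X_imputed = X.copy()
X_imputed[columns] = imputer.fit_transform(X[columns])
```

Each missing value is replaced by the mean of the observed values in its column, here 12.0 for temperature and 4.0 for wind.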

class pyro_risks.models.transformers.LagTransformer(date_column: str, zone_column: str, columns: List[str])

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Add lag features of the selected columns.

The lags added correspond to day -1, -3 and -7 and are computed for each department separately.

Parameters

date_column – date column.
zone_column – geographical zoning column.
columns – columns for which to add lags.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) → pyro_risks.models.transformers.LagTransformer

Fit the LagTransformer on X.

Parameters

X – Training dataset features.
y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) → pandas.core.frame.DataFrame

Create lag features.

Parameters: X – Training dataset features.
Returns: Transformed training dataset.
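
One way to picture per-zone lags is a grouped shift in pandas. The helper below is a hypothetical sketch, not the LagTransformer source. It assumes one row per zone and per day with no gaps, so that shifting by n rows equals a lag of n days. The output column names are also made up.

```python
import pandas as pd


def add_lags(X, date_column, zone_column, columns, lags=(1, 3, 7)):
    """Hypothetical helper: add lagged copies of `columns` within each zone."""
    # Sort so that consecutive rows of a zone are consecutive days.
    out = X.sort_values([zone_column, date_column]).copy()
    for column in columns:
        for lag in lags:
            # Shifting inside each group keeps one zone from leaking into another.
            out[f"{column}_lag{lag}"] = out.groupby(zone_column)[column].shift(lag)
    return out


# Made-up example data: eight consecutive days for two departments.
dates = pd.date_range("2020-07-01", periods=8, freq="D")
X = pd.DataFrame(
    {
        "day": list(dates) * 2,
        "departement": ["Aisne"] * 8 + ["Cantal"] * 8,
        "temperature": [float(v) for v in range(8)]
        + [float(v) for v in range(100, 108)],
    }
)

lagged = add_lags(X, "day", "departement", ["temperature"])
```

The first rows of each department have no history, so their lag columns are missing values. A pipeline would typically follow such a step with imputation or row filtering.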

class pyro_risks.models.transformers.TargetDiscretizer(discretizer: Callable)

Bases: sklearn.base.BaseEstimator

Discretize numerical target variable.

The TargetDiscretizer transformer maps target variable values to discrete values using a user defined function.

Parameters: discretizer – user defined function.

fit_resample(X: pandas.core.frame.DataFrame, y: pandas.core.series.Series) → Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series]

Discretize the target variable.

The fit_resample method discretizes the target variable. It does not resample the dataset; the naming convention ensures that the transformer is compatible with the imbalanced-learn Pipeline object.

Parameters

X – Training dataset features
y – Training dataset target

Returns

Training dataset features and target tuple.
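
A minimal sketch of target discretization with a user-defined function follows. TargetDiscretizerSketch and binarize are hypothetical names written for this example, and applying the function element-wise to the target is an assumption about how the discretizer is used.

```python
import pandas as pd


def binarize(count: float) -> int:
    """Hypothetical discretizer: 1 if at least one event occurred, else 0."""
    return 1 if count > 0 else 0


class TargetDiscretizerSketch:
    """Hypothetical stand-in mirroring the documented TargetDiscretizer behaviour."""

    def __init__(self, discretizer):
        self.discretizer = discretizer

    def fit_resample(self, X, y):
        # Features pass through unchanged; only the target is mapped.
        return X, y.apply(self.discretizer)


# Made-up example data.
X = pd.DataFrame({"temperature": [12.0, 18.5, 14.2]})
y = pd.Series([0, 3, 1], name="fires")

X_out, y_out = TargetDiscretizerSketch(binarize).fit_resample(X, y)
```

The target counts 0, 3 and 1 become the classes 0, 1 and 1, while X is returned as is.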