Transformers module

The transformers module contains the definitions of our scikit-learn compliant preprocessing steps, i.e. transformers. Transformers are estimators supporting transform and/or fit_transform methods. See Dataset transformations, scikit-lego and feature-engine for collections of transformers.

class pyro_risks.models.transformers.CategorySelector(variable: str, category: Union[str, list])

Bases: sklearn.base.BaseEstimator

Select feature and target rows.

The CategorySelector transformer selects the feature and target rows belonging to the given categories of a variable.

Parameters
  • variable – variable to be used for selection.

  • category – modalities to be selected.

fit_resample(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) -> Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series]

Select feature and target rows.

The fit_resample method selects the feature and target rows. It does not resample the dataset; the naming convention ensures the compatibility of the transformer with the imbalanced-learn Pipeline object.

Parameters
  • X – Training dataset features

  • y – Training dataset target

Returns

Training dataset features and target tuple.
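As a rough illustration of the row selection described above, the sketch below reproduces the behaviour with plain pandas. The class name CategorySelectorSketch, the column names and the data are hypothetical, and the body is an assumption about how such a selector can work, not the pyro_risks source.

```python
from typing import List, Optional, Tuple, Union

import pandas as pd


class CategorySelectorSketch:
    """Illustrative stand-in (hypothetical), not the pyro_risks implementation."""

    def __init__(self, variable: str, category: Union[str, List[str]]) -> None:
        self.variable = variable
        self.category = category

    def fit_resample(
        self, X: pd.DataFrame, y: Optional[pd.Series] = None
    ) -> Tuple[pd.DataFrame, Optional[pd.Series]]:
        # Normalise a single category to a list so isin() handles both cases.
        if isinstance(self.category, str):
            categories = [self.category]
        else:
            categories = list(self.category)
        mask = X[self.variable].isin(categories)
        # Apply the same boolean mask to features and target to keep rows aligned.
        X_sel = X[mask]
        y_sel = y[mask] if y is not None else None
        return X_sel, y_sel


# Hypothetical data: one row per observation, "departement" is the variable.
X = pd.DataFrame(
    {
        "departement": ["Aisne", "Cantal", "Aisne", "Var"],
        "temp": [12.0, 9.5, 14.1, 21.3],
    }
)
y = pd.Series([0, 1, 1, 0])

X_sel, y_sel = CategorySelectorSketch("departement", ["Aisne", "Var"]).fit_resample(X, y)
```

Returning the (X, y) pair from fit_resample is what lets a step drop rows from both the features and the target, which a transform method operating on X alone cannot do.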

class pyro_risks.models.transformers.FeatureSelector(exclude: List[str], method: str = 'pearson', threshold: float = 0.15)

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Select features correlated to the target.

Select features with correlation to the target above the threshold.

Parameters
  • exclude – columns to exclude from the correlation calculation.

  • method – correlation matrix calculation method.

  • threshold – minimum correlation with the target for a feature to be selected.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) -> pyro_risks.models.transformers.FeatureSelector

Fit the FeatureSelector on X.

Compute the correlation matrix.

Parameters
  • X – Training dataset features.

  • y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Select the features whose correlation with the target is above the threshold.

Parameters

X – Training dataset features.

Returns

Transformed training dataset.
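The selection rule can be sketched with pandas alone. The helper below is hypothetical (select_correlated_features is not part of pyro_risks) and rests on two assumptions the documentation does not confirm: the comparison uses the absolute value of the correlation, and excluded columns are simply left out of the calculation.

```python
from typing import List

import pandas as pd


def select_correlated_features(
    X: pd.DataFrame,
    y: pd.Series,
    exclude: List[str],
    method: str = "pearson",
    threshold: float = 0.15,
) -> List[str]:
    """Return the columns of X whose |correlation| with y exceeds threshold."""
    # Assumption: excluded columns (dates, identifiers) take no part in the calculation.
    candidates = X.drop(columns=exclude)
    # corrwith computes one feature-versus-target correlation per column.
    # Assumption: the sign is ignored, so strong negative correlations also qualify.
    correlations = candidates.corrwith(y, method=method).abs()
    return correlations[correlations > threshold].index.tolist()


# Hypothetical data: "temp" rises with the target, "humidity" falls, "noise" is unrelated.
X = pd.DataFrame(
    {
        "day": ["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04", "2020-01-05"],
        "temp": [1.0, 2.0, 3.0, 4.0, 5.0],
        "humidity": [5.0, 4.0, 3.0, 2.0, 1.0],
        "noise": [1.0, -1.0, 1.0, -1.0, 1.0],
    }
)
y = pd.Series([2.0, 4.0, 6.0, 8.0, 10.0], name="fires")

selected = select_correlated_features(X, y, exclude=["day"])
X_selected = X[selected]
```

In a fit/transform split, the list of selected columns would be computed once in fit and reused in transform, so that the same columns are kept on unseen data.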

class pyro_risks.models.transformers.FeatureSubsetter(columns: List[str])

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Subset dataframe’s columns.

Subset any given columns of the dataframe.

Parameters

columns – columns to keep.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) -> pyro_risks.models.transformers.FeatureSubsetter

Comply with pipeline requirements.

The method does not fit the dataset; the naming convention ensures the compatibility of the transformer with the scikit-learn Pipeline object.

Parameters
  • X – Training dataset features.

  • y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Select columns.

Parameters

X – Training dataset features.

Returns

Training dataset features subset.
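To make the "fit does nothing" convention concrete, here is a minimal sketch of a stateless column subsetter used inside a scikit-learn Pipeline. ColumnSubsetterSketch and the column names are hypothetical; this is one plausible shape for such a transformer, not the pyro_risks code.

```python
from typing import List, Optional

import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline


class ColumnSubsetterSketch(BaseEstimator, TransformerMixin):
    """Illustrative stand-in (hypothetical), not the pyro_risks implementation."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns

    def fit(self, X: pd.DataFrame, y: Optional[pd.Series] = None) -> "ColumnSubsetterSketch":
        # Nothing is learned from the data; returning self lets Pipeline chain the step.
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        # Keep only the requested columns, in the requested order.
        return X[self.columns]


X = pd.DataFrame({"temp": [12.0, 9.5], "wind": [3.1, 4.2], "station_id": [7, 8]})

pipeline = Pipeline([("subset", ColumnSubsetterSketch(columns=["temp", "wind"]))])
subset = pipeline.fit_transform(X)
```

Pipeline calls fit on every step even when a step has no state, which is why the method has to exist and return the transformer itself.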

class pyro_risks.models.transformers.Imputer(columns: list, missing_values: Union[int, float, str] = nan, strategy: str = 'mean', fill_value: Optional[float] = None, verbose: int = 0, copy: bool = True, add_indicator: bool = False)

Bases: sklearn.impute._base.SimpleImputer

Impute missing values.

The Imputer transformer wraps the scikit-learn SimpleImputer transformer.

Parameters
  • missing_values – the placeholder for the missing values.

  • strategy – the imputation strategy (mean, median, most_frequent, constant).

  • fill_value – when strategy is constant, the value used to replace all occurrences of missing_values (if left as None, scikit-learn uses 0 for numerical data).

  • verbose – controls the verbosity of the imputer.

  • copy – If True, a copy of X will be created.

  • add_indicator – If True, a MissingIndicator transform is stacked onto the output of the imputer’s transform.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) -> pyro_risks.models.transformers.Imputer

Fit the imputer on X.

Parameters
  • X – Training dataset features.

  • y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Impute X missing values.

Parameters

X – Training dataset features.

Returns

Transformed training dataset.
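The following sketch shows what imputing a subset of columns with scikit-learn's SimpleImputer can look like. It calls SimpleImputer directly rather than the Imputer class documented here, and the assumption that the columns argument restricts imputation to the listed columns is ours. Column names and data are hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical data: two numerical columns with gaps and one untouched column.
X = pd.DataFrame(
    {
        "temp": [10.0, np.nan, 14.0],
        "wind": [np.nan, 2.0, 4.0],
        "zone": ["A", "B", "A"],
    }
)
columns = ["temp", "wind"]  # assumption: only these columns are imputed

imputer = SimpleImputer(missing_values=np.nan, strategy="mean")

# Fit on the selected columns only, then write the imputed values back
# into a copy so the original dataframe is left unchanged.
X_imputed = X.copy()
X_imputed[columns] = imputer.fit_transform(X[columns])
```

With strategy="mean", each gap is filled with the mean of the observed values in its own column, as computed during fit.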

class pyro_risks.models.transformers.LagTransformer(date_column: str, zone_column: str, columns: List[str])

Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin

Add lag features of the selected columns.

The lags added correspond to days -1, -3 and -7 and are computed for each department separately.

Parameters
  • date_column – date column.

  • zone_column – geographical zoning column.

  • columns – columns on which to add lags.

fit(X: pandas.core.frame.DataFrame, y: Optional[pandas.core.series.Series] = None) -> pyro_risks.models.transformers.LagTransformer

Fit the LagTransformer on X.

Parameters
  • X – Training dataset features.

  • y – Training dataset target.

Returns

Transformer.

transform(X: pandas.core.frame.DataFrame) -> pandas.core.frame.DataFrame

Create lag features.

Parameters

X – Training dataset features.

Returns

Transformed training dataset.
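A per-zone lag at days -1, -3 and -7 can be sketched with a grouped shift. The helper add_lag_features and the "<column>_lag<n>" naming are hypothetical, and the sketch assumes exactly one row per zone and per day with no missing dates; the actual transformer may build its lags differently.

```python
from typing import List, Sequence

import pandas as pd


def add_lag_features(
    X: pd.DataFrame,
    date_column: str,
    zone_column: str,
    columns: List[str],
    lags: Sequence[int] = (1, 3, 7),
) -> pd.DataFrame:
    """Append lagged copies of `columns`, computed within each zone."""
    # Sort so that shifting by n rows inside a zone means going back n days
    # (assumption: one row per zone and per day, no gaps in the dates).
    out = X.sort_values([zone_column, date_column]).copy()
    for column in columns:
        for lag in lags:
            # Grouping by zone prevents values leaking from one zone into another.
            out[f"{column}_lag{lag}"] = out.groupby(zone_column)[column].shift(lag)
    return out


# Hypothetical data: eight consecutive days for two departments.
dates = pd.date_range("2020-01-01", periods=8, freq="D")
X = pd.DataFrame(
    {
        "day": list(dates) * 2,
        "departement": ["Aisne"] * 8 + ["Var"] * 8,
        "temp": list(range(8)) + list(range(100, 108)),
    }
)

lagged = add_lag_features(X, date_column="day", zone_column="departement", columns=["temp"])
```

The first rows of each zone have no earlier observation to draw from, so their lag columns are NaN; this is one reason a lag step is typically followed by an imputation step.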

class pyro_risks.models.transformers.TargetDiscretizer(discretizer: Callable)

Bases: sklearn.base.BaseEstimator

Discretize numerical target variable.

The TargetDiscretizer transformer maps target variable values to discrete values using a user-defined function.

Parameters

discretizer – user-defined function mapping a target value to a discrete value.

fit_resample(X: pandas.core.frame.DataFrame, y: pandas.core.series.Series) -> Tuple[pandas.core.frame.DataFrame, pandas.core.series.Series]

Discretize the target variable.

The fit_resample method discretizes the target variable. It does not resample the dataset; the naming convention ensures the compatibility of the transformer with the imbalanced-learn Pipeline object.

Parameters
  • X – Training dataset features

  • y – Training dataset target

Returns

Training dataset features and target tuple.
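As an illustration of target discretization through fit_resample, the sketch below applies a user-defined function to each target value and returns the features untouched. TargetDiscretizerSketch, has_fire and the data are hypothetical, and element-wise application is an assumption about how the discretizer is called.

```python
from typing import Callable, Tuple

import pandas as pd


class TargetDiscretizerSketch:
    """Illustrative stand-in (hypothetical), not the pyro_risks implementation."""

    def __init__(self, discretizer: Callable) -> None:
        self.discretizer = discretizer

    def fit_resample(self, X: pd.DataFrame, y: pd.Series) -> Tuple[pd.DataFrame, pd.Series]:
        # Assumption: the function is applied to each target value in turn.
        # The features are returned unchanged and no row is added or removed.
        return X, y.apply(self.discretizer)


def has_fire(n_fires: float) -> int:
    """Hypothetical discretizer: 1 if at least one fire was recorded, else 0."""
    return 1 if n_fires > 0 else 0


X = pd.DataFrame({"temp": [12.0, 25.3, 30.1, 8.4]})
y = pd.Series([0.0, 2.0, 5.0, 0.0], name="fires")

X_out, y_binary = TargetDiscretizerSketch(has_fire).fit_resample(X, y)
```

Turning a count into a binary label this way converts a regression target into a classification target without altering the feature rows.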