pyro_risks.datasets.datasets_mergers module

pyro_risks.datasets.datasets_mergers.merge_by_proximity(df_left: pandas.core.frame.DataFrame, time_col_left: str, df_right: pandas.core.frame.DataFrame, time_col_right: str, how: str) → pandas.core.frame.DataFrame

Merge df_left and df_right by finding, for each point in df_left, the closest point in df_right. For instance, df_left can be a wildfire history dataset and df_right a weather conditions dataset, and we want to match each wildfire with its closest weather point. This can also be used to merge an FWI dataset (df_left) with an ERA5/VIIRS dataset (df_right).

Parameters
  • df_left – pd.DataFrame Left dataframe, must have “latitude” and “longitude” columns.

  • time_col_left – str Name of the time column in df_left.

  • df_right – pd.DataFrame Right dataset, must have points described by their latitude and longitude.

  • time_col_right – str Name of the time column in df_right.

  • how – str Type of pandas merge to perform (for example “left” or “inner”).

Returns

Merged dataset by point (lat/lon) proximity.

pyro_risks.datasets.datasets_mergers.merge_datasets_by_closest_weather_point(df_weather: pandas.core.frame.DataFrame, time_col_weather: str, df_fires: pandas.core.frame.DataFrame, time_col_fires: str) → pandas.core.frame.DataFrame

Merge weather and fire datasets when the weather dataset comes from satellite data such as the ERA5 Land hourly dataset, available at https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form and accessible through cdsapi.

Parameters
  • df_weather – pd.DataFrame Weather conditions dataframe, must have “latitude” and “longitude” columns.

  • time_col_weather – str Name of the time column in df_weather.

  • df_fires – pd.DataFrame Wildfires history dataset, must have points described by their latitude and longitude.

  • time_col_fires – str Name of the time column in df_fires.

Returns: pd.DataFrame

Merged dataset by weather point proximity.

pyro_risks.datasets.datasets_mergers.merge_datasets_by_closest_weather_station(df_weather: pandas.core.frame.DataFrame, time_col_weather: str, df_fires: pandas.core.frame.DataFrame, time_col_fires: str) → pandas.core.frame.DataFrame

Merge two datasets: one of weather conditions and one of wildfire history data. Each dataset must contain a time column, and the weather dataset must have a STATION column that uniquely identifies each station. The merge is done by finding the closest weather station to each (lat, lon) point of the wildfire history dataset. That dataset is then grouped by date and closest_weather_station, which allows it to be joined with the weather conditions dataframe.

Parameters
  • df_weather – pd.DataFrame Weather conditions dataframe. Must have a STATION column to identify each weather station.

  • time_col_weather – str Name of the time column in df_weather.

  • df_fires – pd.DataFrame Wildfires history dataset, must have points described by their latitude and longitude.

  • time_col_fires – str Name of the time column in df_fires.

Returns: pd.DataFrame

Merged dataset by weather station proximity.
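
The steps described for this function can be sketched in plain Python as follows. This is only an illustration under stated assumptions, not the library's implementation: the toy data, the column names other than STATION, the use of a fire count as the grouped value, and the squared Euclidean distance on raw degrees are all hypothetical choices.

```python
from collections import defaultdict

# Hypothetical toy data; every name except STATION is an assumption.
stations = {  # STATION id -> (latitude, longitude)
    1: (48.85, 2.35),
    2: (43.30, 5.37),
}
weather = [  # one row per (date, STATION)
    {"date": "2020-08-01", "STATION": 1, "temperature": 24.0},
    {"date": "2020-08-01", "STATION": 2, "temperature": 31.5},
]
fires = [  # one row per wildfire
    {"date": "2020-08-01", "latitude": 43.5, "longitude": 5.0},
    {"date": "2020-08-01", "latitude": 43.1, "longitude": 5.9},
]


def closest_station(lat, lon):
    """Return the id of the station nearest to (lat, lon).

    Uses squared Euclidean distance on raw degrees, which is a simplification.
    """
    return min(
        stations,
        key=lambda s: (stations[s][0] - lat) ** 2 + (stations[s][1] - lon) ** 2,
    )


# Step 1: find the closest station for each fire, then group the fires
# by (date, closest station). Here the grouped value is a simple count.
fire_counts = defaultdict(int)
for fire in fires:
    station = closest_station(fire["latitude"], fire["longitude"])
    fire_counts[(fire["date"], station)] += 1

# Step 2: join the grouped fires onto the weather rows on (date, STATION).
merged = [
    {**row, "fires": fire_counts.get((row["date"], row["STATION"]), 0)}
    for row in weather
]

for row in merged:
    print(row)
```

Both toy fires are closest to station 2, so only the weather row of that station receives a non-zero count.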

pyro_risks.datasets.datasets_mergers.merge_datasets_by_departements(dataframe1: pandas.core.frame.DataFrame, time_col1: str, geometry_col1: str, dataframe2: pandas.core.frame.DataFrame, time_col2: str, geometry_col2: str, how: str) → pandas.core.frame.DataFrame

Merge two datasets containing some kind of geometry and date columns. The merge is done on [time_col1, time_col2] and [geometry_col1, geometry_col2]. Here the geometry is based on French departements, so the geometry columns should contain either the code of the departement or its geometry (consistently across both datasets).

Finally, the merge is done according to the how parameter. Keep in mind that this parameter must be chosen so that the merged dataframe keeps dimensions similar to those of the weather dataframe: with an inner join, only the days on which wildfires were declared would be kept. Therefore, if the weather dataframe is the left frame, how must be “left”; if it is the right frame, how must be “right”.

Parameters
  • dataframe1 – pd.DataFrame First dataframe, containing a time column and a geometry one.

  • time_col1 – str Name of the time column of dataframe1 on which the merge will be done.

  • geometry_col1 – str Name of the geometry column of dataframe1 on which the merge will be done.

  • dataframe2 – pd.DataFrame Second dataframe, containing a time column and a geometry one.

  • time_col2 – str Name of the time column of dataframe2 on which the merge will be done.

  • geometry_col2 – str Name of the geometry column of dataframe2 on which the merge will be done.

  • how – str Type of merge; should be “left” if the weather dataframe is the left frame and “right” if it is the right frame.

Returns: pd.DataFrame

Merged dataset on French departement.
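
To illustrate why the how parameter matters here, the sketch below compares a left join with an inner join on toy data. It is written in plain Python so that it runs without dependencies; the merge helper, the column names and the data are hypothetical, and the helper only mimics the row-keeping behaviour of a pandas merge for these two cases.

```python
# Hypothetical toy frames, represented as lists of dicts.
weather = [
    {"date": "2020-08-01", "code": "13", "temperature": 31.5},
    {"date": "2020-08-02", "code": "13", "temperature": 29.0},
    {"date": "2020-08-03", "code": "13", "temperature": 27.5},
]
fires = [
    {"date": "2020-08-02", "code": "13", "fires": 2},
]


def merge(left, right, keys, how):
    """Tiny stand-in for a merge supporting how='left' and how='inner'.

    Assumes each key combination appears at most once in `right`.
    """
    index = {tuple(r[k] for k in keys): r for r in right}
    right_cols = [c for c in right[0] if c not in keys]
    out = []
    for row in left:
        match = index.get(tuple(row[k] for k in keys))
        if match is not None:
            out.append({**row, **match})
        elif how == "left":
            # Keep the unmatched left row; missing right columns become None.
            out.append({**row, **{c: None for c in right_cols}})
    return out


left_join = merge(weather, fires, keys=("date", "code"), how="left")
inner_join = merge(weather, fires, keys=("date", "code"), how="inner")

print(len(left_join))   # every weather day is kept
print(len(inner_join))  # only the day with a declared wildfire remains
```

With the weather frame on the left, the left join keeps all three days, whereas the inner join keeps only the single day that has a fire record.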

pyro_risks.datasets.era_fwi_viirs module

class pyro_risks.datasets.era_fwi_viirs.MergedEraFwiViirs(era_source_path: Optional[str] = None, viirs_source_path: Optional[str] = None, fwi_source_path: Optional[str] = None)

Bases: pandas.core.frame.DataFrame

Create dataframe for modeling described in models/score_v0.py.

Get the weather, NASA FIRMS VIIRS fires and FWI datasets, keep only the lines corresponding to vegetation fires (excluding low-confidence ones), and merge them. Finally, aggregate the dataframes by department and by day: for each feature of the weather and FWI datasets, the min, max, mean and std are computed, and fires are counted for each department and day.

Returns

pd.DataFrame
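
The aggregation step described above (min, max, mean and std of each feature, plus a fire count, per department and day) can be sketched in plain Python. The data, the column names and the use of the sample standard deviation are assumptions made for illustration only; this is not the class's actual code.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical toy data: several weather points per department and day.
weather = [
    {"department": "13", "day": "2020-08-01", "temperature": 30.0},
    {"department": "13", "day": "2020-08-01", "temperature": 32.0},
    {"department": "13", "day": "2020-08-01", "temperature": 34.0},
    {"department": "2A", "day": "2020-08-01", "temperature": 28.0},
    {"department": "2A", "day": "2020-08-01", "temperature": 29.0},
]
fires = [
    {"department": "13", "day": "2020-08-01"},
    {"department": "13", "day": "2020-08-01"},
]

# Collect the feature values of each (department, day) group.
values = defaultdict(list)
for row in weather:
    values[(row["department"], row["day"])].append(row["temperature"])

# Count fires per (department, day).
fire_counts = defaultdict(int)
for fire in fires:
    fire_counts[(fire["department"], fire["day"])] += 1

# One output row per group, with the four statistics and the fire count.
features = {
    key: {
        "temperature_min": min(temps),
        "temperature_max": max(temps),
        "temperature_mean": mean(temps),
        "temperature_std": stdev(temps),  # sample std; needs at least 2 values
        "fires": fire_counts.get(key, 0),
    }
    for key, temps in values.items()
}

for key, row in features.items():
    print(key, row)
```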

pyro_risks.datasets.queries_api module

pyro_risks.datasets.queries_api.call_era5land(output_path: str, year: str, month: str, day: str) → None

Call cdsapi to get ERA5 Land data as a NetCDF (.nc) file for a given date.

By default, “time” = “14:00”. This is not an issue since ERA5 Land data is obtained with a 2-month delay.

Parameters
  • output_path – str

  • year – str

  • month – str

  • day – str

pyro_risks.datasets.queries_api.call_era5t(output_path: str, year: str, month: str, day: str) → None

Call cdsapi to get ERA5T data as a NetCDF (.nc) file for a given date.

Most recent available data is Day -5. By default, “time” = “14:00”. This is not an issue since ERA5T data is obtained with a 5-day delay.

Parameters
  • output_path – str

  • year – str

  • month – str

  • day – str
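
Since year, month and day are passed as strings and the most recent ERA5T data is Day -5, the arguments for the latest available date can be derived from the current date. The helper below is a hypothetical sketch; in particular, zero-padding the month and day is an assumption about the expected format.

```python
from datetime import date, timedelta


def lagged_date_parts(today: date, lag_days: int) -> tuple:
    """Return (year, month, day) strings for the day `lag_days` before `today`.

    Zero-padding of month and day is an assumption, not a documented requirement.
    """
    target = today - timedelta(days=lag_days)
    return f"{target.year}", f"{target.month:02d}", f"{target.day:02d}"


# ERA5T: the most recent available data is Day -5.
year, month, day = lagged_date_parts(date(2021, 3, 3), lag_days=5)
print(year, month, day)  # 2021 02 26
```

The resulting strings could then be passed as the year, month and day arguments of call_era5t, together with an output_path of your choice.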

pyro_risks.datasets.queries_api.call_fwi(output_path: str, year: str, month: str, day: str) → None

Get fire danger indices historical data from the Copernicus Climate Data Store.

Information on FWI can be found here: https://datastore.copernicus-climate.eu/c3s/published-forms/c3sprod/cems-fire-historical/Fire_In_CDS.pdf

Please follow the instructions before using the CDS API: https://cds.climate.copernicus.eu/api-how-to

Most recent available data is Day -2.

Parameters
  • output_path – str

  • year – str

  • month – str

  • day – str

pyro_risks.datasets.utils module

pyro_risks.datasets.utils.download(url: str, default_extension: str, unzip: Optional[bool] = True, destination: str = './tmp') → None

Helper function for downloading, unzipping and saving compressed file from a given URL.

Parameters
  • url – URL of the compressed archive

  • default_extension – extension of the archive

  • unzip – whether archive should be unzipped. Defaults to True.

  • destination – folder where the file should be saved. Defaults to ‘./tmp’.

pyro_risks.datasets.utils.find_closest_location(df_weather: pandas.core.frame.DataFrame, latitude: float, longitude: float) → Tuple[float, float]

For a given point (latitude, longitude), get the closest point that exists in df_weather. This function is meant to be used when the user chooses satellite data (e.g. ERA5 Land variables) rather than weather station data.

Parameters
  • df_weather – pd.DataFrame Dataframe of land/weather conditions

  • latitude – float Latitude of the point to which we want to find the closest point in df_weather.

  • longitude – float Longitude of the point to which we want to find the closest point in df_weather.

Returns: Tuple(float, float)

Tuple of the closest weather point (closest_lat, closest_lon) to the point (lat, lon).
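
As a minimal sketch of the idea, the closest grid point can be found by brute force over the available (latitude, longitude) pairs. The closest_point helper and the toy grid are hypothetical, and the squared Euclidean distance on raw degrees is an assumption; the library may compute distances differently.

```python
def closest_point(points, latitude, longitude):
    """Return the (lat, lon) pair in `points` nearest to the given point.

    Distance is squared Euclidean on raw degrees, a simplification.
    """
    return min(
        points,
        key=lambda p: (p[0] - latitude) ** 2 + (p[1] - longitude) ** 2,
    )


# Hypothetical weather grid with a 0.1 degree resolution.
grid = [(43.0, 5.0), (43.0, 5.1), (43.1, 5.0), (43.1, 5.1)]

print(closest_point(grid, 43.08, 5.03))  # (43.1, 5.0)
```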

pyro_risks.datasets.utils.find_closest_weather_station(df_weather: pandas.core.frame.DataFrame, latitude: pandas.core.frame.DataFrame, longitude: pandas.core.frame.DataFrame) → int

Find the weather station closest to a given point (latitude, longitude). The weather dataframe SHOULD contain a “STATION” column giving the id of each weather station in the dataset.

Parameters
  • df_weather – pd.DataFrame Dataframe of weather conditions

  • latitude – float Latitude of the point to which we want to find the closest weather station

  • longitude – float Longitude of the point to which we want to find the closest weather station

Returns: int

Id of the closest weather station of the point (lat, lon)

pyro_risks.datasets.utils.get_fname(url: str) → Tuple[str, Optional[str], Optional[str]]

Find the file name, extension and compression format of an archive located at a URL.

Parameters

url – URL of the compressed archive

Raises
  • ValueError – if URL contains more than one extension

  • ValueError – if URL contains more than one compression format

Returns

A tuple containing the base file name, extension and compression format
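
The behaviour documented above can be approximated with the standard library as in the following sketch. It is not the actual implementation: the split_fname name and the lists of recognised extensions and compression formats are assumptions chosen for illustration.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Assumed, illustrative lists; the real function may recognise other formats.
EXTENSIONS = {"csv", "txt", "json", "nc"}
COMPRESSIONS = {"zip", "gz", "bz2"}


def split_fname(url):
    """Return (base name, extension or None, compression or None) for a URL.

    Suffixes that are not in the lists above are ignored in this sketch.
    """
    name = PurePosixPath(urlparse(url).path).name
    base, *suffixes = name.split(".")
    extensions = [s for s in suffixes if s in EXTENSIONS]
    compressions = [s for s in suffixes if s in COMPRESSIONS]
    if len(extensions) > 1:
        raise ValueError("URL contains more than one extension")
    if len(compressions) > 1:
        raise ValueError("URL contains more than one compression format")
    return (
        base,
        extensions[0] if extensions else None,
        compressions[0] if compressions else None,
    )


print(split_fname("https://example.com/data/2020.csv.gz"))  # ('2020', 'csv', 'gz')
print(split_fname("https://example.com/archive.zip"))       # ('archive', None, 'zip')
```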

pyro_risks.datasets.utils.get_ghcn(start_year: Optional[int] = None, end_year: Optional[int] = None, destination: str = './ghcn') → None

Download yearly Global Historical Climatology Network - Daily (GHCN-Daily) CSV files from NCEI.

Parameters
  • start_year – first year to be retrieved. Defaults to None.

  • end_year – first year that will not be retrieved. Defaults to None.

  • destination – destination directory. Defaults to ‘./ghcn’.
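
Because end_year is described as the first year that will not be retrieved, the pair (start_year, end_year) reads as a half-open interval. The hypothetical helper below only illustrates that convention; how the function behaves when either value is None is not documented here and is not covered.

```python
def years_to_retrieve(start_year, end_year):
    """Years in the half-open interval [start_year, end_year)."""
    return list(range(start_year, end_year))


print(years_to_retrieve(2018, 2021))  # [2018, 2019, 2020]
```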

pyro_risks.datasets.utils.get_intersection_range(ts1: pandas.core.series.Series, ts2: pandas.core.series.Series) → pandas.core.indexes.datetimes.DatetimeIndex

Computes the intersecting date range of two series.

Parameters
  • ts1 – time series

  • ts2 – time series
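
One plausible reading of "intersecting date range" is the span from the later of the two start dates to the earlier of the two end dates. The sketch below implements that reading with the standard library; the daily frequency, the inclusive bounds and the intersection_range name are assumptions, and the real function returns a pandas DatetimeIndex rather than a list.

```python
from datetime import date, timedelta


def intersection_range(dates1, dates2):
    """Daily dates from the later start to the earlier end, both inclusive.

    Returns an empty list when the two series do not overlap.
    """
    start = max(min(dates1), min(dates2))
    end = min(max(dates1), max(dates2))
    n_days = (end - start).days + 1
    return [start + timedelta(days=i) for i in range(max(n_days, 0))]


ts1 = [date(2020, 1, 1), date(2020, 1, 10)]
ts2 = [date(2020, 1, 8), date(2020, 1, 20)]

# Dates from 2020-01-08 to 2020-01-10.
print(intersection_range(ts1, ts2))
```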

pyro_risks.datasets.utils.get_modis(start_year: Optional[int] = None, end_year: Optional[int] = None, yearly: Optional[bool] = False, destination: str = './firms') → None

Download the last 24 hours of active fires, or yearly active fires, for France from NASA FIRMS.

Parameters
  • start_year – first year to be retrieved. Defaults to None.

  • end_year – first year that will not be retrieved. Defaults to None.

  • yearly – whether to download yearly active fires or not. Defaults to False.

  • destination – destination directory. Defaults to ‘./firms’.

pyro_risks.datasets.utils.get_nearest_points(source_points: List[Tuple[Any, Any]], candidates: List[Tuple[Any, Any]]) → Tuple

Find nearest neighbor for all source points from a set of candidate points using KDTree algorithm.

Parameters
  • source_points – List[Tuple] List of tuples (lat, lon) for which you want to find the closest point in candidates.

  • candidates – List[Tuple] List of tuples (lat, lon) which are all possible closest points.

Returns: Tuple
  • indices – array of integers The locations of the neighbors in candidates.

  • distances – array of floats The distances to the nearest neighbors.
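
To show how a k-d tree answers nearest-neighbour queries, here is a small pure-Python 2-d tree. It is a teaching sketch, not the library's code, which presumably relies on an existing KDTree implementation. All function names are hypothetical, distances are plain Euclidean distances on raw degrees, and candidates is assumed to be non-empty.

```python
import math


def build_kdtree(items, depth=0):
    """Recursively build a 2-d tree over (index, (lat, lon)) pairs."""
    if not items:
        return None
    axis = depth % 2  # alternate between latitude and longitude
    items = sorted(items, key=lambda item: item[1][axis])
    mid = len(items) // 2
    return {
        "item": items[mid],
        "axis": axis,
        "left": build_kdtree(items[:mid], depth + 1),
        "right": build_kdtree(items[mid + 1:], depth + 1),
    }


def nearest(node, target, best=None):
    """Return (distance, index) of the candidate closest to `target`."""
    if node is None:
        return best
    index, point = node["item"]
    dist = math.dist(point, target)
    if best is None or dist < best[0]:
        best = (dist, index)
    diff = target[node["axis"]] - point[node["axis"]]
    if diff < 0:
        near, far = node["left"], node["right"]
    else:
        near, far = node["right"], node["left"]
    best = nearest(near, target, best)
    # Visit the far side only if the splitting line is closer than the best match.
    if abs(diff) < best[0]:
        best = nearest(far, target, best)
    return best


def nearest_points(source_points, candidates):
    """Return (indices, distances) of the nearest candidate for each source point."""
    tree = build_kdtree(list(enumerate(candidates)))
    results = [nearest(tree, point) for point in source_points]
    indices = [index for _, index in results]
    distances = [dist for dist, _ in results]
    return indices, distances


candidates = [(43.0, 5.0), (48.85, 2.35), (45.75, 4.85)]
sources = [(43.3, 5.4), (48.0, 2.0)]
indices, distances = nearest_points(sources, candidates)
print(indices)  # [0, 1]
```

The pruning test in nearest is what makes the tree faster than a brute-force scan on large candidate sets: whole branches are skipped when they cannot contain a closer point.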

pyro_risks.datasets.utils.url_retrieve(url: str, timeout: Optional[float] = None) → bytes

Retrieves and returns the content of a URL request.

Parameters
  • url – URL to request

  • timeout – number of seconds before the request times out. Defaults to 4.

Raises

requests.exceptions.ConnectionError

Returns

Content of the response