pyro_risks.datasets.datasets_mergers module
-
pyro_risks.datasets.datasets_mergers.merge_by_proximity(df_left: pandas.core.frame.DataFrame, time_col_left: str, df_right: pandas.core.frame.DataFrame, time_col_right: str, how: str) → pandas.core.frame.DataFrame
Merge df_left and df_right by finding, for each point in df_left, the closest point in df_right. For instance, df_left can be a wildfire history dataset and df_right a weather conditions dataset, and we want to match each wildfire with its closest weather point. This can also be used to merge, for instance, an FWI dataset (df_left) with an ERA5/VIIRS dataset (df_right).
- Parameters
df_left – pd.DataFrame Left dataframe, must have “latitude” and “longitude” columns.
time_col_left – str Name of the time column in df_left.
df_right – pd.DataFrame Right dataset, must have points described by their latitude and longitude.
time_col_right – str Name of the time column in df_right.
how – str How the pandas merge needs to be done.
- Returns
Merged dataset by point (lat/lon) proximity.
-
pyro_risks.datasets.datasets_mergers.merge_datasets_by_closest_weather_point(df_weather: pandas.core.frame.DataFrame, time_col_weather: str, df_fires: pandas.core.frame.DataFrame, time_col_fires: str) → pandas.core.frame.DataFrame
Merge weather and fire datasets when the weather dataset comes from satellite data, such as the ERA5 Land hourly dataset available at https://cds.climate.copernicus.eu/cdsapp#!/dataset/reanalysis-era5-land?tab=form and accessible through cdsapi.
- Parameters
df_weather – pd.DataFrame Weather conditions dataframe, must have “latitude” and “longitude” columns.
time_col_weather – str Name of the time column in df_weather.
df_fires – pd.DataFrame Wildfires history dataset, must have points described by their latitude and longitude.
time_col_fires – str Name of the time column in df_fires.
- Returns: pd.DataFrame
Merged dataset by weather point proximity.
-
pyro_risks.datasets.datasets_mergers.merge_datasets_by_closest_weather_station(df_weather: pandas.core.frame.DataFrame, time_col_weather: str, df_fires: pandas.core.frame.DataFrame, time_col_fires: str) → pandas.core.frame.DataFrame
Merge two datasets: one of weather conditions and the other of wildfire history data. Each dataset must contain a time column, and the weather dataset must have a STATION column that uniquely identifies each station. The merge is done by finding the closest weather station to each (lat, lon) point of the wildfire history dataset. The latter is then grouped by date and closest_weather_station, which allows it to be joined with the weather conditions dataframe.
- Parameters
df_weather – pd.DataFrame Weather conditions dataframe. Must have a STATION column to identify each weather station.
time_col_weather – str Name of the time column in df_weather.
df_fires – pd.DataFrame Wildfires history dataset, must have points described by their latitude and longitude.
time_col_fires – str Name of the time column in df_fires.
- Returns: pd.DataFrame
Merged dataset by weather station proximity.
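The station-based merge described above can be sketched in plain Python. This is only an illustration of the idea, not the library's implementation: the sample rows, the `tmax` column, the `closest_station` helper and the choice to count fires per group are all assumptions made for the example.

```python
import math
from collections import Counter

# Hypothetical weather rows: one record per (STATION, date).
weather = [
    {"STATION": 1, "date": "2019-07-01", "latitude": 43.6, "longitude": 3.9, "tmax": 31.0},
    {"STATION": 2, "date": "2019-07-01", "latitude": 48.8, "longitude": 2.3, "tmax": 24.0},
]
# Hypothetical wildfire history rows, described by latitude and longitude.
fires = [
    {"date": "2019-07-01", "latitude": 43.7, "longitude": 4.0},
    {"date": "2019-07-01", "latitude": 43.5, "longitude": 3.8},
]

# One (lat, lon) position per station id.
stations = {w["STATION"]: (w["latitude"], w["longitude"]) for w in weather}


def closest_station(lat: float, lon: float) -> int:
    """Return the id of the station nearest to (lat, lon), in degree space."""
    return min(
        stations,
        key=lambda s: math.hypot(lat - stations[s][0], lon - stations[s][1]),
    )


# Group fires by (date, closest station); counting is an assumed aggregation.
fire_counts = Counter(
    (f["date"], closest_station(f["latitude"], f["longitude"])) for f in fires
)

# Join the grouped fires onto the weather rows, keeping every weather row.
merged = [
    {**w, "fires": fire_counts.get((w["date"], w["STATION"]), 0)} for w in weather
]
```

Keeping every weather row, including those with no fire, preserves the days without wildfires in the merged result.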
-
pyro_risks.datasets.datasets_mergers.merge_datasets_by_departements(dataframe1: pandas.core.frame.DataFrame, time_col1: str, geometry_col1: str, dataframe2: pandas.core.frame.DataFrame, time_col2: str, geometry_col2: str, how: str) → pandas.core.frame.DataFrame
Merge two datasets containing some kind of geometry and date columns. The merge is done on [time_col1, time_col2] and [geometry_col1, geometry_col2]. Here the geometry is based on French departements. Therefore the geometry columns should contain either the code of the departement or its geometry (the choice should be consistent across both datasets).
Finally, the merge is done according to the how parameter. Keep in mind that this parameter must be chosen so that the merged dataframe keeps dimensions similar to those of the weather dataframe: with an inner join, only the days where wildfires were declared would be kept. Therefore, if the weather dataframe is the left frame, how must be left; if it is the right frame, how must be right.
- Parameters
dataframe1 – pd.DataFrame First dataframe, containing a time column and a geometry one.
time_col1 – str Name of the time column of dataframe1 on which the merge will be done.
geometry_col1 – str Name of the geometry column of dataframe1 on which the merge will be done.
dataframe2 – pd.DataFrame Second dataframe, containing a time column and a geometry one.
time_col2 – str Name of the time column of dataframe2 on which the merge will be done.
geometry_col2 – str Name of the geometry column of dataframe2 on which the merge will be done.
how – str Type of merge; should be “left” or “right”, depending on which of the two frames is the weather dataframe.
- Returns: pd.DataFrame
Merged dataset on French departement.
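The effect of the how parameter can be illustrated with a small pure-Python join. This is a sketch under stated assumptions, not the function's actual code: the rows, the column names (`day`, `dep`, `date`, `code`, `tmax`, `fires`) and the `join` helper are hypothetical, and the weather data plays the role of the left frame.

```python
# Hypothetical weather frame (left): one row per day and departement.
weather = [
    {"day": "2019-07-01", "dep": "34", "tmax": 31.0},
    {"day": "2019-07-02", "dep": "34", "tmax": 29.5},
    {"day": "2019-07-01", "dep": "75", "tmax": 24.0},
]
# Hypothetical fires frame (right): only days where a wildfire was declared.
fires = [{"date": "2019-07-01", "code": "34", "fires": 3}]

# Index the right frame on its (time, geometry) key.
fires_by_key = {(f["date"], f["code"]): f for f in fires}


def join(how: str) -> list:
    """Join weather (left) with fires (right) on time and departement code."""
    rows = []
    for w in weather:
        match = fires_by_key.get((w["day"], w["dep"]))
        if match is None and how == "inner":
            continue  # an inner join drops days without declared wildfires
        rows.append({**w, "fires": match["fires"] if match else None})
    return rows


left_rows = join("left")    # same number of rows as the weather frame
inner_rows = join("inner")  # only the days with a declared wildfire
```

With the weather frame on the left, the left join keeps all three weather rows while the inner join keeps one, which is why how should follow the side on which the weather dataframe sits.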
pyro_risks.datasets.era_fwi_viirs module
-
class pyro_risks.datasets.era_fwi_viirs.MergedEraFwiViirs(era_source_path: Optional[str] = None, viirs_source_path: Optional[str] = None, fwi_source_path: Optional[str] = None)
Bases: pandas.core.frame.DataFrame
Create dataframe for modeling described in models/score_v0.py.
Get the weather, NASA FIRMS VIIRS fires and FWI datasets, then filter the fire records to keep vegetation fires, excluding low-confidence ones, and merge the datasets. Finally, build aggregated versions of the dataframes by department and by day: for each feature of the weather and FWI datasets, min, max, mean and std are computed, and fires are counted for each department and day.
- Returns
pd.DataFrame
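The per-department, per-day aggregation described above can be sketched with the standard library alone. The records, the `fwi` feature and the output column names are assumptions made for the example; the class itself works on pandas dataframes, whose default std is the sample standard deviation, which `statistics.stdev` also computes.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical point-level records for one feature ("fwi").
records = [
    {"department": "34", "day": "2019-07-01", "fwi": 10.0},
    {"department": "34", "day": "2019-07-01", "fwi": 14.0},
    {"department": "34", "day": "2019-07-01", "fwi": 12.0},
]

# Collect the feature values of each (department, day) group.
groups = defaultdict(list)
for record in records:
    groups[(record["department"], record["day"])].append(record["fwi"])

# Compute min, max, mean and sample std for each group.
aggregated = {
    key: {
        "fwi_min": min(values),
        "fwi_max": max(values),
        "fwi_mean": mean(values),
        # the sample std is undefined for a single value
        "fwi_std": stdev(values) if len(values) > 1 else float("nan"),
    }
    for key, values in groups.items()
}
```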
pyro_risks.datasets.queries_api module
-
pyro_risks.datasets.queries_api.call_era5land(output_path: str, year: str, month: str, day: str) → None
Call cdsapi to get ERA5 Land data as a NetCDF (.nc) file for a given date.
By default, “time” is set to “14:00”. This is not an issue, since ERA5 Land data is available with a two-month delay.
- Parameters
output_path – str
year – str
month – str
day – str
-
pyro_risks.datasets.queries_api.call_era5t(output_path: str, year: str, month: str, day: str) → None
Call cdsapi to get ERA5T data as a NetCDF (.nc) file for a given date.
The most recent available data is Day -5. By default, “time” is set to “14:00”. This is not an issue, since ERA5T data is available with a five-day delay.
- Parameters
output_path – str
year – str
month – str
day – str
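Since ERA5T data lags by five days, the year, month and day arguments can be derived from the current date. The helper below is a hypothetical sketch: `latest_era5t_date` is not part of the library, and the zero-padded string format is an assumption about what the function expects.

```python
from datetime import date, timedelta
from typing import Tuple


def latest_era5t_date(today: date, delay_days: int = 5) -> Tuple[str, str, str]:
    """Return (year, month, day) strings for the most recent available day."""
    target = today - timedelta(days=delay_days)
    # Zero-padded month and day are an assumption, not a documented requirement.
    return f"{target.year}", f"{target.month:02d}", f"{target.day:02d}"


year, month, day = latest_era5t_date(date(2021, 3, 3))
# The values could then be passed on, for example:
# call_era5t(output_path="./era5t", year=year, month=month, day=day)
```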
-
pyro_risks.datasets.queries_api.call_fwi(output_path: str, year: str, month: str, day: str) → None
Get fire danger indices historical data from the Copernicus Climate Data Store.
Information on FWI can be found here: https://datastore.copernicus-climate.eu/c3s/published-forms/c3sprod/cems-fire-historical/Fire_In_CDS.pdf
Please follow the instructions before using the CDS API: https://cds.climate.copernicus.eu/api-how-to
The most recent available data is Day -2.
- Parameters
output_path – str
year – str
month – str
day – str
pyro_risks.datasets.utils module
-
pyro_risks.datasets.utils.download(url: str, default_extension: str, unzip: Optional[bool] = True, destination: str = './tmp') → None
Helper function for downloading, unzipping and saving a compressed file from a given URL.
- Parameters
url – URL of the compressed archive
default_extension – extension of the archive
unzip – whether archive should be unzipped. Defaults to True.
destination – folder where the file should be saved. Defaults to ‘./tmp’.
-
pyro_risks.datasets.utils.find_closest_location(df_weather: pandas.core.frame.DataFrame, latitude: float, longitude: float) → Tuple[float, float]
For a given point (latitude, longitude), get the closest point that exists in df_weather. This function is meant to be used when the user chooses satellite data, e.g. ERA5 Land variables, rather than weather station data.
- Parameters
df_weather – pd.DataFrame Dataframe of land/weather conditions
latitude – float Latitude of the point for which we want to find the closest point in df_weather.
longitude – float Longitude of the point for which we want to find the closest point in df_weather.
- Returns: Tuple(float, float)
Tuple (closest_lat, closest_lon) of the weather point closest to the point (lat, lon)
-
pyro_risks.datasets.utils.find_closest_weather_station(df_weather: pandas.core.frame.DataFrame, latitude: pandas.core.frame.DataFrame, longitude: pandas.core.frame.DataFrame) → int
The weather dataframe SHOULD contain a “STATION” column giving the id of each weather station in the dataset.
- Parameters
df_weather – pd.DataFrame Dataframe of weather conditions
latitude – float Latitude of the point to which we want to find the closest weather station
longitude – float Longitude of the point to which we want to find the closest weather station
- Returns: int
Id of the closest weather station of the point (lat, lon)
-
pyro_risks.datasets.utils.get_fname(url: str) → Tuple[str, Optional[str], Optional[str]]
Find the file name, extension and compression format of an archive located by a URL.
- Parameters
url – URL of the compressed archive
- Raises
ValueError – if URL contains more than one extension
ValueError – if URL contains more than one compression format
- Returns
A tuple containing the base file name, extension and compression format
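The contract above (base name, optional extension, optional compression, and a ValueError when either appears more than once) can be sketched as follows. The `split_fname` helper and the two sets of recognized suffixes are hypothetical; the library's own lists and parsing rules may differ.

```python
from typing import Optional, Tuple

# Assumed suffix lists, chosen for this example only.
KNOWN_EXTENSIONS = {"csv", "json", "nc", "shp"}
KNOWN_COMPRESSIONS = {"zip", "gz", "bz2"}


def split_fname(url: str) -> Tuple[str, Optional[str], Optional[str]]:
    """Split the last URL segment into (base name, extension, compression)."""
    parts = url.rpartition("/")[-1].split(".")
    extensions = [p for p in parts[1:] if p in KNOWN_EXTENSIONS]
    compressions = [p for p in parts[1:] if p in KNOWN_COMPRESSIONS]
    if len(extensions) > 1:
        raise ValueError(f"more than one extension in URL: {url}")
    if len(compressions) > 1:
        raise ValueError(f"more than one compression format in URL: {url}")
    # Everything that is not a recognized suffix stays in the base name.
    base = ".".join(
        [parts[0]] + [p for p in parts[1:] if p not in extensions + compressions]
    )
    return (
        base,
        extensions[0] if extensions else None,
        compressions[0] if compressions else None,
    )
```

For instance, under these assumptions a URL ending in `2019.csv.gz` yields the base name `2019`, the extension `csv` and the compression format `gz`.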
-
pyro_risks.datasets.utils.get_ghcn(start_year: Optional[int] = None, end_year: Optional[int] = None, destination: str = './ghcn') → None
Download yearly Global Historical Climatology Network - Daily (GHCN-Daily) data (.csv) from NCEI.
- Parameters
start_year – first year to be retrieved. Defaults to None.
end_year – first year that will not be retrieved. Defaults to None.
destination – destination directory. Defaults to ‘./ghcn’.
-
pyro_risks.datasets.utils.get_intersection_range(ts1: pandas.core.series.Series, ts2: pandas.core.series.Series) → pandas.core.indexes.datetimes.DatetimeIndex
Computes the intersecting date range of two series.
- Parameters
ts1 – time series
ts2 – time series
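One way to read “intersecting date range” is the run of days between the later of the two start dates and the earlier of the two end dates. The sketch below follows that reading with plain `datetime.date` lists instead of pandas series; the `intersection_range` helper and the daily step are assumptions made for the example.

```python
from datetime import date, timedelta
from typing import List


def intersection_range(ts1: List[date], ts2: List[date]) -> List[date]:
    """Return every day covered by both series, or an empty list if none."""
    start = max(min(ts1), min(ts2))  # later of the two start dates
    end = min(max(ts1), max(ts2))    # earlier of the two end dates
    n_days = (end - start).days
    # range() is empty when the series do not overlap (n_days < 0)
    return [start + timedelta(days=i) for i in range(n_days + 1)]


weather_days = [date(2020, 1, 1), date(2020, 1, 10)]
fire_days = [date(2020, 1, 5), date(2020, 1, 20)]
common_days = intersection_range(weather_days, fire_days)
```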
-
pyro_risks.datasets.utils.get_modis(start_year: Optional[int] = None, end_year: Optional[int] = None, yearly: Optional[bool] = False, destination: str = './firms') → None
Download last 24H or yearly France active fires from NASA FIRMS.
- Parameters
start_year – first year to be retrieved. Defaults to None.
end_year – first year that will not be retrieved. Defaults to None.
yearly – whether to download yearly active fires or not. Defaults to False.
destination – destination directory. Defaults to ‘./firms’.
-
pyro_risks.datasets.utils.get_nearest_points(source_points: List[Tuple[Any, Any]], candidates: List[Tuple[Any, Any]]) → Tuple
Find the nearest neighbor of every source point among a set of candidate points, using a KD-tree.
- Parameters
source_points – List[Tuple] List of tuples (lat, lon) for which you want to find the closest point in candidates.
candidates – List[Tuple] List of tuples (lat, lon) which are all possible closest points.
- Returns: Tuple
- indices – array of integers
The locations of the neighbors in candidates.
- distances – array of floats
The distances to the nearest neighbors.
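The shape of the result can be illustrated with a brute-force search. Note that this deliberately swaps the KD-tree for an exhaustive scan, which gives the same nearest neighbors but does not scale as well; the `nearest_points_bruteforce` helper, the Euclidean distance in degree space and the (indices, distances) return order are assumptions based on the description above.

```python
import math
from typing import List, Tuple


def nearest_points_bruteforce(
    source_points: List[Tuple[float, float]],
    candidates: List[Tuple[float, float]],
) -> Tuple[List[int], List[float]]:
    """Brute-force stand-in for a KD-tree nearest-neighbor query."""
    indices, distances = [], []
    for lat, lon in source_points:
        # Euclidean distance, in degrees, to every candidate point
        dists = [math.hypot(lat - c_lat, lon - c_lon) for c_lat, c_lon in candidates]
        best = min(range(len(candidates)), key=dists.__getitem__)
        indices.append(best)
        distances.append(dists[best])
    return indices, distances


# Hypothetical fire locations and weather grid points, as (lat, lon) tuples.
fire_points = [(43.6, 3.9), (48.9, 2.3)]
grid_points = [(48.75, 2.25), (43.5, 4.0), (45.0, 5.0)]
indices, distances = nearest_points_bruteforce(fire_points, grid_points)
```

Each entry of `indices` is a position in the candidates list, so `grid_points[indices[i]]` is the point matched to `fire_points[i]`.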
-
pyro_risks.datasets.utils.url_retrieve(url: str, timeout: Optional[float] = None) → bytes
Retrieves and returns the content of a URL request.
- Parameters
url – URL to request
timeout – number of seconds before the request times out. Defaults to 4.
- Raises
requests.exceptions.ConnectionError –
- Returns
Content of the response