Prediction of Wildfire Using Machine Learning

See How We Create a Machine Learning Model That Predicts When a Wildfire Will Become Critical.

Some articles and research papers such as the EIA white paper have widely discussed the significant impact of wildfires on solar irradiance and solar energy generation, and also their impact on air quality.

These wildfires have severe environmental, social, and economic consequences; among other things, people are forced to leave their homes. Finding a way to anticipate them, or at least to minimize their impact, has become urgent.

In recent years, new technologies have brought significant progress in fighting wildfires. For example, NASA contributes by monitoring wildfires from many sources, and its satellite instruments are often the first to detect wildfires burning in remote regions. The locations of new fires are sent directly to land managers worldwide within hours of the satellite overpass.

Some market players, such as Insight Robotics, offer early wildfire detection systems. Most of these systems use artificial intelligence to detect thermal signals and smoke within minutes of a fire starting, and provide 24/7 protection against wildfires.

Many wildfire datasets are also now openly available. They provide labeled images of fires, which makes it possible to build machine learning or deep learning models that learn to detect smoke and identify fires.

In this post, we explore the feasibility of using a machine learning approach to predict the evolution of an existing fire and anticipate when it will become critical and require huge resources to contain. The main difference from existing approaches lies in the data sources used.

Since a previous article already demonstrated the impact of wildfires on solar irradiance, we decided to use the same data in this new analysis.

Finally, we show how to build a machine learning model that predicts whether an existing wildfire will grow enough to burn a critical number of acres in the hours following its appearance.

All the code is available here.

Disclaimer: 

Note that we are not predicting whether a wildfire is about to occur. We focus only on predicting the extent to which an existing wildfire will become critical in the near future, which makes it possible to mobilize the appropriate resources to contain it.

Overview

In this post we will go through:

  • The context of our analysis
  • How we collect the data
  • Our machine learning approach: how we created a learning base, and how we built and evaluated our predictive model

Context

This post follows a previous work that discusses the recent increase in wildfire activity in California and demonstrates the impact of these wildfires on solar irradiance. This new analysis is based on the same datasets and focuses on wildfires in California, US.

To learn more about the occurrence of wildfires in California and some of their impacts, refer to this notebook available in the atoti notebook gallery.

Data Collection

As specified in the previous post, we focused on data reflecting the solar irradiance in California in the years 2016–2020. In particular, we use the Global Horizontal Irradiance (GHI) data. GHI combines the Direct Normal Irradiance (DNI), adjusted for the solar zenith angle, and the Diffuse Horizontal Irradiance (DHI).
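The relationship between the three irradiance components can be illustrated with a small check. The sketch below assumes the standard decomposition GHI = DNI × cos(solar zenith angle) + DHI; the function name is ours, and the input values are taken from the first row of the station sample in this section.

```python
import math

def ghi_from_components(dni: float, dhi: float, solar_zenith_deg: float) -> float:
    """Return GHI = DNI * cos(zenith) + DHI, all in the same irradiance unit."""
    # When the sun is below the horizon (zenith > 90 degrees), the direct
    # term is clamped to zero so that it never becomes negative.
    cos_zenith = max(0.0, math.cos(math.radians(solar_zenith_deg)))
    return dni * cos_zenith + dhi

# First row of the station sample: dni=450, dhi=28, zenith=81.25, reported ghi=96
estimate = ghi_from_components(dni=450, dhi=28, solar_zenith_deg=81.25)
print(f"Estimated GHI: {estimate:.1f} (reported: 96)")
```

The estimate lands within one unit of the reported value, which suggests that the columns of the dataset are mutually consistent with this decomposition.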

Environmental data were collected from the weather stations:

|    | datetime                  | Station           |   ghi |   dni |   wind_speed |   wind_direction |   dhi |   air_temperature |   solar_zenith_angle |
|---:|:--------------------------|:------------------|------:|------:|-------------:|-----------------:|------:|------------------:|---------------------:|
|  0 | 2020-01-01 00:00:00+00:00 | station_000071236 |    96 |   450 |          1.8 |               43 |    28 |              15.3 |                81.25 |
|  1 | 2020-01-01 01:00:00+00:00 | station_000071236 |     0 |     0 |          1.6 |               43 |     0 |              14   |                92.24 |
|  2 | 2020-01-01 02:00:00+00:00 | station_000071236 |     0 |     0 |          1.6 |               45 |     0 |              13.6 |               103.91 |
|  3 | 2020-01-01 03:00:00+00:00 | station_000071236 |     0 |     0 |          1.5 |               48 |     0 |              13.4 |               116.08 |
|  4 | 2020-01-01 04:00:00+00:00 | station_000071236 |     0 |     0 |          1.4 |               55 |     0 |              13.3 |               128.56 |

The locations of the weather stations are also provided in the data, as shown in the following table:

|    | Station           |   Latitude |   Longitude |
|---:|:------------------|-----------:|------------:|
|  0 | station_000071236 |      32.65 |     -117.06 |
|  1 | station_000071239 |      32.65 |     -116.94 |
|  2 | station_000071242 |      32.65 |     -116.82 |
|  3 | station_000071245 |      32.65 |     -116.7  |
|  4 | station_000071248 |      32.65 |     -116.58 |

We also use wildfire data for the same period, sourced from fire.ca.gov.

Fire data:

|    | Fire                     |   AcresBurned | StartedDate               | EndedDate                 |   StartedMonth |
|---:|:-------------------------|--------------:|:--------------------------|:--------------------------|---------------:|
|  0 | 2016-10-10Creek Fire     |            65 | 2016-10-10 00:00:00+00:00 | 2016-10-12 00:00:00+00:00 |             10 |
|  1 | 2016-04-24Taglio Fire    |            30 | 2016-04-24 00:00:00+00:00 | 2016-04-24 00:00:00+00:00 |             04 |
|  2 | 2016-05-30Tulloch Fire   |            85 | 2016-05-30 00:00:00+00:00 | 2016-06-01 00:00:00+00:00 |             05 |
|  3 | 2016-05-22Metz Fire      |          3876 | 2016-05-22 00:00:00+00:00 | 2016-05-25 00:00:00+00:00 |             05 |
|  4 | 2016-05-23Wheatland Fire |           156 | 2016-05-23 00:00:00+00:00 | 2016-05-25 00:00:00+00:00 |             05 |

Fire location:

|    | Fire                     |   Latitude |   Longitude |
|---:|:-------------------------|-----------:|------------:|
|  0 | 2016-10-10Creek Fire     |    38.4096 |    -122.432 |
|  1 | 2016-04-24Taglio Fire    |    37.2171 |    -121.08  |
|  2 | 2016-05-30Tulloch Fire   |    37.9276 |    -120.529 |
|  3 | 2016-05-22Metz Fire      |    36.3812 |    -121.201 |
|  4 | 2016-05-23Wheatland Fire |    34.276  |    -118.354 |

The full datasets, sampled in the tables above, contain 1362 fires and 2875 stations to be considered.

All the information about the data and the way they were processed is given in this notebook.

Machine Learning Approach

Our approach consists of using the environmental data collected at different stations to monitor the evolution of existing fires. These data comprise hourly solar irradiance measurements as well as air temperature and wind (direction and speed).

Our model uses the following parameters:

  • Descriptors: GHI, DNI, DHI, wind speed, wind direction, air temperature, solar zenith angle;
  • Target: fire category or criticality (-1: fire is not critical; +1: critical)

The idea is to analyze the time series defined by these descriptors and use them to predict the time series of the target. 

Creation of the Learning Base

For this analysis, we have created a learning base from scratch. This base reconciles the station and the fire data.

Selection of the Fires and Stations

We decided to filter the stations’ data and retain only the stations located “close” to where fires happen. In other words, for each fire studied, only data collected from the stations within a certain distance are considered. Here, we use a radius of 10 kilometers.

We also decided to retain only fires with at least two stations within the surrounding 10 kilometers. Of the 1362 fires in the dataset, only 1215 fulfilled this criterion and were retained. For each fire retained, we consider only the two closest stations.
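The selection rule can be sketched as follows. This is a minimal illustration, not the code of the original notebook: it assumes the great-circle (haversine) distance is used, the function names are ours, and the fire location in the example is hypothetical. The station coordinates come from the station sample above.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def closest_stations(fire, stations, radius_km=10.0, n_required=2):
    """Return the n_required closest stations within radius_km, or [] if too few."""
    in_range = []
    for name, (lat, lon) in stations.items():
        distance = haversine_km(fire[0], fire[1], lat, lon)
        if distance <= radius_km:
            in_range.append((distance, name))
    if len(in_range) < n_required:
        return []  # not enough nearby stations: the fire is discarded
    return [name for _, name in sorted(in_range)[:n_required]]

stations = {
    "station_000071236": (32.65, -117.06),
    "station_000071239": (32.65, -116.94),
    "station_000071242": (32.65, -116.82),
}
hypothetical_fire = (32.66, -117.03)  # made-up location, for illustration only
selected = closest_stations(hypothetical_fire, stations)
print(selected)
```

In this example, the third station lies well beyond 10 kilometers and is ignored, while the two remaining stations are returned from nearest to farthest.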

Furthermore, we filter the data by considering only fires that lasted less than 120 days. As the following output shows, this limit corresponds to roughly half of the fires:

Distribution of the duration of the fires in days:

import numpy as np
import pandas as pd

# Duration of each fire, in whole days
fire_duration = (pd.to_datetime(fire_data['EndedDate'])
                 - pd.to_datetime(fire_data['StartedDate'])).dt.days

# Deciles: 0%, 10%, ..., 100%
quantiles = np.linspace(0, 1, 11)
values = np.quantile(fire_duration, quantiles)

print('Distribution of the durations of the fires (in days):\n')
for quantile, value in zip(quantiles, values):
    # round() avoids float truncation when converting the quantile to a percentage
    print(f'Quantile {round(quantile * 100)}%: {int(value)}\n')

Distribution of the durations of the fires (in days):

Quantile 0%: 0

Quantile 10%: 1

Quantile 20%: 2

Quantile 30%: 5

Quantile 40%: 21

Quantile 50%: 89

Quantile 60%: 136

Quantile 70%: 171

Quantile 80%: 194

Quantile 90%: 223

Quantile 100%: 1473

Ultimately, the scope of the analysis comprises only 666 fires and the data from their two closest stations, limited to the period when each fire was active.

Creation of the Target

In the original fire dataset, the following information is provided:

  • The location: latitude and longitude
  • The start and end dates of the fires
  • The number of acres burnt

However, these data do not give, for example, the hour-by-hour evolution of the number of acres burnt. They only provide the final duration of the fires and the total number of acres burnt. So, for a given fire, we know whether it was critical or not, but we do not know when it became critical. This is a major limitation of our analysis.

We therefore created the required granular data by interpolating the number of acres burnt from 0 to the final number between the start and end dates of each fire. For this interpolation, we use the wind speed as the main weighting factor for the evolution of the number of acres burnt. This simulates the evolution of the criticality of the fires, hour by hour, and allows us to model it.
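The post does not spell out the interpolation formula, so the sketch below shows one plausible reading: the total number of acres is distributed over the hours of the fire in proportion to the cumulative wind speed. The function name, the zero-wind fallback, and the six-hour example are assumptions made for illustration.

```python
from itertools import accumulate

def interpolate_acres(total_acres, hourly_wind_speed):
    """Spread total_acres over the fire's hours in proportion to cumulative wind speed."""
    total_wind = sum(hourly_wind_speed)
    if total_wind == 0:
        # Fallback assumption: with no wind recorded, grow linearly in time.
        n = len(hourly_wind_speed)
        return [total_acres * (i + 1) / n for i in range(n)]
    return [total_acres * cumulative / total_wind
            for cumulative in accumulate(hourly_wind_speed)]

# Hypothetical fire lasting six hours and burning 30 acres in total
wind = [1.3, 2.0, 2.3, 2.5, 2.5, 2.1]
curve = interpolate_acres(30, wind)
for hour, acres in enumerate(curve, start=1):
    print(f"hour {hour}: {acres:.2f} acres")
```

With this weighting, the curve starts near zero, never decreases, ends at the reported total, and grows faster during windy hours.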

We then define a threshold of 10,000 acres for identifying critical fires. This threshold corresponds approximately to the 95th percentile of the distribution of the number of acres burnt (10,281 acres), which is equivalent to considering the top 5 percent of the fires as critical and the remaining fires as non-critical.

The following output shows the distribution of the number of acres burnt:

Distribution of the number of acres burnt:

import numpy as np

acres_burnt = list(fire_data['AcresBurned'])
quantiles = np.linspace(0, 1, 101)  # percentiles: 0%, 1%, ..., 100%
values = np.quantile(acres_burnt, quantiles)

print('Distribution of the acres burnt:\n')
for quantile, value in zip(quantiles, values):
    # round() avoids truncation artifacts such as int(0.29 * 100) == 28
    print(f'Quantile {round(quantile * 100)}%: {int(value)}\n')

Distribution of the acres burnt:

Quantile 0%: 2

Quantile 1%: 10

Quantile 2%: 10

Quantile 3%: 10

Quantile 4%: 11

Quantile 5%: 12

...

Quantile 90%: 2949

Quantile 91%: 3875

Quantile 92%: 4746

Quantile 93%: 5737

Quantile 94%: 7519

Quantile 95%: 10281

Quantile 96%: 18535

Quantile 97%: 31561

Quantile 98%: 47801

Quantile 99%: 83445

Quantile 100%: 1032648

Finally, we obtain the augmented dataset that we use for learning the evolution of the criticality of the fires.

Augmented dataset:

|    | fire                  | station   | datetime                  |   ghi |   dni |   wind_speed |   wind_direction |   dhi |   air_temperature |   solar_zenith_angle |   acres_burnt |   category |
|---:|:----------------------|:----------|:--------------------------|------:|------:|-------------:|-----------------:|------:|------------------:|---------------------:|--------------:|-----------:|
|  0 | 2016-04-24Taglio Fire | station_1 | 2016-04-24 13:00:00+00:00 |     0 |     0 |          1.3 |            315.3 |     0 |                 6 |                94.69 |      0.422256 |         -1 |
|  1 | 2016-04-24Taglio Fire | station_1 | 2016-04-24 14:00:00+00:00 |    73 |   282 |          2   |            311.2 |    41 |                 8 |                83.4  |      1.07188  |         -1 |
|  2 | 2016-04-24Taglio Fire | station_1 | 2016-04-24 15:00:00+00:00 |   273 |   615 |          2.3 |            313.8 |    80 |                10 |                71.74 |      1.81895  |         -1 |
|  3 | 2016-04-24Taglio Fire | station_1 | 2016-04-24 16:00:00+00:00 |   488 |   771 |          2.5 |            324.9 |   102 |                13 |                60    |      2.63098  |         -1 |
|  4 | 2016-04-24Taglio Fire | station_1 | 2016-04-24 17:00:00+00:00 |   682 |   858 |          2.5 |            330.1 |   114 |                15 |                48.53 |      3.44301  |         -1 |

Modeling

Our problem consists of predicting the evolution of the criticality of the fires. At each time step, i.e., each hour, we predict whether there is a high chance that the considered fire will become critical within the next 6 hours, i.e., that it will have burnt 10,000 acres or more since it started.

We arbitrarily chose a prediction horizon of 6 hours as the minimum delay sufficient for alerting.

Our problem is a time series binary classification: 

  • Class -1: the fire will remain non-critical within the next 6 hours;
  • Class +1: the fire will burn a critical number of acres in the next 6 hours.
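The two classes above can be derived from the interpolated acres curve. The sketch below is an assumption about how such labels could be built, not the notebook's code: hour t is labeled +1 if the cumulative number of acres reaches the threshold by hour t + 6. The function name and the handling of the last hours of a fire (where fewer than 6 future hours exist) are our own choices.

```python
CRITICAL_ACRES = 10_000  # threshold from the text
HORIZON_HOURS = 6        # prediction horizon from the text

def horizon_labels(acres_by_hour, threshold=CRITICAL_ACRES, horizon=HORIZON_HOURS):
    """Label hour t with +1 if cumulative acres reach threshold by hour t + horizon, else -1."""
    last = len(acres_by_hour) - 1
    labels = []
    for t in range(len(acres_by_hour)):
        # The curve is cumulative (non-decreasing), so looking at the value
        # at t + horizon is enough. Near the end of the fire, fall back to
        # the final value (assumption).
        future_acres = acres_by_hour[min(t + horizon, last)]
        labels.append(1 if future_acres >= threshold else -1)
    return labels

# Hypothetical fire gaining 1,000 acres per hour for 15 hours
acres = [1000 * hour for hour in range(1, 16)]
labels = horizon_labels(acres)
print(labels)
```

In this example the threshold is reached at the tenth hour, so the label switches to +1 six hours earlier, which is the lead time the model is meant to provide.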

We applied the following steps for machine learning. We:

  • Rolled the time series of the descriptor variables;
  • Extracted features from these rolled time series;
  • Applied a classification algorithm to these feature-engineered data.

Feature Engineering

Unlike common approaches that rely on models dedicated to time series forecasting, such as Vector AutoRegression (VAR) or Recurrent Neural Networks (RNNs), we use a different approach here: we summarize the time series of the descriptor variables over the last 4–12 hours using calculated synthetic features, and then apply a predictive model to these extracted features to predict the criticality of the fires in the next 6 hours.

We use the tsfresh library to prepare the data and extract the synthetic time series features. tsfresh is a Python package that automatically calculates a large number of time series characteristics (the so-called features) by combining established algorithms from statistics, time series analysis, signal processing, and nonlinear dynamics with a robust feature selection algorithm.

The following code produces the rolled dataset shown in the table:
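To make the idea of rolling and summarizing concrete, here is a dependency-free sketch of what happens conceptually. It is not tsfresh's implementation: the function names and the handful of summary statistics are chosen for illustration, whereas tsfresh computes a much larger feature set. The wind speed values are the first hours of the sample fire shown in this post.

```python
from statistics import mean

def rolled_windows(series, min_len=4, max_len=12):
    """Yield (end_index, window) pairs. Each window ends at end_index and holds
    the most recent values, at least min_len and at most max_len of them."""
    for end in range(min_len - 1, len(series)):
        start = max(0, end - max_len + 1)
        yield end, series[start:end + 1]

def summarize(window):
    """A few hand-picked summary features for one window."""
    return {
        "mean": mean(window),
        "min": min(window),
        "max": max(window),
        "last_minus_first": window[-1] - window[0],
    }

wind_speed = [1.3, 2.0, 2.3, 2.5, 2.5, 2.1]
features = {end: summarize(window) for end, window in rolled_windows(wind_speed)}
for end, row in features.items():
    print(end, row)
```

Each time step with at least 4 hours of history yields one row of features, which is the shape of data that a standard classifier can consume.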

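from tsfresh.utilities.dataframe_functions import roll_time_series

# Build one window per (fire, time step); each window holds the last 4 to 12 hours
df_rolled_train = roll_time_series(data_train,
                                   column_id="fire",
                                   column_sort="time_step",
                                   rolling_direction=1,
                                   max_timeshift=11,
                                   min_timeshift=3,
                                   n_jobs=num_cpus)
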
|        | fire                  |   time_step |   ghi_station_1 |   dni_station_1 |   wind_speed_station_1 |   wind_direction_station_1 |   dhi_station_1 |   air_temperature_station_1 |   solar_zenith_angle_station_1 |   ghi_station_2 |   dni_station_2 |   wind_speed_station_2 |   wind_direction_station_2 |   dhi_station_2 |   air_temperature_station_2 |   solar_zenith_angle_station_2 |   acres_burnt |   duration_in_hours |   category | id                           |
|-------:|:----------------------|------------:|----------------:|----------------:|-----------------------:|---------------------------:|----------------:|----------------------------:|-------------------------------:|----------------:|----------------:|-----------------------:|---------------------------:|----------------:|----------------------------:|-------------------------------:|--------------:|--------------------:|-----------:|:-----------------------------|
|      0 | 2016-04-24Taglio Fire |           1 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           6 |                          94.69 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           8 |                          94.6  |      0.422256 |                   1 |         -1 | ('2016-04-24Taglio Fire', 4) |
|      1 | 2016-04-24Taglio Fire |           2 |              73 |             282 |                    2   |                      311.2 |              41 |                           8 |                          83.4  |              72 |             248 |                    2   |                      311.2 |              43 |                          10 |                          83.31 |      1.07188  |                   2 |         -1 | ('2016-04-24Taglio Fire', 4) |
|      2 | 2016-04-24Taglio Fire |           3 |             273 |             615 |                    2.3 |                      313.8 |              80 |                          10 |                          71.74 |             268 |             578 |                    2.3 |                      313.8 |              86 |                          12 |                          71.65 |      1.81895  |                   3 |         -1 | ('2016-04-24Taglio Fire', 4) |
|      3 | 2016-04-24Taglio Fire |           4 |             488 |             771 |                    2.5 |                      324.9 |             102 |                          13 |                          60    |             481 |             740 |                    2.5 |                      324.9 |             109 |                          15 |                          59.91 |      2.63098  |                   4 |         -1 | ('2016-04-24Taglio Fire', 4) |
|   2264 | 2016-04-24Taglio Fire |           1 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           6 |                          94.69 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           8 |                          94.6  |      0.422256 |                   1 |         -1 | ('2016-04-24Taglio Fire', 5) |
|   2265 | 2016-04-24Taglio Fire |           2 |              73 |             282 |                    2   |                      311.2 |              41 |                           8 |                          83.4  |              72 |             248 |                    2   |                      311.2 |              43 |                          10 |                          83.31 |      1.07188  |                   2 |         -1 | ('2016-04-24Taglio Fire', 5) |
|   2266 | 2016-04-24Taglio Fire |           3 |             273 |             615 |                    2.3 |                      313.8 |              80 |                          10 |                          71.74 |             268 |             578 |                    2.3 |                      313.8 |              86 |                          12 |                          71.65 |      1.81895  |                   3 |         -1 | ('2016-04-24Taglio Fire', 5) |
|   2267 | 2016-04-24Taglio Fire |           4 |             488 |             771 |                    2.5 |                      324.9 |             102 |                          13 |                          60    |             481 |             740 |                    2.5 |                      324.9 |             109 |                          15 |                          59.91 |      2.63098  |                   4 |         -1 | ('2016-04-24Taglio Fire', 5) |
|   2268 | 2016-04-24Taglio Fire |           5 |             682 |             858 |                    2.5 |                      330.1 |             114 |                          15 |                          48.53 |             674 |             833 |                    2.5 |                      330.1 |             122 |                          17 |                          48.44 |      3.44301  |                   5 |         -1 | ('2016-04-24Taglio Fire', 5) |
| 350286 | 2016-04-24Taglio Fire |           1 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           6 |                          94.69 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           8 |                          94.6  |      0.422256 |                   1 |         -1 | ('2016-04-24Taglio Fire', 6) |
| 350287 | 2016-04-24Taglio Fire |           2 |              73 |             282 |                    2   |                      311.2 |              41 |                           8 |                          83.4  |              72 |             248 |                    2   |                      311.2 |              43 |                          10 |                          83.31 |      1.07188  |                   2 |         -1 | ('2016-04-24Taglio Fire', 6) |
| 350288 | 2016-04-24Taglio Fire |           3 |             273 |             615 |                    2.3 |                      313.8 |              80 |                          10 |                          71.74 |             268 |             578 |                    2.3 |                      313.8 |              86 |                          12 |                          71.65 |      1.81895  |                   3 |         -1 | ('2016-04-24Taglio Fire', 6) |
| 350289 | 2016-04-24Taglio Fire |           4 |             488 |             771 |                    2.5 |                      324.9 |             102 |                          13 |                          60    |             481 |             740 |                    2.5 |                      324.9 |             109 |                          15 |                          59.91 |      2.63098  |                   4 |         -1 | ('2016-04-24Taglio Fire', 6) |
| 350290 | 2016-04-24Taglio Fire |           5 |             682 |             858 |                    2.5 |                      330.1 |             114 |                          15 |                          48.53 |             674 |             833 |                    2.5 |                      330.1 |             122 |                          17 |                          48.44 |      3.44301  |                   5 |         -1 | ('2016-04-24Taglio Fire', 6) |
| 350291 | 2016-04-24Taglio Fire |           6 |             835 |             905 |                    2.1 |                      323.3 |             121 |                          16 |                          37.91 |             827 |             883 |                    2.1 |                      323.3 |             129 |                          18 |                          37.83 |      4.12511  |                   6 |         -1 | ('2016-04-24Taglio Fire', 6) |
| 353682 | 2016-04-24Taglio Fire |           1 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           6 |                          94.69 |               0 |               0 |                    1.3 |                      315.3 |               0 |                           8 |                          94.6  |      0.422256 |                   1 |         -1 | ('2016-04-24Taglio Fire', 7) |
| 353683 | 2016-04-24Taglio Fire |           2 |              73 |             282 |                    2   |                      311.2 |              41 |                           8 |                          83.4  |              72 |             248 |                    2   |                      311.2 |              43 |                          10 |                          83.31 |      1.07188  |                   2 |         -1 | ('2016-04-24Taglio Fire', 7) |
| 353684 | 2016-04-24Taglio Fire |           3 |             273 |             615 |                    2.3 |                      313.8 |              80 |                          10 |                          71.74 |             268 |             578 |                    2.3 |                      313.8 |              86 |                          12 |                          71.65 |      1.81895  |                   3 |         -1 | ('2016-04-24Taglio Fire', 7) |
| 353685 | 2016-04-24Taglio Fire |           4 |             488 |             771 |                    2.5 |                      324.9 |             102 |                          13 |                          60    |             481 |             740 |                    2.5 |                      324.9 |             109 |                          15 |                          59.91 |      2.63098  |                   4 |         -1 | ('2016-04-24Taglio Fire', 7) |
| 353686 | 2016-04-24Taglio Fire |           5 |             682 |             858 |                    2.5 |                      330.1 |             114 |                          15 |                          48.53 |             674 |             833 |                    2.5 |                      330.1 |             122 |                          17 |                          48.44 |      3.44301  |                   5 |         -1 | ('2016-04-24Taglio Fire', 7) |

And the following shows the dataset containing extracted and filtered features:

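from tsfresh import extract_features
from tsfresh.feature_extraction import ComprehensiveFCParameters
from tsfresh.utilities.dataframe_functions import impute

# Compute the comprehensive tsfresh feature set for every rolled window
X_train = extract_features(df_rolled_train[X_cols],
                           column_id='id',
                           column_sort='time_step',
                           default_fc_parameters=ComprehensiveFCParameters(),
                           impute_function=impute,
                           n_jobs=num_cpus)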

Filter the synthetic features using tsfresh:

from tsfresh import select_features

X_train_filtered = select_features(X_train, train_target)

|                               |   duration_in_hours__quantile__q_0.8 |   duration_in_hours__cwt_coefficients__coeff_10__w_2__widths_(2, 5, 10, 20) |   duration_in_hours__cwt_coefficients__coeff_9__w_20__widths_(2, 5, 10, 20) |   duration_in_hours__cwt_coefficients__coeff_9__w_10__widths_(2, 5, 10, 20) |   duration_in_hours__cwt_coefficients__coeff_9__w_5__widths_(2, 5, 10, 20) |   dni_station_2__agg_linear_trend__attr_"rvalue"__chunk_len_5__f_agg_"min" |   ghi_station_2__agg_linear_trend__attr_"rvalue"__chunk_len_5__f_agg_"mean" |   dhi_station_1__friedrich_coefficients__coeff_2__m_3__r_30 |   solar_zenith_angle_station_2__lempel_ziv_complexity__bins_100 |   target |
|:------------------------------|-------------------------------------:|----------------------------------------------------------------------------:|----------------------------------------------------------------------------:|----------------------------------------------------------------------------:|---------------------------------------------------------------------------:|---------------------------------------------------------------------------:|----------------------------------------------------------------------------:|------------------------------------------------------------:|----------------------------------------------------------------:|---------:|
| ('2016-04-24Taglio Fire', 4)  |                                  3.4 |                                                                   136.439   |                                                                   275.314   |                                                                    353.331  |                                                                   346.907  |                                                                   0        |                                                                  -0.0462345 |                                                   0.0520217 |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 5)  |                                  4.2 |                                                                   136.439   |                                                                   275.314   |                                                                    353.331  |                                                                   346.907  |                                                                   0        |                                                                  -0.0462345 |                                                   0.295392  |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 6)  |                                  5   |                                                                   136.439   |                                                                   275.314   |                                                                    353.331  |                                                                   346.907  |                                                                   1        |                                                                   1         |                                                   0.277601  |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 7)  |                                  5.8 |                                                                   136.439   |                                                                   275.314   |                                                                    353.331  |                                                                   346.907  |                                                                   1        |                                                                   1         |                                                   0.301687  |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 8)  |                                  6.6 |                                                                   136.439   |                                                                   275.314   |                                                                    353.331  |                                                                   346.907  |                                                                   1        |                                                                   1         |                                                   0.471699  |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 9)  |                                  7.4 |                                                                   136.439   |                                                                   275.314   |                                                                    353.331  |                                                                   346.907  |                                                                   1        |                                                                   1         |                                                   4.3959    |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 10) |                                  8.2 |                                                                   136.439   |                                                                     8.55429 |                                                                     11.4036 |                                                                    12.931  |                                                                   1        |                                                                   1         |                                                  -3.10186   |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 11) |                                  9   |                                                                     6.22255 |                                                                    10.6203  |                                                                     14.0649 |                                                                    15.6446 |                                                                   0.995945 |                                                                   0.701654  |                                                   0.394956  |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 12) |                                  9.8 |                                                                    11.1295  |                                                                    13.6332  |                                                                     17.9682 |                                                                    19.5456 |                                                                   0.988508 |                                                                   0.559556  |                                                   1.34601   |                                                               1 |       -1 |
| ('2016-04-24Taglio Fire', 13) |                                 10.8 |                                                                    11.9276  |                                                                    15.3215  |                                                                     20.1318 |                                                                    21.6576 |                                                                   0.988938 |                                                                  -0.126108  |                                                   3.06305   |                                                               1 |       -1 |

Using tsfresh, we extracted more than 11,000 features. The library's select_features function, which keeps only the features that are statistically relevant to the target, reduced this set to more than 1,100 features. The table above shows only a few of them.

For more details on how we roll the dataset and extract the synthetic features, check this notebook, and this other notebook, respectively.
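To make the rolling step more concrete, here is a minimal pure-Python sketch of the idea. It is not the tsfresh implementation: the function names, the window length and the readings are made up for illustration. Each window is keyed by a (series id, time step) pair, like the row labels in the table above, and is summarized by a handful of features. tsfresh provides roll_time_series and extract_features to do the same kind of work at scale.

```python
from statistics import mean, pstdev


def roll_windows(values, series_id, max_timeshift):
    """Map (series_id, t) to the values seen up to and including step t,
    keeping at most max_timeshift + 1 of the most recent points."""
    windows = {}
    for t in range(len(values)):
        start = max(0, t - max_timeshift)
        windows[(series_id, t)] = values[start:t + 1]
    return windows


def summarize(window):
    """Compute a few hand-picked summary features for one window."""
    return {
        "mean": mean(window),
        "std": pstdev(window),
        "maximum": max(window),
        "change": window[-1] - window[0],
    }


# Hypothetical hourly readings for one fire (made-up numbers).
readings = [2.0, 4.0, 6.0, 5.0, 9.0]
windows = roll_windows(readings, "example fire", max_timeshift=2)
features = {key: summarize(window) for key, window in windows.items()}

for key, row in features.items():
    print(key, row)
```

Each row of the resulting feature table describes the recent history of one fire at one point in time, which is what allows a classifier to be trained on time series.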

Creation and evaluation of the classifier

Having extracted synthetic features from the time series, we can now use them to predict how the fires’ criticality evolves.

We decided to use Orthogonal Partial Least Squares Discriminant Analysis (OPLS-DA) to solve our classification problem. OPLS-DA was introduced as an improvement on the PLS-DA approach for discriminating between two or more groups (classes) using multivariate data. In OPLS-DA, a regression model is built between the multivariate data and a response variable that contains only class information. The main advantage of OPLS-DA over PLS-DA is that a single predictive component describes the class, while the remaining components describe the variation that is orthogonal to this predictive component.

To perform OPLS-DA, we use the Python library pyopls. This package provides a scikit-learn-style transformer that performs OPLS. OPLS is a pre-processing method that removes from the descriptor variables the variation that is orthogonal to the target variable (1).
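To illustrate what removing orthogonal variation means, the following NumPy sketch filters out a single orthogonal component using the usual single-response OPLS steps. It is not the pyopls implementation. The function name and the synthetic data are assumptions made for illustration only.

```python
import numpy as np


def remove_one_orthogonal_component(X, y):
    """Filter one y-orthogonal component out of X (X and y are centered first)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    w = Xc.T @ yc
    w /= np.linalg.norm(w)              # direction of maximal covariance with y
    t = Xc @ w                          # predictive scores
    p = Xc.T @ t / (t @ t)              # loading of X on the predictive scores
    w_ortho = p - (w @ p) * w           # part of the loading orthogonal to w
    w_ortho /= np.linalg.norm(w_ortho)
    t_ortho = Xc @ w_ortho              # orthogonal scores
    p_ortho = Xc.T @ t_ortho / (t_ortho @ t_ortho)
    X_filtered = Xc - np.outer(t_ortho, p_ortho)
    return Xc, yc, t_ortho, X_filtered


# Synthetic data (assumption): 40 samples, 5 descriptors, a +1/-1 class label.
rng = np.random.default_rng(0)
X = rng.normal(size=(40, 5))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=40) > 0, 1.0, -1.0)

Xc, yc, t_ortho, X_filtered = remove_one_orthogonal_component(X, y)

# The removed scores carry no covariance with the target...
print(abs(t_ortho @ yc) < 1e-8)
# ...so the covariance between descriptors and target is left unchanged,
print(np.allclose(X_filtered.T @ yc, Xc.T @ yc))
# while the total variation in the descriptors goes down.
print(np.sum(X_filtered ** 2) < np.sum(Xc ** 2))
```

Repeating these steps on the filtered matrix removes further orthogonal components, which is the role played by the number of components passed to the transformer.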

We use cross-validation to determine the optimal number of components to be used with the OPLS model, and then we use this number to fit the final OPLS model.
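The selection step could be sketched as follows. This is a self-contained NumPy stand-in, not the pyopls or scikit-learn code used in our analysis: the helper names (fit_opls, predict_opls, auc, select_n_components), the candidate values, the number of folds and the synthetic data are all assumptions. For each candidate number of orthogonal components, out-of-fold predictions are pooled and scored with the AUC, and the best-scoring candidate is kept.

```python
import numpy as np


def fit_opls(X, y, n_ortho):
    """Fit n_ortho y-orthogonal components, then one predictive component."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = Xc.T @ yc
    w /= np.linalg.norm(w)                 # predictive weight vector
    ortho = []
    for _ in range(n_ortho):
        t = Xc @ w
        p = Xc.T @ t / (t @ t)
        w_o = p - (w @ p) * w              # loading part orthogonal to w
        w_o /= np.linalg.norm(w_o)
        t_o = Xc @ w_o
        p_o = Xc.T @ t_o / (t_o @ t_o)
        Xc = Xc - np.outer(t_o, p_o)       # remove the orthogonal component
        ortho.append((w_o, p_o))
    t = Xc @ w
    slope = (t @ yc) / (t @ t)             # 1-component regression on the scores
    return x_mean, y_mean, w, ortho, slope


def predict_opls(model, X):
    """Filter new data with the fitted components and predict the target."""
    x_mean, y_mean, w, ortho, slope = model
    Xc = X - x_mean
    for w_o, p_o in ortho:
        Xc = Xc - np.outer(Xc @ w_o, p_o)
    return y_mean + slope * (Xc @ w)


def auc(y_true, scores):
    """Share of (positive, negative) pairs ranked correctly; ties count half."""
    pos, neg = scores[y_true == 1], scores[y_true != 1]
    wins = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (wins + 0.5 * ties) / (len(pos) * len(neg))


def select_n_components(X, y, candidates, n_folds=5, seed=0):
    """Return the candidate with the best pooled out-of-fold AUC."""
    order = np.random.default_rng(seed).permutation(len(y))
    folds = np.array_split(order, n_folds)
    scores = {}
    for n in candidates:
        out_of_fold = np.empty(len(y))
        for fold in folds:
            train = np.setdiff1d(order, fold)
            model = fit_opls(X[train], y[train], n)
            out_of_fold[fold] = predict_opls(model, X[fold])
        scores[n] = auc(y, out_of_fold)
    return max(scores, key=scores.get), scores


# Synthetic data (assumption): 120 samples, 8 descriptors, a +1/-1 class label.
rng = np.random.default_rng(42)
X = rng.normal(size=(120, 8))
y = np.where(X[:, 0] + 0.5 * rng.normal(size=120) > 0, 1, -1)

best_n, cv_auc = select_n_components(X, y, candidates=[0, 1, 2, 3])
print(best_n, cv_auc)
```

Scoring on held-out folds matters here because adding components always fits the training data at least as well, so only out-of-fold performance can reveal when extra components stop helping.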

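from pyopls import OPLS
from sklearn.cross_decomposition import PLSRegression
from sklearn.metrics import roc_auc_score, roc_curve
import matplotlib.pyplot as plt

# Remove the variation in the descriptors that is orthogonal to the target
ncomp = 50  # number of orthogonal components
opls = OPLS(ncomp)
Z_train = opls.fit_transform(X_train, train_target)

# Perform a 1-component PLS regression on the filtered descriptors
pls = PLSRegression(1)
pls.fit(Z_train, train_target)

# Transform the test data with the fitted OPLS model,
# then predict the fires' criticality with the fitted PLS model
Z_test = opls.transform(X_test)
y_pred_test = pls.predict(Z_test).ravel()

# Use the same test labels for the ROC curve and for the AUC
fpr_test, tpr_test, thresholds_test = roc_curve(y_test, y_pred_test)
roc_auc_test = roc_auc_score(y_test, y_pred_test)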

# Plot the ROC curve and display the AUC
plt.figure(figsize=(20, 10))
plt.plot(fpr_test, tpr_test, lw=2, color='red', label=f'Test (AUC={roc_auc_test:.4f})')
plt.plot([0, 1], [0, 1], color='gray', lw=2, linestyle='--')
plt.xlabel('False Positive Rate')
plt.ylabel('True Positive Rate')
plt.title('ROC Curve')
plt.legend(loc='lower right')
plt.show()

We display the ROC Curve and the calculated AUC:

ROC Curve and AUC.

The results show an AUC of 0.65, only moderately above the 0.5 of a random classifier. The descriptors therefore carry little information about the target variable, which gives the model weak predictive power.

This is most likely due to the approximation made when creating the target variable, which appears to be too inaccurate to reach satisfactory results for our initial objective.
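One way to read this figure: the AUC is the proportion of (critical, non-critical) pairs in which the model gives the critical case the higher score. The small pure-Python sketch below uses made-up scores, not our model's output, chosen so that 13 of the 20 pairs are ranked correctly.

```python
def pairwise_auc(positive_scores, negative_scores):
    """Fraction of (positive, negative) pairs ranked correctly; ties count half."""
    wins = 0.0
    for pos in positive_scores:
        for neg in negative_scores:
            if pos > neg:
                wins += 1.0
            elif pos == neg:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


# Made-up model scores for illustration only.
critical = [0.9, 0.6, 0.4, 0.35]
non_critical = [0.8, 0.5, 0.45, 0.3, 0.2]

example_auc = pairwise_auc(critical, non_critical)
print(example_auc)  # 13 correctly ranked pairs out of 20 -> 0.65
```

With an AUC of 0.65, roughly one pair in three is ranked the wrong way round, which is too many to trigger reliable alerts.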

Conclusion

In this post, we have seen:

  • How to reconcile different data sources and augment the data to create a learning base;
  • How to perform feature engineering based on time series using Python;
  • How to prototype and evaluate a simple model for the prediction of a time series.

Although our analysis is not very precise, it is nevertheless encouraging. It could open the door to a new approach to wildfire protection, one that monitors the evolution of fires using environmental data.

Finally, we could improve the model by collecting historical data on the evolution of fires, for example, the number of acres burnt each hour.