Multivariate time series forecasting

A multivariate time series has more than one time-dependent variable. Each variable depends not only on its past values but also has some dependency on other variables. This dependency is used for forecasting future values.

In our example, the dataset contains temperature, humidity, light, PIR, and sound readings recorded every 60 minutes for one year, starting on 01/01/2019. Several variables can therefore be taken into account to predict the temperature.

The work carried out is presented in the Colab notebook under this link.

1. Data preparation

1.1. Data extraction

To predict the temperature effectively, we considered three features: temperature, humidity, and light.

As part of our internship, our study focused on node \(28\). We selected two months of data from this node (from June 20, 2019 at 9 AM to August 19, 2019 at 9 AM), a period that contains no missing values.

As explained in the univariate part, the dataset contains 1441 rows. The first 1200 rows form the training dataset (TRAIN_SPLIT = 1200), and the remaining 241 rows form the validation dataset.
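The row count and the split sizes can be checked with a short calculation. This is only a sketch: it assumes that both endpoints of the selected period are included and that there is exactly one row per hour, which is consistent with the 1441 rows stated above.

```python
from datetime import datetime, timedelta

# Selected period on node 28 (both endpoints assumed to be included).
start = datetime(2019, 6, 20, 9)
end = datetime(2019, 8, 19, 9)

# One row per hour, so the row count is the number of hours plus one.
n_rows = (end - start) // timedelta(hours=1) + 1

TRAIN_SPLIT = 1200
n_train = TRAIN_SPLIT           # rows 0 .. 1199
n_val = n_rows - TRAIN_SPLIT    # rows 1200 .. end

print(n_rows, n_train, n_val)
```

The split is chronological: the validation rows all come after the training rows, so the model is never validated on data older than what it was trained on.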

1.2. Data scaling

This step is also explained in the univariate part and in the Colab notebook (the documentation on scaling is available on this link).

In our study, we used standardization. The associated code is as follows:

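# Convert the selected features to a NumPy array (one row per time step).
dataset = features.values
# Compute the statistics on the training rows only, so that no
# information from the validation period leaks into the scaling.
data_mean = dataset[:TRAIN_SPLIT].mean(axis=0)
data_std = dataset[:TRAIN_SPLIT].std(axis=0)

# Standardize each column with the training mean and standard deviation.
dataset = (dataset-data_mean)/data_std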
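Because the model is trained on standardized values, its predictions are also in standardized units and have to be converted back before they can be read as physical quantities. The sketch below illustrates this on synthetic data. The random array stands in for features.values, and the helper to_original_units and the constant TARGET_COL are hypothetical names introduced here (TARGET_COL = 1 simply mirrors the dataset[:, 1] target used later in this document).

```python
import numpy as np

rng = np.random.default_rng(0)
TRAIN_SPLIT = 1200

# Synthetic stand-in for features.values: 1441 hourly rows, 3 columns.
raw = rng.normal(loc=[25.0, 50.0, 300.0], scale=[3.0, 10.0, 80.0],
                 size=(1441, 3))

# Same standardization as above: statistics from the training rows only.
data_mean = raw[:TRAIN_SPLIT].mean(axis=0)
data_std = raw[:TRAIN_SPLIT].std(axis=0)
scaled = (raw - data_mean) / data_std

TARGET_COL = 1  # hypothetical: index of the target column


def to_original_units(scaled_values, column):
    """Invert the standardization for one column."""
    return scaled_values * data_std[column] + data_mean[column]


recovered = to_original_units(scaled[:, TARGET_COL], TARGET_COL)
```

Only the training rows have exactly zero mean and unit standard deviation after scaling. The validation rows are scaled with the same statistics and will generally deviate slightly from these values.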

1.3. Data modeling

The function below performs the same windowing task as in the univariate single-step model. Here, however, it samples the past observations according to the given step size.

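def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
  """Build (history window, label) pairs from a multivariate series."""
  data = []
  labels = []

  # The first window needs history_size rows of past data.
  start_index = start_index + history_size
  if end_index is None:
    # Leave room for the labels at the end of the series.
    end_index = len(dataset) - target_size

  for i in range(start_index, end_index):
    # Take every step-th row among the history_size rows preceding i.
    indices = range(i-history_size, i, step)
    data.append(dataset[indices])

    if single_step:
      # One label: the target value at index i + target_size.
      labels.append(target[i+target_size])
    else:
      # A sequence of target_size labels starting at index i.
      labels.append(target[i:i+target_size])

  return np.array(data), np.array(labels)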
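To make the windowing concrete, the sketch below runs the same function on a tiny toy array. The toy values and the parameter choices (history_size=4, step=2) are illustrative only and are not the settings used in this study.

```python
import numpy as np


def multivariate_data(dataset, target, start_index, end_index, history_size,
                      target_size, step, single_step=False):
    data, labels = [], []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i - history_size, i, step)
        data.append(dataset[indices])
        if single_step:
            labels.append(target[i + target_size])
        else:
            labels.append(target[i:i + target_size])
    return np.array(data), np.array(labels)


# Toy series: 10 time steps, 2 features; row k is [2k, 2k + 1].
toy = np.arange(20).reshape(10, 2)

x, y = multivariate_data(toy, toy[:, 0], 0, None, history_size=4,
                         target_size=0, step=2, single_step=True)

print(x.shape)  # (windows, sampled rows per window, features)
print(x[0])     # rows 0 and 2 of the toy series
print(y[0])     # value of feature 0 at row 4, right after the window
```

With step=2, only every second row of the 4-row history is kept, so each window holds 2 rows instead of 4. With target_size=0 and single_step=True, the label is the target value at the time step immediately following the window.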

2. Single-step model

In a single-step setup, the model learns to predict a single point in the future based on the history provided.

2.1. Data separation

The model is shown data from the last five days, i.e. 120 observations at the hourly sampling rate.

past_history = 120
future_target = 0
STEP = 1

x_train_single, y_train_single = multivariate_data(dataset, dataset[:, 1], 0,
                                                   TRAIN_SPLIT, past_history,
                                                   future_target, STEP,
                                                   single_step=True)
x_val_single, y_val_single = multivariate_data(dataset, dataset[:, 1],
                                               TRAIN_SPLIT, None, past_history,
                                               future_target, STEP,
                                               single_step=True)
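The shapes of the resulting arrays follow directly from the loop bounds in multivariate_data. The calculation below is a sketch that assumes the 1441-row, 3-feature dataset described in the data preparation section.

```python
N_ROWS = 1441        # rows in the selected two-month dataset
N_FEATURES = 3       # temperature, humidity, light
TRAIN_SPLIT = 1200
past_history = 120
future_target = 0
STEP = 1

# Rows kept in each window: every STEP-th row of the history.
rows_per_window = len(range(0, past_history, STEP))

# Training windows: i runs over range(past_history, TRAIN_SPLIT).
n_train_windows = TRAIN_SPLIT - past_history

# Validation windows: i runs over
# range(TRAIN_SPLIT + past_history, N_ROWS - future_target).
n_val_windows = (N_ROWS - future_target) - (TRAIN_SPLIT + past_history)

x_train_shape = (n_train_windows, rows_per_window, N_FEATURES)
x_val_shape = (n_val_windows, rows_per_window, N_FEATURES)
print(x_train_shape, x_val_shape)
```

Note that the first 120 validation rows only serve as history for the first validation window, which is why the number of validation windows is smaller than the number of validation rows.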

2.2. Shuffling, batching and caching data

Here, the same batch-size and buffer-size values were used as for the univariate case.

train_data_single = tf.data.Dataset.from_tensor_slices((x_train_single, y_train_single))
train_data_single = train_data_single.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_data_single = tf.data.Dataset.from_tensor_slices((x_val_single, y_val_single))
val_data_single = val_data_single.batch(BATCH_SIZE).repeat()

2.3. Definition of the single-step model

We created a sequential model made of an LSTM input layer with 32 units, to which we passed the input_shape argument, followed by a dense output layer.

We then called the compile method, with stochastic gradient descent (SGD) as the optimizer and the root mean squared error (RMSE) as the loss:

single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32,
                                           input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dense(1))

single_step_model.compile(optimizer='SGD', loss=root_mean_squared_error)
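The function root_mean_squared_error passed to compile is a custom loss whose definition is not reproduced in this section. A minimal sketch of what such a loss could look like is given below. It is an assumption, not necessarily the definition used in the notebook.

```python
import tensorflow as tf


def root_mean_squared_error(y_true, y_pred):
    """RMSE over all elements of the batch, usable as a Keras loss."""
    y_true = tf.cast(y_true, y_pred.dtype)
    return tf.sqrt(tf.reduce_mean(tf.square(y_pred - y_true)))


# Small check on hand-made values: errors of 3 and 4.
y_true = tf.constant([[0.0], [0.0]])
y_pred = tf.constant([[3.0], [4.0]])
rmse_value = float(root_mean_squared_error(y_true, y_pred))  # sqrt((9 + 16) / 2)
```

Since the data are standardized, the loss is expressed in units of the training standard deviation of the target, not in physical units.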

2.4. Training the model

With the number of epochs set to \(20\), we trained the model as follows:

single_step_history = single_step_model.fit(train_data_single, epochs=EPOCHS,
                                            steps_per_epoch=TRAIN_SPLIT,
                                            validation_data=val_data_single,
                                            validation_steps=50)

The results obtained are shown in the figure below:

The graph shows that the training loss keeps decreasing well below the validation loss, so the gap between the two curves widens. The model fits the training data but predicts poorly on data it has not seen, which is the signature of overfitting.

2.5. Overfitting problem

Overfitting is one of the main causes of the poor performance of predictive models generated by machine learning algorithms. It occurs when the predictive model fits the training set too closely: it captures not only the generalizable correlations in the data, but also the random fluctuations and noise that are specific to the training set.

When this happens, the predictive model makes very good predictions on the training set (the data it has already "seen" and adapted to), but poor predictions on data it has not seen during its learning phase.

The predictive function is then said not to generalize well, and the model is said to suffer from overfitting.
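The effect can be reproduced on a toy problem that is unrelated to our sensor data. The sketch below fits a low-degree and a high-degree polynomial to a few noisy samples of a sine curve (synthetic data, chosen only for illustration) and compares the error on the training points with the error on fresh points.

```python
import numpy as np

rng = np.random.default_rng(0)


def make_data(n):
    """Noisy samples of a sine curve (synthetic, for illustration only)."""
    x = rng.uniform(0.0, 1.0, n)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.3, n)
    return x, y


def rmse(coeffs, x, y):
    return float(np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2)))


x_train, y_train = make_data(12)    # small training set
x_val, y_val = make_data(200)       # unseen data from the same process

results = {}
for degree in (3, 9):
    coeffs = np.polyfit(x_train, y_train, degree)
    results[degree] = (rmse(coeffs, x_train, y_train),
                       rmse(coeffs, x_val, y_val))

for degree, (train_err, val_err) in results.items():
    print(f"degree {degree}: train RMSE {train_err:.3f}, "
          f"validation RMSE {val_err:.3f}")
```

The more flexible model always reaches a training error at least as low as the simpler one, because it can also fit the noise. Its validation error is typically much higher than its training error, which is the same widening gap as the one observed between our training and validation loss curves.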

To overcome this problem, we used the dropout method. Dropout is a computationally cheap way to regularize a deep neural network.

Dropout works by probabilistically removing, or “dropping out,” inputs to a layer, which may be input variables in the data sample or activations from a previous layer. It has the effect of simulating a large number of networks with very different network structure and, in turn, making nodes in the network generally more robust to the inputs.
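The mechanism can be sketched in a few lines of NumPy. This is an illustrative re-implementation of the idea, not the Keras code: during training, each input is zeroed with probability rate and the surviving inputs are scaled by 1 / (1 - rate) so that their expected sum is unchanged. At inference time, the inputs pass through untouched.

```python
import numpy as np


def dropout(x, rate, training, rng):
    """Inverted dropout: zero inputs at random, rescale the survivors."""
    if not training or rate == 0.0:
        return x
    keep_mask = rng.random(x.shape) >= rate   # True where the input is kept
    return x * keep_mask / (1.0 - rate)


rng = np.random.default_rng(0)
activations = np.ones((4, 8))

train_out = dropout(activations, rate=0.25, training=True, rng=rng)
eval_out = dropout(activations, rate=0.25, training=False, rng=rng)

print(train_out)  # entries are either 0 or 1 / 0.75
print(eval_out)   # identical to the input
```

A different random mask is drawn at every training step, which is what gives the effect of training many slightly different networks.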

Dropout is available in Keras as a layer. We added it to our previous model: it is active during training, but it is not applied when evaluating the skill of the model. [Dropout]

single_step_model = tf.keras.models.Sequential()
single_step_model.add(tf.keras.layers.LSTM(32,
                                           input_shape=x_train_single.shape[-2:]))
single_step_model.add(tf.keras.layers.Dropout(.09))
single_step_model.add(tf.keras.layers.Dense(1))

single_step_model.compile(optimizer='SGD', loss=root_mean_squared_error)

After training the model with dropout, we obtained the following plot:

The results obtained with dropout are clearly much better than the previous ones. The training loss and validation loss curves are closer together, show the same behavior, and both converge towards a loss value of 0.08.

We therefore conclude that the model is sound and can efficiently predict the temperature at time \(t\) from the last 120 observations of temperature, humidity, and light.

2.6. Predict a single-step future

Now that the model is trained, we made a few sample predictions. The model is given the history of the three features (temperature, humidity, and light) over the past five days, sampled every hour (120 data points). Since the goal is to predict the temperature, the plot only displays the past temperature. With future_target = 0, the prediction is made for the time step immediately following the history window, i.e. one hour into the future:

3. Multi-step model

In a multi-step prediction model, the model learns to predict a range of future values from a given past history. Thus, unlike a single-step model, which predicts only a single future point, a multi-step model predicts a sequence of future values.

3.1. Data separation

For the multi-step model, the training data again consists of recordings over the past five days sampled every hour. However, here, the model needs to learn to predict the temperature for the next 3 hours.

future_target = 3

x_train_multi, y_train_multi = multivariate_data(dataset, dataset[:, 1], 0,
                                                 TRAIN_SPLIT, past_history,
                                                 future_target, STEP)
x_val_multi, y_val_multi = multivariate_data(dataset, dataset[:, 1],
                                             TRAIN_SPLIT, None, past_history,
                                             future_target, STEP)
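The only difference from the single-step case is the label attached to each window. The sketch below shows this on a toy target series, using plain slicing that mirrors the two branches of multivariate_data. The toy values and the history size of 4 are illustrative, not the settings used in this study.

```python
import numpy as np

target = np.arange(10.0)   # toy target series: 0.0, 1.0, ..., 9.0
history_size = 4
target_size = 3

i = history_size                                  # index right after the first window
window_rows = list(range(i - history_size, i))    # rows fed to the model

single_label = target[i]                  # single-step label (future_target = 0)
multi_label = target[i:i + target_size]   # multi-step label: the next 3 values

# Number of windows when end_index is None: the last target_size rows
# cannot start a label sequence, so they are excluded.
n_windows = len(range(history_size, len(target) - target_size))

print(window_rows, single_label, multi_label, n_windows)
```

Each multi-step label is therefore a vector of length future_target, which is why the output layer of the multi-step model needs as many units as there are predicted time steps.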

The code for shuffling, batching and caching the data for the multi-step model is the same as for the single-step model.

3.2. Definition of the multi-step model

Since this task is somewhat more complicated than the previous one, the model now consists of two LSTM layers: an intermediate layer with 16 units and the 'relu' activation function has been added. Finally, since 3 predictions are made, the dense layer has 3 outputs.

multi_step_model = tf.keras.models.Sequential()
multi_step_model.add(tf.keras.layers.LSTM(32,
                                          return_sequences=True,
                                          input_shape=x_train_multi.shape[-2:]))
multi_step_model.add(tf.keras.layers.LSTM(16, activation='relu'))
multi_step_model.add(tf.keras.layers.Dense(3))

multi_step_model.compile(optimizer='SGD', loss=root_mean_squared_error)

3.3. Training the model

Here again we trained for 20 epochs and obtained the loss plot shown below:

3.4. Predict a multi-step future

3.5. Hypertuning

To improve our model as much as possible, we used the Keras Tuner library, which helps pick the optimal set of hyperparameters for a TensorFlow program. The process of selecting the right set of hyperparameters for a machine learning (ML) application is called hyperparameter tuning or hypertuning.

Hyperparameters are the variables that govern the training process and the topology of an ML model. These variables remain constant over the training process and directly impact the performance of the ML program. [Keras_tuner]

In our study, we tried to apply hypertuning with Bayesian optimization (the BayesianOptimization tuner). Bayesian optimization is a sequential design strategy for the global optimization of black-box functions that does not assume any functional form. It is usually employed to optimize functions that are expensive to evaluate. [Bayesian_optimization]

The associated code of this step is presented in the colab notebook.

References

Deep Learning for Time Series Forecasting by Jason Brownlee.
Deep Time Series Forecasting with Python: An Intuitive Introduction to Deep Learning for Applied Time Series Modeling by N. D. Lewis.
Hands-On Machine Learning with Scikit-Learn and TensorFlow by Aurélien Géron.
[Dropout] tf.keras.layers.Dropout.
[Keras_tuner] Introduction to the Keras Tuner.
[Bayesian_optimization] Exploring Bayesian Optimization.