import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edge("Switzerland", "Greater-Zurich")
G.add_edge("Switzerland", "Ticino")
G.add_edge("Greater-Zurich", "Zurich")
G.add_edge("Greater-Zurich", "Winterthur")
G.add_edge("Ticino", "Lugano")
G.add_edge("Ticino", "Bellinzona")

pos = {
    "Switzerland": (4, 1),
    "Greater-Zurich": (6, 0.5),
    "Zurich": (5, 0),
    "Winterthur": (7, 0),
    "Ticino": (2, 0.5),
    "Lugano": (1, 0),
    "Bellinzona": (3, 0),
}
options = {
    "node_color": "white",
    "edgecolors": "black",
    "linewidths": 0,
    "width": 2,
}

_ = plt.figure(figsize=(6, 3))
nx.draw_networkx(G, pos=pos, **options)
plt.axis("off")
plt.show()
Probabilistic forecast reconciliation
In this case study we implement and test methods for probabilistic forecast reconciliation based on the recent papers by Panagiotelis et al. (2022) and Zambon, Azzimonti, and Corani (2022).
Forecast reconciliation is often found in hierarchical time series analysis scenarios, i.e., where time series can be (dis-)aggregated linearly by various attributes in a hierarchical way. For instance, consider a scenario in retail forecasting where we are interested in predicting the number of sold books per day. Forecasts are often required for all levels of a hierarchy, i.e., for the time series on the very bottom (e.g., cities), the inner nodes of a hierarchy (e.g., regions, counties) and the root node of the hierarchy (e.g., country) and it is natural to want the forecasts to add up in the same way as the data. For example, forecasts of sales per city should add up to forecasts of sales per canton, etc. In this case the hierarchy is induced by the granularity of the location as exemplified by the tree below:
Probabilistic forecast reconciliation aims at generating coherent probabilistic forecasts over all levels of the hierarchy.
Before we implement the two methods introduced in Panagiotelis et al. (2022) and Zambon, Azzimonti, and Corani (2022), we briefly introduce probabilistic reconciliation. We then implement and test the two methods where we use GPJax
to produce forecasts with Gaussian processes. I bundled the code for this case study as a Python package called reconcile
which can be found here.
import distrax
import gpjax as gpx
import jax
from jax import numpy as jnp
from jax import random
import numpy as np
import optax
import pandas as pd
import pathlib
from reconcile.forecast import Forecaster
from reconcile.grouping import Grouping
from reconcile.probabilistic_reconciliation import ProbabilisticReconciliation
import matplotlib.pyplot as plt
import seaborn as sns
import palettes
from jax.config import config
config.update("jax_enable_x64", True)
Probabilistic reconciliation
Denote with \(\mathbf{b}^t \in \mathbb{R}^P\) a vector of observations of “bottom-level” time series at time \(t\) and with \(y^t_{n}\) an observation of node \(n\) at time \(t\). For instance, in the example above \(\mathbf{b}^t = \{y^t_{\text{Lugano}}, y^t_{\text{Bellinzona}}, y^t_{\text{Zurich}}, y^t_{\text{Winterthur}} \}\). We can aggregate the bottom-level observations, such that parents of nodes are the sums of their children, via a summing matrix \(\mathbf{S} \in \{0, 1\}^{Q \times P}\)
\[ \mathbf{y}^t = \mathbf{S}\mathbf{b}^t \]
where \(\mathbf{S}\) defines the hierarchical structure of the time series. Again, as an example, for the hierarchy above
\[ \mathbf{S} = \begin{pmatrix} \mathbf{A}\\ \mathbf{I}\\ \end{pmatrix} = \begin{pmatrix} 1 & 1 & 1 & 1\\ 1 & 1 & 0 & 0\\ 0 & 0 & 1 & 1\\ 1 & 0 & 0 & 0\\ 0 & 1 & 0 & 0\\ 0 & 0 & 1 & 0\\ 0 & 0 & 0 & 1\\ \end{pmatrix} \]
where \(\mathbf{A}\) is called the aggregation matrix, \(\mathbf{I}\) is an identity matrix, and the vector \(\mathbf{y}^t =\{y^t_{\text{Switzerland}}, y^t_{\text{Ticino}}, y^t_{\text{Greater-Zurich}}, y^t_{\text{Lugano}},y^t_{\text{Bellinzona}}, y^t_{\text{Zurich}}, y^t_{\text{Winterthur}}\}\) contains all the time series over the hierarchy. When we generate forecasts \(\hat{\mathbf{y}}\) for all of these time series, they might not cohere to the linear constraints induced by the hierarchy.
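As a quick sanity check of the notation, the summing matrix for the example hierarchy can be written down directly; this is a small NumPy sketch of ours with made-up bottom-level values:

```python
import numpy as np

# aggregation matrix A: one row per upper-level node
# (Switzerland, Ticino, Greater-Zurich), one column per
# bottom-level node (Lugano, Bellinzona, Zurich, Winterthur)
A = np.array([
    [1, 1, 1, 1],
    [1, 1, 0, 0],
    [0, 0, 1, 1],
])
# the summing matrix stacks A on top of an identity matrix
S = np.vstack([A, np.eye(4, dtype=int)])

b = np.array([3, 1, 5, 2])  # hypothetical bottom-level observations
y = S @ b                   # all series over the hierarchy
# the aggregates cohere with the hierarchy
assert y[0] == b.sum()      # Switzerland
assert y[1] == b[0] + b[1]  # Ticino
assert y[2] == b[2] + b[3]  # Greater-Zurich
```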
In a probabilistic (e.g., Bayesian) framework forecasts are available in the form of probability distributions. Following Zambon, Azzimonti, and Corani (2022), we denote \(\hat{\nu} \in \mathbb{P}(\mathbb{R}^Q)\) the forecast distribution of \(\hat{\mathbf{y}}\) and call it coherent (\(\tilde{\nu}\)) if \(\text{supp}(\hat{\nu})\) is in a linear subspace \(\mathcal{S}\) of dimension \(P\) of \(\mathbb{R}^Q\).
To find a coherent forecast distribution \(\tilde{\nu}\), Panagiotelis et al. (2022) propose to fit a map \(\psi: \mathbb{R}^Q \rightarrow \mathcal{S}\), such that \(\tilde{\nu} = \psi_{\sharp}\hat{\nu}\). Panagiotelis et al. (2022) use a simple linear transformation \(\psi(\mathbf{y}) = \mathbf{S}(\mathbf{d} + \mathbf{W}\mathbf{y})\) and fit it using an approximate energy score as objective.
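To make this concrete, here is a minimal JAX sketch (ours, not the implementation in reconcile) of the map \(\psi\) and a sample-based approximation of the energy score; the parameter names d and W follow the formula above, and the choice of parameters here is arbitrary:

```python
import jax
from jax import numpy as jnp

A = jnp.array([[1., 1., 1., 1.],
               [1., 1., 0., 0.],
               [0., 0., 1., 1.]])
S = jnp.vstack([A, jnp.eye(4)])  # summing matrix, Q = 7, P = 4

def psi(params, ys, S):
    """Linear reconciliation map psi(y) = S(d + Wy), applied row-wise to samples ys."""
    d, W = params["d"], params["W"]          # d: (P,), W: (P, Q)
    return (S @ (d[:, None] + W @ ys.T)).T   # (n, Q), coherent by construction

def energy_score(samples, y):
    """Sample-based approximation of E||X - y|| - 1/2 E||X - X'||."""
    X, Xp = samples[::2], samples[1::2]      # two (nearly) independent halves
    return (jnp.mean(jnp.linalg.norm(X - y, axis=-1))
            - 0.5 * jnp.mean(jnp.linalg.norm(X - Xp, axis=-1)))

# push base forecast samples through psi with some fixed parameters
params = {"d": jnp.zeros(4), "W": jnp.zeros((4, 7)).at[:, 3:].set(jnp.eye(4))}
ys = jax.random.normal(jax.random.PRNGKey(0), (100, 7))
zs = psi(params, ys, S)
# the upper levels of the reconciled samples equal A times the bottom levels
assert jnp.allclose(zs[:, :3], zs[:, 3:] @ A.T, atol=1e-5)
# score the reconciled samples against a (hypothetical) observation
es = energy_score(zs, jnp.zeros(7))
```

In the actual method, d and W are fitted by minimizing the energy score of the reconciled samples against observations, e.g., with a gradient-based optimizer such as optax.adam.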
Zambon, Azzimonti, and Corani (2022) propose sampling from the reconciled base forecast density \(\tilde{\pi}(\mathbf{b})\) of \(\mathbf{b}\) via \(\tilde{\pi}(\mathbf{b}) \propto \pi_U(\mathbf{A}\mathbf{b}) \pi_B(\mathbf{b})\) where \(\mathbf{A}\) is the aggregation matrix, and \(\pi_U\) and \(\pi_B\) are the predictive densities of the upper level time series, i.e., the ones corresponding to the inner nodes and root, and bottom level time series.
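A minimal sketch of this idea (again ours, with made-up Gaussian base forecasts, not the sampler used in reconcile): the unnormalized reconciled log-density is \(\log \pi_U(\mathbf{A}\mathbf{b}) + \log \pi_B(\mathbf{b})\), which we can sample with a plain random-walk Metropolis sampler:

```python
import jax
from jax import numpy as jnp
from jax.scipy.stats import multivariate_normal as mvn

A = jnp.array([[1., 1., 1., 1.],
               [1., 1., 0., 0.],
               [0., 0., 1., 1.]])
# hypothetical Gaussian base forecasts; the upper-level means are
# deliberately incoherent with the bottom-level ones (A @ mu_B = [10, 4, 6])
mu_U = jnp.array([12., 5., 7.])
mu_B = jnp.array([2., 2., 3., 3.])

def log_reconciled(b):
    """Unnormalized log of pi~(b), proportional to pi_U(A b) * pi_B(b)."""
    return (mvn.logpdf(A @ b, mu_U, jnp.eye(3))
            + mvn.logpdf(b, mu_B, jnp.eye(4)))

def metropolis(key, log_prob, b0, n=2000, step=0.3):
    """Random-walk Metropolis sampler targeting exp(log_prob)."""
    def body(carry, key):
        b, lp = carry
        key_prop, key_accept = jax.random.split(key)
        proposal = b + step * jax.random.normal(key_prop, b.shape)
        lp_prop = log_prob(proposal)
        accept = jnp.log(jax.random.uniform(key_accept)) < lp_prop - lp
        carry = (jnp.where(accept, proposal, b), jnp.where(accept, lp_prop, lp))
        return carry, carry[0]
    _, samples = jax.lax.scan(body, (b0, log_prob(b0)), jax.random.split(key, n))
    return samples

samples = metropolis(jax.random.PRNGKey(0), log_reconciled, mu_B)
# the reconciled bottom-level forecasts are pulled towards
# coherence with the upper-level forecasts
reconciled_upper_mean = jnp.mean(samples @ A.T, axis=0)
```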
For further details, please refer to the original publications.
Data
We will test the two methods on a finance data set, specifically stock index data of some constituents of the SP500 which we take from datahub and Kaggle.
Of the SP500, we somewhat arbitrarily picked some constituents from the health care and information technology sectors and selected all entries from 2022 on. In that case the hierarchy consists of two levels, where “Health Care” and “Information Technology” are the parents of the different companies (which are leaves).
def preprocess():
    constituents_sp500 = pd.read_csv("./constituents.csv")
    constituents_sp500 = constituents_sp500[
        (constituents_sp500.Sector == "Health Care")
        | (constituents_sp500.Sector == "Information Technology")
    ]

    dfs = []
    for constituent in constituents_sp500.iterrows():
        path = f"./stocks/{constituent[1].Symbol}.csv"
        if pathlib.Path(path).exists():
            df = pd.read_csv(path)
            df["Sector"] = constituent[1].Sector
            df["Name"] = constituent[1].Name
            dfs.append(df)
    df = pd.concat(dfs)

    df_constituents = [
        "Illumina Inc", "Johnson & Johnson", "Pfizer Inc.",
        "Thermo Fisher Scientific", "Merck & Co.",
        "Activision Blizzard", "Boston Scientific", "Netflix Inc.",
        "Microsoft Corp.", "Seagate Technology",
    ]
    df = df[df.Name.isin(df_constituents)]

    df.Date = pd.to_datetime(df.Date, infer_datetime_format=True)
    df = df[df.Date > np.datetime64("2021-12-31")]

    return df

df = preprocess()
Let’s have a look at the different stock time series.
df["Standardized"] = df.groupby(
    ["Sector", "Name"]
)["Close"].transform(lambda x: (x - x.mean()) / x.std())

g = sns.FacetGrid(
    df,
    row="Sector",
    hue="Name",
    palette=sns.color_palette("crest"),
    sharex=False,
    sharey=False,
    height=2.5,
    aspect=3,
)
_ = g.map_dataframe(
    sns.lineplot, x="Date", y="Standardized", style="Name",
)
_ = g.set_axis_labels("", "Standardized index")
plt.show()

df = df.drop(columns="Standardized")
Before we start forecasting, we postprocess the data, e.g., by transforming the dates into numeric values, and then extract the time series from the data frame as arrays.
def postprocess(df):
    df["Series"] = df["Sector"] + ":" + df["Name"]
    df = df[["Date", "Series", "Close"]]
    df = df.rename(columns={"Close": "Value"})
    df = df.pivot(index="Date", columns="Series", values="Value").reset_index()
    df = df.dropna()
    df.insert(
        0, "x", (df.Date - np.min(df.Date)) / (np.max(df.Date) - np.min(df.Date))
    )
    return df


def get_bottom_level_timeseries(df):
    x = df.x.values.reshape(1, 1, -1)
    b = df.drop(columns=["Date", "x"]).values.T
    b = b.reshape(1, *b.shape)
    return b, x


D = postprocess(df)
b, x = get_bottom_level_timeseries(D)
Using reconcile
we can define a hierarchy (or more generally a grouping) using the Grouping
class. To do so, we first create a “:”-separated string for every bottom level time series that specifies the path of the time series to the root of the hierarchy.
hierarchy = list(D.drop(columns=["Date", "x"]).columns)
hierarchy
['Health Care:Boston Scientific',
'Health Care:Illumina Inc',
'Health Care:Johnson & Johnson',
'Health Care:Merck & Co.',
'Health Care:Pfizer Inc.',
'Health Care:Thermo Fisher Scientific',
'Information Technology:Activision Blizzard',
'Information Technology:Microsoft Corp.',
'Information Technology:Netflix Inc.',
'Information Technology:Seagate Technology']
Since the hierarchy in our example is very flat (it has only one inner level), each string contains only one colon. For instance, “Boston Scientific” is in the health care sector, hence we create an entry “Health Care:Boston Scientific”. We then create a grouping using the list of strings.
groups = pd.DataFrame({"hierarchy": hierarchy})
grouping = Grouping(groups)
Forecasts
Before we can probabilistically reconcile forecasts, we first need to generate probabilistic forecasts (yupp), or rather predictive distributions. We use GPJax
’s Gaussian process implementations for this. Specifically, we fit a Gaussian process to every time series separately and then, in a post-processing step, reconcile those forecasts.
GPs are arguably not the best model here, but for the sake of demonstrating the two reconciliation methods it is fine, since we don’t aim to make perfectly accurate predictions here.
We first get all the time series, i.e., bottom and upper level time series, from the Grouping
object.
all_timeseries = grouping.all_timeseries(b)
all_features = jnp.tile(x, [1, all_timeseries.shape[1], 1])
We then define a forecasting class. The class needs to inherit from Forecaster
which is implemented in the Python package we developed for this case study. The class needs to provide a fit
method that fits all of the time series, a posterior_predictive
method which returns a posterior predictive distribution of all time series as distrax
object, and a method predictive_posterior_probability
that computes the probability of observing some event under the posterior predictive.
class GPForecaster(Forecaster):
    """Example implementation of a forecaster"""

    def __init__(self):
        self._models = []
        self._xs = None
        self._ys = None

    @property
    def data(self):
        """Returns the data"""
        return self._ys, self._xs

    def fit(self, rng_key, ys, xs):
        """Fit a model to each of the time series"""
        self._xs = xs
        self._ys = ys
        p = xs.shape[1]
        self._models = [None] * p
        for i in np.arange(p):
            x, y = xs[:, [i], :], ys[:, [i], :]
            # fit a model for each time series
            learned_params, D = self._fit_one(rng_key, x, y)
            # save the learned parameters and the original data
            self._models[i] = learned_params, D

    def _fit_one(self, rng_key, x, y):
        # here we use GPs to model the time series
        D = gpx.Dataset(X=x.reshape(-1, 1), y=y.reshape(-1, 1))
        posterior, likelihood = self._model(rng_key, D.n)

        parameter_state = gpx.initialise(posterior, key=rng_key)
        mll = jax.jit(posterior.marginal_log_likelihood(D, negative=True))

        optimiser = optax.adam(learning_rate=5e-2)
        inference_state = gpx.fit(
            mll, parameter_state, optimiser, n_iters=2000, verbose=False
        )
        learned_params, _ = inference_state.unpack()

        return learned_params, D

    @staticmethod
    def _model(rng_key, n):
        kernel = gpx.RBF() + gpx.Matern32()
        prior = gpx.Prior(mean_function=gpx.Constant(), kernel=kernel)
        likelihood = gpx.Gaussian(num_datapoints=n)
        posterior = prior * likelihood
        return posterior, likelihood

    def posterior_predictive(self, rng_key, xs_test):
        """Compute the joint posterior predictive distribution
        of all time series at xs_test"""
        q = xs_test.shape[1]
        means = [None] * q
        covs = [None] * q
        for i in np.arange(q):
            x_test = xs_test[:, [i], :].reshape(-1, 1)
            learned_params, D = self._models[i]
            posterior, likelihood = self._model(rng_key, D.n)
            latent_distribution = posterior(learned_params, D)(x_test)
            predictive_dist = likelihood(learned_params, latent_distribution)
            means[i] = predictive_dist.mean()
            cov = jnp.linalg.cholesky(predictive_dist.covariance_matrix)
            covs[i] = cov.reshape((1, *cov.shape))

        # here we stack the means and covariance functions of all
        # GP models we used
        means = jnp.vstack(means)
        covs = jnp.vstack(covs)

        # here we use a single distrax distribution to model the predictive
        # posterior of _all_ models
        posterior_predictive = distrax.MultivariateNormalTri(means, covs)
        return posterior_predictive

    def predictive_posterior_probability(self, rng_key, ys_test, xs_test):
        """Compute the log predictive posterior probability of an observation"""
        preds = self.posterior_predictive(rng_key, xs_test)
        lp = preds.log_prob(ys_test)
        return lp
The time series have a total length of 204, of which we take the first 200 observations for training.
all_timeseries.shape
(1, 13, 204)
idx_train = 200
We then fit the GPForecaster
to the time series.
forecaster = GPForecaster()
forecaster.fit(
    random.PRNGKey(1), all_timeseries[:, :, :idx_train], all_features[:, :, :idx_train]
)
As a baseline for the two reconciliation methods, we also sample from the posterior predictive of the base forecasts.
y_predictive = forecaster.posterior_predictive(
    random.PRNGKey(1), all_features
)
y_predictive_samples = y_predictive.sample(
    seed=random.PRNGKey(2), sample_shape=(1000,)
)
b_predictive_mean = jnp.mean(y_predictive_samples, axis=0, keepdims=True)[:, 3:, :]
D_pred = pd.DataFrame(b_predictive_mean[0].T, columns=hierarchy)
D_pred.insert(0, "Data type", "Predicted")
D_pred.insert(1, "Date", D.Date)
Next we use a method similar to Panagiotelis et al. (2022) to reconcile the base forecasts.
recon = ProbabilisticReconciliation(grouping, forecaster)
b_pred_fit_recon = recon.fit_reconciled_posterior_predictive(
    random.PRNGKey(1), all_features, n_samples=1000, n_iter=1000,
)
b_pred_fit_recon_mean = jnp.mean(b_pred_fit_recon, axis=0, keepdims=True)
D_fit_recon = pd.DataFrame(b_pred_fit_recon_mean[0].T, columns=hierarchy)
D_fit_recon.insert(0, "Data type", "Reconciled via optimization")
D_fit_recon.insert(1, "Date", D.Date)
We use the MCMC-based method in Zambon, Azzimonti, and Corani (2022) for sampling-based reconciliation.
b_pred_sample_recon = recon.sample_reconciled_posterior_predictive(
    random.PRNGKey(1), all_features
)
b_pred_sample_recon_mean = jnp.mean(
    jnp.mean(b_pred_sample_recon, axis=1, keepdims=False), axis=0, keepdims=True
)
D_sample_recon = pd.DataFrame(b_pred_sample_recon_mean[0].T, columns=hierarchy)
D_sample_recon.insert(0, "Data type", "Reconciled via sampling")
D_sample_recon.insert(1, "Date", D.Date)
Let’s visualize the results. We first aggregate all forecasts in a data frame and then plot the original data, the base forecasts and the two reconciled forecasts.
D_all = D.drop(columns="x")
D_all.insert(0, "Data type", "Real")
D_all = pd.concat([D_all, D_pred, D_sample_recon, D_fit_recon], axis=0)
D_all = D_all.melt(id_vars=["Date", "Data type"])
D_all[["Sector", "Name"]] = D_all.variable.str.split(":", expand=True)
g = sns.FacetGrid(
    D_all,
    col="Name",
    col_wrap=2,
    sharex=False,
    sharey=False,
    height=1.75,
    aspect=2,
)
_ = g.map_dataframe(
    sns.lineplot, x="Date", y="value", style="Data type", color="black"
)
_ = g.tight_layout()
_ = g.set_axis_labels("", "Index")
_ = g.set(xticklabels=[])
_ = g.add_legend()
plt.show()
The results look decent for some of the time series but worse than the base forecasts for others. Given that the time series don’t show clear trends or seasonalities, the task was arguably a very difficult one though.
Conclusion
This case study implemented and tested two methods from recent papers on probabilistic forecast reconciliation. We evaluated the methods (superficially) on stock-index time series data of the SP500 with mixed results. Stock-index data is notoriously hard to forecast due to its random-walk-like behavior, so the evaluation was not very meaningful, but rather intended to better understand probabilistic reconciliation methods.
The methods demonstrated above are implemented in a Python package called reconcile
which can be found on PyPI and GitHub.
Session info
import session_info
session_info.show(html=False)
-----
distrax 0.1.2
gpjax 0.5.1
jax 0.3.25
jaxlib 0.3.25
matplotlib 3.6.2
networkx 2.8.6
numpy 1.23.4
optax 0.1.3
palettes NA
pandas 1.5.1
reconcile NA
seaborn 0.11.2
session_info 1.0.0
-----
IPython 8.4.0
jupyter_client 7.3.4
jupyter_core 4.10.0
jupyterlab 3.3.4
notebook 6.4.12
-----
Python 3.9.10 | packaged by conda-forge | (main, Feb 1 2022, 21:27:43) [Clang 11.1.0 ]
macOS-12.6-arm64-i386-64bit
-----
Session information updated at 2022-11-20 12:33
License
The case study is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.