Better time series forecasting using expert knowledge

15.2.2019 | 6 minutes reading time

Methods for time series forecasting have become more and more powerful in recent decades, ranging from simple linear models to complex machine learning algorithms. However, the quality of the forecasts is not the only thing that matters: their acceptance by the staff is just as important. Automatic forecasts in particular can meet with distrust and incomprehension among long-serving dispatchers. Furthermore, senior employees in many cases have a very good overview of customer behavior, market situation and development, economic conditions and many other important factors. Therefore, it makes sense to include this expert knowledge in the predictions of machine learning algorithms.

This blog post shows a way to include expert knowledge in the predictions of arbitrary algorithms (Python source code: Maximum Entropy Example).

Basic forecast using Facebook Prophet

We start with the famous air passengers time series, which shows the monthly totals of international airline passengers from 1949 to 1960 in thousands. From this series we would like to predict the year 1960:

In order to do this, we use Facebook Prophet with multiplicative seasonality:

The red circles mark the forecasts for May and July 1960, which are visibly off. Fortunately, Facebook Prophet does not only provide us with the point forecasts, but also with associated Markov-Chain-Monte-Carlo samples $y^{\ast}_{i}$ from the posterior predictive distribution of each forecast step. Let’s take a look at the kernel density estimate of the posterior predictive distribution $p_{0}\left(y\right)$ of the forecast for May 1960:

Calculating the integral $\int_{-\infty}^{\infty} y \ p_{0}\left(y\right) dy \approx \frac{1}{n} \sum y^{\ast}_{i}$ yields the point forecast for May, which is $\hat{y}_{\text{May}}= 440 $. In order to improve the forecast, it would be useful if we were able to enrich the posterior predictive distribution by expert views about future events.
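The Monte Carlo approximation above can be sketched in a few lines. This is a minimal illustration, not the post's code: the samples are synthetic normal draws (mean 440 and standard deviation 20 are assumptions) standing in for the predictive samples, and `gaussian_kde` is a hypothetical hand-rolled helper with a hand-picked bandwidth.

```python
import math
import random
import statistics

random.seed(0)

# Hypothetical stand-in for the predictive samples y*_i of one forecast step.
# These are synthetic normal draws (assumed mean 440, sd 20), not Prophet output.
samples = [random.gauss(440.0, 20.0) for _ in range(2_000)]

# Monte Carlo estimate of the integral of y * p0(y) dy: the sample mean.
point_forecast = statistics.fmean(samples)


def gaussian_kde(data, bandwidth):
    """Return a function that evaluates a Gaussian kernel density estimate."""
    norm = 1.0 / (len(data) * bandwidth * math.sqrt(2.0 * math.pi))

    def density(y):
        return norm * sum(math.exp(-0.5 * ((y - s) / bandwidth) ** 2) for s in data)

    return density


p0 = gaussian_kde(samples, bandwidth=5.0)  # bandwidth chosen by hand
print(f"point forecast: {point_forecast:.1f}")
print(f"density at 440: {p0(440.0):.4f}, density at 500: {p0(500.0):.4f}")
```

With real model output, the synthetic list would simply be replaced by the sample column belonging to the forecast step of interest.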

Mathematical background

The starting point is the Kullback-Leibler divergence:
$$\text{KL}\left[p,p_{0}\right] = \int_{-\infty}^{\infty} p\left(y\right)\text{log}\frac{p\left(y\right)}{p_{0}\left(y\right)} dy.$$
Given the prior $p_{0}\left(y\right)$, we seek the distribution $p\left(y\right)$ that minimizes the functional $\text{KL}$ subject to certain constraints. In other words: we are looking for the distribution $p\left(y\right)$ that has some predefined properties and comes as close as possible to our prior knowledge $p_{0}\left(y\right)$. The distribution $p\left(y\right)$ is then called the Maximum Entropy distribution. What could these constraints look like? What could the expert say?

“The probability of 400,000 or fewer passengers for next July in my view is 5%.”
$\Leftrightarrow$
$\int_{-\infty}^{400} p\left(y\right)dy \overset{!}{=} 0.05$

“We have a strongly growing economy, so I think with 80% probability we will have between 440,000 and 480,000 passengers.”
$\Leftrightarrow$
$\int_{440}^{480} p\left(y\right)dy \overset{!}{=} 0.8$

“I expect 460,000 passengers.”
$\Leftrightarrow$
$\int_{-\infty}^{\infty} y \ p\left(y\right)dy \overset{!}{=} 460$

Therefore, our constraints $k=1,2,\dots,m$ are of the form
$$\int_{-\infty}^{\infty} F_{k}\left(y\right) \ p\left(y\right)dy \overset{!}{=} f_{k}.$$

What does $F_{k}\left(y\right)$ mean? This is best understood by inspecting the second and third example constraints. For the second example, it is
$$F\left(y\right) = \begin{cases}
1, \text{if} \ y \in \left[440,480\right]\\
0, \text{else}
\end{cases}$$
and for the third example, we simply have
$$F\left(y\right) = y.$$
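To make the role of $F_{k}$ concrete, the following sketch evaluates both example functions as sample averages. It is only an illustration under assumptions: the samples are synthetic normal draws (mean 460, standard deviation 15), and the helper names `indicator`, `identity` and `expectation` are hypothetical.

```python
import random

random.seed(1)

# Synthetic stand-in for samples from some distribution p(y); values in thousands.
samples = [random.gauss(460.0, 15.0) for _ in range(5_000)]


def indicator(lower, upper):
    """F(y) = 1 if y lies in [lower, upper], else 0 (second example)."""
    return lambda y: 1.0 if lower <= y <= upper else 0.0


def identity(y):
    """F(y) = y (third example)."""
    return y


def expectation(F, data):
    """Monte Carlo estimate of the integral of F(y) * p(y) dy."""
    return sum(F(y) for y in data) / len(data)


prob_interval = expectation(indicator(440.0, 480.0), samples)
mean_value = expectation(identity, samples)
print(f"P(440 <= y <= 480) = {prob_interval:.3f}")
print(f"E[y] = {mean_value:.1f}")
```

An indicator function turns the constraint into a statement about a probability, while $F\left(y\right) = y$ turns it into a statement about the mean.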

In order to minimize $\text{KL}$ under constraints, the Lagrange multipliers $\boldsymbol{\lambda} = \left(\lambda_{1}, \lambda_{2},\dots,\lambda_{m}\right)$ have to be introduced. We arrive at the functional:

$$ L\left[p, \boldsymbol{\lambda}\right] = \int_{-\infty}^{\infty} p\left(y\right)\text{log}\frac{p\left(y\right)}{p_{0}\left(y\right)}dy-\lambda_{1}\left(\int_{-\infty}^{\infty} F_{1}\left(y\right) \ p\left(y\right)dy - f_{1}\right)-\dots-\lambda_{m}\left(\int_{-\infty}^{\infty} F_{m}\left(y\right) \ p\left(y\right)dy - f_{m}\right).$$

The first step is to calculate the derivatives of $L$ with respect to $p$ and $\boldsymbol{\lambda}$ and to set them to zero. Beginning with the functional derivative with respect to $p$, we get

$$\frac{\delta L}{\delta p} = \text{log}\frac{p\left(y\right)}{p_{0}\left(y\right)}+1 - \lambda_{1}F_{1}\left(y\right)-\dots-\lambda_{m}F_{m}\left(y\right)\overset{!}{=}0.$$
After solving for $p\left(y\right)$ and normalizing the result, we arrive at the Boltzmann distribution
$$p_{B}\left(y\right) = \frac{1}{Z}p_{0}\left(y\right) e^{\lambda_{1}F_{1}\left(y\right)+\dots+\lambda_{m}F_{m}\left(y\right)}$$
with the normalizing constant
$$Z\left(\boldsymbol{\lambda}\right)=\int_{-\infty}^{\infty} p_{0}\left(y\right) e^{\lambda_{1}F_{1}\left(y\right)+\dots+\lambda_{m}F_{m}\left(y\right)}dy.$$
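Written out, the intermediate step behind this result looks as follows. This is a short derivation sketch; the symbol $\tilde{p}$ for the unnormalized solution is introduced here only for illustration.
$$\text{log}\frac{\tilde{p}\left(y\right)}{p_{0}\left(y\right)} = -1 + \sum_{k=1}^{m}\lambda_{k}F_{k}\left(y\right) \quad\Rightarrow\quad \tilde{p}\left(y\right) = e^{-1}\, p_{0}\left(y\right)\, e^{\sum_{k=1}^{m}\lambda_{k}F_{k}\left(y\right)}.$$
The constant factor $e^{-1}$ does not depend on $y$, so it cancels when $\tilde{p}\left(y\right)$ is divided by $\int_{-\infty}^{\infty}\tilde{p}\left(y\right)dy$, which leaves $p_{B}\left(y\right)$ with the normalizing constant $Z$ given above.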
The partial derivatives of $L$ with respect to $\boldsymbol{\lambda}$ read
$$\frac{\partial L}{\partial \lambda_{k}} = -\left(\int_{-\infty}^{\infty} F_{k}\left(y\right) \ p\left(y\right)dy-f_{k}\right)\overset{!}{=} 0,\ k=1,\dots,m.$$
As we have already calculated our normalized solution for $p\left(y\right)$, which is $p_{B}\left(y\right)$, we can insert this result into the derivatives:
$$\frac{\partial L}{\partial \lambda_{k}} = -\left(\int_{-\infty}^{\infty} F_{k}\left(y\right) \ \underbrace{\frac{1}{Z}p_{0}\left(y\right) e^{\lambda_{1}F_{1}\left(y\right)+\dots+\lambda_{m}F_{m}\left(y\right)}}_{p_{B}\left(y\right)}dy-f_{k}\right)\overset{!}{=} 0,\ k=1,\dots,m.$$
This, however, means nothing more than: $E\left[F_{k}\right]\overset{!}{=}f_{k},\ k=1,\dots,m$, where the expectation is taken with respect to $p_{B}$.
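When the prior is only available as samples, the expectation $E\left[F_{k}\right]$ under $p_{B}$ can be approximated by reweighting those samples. The sketch below assumes synthetic normal samples (mean 440, standard deviation 20) and a hand-picked $\lambda_{1} = 0.05$; the function names are hypothetical and not taken from the post's source code. For a normal prior with variance $\sigma^{2}$ and $F_{1}\left(y\right) = y$, the reweighting shifts the mean by $\lambda_{1}\sigma^{2}$, so the result should land near 460.

```python
import math
import random


def boltzmann_weights(samples, functions, lambdas):
    """Normalized weights proportional to exp(sum_k lambda_k * F_k(y_i)).

    The prior p0 is represented by the equally weighted samples themselves.
    """
    exponents = [
        sum(lam * F(y) for lam, F in zip(lambdas, functions)) for y in samples
    ]
    shift = max(exponents)  # subtracting the maximum avoids overflow in exp
    unnormalized = [math.exp(e - shift) for e in exponents]
    z = sum(unnormalized)  # sample-based counterpart of Z (up to the shift)
    return [w / z for w in unnormalized]


def expectation(F, samples, weights):
    """Weighted Monte Carlo estimate of E[F] under the reweighted distribution."""
    return sum(w * F(y) for y, w in zip(samples, weights))


random.seed(2)
samples = [random.gauss(440.0, 20.0) for _ in range(20_000)]  # synthetic prior

functions = [lambda y: y]  # F_1(y) = y, the mean constraint
weights = boltzmann_weights(samples, functions, lambdas=[0.05])
tilted_mean = expectation(functions[0], samples, weights)
print(f"E[F_1] under p_B: {tilted_mean:.1f}")
```

Setting all multipliers to zero returns uniform weights, that is, the prior itself.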

We are finally there: we have to find $\boldsymbol{\lambda}$, so that the expected values of the functions $F_{k}$ match the given constraints.

As the number of constraints rises, the numerical solution to the system of equations becomes increasingly hard to find. Due to the problem of multiple local minima, we refrain from using a gradient-based algorithm and instead use a heuristic algorithm. In our case, it is the particle swarm algorithm (Python package pyswarm).
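For a single constraint, the search for $\lambda$ can be illustrated without a swarm optimizer. The sketch below deliberately swaps in plain bisection instead of the particle swarm algorithm used in this post: with one constraint, $E\left[F\right]$ under $p_{B}$ is non-decreasing in $\lambda$ (its derivative is the variance of $F$), so a bracketing search is enough. The samples are synthetic, and the function names and the bracket $[-1, 1]$ are assumptions.

```python
import math
import random


def tilted_mean(samples, lam):
    """E[y] under weights proportional to exp(lam * y), with F(y) = y."""
    shift = max(lam * y for y in samples)  # guards against overflow in exp
    weights = [math.exp(lam * y - shift) for y in samples]
    return sum(w * y for w, y in zip(weights, samples)) / sum(weights)


def solve_lambda(samples, target, low=-1.0, high=1.0, tol=1e-8):
    """Find lam with tilted_mean(samples, lam) == target by bisection."""
    if not tilted_mean(samples, low) < target < tilted_mean(samples, high):
        raise ValueError("target is not bracketed by [low, high]")
    while high - low > tol:
        mid = 0.5 * (low + high)
        if tilted_mean(samples, mid) < target:
            low = mid
        else:
            high = mid
    return 0.5 * (low + high)


random.seed(4)
samples = [random.gauss(440.0, 20.0) for _ in range(5_000)]  # synthetic prior

lam = solve_lambda(samples, target=460.0)  # expert: "I expect 460,000 passengers."
print(f"lambda = {lam:.4f}, E[y] = {tilted_mean(samples, lam):.2f}")
```

With several constraints the problem becomes a search over a vector $\boldsymbol{\lambda}$, which is where a general-purpose optimizer such as the one named above comes in.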

Improving the forecasts for May and July

In this section we will make up expert assessments for May and July 1960 and show how the forecasts are affected.

The expert assessment for May:

“This May we had 420,000 passengers, and we will almost certainly not have fewer in May 1960 (probability 1%). Furthermore, given the numbers of the last three years, I am sure that a growth rate between 7.5% and 15% compared to this May is extremely probable (probability 80%). However, an increase of 15% or more compared to this May is, in my opinion, unrealistic (probability 1%).”

This results in the following constraints:

$\int_{-\infty}^{420} p_{B}\left(y\right)dy \overset{!}{=} 0.01$

$\int_{451}^{483} p_{B}\left(y\right)dy \overset{!}{=} 0.8$

$\int_{483}^{\infty} p_{B}\left(y\right)dy \overset{!}{=} 0.01$
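Because these three constraints are indicator functions on disjoint regions, the exponent of $p_{B}$ is constant within each region, and the Maximum Entropy solution simply rescales the prior mass of every region to its target mass. The sketch below uses this closed form instead of a numerical search. It assumes synthetic normal prior samples (mean 440, standard deviation 20) rather than the actual Prophet samples, and the names `region` and `targets` are hypothetical.

```python
import random
from collections import Counter

random.seed(3)
samples = [random.gauss(440.0, 20.0) for _ in range(20_000)]  # synthetic prior


def region(y):
    """Assign a value (in thousands) to one of the regions set by the constraints."""
    if y <= 420.0:
        return "below"   # fewer passengers than last May
    if y < 451.0:
        return "gap"     # growth below 7.5%
    if y <= 483.0:
        return "growth"  # growth between 7.5% and 15%
    return "above"       # growth of 15% or more


# Target probabilities from the expert; the gap receives the remaining mass.
targets = {"below": 0.01, "growth": 0.80, "above": 0.01}
targets["gap"] = 1.0 - sum(targets.values())

# Every sample in a region gets the same weight: target mass / number of samples.
counts = Counter(region(y) for y in samples)
weights = [targets[region(y)] / counts[region(y)] for y in samples]

prior_forecast = sum(samples) / len(samples)
maxent_forecast = sum(w * y for w, y in zip(weights, samples))
print(f"prior mean: {prior_forecast:.1f}, reweighted mean: {maxent_forecast:.1f}")
```

The reweighted mean plays the role of the new point forecast; the numbers printed here depend entirely on the assumed synthetic prior.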

The expert assessment for July:

“This July we had 448,000 passengers. Comparing the Julys of the past five years, we can see that we had an average increase of 50,000 passengers per year. Due to the good economic situation, I am sure that we will at least match this growth (probability 80%).” This yields the constraint: $\int_{498}^{\infty} p_{B}\left(y\right)dy \overset{!}{=} 0.8$.
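With a single indicator constraint, the Lagrange multiplier can even be written down explicitly: if $P_{0}$ is the prior probability of the region and $f$ the target probability, then $\lambda = \text{logit}\left(f\right) - \text{logit}\left(P_{0}\right)$. The following sketch checks this on synthetic samples; the prior (normal with mean 480 and standard deviation 20) is an assumption and not the Prophet output.

```python
import math
import random

random.seed(5)
samples = [random.gauss(480.0, 20.0) for _ in range(20_000)]  # synthetic prior

threshold = 498.0  # 448 + 50, in thousands
target = 0.8       # expert probability for y >= threshold

# Prior probability of the region, estimated from the samples.
prior_mass = sum(1 for y in samples if y >= threshold) / len(samples)


def logit(p):
    """Log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


# Closed-form multiplier for one indicator constraint.
lam = logit(target) - logit(prior_mass)

# Check: mass of the region under p_B = p0 * exp(lam * F) / Z.
z = prior_mass * math.exp(lam) + (1.0 - prior_mass)
posterior_mass = prior_mass * math.exp(lam) / z
print(f"prior mass: {prior_mass:.3f}, lambda: {lam:.3f}, new mass: {posterior_mass:.3f}")
```

A positive $\lambda$ means the expert assigns more probability to the region than the prior does.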

The following two figures show the distributions of the Facebook Prophet forecasts and the associated Maximum Entropy distributions. As can be seen, the expert’s assessments lead to distributions that differ significantly from the prior distributions. Nevertheless, the Maximum Entropy distributions have the smallest possible distance to the priors, while maintaining the given constraints.

The last figure shows the forecasts that result from the Maximum Entropy distributions alongside the Facebook Prophet forecasts. The RMSE of the Facebook Prophet forecasts is 64.90. Using the Maximum Entropy approach leads to an RMSE of 30.94, a reduction of approximately 52%.
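The quoted reduction follows directly from the two RMSE values. The short check below reproduces the arithmetic; the `rmse` helper is a generic definition added for illustration and is not taken from the post's source code.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equally long sequences."""
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


rmse_prophet = 64.90  # value reported in the text
rmse_maxent = 30.94   # value reported in the text

reduction = 1.0 - rmse_maxent / rmse_prophet
print(f"relative RMSE reduction: {reduction:.1%}")  # prints 52.3%
```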

This artificial example is intended to show that the inclusion of expert assessments, which in many cases may reflect only gut instincts or common sense, can be useful to improve the forecasts of complex machine learning algorithms. In addition, the inclusion of employee opinions may also increase the general acceptance of forecasts.

References:

Kullback, S., Information Theory and Statistics, John Wiley & Sons, 1959.
Singer, H., Maximum entropy inference for mixed continuous‐discrete variables, International Journal of Intelligent Systems, John Wiley & Sons, 2010.

Blog author

Dominik Ballreich
