Matrix Factorization for Ad Recommendation

14.3.2018 | 7 minutes reading time

This blog post describes how matrix factorization can be applied to the problem of ad targeting. It draws on my experience developing a machine-learning-based solution for this task for the real-time performance marketing company twiago, together with colleagues from our Data Science team.

The problem: Ad targeting

Let us begin by framing the problem: We have an idle website floating in the endless vastness of the internet. Like this:

The page contains two div containers A and B. Whenever a user visits our page, we want to place ads in these containers. For example, such an ad could be a banner ad as in the following picture:

Typically, there are many such candidate ads. The decision which ad is placed in which container is made by an ad server, which handles the request for ad material sent by the web page. This is depicted in the following illustration:

All of this has to happen in the blink of an eye. Finally, the page renders for the user, showing two ads.

Now for the next interesting thing that can happen: the user clicks on one of the ads. If so, the advertiser pays a certain amount of money to the operators of the ad server and to the page owner.

There is a whole industry built around this, and online advertising in general is a huge market.

As we have seen in our toy example, there are two major players: the publisher (web page owner) and the advertiser (the party advertising goods and services). Typically a publisher does not host one single web site, but several sites under their domain. A newspaper or an online sports magazine is a good example. Our ad server likewise serves a whole network of publishers. This is illustrated here:

There is a mutual benefit here: advertisers can publish on various sites, potentially reaching different audiences, while publishers benefit from “crowd wisdom” (or: collaborative filtering), as ads are exposed to more users and eventually flock to the right ad spaces.

The performance indicator: Click-through rate

We can measure the performance of an ad by its click-through rate (CTR). It estimates the probability that the ad receives a click when it is shown. Mathematically, it is given by this ratio:

CTR = (Number of clicks) / (Number of impressions)

In this formula, think of the denominator as the number of “unique views by unique users”. The click-through rate depends on both the ad space and the ad. Clicks and impressions are counted over a fixed time window (one week in our case).
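The ratio can be computed per (ad space, ad) combination by summing clicks and impressions over the time window. The sketch below is a minimal Python illustration. The production system described here used Scala and Spark. It assumes the ad server exports records of the form (ad_space, ad, clicks, impressions). That record layout, the function name `click_through_rates` and the toy numbers are all hypothetical. Combinations without impressions are skipped because their CTR is undefined, not zero.

```python
from collections import defaultdict

def click_through_rates(events):
    """Aggregate hypothetical (ad_space, ad, clicks, impressions) records from one
    time window into a dict mapping (ad_space, ad) -> CTR = clicks / impressions."""
    clicks = defaultdict(int)
    impressions = defaultdict(int)
    for ad_space, ad, n_clicks, n_impressions in events:
        clicks[(ad_space, ad)] += n_clicks
        impressions[(ad_space, ad)] += n_impressions
    # Skip combinations that were never shown: their CTR is undefined.
    return {key: clicks[key] / impressions[key]
            for key in impressions if impressions[key] > 0}

# Hypothetical weekly statistics: (ad_space, ad, clicks, impressions)
week = [
    ("A", "banner-1", 3, 1000),
    ("A", "banner-1", 2, 1000),
    ("B", "banner-2", 1, 400),
]
ctr = click_through_rates(week)
print(ctr[("A", "banner-1")])  # 0.0025
```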

Side note: What technically counts as an impression can be defined by industry standards (e.g. those provided by www.iab.com) and is part of the implementation of the ad serving technology. We can ignore the technical details and trust the ad server to count “impressions” systematically and to provide this information to us.

The problem restated: Filling missing values

We can rephrase the problem stated above as follows: we have to fill in the missing values of the click-through-rate matrix and rank ads by the predicted click-through rates. Of course, the predictions should agree with the values we already know.

Assume that we have M ad spaces in our complete network of publishers and N ads we could deliver. Then we can write down a matrix of shape (M x N) that records, for each possible combination (AdSpace, Ad), the currently observed click-through rate. We call this matrix the click-through-rate matrix (for short: CTR matrix).

Of course, this matrix contains missing values: not every ad has been (or is ever likely to be) shown on every ad space, so certain combinations have not occurred yet. As said, we would like to fill in these values under the constraint that our predictions do not deviate too much from the known values.
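As an illustration, here is a minimal Python/NumPy sketch of assembling the CTR matrix. It assumes the per-combination CTRs are already available as a dict keyed by (ad space, ad). The function name `ctr_matrix` and the toy values are hypothetical. Rows are ad spaces and columns are ads. Unobserved combinations are stored as NaN, so the mask of known entries, and with it the degree of filling, follows directly.

```python
import numpy as np

def ctr_matrix(ctr, ad_spaces, ads):
    """Arrange a {(ad_space, ad): ctr} mapping into an M x N array.
    Rows are ad spaces, columns are ads; unobserved combinations are NaN."""
    row = {s: i for i, s in enumerate(ad_spaces)}
    col = {a: j for j, a in enumerate(ads)}
    X = np.full((len(ad_spaces), len(ads)), np.nan)
    for (ad_space, ad), value in ctr.items():
        X[row[ad_space], col[ad]] = value
    return X

# Hypothetical observed CTRs: ad "ad1" has never been shown on ad space "B".
ctr = {("A", "ad1"): 0.004, ("A", "ad2"): 0.001, ("B", "ad2"): 0.003}
X = ctr_matrix(ctr, ad_spaces=["A", "B"], ads=["ad1", "ad2"])
known = ~np.isnan(X)  # mask of observed entries
print(known.sum(), "of", X.size, "entries observed")  # 3 of 4 entries observed
```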

Side note: Typically, such a matrix would be rather sparse. But since we target ad spaces (rather than individual users) on websites with a lot of traffic, impression and click counts from individual users add up rather quickly. Over the course of a week, the time frame of our batch job, this yields a reasonably well-filled matrix.

The solution: Matrix factorization

We are interested in predicting these missing values since they might reveal how well a yet unknown combination of ad and ad space would perform and whether it would make sense to bring them together.

How can we achieve this? A popular technique for such collaborative filtering tasks is matrix factorization, which is what we also use here.

Matrix factorization in a nutshell

To outline the geometric idea behind this, recall how the matrix product X = WH of an (M x K) matrix W with a (K x N) matrix H is defined in algebraic terms:

$$X[i,j] = \sum_{k=1}^{K} W[i,k]\, H[k,j]$$

This formula computes the entry X[i,j] at position (i,j) as the dot product of the i-th row of W with the j-th column of H. This is the algebraic definition usually given. A more appealing geometric picture is obtained by looking at whole columns instead, where X[:,j] denotes the j-th column of X and W[:,k] the k-th column of W:

$$X[:,j] = \sum_{k=1}^{K} H[k,j]\, W[:,k]$$

This says that the j-th column of the product X is obtained by weighting the columns of W with the corresponding entries of the j-th column of H and summing everything up.
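Both views of the product can be checked numerically. The NumPy sketch below uses small random matrices with arbitrarily chosen shapes (M = 4, K = 2, N = 3). It confirms two things. An entry of X = WH is a row-column dot product. A whole column of X is a weighted sum of the columns of W, with the weights taken from the matching column of H.

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(4, 2))  # M x K
H = rng.normal(size=(2, 3))  # K x N
X = W @ H                    # M x N

# Algebraic view: entry (i, j) is the dot product of row i of W and column j of H.
i, j = 1, 2
entry = W[i, :] @ H[:, j]

# Geometric view: column j of X is a weighted sum of the columns of W,
# with the weights H[k, j] taken from column j of H.
column = sum(H[k, j] * W[:, k] for k in range(W.shape[1]))

print(np.isclose(X[i, j], entry), np.allclose(X[:, j], column))  # True True
```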
Depending on your taste, this might be a little too much math. So let us give an interpretation in more layman's terms: whenever we take a data matrix X of records and write it as a product X = WH, as in the following picture,

we can use the second factor H as the new representation of the data (column j of H encodes column j of X) and reconstruct the original data using the first factor W, as seen here:

Now, for a given matrix X there are generally many ways to decompose X as a product WH. Of particular interest is a pair (W, H) in which the number of rows of H (equivalently: the number of columns of W) is strictly smaller than the number of rows of the original matrix X. This means that we reduce the number of features in our representation, which corresponds to data compression.
The idea is that during the compression we learn which information present in X is important and which is not. But how do we learn this compressed representation?
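To make the compression argument concrete, here is a sketch for the special case where every entry of X is known. In that case the truncated singular value decomposition (SVD) gives the best rank-K approximation in the least-squares sense (Eckart–Young theorem). This is a swapped-in illustration only. It does not apply directly to the CTR matrix, whose missing entries rule out a plain SVD. The function name `truncated_factorization` and the synthetic 50 x 40 data are hypothetical.

```python
import numpy as np

def truncated_factorization(X, k):
    """Best rank-k approximation of a *fully observed* X (Eckart-Young),
    returned as factors W (M x k) and H (k x N) with X ~ W @ H."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    W = U[:, :k] * s[:k]  # first k left singular vectors, scaled by singular values
    H = Vt[:k, :]         # first k right singular vectors
    return W, H

rng = np.random.default_rng(1)
# Hypothetical data: a 50 x 40 matrix of rank 3 plus a little noise.
X = rng.normal(size=(50, 3)) @ rng.normal(size=(3, 40)) + 0.01 * rng.normal(size=(50, 40))
W, H = truncated_factorization(X, k=3)

print(X.size, "numbers in X vs", W.size + H.size, "in W and H")  # 2000 numbers in X vs 270 in W and H
rel_error = np.linalg.norm(X - W @ H) / np.linalg.norm(X)
```

With K = 3, the factors store 270 numbers instead of 2000, and the relative reconstruction error stays small because the data really has only three underlying directions plus noise.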
Once again, machine learning (a.k.a. mathematical/numerical optimization) comes to the rescue. If you liked the math above, this formula is for you:

$$J(W,H) = \sum_{(i,j)\ \text{with}\ X[i,j]\ \text{known}} \left( X[i,j] - \hat{X}[i,j] \right)^2, \qquad \hat{X} = WH$$

We attempt to approximate X by WH (denoted as X with a hat). For this, we minimize the error J(W,H), the sum of squared differences between the known entries of X and the corresponding entries of the approximation WH. (Remember that our click-through matrix has missing values, so only the known entries can enter the error; this fits into the overall picture.) The Alternating Least Squares (ALS) algorithm can be used to compute such an approximation: it alternately keeps H fixed and solves a least-squares problem for W, then keeps W fixed and solves for H.
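A compact NumPy sketch of ALS on the known entries only follows. It is not the production code, which ran on Spark. Spark MLlib ships its own ALS implementation (`org.apache.spark.ml.recommendation.ALS`), and the post does not say which implementation was used. One assumption goes beyond the objective J above: a small ridge term `reg` keeps each least-squares subproblem well-posed, even for a row or column with few observations. The function name `als`, its defaults and the synthetic data are hypothetical.

```python
import numpy as np

def als(X, k, n_iters=100, reg=1e-3, seed=0):
    """Alternating least squares on the observed (non-NaN) entries of X.

    Minimizes the sum over known (i, j) of (X[i, j] - W[i] @ H[:, j])**2
    plus reg * (||W||^2 + ||H||^2). The ridge term is an added assumption.
    """
    m, n = X.shape
    known = ~np.isnan(X)
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.1, size=(m, k))
    H = rng.normal(scale=0.1, size=(k, n))
    ridge = reg * np.eye(k)
    for _ in range(n_iters):
        # Keep H fixed: each row of W solves a small (k x k) ridge regression.
        for i in range(m):
            cols = known[i]
            Hc = H[:, cols]
            W[i] = np.linalg.solve(Hc @ Hc.T + ridge, Hc @ X[i, cols])
        # Keep W fixed: each column of H solves a small (k x k) ridge regression.
        for j in range(n):
            rows = known[:, j]
            Wr = W[rows]
            H[:, j] = np.linalg.solve(Wr.T @ Wr + ridge, Wr.T @ X[rows, j])
    return W, H

# Synthetic data: an exactly rank-2 matrix with roughly 30 % of entries hidden.
rng = np.random.default_rng(42)
true = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 20))
X = true.copy()
X[rng.random(X.shape) < 0.3] = np.nan
W, H = als(X, k=2)
X_hat = W @ H  # every entry, including the hidden ones, now has a prediction
hidden = np.isnan(X)
rmse_hidden = np.sqrt(np.mean((X_hat[hidden] - true[hidden]) ** 2))
```

Real CTRs are small numbers, so the strength of a ridge term like `reg` has to be chosen relative to their scale. One way is to hold out some known entries and compare the prediction error on them for different values.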

The deployment

We delivered a software solution that runs a weekly batch job in the cloud. It first fetches the impression and click statistics from the existing ad server and then computes a table of favorable ad and ad space combinations for the next week. For this, it uses the mathematical optimization outlined above.

The final solution is deployable to Amazon Web Services (AWS) as a JAR. In this concrete case, we use a single EC2 instance to run the weekly batch job.

A note on the software development part: we decided to use Apache Spark and Scala for everything. Not so much because we had to deal with big data (we are looking at a few GB), but rather because Spark lets us write ETL pipelines and machine learning components within a single ecosystem and API. (Of course, this is also possible with other solutions.)
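The post does not specify the format of the resulting table. The sketch below is one hypothetical version of the ranking step. It assumes a completed matrix `X_hat` with ad spaces as rows and ads as columns, and picks the k ads with the highest predicted CTR for each ad space. The function name `top_ads_per_space`, the parameter `k` and the toy values are illustrative.

```python
import numpy as np

def top_ads_per_space(X_hat, ad_spaces, ads, k=2):
    """For each ad space (row of X_hat), return the k ads (columns) with the
    highest predicted CTR, best first."""
    table = {}
    for i, space in enumerate(ad_spaces):
        best = np.argsort(-X_hat[i])[:k]  # column indices sorted by descending CTR
        table[space] = [ads[j] for j in best]
    return table

X_hat = np.array([[0.004, 0.001, 0.003],
                  [0.002, 0.005, 0.001]])
print(top_ads_per_space(X_hat, ["A", "B"], ["ad1", "ad2", "ad3"]))
# {'A': ['ad1', 'ad3'], 'B': ['ad2', 'ad1']}
```

The sketch leaves one question open. Should already observed combinations be ranked together with newly predicted ones, or should they be treated differently? That is a business decision the post does not detail.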

The outcome

In a live test we observed a performance improvement of 15 – 20 % compared to the existing system based on expert rules. This is quite good and shows the potential of using a machine-learning-based approach in this case.

Summary

In this blog post we explained how matrix factorization can be used to predict missing values from a data matrix and saw how to apply this technique to the problem of ad targeting.

Blog author

Daniel Pape
