Marketing Mixture Models are tools for estimating how much each media channel (and other factors) contributes to a target KPI, which in most cases is sales. These models (we will call them MMMs from now on) give us insights into channel contributions, which can then be used to optimize the marketing strategy, e.g. by increasing spending on the more beneficial channels and decreasing the budget of those that do not bring much value.

In this article, we will talk about the functionalities of MMMs, the main requirements for data, potential use cases, available tools (along with their pros and cons), and the value they can bring to the end client.

Model components

MMMs can be either additive or multiplicative (depending on whether the internal component values are added or multiplied together). In most cases, an additive model is used. Example of a possible decomposition:

Here, t is a time period, and α is the intercept. The most important effects in MMMs are usually applied and combined inside the media channels_t element. They are Carryover and Shape.
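For reference, one common way to write an additive decomposition is sketched below. This is an assumption about the general form and not necessarily the exact formula from the original figure. The trend, seasonality, controls, and noise terms are illustrative names for the additional components discussed later in the article.

```latex
\text{KPI}_t = \alpha + \text{media channels}_t + \text{trend}_t + \text{seasonality}_t + \text{controls}_t + \varepsilon_t
```

In a multiplicative model, the same components would be multiplied rather than summed.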

Carryover Effect

A carryover effect, in basic terms, is an effect that persists for a while after the initial action has been performed.

For instance, we showed an ad for only one day, but we saw its effect on sales a few days after the ad had stopped.

The carryover effect may occur for several reasons, such as delayed exposure to the ad, delayed consumer response, or even purchases from consumers who heard about the product from those who first saw the ad (word of mouth).

To model the carryover effect of advertising, we transform the time series of media spend in one channel through the adstock function (we will skip the mathematical explanations for simplicity).
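As a rough illustration, the simplest adstock variant (geometric decay) can be sketched in a few lines. This is a minimal sketch under assumptions: the function name and the decay value of 0.5 are illustrative, and MMM libraries typically offer richer variants (for example, with a delayed peak).

```python
def geometric_adstock(spend, decay=0.5):
    """Carry a fraction `decay` of the previous period's adstock into the current one."""
    if not 0.0 <= decay < 1.0:
        raise ValueError("decay must be in [0, 1)")
    adstocked = []
    carried = 0.0
    for x in spend:
        # Today's effective exposure = today's spend + decayed leftover from earlier periods.
        carried = x + decay * carried
        adstocked.append(carried)
    return adstocked


# One day of advertising followed by silence: the effect fades instead of vanishing.
print(geometric_adstock([100, 0, 0, 0], decay=0.5))  # [100.0, 50.0, 25.0, 12.5]
```

A decay close to 1 means the ad keeps influencing the KPI for a long time, while a decay of 0 removes the carryover entirely.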

Shape Effect

The shape effect refers to the change in sales in response to the increasing intensity of advertising in the same time period. Basically, it shows how our KPI would be impacted if we increased or decreased the ad intensity.

For instance, if spending 100 dollars on an ad brings back 1000 dollars in sales, spending 200 dollars might bring back only 1500: the second 100 dollars returns less than the first.

Example of a shape effect:

To model the shape effect, one of the possible functions to use is the Hill function (again, you can read about the concrete mathematical explanation in various public sources).

Besides the Hill function, other functional forms can be used to model the shape effect, such as the exponential function, the sigmoid function (also referred to as the logistic function), or the integral of other probability distributions.
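To make the diminishing returns concrete, here is a minimal sketch of a Hill-type saturation curve. The parameter names are illustrative, and the values (a maximum of 3000 in sales, half-saturation at a spend of 200) are assumptions chosen only so that the curve reproduces the 100 → 1000 and 200 → 1500 example above.

```python
def hill(spend, half_saturation, slope=1.0):
    """Hill saturation: 0 at zero spend, 0.5 at `half_saturation`, approaching 1 as spend grows."""
    if spend < 0 or half_saturation <= 0 or slope <= 0:
        raise ValueError("spend must be >= 0; half_saturation and slope must be > 0")
    if spend == 0:
        return 0.0
    return 1.0 / (1.0 + (half_saturation / spend) ** slope)


# Hypothetical parameters, picked to match the example in the text.
MAX_SALES = 3000.0
HALF_SATURATION = 200.0

for spend in (100, 200, 400):
    sales = MAX_SALES * hill(spend, HALF_SATURATION)
    print(f"spend={spend} -> sales={sales:.0f}")
```

Each doubling of spend adds less in sales than the previous one, which is exactly the shape effect.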

Effect combination

The carryover and shape effects described above can be combined in two ways:

If money spent in each time period is relatively small compared to accumulated spending, the shape effect is less obvious. In this case, the order of transformations would be adstock → shape.
On the other hand, if media spending is heavily concentrated in one-off time periods, shape → adstock is preferred.
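The difference between the two orderings can be seen on a toy series with a single one-off burst of spend. This is a minimal sketch: the helper names, the decay of 0.5, and the half-saturation of 100 are all illustrative assumptions, and the saturation used here is the simplest Hill curve with slope 1.

```python
def adstock(series, decay):
    """Geometric carryover: each period keeps `decay` of the previous adstock."""
    out, carried = [], 0.0
    for x in series:
        carried = x + decay * carried
        out.append(carried)
    return out


def saturate(series, half_saturation):
    """Hill curve with slope 1: x / (x + half_saturation)."""
    return [x / (x + half_saturation) for x in series]


spend = [0, 300, 0, 0]  # a single one-off burst of spending

adstock_then_shape = saturate(adstock(spend, 0.5), 100.0)
shape_then_adstock = adstock(saturate(spend, 100.0), 0.5)

print("adstock -> shape:", [round(v, 3) for v in adstock_then_shape])
print("shape -> adstock:", [round(v, 3) for v in shape_then_adstock])
```

Both orderings agree in the period of the burst, but they decay differently afterwards: with shape → adstock the saturated effect simply halves each period, while with adstock → shape the carried-over spend is saturated again in every period.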

In theory, it is not necessary to use both of these effects, yet the public studies and our experiments conclude that in most cases it is beneficial to do so for the model’s accuracy and confidence.

Of course, there can be some additional components used and combined in an additive or multiplicative way, such as:

Seasonality (for capturing instances of some seasonal impact).
Trend (for capturing steady, long-lasting incline/decline of the KPI).
Control variables (for capturing other event contributions to the KPI).

More about these effects can be found here:

Data requirements

Format

The most important requirement for the data is that it should be a time series, containing ad spending (or impressions) for each media channel, along with the target KPI (e.g. sales).

Example of a dataset with the appropriate format, containing impressions (as media channels) and sales (as target):

Frequency

An acceptable data frequency is weekly or daily, but weekly entries are preferred.

With daily values, it is harder to capture seasonality and media impact due to higher variance. Daily data can still be used, but only with up to 1-2 years of entries, since beyond that the variance becomes so high that the model’s results start to deteriorate. Too little variance will also lead to worse results and low model confidence.

Hence, the main takeaway is that data with extremely low variance (such as yearly data) or extremely high variance (such as hourly data) will produce inaccurate results, so it is recommended to avoid both entirely.
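When only daily data is available, it can be aggregated to weekly entries before modeling. Below is a minimal sketch, assuming that weeks start on Monday and that values (spend, impressions, or sales) can simply be summed; the function name is illustrative.

```python
from collections import defaultdict
from datetime import date, timedelta


def to_weekly(daily):
    """Sum daily values into weeks keyed by the Monday that starts each week."""
    weekly = defaultdict(float)
    for day, value in daily.items():
        monday = day - timedelta(days=day.weekday())  # weekday() is 0 for Monday
        weekly[monday] += value
    return dict(sorted(weekly.items()))


# Two full weeks of daily spend (2024-01-01 is a Monday).
daily_spend = {date(2024, 1, 1) + timedelta(days=i): 10.0 for i in range(14)}
print(to_weekly(daily_spend))
```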

Quantity

Our team has experimented with various data frequencies and quantities - models were compared based on accuracy and confidence. From the obtained results we can provide the data quantity recommendations (minimum and preferred amounts):

Minimum amount - at least 3 months (90 entries) of daily data, or at least 6 months (24-30 entries) of weekly data.
Preferred amount - at least 1 year (365 entries) of daily data, or at least 2-5 years (roughly 104-260 entries) of weekly data.

Keep in mind that with minimal data requirements, the results may not be entirely correct, since the model will struggle to capture exact coefficients. The minimal amount of data can only be used to infer the approximate contributions of each channel.

With the preferred amount, the model is expected to provide robust results with a much lower margin of error.

Sparsity

Sparsity refers to gaps in the data, i.e. time periods with missing entries. Example of a highly inconsistent (sparse) dataset:

From our experiments, it was evident that high sparsity will negatively affect the overall results due to incorrect trend/seasonality/adstock capturing.

It is highly recommended that the provided data (weekly or daily) has no gaps or, in the worst-case scenario, that the gaps are filled or interpolated.
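A simple way to fill gaps in a weekly series is linear interpolation between the neighbouring observed weeks. The sketch below assumes that all dates are week starts exactly 7 days apart; the function name is illustrative. Interpolation is itself an assumption about the data: for media spend, a missing week may genuinely mean zero spend, in which case filling with zeros is more appropriate.

```python
from datetime import date, timedelta


def fill_weekly_gaps(observations):
    """Return a gap-free weekly series, linearly interpolating missing weeks.

    `observations` maps week-start dates to values.
    """
    if not observations:
        return {}
    weeks = sorted(observations)
    filled = {}
    for left, right in zip(weeks, weeks[1:]):
        steps = (right - left).days // 7  # number of weekly steps between observations
        for i in range(steps):
            frac = i / steps
            filled[left + timedelta(weeks=i)] = (
                observations[left] + frac * (observations[right] - observations[left])
            )
    filled[weeks[-1]] = observations[weeks[-1]]
    return filled


sales = {
    date(2024, 1, 1): 100.0,
    date(2024, 1, 8): 120.0,
    # 2024-01-15 and 2024-01-22 are missing
    date(2024, 1, 29): 60.0,
}
for week, value in fill_weekly_gaps(sales).items():
    print(week, round(value, 1))
```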

Granularity

Another important concept is data granularity. It is generally recommended to use media channel variables that are at a more granular level. Example:

The preferred media channels to use would be the values in the “Variable” column. If possible, even more granularity could be used (for instance, splitting the Facebook channel into Facebook video, Facebook ads, and others). This is applicable only if the more granular channels are not highly correlated.

However, the overall channel count for the model should not be too high (more than 4-5 channels will likely introduce uncertainty), so more granularity should be used carefully - preferably when there are not that many channels available in the first place.

Channel-to-entry ratio

Recommendations from public sources state that the minimal channel-to-entry ratio should be 1:7-10.

For example: if we have 3 ad channels without any additional features and use a weekly model, then a training dataset of 21-30 weeks should be sufficient.

The main takeaway from the experiments of our team was that the perfect ratio is reached with fewer media channels and more data entries. The best results were obtained with 4-6 channels and 1-2 years of data, so the ideal ratio is leaning more towards 1:25 for weekly data, and 1:90 for daily.
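The arithmetic behind these ratios is simple enough to check directly. The helper names below are illustrative; the 7-10 entries per channel come from the public recommendation quoted above.

```python
def min_entries(n_channels, entries_per_channel=(7, 10)):
    """Minimal dataset size range for a 1:7-10 channel-to-entry ratio."""
    low, high = entries_per_channel
    return n_channels * low, n_channels * high


def entries_per_channel(n_entries, n_channels):
    """How many data entries are available per media channel."""
    return n_entries / n_channels


print(min_entries(3))                 # (21, 30)
print(entries_per_channel(104, 4))    # 26.0  -> 4 channels, 2 years of weekly data
print(entries_per_channel(365, 4))    # 91.25 -> 4 channels, 1 year of daily data
```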

The models were more sensitive to the data entry count than to the channel count. However, the quality of the media channel data is more important than its quantity: choosing only the most important channels will lead to better results in most cases.

Use cases

The main requirement for MMMs to work is that each media channel has consistent data. Below we specify a few use cases (both good and bad), which can help determine when an MMM is appropriate to use and when it will simply not work.

Good use cases

The client has constant ad spend on some media channels and knows the KPI for each time entry. They want to know which channels are worth spending money on, and which are not (to increase the KPI).
The client has constant impression data on some media channels and knows the KPI for each time entry. They want to know which channels are worth increasing impressions on (to make the KPI higher).
The client has a similar dataset and needs as mentioned above, but the dataset is divided into different regions. In this case, a geo model can be used (available in the lightweight_mmm library).
The client has a similar dataset as mentioned above but wants to know the ROAS, ROI, or the optimal budget for each media channel. In this case, the budget allocator can be used alongside the MMM (available in the pymc-marketing library).

Bad use cases

The client has run a campaign for 1 month (for 1 or a few media channels) and wants to know if that increased the sales going forward (e.g. for the next 6 months with no further spending). In this case, the model will not work since it needs to have constant media channel values alongside the observed KPI.
The client has multiple media channels in which the ad spend (or impressions) is mostly constant (that is, with little variation). In this case, the model will struggle to capture coefficients.
The client has a highly inconsistent or sparse dataset. The results in this case will likely not be satisfactory.
The client has the same needs as in one of the good use cases, yet all of their media channels are highly correlated. In such a case, the model will not work. It will not be able to distinguish between contribution coefficients and will likely output them as identical. Due to this, before training the model, all datasets should be checked for highly correlated channels, and those channels should be merged or removed.
The client does not have a regular, time series based dataset - only a few one-off ad spend events, or the overall ad spend and sales. The model cannot be used in this case.
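The correlation check mentioned in the list above can be sketched as a pairwise Pearson correlation over the channel series. This is a minimal sketch: the threshold of 0.9 and the channel values are hypothetical, and what counts as "highly correlated" is a judgment call for each dataset.

```python
from itertools import combinations
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def highly_correlated_pairs(channels, threshold=0.9):
    """List channel pairs whose absolute correlation reaches the threshold."""
    flagged = []
    for (name_a, a), (name_b, b) in combinations(channels.items(), 2):
        r = pearson(a, b)
        if abs(r) >= threshold:
            flagged.append((name_a, name_b, round(r, 3)))
    return flagged


# Hypothetical weekly spend per channel.
channels = {
    "facebook": [10, 20, 30, 40, 50],
    "instagram": [12, 19, 33, 41, 48],
    "email": [5, 1, 4, 2, 3],
}
print(highly_correlated_pairs(channels))
```

Flagged pairs are candidates for merging into one channel or for dropping the less important of the two.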

Tools

There are many great tools available for building MMMs, yet our team mainly focused on Python solutions. For this purpose, 2 libraries were explored for training MMMs, each with its own advantages and disadvantages.

Lightweight_mmm

Pros:

Free choice of what type of model to use (carryover, adstock, hill adstock). Hill adstock is recommended for real-life data since it includes both adstock and saturation functions.
Geo model support.
Saturation curve, ROI plotting, budget allocator functionalities.
Both impression and ad spend media channel data can be combined in the model.

Cons:

The free combination of shape (saturation) and carryover (adstock) effects is not available.
Data scaling has to be performed manually.
Lift tests are not supported.

PyMC-Marketing

Pros:

Data scaling is applied automatically.
An ability to incorporate lift test calibration from CausalPy (to further improve the model’s results).
Saturation curve, ROAS plotting, and budget allocator functionalities.

Cons:

By default, delayed adstock and logistic saturation effects are used (no way to change it).
Only one of the impression and ad spend media channel data can be used (the model is oriented toward ad spend).

In general, the best (default) choice is pymc-marketing for convenience and the ability to calibrate the model with lift tests.

Lightweight_mmm should be chosen if we need geo models or support for both impressions and ad spend parameters.

Raw models (e.g. pymc or pyro) can also be created and used, yet it is only recommended to do so in very specific cases - for instance, when the client has specific needs/knowledge about the model effects or both of the libraries cannot bring satisfactory results.

More information about these tools is presented here:

Use for the end client

MMMs can be used in various ways to provide insights and bring business value to the client. The most useful functionalities are:

Channel contribution analysis.
KPI predictions.
Saturation curves.
Budget optimizer.

Channel contribution analysis

The model is able to extract a coefficient for each media channel, representing its impact on the target KPI (sales). The coefficients can be normalized into percentages, and the model’s confidence can be represented by a vertical line (the longer the line, the less confident the model is about that particular estimate). With this information, the client can see the biggest and smallest contributors, along with the confidence in each.
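As an illustration of how such a summary can be produced, the sketch below turns per-channel samples of estimated contribution into percentage shares and a simple interval. The input format, the sample values, and the 5%-95% range are all assumptions made for this example; MMM libraries provide their own summaries and plots.

```python
def summarize(samples_by_channel, lower=0.05, upper=0.95):
    """Percentage share (from sample means) and a rough interval per channel."""
    means = {name: sum(s) / len(s) for name, s in samples_by_channel.items()}
    total = sum(means.values())
    summary = {}
    for name, samples in samples_by_channel.items():
        ordered = sorted(samples)
        last = len(ordered) - 1
        summary[name] = {
            "share_pct": 100.0 * means[name] / total,
            # A wider interval means the model is less certain about this channel.
            "interval": (ordered[int(lower * last)], ordered[int(upper * last)]),
        }
    return summary


# Hypothetical samples of each channel's contribution to sales.
samples = {
    "google": [40, 50, 60, 70, 80],
    "article": [9, 10, 10, 10, 11],
}
for name, stats in summarize(samples).items():
    print(name, round(stats["share_pct"], 1), stats["interval"])
```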

Example: Google, Email, and Affiliate channels are the biggest contributors with lower confidence. Article_Released is the smallest contributor with the biggest confidence.

KPI predictions

The model is also able to use the extracted coefficients to predict future sales. This could be useful if the client has a planned marketing strategy for the future and wants to see what potential sales it could bring.

Example: The model is capable of predicting future sales with good accuracy (R² > 0.9 and MAPE < 10% would be an ideal result).
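Both metrics are easy to compute on a held-out period. Below is a minimal sketch using their standard definitions; the sales figures are made up for illustration.

```python
def r_squared(actual, predicted):
    """Coefficient of determination: 1 - residual sum of squares / total sum of squares."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Actual values must be non-zero."""
    errors = [abs((a - p) / a) for a, p in zip(actual, predicted)]
    return 100.0 * sum(errors) / len(errors)


# Hypothetical weekly sales on a test period.
actual_sales = [100, 120, 140, 160]
predicted_sales = [105, 115, 145, 155]

print(f"R2   = {r_squared(actual_sales, predicted_sales):.3f}")
print(f"MAPE = {mape(actual_sales, predicted_sales):.2f}%")
```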

Saturation curves

This functionality is arguably the best one (in terms of business value), since it clearly shows the client how each channel would contribute to sales if we increased or decreased channel spending (or impressions). It can point out both the channels worth investing in further and those that would be better to cut back or eliminate.

Example: Google and Email channels are the biggest contributors now, marked at the dashed line. If we were to increase the spending by 1.5 times (see δ = 1.5), these channels would still bring great returns, while the contributions of the other 3 channels would remain practically unchanged.

Budget optimizer

Lastly, the budget optimizer tool can be instrumental in efficiently optimizing the planned budget. Based on the obtained contributions and saturation curves, the optimizer can reallocate the budget between channels to obtain the maximum KPI (sales).

Example: The budget for Facebook was lowered and mainly reallocated to channels Google and Overall_Views. This action increased the predicted sales by 70,000 euros.
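The idea behind budget optimization can be illustrated with a toy greedy allocator: the budget is spent in small steps, and each step goes to the channel whose saturation curve promises the largest additional sales. This is only a sketch of the principle and not the optimizer used by either library; the response curve, channel names, and parameters below are hypothetical.

```python
def response(spend, max_effect, half_saturation):
    """Saturating response curve: sales contribution for a given spend."""
    return max_effect * spend / (spend + half_saturation)


def total_response(allocation, channels):
    """Predicted sales contribution summed over all channels."""
    return sum(response(allocation[name], *channels[name]) for name in channels)


def allocate(total_budget, channels, step=100.0):
    """Greedily assign the budget, one step at a time, to the best marginal return."""
    allocation = {name: 0.0 for name in channels}

    def marginal_gain(name):
        current = allocation[name]
        return response(current + step, *channels[name]) - response(current, *channels[name])

    remaining = total_budget
    while remaining >= step:
        best = max(channels, key=marginal_gain)
        allocation[best] += step
        remaining -= step
    return allocation


# Hypothetical channels: (max_effect, half_saturation).
channels = {"google": (5000.0, 1000.0), "facebook": (1500.0, 1000.0)}

equal_split = {"google": 1000.0, "facebook": 1000.0}
optimized = allocate(2000.0, channels)

print("optimized allocation:", optimized)
print("equal split sales:", round(total_response(equal_split, channels)))
print("optimized sales:  ", round(total_response(optimized, channels)))
```

Because the curves flatten as spend grows, the allocator stops pouring money into a channel once another channel offers a better return for the next step.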

We have successfully gone over the main concepts of Marketing Mixture Models, covering their usage possibilities, data requirements, available tools, and general tips & tricks.

Do you need help with your Marketing Mixture Model? Contact our team here, and we will find the best solution for you.

Marketing Mixture Models: an in-depth overview