How (Artificially) Intelligent Is Your Marketing Mix Model?

Marketing Mix Modeling (MMx) has long established itself as a go-to tool for marketers planning their marketing budgets. Most Fortune 500 companies have used it for decades to get the maximum bang for their buck. Many firms offer MMx as a solution, each claiming its own differentiators, ranging from model specifications to hyper-parameter tuning methods to results computation to usability and governance. Though the model specification and construct may differ completely across industries, the central approach remains the same. The ultimate tool would be one that provides maximum flexibility, transfers learning from past models and incorporates industry benchmarks. Without delving into all the technical details in this blog, let me talk about how media is measured in a Marketing Mix Model.

To capture the true effect of media, one needs to apply a transformation function to the media data. The effect of airing an ad (be it on TV or any other medium) doesn’t end immediately. In fact, it has a lingering impact. An ad that you watch today might influence your decision to buy the product 3 months from now. But the impact is not the same as on day 1; it decays over time. Ad-stock is the most commonly used transformation for media. Decay rates differ across media vehicles. For instance, the decay parameter for TV should be much smaller than that for a digital ad. Reading the decay parameter as the share of the effect carried over into the next period, the bigger the decay parameter, the longer the half-life.
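To make the ad-stock idea concrete, here is a minimal sketch of a geometric ad-stock. It assumes the decay parameter is the fraction of last period's ad-stock that carries over (called `retention` here, a hypothetical name), which is the reading under which a bigger value means a longer half-life. The GRP figures are made up for illustration; real MMx tools may use a different ad-stock form.

```python
import math

def geometric_adstock(spend, retention):
    """Carry a fraction `retention` of the previous period's ad-stock forward."""
    adstocked = []
    carry = 0.0
    for x in spend:
        carry = x + retention * carry  # this period's media plus the lingering effect
        adstocked.append(carry)
    return adstocked

def half_life(retention):
    """Periods until the carried-over effect of a single burst falls to half."""
    return math.log(0.5) / math.log(retention)

# Hypothetical example: one burst of 100 GRPs, then nothing on air.
grps = [100, 0, 0, 0, 0]
print(geometric_adstock(grps, 0.5))  # [100.0, 50.0, 25.0, 12.5, 6.25]
print(half_life(0.5))                # 1.0
print(round(half_life(0.9), 2))      # 6.58
```

Under this form, a retention of 0.9 keeps the ad working far longer than a retention of 0.5, which is why the parameter has to be set separately for each media vehicle.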

The calculation of a media vehicle’s volume contribution depends on multiple model settings. You could be running an additive, a multiplicative or a semi-log model. Each of these has its own pros and cons. Computationally, additive models are the easiest and most intuitive, but they come with their own limitations. One of the biggest criticisms of additive models is that they are not very effective at calculating synergistic effects. There are unsophisticated ways of including interaction between media (how much more effective it is to air TV and radio ads together than to air each individually), i.e., by including dummies. Multiplicative models are considered superior in capturing these interaction effects. Contribution from synergy is calculated by evaluating second-order interactions between variables. Perhaps the biggest challenge in a multiplicative model is calculating volume decomposition. Volume due to addition, volume due to removal, and a weighted average of the two are some of the ways companies calculate the volume from a media vehicle, though none of them is foolproof. I will talk about these techniques in detail in my subsequent blogs.
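As a rough illustration of why decomposition is tricky in a multiplicative model, the sketch below compares volume due to addition, volume due to removal, and an equal-weight average of the two. Everything in it is an assumption made for the example: the base volume of 1000, the TV and radio multipliers, the function names, and the choice of equal weights.

```python
def sales(base, multipliers):
    """Multiplicative model: every active driver scales the base volume."""
    volume = base
    for m in multipliers.values():
        volume *= m
    return volume

def decompose(base, multipliers):
    """Per-driver volume under addition, removal, and their equal-weight average."""
    full = sales(base, multipliers)
    result = {}
    for name, m in multipliers.items():
        others = {k: v for k, v in multipliers.items() if k != name}
        addition = sales(base, {name: m}) - base  # switch the driver on, starting from base
        removal = full - sales(base, others)      # switch the driver off, starting from the full model
        result[name] = {
            "addition": addition,
            "removal": removal,
            "average": (addition + removal) / 2,
        }
    return result

# Hypothetical lifts: TV multiplies volume by 1.20, radio by 1.10.
base = 1000.0
media = {"tv": 1.20, "radio": 1.10}
parts = decompose(base, media)
for name, values in parts.items():
    print(name, {k: round(v, 1) for k, v in values.items()})
# tv {'addition': 200.0, 'removal': 220.0, 'average': 210.0}
# radio {'addition': 100.0, 'removal': 120.0, 'average': 110.0}

incremental = sales(base, media) - base
print(round(incremental, 1))  # 320.0
```

Here the addition method accounts for 300 of the 320 incremental units and the removal method for 340; the gap of 20 in each direction is the TV and radio synergy. The equal-weight average happens to add up exactly in this two-driver toy case, but that is not guaranteed once more drivers are involved.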

In a Marketing Mix Model, media effectiveness largely depends on two hyper-parameters: saturation and half-life. Saturation refers to the point beyond which you do not get a significant return from your media investments, and half-life refers to the number of weeks taken to accumulate half of your returns from a media spend. E.g., a TV ad of 200 GRPs may fetch you 1000 units of sales. A half-life of 24 weeks means that the TV ad helped you generate 500 of those units in the first 24 weeks. If there are multiple media variables in the primary model, analysts tend to fix arbitrary values for saturation and decay. Typically, these hyper-parameters do not get the attention they deserve, for two reasons. Firstly, it is difficult (nearly impossible) to test every possible value of these parameters manually, and secondly, there is no way to validate whether the user-defined values are optimal. Enter the era of supercomputing! While grid search and random search are still computationally very intensive, techniques like Bayesian Optimization do a great job of identifying the optimal values of these hyper-parameters. Wikipedia defines Bayesian Optimization as a sequential design strategy for global optimization of black-box functions that doesn’t require derivatives. Essentially, it removes human bias from the model, helps capture the effectiveness of different campaigns more accurately and helps you plan your media strategy in a more informed manner.
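To show what is actually being tuned, here is a small sketch that searches over the two hyper-parameters. It is not Bayesian Optimization: it uses a plain grid search as a stand-in, because that fits in a few lines of standard Python, whereas a Bayesian optimizer would propose candidate values sequentially instead of trying them all. The saturation curve `x / (x + half_sat)`, the synthetic spend series and all function names are assumptions made for this example.

```python
def adstock(spend, retention):
    """Geometric ad-stock: carry a fraction of last period's stock forward."""
    out, carry = [], 0.0
    for x in spend:
        carry = x + retention * carry
        out.append(carry)
    return out

def saturate(x, half_sat):
    """Diminishing returns: response is 0.5 at x == half_sat and approaches 1."""
    return x / (x + half_sat)

def transform(spend, retention, half_sat):
    """Apply ad-stock first, then saturation."""
    return [saturate(a, half_sat) for a in adstock(spend, retention)]

def sse_of_fit(feature, sales):
    """Sum of squared errors of the least-squares line sales ~ a + b * feature."""
    n = len(sales)
    mx, my = sum(feature) / n, sum(sales) / n
    sxx = sum((f - mx) ** 2 for f in feature)
    sxy = sum((f - mx) * (s - my) for f, s in zip(feature, sales))
    b = sxy / sxx
    a = my - b * mx
    return sum((s - (a + b * f)) ** 2 for f, s in zip(feature, sales))

def grid_search(spend, sales, retentions, half_sats):
    """Try every (retention, half_sat) pair and keep the one with the lowest error."""
    best = None
    for r in retentions:
        for k in half_sats:
            err = sse_of_fit(transform(spend, r, k), sales)
            if best is None or err < best[0]:
                best = (err, r, k)
    return best

# Synthetic weekly data built from known settings, so the search has a right answer.
spend = [120, 0, 80, 200, 0, 0, 150, 60, 0, 90, 30, 0]
sales = [500 + 400 * f for f in transform(spend, 0.6, 150.0)]

retentions = [i / 10 for i in range(1, 10)]      # 0.1 ... 0.9
half_sats = [50.0, 100.0, 150.0, 200.0, 250.0]
err, retention, half_sat = grid_search(spend, sales, retentions, half_sats)
print(retention, half_sat)  # 0.6 150.0
```

Even this toy grid needs 45 model fits for a single media variable. With several variables the number of combinations multiplies, which is the motivation for a smarter search strategy.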

Rather than making you blindly trust the numbers thrown out by a model (sometimes the numbers can be counter-intuitive, like TV spend bringing negative sales), most MMx tools let you incorporate your beliefs in the model through Bayesian priors or other forms of constrained modeling. E.g., if you strongly believe that the price elasticity of your product is between -1 and -2, algorithms give you the choice of constraining the coefficient to that range. However, it is strongly recommended that you use priors minimally so that the models have the freedom to give you insights that you don’t already know.
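The price elasticity example can be sketched as a simple constrained fit. This is an illustration of constrained modeling, not of Bayesian priors: it fits a log-log line of volume on price and then keeps the slope inside the believed range of -2 to -1. With a single regressor and a free intercept, clipping the unconstrained slope to the bounds gives the constrained least-squares answer; with more variables a proper constrained solver would be needed. The price and volume figures, and the function names, are hypothetical.

```python
import math

def fit_line(x, y):
    """Ordinary least squares for y ~ intercept + slope * x (single regressor)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def bounded_elasticity(prices, volumes, lower=-2.0, upper=-1.0):
    """Log-log price elasticity, with the slope kept inside [lower, upper]."""
    log_p = [math.log(p) for p in prices]
    log_v = [math.log(v) for v in volumes]
    free_slope, _ = fit_line(log_p, log_v)
    slope = min(max(free_slope, lower), upper)  # clip to the believed range
    # Re-fit the intercept for the clipped slope.
    intercept = sum(log_v) / len(log_v) - slope * sum(log_p) / len(log_p)
    return free_slope, slope, intercept

# Hypothetical data whose unconstrained elasticity is -0.5, outside the believed range.
prices = [2.0, 2.2, 2.5, 2.8, 3.0]
volumes = [1000 * p ** -0.5 for p in prices]
free, bounded, _ = bounded_elasticity(prices, volumes)
print(round(free, 3), bounded)  # -0.5 -1.0
```

Here the constraint overrides what the data says, which is exactly the risk of leaning too hard on prior beliefs.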


About the Author

Ram Kumar
Director, Advanced Analytics

A Mathematician at heart, an Econometrician by education and a Data Scientist by
profession, Ram heads the Advanced Analytics practice at GainInsights. Ram has over
13 years of experience in providing analytics-based consulting to some of the
biggest brands in the CPG, Retail and Insurance domains.
