Criteo is a global ad-tech company providing online display advertising services. Founded in Paris and listed in the US, it offers a Commerce Media Platform that enables marketers to bring consumers to brands and allows media owners to become ad publishers.
Criteo held a hackathon to stimulate ideas for exploring opportunities in the field of sentiment analysis. During the 4-day hackathon, I created a Brand Sentiment Monitor to help Criteo's clients better understand the correlation between consumers’ sentiment and their consumption behaviour. The solution won the hackathon.
Criteo wanted to leverage social data and sentiment analysis to:
1. Who: Choose the right audiences to target
2. When: Target the users at the right time
3. What: Collect more insights about partners' brands and products
But using sentiment analysis as a service to create value for customers is not that simple. Sentiment analysis contains unavoidable errors, and it is hard to draw meaningful inferences and actionable insights from its output.
To compensate for these inherent weaknesses of sentiment analysis, instead of creating a customer-facing model, I focused on coming up with a "demographic agnostic model" for Criteo's Account Strategists, who are at the forefront of gauging business insights.
Identifying the right user of sentiment analysis is a key question. Because of the inherent weaknesses described above, we believed that the real user needs to be literate in reading sentiment analysis results. Instead of the brand itself, the real user should be the one who provides consultation to Criteo's brand clients: the Account Strategist.
An Account Strategist is responsible for managing and maintaining post-sale communications with client partners, and for up-selling. They dig deep into the campaign data to build client strategy, lead regular business reviews and communicate optimisations. They are best placed to use the sentiment analysis tool and provide direct solutions to brand clients.
We interviewed a Criteo Account Strategist to validate our understanding of the role and its day-to-day operations. We then created an empathy map of the Account Strategist to build a holistic understanding of the user's needs.
We proposed a Brand Sentiment Monitor as a solution, which is a dashboard with multiple tools enabling Account Strategists to
1. Monitor brand health across various social channels over time, and
2. Provide context on what is happening to, or affecting, the brand's audiences.
It consists of seven major components: (i) Brand Health Index, (ii) Sentiment Trend, (iii) Channel Sentiment, (iv) Geo-sentiment Map, (v) Word Cloud, (vi) Sentiment Clusters and (vii) Top Influencer List. Through these components, users can look beyond the surface of the current online sentiment, which can easily be distorted by a lack of context.
To help Account Strategists bring the freshest brand sentiment insights to clients (for example, 10 minutes before a client meeting), we came up with a 3-level information structure inspired by journalism's "inverted pyramid" concept.
In an inverted pyramid, the most important information goes at the beginning of the story, followed by information in descending order of importance.
The first level shows the brand sentiment index of the day, as well as the trend over time. It gives users a sense of how the brand is perceived across major social media platforms.
The second level provides the essential information about the sentiment distribution (positive/neutral/negative), further broken down by platform, geolocation and keyword.
The third level provides the most nuanced information, including the demographic-agnostic clustering of brand audiences by interest (as shown by their social media content), and identifies the top influencers who spread positive or negative sentiment.
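The first two levels can be illustrated with a minimal sketch. The write-up does not define how the brand sentiment index is calculated, so the net-sentiment formula below (share of positive posts minus share of negative posts per day) is an assumption. The record fields `date`, `channel` and `label`, the `forum` channel and the toy records are also made up for illustration.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")


def daily_index(records):
    """Level 1: net sentiment per day in [-1, 1] (assumed formula)."""
    per_day = defaultdict(Counter)
    for rec in records:
        per_day[rec["date"]][rec["label"]] += 1
    return {
        day: (c["positive"] - c["negative"]) / sum(c.values())
        for day, c in sorted(per_day.items())
    }


def distribution_by(records, field):
    """Level 2: share of each sentiment label, grouped by `field`."""
    groups = defaultdict(Counter)
    for rec in records:
        groups[rec[field]][rec["label"]] += 1
    return {
        key: {label: c[label] / sum(c.values()) for label in LABELS}
        for key, c in groups.items()
    }


# Toy records for illustration only (not the hackathon data).
records = [
    {"date": "03-02", "channel": "twitter", "label": "positive"},
    {"date": "03-02", "channel": "twitter", "label": "positive"},
    {"date": "03-02", "channel": "twitter", "label": "positive"},
    {"date": "03-02", "channel": "forum", "label": "negative"},
    {"date": "03-10", "channel": "twitter", "label": "negative"},
    {"date": "03-10", "channel": "twitter", "label": "negative"},
    {"date": "03-10", "channel": "forum", "label": "negative"},
    {"date": "03-10", "channel": "forum", "label": "neutral"},
]

trend = daily_index(records)                      # one index value per day
by_channel = distribution_by(records, "channel")  # label shares per channel
```

The same `distribution_by` helper can group by any other field a record carries, such as a standardised country or a keyword, which is how the second level could be sliced by geolocation as well as by platform.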
To demonstrate the true utility of the solution, we created a prototype using real data.
We applied the solution to the global sporting goods retailer Decathlon, an existing Criteo client. We retrieved the latest tweets (3,500 tweets between 2 and 15 March) that mentioned Decathlon or used the hashtag "#Decathlon", and processed them through the sentiment analysis model and the demographic agnostic model we created.
Amy, an Account Strategist, looked at the sentiment health index on the dashboard. She found that there had been a steady drop in positive sentiment towards Decathlon over the past few days.
She then looked into the dashboard data and found that the majority of the negative sentiment came from users in Europe.
She then looked at the word cloud, which showed the keywords driving the sentiment, and compared the keywords on different dates. The keywords changed from "sport" and "Olympics" on 2 March to "Boycott Decathlon" and "Ukraine" over the next 10 days. She realised that people were reacting negatively online because Decathlon had refused to close its business in Russia.
Amy then referred to the brand audience cluster charts. She found that most of the audience groups had negative sentiment, but one group (cycling) had rather neutral sentiment and a substantial number of followers. She would then suggest to Decathlon in the afternoon meeting to:
1. Avoid launching ads in the mass media in Europe for the time being
2. Reallocate the marketing budget to prioritise their cycling business
3. Bring the deteriorating support on social media to the attention of their senior management
Through the top influencer list, Amy quickly suggested to Decathlon a list of influencers for possible collaboration. She also pointed out the influencers Decathlon should look into to manage potential publicity risks.
The Criteo Sentiment Analysis Dashboard was built with Jupyter Notebook and Tableau, using multiple APIs and public modules such as sklearn, SHAP and Nominatim. After retrieving the data from social media, the pipeline performs three major data processing steps: data cleaning, demographic-agnostic clustering and ensemble sentiment scoring.
Before the retrieved dataset can be used for analysis, missing values must be handled and the data formatting aligned. For example, the user location data was not standardized, so we used Nominatim to resolve each user's location to a standardized form.
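A minimal sketch of this cleaning step follows. It assumes the `geopy` client is used to reach Nominatim (the write-up names Nominatim but not the client library) and that country is the target granularity. The function names are hypothetical. The geocoder is passed in as a callable so the cleaning logic can be exercised without network access.

```python
from typing import Callable, Optional


def clean_location(raw: Optional[str]) -> Optional[str]:
    """Collapse whitespace and treat empty profile locations as missing."""
    if raw is None:
        return None
    text = " ".join(raw.split())
    return text or None


def standardise_country(raw: Optional[str], geocode: Callable) -> Optional[str]:
    """Map a free-text profile location to a country name, or None.

    `geocode` takes a query string and returns an object with a `.raw`
    dict (as geopy's Location does), or None when nothing matches.
    """
    text = clean_location(raw)
    if text is None:
        return None
    hit = geocode(text)
    if hit is None:
        return None
    return hit.raw.get("address", {}).get("country")


def make_nominatim_geocoder(user_agent: str = "brand-sentiment-demo"):
    """Build a rate-limited Nominatim geocoder (assumes geopy is installed)."""
    # Imported lazily so the helpers above work without geopy.
    from geopy.extra.rate_limiter import RateLimiter
    from geopy.geocoders import Nominatim

    geolocator = Nominatim(user_agent=user_agent)
    # The public Nominatim service asks for at most one request per second.
    limited = RateLimiter(geolocator.geocode, min_delay_seconds=1)
    return lambda text: limited(text, addressdetails=True, language="en")
```

In a notebook this would be called as `standardise_country(profile_location, make_nominatim_geocoder())`. Caching results per distinct location string would keep the number of requests manageable for a few thousand tweets.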
It is hard to retrieve conventional user segmentation features such as age, gender and occupation from social media, since they are either unavailable or unverified (or both). Thus, to define the user groups, the tool took a bottom-up approach: it applied an unsupervised clustering method to user metrics and then identified the common characteristics of the users in each cluster.
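The bottom-up approach can be sketched as follows. The write-up mentions sklearn and unsupervised clustering but does not name the algorithm, so k-means is an assumption here. The feature names `log_followers` and `posts_per_day` and the toy values are also hypothetical.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler


def cluster_users(metrics: np.ndarray, n_clusters: int, seed: int = 0) -> np.ndarray:
    """Group users by their metrics; returns one cluster label per user."""
    # Scale features so no single metric dominates the distance.
    scaled = StandardScaler().fit_transform(metrics)
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(scaled)


def describe_clusters(metrics: np.ndarray, labels: np.ndarray, names: list) -> dict:
    """Summarise each cluster by the mean of every metric (bottom-up profile)."""
    profile = {}
    for cluster in np.unique(labels):
        means = metrics[labels == cluster].mean(axis=0)
        profile[int(cluster)] = {n: float(v) for n, v in zip(names, means)}
    return profile


# Toy user metrics for illustration only (not the hackathon data).
names = ["log_followers", "posts_per_day"]
metrics = np.array([
    [2.0, 1.0], [2.1, 1.2], [1.9, 0.8],    # small, quiet accounts
    [5.0, 9.0], [5.2, 10.0], [4.8, 11.0],  # large, active accounts
])

labels = cluster_users(metrics, n_clusters=2)
profile = describe_clusters(metrics, labels, names)
```

The cluster profiles are what a person would inspect to name each group afterwards (for example by dominant interest), since the clusters themselves carry no demographic labels.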
Different sentiment analysis tools have their own strengths and weaknesses, and they complement each other. The ensemble model combined the strengths of multiple tools to get a collective sentiment for a more sophisticated and reliable analysis.
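One simple way to combine several tools is a weighted average of their polarity scores, sketched below. The write-up does not say which tools were ensembled or how their outputs were combined, so the averaging rule, the `[-1, 1]` score range, the threshold and the function names are all assumptions.

```python
from typing import Callable, Optional, Sequence

# A scorer wraps one sentiment tool and returns a polarity in [-1, 1].
Scorer = Callable[[str], float]


def ensemble_score(
    text: str,
    scorers: Sequence[Scorer],
    weights: Optional[Sequence[float]] = None,
) -> float:
    """Weighted average of the polarity scores from several tools."""
    if not scorers:
        raise ValueError("at least one scorer is required")
    if weights is None:
        weights = [1.0] * len(scorers)
    if len(weights) != len(scorers):
        raise ValueError("weights and scorers must have the same length")
    total = sum(weights)
    return sum(w * score(text) for w, score in zip(weights, scorers)) / total


def to_label(score: float, threshold: float = 0.05) -> str:
    """Turn a collective score into positive / neutral / negative."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"
```

Each real tool would be wrapped in a small function that rescales its output to `[-1, 1]`. The weights could then reflect how much each tool is trusted on short social media text.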