Building an end-to-end recommendation engine using Reinforcement Learning.


Thirunayan Dinesh

10th November 2020


The Product

Expert Republic is an on-demand video consulting application that lets users book one-to-one and group sessions with professional Experts who provide a range of services. Expert Republic came to fruition during the COVID-19 pandemic, when most professionals were struggling to deliver their services as they used to because they relied heavily on physical interaction. Expert Republic was built as a solution to this crisis. With this mobile/web application in hand, experts were able to conduct their businesses completely digitally. Most Experts who signed up for the application were able not just to retain their current customer base but also to reach clients from all over the world, from the comfort of their homes.

As the application started experiencing massive growth in users and bookings in the middle of the pandemic, it became clear that sustaining this growth would require an extremely personalized user experience. This means recommending services that suit each user's needs and interests. For example, a user who is interested in startups and technology, and who has had consultation sessions on those topics, should be recommended experts who provide consulting on startup funding and leadership. This requires a recommender system that not only understands the interests of every user from a diverse range of factors but also adapts to shifts in those interests. For example, if a user’s usual interest has been fitness coaching but they have found a new interest in cooking, our algorithm should be able to identify this shift through in-app user behaviour and then recommend content based on the new interest too.

Solving data?

As with any ML problem, the first step was to identify the key data points from users that represent their interests. The second step was to architect and engineer a scalable data pipeline that captures these data points and feeds them into our model continuously, so that we keep learning the shifting interest landscape of our users. The pipeline draws on multiple behavioural attributes, including clickstreams, past sessions, likes and reviews.
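To make the idea of turning behavioural signals into an interest representation concrete, here is a minimal sketch. It is not the production pipeline: the `Event` fields, the per-signal weights in `EVENT_WEIGHTS`, and the 14-day half-life are all assumptions chosen for illustration. It aggregates a user's events into normalised topic weights and lets older events count for less, so a recent change in behaviour shows up in the profile.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical weight per behavioural signal (assumption, not from the article).
EVENT_WEIGHTS = {"click": 1.0, "like": 2.0, "review": 3.0, "session": 5.0}


@dataclass(frozen=True)
class Event:
    user_id: str
    topic: str
    kind: str        # one of the keys in EVENT_WEIGHTS
    age_days: float  # how long ago the event happened


def interest_profile(events, user_id, half_life_days=14.0):
    """Return normalised topic weights for one user, favouring recent events."""
    scores = defaultdict(float)
    for event in events:
        if event.user_id != user_id:
            continue
        # Exponential decay: an event loses half its weight every half-life.
        decay = 0.5 ** (event.age_days / half_life_days)
        scores[event.topic] += EVENT_WEIGHTS[event.kind] * decay
    total = sum(scores.values())
    return {topic: s / total for topic, s in scores.items()} if total else {}


events = [
    Event("u1", "fitness", "session", age_days=28.0),
    Event("u1", "cooking", "click", age_days=0.0),
    Event("u1", "cooking", "like", age_days=0.0),
]
profile = interest_profile(events, "u1")
```

In this toy data the month-old fitness session is outweighed by today's cooking click and like, which is the kind of interest shift described above.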

Building an RL based recommender engine?

Traditionally, recommendation systems fall into two categories: collaborative filtering and content-based filtering. In a nutshell, collaborative filtering builds a “User/Item” interaction matrix and suggests services to a user based on what other users with similar interests liked. Content-based filtering recommends items based on the user's own previous selections, likes and dislikes, using supervised methods. Many existing recommendation engines use one of these approaches or a hybrid that combines both.
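As an illustration of the collaborative filtering idea, the sketch below scores unseen services for a user by weighting other users' interactions with cosine similarity. The interaction matrix is made up, and `recommend` is a hypothetical helper written for this post, not part of any library.

```python
import numpy as np

# Hypothetical User/Item matrix: rows are users, columns are expert services.
# 1 means the user booked or liked the service, 0 means no interaction.
interactions = np.array([
    [1, 1, 0, 0],   # user 0
    [1, 1, 1, 0],   # user 1
    [0, 0, 1, 1],   # user 2
], dtype=float)


def recommend(user: int, matrix: np.ndarray) -> int:
    """Return the unseen item with the highest similarity-weighted score."""
    norms = np.linalg.norm(matrix, axis=1)
    # Cosine similarity between the target user and every user.
    # Assumes every user has at least one interaction (no zero rows).
    sims = matrix @ matrix[user] / (norms * norms[user])
    sims[user] = 0.0                      # ignore the user's own row
    scores = sims @ matrix                # weight others' interactions by similarity
    scores[matrix[user] > 0] = -np.inf    # mask items the user has already seen
    return int(np.argmax(scores))


best_item = recommend(0, interactions)
```

User 0 overlaps with user 1 on the first two services, so the third service (index 2), which user 1 also liked, is the one suggested.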

Although these supervised learning based approaches are widely used, they inherently present 3 major problems:

  • Cold start - Until a threshold amount of data has been collected for an individual user, supervised learning based recommendation algorithms struggle to recommend content to that user. This phenomenon is known as “cold start” in recommender engines.

  • Short Term Recommendations - In our application we wanted to recommend services from experts that users of Expert Republic are already familiar with, but we also wanted to surface services from experts they might like to explore. That means optimizing not only for short-term engagement but also for long-term exploration, whereas most supervised / unsupervised recommendation algorithms focus only on short-term optimization.

  • Static view of user interests - One of the major problems with collaborative / content-based recommenders is that once the initial phases of training are done, the learnt features are kept static, giving the recommender engine a static view of user interests. We wanted to build a recommendation engine that can dynamically adapt its policy over time to understand shifting user interests.

Reinforcement learning to the rescue!

Reinforcement learning is a machine learning paradigm in which an agent is trained to make decisions through repeated simulations. In a nutshell, the agent is either rewarded or penalized for each decision it takes, and the reward or penalty is used as a signal to update the agent’s decision parameters so that future decisions move towards maximizing the reward. Reinforcement learning problems are usually modelled as Markov decision processes, in which the next state depends only on the current state and the chosen action. This Markov property makes them a good fit for recommender engine environments.
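The reward-driven update can be shown in its simplest form with a stateless toy problem. The sketch below is an epsilon-greedy bandit, not the agent we deployed: the click probabilities are invented, and `run_bandit` is a hypothetical function. Each "recommendation" earns a reward of 1 on a click and 0 otherwise, and that reward nudges the agent's estimate for the chosen action.

```python
import random


def run_bandit(click_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn per-action reward estimates from simulated click feedback."""
    rng = random.Random(seed)
    values = [0.0] * len(click_probs)   # estimated reward per action
    counts = [0] * len(click_probs)     # how often each action was tried
    for _ in range(steps):
        # Explore with probability epsilon, otherwise exploit the best estimate.
        if rng.random() < epsilon:
            action = rng.randrange(len(click_probs))
        else:
            action = max(range(len(values)), key=values.__getitem__)
        # Simulated environment: reward 1.0 for a click, 0.0 otherwise.
        reward = 1.0 if rng.random() < click_probs[action] else 0.0
        counts[action] += 1
        # Incremental mean: move the estimate towards the observed reward.
        values[action] += (reward - values[action]) / counts[action]
    return values


# Hypothetical click probabilities for three candidate services.
estimates = run_bandit([0.05, 0.20, 0.60])
```

The exploration step is what lets the agent try services it has little data on, which is the long-term exploration behaviour that purely greedy recommenders lack.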

We used a combination of the on-policy methods REINFORCE and SARSA to learn the optimal policy and keep iterating on it. One of the major challenges was simulating the large action space of all the ways users can make selections from different recommendations. For simulation-based training we used the RecSim framework introduced by Google. We first create a sample distribution of user profiles and content recommendations, then simulate user actions with a user choice model trained on real-world user choice data. The simulated data is then used to train a reinforcement learning agent to rank expert services for individual user profiles.

This approach also addresses the cold start problem: the agent learns a policy that can quickly model a new user’s interests using past data, so it can provide suitable recommendations to that user. In addition, we built a continuous retraining pipeline that uses user engagement data collected over time to retrain the agent’s policy, so that the policy keeps adapting to features that represent shifting user interests.
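The simulate-then-train loop can be sketched with a tabular SARSA agent and a toy user model. This sketch does not use the RecSim API and leaves out REINFORCE; it is a hand-rolled stand-in in which every constant (three topics, click probabilities, drift rate, signal accuracy) is an assumption. The state is the topic suggested by the user's recent behaviour, the action is the topic to recommend, and the hidden interest occasionally shifts.

```python
import numpy as np

N_TOPICS = 3          # e.g. fitness, cooking, startups (hypothetical)
P_CLICK_MATCH = 0.8   # assumed click probability when the topic matches
P_CLICK_OTHER = 0.1   # assumed click probability otherwise
P_DRIFT = 0.05        # assumed chance per step that the interest shifts
P_SIGNAL_OK = 0.9     # assumed chance the behavioural signal is exact


def observe(interest, rng):
    """Noisy state: the topic suggested by recent in-app behaviour."""
    if rng.random() < P_SIGNAL_OK:
        return interest
    return int(rng.integers(N_TOPICS))


def user_clicks(interest, action, rng):
    """Toy user choice model: 1.0 on a click, 0.0 otherwise."""
    p = P_CLICK_MATCH if action == interest else P_CLICK_OTHER
    return 1.0 if rng.random() < p else 0.0


def epsilon_greedy(q_row, epsilon, rng):
    if rng.random() < epsilon:
        return int(rng.integers(N_TOPICS))
    return int(np.argmax(q_row))


def train_sarsa(steps=60_000, alpha=0.05, gamma=0.7, epsilon=0.1, seed=0):
    rng = np.random.default_rng(seed)
    q = np.zeros((N_TOPICS, N_TOPICS))   # q[state, action]
    interest = int(rng.integers(N_TOPICS))
    state = observe(interest, rng)
    action = epsilon_greedy(q[state], epsilon, rng)
    for _ in range(steps):
        reward = user_clicks(interest, action, rng)
        if rng.random() < P_DRIFT:       # the user's interest shifts over time
            interest = int(rng.integers(N_TOPICS))
        next_state = observe(interest, rng)
        next_action = epsilon_greedy(q[next_state], epsilon, rng)
        # SARSA: bootstrap from the action the policy actually takes next.
        td_target = reward + gamma * q[next_state, next_action]
        q[state, action] += alpha * (td_target - q[state, action])
        state, action = next_state, next_action
    return q


q_table = train_sarsa()
greedy_policy = q_table.argmax(axis=1)   # recommended topic per observed state
```

Because the update uses the next action chosen by the same epsilon-greedy policy, this is on-policy learning. In this toy setup the learned policy should recommend the topic that the behavioural signal points to, and it follows the signal when the interest drifts.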



Our primary metrics for evaluating the overall performance of the recommendation engine were CTR (click-through rate) and overall engagement, measured by the growth in expert session bookings and user activity. After deploying our RL based recommendation engine we saw a spike of more than 65% in CTR, and the number of daily bookings increased by more than 28%. This showed us that our reinforcement learning based approach is more effective at preventing problems like cold start and performs better at recommending services to users in a constantly changing interest landscape, improving overall user experience and engagement. We are currently working on improving this architecture by consistently updating the reward and policy mechanisms to achieve convergence faster with less compute power and training time.
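For readers unfamiliar with the metric, the sketch below shows how CTR and a relative increase can be computed. It assumes the reported spike is a relative lift over the previous CTR, and the click and impression counts are hypothetical numbers picked so the arithmetic is easy to follow. They are not our production data.

```python
def ctr(clicks: int, impressions: int) -> float:
    """Click-through rate: the share of shown recommendations that were clicked."""
    if impressions <= 0:
        raise ValueError("impressions must be positive")
    return clicks / impressions


def relative_lift(before: float, after: float) -> float:
    """Relative change of a metric, e.g. 0.65 for a 65% increase."""
    return (after - before) / before


# Hypothetical counts for illustration only.
ctr_before = ctr(clicks=200, impressions=10_000)   # 2.0% CTR
ctr_after = ctr(clicks=330, impressions=10_000)    # 3.3% CTR
lift = relative_lift(ctr_before, ctr_after)        # about 0.65
```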