
1

Discover the Unseen:
Tailored Recommendation
of Unwatched Content
Harshit Jain & Charan Kamal

2

Harshit Jain
■ Software Engineer at JioCinema
■ Works with the Personalization team
■ 5 years of experience building large scale distributed
systems
■ Passionate about technology, a dedicated Golang
enthusiast, and an avid traveller

3

Agenda
■ About JioCinema
■ Recommendations - how do we ensure freshness?
■ Scale and challenges
■ ScyllaDB on steroids with Bloom Filters

4

About JioCinema
JioCinema is an OTT streaming platform that offers free
and subscription-based video on demand and live streaming
content.
■ More than 10M daily active unique users.
■ Streams prominent cricket tournaments, notably IPL,
widely acknowledged as the most-watched cricket
league worldwide.
■ Home to one of the biggest Football leagues in Europe
(LaLiga).
■ Offers Video-on-Demand (VOD) content in more than
10 Indian languages.

5

Unlocking Engagement
The Power and Significance of
“Personalization”

6

Let’s look at some examples

7

Optimizing Personalization
Challenges with Managing Redundancy in
Personalization

8

The Challenge!
■ Customer has already watched
“House of the Dragon”
■ Recommending "House of the
Dragon" in the personalized tray
constitutes an inefficient
allocation of valuable real estate
and resources

9

The Solution: “Watch Discounting”
Watch Discounting refers to the practice of removing content that
customers have already watched from their recommendations.
■ Importance
■ Efficient Real Estate Utilization
■ Improved Content Discovery
■ Enhanced Customer Experience
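
As a rough illustration of the idea on this slide, watch discounting can be sketched as a filter over a ranked candidate list. This is a minimal sketch, not JioCinema's implementation: the function name `discount_watched`, the content IDs, and the use of an exact in-memory set for the watch history are all assumptions made for the example.

```python
from typing import Iterable, List, Set


def discount_watched(candidates: Iterable[str], watched: Set[str], tray_size: int) -> List[str]:
    """Return up to tray_size candidates, skipping anything already watched.

    The ranking order of the candidates is preserved.
    """
    tray: List[str] = []
    for content_id in candidates:
        if len(tray) >= tray_size:
            break
        if content_id in watched:
            continue  # already watched: do not spend a tray slot on it
        tray.append(content_id)
    return tray


# Hypothetical ranked candidates and watch history for one customer.
ranked = ["house-of-the-dragon", "title-b", "title-c", "title-d"]
watched = {"house-of-the-dragon"}
print(discount_watched(ranked, watched, tray_size=3))
# ['title-b', 'title-c', 'title-d']
```

The slot freed by the watched title goes to the next-ranked unwatched title, which is the "efficient real estate utilization" point above.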

10

Watch Discounting in action!
After Watch Discounting “House of the Dragon”

11

Navigating Technical
Challenges of Watch
Discounting

12

Hurdles in Fueling Watch Discounting
■ Scale: Tracking already-watched content for more than 10M daily active
customers is a considerable challenge.
■ Concurrency: High-concurrency events bring an average of 20 million
concurrent users, all generating interactions that must be handled.
■ Latency: A smooth user experience on JioCinema requires keeping latency
within SLA, regardless of scale and concurrency.

13

Charan Kamal
■ Software Engineer at JioCinema
■ Works with the Personalisation Team
■ 6 years of experience developing large distributed
systems at scale
■ Passionate about highly scalable systems, loves Go and
Java, and spends free time either playing FIFA or
drawing

14

Bloom filters to the rescue!
■ Bloom filters are space-efficient probabilistic data structures designed
for rapid membership lookup in a set.
■ What makes them suitable for our use case?
■ Acceptable false-positive trade-off: In a recommendation system, a few
false positives (an unwatched title occasionally treated as watched) are
tolerable, and accepting them allows far more efficient memory usage.
■ Reduced storage requirements: With a very large dataset, minimizing
storage is the critical requirement.
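
To make the trade-off concrete, here is a minimal Bloom filter sketch. It is illustrative only and not the filter used in production: the class name `BloomFilter`, the SHA-256 double-hashing scheme, and the example sizes are assumptions. The sizing uses the standard formulas m = -n ln p / (ln 2)^2 bits and k = (m / n) ln 2 hash functions for n expected items and target false-positive rate p.

```python
import hashlib
import math
from typing import Iterator


class BloomFilter:
    """Probabilistic set: no false negatives, tunable false-positive rate."""

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        # Standard sizing: m = -n ln p / (ln 2)^2, k = (m / n) ln 2.
        ln2 = math.log(2)
        self.num_bits = max(1, math.ceil(-expected_items * math.log(false_positive_rate) / ln2**2))
        self.num_hashes = max(1, round(self.num_bits / expected_items * ln2))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of one digest.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force odd so the step is never zero
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # False means "definitely not added"; True means "probably added".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


# Hypothetical per-user filter sized for 1,000 watched titles at a 1% false-positive rate.
watched = BloomFilter(expected_items=1000, false_positive_rate=0.01)
watched.add("house-of-the-dragon")
print(watched.might_contain("house-of-the-dragon"))  # True
print(len(watched.bits), "bytes")
```

With these example parameters the filter occupies roughly 1.2 KB regardless of how long the content IDs are. A watched title is never reported as unwatched. The only possible error is occasionally hiding an unwatched title, which is the acceptable direction for a recommendation tray.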

15

Why in-memory or Redis Bloom Filters won’t work
■ Although in-memory Bloom filters offer relatively lower latency, they come with
drawbacks that make them unsuitable for this use case:
■ Data volatility
■ Replicating the data across all the application pods is costly
■ Although Bloom filters in Redis and ScyllaDB serve similar purposes, one distinct
difference makes Redis unsuitable for this use case:
■ Cost: Redis charges us per operation, whereas ScyllaDB has a fixed cost.

16

Serving Fresh Content!

17

ScyllaDB + Bloom filters for the win!
■ ScyllaDB tuning
■ Using mini pods
■ Using a high-cardinality partition key
■ Using TTLs to make sure older and irrelevant data gets removed
■ Using LOCAL_QUORUM to read from local data centres only
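
The talk does not show a schema, so the following is only a sketch of how the tuning points above might map onto CQL. The table name `personalization.watched_filter`, its columns, the 30-day default TTL, and the helper functions are all hypothetical. The `session` argument is assumed to behave like a Python Cassandra/ScyllaDB driver session (`execute(query, parameters)` returning a result with `.one()`) and to be configured with a LOCAL_QUORUM consistency level, so that reads and writes are acknowledged within the local data centre.

```python
from typing import Any, Optional

TABLE = "personalization.watched_filter"  # hypothetical keyspace and table

# One partition per user gives a high-cardinality partition key.
# default_time_to_live (here 30 days, an assumed value) expires stale filters.
CREATE_TABLE_CQL = f"""
CREATE TABLE IF NOT EXISTS {TABLE} (
    user_id text PRIMARY KEY,
    filter_bits blob
) WITH default_time_to_live = 2592000
"""

WRITE_CQL = f"INSERT INTO {TABLE} (user_id, filter_bits) VALUES (%s, %s) USING TTL %s"
READ_CQL = f"SELECT filter_bits FROM {TABLE} WHERE user_id = %s"


def save_filter(session: Any, user_id: str, filter_bits: bytes, ttl_seconds: int) -> None:
    """Upsert a user's serialized Bloom filter with an explicit per-write TTL."""
    session.execute(WRITE_CQL, (user_id, filter_bits, ttl_seconds))


def load_filter(session: Any, user_id: str) -> Optional[bytes]:
    """Fetch a user's serialized Bloom filter, or None if absent or expired."""
    row = session.execute(READ_CQL, (user_id,)).one()
    return row.filter_bits if row is not None else None
```

Keying the table by user keeps every lookup a single-partition read, and the TTL means no separate cleanup job is needed for data that is no longer relevant.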

18

Time for some
Statistics

19

Here Comes Scale!
■ Statistics from a recently concluded high-scale event
■ At the onset of the match, a hockey stick pattern of requests was noted.

20

Scale Handled!
■ Latency within SLA
■ Healthy CPU utilisation

21

Stay in Touch
Harshit Jain
harshit.jain@viacom18.com
https://github.com/modestlearner
https://www.linkedin.com/in/harshit-jain-911003148/
Charan Kamal
charan.kamal@viacom18.com
https://github.com/ckstudy2021
https://www.linkedin.com/in/charan-kamal-1303ba16a
