Link recommendations use a set of per-wiki tables named growthexperiments_link_recommendations on x1, which cache responses from a recommendation service (which is slow). Currently we keep a constant pool of ~20K articles per wiki, which is enough to give users a feed of link recommendation tasks within an article topic they choose. But if we wanted to suggest link recommendation tasks about the article the user is reading at the moment (the project name for this is "entry point in reading experience"), we'd need this data for all articles.
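To make the caching role of these tables concrete, here is a minimal cache-aside sketch. Everything in it is illustrative: sqlite stands in for the x1 tables, and slow_recommendation_service is a hypothetical placeholder for the Kubernetes-based service call, not the actual GrowthExperiments code.

```python
import sqlite3

def slow_recommendation_service(page_id):
    # hypothetical stand-in for the slow, Kubernetes-based
    # link recommendation service
    return '{"links": []}'

# in-memory sqlite stands in for the per-wiki table on x1
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE link_recommendations (page_id INTEGER PRIMARY KEY, data TEXT)"
)

def get_recommendation(page_id):
    # cache-aside: serve from the table if present, otherwise call
    # the slow service and store the response for next time
    row = con.execute(
        "SELECT data FROM link_recommendations WHERE page_id = ?",
        (page_id,),
    ).fetchone()
    if row:
        return row[0]
    data = slow_recommendation_service(page_id)
    con.execute(
        "INSERT INTO link_recommendations (page_id, data) VALUES (?, ?)",
        (page_id, data),
    )
    return data
```

The point of this pattern for the questions below: the tables hold no authoritative state, only cached service responses, which is why they could in principle live in a database owned by the service instead of MediaWiki.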
We want to assess: 1) whether it would be reasonable to run an experiment on a few mid-size wikis to test how much a reading entry point would help with turning readers into editors and retaining new editors; 2) whether it would be feasible to scale up to all wikis eventually; 3) whether it would help, hurt, or be necessary (or impossible) to move these tables out of MediaWiki (they just cache responses from a Kubernetes-based web service, so logically they could just as easily live in a database belonging to that service).
Currently each table is something like 50-100M (so about 2.5-5K per article, given the ~20K-article pool). On cswiki, which is our go-to wiki for testing new features, including every article would take about 2G. On enwiki, it would be about 20G.
Background:
- link recommendations feature documentation
- link recommendations technical documentation
- link recommendations tables documentation: T266913: Add a link engineering: create tables in Wikimedia production
- (old) reading entry point task: T240513: Newcomer tasks: entry point in reading experience
- service plans: T307881: Scaling of link suggestions service