Maniphest T274062

Dice coefficient/Levenshtein and Hamming distance for page titles in Abuse Filter
Open, Needs Triage, Public

Description

Hi, for a long time an LTA has been creating pages on enwn with titles visually similar to "This is a problematic string".
They create pages like "thís is a problemat1c string", "tnis iss a problemátiç strimg", and the like. There is not much in the article body.

I really think a Dice coefficient, or some other string similarity algorithm, plus a Hamming distance function for the page title could help prevent abuse.

Can these functions be added to Abuse Filter?

Related: T36912.

Event Timeline

Hi, why was this task created, given the previous discussion in T36912: Add string distance function to AbuseFilter?

@Aklapper: @Tgr told me to create a task. Besides, this task specifically requests the functions only for page titles.

It would be better to keep discussing string distance functions in one place.

Dice coefficient OTOH can be calculated relatively quickly (O(n log n)), so it should not have performance issues. It might not be terribly useful for long text, but maybe it would work for titles.
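
For illustration, here is a minimal Python sketch of how a Dice coefficient over character bigrams could be computed in O(n log n) by sorting the bigrams and merging the two sorted lists. It is not AbuseFilter code: the function names `bigrams` and `dice_coefficient` are hypothetical, and the case folding and the handling of titles shorter than two characters are assumptions made for this example.

```python
def bigrams(text):
    """Return the list of adjacent character pairs, ignoring case (assumption)."""
    folded = text.casefold()
    return [folded[i:i + 2] for i in range(len(folded) - 1)]


def dice_coefficient(a, b):
    """Dice coefficient 2*|X ∩ Y| / (|X| + |Y|) over bigram multisets.

    Sorting both bigram lists dominates the cost, giving O(n log n).
    """
    x = sorted(bigrams(a))
    y = sorted(bigrams(b))
    if not x and not y:
        # Assumption: strings too short for bigrams match only if equal.
        return 1.0 if a.casefold() == b.casefold() else 0.0

    # Merge step: count bigrams shared by both sorted lists (with multiplicity).
    shared = 0
    i = j = 0
    while i < len(x) and j < len(y):
        if x[i] == y[j]:
            shared += 1
            i += 1
            j += 1
        elif x[i] < y[j]:
            i += 1
        else:
            j += 1
    return 2.0 * shared / (len(x) + len(y))


if __name__ == "__main__":
    target = "This is a problematic string"
    for title in ("thís is a problemat1c string", "tnis iss a problemátiç strimg"):
        print(title, round(dice_coefficient(target, title), 3))
```

A filter could then compare the score against a threshold chosen by the filter author; a suitable threshold would have to be tuned against real titles to keep false positives down.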