Maniphest T343308

fiwiki RC filters classify all edits as 'very likely bad faith'
Closed, Resolved | Public

Description

See local VP thread: https://fi.wikipedia.org/wiki/Wikipedia:Kahvihuone_(tekniikka)#Tuoreissa_muutoksissa_vain_pahantahtoisia_muokkauksia

Since sometime before August 1st, the fiwiki RC feed has been classifying practically all edits as 'very likely bad faith'. The timeline matches the ORES to Lift Wing migration: T342115#9055266.

Event Timeline

Tried to analyze: https://fi.wikipedia.org/w/index.php?title=Andranik_Ozanjan&curid=1761901&diff=21724224&oldid=21724211

Picked diff=21724224 as rev-id and ran the following in quarry:

SELECT * FROM ores_classification
WHERE oresc_rev = '21724224'
ORDER BY oresc_rev DESC
LIMIT 10;

Result was:

oresc_id	oresc_rev	oresc_model	oresc_class	oresc_probability	oresc_is_predicted
8934144	21724224	8	1	0.021	0
8934145	21724224	9	1	0.999	1

That is in line with:

elukey@stat1004:~$ curl -s https://inference.svc.eqiad.wmnet:30443/v1/models/fiwiki-goodfaith:predict -X POST -d '{"rev_id": 21724224}' -i -H "Host: fiwiki-goodfaith.revscoring-editquality-goodfaith.wikimedia.org"  --http1.1 
HTTP/1.1 200 OK
content-length: 193
content-type: application/json
date: Wed, 02 Aug 2023 11:02:41 GMT
server: istio-envoy
x-envoy-upstream-service-time: 131

{"fiwiki":{"models":{"goodfaith":{"version":"0.5.1"}},"scores":{"21724224":{"goodfaith":{"score":{"prediction":true,"probability":{"false":6.593534607191032e-10,"true":0.9999999993406465}}}}}}}
elukey@stat1004:~$ curl -s https://inference.svc.eqiad.wmnet:30443/v1/models/fidamaging:predict -X POST -d '{"rev_id": 21723984}' -i -H "Host: fiwiki-damaging.revscoring-editquality-damagingfaith.wikimedia.org"  --htt^C.1 

elukey@stat1004:~$ curl -s https://inference.svc.eqiad.wmnet:30443/v1/models/fiwiki-damaging:predict -X POST -d '{"rev_id": 21724224}' -i -H "Host: fiwiki-damaging.revscoring-editquality-damaging.wikimedia.org"  --http1.1 
HTTP/1.1 200 OK
content-length: 191
content-type: application/json
date: Wed, 02 Aug 2023 11:02:59 GMT
server: istio-envoy
x-envoy-upstream-service-time: 122

{"fiwiki":{"models":{"damaging":{"version":"0.5.1"}},"scores":{"21724224":{"damaging":{"score":{"prediction":false,"probability":{"false":0.9789067272463305,"true":0.021093272753669456}}}}}}}

The revision is definitely goodfaith, so why is it marked as the opposite?
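For readers who want to check the two Lift Wing responses above programmatically, here is a minimal Python sketch. The response bodies are copied verbatim from the curl output; the helper name `extract_score` is hypothetical and not part of any Wikimedia tooling.

```python
import json

# Response bodies copied verbatim from the two curl calls above.
GOODFAITH_BODY = '{"fiwiki":{"models":{"goodfaith":{"version":"0.5.1"}},"scores":{"21724224":{"goodfaith":{"score":{"prediction":true,"probability":{"false":6.593534607191032e-10,"true":0.9999999993406465}}}}}}}'
DAMAGING_BODY = '{"fiwiki":{"models":{"damaging":{"version":"0.5.1"}},"scores":{"21724224":{"damaging":{"score":{"prediction":false,"probability":{"false":0.9789067272463305,"true":0.021093272753669456}}}}}}}'


def extract_score(body, wiki, rev_id, model):
    """Return the 'score' object for one revision and model (hypothetical helper)."""
    data = json.loads(body)
    return data[wiki]["scores"][str(rev_id)][model]["score"]


goodfaith = extract_score(GOODFAITH_BODY, "fiwiki", 21724224, "goodfaith")
damaging = extract_score(DAMAGING_BODY, "fiwiki", 21724224, "damaging")

# The model itself predicts good faith with a near-zero "false" probability.
print("goodfaith prediction:", goodfaith["prediction"])
print("goodfaith P(false):", goodfaith["probability"]["false"])
print("damaging P(true), rounded:", round(damaging["probability"]["true"], 3))
```

The rounded damaging probability agrees with the 0.021 row in the ores_classification result, which suggests the stored scores are fine and the problem lies in how thresholds are applied.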

The thresholds used in ores-extension:

wgOresModelThresholds (https://noc.wikimedia.org/wiki.php?wiki=fiwiki#wgOresModelThresholds)

"goodfaith": {
    "statistics": {
        "thresholds": {
            "false": [
                { # "maybebad"
                    "!f1": null,
                    "!precision": null,
                    "!recall": 0,
                    "accuracy": 0.033,
                    "f1": 0.063,
                    "filter_rate": 0,
                    "fpr": 1,
                    "match_rate": 1,
                    "precision": 0.033,
                    "recall": 1,
                    "threshold": 0
                },
                { # "likelybad"
                    "!f1": 0.986,
                    "!precision": 0.982,
                    "!recall": 0.99,
                    "accuracy": 0.973,
                    "f1": 0.527,
                    "filter_rate": 0.975,
                    "fpr": 0.01,
                    "match_rate": 0.025,
                    "precision": 0.601,
                    "recall": 0.469,
                    "threshold": 0.263
                },
                null # "verylikelybad"
            ],
            "true": [  # "likelygood" 
                null
            ]
        }
    }
}

wgOresFiltersThresholds (https://noc.wikimedia.org/wiki.php?wiki=fiwiki#wgOresFiltersThresholds)

	'fiwiki' => [
		// damaging uses defaults for everything
		'goodfaith' => [
			// likelygood, maybebad, likelybad use defaults
			'verylikelybad' => [ 'min' => 0, 'max' => 'maximum recall @ precision >= 0.9' ],
		],
	],

defaults (in extension.json):

				"goodfaith": {
					"likelygood": {
						"min": "maximum recall @ precision >= 0.995",
						"max": 1
					},
					"maybebad": {
						"min": 0,
						"max": "maximum filter_rate @ recall >= 0.9"
					},
					"likelybad": {
						"min": 0,
						"max": "maximum recall @ precision >= 0.6"
					},
					"verylikelybad": false
				},
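To make the interaction between the defaults and the fiwiki override concrete, here is a small Python sketch. It assumes that a per-wiki entry simply replaces the default entry with the same name and that a value of false disables a filter; the extension's actual merge code is not shown in this task, and the function names are hypothetical.

```python
# Defaults copied from the goodfaith section of extension.json quoted above.
DEFAULT_GOODFAITH_FILTERS = {
    "likelygood": {"min": "maximum recall @ precision >= 0.995", "max": 1},
    "maybebad": {"min": 0, "max": "maximum filter_rate @ recall >= 0.9"},
    "likelybad": {"min": 0, "max": "maximum recall @ precision >= 0.6"},
    "verylikelybad": False,
}

# Per-wiki override copied from wgOresFiltersThresholds for fiwiki.
FIWIKI_GOODFAITH_OVERRIDE = {
    "verylikelybad": {"min": 0, "max": "maximum recall @ precision >= 0.9"},
}


def effective_filters(defaults, overrides):
    """Assumption: per-wiki entries replace default entries key by key."""
    merged = dict(defaults)
    merged.update(overrides)
    return merged


def enabled_filters(filters):
    """Assumption: a value of False means the filter is disabled."""
    return [name for name, bounds in filters.items() if bounds is not False]


print(enabled_filters(effective_filters(DEFAULT_GOODFAITH_FILTERS, {})))
print(enabled_filters(effective_filters(DEFAULT_GOODFAITH_FILTERS, FIWIKI_GOODFAITH_OVERRIDE)))
```

Under these assumptions, fiwiki ends up with a "verylikelybad" filter defined in the filter config, while the model thresholds quoted above hold null for that entry.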

The thresholds that ORES uses:

"likelygood"
https://ores.wikimedia.org/v3/scores/fiwiki/?models=goodfaith&model_info=statistics.thresholds.true.%22maximum+recall+@+precision+%3E=+0.995%22

"maybebad"
https://ores.wikimedia.org/v3/scores/fiwiki/?models=goodfaith&model_info=statistics.thresholds.false.%22maximum+filter_rate+@+recall+%3E=+0.9%22

"likelybad"
https://ores.wikimedia.org/v3/scores/fiwiki/?models=goodfaith&model_info=statistics.thresholds.false.%22maximum+recall+@+precision+%3E=+0.6%22

"verylikelybad"
https://ores.wikimedia.org/v3/scores/fiwiki/?models=goodfaith&model_info=statistics.thresholds.false.%22maximum+recall+@+precision+%3E=+0.9%22

It seems the thresholds are the same as the ones used by ORES.

Change 944916 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/mediawiki-config@master] ext-ORES: avoid Lift Wing calls for fiwiki

https://gerrit.wikimedia.org/r/944916

Change 944916 merged by jenkins-bot:

[operations/mediawiki-config@master] ext-ORES: avoid Lift Wing calls for fiwiki

https://gerrit.wikimedia.org/r/944916

Mentioned in SAL (#wikimedia-operations) [2023-08-02T14:49:37Z] <elukey@deploy1002> Started scap: Backport for [[gerrit:944916|ext-ORES: avoid Lift Wing calls for fiwiki (T343308)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-02T14:51:17Z] <elukey@deploy1002> elukey: Backport for [[gerrit:944916|ext-ORES: avoid Lift Wing calls for fiwiki (T343308)]] synced to the testservers mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-02T14:58:46Z] <elukey@deploy1002> Finished scap: Backport for [[gerrit:944916|ext-ORES: avoid Lift Wing calls for fiwiki (T343308)]] (duration: 09m 08s)

We reverted the change for fiwiki, but the page in the description is still showing red flags, even for new changes. Tried to purge the URL from the CDN, but that didn't work either.

It seems to be working now. We'll keep this task open to figure out what went wrong.

The RC filters now only show "maybe malicious" and "probably malicious" which is correct. Don't see "very likely malicious" anymore.

I have an assumption about what is causing this problem, but I need to verify it by examining the code further. My assumption is that the new setting has somehow messed up the threshold configs for "maybe bad faith" and "very likely bad faith" (the latter should be null).

The threshold for "maybe bad faith" is

{
    "!f1": null,
    "!precision": null,
    "!recall": 0,
    "accuracy": 0.033,
    "f1": 0.063,
    "filter_rate": 0,
    "fpr": 1,
    "match_rate": 1,
    "precision": 0.033,
    "recall": 1,
    "threshold": 0
},

A "threshold" of 0 means that any edit whose goodfaith "false" probability is greater than 0 will be classified as "maybe bad faith". That is basically every edit, which is why in
https://fi.wikipedia.org/wiki/Toiminnot:Tuoreet_muutokset?hidebots=1&hidecategorization=1&hideWikibase=1&limit=500&days=30&goodfaith__maybebad_color=c3&goodfaith__likelybad_color=c4&urlversion=2
all edits are flagged as "maybe bad faith" (in yellow).

That looks similar to what we saw with the new backend LiftWing - all edits are flagged as “very likely bad faith.”

If that is the case, I also think this might not be an appropriate threshold for "maybe bad faith". Even a revision like 21726059, whose goodfaith prediction is true with a probability of 0.99999, will be flagged as "maybe bad faith".
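The effect of a zero threshold can be sketched in a few lines of Python. This is only an illustration: it assumes a filter matches when the "false" probability is at or above the threshold, which follows the description above but is not taken from the extension code, and `matching_filters` is a hypothetical name.

```python
# Thresholds for the goodfaith "false" outcome, copied from the fiwiki config above.
FIWIKI_FALSE_THRESHOLDS = {"maybebad": 0.0, "likelybad": 0.263}


def matching_filters(p_true, thresholds=FIWIKI_FALSE_THRESHOLDS):
    """Return the bad-faith filters an edit would match.

    Assumption: a filter matches when P(false) >= its threshold.
    """
    p_false = 1.0 - p_true
    return [name for name, threshold in thresholds.items() if p_false >= threshold]


# Revision 21726059: goodfaith "true" probability of 0.99999.
print(matching_filters(0.99999))
# Revision 21724224: goodfaith "true" probability from the Lift Wing response.
print(matching_filters(0.9999999993406465))
# A genuinely borderline edit, for comparison.
print(matching_filters(0.5))
```

With a threshold of 0 no probability can fall below it, so "maybebad" matches every edit regardless of how confident the model is.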

Great analysis Aiko! One thing that I still don't understand is why it works fine now, while it doesn't when we switch to Lift Wing.

IIUC wgOresFiltersThresholds and wgOresModelThresholds are the same, so something else is at play.

The interesting thing is that the model threshold config of fiwiki is the only one having a config like this:

'true' =>
  [ 0 => null, ],

Could it be that the Lift Wing code reads this config in a different way?

Edit: Answering myself with https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/915541

feat: hardcode threshold calls to switch to Lift Wing

ORES supports querying the models and balancing specificity with sensitivity based on queries provided
in the configuration for each model. In order to facilitate the transition to Lift Wing we load these thresholds from files
instead of fetching them from ORES. This is done by using a feature flag (global $wgOresUseLiftwing) to switch the ThresholdLookup service with the new service named
ThresholdLookupConfig.

Relevant task https://phabricator.wikimedia.org/T319170#8807964

From https://ores.wikimedia.org/v3/scores/fiwiki/?models=goodfaith&model_info=statistics.thresholds.false.%22maximum+recall+%40+precision+%3E%3D+0.995%22%7Cstatistics.thresholds.true.%22maximum+filter_rate+%40+recall+%3E%3D+0.9%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.6%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.9%22&format=json:

{'fiwiki': {'models': {'goodfaith': {'statistics': {'thresholds': {'false': ['null'],
                                                                   'true': [{'!f1': 0.473,
                                                                             '!precision': 0.349,
                                                                             '!recall': 0.732,
                                                                             'accuracy': 0.947,
                                                                             'f1': 0.972,
                                                                             'filter_rate': 0.068,
                                                                             'fpr': 0.268,
                                                                             'match_rate': 0.932,
                                                                             'precision': 0.991,
                                                                             'recall': 0.954,
                                                                             'threshold': 1.0},
                                                                            {'!f1': 'null',
                                                                             '!precision': 'null',
                                                                             '!recall': 0.0,
                                                                             'accuracy': 0.967,
                                                                             'f1': 0.983,
                                                                             'filter_rate': 0.0,
                                                                             'fpr': 1.0,
                                                                             'match_rate': 1.0,
                                                                             'precision': 0.967,
                                                                             'recall': 1.0,
                                                                             'threshold': 0.0},
                                                                            {'!f1': 'null',
                                                                             '!precision': 'null',
                                                                             '!recall': 0.0,
                                                                             'accuracy': 0.967,
                                                                             'f1': 0.983,
                                                                             'filter_rate': 0.0,
                                                                             'fpr': 1.0,
                                                                             'match_rate': 1.0,
                                                                             'precision': 0.967,
                                                                             'recall': 1.0,
                                                                             'threshold': 0.0}]}}}}}}

@achou I am not sure if I am reading the above correctly, but I see false with null (not true). Could that be the issue?

So the ORES extension calls ORES for thresholds, and uses the above, rendering a result via RC filters.
When we switch to Lift Wing, we use the hardcoded values and render what we have in wgOresModelThresholds, added by Ilias and maybe not correct for fiwiki. Does it make sense?

@elukey In Ilias's comment https://phabricator.wikimedia.org/T319170#8807964, the example URL queries thresholds for the frwiki damaging model.

https://ores.wikimedia.org/v3/scores/frwiki/?models=damaging&model_info=statistics.thresholds.false.%22maximum+recall+%40+precision+%3E%3D+0.995%22%7Cstatistics.thresholds.true.%22maximum+filter_rate+%40+recall+%3E%3D+0.9%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.6%22%7Cstatistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.9%22&format=json

The URL contains four rules (after model_info=), and since it's for damaging, the filters are:

  • Very likely good => statistics.thresholds.false."maximum+recall+@+precision+>=+0.995"
  • May have problems => statistics.thresholds.true."maximum+filter_rate+@+recall+>=+0.9"
  • Likely have problems => statistics.thresholds.true."maximum+recall+@+precision+>=+0.6"
  • Very likely have problems => statistics.thresholds.true."maximum+recall+@+precision+>=+0.9"

But for the goodfaith model, the true and false outcomes need to be swapped:

  • Very likely good faith => statistics.thresholds.true."maximum+recall+@+precision+>=+0.995"
  • May be bad faith => statistics.thresholds.false."maximum+filter_rate+@+recall+>=+0.9"
  • Likely bad faith => statistics.thresholds.false."maximum+recall+@+precision+>=+0.6"
  • Very likely bad faith => statistics.thresholds.false."maximum+recall+@+precision+>=+0.9"

So the query for fiwiki goodfaith should be:

https://ores.wikimedia.org/v3/scores/fiwiki/?models=goodfaith&model_info=statistics.thresholds.true.%22maximum+recall+%40+precision+%3E%3D+0.995%22%7Cstatistics.thresholds.false.%22maximum+filter_rate+%40+recall+%3E%3D+0.9%22%7Cstatistics.thresholds.false.%22maximum+recall+%40+precision+%3E%3D+0.6%22%7Cstatistics.thresholds.false.%22maximum+recall+%40+precision+%3E%3D+0.9%22&format=json
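The outcome swap described above can be expressed as a short Python sketch that rebuilds both query URLs. The rule strings and URL layout are taken from this task; the function and constant names are hypothetical and only serve the illustration.

```python
from urllib.parse import urlencode

# (filter name, threshold rule) pairs, in the order used by the URLs in this task.
FILTER_RULES = [
    ("likelygood", "maximum recall @ precision >= 0.995"),
    ("maybebad", "maximum filter_rate @ recall >= 0.9"),
    ("likelybad", "maximum recall @ precision >= 0.6"),
    ("verylikelybad", "maximum recall @ precision >= 0.9"),
]


def thresholds_url(wiki, model):
    """Build the ORES thresholds query, swapping outcomes for goodfaith."""
    # For damaging, "false" is the good outcome; for goodfaith, "true" is.
    good = "false" if model == "damaging" else "true"
    bad = "true" if good == "false" else "false"
    paths = []
    for name, rule in FILTER_RULES:
        outcome = good if name == "likelygood" else bad
        paths.append(f'statistics.thresholds.{outcome}."{rule}"')
    query = urlencode(
        {"models": model, "model_info": "|".join(paths), "format": "json"}
    )
    return f"https://ores.wikimedia.org/v3/scores/{wiki}/?{query}"


print(thresholds_url("frwiki", "damaging"))
print(thresholds_url("fiwiki", "goodfaith"))
```

Only the outcome segment (true or false) changes between the two models; the four rules themselves stay the same.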

The result is the same as the hardcoded values we have in wgOresModelThresholds for fiwiki, but I suspect that maybe something is off in ThresholdLookupConfig.php so it mismatches the filter and threshold.

ahhh okok sigh I thought swapping damaging with goodfaith was enough, sorry for the confusion :(

Do you think that the following bit trips some error?

'true' =>
  [ 0 => null, ],

On cswiki, I'm also observing an unusually high rate of edits highlighted as 'very likely bad'. All IP edits are marked as such. I think it started around the beginning of this week.

Change 946510 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/mediawiki-config@master] ext-ORES: force cswiki to use the ORES settings/backend

https://gerrit.wikimedia.org/r/946510

Change 946510 merged by jenkins-bot:

[operations/mediawiki-config@master] ext-ORES: force cswiki to use the ORES settings/backend

https://gerrit.wikimedia.org/r/946510

Mentioned in SAL (#wikimedia-operations) [2023-08-07T08:16:16Z] <elukey@deploy1002> Started scap: Backport for [[gerrit:946510|ext-ORES: force cswiki to use the ORES settings/backend (T343308)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-07T08:24:53Z] <elukey@deploy1002> elukey: Backport for [[gerrit:946510|ext-ORES: force cswiki to use the ORES settings/backend (T343308)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-07T08:31:07Z] <elukey@deploy1002> Finished scap: Backport for [[gerrit:946510|ext-ORES: force cswiki to use the ORES settings/backend (T343308)]] (duration: 14m 50s)

Hi @matej_suchanek, thanks a lot for the report! I have reverted cswiki's behavior to the previous version, let me know if the RC filters look better now!

@achou one thing that I don't get is why we have the following config for fiwiki:

'wgOresFiltersThresholds' => [
...
	'fiwiki' => [
		// damaging uses defaults for everything
		'goodfaith' => [
			// likelygood, maybebad, likelybad use defaults
			'verylikelybad' => [ 'min' => 0, 'max' => 'maximum recall @ precision >= 0.9' ],
		],
	],

This wasn't touched by Ilias' changes, but IIUC it hints at the fact that verylikelybad should be there. Am I totally wrong?

From the ORES access logs, this is the exact URI used to fetch thresholds:

Let's pick fiwiki:

ORES API:

{'fiwiki': {'models': {'goodfaith': {'statistics': {'thresholds': {'false': ['null',
                                                                             {'!f1': 'null',
                                                                              '!precision': 'null',
                                                                              '!recall': 0.0,
                                                                              'accuracy': 0.033,
                                                                              'f1': 0.063,
                                                                              'filter_rate': 0.0,
                                                                              'fpr': 1.0,
                                                                              'match_rate': 1.0,
                                                                              'precision': 0.033,
                                                                              'recall': 1.0,
                                                                              'threshold': 0.0},
                                                                             {'!f1': 0.986,
                                                                              '!precision': 0.982,
                                                                              '!recall': 0.99,
                                                                              'accuracy': 0.973,
                                                                              'f1': 0.527,
                                                                              'filter_rate': 0.975,
                                                                              'fpr': 0.01,
                                                                              'match_rate': 0.025,
                                                                              'precision': 0.601,
                                                                              'recall': 0.469,
                                                                              'threshold': 0.263}],
                                                                   'true': ['null']}}}}}}

ORES Extension config:

'goodfaith' =>
  [
    'statistics' =>
      [
        'thresholds' =>
          [
            'false' =>
              [
                0 =>
                  [
                    '!f1' => null,
                    '!precision' => null,
                    '!recall' => 0.0,
                    'accuracy' => 0.033,
                    'f1' => 0.063,
                    'filter_rate' => 0.0,
                    'fpr' => 1.0,
                    'match_rate' => 1.0,
                    'precision' => 0.033,
                    'recall' => 1.0,
                    'threshold' => 0.0,
                  ],
                1 =>
                  [
                    '!f1' => 0.986,
                    '!precision' => 0.982,
                    '!recall' => 0.99,
                    'accuracy' => 0.973,
                    'f1' => 0.527,
                    'filter_rate' => 0.975,
                    'fpr' => 0.01,
                    'match_rate' => 0.025,
                    'precision' => 0.601,
                    'recall' => 0.469,
                    'threshold' => 0.263,
                  ],
                2 => null,
              ],
            'true' =>
              [
                0 => null,
              ],
          ],
      ],
  ],
],
],

The order of the elements of the false array is not the same: null comes first in the ORES API response and last in the hardcoded config.

I think that fixed it. I am definitely not observing what I reported earlier.

cswiki:

ORES API:

{'cswiki': {'models': {'goodfaith': {'statistics': {'thresholds': {'false': [{'!f1': 0.989,
                                                                              '!precision': 0.979,
                                                                              '!recall': 1.0,
                                                                              'accuracy': 0.979,
                                                                              'f1': 0.121,
                                                                              'filter_rate': 0.999,
                                                                              'fpr': 0.0,
                                                                              'match_rate': 0.001,
                                                                              'precision': 1.0,
                                                                              'recall': 0.064,
                                                                              'threshold': 0.882},
                                                                             {'!f1': 0.966,
                                                                              '!precision': 0.998,
                                                                              '!recall': 0.935,
                                                                              'accuracy': 0.935,
                                                                              'f1': 0.383,
                                                                              'filter_rate': 0.917,
                                                                              'fpr': 0.065,
                                                                              'match_rate': 0.083,
                                                                              'precision': 0.243,
                                                                              'recall': 0.901,
                                                                              'threshold': 0.05},
                                                                             {'!f1': 0.991,
                                                                              '!precision': 0.989,
                                                                              '!recall': 0.992,
                                                                              'accuracy': 0.982,
                                                                              'f1': 0.563,
                                                                              'filter_rate': 0.98,
                                                                              'fpr': 0.008,
                                                                              'match_rate': 0.02,
                                                                              'precision': 0.601,
                                                                              'recall': 0.53,
                                                                              'threshold': 0.519}],
                                                                   'true': [{'!f1': 0.526,
                                                                             '!precision': 0.394,
                                                                             '!recall': 0.792,
                                                                             'accuracy': 0.968,
                                                                             'f1': 0.983,
                                                                             'filter_rate': 0.045,
                                                                             'fpr': 0.208,
                                                                             'match_rate': 0.955,
                                                                             'precision': 0.995,
                                                                             'recall': 0.972,
                                                                             'threshold': 0.704}]}}}}}}

ORES extension config:

'goodfaith' =>
  [
    'statistics' =>
      [
        'thresholds' =>
          [
            'false' =>
              [
                0 =>
                  [
                    '!f1' => 0.966,
                    '!precision' => 0.998,
                    '!recall' => 0.935,
                    'accuracy' => 0.935,
                    'f1' => 0.383,
                    'filter_rate' => 0.917,
                    'fpr' => 0.065,
                    'match_rate' => 0.083,
                    'precision' => 0.243,
                    'recall' => 0.901,
                    'threshold' => 0.05,
                  ],
                1 =>
                  [
                    '!f1' => 0.991,
                    '!precision' => 0.989,
                    '!recall' => 0.992,
                    'accuracy' => 0.982,
                    'f1' => 0.563,
                    'filter_rate' => 0.98,
                    'fpr' => 0.008,
                    'match_rate' => 0.02,
                    'precision' => 0.601,
                    'recall' => 0.53,
                    'threshold' => 0.519,
                  ],
                2 =>
                  [
                    '!f1' => 0.989,
                    '!precision' => 0.979,
                    '!recall' => 1.0,
                    'accuracy' => 0.979,
                    'f1' => 0.121,
                    'filter_rate' => 0.999,
                    'fpr' => 0.0,
                    'match_rate' => 0.001,
                    'precision' => 1.0,
                    'recall' => 0.064,
                    'threshold' => 0.882,
                  ],
              ],
            'true' =>
              [
                0 =>
                  [
                    '!f1' => 0.526,
                    '!precision' => 0.394,
                    '!recall' => 0.792,
                    'accuracy' => 0.968,
                    'f1' => 0.983,
                    'filter_rate' => 0.045,
                    'fpr' => 0.208,
                    'match_rate' => 0.955,
                    'precision' => 0.995,
                    'recall' => 0.972,
                    'threshold' => 0.704,
                  ],
              ],
          ],
      ],
  ],
],

@achou maybe it is nothing, but the order of the "false" array elements differs between the ORES API and the Lift Wing config for both fiwiki and cswiki:

  • in fiwiki, null is the first element in the ORES API response, while it is the last in the Lift Wing config.
  • in cswiki, the order of the elements is not the same either (no nulls here).

IIRC the order mattered, could that be the problem?

My theory, which is probably really wrong since I am not great at PHP:

  • In the ORES extension, the ThresholdLookup class is used to fetch model thresholds from the ORES API, via fetchThresholdsFromApi.
  • Ilias created ThresholdLookupConfig, that extends the above and overrides fetchThresholdsFromApi to just read from the config file, rather than calling ORES.
  • Both functions use getFiltersConfig to retrieve wgOresFiltersThresholds, namely configs like:
		'goodfaith' => [
			// likelygood, maybebad, likelybad use defaults
			'verylikelybad' => [ 'min' => 0, 'max' => 'maximum recall @ precision >= 0.9' ],
		],
  • Both functions do the same array_merge operation between model thresholds and filter thresholds, to then return the result.
  • In the ChangesListHooksHandler class, we should have the code that generates the extra RC filters.
  • More specifically, handleGoodFaith reads the goodfaith threshold ($thresholdLookup->getThresholds( 'goodfaith' )) and creates the RC filters accordingly.

My theory is that ThresholdLookup and ThresholdLookupConfig share some logic, but they don't share the same data: as mentioned above, the order of the model thresholds in the array returned by the ORES API is not the same as what we hardcode in the extension config (for Lift Wing), see T343308#9073095
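To illustrate why element order could matter, here is a hypothetical Python sketch. It assumes a consumer that assigns the entries of the "false" array to filters purely by position, using the order seen in the ORES API responses quoted earlier. Whether the extension really does this is exactly what the theory above still needs to verify; none of the names below come from the extension code.

```python
# Order of the "false" entries as seen in the ORES API responses quoted earlier.
API_ORDER = ["verylikelybad", "maybebad", "likelybad"]


def assign_by_position(thresholds, order=API_ORDER):
    """Hypothetical consumer: map array entries to filters by position only."""
    return dict(zip(order, thresholds))


# "threshold" values of the "false" array, in the order each source lists them.
FIWIKI_API = [None, 0.0, 0.263]
FIWIKI_CONFIG = [0.0, 0.263, None]
CSWIKI_API = [0.882, 0.05, 0.519]
CSWIKI_CONFIG = [0.05, 0.519, 0.882]

for label, values in [
    ("fiwiki API", FIWIKI_API),
    ("fiwiki config", FIWIKI_CONFIG),
    ("cswiki API", CSWIKI_API),
    ("cswiki config", CSWIKI_CONFIG),
]:
    print(label, assign_by_position(values))
```

Under this assumption, the hardcoded fiwiki config would give "verylikelybad" a threshold of 0 and the cswiki config would give it 0.05, which would be consistent with the reports in this task. That is an illustration of the theory, not a confirmation of it.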

The JSON representation of cswiki's config is this:

{"goodfaith":{"statistics":{"thresholds":{"false":[{"!f1":0.966,"!precision":0.998,"!recall":0.935,"accuracy":0.935,"f1":0.383,"filter_rate":0.917,"fpr":0.065,"match_rate":0.083,"precision":0.243,"recall":0.901,"threshold":0.05},{"!f1":0.991,"!precision":0.989,"!recall":0.992,"accuracy":0.982,"f1":0.563,"filter_rate":0.98,"fpr":0.008,"match_rate":0.02,"precision":0.601,"recall":0.53,"threshold":0.519},{"!f1":0.989,"!precision":0.979,"!recall":1,"accuracy":0.979,"f1":0.121,"filter_rate":0.999,"fpr":0,"match_rate":0.001,"precision":1,"recall":0.064,"threshold":0.882}],"true":[{"!f1":0.526,"!precision":0.394,"!recall":0.792,"accuracy":0.968,"f1":0.983,"filter_rate":0.045,"fpr":0.208,"match_rate":0.955,"precision":0.995,"recall":0.972,"threshold":0.704}]}}}}

and the JSON representation of the API response is this:

{"goodfaith": {"statistics": {"thresholds": {"false": [{"!f1": 0.989, "!precision": 0.979, "!recall": 1.0, "accuracy": 0.979, "f1": 0.121, "filter_rate": 0.999, "fpr": 0.0, "match_rate": 0.001, "precision": 1.0, "recall": 0.064, "threshold": 0.882}, {"!f1": 0.966, "!precision": 0.998, "!recall": 0.935, "accuracy": 0.935, "f1": 0.383, "filter_rate": 0.917, "fpr": 0.065, "match_rate": 0.083, "precision": 0.243, "recall": 0.901, "threshold": 0.05}, {"!f1": 0.991, "!precision": 0.989, "!recall": 0.992, "accuracy": 0.982, "f1": 0.563, "filter_rate": 0.98, "fpr": 0.008, "match_rate": 0.02, "precision": 0.601, "recall": 0.53, "threshold": 0.519}], "true": [{"!f1": 0.526, "!precision": 0.394, "!recall": 0.792, "accuracy": 0.968, "f1": 0.983, "filter_rate": 0.045, "fpr": 0.208, "match_rate": 0.955, "precision": 0.995, "recall": 0.972, "threshold": 0.704}]}}}}

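At a quick glance they differ: the same three "false" entries appear in a different order (thresholds 0.05, 0.519, 0.882 in the config versus 0.882, 0.05, 0.519 in the API response).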
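One way to confirm that the two payloads differ only in ordering is to compare the "false" lists as they are and then again after sorting. The following is a minimal Python sketch, with each entry reduced to two fields copied from the JSON above; the variable and function names are illustrative and not part of the extension.

```python
# goodfaith/"false" entries, reduced to two fields taken from the JSON above.
config_false = [
    {"threshold": 0.05, "precision": 0.243},
    {"threshold": 0.519, "precision": 0.601},
    {"threshold": 0.882, "precision": 1.0},
]
api_false = [
    {"threshold": 0.882, "precision": 1.0},
    {"threshold": 0.05, "precision": 0.243},
    {"threshold": 0.519, "precision": 0.601},
]


def by_threshold(entries):
    """Return the entries sorted by their numeric threshold."""
    return sorted(entries, key=lambda entry: entry["threshold"])


# Lists compare element by element, so a different order means "not equal".
same_order = config_false == api_false
# After sorting, only the content matters.
same_content = by_threshold(config_false) == by_threshold(api_false)

print(same_order, same_content)  # False True
```

So any code that reads these entries by position will see different values depending on which source it reads from.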

Random note: we don't really need that many thresholds and configurations for the ORES extension. They bloat the config and make it harder to debug.

Change 946546 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/mediawiki-config@master] ext-ORES: revert all wikis to use ORES instead of Lift Wing

https://gerrit.wikimedia.org/r/946546

Change 946546 merged by jenkins-bot:

[operations/mediawiki-config@master] ext-ORES: revert all wikis to use ORES instead of Lift Wing

https://gerrit.wikimedia.org/r/946546

Mentioned in SAL (#wikimedia-operations) [2023-08-07T13:52:30Z] <elukey@deploy1002> Started scap: Backport for [[gerrit:946546|ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-07T13:53:52Z] <elukey@deploy1002> elukey: Backport for [[gerrit:946546|ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308)]] synced to the testservers mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-07T13:59:19Z] <elukey@deploy1002> Finished scap: Backport for [[gerrit:946546|ext-ORES: revert all wikis to use ORES instead of Lift Wing (T343308)]] (duration: 06m 49s)

@Ladsgroup What I don't understand is how the order for the ORES API response is defined. It seems that for fiwiki and cswiki, the order is ["very likely bad faith", "maybe bad faith", "likely bad faith"]. Does this same order apply to all other wikis as well?

For fiwiki and cswiki, "very likely bad" is defined in wgOresFiltersThresholds and other categories use defaults. Would it be the reason it is the first element of the array?

For all users: we rolled back the extension to its previous behavior so that we can investigate the bug that caused this on various wikis. Sorry for the issues!

It's been a while, but my understanding has been that the order doesn't matter. The goal was to ask for a threshold by description, e.g. "maximum recall @ precision >= 0.98", and get a threshold back regardless of recall or other settings, so you'd have a set of pre-defined thresholds and MediaWiki picks among them based on another set of configuration:

	'cswiki' => [
		'damaging' => [
			// likelygood, maybebad, likelybad use defaults
			'verylikelybad' => [ 'min' => 'maximum recall @ precision >= 0.98', 'max' => 1 ],
		],
		'goodfaith' => [
			// likelygood, maybebad, likelybad use defaults
			'verylikelybad' => [ 'min' => 0, 'max' => 'maximum recall @ precision >= 0.98' ],
		],
	],

Basically, to get the actual threshold, you need to combine these two.

The thresholds are not wrong, but what they represent is wrong:

ladsgroup@mwmaint1002:~$ mwscript shell.php --wiki=fiwiki
Psy Shell v0.11.10 (PHP 7.4.33 — cli) by Justin Hileman
> use MediaWiki\Logger\LoggerFactory;
> use MediaWiki\MediaWikiServices;
> use ORES\Services\ORESServices;
> use ORES\Services\PopulatedSqlModelLookup;
> use ORES\Storage\DatabaseQueryBuilder;
> use ORES\Storage\SqlModelLookup;
> use ORES\Storage\SqlScoreLookup;
> use ORES\Storage\SqlScoreStorage;
> use ORES\ThresholdParser;
> use ORES\Storage\ThresholdLookup;
> use ORES\Storage\ThresholdLookupConfig;
> 
> $services = \MediaWiki\MediaWikiServices::getInstance();
= MediaWiki\MediaWikiServices {#86}

> 
> $liftwingLookup = new ThresholdLookupConfig(new ThresholdParser( LoggerFactory::getInstance( 'ORES' ) ), ORESServices::getModelLookup(), ORESServices::getORESService(), $services->getMainWANObjectCache(), ORESServices::getLogger(), $services->getStatsdDataFactory(), $services->getMainConfig() );

= ORES\Storage\ThresholdLookupConfig {#3508}

> $legacyLookup = new ThresholdLookup( new ThresholdParser( LoggerFactory::getInstance( 'ORES' ) ), ORESServices::getModelLookup(), ORESServices::getORESService(), $services->getMainWANObjectCache(), ORESServices::getLogger(), $services->getStatsdDataFactory(), $services->getMainConfig() );

= ORES\Storage\ThresholdLookup {#6062}

> var_dump( $liftwingLookup->getThresholds( 'goodfaith', false ) );
array(2) {
  ["verylikelybad"]=>
  array(2) {
    ["min"]=>
    int(0)
    ["max"]=>
    float(1)
  }
  ["maybebad"]=>
  array(2) {
    ["min"]=>
    int(0)
    ["max"]=>
    float(0.737)
  }
}
= null

> var_dump( $legacyLookup->getThresholds( 'goodfaith', false ) );
array(2) {
  ["maybebad"]=>
  array(2) {
    ["min"]=>
    int(0)
    ["max"]=>
    float(1)
  }
  ["likelybad"]=>
  array(2) {
    ["min"]=>
    int(0)
    ["max"]=>
    float(0.737)
  }
}
= null

The array values are the same but the keys are different. I'll debug a bit more to figure out why.

I think I found out what's wrong. Now I'm trying to see how we can fix it. It doesn't look easy, to be honest.

@Ladsgroup if you want to add your ideas here, we'll pick them up when Ilias is back from holidays!

I actually found a rather easy solution. Let me explain.

Why is this happening in the first place? Your comment in T343308#9073434 is in the right direction and just needs a last push: ORES thresholds are set either as a value (e.g. "1") or as a pseudo-value like maximum recall @ precision >= 0.9. When the code sees a pseudo-value, it hits the ORES API with a URL such as https://ores.wikimedia.org/v3/scores/fiwiki/?model=goodfaith&model_info=statistics.thresholds.false.%22maximum%20recall%20@%20precision%20%3E=%200.6%22 and then uses the returned threshold value as the translation of that pseudo-value (i.e. it offloads finding the actual threshold to the ORES service). The only catch is that the ORES extension doesn't know which pseudo-value maps to which threshold in the ORES response: it relies on ORES returning them in the exact order (!), which is extremely fragile, and that ordering was lost when the API call was replaced with config values.
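To make the fragility concrete, here is a small Python sketch of positional versus keyed pairing. It is an illustration only, not the extension's code: the label order is the one reported earlier in this thread for the ORES API response, the numbers are the "threshold" fields of the cswiki goodfaith "false" entries posted above, and pairing each label with each threshold this way is an inference from those two comments.

```python
# Label order the thread reports for the ORES API response (fiwiki and cswiki).
api_labels = ["verylikelybad", "maybebad", "likelybad"]

# "threshold" fields of the cswiki goodfaith "false" entries, in each source's order.
api_thresholds = [0.882, 0.05, 0.519]
config_thresholds = [0.05, 0.519, 0.882]

# Positional pairing: only correct when the array arrives in the expected order.
from_api = dict(zip(api_labels, api_thresholds))
from_config = dict(zip(api_labels, config_thresholds))

# Keyed pairing: each threshold is stored under its label, so order is irrelevant.
keyed = {"maybebad": 0.05, "likelybad": 0.519, "verylikelybad": 0.882}

# Labels whose positional value from the config no longer matches the keyed value.
mislabeled = sorted(label for label in keyed if from_config[label] != keyed[label])

print(from_api == keyed)  # True
print(mislabeled)         # ['likelybad', 'maybebad', 'verylikelybad']
```

The values themselves never change; only the labels attached to them do, which is consistent with the shell output above where the array values match but the keys differ.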

The solution is rather easy: just replace all of the pseudo-values with actual numbers. They were useful when ORES was constantly getting its models updated, so thresholds kept shifting and we needed a way to keep things consistent without any MW config deploys. Now that we have replaced thresholds with MW configs, we need to do a MW config deploy regardless whenever we switch ORES to a new model.

Once we replace those pseudo-values with actual ones (which are not too hard to get: I can do the call inside MW and get the translated values, so we are sure the thresholds stay the same), we can completely get rid of the API calls, the extra config we recently introduced, hundreds of lines of code and thousands of lines of config.

What do you think?

@Ladsgroup I agree. IIUC you mean to change the config from this

'arwiki' => [
		'damaging' => [
			'likelygood' => [ 'min' => 0, 'max' => 'maximum recall @ precision >= 0.997'],

to this

'arwiki' => [
		'damaging' => [
			'likelygood' => [ 'min' => 0, 'max' => 0.833 ],

I can do that for all values and add the ORES links to help the review process. Let me know if I have misunderstood it.

Nope, that's exactly what I'm advocating ^_^

Change 948542 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores-extension: replace thresholdswith values

https://gerrit.wikimedia.org/r/948542

I have opened a patch that replaces the queries for the thresholds with the values as @Ladsgroup suggested. https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/948542

To help with reviewing, I have added comments containing the URL for the request to ORES as well as the query as it previously existed, like this:

// https://ores.wikimedia.org/v3/scores/plwiki?models=goodfaith&model_info=statistics.thresholds.true."''maximum recall @ precision >= 0.995''"
			'likelygood' => [ 'min' => 0.925, 'max' => 1 ], // 'maximum recall @ precision >= 0.995'

The URLs and their parameters are created according to the PHP code found in the extension (min queries correspond to a true outcome, while max queries correspond to a false one):

$outcome = ( $bound === 'min' ) ? 'true' : 'false';
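As a rough illustration of how such a review URL can be assembled, here is a Python sketch that applies the same min/max rule and percent-encodes the pseudo-value. The helper names are hypothetical, and the URL layout is copied from the examples in this thread rather than from the ORES API documentation, so treat it as an assumption.

```python
from urllib.parse import quote

ORES_BASE = "https://ores.wikimedia.org/v3/scores"


def outcome_for(bound):
    """Mirror the rule quoted above: 'min' maps to 'true', anything else to 'false'."""
    return "true" if bound == "min" else "false"


def threshold_url(wiki, model, bound, pseudo_value):
    """Build the model_info URL for one pseudo-value (hypothetical helper)."""
    path = f'statistics.thresholds.{outcome_for(bound)}."{pseudo_value}"'
    # Keep '@' and '=' readable, as in the URLs posted in this thread.
    encoded = quote(path, safe="@=.")
    return f"{ORES_BASE}/{wiki}/?models={model}&model_info={encoded}"


print(threshold_url("fiwiki", "goodfaith", "max", "maximum recall @ precision >= 0.6"))
```

Nothing here performs a request; the function only builds the string that a reviewer could open to spot-check a value.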

I believe the issue that we had previously is what @achou and @elukey identified regarding the order of the thresholds in the response.
I'm exploring what we need to do with the values that ORES returns as null which are the following 5 requests:

['arwiki', 'damaging', 'likelybad'],
 ['bswiki', 'goodfaith', 'verylikelybad'],
 ['fiwiki', 'goodfaith', 'likelygood'],
 ['fiwiki', 'goodfaith', 'verylikelybad'],
 ['wikidatawiki', 'goodfaith', 'verylikelybad']

Also I will submit another patch for the extension to remove the ThresholdLookupFile class and its usage.

Change 948584 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[mediawiki/extensions/ORES@master] read thresholds numeric values

https://gerrit.wikimedia.org/r/948584

I have opened a patch that replaces the queries for the thresholds with the values as @Ladsgroup suggested. https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/948542

thank you!

To help with reviewing, I have added comments containing the URL for the request to ORES as well as the query as it previously existed, like this:

// https://ores.wikimedia.org/v3/scores/plwiki?models=goodfaith&model_info=statistics.thresholds.true."''maximum recall @ precision >= 0.995''"
			'likelygood' => [ 'min' => 0.925, 'max' => 1 ], // 'maximum recall @ precision >= 0.995'

If it's built automatically, I don't think it needs review; just spot-check some and we'll be fine.

The URLs and their parameters are created according to the PHP code found in the extension (min queries correspond to a true outcome, while max queries correspond to a false one):

$outcome = ( $bound === 'min' ) ? 'true' : 'false';

I believe the issue that we had previously is what @achou and @elukey identified regarding the order of the thresholds in the response.
I'm exploring what we need to do with the values that ORES returns as null which are the following 5 requests:

['arwiki', 'damaging', 'likelybad'],
 ['bswiki', 'goodfaith', 'verylikelybad'],
 ['fiwiki', 'goodfaith', 'likelygood'],
 ['fiwiki', 'goodfaith', 'verylikelybad'],
 ['wikidatawiki', 'goodfaith', 'verylikelybad']

Just set it to false in the config.

Also I will submit another patch for the extension to remove the ThresholdLookupFile class and its usage.

And a lot more code in other places too.

I checked the above classes for which ORES was returning null and it seems that indeed these don't appear in the UI.
I have explicitly disabled them by setting their value to false as @Ladsgroup suggested.
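The conversion described in this thread (pseudo-values replaced by numbers, and filters disabled where ORES returned null) can be sketched in Python as follows. This is not the script used for the patch: the function name is hypothetical, the lookup table is keyed by pseudo-value only for brevity (the real lookup also depends on the true/false outcome), and the None entry is a stand-in for an ORES null, not a real result for any wiki.

```python
def to_numeric(filters, resolved):
    """Replace string bounds with numbers; disable a filter if any bound is None."""
    converted = {}
    for name, bounds in filters.items():
        values = {
            side: resolved.get(bound) if isinstance(bound, str) else bound
            for side, bound in bounds.items()
        }
        # A missing or null lookup means the filter cannot be expressed numerically.
        converted[name] = False if None in values.values() else values
    return converted


filters = {
    "likelygood": {"min": "maximum recall @ precision >= 0.995", "max": 1},
    "verylikelybad": {"min": 0, "max": "maximum recall @ precision >= 0.98"},
}
# 0.925 is the plwiki example value from this thread; None stands in for an ORES null.
resolved = {
    "maximum recall @ precision >= 0.995": 0.925,
    "maximum recall @ precision >= 0.98": None,
}

print(to_numeric(filters, resolved))
# {'likelygood': {'min': 0.925, 'max': 1}, 'verylikelybad': False}
```

Numeric bounds that were already in the config pass through untouched, so running the conversion twice gives the same result.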

Test wiki created on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task:
https://patchdemo.wmflabs.org/wikis/7eae6522cf/w

The patches are ready!
https://gerrit.wikimedia.org/r/c/mediawiki/extensions/ORES/+/948584
https://gerrit.wikimedia.org/r/c/operations/mediawiki-config/+/948542

The order in which they are deployed doesn't matter. However, if we want to enable LW usage on any wiki, the extension patch should be merged first.

Change 953259 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores-extension: replace first batch of wikis model thresholds with numeric values

https://gerrit.wikimedia.org/r/953259

The above patch is a subset of the previous patch but just for a set of wikis (eswikibooks, eswikiquote, itwiki, hewiki, simplewiki and testwiki).
The reason we broke it down is that this way we can deploy it only for this set of wikis.
The action plan is the following:

  • Test that numeric values work, which means that no threshold requests are made to ORES.
  • Deploy this to all wikis.
  • Start gradually re-enabling Lift Wing on all wikis.
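A static check can complement the first step of the plan: if no bound in the config is still a string, there is nothing left to translate through ORES. The Python sketch below assumes the wiki → model → filter → bound nesting shown in the config excerpts in this thread; the function name and the sample data are illustrative, and it inspects configuration only, not actual request traffic.

```python
def pseudo_values(config):
    """Yield (wiki, model, filter, bound) for every bound that is still a string."""
    for wiki, models in config.items():
        for model, filters in models.items():
            for name, bounds in filters.items():
                if bounds is False:  # filter explicitly disabled
                    continue
                for side, value in bounds.items():
                    if isinstance(value, str):
                        yield (wiki, model, name, side)


# Sample data: one converted wiki, one unconverted wiki, one disabled filter.
config = {
    "arwiki": {"damaging": {"likelygood": {"min": 0, "max": 0.833}}},
    "cswiki": {
        "goodfaith": {
            "verylikelybad": {"min": 0, "max": "maximum recall @ precision >= 0.98"}
        }
    },
    "fiwiki": {"goodfaith": {"verylikelybad": False}},
}

print(list(pseudo_values(config)))
# [('cswiki', 'goodfaith', 'verylikelybad', 'max')]
```

An empty result would mean every remaining bound is numeric or the filter is disabled.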

Change 953259 merged by jenkins-bot:

[operations/mediawiki-config@master] ores-extension: replace first batch of wikis model thresholds with numeric values

https://gerrit.wikimedia.org/r/953259

Mentioned in SAL (#wikimedia-operations) [2023-08-29T16:04:21Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:953259|ores-extension: replace first batch of wikis model thresholds with numeric values (T343308)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-29T16:05:44Z] <ladsgroup@deploy1002> ladsgroup and isaranto: Backport for [[gerrit:953259|ores-extension: replace first batch of wikis model thresholds with numeric values (T343308)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug2001.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-29T16:13:48Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:953259|ores-extension: replace first batch of wikis model thresholds with numeric values (T343308)]] (duration: 09m 31s)

Change 953590 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores-extension: fix thresholds

https://gerrit.wikimedia.org/r/953590

Change 953590 merged by jenkins-bot:

[operations/mediawiki-config@master] ores-extension: fix thresholds

https://gerrit.wikimedia.org/r/953590

Mentioned in SAL (#wikimedia-operations) [2023-08-30T12:17:17Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:953590|ores-extension: fix thresholds (T343308)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-30T12:19:08Z] <ladsgroup@deploy1002> isaranto and ladsgroup: Backport for [[gerrit:953590|ores-extension: fix thresholds (T343308)]] synced to the testservers mwdebug1001.eqiad.wmnet, mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-30T12:43:10Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:953590|ores-extension: fix thresholds (T343308)]] (duration: 25m 53s)

Change 948542 merged by jenkins-bot:

[operations/mediawiki-config@master] ores-extension: replace thresholds with numeric values

https://gerrit.wikimedia.org/r/948542

Mentioned in SAL (#wikimedia-operations) [2023-08-30T16:26:30Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:948542|ores-extension: replace thresholds with numeric values (T343308)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-30T16:28:06Z] <ladsgroup@deploy1002> ladsgroup and isaranto: Backport for [[gerrit:948542|ores-extension: replace thresholds with numeric values (T343308)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug2002.codfw.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Mentioned in SAL (#wikimedia-operations) [2023-08-30T16:36:39Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:948542|ores-extension: replace thresholds with numeric values (T343308)]] (duration: 10m 09s)

Change 953973 had a related patch set uploaded (by Ilias Sarantopoulos; author: Ilias Sarantopoulos):

[operations/mediawiki-config@master] ores-extension: enable lift wing for fiwiki and itwiki

https://gerrit.wikimedia.org/r/953973

Change 953973 merged by jenkins-bot:

[operations/mediawiki-config@master] ores-extension: enable lift wing for fiwiki and itwiki

https://gerrit.wikimedia.org/r/953973

Mentioned in SAL (#wikimedia-operations) [2023-08-31T12:03:56Z] <ladsgroup@deploy1002> Started scap: Backport for [[gerrit:953973|ores-extension: enable lift wing for fiwiki and itwiki (T343308)]]

Mentioned in SAL (#wikimedia-operations) [2023-08-31T12:05:34Z] <ladsgroup@deploy1002> isaranto and ladsgroup: Backport for [[gerrit:953973|ores-extension: enable lift wing for fiwiki and itwiki (T343308)]] synced to the testservers mwdebug2001.codfw.wmnet, mwdebug2002.codfw.wmnet, mwdebug1002.eqiad.wmnet, mwdebug1001.eqiad.wmnet, and mw-debug kubernetes deployment (accessible via k8s-experimental XWD option)

Change 948584 merged by jenkins-bot:

[mediawiki/extensions/ORES@master] read thresholds numeric values

https://gerrit.wikimedia.org/r/948584

Mentioned in SAL (#wikimedia-operations) [2023-08-31T12:31:01Z] <ladsgroup@deploy1002> Finished scap: Backport for [[gerrit:953973|ores-extension: enable lift wing for fiwiki and itwiki (T343308)]] (duration: 27m 05s)

Test wiki on Patch demo by ISarantopoulos-WMF using patch(es) linked to this task was deleted:

https://patchdemo.wmflabs.org/wikis/7eae6522cf/w/