Discrepancy between Graphite & Prometheus editResponseTime counts
Open, High, Public

Assigned To

None

Authored By

	CDanis
	May 1 2024, 3:08 PM

Description

I saw that T354905: migrate MediaWiki.timing.editResponseTime to statslib had been resolved for some time now, so I looked at converting the "Successful wiki edits" panels on the Grafana front page & www.wikimediastatus.net to use the version from Prometheus.

The original Graphite metric used was MediaWiki.timing.editResponseTime.sample_rate.

As best I can tell this ought to correspond to a sum(rate(mediawiki_WikimediaEvents_editResponseTime_seconds_count[5m])) query against Thanos.

However, when I compare the results, the Prometheus metric is approximately half the expected value:
https://grafana.wikimedia.org/goto/sBKcZCBIg

Am I misunderstanding something or is there something wrong?
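To make the comparison above reproducible outside Grafana, here is a minimal Python sketch. It only builds an instant-query URL for the expression quoted in the description, using the standard Prometheus HTTP API path `/api/v1/query`, and computes the ratio between the two rates. The base URL is a hypothetical placeholder, not the real Thanos endpoint, and the helper names `build_query_url` and `rate_ratio` are made up for illustration. The ratio is only meaningful if both series are in the same unit, which is assumed here rather than confirmed.

```python
from urllib.parse import urlencode

# PromQL expression quoted in the task description.
PROMQL = "sum(rate(mediawiki_WikimediaEvents_editResponseTime_seconds_count[5m]))"


def build_query_url(base_url: str, expr: str = PROMQL) -> str:
    """Return an instant-query URL for `expr`.

    `base_url` is a hypothetical placeholder for a Prometheus-compatible
    query endpoint (assumption: it exposes the standard /api/v1/query path).
    """
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': expr})}"


def rate_ratio(prometheus_rate: float, graphite_rate: float) -> float:
    """Ratio of the Prometheus rate to the Graphite rate.

    A value near 1.0 means the two sources agree; a value near 0.5
    matches the "approximately half" observation in the description.
    Assumes both rates use the same unit (e.g. events per second).
    """
    if graphite_rate <= 0:
        raise ValueError("graphite_rate must be positive")
    return prometheus_rate / graphite_rate
```

For example, `rate_ratio(5.0, 10.0)` returns `0.5`; these input values are illustrative only and are not taken from the dashboards.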

Related Objects

Status	Assigned	Task
Open	None	T343020 Converting MediaWiki Metrics to StatsLib
Resolved	herron	T350591 Audit legacy mediawiki stats used in production dashboards
Open	None	T350592 EPIC: migrate in use metrics and dashboards to statslib
Open	None	T363914 Discrepancy between Graphite & Prometheus editResponseTime counts

Event Timeline

CDanis created this task. May 1 2024, 3:08 PM

Restricted Application added a subscriber: Aklapper. May 1 2024, 3:08 PM

CDanis triaged this task as High priority. May 1 2024, 3:08 PM

CDanis added a parent task: T350592: EPIC: migrate in use metrics and dashboards to statslib.

Krinkle subscribed. May 1 2024, 3:39 PM

Thanks for the report!

I'd hypothesize this is because Prometheus stats ingestion is not yet enabled on k8s hosts. The per-pod deployment strategy is convenient, but we've been concerned about turning it on in light of T359640: mediawiki_resourceloader_build_seconds_bucket big metric on Prometheus ops.

We're coordinating with ServiceOps to redesign the exporter deployment on k8s. How we do this should also take into account: T359497: StatsD Exporter: gracefully handle metric signature changes.

Indeed, I agree that this would be the root cause, as @colewhite pointed out. As far as I'm aware, we don't have an ETA for tweaking the statsd-exporter deployment on wikikube as described in T359640, so I think we should go back to the Graphite/statsd metric for edits so that the numbers are accurate.

larissagaulia moved this task from Inbox, needs triage to Soon on the MediaWiki-Platform-Team board. May 13 2024, 11:17 AM

andrea.denisse subscribed. May 15 2024, 2:29 PM

Now that T365265 is nearing completion, this may be worth another look, @CDanis?
