The data collected by the completion suggester experiment doesn't make sense and is all over the place. Starting on Sept 10, the TestSearchSatisfaction2 and CompletionSuggestion experiments started seeing what should be unique 64-bit numbers coming from multiple IP addresses. Figure out why this is happening and fix it.
Description
Details
Event Timeline
Change 238306 had a related patch set uploaded (by EBernhardson):
Update CompletionSuggestion bucket selection
Change 238355 had a related patch set uploaded (by EBernhardson):
Update CompletionSuggestion bucket selection
Patch SWATted out. Will evaluate the data collected tomorrow morning to decide whether this fixes the problem.
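The patch itself isn't shown here, but for context, a minimal sketch of what deterministic per-session bucket selection might look like, assuming a random 64-bit session token generated client-side (the token format, function names, and the hashing scheme are all illustrative, not the actual patch):

```python
import hashlib
import random

def new_session_token():
    """Generate a random 64-bit session token as a hex string.

    Assumed approach: the token is created once per browsing session and
    stored client-side, so all events from that session share it.
    """
    return "%016x" % random.getrandbits(64)

def in_bucket(token, bucket_name, rate):
    """Deterministically decide whether this session is in a test bucket.

    Hashing token + bucket_name means different tests bucket independently,
    while the same session always gets the same answer for a given test.
    """
    h = int(hashlib.sha256((token + ":" + bucket_name).encode()).hexdigest(), 16)
    return (h % 10**6) / 10**6 < rate
```

The key property is stability: for a fixed (token, bucket) pair the decision never changes, so a session cannot flip between control and test mid-way.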
Looks like Dan found the issue today, reported at https://lists.wikimedia.org/pipermail/analytics/2015-September/004285.html
So this basically means we need to throw away the clientIp information until this can be fixed.
Can we get useful test data without this value? We can still correlate events from the same user on the same page; we just can't correlate them across pages (but chances are a given user won't be opted into the test more than once).
Are there other oddities in the data we can't explain?
Update: we discussed this on IRC and arrived at the conclusion that we can assume relative independence of sets of events. Which is to say, given our low sampling rates, we are unlikely to see lots of sessions from the same users.
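The independence assumption above can be checked with a quick back-of-envelope calculation. A sketch, using hypothetical numbers (the actual sampling rate and sessions-per-user figures aren't stated in this task):

```python
def p_repeat_user(rate, sessions):
    """Probability a single user has two or more sampled sessions,
    given `sessions` total sessions, each independently sampled at `rate`.

    Binomial: 1 - P(0 sessions sampled) - P(exactly 1 session sampled).
    """
    p0 = (1 - rate) ** sessions
    p1 = sessions * rate * (1 - rate) ** (sessions - 1)
    return 1 - p0 - p1

# Hypothetical: a 1-in-200 sampling rate and 20 sessions per user over the
# test window gives well under a 1% chance the same user is sampled twice.
print(p_repeat_user(1 / 200, 20))
```

So at low sampling rates, repeat users should be rare enough to treat sampled sessions as roughly independent.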
Does this mean the answer to the question "Is this test still scientifically valid, can be analysed as-is, and does not need to be re-run?" is "Yes"?
To be valid, I think we have to start the test over as of when the adjusted schema was deployed today. A few changes were made to bucketing (which will also help other tests going forward), so the data collected from now on isn't directly comparable with the data collected before. Maybe? I'm not entirely confident, but putting it out there.
Understood. We should get the test restarted ASAP. How about restarting the test on Thursday the 17th and running it for one week? @mpopov @Ironholds @EBernhardson Thoughts?
I think we can consider the test restarted the moment the new schema started collecting data.