Server Admin Log

2024-07-09

10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66043 and previous config saved to /var/cache/conftool/dbconfig/20240709-104054-root.json
10:37 Dreamy_Jazz: Finished running maintenance scripts for T366781
10:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66042 and previous config saved to /var/cache/conftool/dbconfig/20240709-103409-root.json
10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212 T369515', diff saved to https://phabricator.wikimedia.org/P66041 and previous config saved to /var/cache/conftool/dbconfig/20240709-103331-root.json
10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T369515', diff saved to https://phabricator.wikimedia.org/P66040 and previous config saved to /var/cache/conftool/dbconfig/20240709-103238-root.json
10:32 marostegui: Starting s1 codfw failover from db2212 to db2203 - T369515
10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 db1198 db1199 T365995', diff saved to https://phabricator.wikimedia.org/P66039 and previous config saved to /var/cache/conftool/dbconfig/20240709-102947-root.json
10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66038 and previous config saved to /var/cache/conftool/dbconfig/20240709-102549-root.json
10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66037 and previous config saved to /var/cache/conftool/dbconfig/20240709-101043-root.json
10:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
10:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
09:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
09:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T369515', diff saved to https://phabricator.wikimedia.org/P66036 and previous config saved to /var/cache/conftool/dbconfig/20240709-095659-root.json
09:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66035 and previous config saved to /var/cache/conftool/dbconfig/20240709-095538-root.json
09:26 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided) (duration: 00m 32s)
09:26 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided)
09:06 vgutierrez: restart purged @ cp3073
08:28 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
08:28 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
08:28 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
08:27 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
08:17 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.13 refs T366958
08:03 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
08:01 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
08:01 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
07:59 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
07:58 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
07:57 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2002.codfw.wmnet
07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
07:40 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
07:40 Dreamy_Jazz: Morning UTC backport window done
07:38 vgutierrez: repool cp3073
07:35 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
07:32 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3073.*} and A:cp
07:32 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3073.esams.wmnet
07:30 dreamyjazz@deploy1002: Synchronized wmf-config/throttle.php: Deploying throttle change for T369522 (duration: 09m 50s)
07:26 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts netbox-dev2002.codfw.wmnet
07:25 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
07:12 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on P{cp3073.*} and A:cp
07:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
07:08 fabfur@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on P{cp3073.*} and A:cp
07:08 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
06:54 Dreamy_Jazz: Start `foreachwikiindblist group2.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
05:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
05:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
05:20 marostegui: Deploy schema change on s2 eqiad db1162 dbmaint T367856
05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1162 T369339', diff saved to https://phabricator.wikimedia.org/P66034 and previous config saved to /var/cache/conftool/dbconfig/20240709-051911-marostegui.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T369339', diff saved to https://phabricator.wikimedia.org/P66033 and previous config saved to /var/cache/conftool/dbconfig/20240709-051814-marostegui.json
05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T369339', diff saved to https://phabricator.wikimedia.org/P66032 and previous config saved to /var/cache/conftool/dbconfig/20240709-051749-marostegui.json
05:17 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T369339
04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1222 with weight 0 T369339', diff saved to https://phabricator.wikimedia.org/P66031 and previous config saved to /var/cache/conftool/dbconfig/20240709-045814-marostegui.json
04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66030 and previous config saved to /var/cache/conftool/dbconfig/20240709-044128-marostegui.json
04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
04:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
04:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66029 and previous config saved to /var/cache/conftool/dbconfig/20240709-044051-marostegui.json
04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66028 and previous config saved to /var/cache/conftool/dbconfig/20240709-042544-marostegui.json
04:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66027 and previous config saved to /var/cache/conftool/dbconfig/20240709-041036-marostegui.json
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.10 (duration: 00m 57s)
03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66026 and previous config saved to /var/cache/conftool/dbconfig/20240709-035529-marostegui.json
03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.13 refs T366958 (duration: 50m 52s)
03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.13 refs T366958
01:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66025 and previous config saved to /var/cache/conftool/dbconfig/20240709-014242-arnaudb.json
01:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66024 and previous config saved to /var/cache/conftool/dbconfig/20240709-012735-arnaudb.json
01:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66023 and previous config saved to /var/cache/conftool/dbconfig/20240709-011227-arnaudb.json
00:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66022 and previous config saved to /var/cache/conftool/dbconfig/20240709-005720-arnaudb.json
00:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66021 and previous config saved to /var/cache/conftool/dbconfig/20240709-005456-arnaudb.json
00:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
00:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
00:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
00:14 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
00:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
00:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
00:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66020 and previous config saved to /var/cache/conftool/dbconfig/20240709-001324-arnaudb.json
00:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
00:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66019 and previous config saved to /var/cache/conftool/dbconfig/20240709-001250-marostegui.json
00:05 ejegg: payments-wiki upgraded from 82a5e588 to dc0c14d4

2024-07-08

23:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66018 and previous config saved to /var/cache/conftool/dbconfig/20240708-235817-arnaudb.json
23:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66017 and previous config saved to /var/cache/conftool/dbconfig/20240708-235742-marostegui.json
23:52 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_esams
23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66016 and previous config saved to /var/cache/conftool/dbconfig/20240708-234310-arnaudb.json
23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66015 and previous config saved to /var/cache/conftool/dbconfig/20240708-234235-marostegui.json
23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66014 and previous config saved to /var/cache/conftool/dbconfig/20240708-232803-arnaudb.json
23:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66013 and previous config saved to /var/cache/conftool/dbconfig/20240708-232728-marostegui.json
23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66012 and previous config saved to /var/cache/conftool/dbconfig/20240708-232549-arnaudb.json
23:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
23:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66011 and previous config saved to /var/cache/conftool/dbconfig/20240708-232527-arnaudb.json
23:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66010 and previous config saved to /var/cache/conftool/dbconfig/20240708-231020-arnaudb.json
22:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66009 and previous config saved to /var/cache/conftool/dbconfig/20240708-225513-arnaudb.json
22:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
22:42 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_esams
22:42 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3081.esams.wmnet
22:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66008 and previous config saved to /var/cache/conftool/dbconfig/20240708-224006-arnaudb.json
22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66007 and previous config saved to /var/cache/conftool/dbconfig/20240708-223752-arnaudb.json
22:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66006 and previous config saved to /var/cache/conftool/dbconfig/20240708-223741-arnaudb.json
22:26 bking@cumin2002: START - Cookbook sre.wdqs.reboot
22:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66005 and previous config saved to /var/cache/conftool/dbconfig/20240708-222234-arnaudb.json
22:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66004 and previous config saved to /var/cache/conftool/dbconfig/20240708-220727-arnaudb.json
21:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66003 and previous config saved to /var/cache/conftool/dbconfig/20240708-215220-arnaudb.json
21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66002 and previous config saved to /var/cache/conftool/dbconfig/20240708-214954-arnaudb.json
21:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
21:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66001 and previous config saved to /var/cache/conftool/dbconfig/20240708-214932-arnaudb.json
21:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P66000 and previous config saved to /var/cache/conftool/dbconfig/20240708-213425-arnaudb.json
21:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
21:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P65999 and previous config saved to /var/cache/conftool/dbconfig/20240708-211918-arnaudb.json
21:16 catrope@deploy1002: Finished scap: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) (duration: 09m 23s)
21:10 catrope@deploy1002: catrope, nmw03: Continuing with sync
21:09 catrope@deploy1002: catrope, nmw03: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:06 catrope@deploy1002: Started scap sync-world: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342)
21:05 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
21:05 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
21:05 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65998 and previous config saved to /var/cache/conftool/dbconfig/20240708-210410-arnaudb.json
21:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1023.eqiad.wmnet
21:02 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3080.esams.wmnet
21:01 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3072.esams.wmnet
21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65997 and previous config saved to /var/cache/conftool/dbconfig/20240708-210144-arnaudb.json
21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65996 and previous config saved to /var/cache/conftool/dbconfig/20240708-210106-arnaudb.json
20:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1023.eqiad.wmnet
20:52 catrope@deploy1002: Finished scap: Backport for Graph extension: Add tracking for data sources used in <graph> tags (duration: 13m 00s)
20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1022.eqiad.wmnet
20:47 catrope@deploy1002: catrope: Continuing with sync
20:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65995 and previous config saved to /var/cache/conftool/dbconfig/20240708-204559-arnaudb.json
20:43 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
20:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
20:42 catrope@deploy1002: catrope: Backport for Graph extension: Add tracking for data sources used in <graph> tags synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P65994 and previous config saved to /var/cache/conftool/dbconfig/20240708-204042-marostegui.json
20:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
20:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
20:39 catrope@deploy1002: Started scap sync-world: Backport for Graph extension: Add tracking for data sources used in <graph> tags
20:38 bking@cumin2002: START - Cookbook sre.wdqs.reboot
20:35 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
20:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65993 and previous config saved to /var/cache/conftool/dbconfig/20240708-203052-arnaudb.json
20:28 bking@cumin2002: START - Cookbook sre.wdqs.reboot
20:27 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
20:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65992 and previous config saved to /var/cache/conftool/dbconfig/20240708-201545-arnaudb.json
20:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65991 and previous config saved to /var/cache/conftool/dbconfig/20240708-201318-arnaudb.json
20:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
20:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
20:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65990 and previous config saved to /var/cache/conftool/dbconfig/20240708-201256-arnaudb.json
20:08 bking@cumin2002: START - Cookbook sre.wdqs.reboot
19:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65989 and previous config saved to /var/cache/conftool/dbconfig/20240708-195749-arnaudb.json
19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P65988 and previous config saved to /var/cache/conftool/dbconfig/20240708-194435-marostegui.json
19:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
19:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65987 and previous config saved to /var/cache/conftool/dbconfig/20240708-194242-arnaudb.json
19:39 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65986 and previous config saved to /var/cache/conftool/dbconfig/20240708-192735-arnaudb.json
19:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65985 and previous config saved to /var/cache/conftool/dbconfig/20240708-192508-arnaudb.json
19:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
19:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
19:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65984 and previous config saved to /var/cache/conftool/dbconfig/20240708-192444-arnaudb.json
19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3079.esams.wmnet
19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3071.esams.wmnet
19:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
19:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
19:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65983 and previous config saved to /var/cache/conftool/dbconfig/20240708-190937-arnaudb.json
19:02 bking@cumin2002: START - Cookbook sre.wdqs.reboot
18:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65982 and previous config saved to /var/cache/conftool/dbconfig/20240708-185430-arnaudb.json
18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65981 and previous config saved to /var/cache/conftool/dbconfig/20240708-183923-arnaudb.json
18:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65980 and previous config saved to /var/cache/conftool/dbconfig/20240708-183658-arnaudb.json
18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
18:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
18:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65979 and previous config saved to /var/cache/conftool/dbconfig/20240708-183548-arnaudb.json
18:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65978 and previous config saved to /var/cache/conftool/dbconfig/20240708-182041-arnaudb.json
18:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2002.codfw.wmnet
18:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65977 and previous config saved to /var/cache/conftool/dbconfig/20240708-180533-arnaudb.json
18:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader2002.codfw.wmnet
17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65976 and previous config saved to /var/cache/conftool/dbconfig/20240708-175026-arnaudb.json
17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65975 and previous config saved to /var/cache/conftool/dbconfig/20240708-174918-arnaudb.json
17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65974 and previous config saved to /var/cache/conftool/dbconfig/20240708-174823-arnaudb.json
17:40 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3078.esams.wmnet
17:38 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3070.esams.wmnet
17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65973 and previous config saved to /var/cache/conftool/dbconfig/20240708-173316-arnaudb.json
17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65972 and previous config saved to /var/cache/conftool/dbconfig/20240708-171810-arnaudb.json
17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65971 and previous config saved to /var/cache/conftool/dbconfig/20240708-170302-arnaudb.json
17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65970 and previous config saved to /var/cache/conftool/dbconfig/20240708-170053-arnaudb.json
17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65969 and previous config saved to /var/cache/conftool/dbconfig/20240708-170031-arnaudb.json
16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65968 and previous config saved to /var/cache/conftool/dbconfig/20240708-164524-arnaudb.json
16:39 ladsgroup@deploy1002: Finished scap: Backport for Reduce frequency of two query pages in commonswiki (T369024) (duration: 07m 50s)
16:34 ladsgroup@deploy1002: ladsgroup: Continuing with sync
16:33 ladsgroup@deploy1002: ladsgroup: Backport for Reduce frequency of two query pages in commonswiki (T369024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
16:31 ladsgroup@deploy1002: Started scap sync-world: Backport for Reduce frequency of two query pages in commonswiki (T369024)
16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65967 and previous config saved to /var/cache/conftool/dbconfig/20240708-163017-arnaudb.json
16:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65966 and previous config saved to /var/cache/conftool/dbconfig/20240708-161510-arnaudb.json
16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65965 and previous config saved to /var/cache/conftool/dbconfig/20240708-161302-arnaudb.json
16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65964 and previous config saved to /var/cache/conftool/dbconfig/20240708-161238-arnaudb.json
16:09 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
16:08 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1011.eqiad.wmnet with OS bullseye
15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3077.esams.wmnet
15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3069.esams.wmnet
15:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65963 and previous config saved to /var/cache/conftool/dbconfig/20240708-155731-arnaudb.json
15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 28s)
15:47 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:46 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:45 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:45 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 54s)
15:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
15:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65962 and previous config saved to /var/cache/conftool/dbconfig/20240708-154224-arnaudb.json
15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65961 and previous config saved to /var/cache/conftool/dbconfig/20240708-152717-arnaudb.json
15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65960 and previous config saved to /var/cache/conftool/dbconfig/20240708-152508-arnaudb.json
15:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65959 and previous config saved to /var/cache/conftool/dbconfig/20240708-152446-arnaudb.json
15:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1227 weight (T366852)', diff saved to https://phabricator.wikimedia.org/P65958 and previous config saved to /var/cache/conftool/dbconfig/20240708-152222-ladsgroup.json
15:16 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
15:13 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
15:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65957 and previous config saved to /var/cache/conftool/dbconfig/20240708-150939-arnaudb.json
14:59 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1011.eqiad.wmnet with OS bullseye
14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1002.eqiad.wmnet
14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65956 and previous config saved to /var/cache/conftool/dbconfig/20240708-145432-arnaudb.json
14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
14:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
14:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
14:51 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
14:51 claime: cleaning up old shellbox files on mw1438
14:43 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
14:43 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65955 and previous config saved to /var/cache/conftool/dbconfig/20240708-143925-arnaudb.json
14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65954 and previous config saved to /var/cache/conftool/dbconfig/20240708-143716-arnaudb.json
14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
14:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65953 and previous config saved to /var/cache/conftool/dbconfig/20240708-143654-arnaudb.json
14:34 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
14:31 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
14:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:27 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
14:22 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
14:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
14:21 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65952 and previous config saved to /var/cache/conftool/dbconfig/20240708-142147-arnaudb.json
14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:18 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:17 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
14:17 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3068.esams.wmnet
14:16 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3076.esams.wmnet
14:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
14:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65951 and previous config saved to /var/cache/conftool/dbconfig/20240708-141432-marostegui.json
14:13 claime: cleaning up old shellbox files on mw1446
14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65950 and previous config saved to /var/cache/conftool/dbconfig/20240708-140640-arnaudb.json
13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65949 and previous config saved to /var/cache/conftool/dbconfig/20240708-135925-marostegui.json
13:58 urbanecm@deploy1002: Finished scap: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 (duration: 10m 36s)
13:53 urbanecm@deploy1002: phuedx, urbanecm: Continuing with sync
13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65948 and previous config saved to /var/cache/conftool/dbconfig/20240708-135132-arnaudb.json
13:50 urbanecm@deploy1002: phuedx, urbanecm: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65947 and previous config saved to /var/cache/conftool/dbconfig/20240708-135024-arnaudb.json
13:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65946 and previous config saved to /var/cache/conftool/dbconfig/20240708-135002-arnaudb.json
13:48 urbanecm@deploy1002: Started scap sync-world: Backport for lib: Update metrics-platform to 84ed8dcbe7c9
13:47 urbanecm@deploy1002: Finished scap: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) (duration: 30m 38s)
13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65945 and previous config saved to /var/cache/conftool/dbconfig/20240708-134418-marostegui.json
13:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:39 urbanecm@deploy1002: tchin, jforrester, urbanecm: Continuing with sync
13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65944 and previous config saved to /var/cache/conftool/dbconfig/20240708-133456-arnaudb.json
13:32 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:32 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:32 urbanecm@deploy1002: tchin, jforrester, urbanecm: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65943 and previous config saved to /var/cache/conftool/dbconfig/20240708-132911-marostegui.json
13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65942 and previous config saved to /var/cache/conftool/dbconfig/20240708-131948-arnaudb.json
13:17 urbanecm@deploy1002: Started scap sync-world: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408)
13:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65941 and previous config saved to /var/cache/conftool/dbconfig/20240708-130441-arnaudb.json
13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65940 and previous config saved to /var/cache/conftool/dbconfig/20240708-130333-arnaudb.json
13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
13:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:51 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bookworm
12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:48 vgutierrez: test bwlimit per url on cp4051 - T317799
12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65939 and previous config saved to /var/cache/conftool/dbconfig/20240708-124310-marostegui.json
12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3067.esams.wmnet
12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3075.esams.wmnet
12:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
12:32 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
12:27 btullis@deploy1002: Finished deploy [airflow-dags/analytics@a2faba7]: (no justification provided) (duration: 00m 27s)
12:27 btullis@deploy1002: Started deploy [airflow-dags/analytics@a2faba7]: (no justification provided)
12:19 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bookworm
11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65938 and previous config saved to /var/cache/conftool/dbconfig/20240708-115422-root.json
11:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262476
11:47 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262476
11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65937 and previous config saved to /var/cache/conftool/dbconfig/20240708-113917-root.json
11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
11:27 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
11:26 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
11:26 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
11:25 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
11:25 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
11:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65936 and previous config saved to /var/cache/conftool/dbconfig/20240708-112411-root.json
11:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65935 and previous config saved to /var/cache/conftool/dbconfig/20240708-110905-root.json
10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3066.esams.wmnet
10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3074.esams.wmnet
10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65934 and previous config saved to /var/cache/conftool/dbconfig/20240708-105400-root.json
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65933 and previous config saved to /var/cache/conftool/dbconfig/20240708-105348-marostegui.json
10:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65932 and previous config saved to /var/cache/conftool/dbconfig/20240708-105325-marostegui.json
10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_esams
10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams
10:45 fabfur: rebooting A:cp-esams (T366555)
10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270359
10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270359
10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262476
10:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262476
10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 272432
10:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 272432
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65931 and previous config saved to /var/cache/conftool/dbconfig/20240708-103854-root.json
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65930 and previous config saved to /var/cache/conftool/dbconfig/20240708-103818-marostegui.json
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65929 and previous config saved to /var/cache/conftool/dbconfig/20240708-102347-root.json
10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65928 and previous config saved to /var/cache/conftool/dbconfig/20240708-102311-marostegui.json
10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65927 and previous config saved to /var/cache/conftool/dbconfig/20240708-100804-marostegui.json
10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
10:02 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:58 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
09:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
09:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync
09:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: sync
09:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
09:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
09:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync
09:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: sync
09:41 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
09:41 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
09:38 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
09:38 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
09:32 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
09:32 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
09:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
09:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
09:17 arturo: aborrero@apt1002:~$ sudo -i reprepro --component thirdparty/k9s includedeb bookworm-wikimedia /home/aborrero/k9s_linux_amd64.deb (T366061)
08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
08:56 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
08:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
08:50 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
08:42 arturo: update packages for thirdparty/kubeadm-k8s-1-25 bookworm-wikimedia in apt1002 (T369163)
08:26 godog: re-enable business hours americas oncall - T369122
07:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 270052
07:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 270052
06:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52455
06:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52455
06:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137409
06:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 137409
06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27768
06:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 27768
06:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61512
06:09 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61512
06:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269783
06:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269783
06:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
06:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52320
06:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7738
06:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 7738
06:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52468
06:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52468
06:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270052
06:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270052
05:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28008
05:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28008
05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17072
05:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 17072
05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263522
05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263522
05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61942
05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61942
05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18013
05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 18013
05:37 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61672
05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61672
05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28352
05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28352
05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 999
05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 999
05:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4788
05:34 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4788
05:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132167
05:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 132167
05:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6447
05:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 6447
05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65926 and previous config saved to /var/cache/conftool/dbconfig/20240708-053133-marostegui.json
05:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
05:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65925 and previous config saved to /var/cache/conftool/dbconfig/20240708-053122-marostegui.json
05:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28306
05:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28306
05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
05:24 marostegui: Deploy schema change on s5 codfw db2213 dbmaint T367856
05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 T369478', diff saved to https://phabricator.wikimedia.org/P65923 and previous config saved to /var/cache/conftool/dbconfig/20240708-051935-root.json
05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2123 to s5 primary T369478', diff saved to https://phabricator.wikimedia.org/P65922 and previous config saved to /var/cache/conftool/dbconfig/20240708-051840-root.json
05:18 marostegui: Starting s5 codfw failover from db2213 to db2123 - T369478
05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65921 and previous config saved to /var/cache/conftool/dbconfig/20240708-051615-marostegui.json
05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2123 from dump/slow', diff saved to https://phabricator.wikimedia.org/P65920 and previous config saved to /var/cache/conftool/dbconfig/20240708-051605-marostegui.json
05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2123 with weight 0 T369478', diff saved to https://phabricator.wikimedia.org/P65919 and previous config saved to /var/cache/conftool/dbconfig/20240708-050301-root.json
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65918 and previous config saved to /var/cache/conftool/dbconfig/20240708-045246-marostegui.json
04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65917 and previous config saved to /var/cache/conftool/dbconfig/20240708-043738-marostegui.json
01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65916 and previous config saved to /var/cache/conftool/dbconfig/20240708-014044-marostegui.json
01:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
01:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65915 and previous config saved to /var/cache/conftool/dbconfig/20240708-014022-marostegui.json
01:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65914 and previous config saved to /var/cache/conftool/dbconfig/20240708-012515-marostegui.json
01:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65913 and previous config saved to /var/cache/conftool/dbconfig/20240708-011008-marostegui.json
00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65912 and previous config saved to /var/cache/conftool/dbconfig/20240708-005501-marostegui.json

2024-07-07

21:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65911 and previous config saved to /var/cache/conftool/dbconfig/20240707-215014-marostegui.json
21:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65910 and previous config saved to /var/cache/conftool/dbconfig/20240707-214952-marostegui.json
21:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65909 and previous config saved to /var/cache/conftool/dbconfig/20240707-213445-marostegui.json
21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65908 and previous config saved to /var/cache/conftool/dbconfig/20240707-211938-marostegui.json
21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65907 and previous config saved to /var/cache/conftool/dbconfig/20240707-210430-marostegui.json
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65906 and previous config saved to /var/cache/conftool/dbconfig/20240707-154059-marostegui.json
15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance

2024-07-06

18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65905 and previous config saved to /var/cache/conftool/dbconfig/20240706-182625-marostegui.json
18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65904 and previous config saved to /var/cache/conftool/dbconfig/20240706-181117-marostegui.json
17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65903 and previous config saved to /var/cache/conftool/dbconfig/20240706-175610-marostegui.json
17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65902 and previous config saved to /var/cache/conftool/dbconfig/20240706-174103-marostegui.json
17:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
17:18 hnowlan@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65901 and previous config saved to /var/cache/conftool/dbconfig/20240706-124535-marostegui.json
12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65900 and previous config saved to /var/cache/conftool/dbconfig/20240706-075448-marostegui.json
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65899 and previous config saved to /var/cache/conftool/dbconfig/20240706-073941-marostegui.json
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65898 and previous config saved to /var/cache/conftool/dbconfig/20240706-072434-marostegui.json
07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65897 and previous config saved to /var/cache/conftool/dbconfig/20240706-070927-marostegui.json
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65896 and previous config saved to /var/cache/conftool/dbconfig/20240706-043535-marostegui.json
04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65895 and previous config saved to /var/cache/conftool/dbconfig/20240706-043513-marostegui.json
04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65894 and previous config saved to /var/cache/conftool/dbconfig/20240706-042006-marostegui.json
04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65893 and previous config saved to /var/cache/conftool/dbconfig/20240706-040459-marostegui.json
03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65892 and previous config saved to /var/cache/conftool/dbconfig/20240706-034952-marostegui.json
00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65891 and previous config saved to /var/cache/conftool/dbconfig/20240706-005648-marostegui.json
00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
00:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65890 and previous config saved to /var/cache/conftool/dbconfig/20240706-005626-marostegui.json
00:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65889 and previous config saved to /var/cache/conftool/dbconfig/20240706-004119-marostegui.json
00:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65888 and previous config saved to /var/cache/conftool/dbconfig/20240706-002612-marostegui.json
00:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65887 and previous config saved to /var/cache/conftool/dbconfig/20240706-001105-marostegui.json

2024-07-05

20:05 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
20:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65886 and previous config saved to /var/cache/conftool/dbconfig/20240705-185604-marostegui.json
18:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65885 and previous config saved to /var/cache/conftool/dbconfig/20240705-185542-marostegui.json
18:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65884 and previous config saved to /var/cache/conftool/dbconfig/20240705-184034-marostegui.json
18:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65883 and previous config saved to /var/cache/conftool/dbconfig/20240705-183428-root.json
18:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65882 and previous config saved to /var/cache/conftool/dbconfig/20240705-182527-marostegui.json
18:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65881 and previous config saved to /var/cache/conftool/dbconfig/20240705-181923-root.json
18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65880 and previous config saved to /var/cache/conftool/dbconfig/20240705-181020-marostegui.json
18:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65879 and previous config saved to /var/cache/conftool/dbconfig/20240705-180417-root.json
17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65878 and previous config saved to /var/cache/conftool/dbconfig/20240705-175653-ladsgroup.json
17:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65877 and previous config saved to /var/cache/conftool/dbconfig/20240705-174912-root.json
17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65876 and previous config saved to /var/cache/conftool/dbconfig/20240705-174146-ladsgroup.json
17:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65875 and previous config saved to /var/cache/conftool/dbconfig/20240705-173406-root.json
17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65874 and previous config saved to /var/cache/conftool/dbconfig/20240705-172639-ladsgroup.json
17:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65873 and previous config saved to /var/cache/conftool/dbconfig/20240705-171901-root.json
17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65872 and previous config saved to /var/cache/conftool/dbconfig/20240705-171131-ladsgroup.json
17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65871 and previous config saved to /var/cache/conftool/dbconfig/20240705-170356-root.json
17:00 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@73c6618]: (no justification provided) (duration: 00m 06s)
17:00 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@73c6618]: (no justification provided)
13:40 hashar@deploy1002: Finished deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484 (duration: 00m 06s)
13:40 hashar@deploy1002: Started deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484
12:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
12:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65869 and previous config saved to /var/cache/conftool/dbconfig/20240705-125152-marostegui.json
12:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65868 and previous config saved to /var/cache/conftool/dbconfig/20240705-125130-marostegui.json
12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65867 and previous config saved to /var/cache/conftool/dbconfig/20240705-123623-marostegui.json
12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65866 and previous config saved to /var/cache/conftool/dbconfig/20240705-122115-marostegui.json
12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65865 and previous config saved to /var/cache/conftool/dbconfig/20240705-120608-marostegui.json
11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65864 and previous config saved to /var/cache/conftool/dbconfig/20240705-115703-ladsgroup.json
11:53 dcausse: T369149: re-indexed wikidata P12861 (cirrus_rerender.rerender --wiki wikidatawiki allpages --namespace 120 --from-title P12861 --to-title P12861)
11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65863 and previous config saved to /var/cache/conftool/dbconfig/20240705-114157-ladsgroup.json
11:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65862 and previous config saved to /var/cache/conftool/dbconfig/20240705-112652-ladsgroup.json
11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65861 and previous config saved to /var/cache/conftool/dbconfig/20240705-111322-ladsgroup.json
11:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
11:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65860 and previous config saved to /var/cache/conftool/dbconfig/20240705-111146-ladsgroup.json
10:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
10:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
10:41 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) (duration: 21m 22s)
10:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
10:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
10:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149)
10:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
10:10 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:35 fabfur: running puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052271 (T369345)
09:26 XioNoX: netbox-dev2003: move from netbox-dev to netbox-next - T336275
08:55 godog: silence NELNotReported NELByCountryNotReported until Tues - T369345
08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65858 and previous config saved to /var/cache/conftool/dbconfig/20240705-085406-marostegui.json
08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65857 and previous config saved to /var/cache/conftool/dbconfig/20240705-085329-marostegui.json
08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65856 and previous config saved to /var/cache/conftool/dbconfig/20240705-083821-marostegui.json
08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65855 and previous config saved to /var/cache/conftool/dbconfig/20240705-082314-marostegui.json
08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65854 and previous config saved to /var/cache/conftool/dbconfig/20240705-080807-marostegui.json
08:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
08:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
07:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
07:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65852 and previous config saved to /var/cache/conftool/dbconfig/20240705-051202-marostegui.json
05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P65851 and previous config saved to /var/cache/conftool/dbconfig/20240705-050028-root.json
04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65850 and previous config saved to /var/cache/conftool/dbconfig/20240705-045655-marostegui.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65849 and previous config saved to /var/cache/conftool/dbconfig/20240705-045145-marostegui.json
04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65848 and previous config saved to /var/cache/conftool/dbconfig/20240705-044912-marostegui.json
04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65847 and previous config saved to /var/cache/conftool/dbconfig/20240705-044148-marostegui.json
04:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65846 and previous config saved to /var/cache/conftool/dbconfig/20240705-042641-marostegui.json
01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65845 and previous config saved to /var/cache/conftool/dbconfig/20240705-013250-marostegui.json
01:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
01:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65844 and previous config saved to /var/cache/conftool/dbconfig/20240705-013229-marostegui.json
01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65843 and previous config saved to /var/cache/conftool/dbconfig/20240705-011721-marostegui.json
01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65842 and previous config saved to /var/cache/conftool/dbconfig/20240705-010214-marostegui.json
00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65841 and previous config saved to /var/cache/conftool/dbconfig/20240705-004707-marostegui.json

2024-07-04

22:04 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
22:03 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65840 and previous config saved to /var/cache/conftool/dbconfig/20240704-220227-marostegui.json
22:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
22:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65839 and previous config saved to /var/cache/conftool/dbconfig/20240704-220205-marostegui.json
22:01 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
22:00 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
21:59 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
21:59 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65838 and previous config saved to /var/cache/conftool/dbconfig/20240704-214658-marostegui.json
21:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65837 and previous config saved to /var/cache/conftool/dbconfig/20240704-213151-marostegui.json
21:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65836 and previous config saved to /var/cache/conftool/dbconfig/20240704-211644-marostegui.json
20:17 jdrewniak@deploy1002: Finished scap: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) (duration: 12m 14s)
20:12 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Continuing with sync
20:08 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113)
19:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad
19:55 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad
18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65835 and previous config saved to /var/cache/conftool/dbconfig/20240704-182308-marostegui.json
18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
18:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65834 and previous config saved to /var/cache/conftool/dbconfig/20240704-182257-marostegui.json
18:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65833 and previous config saved to /var/cache/conftool/dbconfig/20240704-180749-marostegui.json
17:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65832 and previous config saved to /var/cache/conftool/dbconfig/20240704-175242-marostegui.json
17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65831 and previous config saved to /var/cache/conftool/dbconfig/20240704-173735-marostegui.json
17:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
16:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
16:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
16:15 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
16:14 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
16:14 btullis@cumin1002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
16:06 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
15:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
15:02 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
15:02 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65830 and previous config saved to /var/cache/conftool/dbconfig/20240704-143350-marostegui.json
14:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
14:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65829 and previous config saved to /var/cache/conftool/dbconfig/20240704-143327-marostegui.json
14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65827 and previous config saved to /var/cache/conftool/dbconfig/20240704-141820-marostegui.json
14:03 Lucas_WMDE: UTC afternoon backport+config window done
14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65826 and previous config saved to /var/cache/conftool/dbconfig/20240704-140313-marostegui.json
14:01 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
14:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65825 and previous config saved to /var/cache/conftool/dbconfig/20240704-140145-root.json
13:57 claime: Enabling puppet on cp4037.ulsfo.wmnet to test 1050293 - T367949
13:53 claime: disabling puppet on P:trafficserver::backend to merge 1049507 - T367949
13:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65824 and previous config saved to /var/cache/conftool/dbconfig/20240704-134806-marostegui.json
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65823 and previous config saved to /var/cache/conftool/dbconfig/20240704-134656-root.json
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65822 and previous config saved to /var/cache/conftool/dbconfig/20240704-134639-root.json
13:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) (duration: 08m 35s)
13:41 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
13:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
13:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65821 and previous config saved to /var/cache/conftool/dbconfig/20240704-134105-marostegui.json
13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Continuing with sync
13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:36 claime: Enabling puppet on cp6016.drmrs.wmnet to test 1050293 - T367949
13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900)
13:32 claime: disabling puppet on P:trafficserver::backend to merge 1050293 - T367949
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65820 and previous config saved to /var/cache/conftool/dbconfig/20240704-133150-root.json
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65819 and previous config saved to /var/cache/conftool/dbconfig/20240704-133133-root.json
13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65818 and previous config saved to /var/cache/conftool/dbconfig/20240704-132558-marostegui.json
13:20 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 03s)
13:20 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65817 and previous config saved to /var/cache/conftool/dbconfig/20240704-131643-root.json
13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65816 and previous config saved to /var/cache/conftool/dbconfig/20240704-131628-root.json
13:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:11 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65815 and previous config saved to /var/cache/conftool/dbconfig/20240704-131050-marostegui.json
13:09 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:07 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65814 and previous config saved to /var/cache/conftool/dbconfig/20240704-130137-root.json
13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65813 and previous config saved to /var/cache/conftool/dbconfig/20240704-130122-root.json
12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65812 and previous config saved to /var/cache/conftool/dbconfig/20240704-125543-marostegui.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65811 and previous config saved to /var/cache/conftool/dbconfig/20240704-124632-root.json
12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65810 and previous config saved to /var/cache/conftool/dbconfig/20240704-124617-root.json
12:36 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65808 and previous config saved to /var/cache/conftool/dbconfig/20240704-123127-root.json
12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65807 and previous config saved to /var/cache/conftool/dbconfig/20240704-123111-root.json
12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213', diff saved to https://phabricator.wikimedia.org/P65806 and previous config saved to /var/cache/conftool/dbconfig/20240704-122752-root.json
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65805 and previous config saved to /var/cache/conftool/dbconfig/20240704-121631-root.json
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65804 and previous config saved to /var/cache/conftool/dbconfig/20240704-121621-root.json
12:11 hashar@deploy1002: Finished scap: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) (duration: 07m 45s)
12:06 hashar@deploy1002: hashar, d3r1ck01: Continuing with sync
12:06 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:03 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
12:02 hashar@deploy1002: Sync cancelled.
12:02 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:56 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
11:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65803 and previous config saved to /var/cache/conftool/dbconfig/20240704-115522-marostegui.json
11:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
11:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
11:54 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
11:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
11:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
11:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
11:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
11:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
11:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213 db1185 T369250', diff saved to https://phabricator.wikimedia.org/P65802 and previous config saved to /var/cache/conftool/dbconfig/20240704-111324-root.json
10:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65801 and previous config saved to /var/cache/conftool/dbconfig/20240704-105205-marostegui.json
10:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
10:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65800 and previous config saved to /var/cache/conftool/dbconfig/20240704-105143-marostegui.json
10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65799 and previous config saved to /var/cache/conftool/dbconfig/20240704-103636-marostegui.json
10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65798 and previous config saved to /var/cache/conftool/dbconfig/20240704-102129-marostegui.json
10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65797 and previous config saved to /var/cache/conftool/dbconfig/20240704-100622-marostegui.json
09:53 topranks: Pushing updated BGP policy to cr2-eqord in Chicago to re-announce codfw IP ranges there T367439
09:29 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
09:24 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
09:23 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1009.eqiad.wmnet with OS bullseye
09:23 claime: Manual cleanup of puppet certs for renamed servers mw1417.eqiad.wmnet mw1418.eqiad.wmnet mw2300.codfw.wmnet
09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
09:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
09:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
09:13 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
09:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.43.0-wmf.12" - T366957
09:03 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
09:00 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
08:59 elukey: restart mcrouter on mwmaint1002
08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:45 fabfur: enable puppet on A:cp-ulsfo (T365718)
08:45 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1009.eqiad.wmnet with OS bullseye
08:44 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
08:43 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
08:28 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
08:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
08:24 fabfur: temporarily disable puppet on A:cp-ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1051198 (T365718)
08:10 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad
08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad
08:01 fabfur: start rebooting A:cp-eqiad (upload|text in parallel) for T366555
07:52 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
07:52 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
07:41 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
07:35 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
07:18 dcausse: closing the backport window
07:15 dcausse: refreshing the wikitech search indices
07:11 dcausse@deploy1002: Finished scap: Backport for cirrus: re-enable search updates on wikitech (duration: 08m 28s)
07:06 dcausse@deploy1002: dcausse: Continuing with sync
07:05 dcausse@deploy1002: dcausse: Backport for cirrus: re-enable search updates on wikitech synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:02 dcausse@deploy1002: Started scap sync-world: Backport for cirrus: re-enable search updates on wikitech
07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65794 and previous config saved to /var/cache/conftool/dbconfig/20240704-070100-marostegui.json
07:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
07:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65793 and previous config saved to /var/cache/conftool/dbconfig/20240704-070038-marostegui.json
06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65791 and previous config saved to /var/cache/conftool/dbconfig/20240704-063024-marostegui.json
06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65790 and previous config saved to /var/cache/conftool/dbconfig/20240704-061517-marostegui.json
05:11 marostegui: Deploy schema change on db1231 s6 eqiad dbmaint T367856
05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T369020', diff saved to https://phabricator.wikimedia.org/P65789 and previous config saved to /var/cache/conftool/dbconfig/20240704-050334-marostegui.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T369020', diff saved to https://phabricator.wikimedia.org/P65788 and previous config saved to /var/cache/conftool/dbconfig/20240704-050237-marostegui.json
05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T369020', diff saved to https://phabricator.wikimedia.org/P65787 and previous config saved to /var/cache/conftool/dbconfig/20240704-050216-marostegui.json
05:01 marostegui: Starting s6 eqiad failover from db1231 to db1173 - T369020
04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1173 with weight 0 T369020', diff saved to https://phabricator.wikimedia.org/P65786 and previous config saved to /var/cache/conftool/dbconfig/20240704-044429-marostegui.json
04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65785 and previous config saved to /var/cache/conftool/dbconfig/20240704-031151-marostegui.json
03:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
03:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65784 and previous config saved to /var/cache/conftool/dbconfig/20240704-031129-marostegui.json
02:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65783 and previous config saved to /var/cache/conftool/dbconfig/20240704-025622-marostegui.json
02:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
02:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65782 and previous config saved to /var/cache/conftool/dbconfig/20240704-024115-marostegui.json
02:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
02:31 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65781 and previous config saved to /var/cache/conftool/dbconfig/20240704-022608-marostegui.json
01:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
01:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65780 and previous config saved to /var/cache/conftool/dbconfig/20240704-014313-marostegui.json
01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65779 and previous config saved to /var/cache/conftool/dbconfig/20240704-012806-marostegui.json
01:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65778 and previous config saved to /var/cache/conftool/dbconfig/20240704-011258-marostegui.json
00:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65777 and previous config saved to /var/cache/conftool/dbconfig/20240704-005750-marostegui.json
00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parsoidtest1001.eqiad.wmnet with OS bullseye
00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
00:42 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
00:29 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
00:25 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
00:15 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye

2024-07-03

23:47 tzatziki: removing 11 files for legal compliance
23:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65776 and previous config saved to /var/cache/conftool/dbconfig/20240703-232302-marostegui.json
23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65775 and previous config saved to /var/cache/conftool/dbconfig/20240703-232221-marostegui.json
23:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65774 and previous config saved to /var/cache/conftool/dbconfig/20240703-232154-ladsgroup.json
23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65773 and previous config saved to /var/cache/conftool/dbconfig/20240703-230713-marostegui.json
23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65772 and previous config saved to /var/cache/conftool/dbconfig/20240703-230646-ladsgroup.json
22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65771 and previous config saved to /var/cache/conftool/dbconfig/20240703-225206-marostegui.json
22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65770 and previous config saved to /var/cache/conftool/dbconfig/20240703-225139-ladsgroup.json
22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65769 and previous config saved to /var/cache/conftool/dbconfig/20240703-223659-marostegui.json
22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65768 and previous config saved to /var/cache/conftool/dbconfig/20240703-223632-ladsgroup.json
22:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
21:40 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
21:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
21:35 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
20:13 cjming: end of UTC late backport window
20:11 cjming@deploy1002: Finished scap: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 08m 22s)
20:10 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
20:06 cjming@deploy1002: kgraessle, cjming: Continuing with sync
20:05 cjming@deploy1002: kgraessle, cjming: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
20:03 cjming@deploy1002: Started scap sync-world: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969)
19:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
19:55 cmooney@cumin1002: START - Cookbook sre.dns.netbox
19:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
19:49 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65766 and previous config saved to /var/cache/conftool/dbconfig/20240703-194055-marostegui.json
19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65765 and previous config saved to /var/cache/conftool/dbconfig/20240703-194033-marostegui.json
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65761 and previous config saved to /var/cache/conftool/dbconfig/20240703-192526-marostegui.json
19:25 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
19:24 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
19:19 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
19:16 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
19:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@d773cac]: (no justification provided) (duration: 00m 33s)
19:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@d773cac]: (no justification provided)
19:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65760 and previous config saved to /var/cache/conftool/dbconfig/20240703-191019-marostegui.json
19:08 SandraEbele_: deploying airflow dags
18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65759 and previous config saved to /var/cache/conftool/dbconfig/20240703-185511-marostegui.json
18:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
18:36 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
18:36 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
18:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
18:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
17:50 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
17:49 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
17:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
17:48 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
17:46 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
17:45 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
17:45 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
17:44 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
17:44 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
17:43 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
17:43 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
17:41 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
17:41 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
17:40 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
17:40 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
17:37 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
17:37 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
17:36 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
17:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65758 and previous config saved to /var/cache/conftool/dbconfig/20240703-173601-root.json
17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
17:34 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
17:33 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
17:33 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
17:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
17:30 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
17:29 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
17:28 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
17:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
17:22 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
17:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65756 and previous config saved to /var/cache/conftool/dbconfig/20240703-172055-root.json
17:19 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
17:19 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
17:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
17:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
17:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
17:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
17:09 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
17:08 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
17:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
17:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65755 and previous config saved to /var/cache/conftool/dbconfig/20240703-170549-root.json
16:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65754 and previous config saved to /var/cache/conftool/dbconfig/20240703-165044-root.json
16:47 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
16:46 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
16:44 jhathaway: adding inbound email servers mx-in{1001,2001} to our MX record
16:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65752 and previous config saved to /var/cache/conftool/dbconfig/20240703-163538-root.json
16:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65751 and previous config saved to /var/cache/conftool/dbconfig/20240703-162032-root.json
16:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 1%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240703-160521-root.json
16:04 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65750 and previous config saved to /var/cache/conftool/dbconfig/20240703-154716-marostegui.json
15:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65749 and previous config saved to /var/cache/conftool/dbconfig/20240703-154643-marostegui.json
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65748 and previous config saved to /var/cache/conftool/dbconfig/20240703-154142-arnaudb.json
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65747 and previous config saved to /var/cache/conftool/dbconfig/20240703-154121-arnaudb.json
15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65746 and previous config saved to /var/cache/conftool/dbconfig/20240703-154109-arnaudb.json
15:32 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
15:31 sukhe: restart haproxy on dns1005
15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65744 and previous config saved to /var/cache/conftool/dbconfig/20240703-153136-marostegui.json
15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65743 and previous config saved to /var/cache/conftool/dbconfig/20240703-152636-arnaudb.json
15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65742 and previous config saved to /var/cache/conftool/dbconfig/20240703-152616-arnaudb.json
15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65741 and previous config saved to /var/cache/conftool/dbconfig/20240703-152603-arnaudb.json
15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65740 and previous config saved to /var/cache/conftool/dbconfig/20240703-151628-marostegui.json
15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
15:13 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65739 and previous config saved to /var/cache/conftool/dbconfig/20240703-151131-arnaudb.json
15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65738 and previous config saved to /var/cache/conftool/dbconfig/20240703-151110-arnaudb.json
15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65737 and previous config saved to /var/cache/conftool/dbconfig/20240703-151057-arnaudb.json
15:10 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65736 and previous config saved to /var/cache/conftool/dbconfig/20240703-150411-marostegui.json
15:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
15:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65735 and previous config saved to /var/cache/conftool/dbconfig/20240703-150348-marostegui.json
15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65734 and previous config saved to /var/cache/conftool/dbconfig/20240703-150121-marostegui.json
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65733 and previous config saved to /var/cache/conftool/dbconfig/20240703-145625-arnaudb.json
14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65732 and previous config saved to /var/cache/conftool/dbconfig/20240703-145604-arnaudb.json
14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65731 and previous config saved to /var/cache/conftool/dbconfig/20240703-145552-arnaudb.json
14:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
14:51 fabfur: start rebooting A:cp-drmrs (upload|text in parallel) for T366555
14:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65730 and previous config saved to /var/cache/conftool/dbconfig/20240703-144841-marostegui.json
14:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
14:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65729 and previous config saved to /var/cache/conftool/dbconfig/20240703-144119-arnaudb.json
14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65728 and previous config saved to /var/cache/conftool/dbconfig/20240703-144059-arnaudb.json
14:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65727 and previous config saved to /var/cache/conftool/dbconfig/20240703-144046-arnaudb.json
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1006.eqiad.wmnet with OS bookworm
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1005.eqiad.wmnet with OS bookworm
14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
14:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:38 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
14:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
14:35 sukhe: [correction of previous A:dnsbox run] sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent"
14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65726 and previous config saved to /var/cache/conftool/dbconfig/20240703-143334-marostegui.json
14:33 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
14:32 sukhe: sudo cumin "A:wikidough" "run-puppet-agent"
14:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
14:32 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
14:30 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
14:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
14:27 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65725 and previous config saved to /var/cache/conftool/dbconfig/20240703-142614-arnaudb.json
14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json
14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json
14:25 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:21 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
14:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json
14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
14:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
14:11 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
14:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
14:04 topranks: rebooting lsw1-e2-eqiad to install updated JunOS version T365994
14:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
14:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
13:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
13:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
13:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
13:57 jayme@cumin1002: conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
13:56 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
13:56 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
13:56 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
13:55 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
13:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
13:52 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
13:48 Lucas_WMDE: UTC afternoon backport+config window done
13:48 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup (duration: 08m 38s)
13:44 jayme: draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994
13:43 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync
13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup
13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) (duration: 09m 28s)
13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync
13:31 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)
13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) (duration: 08m 20s)
13:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
13:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:19 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001
13:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 49.3.193.10.in-addr.arpa. on all recursors
13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 49.3.193.10.in-addr.arpa. on all recursors
13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2002.mgmt.codfw.wmnet on all recursors
13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2002.mgmt.codfw.wmnet on all recursors
13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'T365994 - depool db1191,db1196,db1197', diff saved to https://phabricator.wikimedia.org/P65721 and previous config saved to /var/cache/conftool/dbconfig/20240703-131715-arnaudb.json
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)
13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
13:15 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes kawikisource --fix # T363243; 34 pages to fix, 34 were resolvable; 774 links to fix, 774 were resolvable, 0 were deleted
13:15 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
13:14 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mswikisource --fix # T369047; 6 pages to fix, 6 were resolvable; 76 links to fix, 73 were resolvable, 3 were deleted
13:13 cmooney@cumin1002: START - Cookbook sre.dns.netbox
13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) (duration: 10m 39s)
13:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
13:04 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243)
12:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
12:47 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
12:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
12:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
12:37 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
12:34 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
12:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
12:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65720 and previous config saved to /var/cache/conftool/dbconfig/20240703-121009-ladsgroup.json
11:55 ladsgroup@deploy1002: Finished scap: Backport for rpc: Update function call in RunSingleJob (T363839) (duration: 08m 08s)
11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65719 and previous config saved to /var/cache/conftool/dbconfig/20240703-115504-ladsgroup.json
11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65718 and previous config saved to /var/cache/conftool/dbconfig/20240703-115211-marostegui.json
11:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65717 and previous config saved to /var/cache/conftool/dbconfig/20240703-115149-marostegui.json
11:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
11:49 ladsgroup@deploy1002: ladsgroup: Backport for rpc: Update function call in RunSingleJob (T363839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:47 ladsgroup@deploy1002: Started scap sync-world: Backport for rpc: Update function call in RunSingleJob (T363839)
11:45 ladsgroup@deploy1002: Finished scap: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) (duration: 09m 28s)
11:40 ladsgroup@deploy1002: volker-e, ladsgroup: Continuing with sync
11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65716 and previous config saved to /var/cache/conftool/dbconfig/20240703-113958-ladsgroup.json
11:39 ladsgroup@deploy1002: volker-e, ladsgroup: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65715 and previous config saved to /var/cache/conftool/dbconfig/20240703-113642-marostegui.json
11:35 ladsgroup@deploy1002: Started scap sync-world: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190)
11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65714 and previous config saved to /var/cache/conftool/dbconfig/20240703-112728-ladsgroup.json
11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65713 and previous config saved to /var/cache/conftool/dbconfig/20240703-112452-ladsgroup.json
11:21 cgoubert@deploy1002: Finished scap: mw-on-k8s: Move php.envvars to mediawiki-common - T365265 (duration: 05m 22s)
11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65712 and previous config saved to /var/cache/conftool/dbconfig/20240703-112135-marostegui.json
11:16 cgoubert@deploy1002: Started scap sync-world: mw-on-k8s: Move php.envvars to mediawiki-common - T365265
11:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
11:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65711 and previous config saved to /var/cache/conftool/dbconfig/20240703-110627-marostegui.json
10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65710 and previous config saved to /var/cache/conftool/dbconfig/20240703-103839-marostegui.json
10:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
10:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
10:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
09:49 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 07s)
09:49 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
09:31 mlitn@deploy1002: Finished scap: Backport for Handle campaigns where wikibase is not enabled (T369085) (duration: 12m 59s)
09:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
09:26 mlitn@deploy1002: mlitn: Continuing with sync
09:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
09:21 mlitn@deploy1002: mlitn: Backport for Handle campaigns where wikibase is not enabled (T369085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:20 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
09:20 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
09:20 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2008.wikimedia.org
09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2008.wikimedia.org with OS bookworm
09:20 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65709 and previous config saved to /var/cache/conftool/dbconfig/20240703-091956-marostegui.json
09:18 mlitn@deploy1002: Started scap sync-world: Backport for Handle campaigns where wikibase is not enabled (T369085)
09:09 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
09:06 topranks: merge host firewall changes to set default DSCP marking (T339850)
09:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
09:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
09:02 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
09:01 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
09:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
09:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
09:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
09:00 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
08:59 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
08:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:58 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
08:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:53 jayme: deployed istio (adding securityContext) to wikikube clusters - T362978
08:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
08:51 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
08:49 Lucas_WMDE: RELEASE_NAME=r72z2aop helmfile --file /srv/deployment-charts/helmfile.d/services/mw-script/helmfile.yaml --environment eqiad --selector name=r72z2aop destroy # clean up broken mwscript-k8s run I did just to test something
08:46 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
08:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:45 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
08:44 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
08:43 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
08:42 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
08:42 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
08:42 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
08:41 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
08:41 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
08:40 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
08:40 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
08:39 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
08:39 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
08:39 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
08:39 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
08:38 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
08:35 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
08:31 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1002.eqiad.wmnet
08:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
08:18 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1001.eqiad.wmnet
08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.12 refs T366957
08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65707 and previous config saved to /var/cache/conftool/dbconfig/20240703-081059-marostegui.json
08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
08:09 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host karapace1001.eqiad.wmnet
08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65706 and previous config saved to /var/cache/conftool/dbconfig/20240703-075245-marostegui.json
07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65705 and previous config saved to /var/cache/conftool/dbconfig/20240703-074321-marostegui.json
07:36 kart_: Updated MinT to 2024-07-02-060114-production (T364525)
07:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65704 and previous config saved to /var/cache/conftool/dbconfig/20240703-072814-marostegui.json
07:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
07:21 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
07:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65702 and previous config saved to /var/cache/conftool/dbconfig/20240703-071306-marostegui.json
07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
07:07 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
06:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65701 and previous config saved to /var/cache/conftool/dbconfig/20240703-065759-marostegui.json
06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65700 and previous config saved to /var/cache/conftool/dbconfig/20240703-062057-root.json
06:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65699 and previous config saved to /var/cache/conftool/dbconfig/20240703-060552-root.json
05:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65698 and previous config saved to /var/cache/conftool/dbconfig/20240703-055046-root.json
05:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65697 and previous config saved to /var/cache/conftool/dbconfig/20240703-053541-root.json
05:23 marostegui: Deploy schema change on db2207 s2 codfw dbmaint T367856
05:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
05:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T369130', diff saved to https://phabricator.wikimedia.org/P65696 and previous config saved to /var/cache/conftool/dbconfig/20240703-052118-root.json
05:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65695 and previous config saved to /var/cache/conftool/dbconfig/20240703-052035-root.json
05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T369130', diff saved to https://phabricator.wikimedia.org/P65694 and previous config saved to /var/cache/conftool/dbconfig/20240703-052029-root.json
05:20 marostegui: Starting s2 codfw failover from db2207 to db2204 - T369130
05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T369130', diff saved to https://phabricator.wikimedia.org/P65693 and previous config saved to /var/cache/conftool/dbconfig/20240703-050647-root.json
05:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
05:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65692 and previous config saved to /var/cache/conftool/dbconfig/20240703-050523-root.json
04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65691 and previous config saved to /var/cache/conftool/dbconfig/20240703-045109-marostegui.json
04:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65690 and previous config saved to /var/cache/conftool/dbconfig/20240703-045018-root.json
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65689 and previous config saved to /var/cache/conftool/dbconfig/20240703-043335-marostegui.json
04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65688 and previous config saved to /var/cache/conftool/dbconfig/20240703-043312-marostegui.json
04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65687 and previous config saved to /var/cache/conftool/dbconfig/20240703-041805-marostegui.json
04:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65686 and previous config saved to /var/cache/conftool/dbconfig/20240703-040258-marostegui.json
03:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65685 and previous config saved to /var/cache/conftool/dbconfig/20240703-034751-marostegui.json
01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65684 and previous config saved to /var/cache/conftool/dbconfig/20240703-011701-marostegui.json
01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
00:48 eileen: civicrm upgraded from 6e03cff2 to 84d6f5d1
00:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
00:16 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
00:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
00:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65683 and previous config saved to /var/cache/conftool/dbconfig/20240703-000506-marostegui.json

2024-07-02

23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65682 and previous config saved to /var/cache/conftool/dbconfig/20240702-234959-marostegui.json
23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65681 and previous config saved to /var/cache/conftool/dbconfig/20240702-233452-marostegui.json
23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65680 and previous config saved to /var/cache/conftool/dbconfig/20240702-231945-marostegui.json
22:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
22:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
22:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65679 and previous config saved to /var/cache/conftool/dbconfig/20240702-225835-marostegui.json
22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65678 and previous config saved to /var/cache/conftool/dbconfig/20240702-224328-marostegui.json
22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65677 and previous config saved to /var/cache/conftool/dbconfig/20240702-222820-marostegui.json
22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65676 and previous config saved to /var/cache/conftool/dbconfig/20240702-221312-marostegui.json
22:05 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
22:05 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
22:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
22:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
22:01 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
22:01 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
21:58 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
21:58 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
21:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
21:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
21:54 rzl@deploy1002: Finished scap: T369080 (duration: 04m 13s)
21:54 rzl@deploy1002: rzl: Continuing with sync
21:52 rzl@deploy1002: rzl: T369080 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:51 rzl@deploy1002: Started scap sync-world: T369080
21:26 eileen: civicrm upgraded from 08e568e4 to 6e03cff2
21:21 eileen: civicrm upgraded from 67bcfd72 to 08e568e4
20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:45 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
20:39 cmooney@cumin1002: START - Cookbook sre.hosts.provision for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:34 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
20:33 urbanecm@deploy1002: Finished scap: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) (duration: 11m 44s)
20:31 cmooney@cumin1002: START - Cookbook sre.dns.netbox
20:28 urbanecm@deploy1002: arlolra, urbanecm: Continuing with sync
20:25 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet
20:24 urbanecm@deploy1002: arlolra, urbanecm: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:21 urbanecm@deploy1002: Started scap sync-world: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720)
20:21 urbanecm@deploy1002: Finished scap: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) (duration: 16m 31s)
20:16 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Continuing with sync
20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
20:07 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:04 urbanecm@deploy1002: Started scap sync-world: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292)
19:45 jhathaway: running another email inbound mx test on mx-in1001
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65675 and previous config saved to /var/cache/conftool/dbconfig/20240702-194027-marostegui.json
19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65674 and previous config saved to /var/cache/conftool/dbconfig/20240702-194005-marostegui.json
19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65673 and previous config saved to /var/cache/conftool/dbconfig/20240702-192457-marostegui.json
19:21 eileen: civicrm upgraded from 64f23ed0 to 67bcfd72
19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65672 and previous config saved to /var/cache/conftool/dbconfig/20240702-190950-marostegui.json
18:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65671 and previous config saved to /var/cache/conftool/dbconfig/20240702-185443-marostegui.json
17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
17:20 jforrester@deploy1002: Finished scap: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) (duration: 10m 06s)
17:15 jforrester@deploy1002: jforrester: Continuing with sync
17:14 jforrester@deploy1002: jforrester: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
17:10 jforrester@deploy1002: Started scap sync-world: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010)
17:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
17:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
17:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
17:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
17:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
17:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
17:06 mutante: lists1004 - sudo systemctl start wmf_auto_restart_exim4 (T369017)
16:54 ejegg: fundraising civicrm upgraded from 41c1bd78 to 64f23ed0
16:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
16:13 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
16:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
16:01 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
15:58 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1004.eqiad.wmnet
15:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
15:51 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1004.eqiad.wmnet
15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
15:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
15:46 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
15:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65670 and previous config saved to /var/cache/conftool/dbconfig/20240702-154127-marostegui.json
15:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
15:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65669 and previous config saved to /var/cache/conftool/dbconfig/20240702-154105-marostegui.json
15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65668 and previous config saved to /var/cache/conftool/dbconfig/20240702-152558-marostegui.json
15:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
15:12 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
15:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65667 and previous config saved to /var/cache/conftool/dbconfig/20240702-151050-marostegui.json
15:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[2004-2006].codfw.wmnet
15:05 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
15:05 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
15:02 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
14:58 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
14:58 jiji@cumin1002: START - Cookbook sre.dns.netbox
14:58 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
14:55 fabfur: upgrading A:cp-esams to haproxy 2.8.10 (T367756)
14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65666 and previous config saved to /var/cache/conftool/dbconfig/20240702-145542-marostegui.json
14:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
14:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
14:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
14:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
14:52 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
14:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
14:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
14:51 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[1004-1006].eqiad.wmnet
14:51 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
14:51 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
14:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
14:48 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
14:47 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[2004-2006].codfw.wmnet
14:45 jiji@cumin1002: START - Cookbook sre.dns.netbox
14:38 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
14:37 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[1004-1006].eqiad.wmnet
14:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
14:19 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1008.eqiad.wmnet
14:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
14:12 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: decom
14:12 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
14:11 jiji@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2 days, 0:00:00 on 6 hosts with reason: decom
14:11 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
14:06 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns
14:06 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
14:05 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
14:05 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
14:05 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
14:05 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
14:05 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
14:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
14:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
14:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns
14:04 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
14:03 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
14:03 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
14:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org
14:02 sukhe: restart anycast-hc on dns6001
14:01 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org
13:58 effie: decom old eqiad and codfw kubetcd hosts
13:46 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
13:44 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
13:44 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
13:43 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
13:42 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
13:42 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
13:41 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
13:39 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
13:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2030.codfw.wmnet|wikikube-worker2031.codfw.wmnet|wikikube-worker2032.codfw.wmnet|wikikube-worker2033.codfw.wmnet|wikikube-worker2034.codfw.wmnet),cluster=kubernetes,service=kubesvc
13:35 claime: Pooling and uncordoning wikikube-worker2030.codfw.wmnet wikikube-worker2031.codfw.wmnet wikikube-worker2032.codfw.wmnet wikikube-worker2033.codfw.wmnet wikikube-worker2034.codfw.wmnet - T351074
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65665 and previous config saved to /var/cache/conftool/dbconfig/20240702-133100-marostegui.json
13:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65664 and previous config saved to /var/cache/conftool/dbconfig/20240702-133038-marostegui.json
13:30 Lucas_WMDE: UTC afternoon backport+config window done
13:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) (duration: 10m 22s)
13:22 claime: homer 'cr*codfw*' commit 'T351074'
13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Continuing with sync
13:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[1001-1002].eqiad.wmnet
13:21 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
13:21 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:18 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877)
13:16 jiji@cumin1002: START - Cookbook sre.dns.netbox
13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65663 and previous config saved to /var/cache/conftool/dbconfig/20240702-131531-marostegui.json
13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable EntitySchema data type on Wikidata (T332157) (duration: 10m 54s)
13:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2032.codfw.wmnet with OS bullseye
13:09 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[1001-1002].eqiad.wmnet
13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Enable EntitySchema data type on Wikidata (T332157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2033.codfw.wmnet with OS bullseye
13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable EntitySchema data type on Wikidata (T332157)
13:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65662 and previous config saved to /var/cache/conftool/dbconfig/20240702-130024-marostegui.json
12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2034.codfw.wmnet with OS bullseye
12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2031.codfw.wmnet with OS bullseye
12:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2030.codfw.wmnet with OS bullseye
12:55 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=kubemaster100[1-2].eqiad.wmnet
12:49 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster100[1-2].eqiad.wmnet
12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
12:46 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
12:46 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
12:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65661 and previous config saved to /var/cache/conftool/dbconfig/20240702-124517-marostegui.json
12:44 effie: decom eqiad old kubemasters - T353464
12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
12:41 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes1051.eqiad.wmnet
12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
12:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
12:25 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
12:25 marostegui: Deploy schema change on db2129 s6 codfw dbmaint T367856
12:25 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
12:24 jforrester@deploy1002: Finished scap: Backport for Reference widget: check for undefined config (T368736) (duration: 09m 59s)
12:19 jforrester@deploy1002: jforrester: Continuing with sync
12:19 jforrester@deploy1002: jforrester: Backport for Reference widget: check for undefined config (T368736) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2034.codfw.wmnet with OS bullseye
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2033.codfw.wmnet with OS bullseye
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2032.codfw.wmnet with OS bullseye
12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2031.codfw.wmnet with OS bullseye
12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2030.codfw.wmnet with OS bullseye
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2393 to wikikube-worker2034
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2034
12:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2034
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65660 and previous config saved to /var/cache/conftool/dbconfig/20240702-121638-root.json
12:16 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
12:16 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
12:16 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:15 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
12:15 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
12:14 jforrester@deploy1002: Started scap sync-world: Backport for Reference widget: check for undefined config (T368736)
12:11 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:11 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2393 to wikikube-worker2034
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2392 to wikikube-worker2033
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2033
12:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2033
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
12:09 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
12:08 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
12:07 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
12:07 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
12:05 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2392 to wikikube-worker2033
12:05 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2365 to wikikube-worker2032
12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2032
12:03 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2032
12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
12:01 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
12:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65659 and previous config saved to /var/cache/conftool/dbconfig/20240702-120133-root.json
12:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
12:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
12:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:59 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2365 to wikikube-worker2032
11:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2309 to wikikube-worker2031
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2031
11:58 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2031
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
11:58 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
11:58 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
11:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
11:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2309 to wikikube-worker2031
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2307 to wikikube-worker2030
11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2030
11:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2030
11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65658 and previous config saved to /var/cache/conftool/dbconfig/20240702-115026-marostegui.json
11:50 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
11:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
11:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65657 and previous config saved to /var/cache/conftool/dbconfig/20240702-115003-marostegui.json
11:48 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65656 and previous config saved to /var/cache/conftool/dbconfig/20240702-114627-root.json
11:44 root@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
11:43 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
11:43 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2307 to wikikube-worker2030
11:37 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
11:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
11:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65655 and previous config saved to /var/cache/conftool/dbconfig/20240702-113457-marostegui.json
11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65654 and previous config saved to /var/cache/conftool/dbconfig/20240702-113122-root.json
11:27 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2129 T369021', diff saved to https://phabricator.wikimedia.org/P65653 and previous config saved to /var/cache/conftool/dbconfig/20240702-112616-root.json
11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T369021', diff saved to https://phabricator.wikimedia.org/P65652 and previous config saved to /var/cache/conftool/dbconfig/20240702-112518-marostegui.json
11:24 marostegui: Starting s6 codfw failover from db2129 to db2214 - T369021
11:24 jayme: switched wikikube production clusters from PSP to PSS for restricted namespaces - T273507
11:23 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
11:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
11:21 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
11:21 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
11:21 claime: Uncordoning wikikube-ctrl2001.codfw.wmnet and wikikube-ctrl2002.codfw.wmnet
11:20 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65651 and previous config saved to /var/cache/conftool/dbconfig/20240702-111949-marostegui.json
11:17 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65650 and previous config saved to /var/cache/conftool/dbconfig/20240702-111616-root.json
11:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
11:12 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
11:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
11:12 claime: pooling and uncordoning wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet - T351074
11:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[2001-2002].codfw.wmnet
11:11 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
11:11 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
11:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
11:07 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T369021', diff saved to https://phabricator.wikimedia.org/P65649 and previous config saved to /var/cache/conftool/dbconfig/20240702-110750-root.json
11:07 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
11:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65648 and previous config saved to /var/cache/conftool/dbconfig/20240702-110442-marostegui.json
11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65647 and previous config saved to /var/cache/conftool/dbconfig/20240702-110111-root.json
10:56 jiji@cumin1002: START - Cookbook sre.dns.netbox
10:50 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[2001-2002].codfw.wmnet
10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65646 and previous config saved to /var/cache/conftool/dbconfig/20240702-104605-root.json
10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:42 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
10:41 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
10:35 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
10:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1003.eqiad.wmnet
10:32 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
10:28 fabfur: upgrading A:cp-eqiad to haproxy 2.8.10 (T367756)
10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
10:25 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1003.eqiad.wmnet
10:06 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65645 and previous config saved to /var/cache/conftool/dbconfig/20240702-100636-jynus.json
10:02 claime: homer 'cr*codfw*' commit 'T351074'
09:53 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster200[1-2].codfw.wmnet
09:52 elukey: volatile dir on puppetserver1001 with the new point release (12.6) for Bookworm
09:48 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
09:47 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
09:20 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
09:15 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65644 and previous config saved to /var/cache/conftool/dbconfig/20240702-091508-jynus.json
08:57 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65643 and previous config saved to /var/cache/conftool/dbconfig/20240702-085733-jynus.json
08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65642 and previous config saved to /var/cache/conftool/dbconfig/20240702-084447-marostegui.json
08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65641 and previous config saved to /var/cache/conftool/dbconfig/20240702-084425-marostegui.json
08:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp6009.*} and A:cp
08:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp6009.*} and A:cp
08:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
08:34 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.12 refs T366957
08:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
08:30 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet
08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65640 and previous config saved to /var/cache/conftool/dbconfig/20240702-082918-marostegui.json
08:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2031.*} and A:cp
08:20 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2031.*} and A:cp
08:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2030.*} and A:cp
08:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
08:15 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2030.*} and A:cp
08:15 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
08:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2028.*} and A:cp
08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65639 and previous config saved to /var/cache/conftool/dbconfig/20240702-081411-marostegui.json
08:13 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2028.*} and A:cp
08:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2027.*} and A:cp
08:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2027.*} and A:cp
08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65638 and previous config saved to /var/cache/conftool/dbconfig/20240702-081025-marostegui.json
08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65637 and previous config saved to /var/cache/conftool/dbconfig/20240702-080948-marostegui.json
08:07 jayme: draining kubernetes1051.eqiad.wmnet
08:07 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
08:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
08:01 jayme: cordon kubernetes1051.eqiad.wmnet because of several failed image pulls
07:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65635 and previous config saved to /var/cache/conftool/dbconfig/20240702-075904-marostegui.json
07:58 kharlan@deploy1002: Finished scap: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) (duration: 41m 45s)
07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65634 and previous config saved to /var/cache/conftool/dbconfig/20240702-075440-marostegui.json
07:52 kharlan@deploy1002: kharlan: Continuing with sync
07:51 kharlan@deploy1002: kharlan: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65633 and previous config saved to /var/cache/conftool/dbconfig/20240702-073933-marostegui.json
07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65632 and previous config saved to /var/cache/conftool/dbconfig/20240702-072426-marostegui.json
07:16 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
07:06 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
07:01 oblivian@deploy1002: Finished scap: Rebuilding images for change to the base image for httpd (duration: 26m 52s)
06:59 XioNoX: update netboot bookworm image to pick up new point release
06:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65631 and previous config saved to /var/cache/conftool/dbconfig/20240702-065831-root.json
06:35 oblivian@deploy1002: Started scap sync-world: Rebuilding images for change to the base image for httpd
06:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65629 and previous config saved to /var/cache/conftool/dbconfig/20240702-062820-root.json
06:21 _joe_: rebuilding httpd-fcgi, mediawiki-httpd images T363342 T368640
06:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65628 and previous config saved to /var/cache/conftool/dbconfig/20240702-061315-root.json
05:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65627 and previous config saved to /var/cache/conftool/dbconfig/20240702-055809-root.json
05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65626 and previous config saved to /var/cache/conftool/dbconfig/20240702-054304-root.json
05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65625 and previous config saved to /var/cache/conftool/dbconfig/20240702-052759-root.json
05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 T368371', diff saved to https://phabricator.wikimedia.org/P65624 and previous config saved to /var/cache/conftool/dbconfig/20240702-052543-root.json
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1209 to s8 primary and set section read-write T368371', diff saved to https://phabricator.wikimedia.org/P65623 and previous config saved to /var/cache/conftool/dbconfig/20240702-052447-marostegui.json
05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T368371', diff saved to https://phabricator.wikimedia.org/P65622 and previous config saved to /var/cache/conftool/dbconfig/20240702-052408-marostegui.json
05:23 marostegui: Starting s8 eqiad failover from db1192 to db1209 - T368371
04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 remove from API T368371', diff saved to https://phabricator.wikimedia.org/P65621 and previous config saved to /var/cache/conftool/dbconfig/20240702-045929-marostegui.json
04:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 with weight 0 T368371', diff saved to https://phabricator.wikimedia.org/P65620 and previous config saved to /var/cache/conftool/dbconfig/20240702-045856-marostegui.json
04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65619 and previous config saved to /var/cache/conftool/dbconfig/20240702-043349-marostegui.json
04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65618 and previous config saved to /var/cache/conftool/dbconfig/20240702-043326-marostegui.json
04:22 eileen: civicrm upgraded from f6af6380 to 41c1bd78
04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65617 and previous config saved to /var/cache/conftool/dbconfig/20240702-041819-marostegui.json
04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65616 and previous config saved to /var/cache/conftool/dbconfig/20240702-040705-marostegui.json
04:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
04:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
04:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65615 and previous config saved to /var/cache/conftool/dbconfig/20240702-040643-marostegui.json
04:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65614 and previous config saved to /var/cache/conftool/dbconfig/20240702-040312-marostegui.json
04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.9 (duration: 01m 02s)
03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.12 refs T366957 (duration: 51m 33s)
03:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65613 and previous config saved to /var/cache/conftool/dbconfig/20240702-035135-marostegui.json
03:51 eileen: civicrm upgraded from 52dc4f1d to f6af6380
03:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65612 and previous config saved to /var/cache/conftool/dbconfig/20240702-034805-marostegui.json
03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65611 and previous config saved to /var/cache/conftool/dbconfig/20240702-033628-marostegui.json
03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65610 and previous config saved to /var/cache/conftool/dbconfig/20240702-032121-marostegui.json
03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.12 refs T366957
00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65609 and previous config saved to /var/cache/conftool/dbconfig/20240702-004524-marostegui.json
00:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
00:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65608 and previous config saved to /var/cache/conftool/dbconfig/20240702-004502-marostegui.json
00:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65607 and previous config saved to /var/cache/conftool/dbconfig/20240702-002955-marostegui.json
00:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
00:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65606 and previous config saved to /var/cache/conftool/dbconfig/20240702-001448-marostegui.json
00:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
00:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
00:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"

2024-07-01

23:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65605 and previous config saved to /var/cache/conftool/dbconfig/20240701-235941-marostegui.json
23:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
23:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
23:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
23:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye
23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
23:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
23:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
22:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1038
22:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1038
22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
22:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
22:10 sbassett@deploy1002: Synchronized private/PrivateSettings.php: Un-deployed a PS.php mitigation for T341908 (duration: 07m 24s)
21:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
21:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
21:55 maryum: deployed patch for T366991
21:39 eileen: civicrm upgraded from f8b1f5c4 to 52dc4f1d
21:39 eileen: tools upgraded from c51f6e62 to 95f10b20
21:32 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Yann . # T368703
21:24 cjming: end of UTC late backport window
21:23 cjming@deploy1002: Finished scap: Backport for extension-list: Add Metrics Platform (T366234) (duration: 28m 16s)
21:16 cjming@deploy1002: cjming: Continuing with sync
21:16 cjming@deploy1002: cjming: Backport for extension-list: Add Metrics Platform (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65604 and previous config saved to /var/cache/conftool/dbconfig/20240701-210534-marostegui.json
21:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
21:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65603 and previous config saved to /var/cache/conftool/dbconfig/20240701-210512-marostegui.json
21:04 ejegg: fundraising civicrm upgraded from f9782670 to f8b1f5c4
20:55 cjming@deploy1002: Started scap sync-world: Backport for extension-list: Add Metrics Platform (T366234)
20:53 cjming@deploy1002: Finished scap: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) (duration: 09m 03s)
20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65602 and previous config saved to /var/cache/conftool/dbconfig/20240701-205003-marostegui.json
20:47 cjming@deploy1002: cjming, pppery: Continuing with sync
20:47 cjming@deploy1002: cjming, pppery: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:44 cjming@deploy1002: Started scap sync-world: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915)
20:42 cjming@deploy1002: Finished scap: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) (duration: 10m 39s)
20:36 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
20:35 ejegg: standalone SmashPig upgraded from c8993ec6 to 565c61e4
20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65601 and previous config saved to /var/cache/conftool/dbconfig/20240701-203456-marostegui.json
20:34 cjming@deploy1002: cjming, jdlrobson: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:31 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483)
20:30 cjming@deploy1002: Sync cancelled.
20:28 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:26 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
20:23 cjming@deploy1002: Sync cancelled.
20:23 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65600 and previous config saved to /var/cache/conftool/dbconfig/20240701-201949-marostegui.json
20:03 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
19:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
19:19 dancy@deploy1002: Installation of scap version "4.91.0" completed for 233 hosts
19:19 dancy@deploy1002: Installing scap version "4.91.0" for 233 hosts
19:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
19:15 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
19:14 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
19:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
18:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
18:56 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
18:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:49 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:49 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
17:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
17:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
17:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:42 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:42 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
17:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
17:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
17:36 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
17:35 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
17:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
17:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65599 and previous config saved to /var/cache/conftool/dbconfig/20240701-171609-marostegui.json
17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
17:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
17:05 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
17:04 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
16:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
16:51 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
16:51 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
16:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
16:33 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
16:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65598 and previous config saved to /var/cache/conftool/dbconfig/20240701-163010-marostegui.json
16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
16:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65597 and previous config saved to /var/cache/conftool/dbconfig/20240701-162948-marostegui.json
16:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
16:21 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1039
16:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1039
16:18 urandom: restarting Cassandra —restbase2023-{a,b,c}— troubleshooting storage utilization
16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
16:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65596 and previous config saved to /var/cache/conftool/dbconfig/20240701-161441-marostegui.json
16:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
16:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
15:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65595 and previous config saved to /var/cache/conftool/dbconfig/20240701-155934-marostegui.json
15:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65594 and previous config saved to /var/cache/conftool/dbconfig/20240701-154427-marostegui.json
15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65593 and previous config saved to /var/cache/conftool/dbconfig/20240701-153758-root.json
15:37 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:32 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
15:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65592 and previous config saved to /var/cache/conftool/dbconfig/20240701-152253-root.json
15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
15:16 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
15:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
15:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
15:14 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
15:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65591 and previous config saved to /var/cache/conftool/dbconfig/20240701-150747-root.json
15:07 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
15:05 akosiaris: reboot deploy1003 T364416
15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
14:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
14:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
14:55 claime: deploying statsd-exporter for mw-web - T365265
14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
14:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65590 and previous config saved to /var/cache/conftool/dbconfig/20240701-145242-root.json
14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
14:48 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
14:48 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
14:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
14:44 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
14:43 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
14:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
14:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
14:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65589 and previous config saved to /var/cache/conftool/dbconfig/20240701-143736-root.json
14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
14:35 fabfur: upgrading A:cp-codfw to haproxy 2.8.10 (T367756)
14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
14:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
14:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
14:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65587 and previous config saved to /var/cache/conftool/dbconfig/20240701-142231-root.json
14:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65586 and previous config saved to /var/cache/conftool/dbconfig/20240701-141640-marostegui.json
14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
14:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65585 and previous config saved to /var/cache/conftool/dbconfig/20240701-140725-root.json
14:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65584 and previous config saved to /var/cache/conftool/dbconfig/20240701-140133-marostegui.json
13:57 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
13:56 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
13:48 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
13:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65583 and previous config saved to /var/cache/conftool/dbconfig/20240701-134626-marostegui.json
13:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
13:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1040
13:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1040
13:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65581 and previous config saved to /var/cache/conftool/dbconfig/20240701-133118-marostegui.json
13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
13:30 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
13:29 elukey@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: sync
13:29 urbanecm: mwmaint1002: [urbanecm@mwmaint1002 ~]$ foreachwiki DiscussionTools:FixTrailingWhitespaceIds (T356196)
13:27 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
13:27 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:25 urbanecm@deploy1002: Finished scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196) (duration: 08m 46s)
13:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
13:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
13:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
13:16 urbanecm@deploy1002: Started scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196)
13:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki map (T368862) (duration: 09m 01s)
13:14 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
13:10 urbanecm@deploy1002: urbanecm: Continuing with sync
13:10 urbanecm@deploy1002: urbanecm: Backport for Update interwiki map (T368862) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
13:07 urbanecm@deploy1002: Started scap: Backport for Update interwiki map (T368862)
12:56 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
12:56 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
12:56 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
12:55 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:51 claime: Running update-netboot-image bullseye for 11.10 release on puppetserver1001
12:49 fabfur: upgrading A:cp-magru to haproxy 2.8.10 (T367756)
12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
12:39 claime: Running update-netboot-image bullseye for 11.10 release
12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
12:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:35 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:32 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
12:32 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
12:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
12:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
12:30 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
12:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
12:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
12:27 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
12:23 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
12:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
12:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
12:21 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
12:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
12:19 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
12:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
12:17 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
12:16 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
12:14 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
12:12 daniel@deploy1002: Finished scap: Backport for REST: detect mismatching value types in json request (T305973) (duration: 32m 48s)
12:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
12:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
12:06 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
12:04 daniel@deploy1002: daniel: Continuing with sync
12:03 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:01 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
12:00 daniel@deploy1002: daniel: Backport for REST: detect mismatching value types in json request (T305973) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
11:58 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
11:51 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
11:49 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
11:46 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
11:45 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:45 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
11:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging FebinBellamy out of all services on: 2188 hosts
11:43 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
11:43 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging FebinBellamy out of all services on: 2188 hosts
11:41 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 2188 hosts
11:41 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 2188 hosts
11:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
11:39 daniel@deploy1002: Started scap: Backport for REST: detect mismatching value types in json request (T305973)
11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
11:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
11:29 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
11:27 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
10:57 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
10:49 claime: running /usr/local/bin/apply-config-kartotherian on maps-master
10:47 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
10:47 claime: running /usr/local/bin/apply-config-kartotherian on maps-replica
10:46 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
10:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
10:43 claime: running puppet on maps servers
10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
10:38 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
10:37 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
10:37 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
10:37 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65580 and previous config saved to /var/cache/conftool/dbconfig/20240701-102633-marostegui.json
10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65579 and previous config saved to /var/cache/conftool/dbconfig/20240701-102611-marostegui.json
10:23 fabfur: upgrading A:cp-drmrs to haproxy 2.8.10 (T367756)
10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65578 and previous config saved to /var/cache/conftool/dbconfig/20240701-101104-marostegui.json
09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65577 and previous config saved to /var/cache/conftool/dbconfig/20240701-095557-marostegui.json
09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65576 and previous config saved to /var/cache/conftool/dbconfig/20240701-094547-root.json
09:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65575 and previous config saved to /var/cache/conftool/dbconfig/20240701-094341-root.json
09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65574 and previous config saved to /var/cache/conftool/dbconfig/20240701-094050-marostegui.json
09:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65573 and previous config saved to /var/cache/conftool/dbconfig/20240701-093042-root.json
09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65572 and previous config saved to /var/cache/conftool/dbconfig/20240701-092835-root.json
09:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
09:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
09:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65570 and previous config saved to /var/cache/conftool/dbconfig/20240701-091536-root.json
09:14 urbanecm@deploy1002: Finished scap: Backport for JsonSchemaValidator: Measure duration (T365245) (duration: 22m 15s)
09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65569 and previous config saved to /var/cache/conftool/dbconfig/20240701-091329-root.json
09:06 urbanecm@deploy1002: urbanecm: Continuing with sync
09:06 urbanecm@deploy1002: urbanecm: Backport for JsonSchemaValidator: Measure duration (T365245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65568 and previous config saved to /var/cache/conftool/dbconfig/20240701-090031-root.json
08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65567 and previous config saved to /var/cache/conftool/dbconfig/20240701-085824-root.json
08:51 urbanecm@deploy1002: Started scap: Backport for JsonSchemaValidator: Measure duration (T365245)
08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65566 and previous config saved to /var/cache/conftool/dbconfig/20240701-084525-root.json
08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65565 and previous config saved to /var/cache/conftool/dbconfig/20240701-084318-root.json
08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65564 and previous config saved to /var/cache/conftool/dbconfig/20240701-083020-root.json
08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65563 and previous config saved to /var/cache/conftool/dbconfig/20240701-082813-root.json
08:18 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65562 and previous config saved to /var/cache/conftool/dbconfig/20240701-081811-jynus.json
08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65561 and previous config saved to /var/cache/conftool/dbconfig/20240701-081514-root.json
08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65560 and previous config saved to /var/cache/conftool/dbconfig/20240701-081307-root.json
08:07 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
07:44 elukey: `apt-get clean` on build2001 to free some space in the root partition
07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1195 in s1 T368871', diff saved to https://phabricator.wikimedia.org/P65559 and previous config saved to /var/cache/conftool/dbconfig/20240701-070243-marostegui.json
06:36 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T368871', diff saved to https://phabricator.wikimedia.org/P65558 and previous config saved to /var/cache/conftool/dbconfig/20240701-063601-root.json
06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65557 and previous config saved to /var/cache/conftool/dbconfig/20240701-063344-marostegui.json
06:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
06:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
04:56 marostegui: Failover m2 from db1195 to db1228 - T368494
04:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
04:50 marostegui: dbmaint eqiad Rebuild pagelinks table on s8 master T364069
04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65556 and previous config saved to /var/cache/conftool/dbconfig/20240701-044945-marostegui.json
04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
