Server Admin Log

From Wikitech

2024-07-09

  • 10:40 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P66043 and previous config saved to /var/cache/conftool/dbconfig/20240709-104054-root.json
  • 10:37 Dreamy_Jazz: Finished running maintenance scripts for T366781
  • 10:34 marostegui@cumin1002: dbctl commit (dc=all): 'db2212 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66042 and previous config saved to /var/cache/conftool/dbconfig/20240709-103409-root.json
  • 10:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2212 T369515', diff saved to https://phabricator.wikimedia.org/P66041 and previous config saved to /var/cache/conftool/dbconfig/20240709-103331-root.json
  • 10:32 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2203 to s1 primary T369515', diff saved to https://phabricator.wikimedia.org/P66040 and previous config saved to /var/cache/conftool/dbconfig/20240709-103238-root.json
  • 10:32 marostegui: Starting s1 codfw failover from db2212 to db2203 - T369515
  • 10:29 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 db1198 db1199 T365995', diff saved to https://phabricator.wikimedia.org/P66039 and previous config saved to /var/cache/conftool/dbconfig/20240709-102947-root.json
  • 10:25 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P66038 and previous config saved to /var/cache/conftool/dbconfig/20240709-102549-root.json
  • 10:10 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P66037 and previous config saved to /var/cache/conftool/dbconfig/20240709-101043-root.json
  • 10:04 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 10:03 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 09:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
  • 09:57 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2203 with weight 0 T369515', diff saved to https://phabricator.wikimedia.org/P66036 and previous config saved to /var/cache/conftool/dbconfig/20240709-095659-root.json
  • 09:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 36 hosts with reason: Primary switchover s1 T369515
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'db1162 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P66035 and previous config saved to /var/cache/conftool/dbconfig/20240709-095538-root.json
  • 09:26 cparle@deploy1002: Finished deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided) (duration: 00m 32s)
  • 09:26 cparle@deploy1002: Started deploy [airflow-dags/platform_eng@0e9b3ac]: (no justification provided)
  • 09:06 vgutierrez: restart purged @ cp3073
  • 08:28 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 08:28 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 08:28 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 08:27 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 08:17 aklapper@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.13 refs T366958
  • 08:03 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 08:01 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 08:01 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 07:59 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 07:58 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 07:57 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts netbox-dev2002.codfw.wmnet
  • 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 07:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 07:40 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: netbox-dev2002.codfw.wmnet decommissioned, removing all IPs except the asset tag one - ayounsi@cumin1002"
  • 07:40 Dreamy_Jazz: Morning UTC backport window done
  • 07:38 vgutierrez: repool cp3073
  • 07:35 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 07:32 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on P{cp3073.*} and A:cp
  • 07:32 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3073.esams.wmnet
  • 07:30 dreamyjazz@deploy1002: Synchronized wmf-config/throttle.php: Deploying throttle change for T369522 (duration: 09m 50s)
  • 07:26 ayounsi@cumin1002: START - Cookbook sre.hosts.decommission for hosts netbox-dev2002.codfw.wmnet
  • 07:25 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
  • 07:12 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on P{cp3073.*} and A:cp
  • 07:10 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
  • 07:08 fabfur@cumin1002: END (ERROR) - Cookbook sre.cdn.roll-reboot (exit_code=97) rolling reboot on P{cp3073.*} and A:cp
  • 07:08 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on P{cp3073.*} and A:cp
  • 06:54 Dreamy_Jazz: Start `foreachwikiindblist group2.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
  • 05:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 05:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 05:20 marostegui: Deploy schema change on s2 eqiad db1162 dbmaint T367856
  • 05:20 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
  • 05:20 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1162.eqiad.wmnet with reason: Long schema change
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1162 T369339', diff saved to https://phabricator.wikimedia.org/P66034 and previous config saved to /var/cache/conftool/dbconfig/20240709-051911-marostegui.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1222 to s2 primary and set section read-write T369339', diff saved to https://phabricator.wikimedia.org/P66033 and previous config saved to /var/cache/conftool/dbconfig/20240709-051814-marostegui.json
  • 05:17 marostegui@cumin1002: dbctl commit (dc=all): 'Set s2 eqiad as read-only for maintenance - T369339', diff saved to https://phabricator.wikimedia.org/P66032 and previous config saved to /var/cache/conftool/dbconfig/20240709-051749-marostegui.json
  • 05:17 marostegui: Starting s2 eqiad failover from db1162 to db1222 - T369339
  • 04:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1222 with weight 0 T369339', diff saved to https://phabricator.wikimedia.org/P66031 and previous config saved to /var/cache/conftool/dbconfig/20240709-045814-marostegui.json
  • 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369339
  • 04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2156 (T367856)', diff saved to https://phabricator.wikimedia.org/P66030 and previous config saved to /var/cache/conftool/dbconfig/20240709-044128-marostegui.json
  • 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2156.codfw.wmnet with reason: Maintenance
  • 04:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66029 and previous config saved to /var/cache/conftool/dbconfig/20240709-044051-marostegui.json
  • 04:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66028 and previous config saved to /var/cache/conftool/dbconfig/20240709-042544-marostegui.json
  • 04:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149', diff saved to https://phabricator.wikimedia.org/P66027 and previous config saved to /var/cache/conftool/dbconfig/20240709-041036-marostegui.json
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.10 (duration: 00m 57s)
  • 03:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P66026 and previous config saved to /var/cache/conftool/dbconfig/20240709-035529-marostegui.json
  • 03:53 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.13 refs T366958 (duration: 50m 52s)
  • 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.13 refs T366958
  • 01:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66025 and previous config saved to /var/cache/conftool/dbconfig/20240709-014242-arnaudb.json
  • 01:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66024 and previous config saved to /var/cache/conftool/dbconfig/20240709-012735-arnaudb.json
  • 01:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217', diff saved to https://phabricator.wikimedia.org/P66023 and previous config saved to /var/cache/conftool/dbconfig/20240709-011227-arnaudb.json
  • 00:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66022 and previous config saved to /var/cache/conftool/dbconfig/20240709-005720-arnaudb.json
  • 00:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2217 (T367781)', diff saved to https://phabricator.wikimedia.org/P66021 and previous config saved to /var/cache/conftool/dbconfig/20240709-005456-arnaudb.json
  • 00:54 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 00:54 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2217.codfw.wmnet with reason: Maintenance
  • 00:49 pt1979@cumin2002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host sretest2001.codfw.wmnet
  • 00:14 pt1979@cumin2002: START - Cookbook sre.hosts.dhcp for host sretest2001.codfw.wmnet
  • 00:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 00:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2197.codfw.wmnet with reason: Maintenance
  • 00:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66020 and previous config saved to /var/cache/conftool/dbconfig/20240709-001324-arnaudb.json
  • 00:13 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 00:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1245.eqiad.wmnet with reason: Maintenance
  • 00:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66019 and previous config saved to /var/cache/conftool/dbconfig/20240709-001250-marostegui.json
  • 00:05 ejegg: payments-wiki upgraded from 82a5e588 to dc0c14d4

2024-07-08

  • 23:58 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66018 and previous config saved to /var/cache/conftool/dbconfig/20240708-235817-arnaudb.json
  • 23:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66017 and previous config saved to /var/cache/conftool/dbconfig/20240708-235742-marostegui.json
  • 23:52 fabfur@cumin1002: END (FAIL) - Cookbook sre.cdn.roll-reboot (exit_code=1) rolling reboot on A:cp-text_esams
  • 23:43 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193', diff saved to https://phabricator.wikimedia.org/P66016 and previous config saved to /var/cache/conftool/dbconfig/20240708-234310-arnaudb.json
  • 23:42 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230', diff saved to https://phabricator.wikimedia.org/P66015 and previous config saved to /var/cache/conftool/dbconfig/20240708-234235-marostegui.json
  • 23:28 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66014 and previous config saved to /var/cache/conftool/dbconfig/20240708-232803-arnaudb.json
  • 23:27 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P66013 and previous config saved to /var/cache/conftool/dbconfig/20240708-232728-marostegui.json
  • 23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2193 (T367781)', diff saved to https://phabricator.wikimedia.org/P66012 and previous config saved to /var/cache/conftool/dbconfig/20240708-232549-arnaudb.json
  • 23:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 23:25 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2193.codfw.wmnet with reason: Maintenance
  • 23:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66011 and previous config saved to /var/cache/conftool/dbconfig/20240708-232527-arnaudb.json
  • 23:10 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66010 and previous config saved to /var/cache/conftool/dbconfig/20240708-231020-arnaudb.json
  • 22:55 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180', diff saved to https://phabricator.wikimedia.org/P66009 and previous config saved to /var/cache/conftool/dbconfig/20240708-225513-arnaudb.json
  • 22:46 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 22:42 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_esams
  • 22:42 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3081.esams.wmnet
  • 22:40 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66008 and previous config saved to /var/cache/conftool/dbconfig/20240708-224006-arnaudb.json
  • 22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2180 (T367781)', diff saved to https://phabricator.wikimedia.org/P66007 and previous config saved to /var/cache/conftool/dbconfig/20240708-223752-arnaudb.json
  • 22:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:37 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2180.codfw.wmnet with reason: Maintenance
  • 22:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66006 and previous config saved to /var/cache/conftool/dbconfig/20240708-223741-arnaudb.json
  • 22:26 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 22:22 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66005 and previous config saved to /var/cache/conftool/dbconfig/20240708-222234-arnaudb.json
  • 22:07 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169', diff saved to https://phabricator.wikimedia.org/P66004 and previous config saved to /var/cache/conftool/dbconfig/20240708-220727-arnaudb.json
  • 21:52 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66003 and previous config saved to /var/cache/conftool/dbconfig/20240708-215220-arnaudb.json
  • 21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2169 (T367781)', diff saved to https://phabricator.wikimedia.org/P66002 and previous config saved to /var/cache/conftool/dbconfig/20240708-214954-arnaudb.json
  • 21:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:49 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2169.codfw.wmnet with reason: Maintenance
  • 21:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P66001 and previous config saved to /var/cache/conftool/dbconfig/20240708-214932-arnaudb.json
  • 21:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P66000 and previous config saved to /var/cache/conftool/dbconfig/20240708-213425-arnaudb.json
  • 21:24 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 21:23 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 21:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158', diff saved to https://phabricator.wikimedia.org/P65999 and previous config saved to /var/cache/conftool/dbconfig/20240708-211918-arnaudb.json
  • 21:16 catrope@deploy1002: Finished scap: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) (duration: 09m 23s)
  • 21:10 catrope@deploy1002: catrope, nmw03: Continuing with sync
  • 21:09 catrope@deploy1002: catrope, nmw03: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:06 catrope@deploy1002: Started scap sync-world: Backport for Enable VisualEditor by default on Italian Wikibooks (T369342)
  • 21:05 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
  • 21:05 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic109[3-5]* for T348977 - bking@cumin2002
  • 21:05 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
  • 21:05 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1093-1095].eqiad.wmnet with reason: T348977
  • 21:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65998 and previous config saved to /var/cache/conftool/dbconfig/20240708-210410-arnaudb.json
  • 21:02 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1023.eqiad.wmnet
  • 21:02 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3080.esams.wmnet
  • 21:01 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3072.esams.wmnet
  • 21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2158 (T367781)', diff saved to https://phabricator.wikimedia.org/P65997 and previous config saved to /var/cache/conftool/dbconfig/20240708-210144-arnaudb.json
  • 21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db2187.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2158.codfw.wmnet with reason: Maintenance
  • 21:01 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65996 and previous config saved to /var/cache/conftool/dbconfig/20240708-210106-arnaudb.json
  • 20:55 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1023.eqiad.wmnet
  • 20:52 catrope@deploy1002: Finished scap: Backport for Graph extension: Add tracking for data sources used in <graph> tags (duration: 13m 00s)
  • 20:50 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host wdqs1022.eqiad.wmnet
  • 20:47 catrope@deploy1002: catrope: Continuing with sync
  • 20:46 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65995 and previous config saved to /var/cache/conftool/dbconfig/20240708-204559-arnaudb.json
  • 20:43 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host wdqs1022.eqiad.wmnet
  • 20:42 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 20:42 catrope@deploy1002: catrope: Backport for Graph extension: Add tracking for data sources used in <graph> tags synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1230 (T367856)', diff saved to https://phabricator.wikimedia.org/P65994 and previous config saved to /var/cache/conftool/dbconfig/20240708-204042-marostegui.json
  • 20:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 20:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1230.eqiad.wmnet with reason: Maintenance
  • 20:39 catrope@deploy1002: Started scap sync-world: Backport for Graph extension: Add tracking for data sources used in <graph> tags
  • 20:38 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 20:35 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 20:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151', diff saved to https://phabricator.wikimedia.org/P65993 and previous config saved to /var/cache/conftool/dbconfig/20240708-203052-arnaudb.json
  • 20:28 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 20:27 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 20:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65992 and previous config saved to /var/cache/conftool/dbconfig/20240708-201545-arnaudb.json
  • 20:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2151 (T367781)', diff saved to https://phabricator.wikimedia.org/P65991 and previous config saved to /var/cache/conftool/dbconfig/20240708-201318-arnaudb.json
  • 20:13 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 20:13 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2151.codfw.wmnet with reason: Maintenance
  • 20:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65990 and previous config saved to /var/cache/conftool/dbconfig/20240708-201256-arnaudb.json
  • 20:08 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 19:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65989 and previous config saved to /var/cache/conftool/dbconfig/20240708-195749-arnaudb.json
  • 19:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2149 (T367856)', diff saved to https://phabricator.wikimedia.org/P65988 and previous config saved to /var/cache/conftool/dbconfig/20240708-194435-marostegui.json
  • 19:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 19:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2149.codfw.wmnet with reason: Maintenance
  • 19:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129', diff saved to https://phabricator.wikimedia.org/P65987 and previous config saved to /var/cache/conftool/dbconfig/20240708-194242-arnaudb.json
  • 19:39 bking@cumin2002: END (PASS) - Cookbook sre.wdqs.reboot (exit_code=0)
  • 19:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65986 and previous config saved to /var/cache/conftool/dbconfig/20240708-192735-arnaudb.json
  • 19:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2129 (T367781)', diff saved to https://phabricator.wikimedia.org/P65985 and previous config saved to /var/cache/conftool/dbconfig/20240708-192508-arnaudb.json
  • 19:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2129.codfw.wmnet with reason: Maintenance
  • 19:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65984 and previous config saved to /var/cache/conftool/dbconfig/20240708-192444-arnaudb.json
  • 19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3079.esams.wmnet
  • 19:21 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3071.esams.wmnet
  • 19:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 19:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 19:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65983 and previous config saved to /var/cache/conftool/dbconfig/20240708-190937-arnaudb.json
  • 19:02 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 18:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124', diff saved to https://phabricator.wikimedia.org/P65982 and previous config saved to /var/cache/conftool/dbconfig/20240708-185430-arnaudb.json
  • 18:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65981 and previous config saved to /var/cache/conftool/dbconfig/20240708-183923-arnaudb.json
  • 18:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db2124 (T367781)', diff saved to https://phabricator.wikimedia.org/P65980 and previous config saved to /var/cache/conftool/dbconfig/20240708-183658-arnaudb.json
  • 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2124.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db2114.codfw.wmnet with reason: Maintenance
  • 18:36 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:35 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on dbstore1009.eqiad.wmnet with reason: Maintenance
  • 18:35 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65979 and previous config saved to /var/cache/conftool/dbconfig/20240708-183548-arnaudb.json
  • 18:20 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65978 and previous config saved to /var/cache/conftool/dbconfig/20240708-182041-arnaudb.json
  • 18:06 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader2002.codfw.wmnet
  • 18:05 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231', diff saved to https://phabricator.wikimedia.org/P65977 and previous config saved to /var/cache/conftool/dbconfig/20240708-180533-arnaudb.json
  • 18:02 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader2002.codfw.wmnet
  • 17:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65976 and previous config saved to /var/cache/conftool/dbconfig/20240708-175026-arnaudb.json
  • 17:49 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1231 (T367781)', diff saved to https://phabricator.wikimedia.org/P65975 and previous config saved to /var/cache/conftool/dbconfig/20240708-174918-arnaudb.json
  • 17:49 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 17:48 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65974 and previous config saved to /var/cache/conftool/dbconfig/20240708-174823-arnaudb.json
  • 17:40 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3078.esams.wmnet
  • 17:38 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3070.esams.wmnet
  • 17:33 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65973 and previous config saved to /var/cache/conftool/dbconfig/20240708-173316-arnaudb.json
  • 17:18 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224', diff saved to https://phabricator.wikimedia.org/P65972 and previous config saved to /var/cache/conftool/dbconfig/20240708-171810-arnaudb.json
  • 17:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65971 and previous config saved to /var/cache/conftool/dbconfig/20240708-170302-arnaudb.json
  • 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1224 (T367781)', diff saved to https://phabricator.wikimedia.org/P65970 and previous config saved to /var/cache/conftool/dbconfig/20240708-170053-arnaudb.json
  • 17:00 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:00 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1224.eqiad.wmnet with reason: Maintenance
  • 17:00 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65969 and previous config saved to /var/cache/conftool/dbconfig/20240708-170031-arnaudb.json
  • 16:45 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65968 and previous config saved to /var/cache/conftool/dbconfig/20240708-164524-arnaudb.json
  • 16:39 ladsgroup@deploy1002: Finished scap: Backport for Reduce frequency of two query pages in commonswiki (T369024) (duration: 07m 50s)
  • 16:34 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 16:33 ladsgroup@deploy1002: ladsgroup: Backport for Reduce frequency of two query pages in commonswiki (T369024) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 16:31 ladsgroup@deploy1002: Started scap sync-world: Backport for Reduce frequency of two query pages in commonswiki (T369024)
  • 16:30 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201', diff saved to https://phabricator.wikimedia.org/P65967 and previous config saved to /var/cache/conftool/dbconfig/20240708-163017-arnaudb.json
  • 16:15 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
  • 16:15 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65966 and previous config saved to /var/cache/conftool/dbconfig/20240708-161510-arnaudb.json
  • 16:13 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1201 (T367781)', diff saved to https://phabricator.wikimedia.org/P65965 and previous config saved to /var/cache/conftool/dbconfig/20240708-161302-arnaudb.json
  • 16:12 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:12 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1201.eqiad.wmnet with reason: Maintenance
  • 16:12 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65964 and previous config saved to /var/cache/conftool/dbconfig/20240708-161238-arnaudb.json
  • 16:09 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
  • 16:08 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1011.eqiad.wmnet with OS bullseye
  • 15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3077.esams.wmnet
  • 15:57 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3069.esams.wmnet
  • 15:57 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65963 and previous config saved to /var/cache/conftool/dbconfig/20240708-155731-arnaudb.json
  • 15:51 jdrewniak@deploy1002: Synchronized portals: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 06m 28s)
  • 15:47 pfischer@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:46 pfischer@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:45 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 15:44 jdrewniak@deploy1002: Synchronized portals/wikipedia.org/assets: Wikimedia Portals Update: Bumping portals to master (T128546) (duration: 07m 54s)
  • 15:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 15:44 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:42 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187', diff saved to https://phabricator.wikimedia.org/P65962 and previous config saved to /var/cache/conftool/dbconfig/20240708-154224-arnaudb.json
  • 15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 15:38 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 15:27 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65961 and previous config saved to /var/cache/conftool/dbconfig/20240708-152717-arnaudb.json
  • 15:25 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1187 (T367781)', diff saved to https://phabricator.wikimedia.org/P65960 and previous config saved to /var/cache/conftool/dbconfig/20240708-152508-arnaudb.json
  • 15:25 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:24 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1187.eqiad.wmnet with reason: Maintenance
  • 15:24 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65959 and previous config saved to /var/cache/conftool/dbconfig/20240708-152446-arnaudb.json
  • 15:22 ladsgroup@cumin1002: dbctl commit (dc=all): 'Bumping db1227 weight (T366852)', diff saved to https://phabricator.wikimedia.org/P65958 and previous config saved to /var/cache/conftool/dbconfig/20240708-152222-ladsgroup.json
  • 15:16 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
  • 15:13 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1011.eqiad.wmnet with reason: host reimage
  • 15:09 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65957 and previous config saved to /var/cache/conftool/dbconfig/20240708-150939-arnaudb.json
  • 14:59 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1011.eqiad.wmnet with OS bullseye
  • 14:57 bking@cumin2002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host search-loader1002.eqiad.wmnet
  • 14:54 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180', diff saved to https://phabricator.wikimedia.org/P65956 and previous config saved to /var/cache/conftool/dbconfig/20240708-145432-arnaudb.json
  • 14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
  • 14:53 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
  • 14:53 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
  • 14:52 bking@cumin2002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host search-loader1002.eqiad.wmnet
  • 14:51 bking@cumin2002: START - Cookbook sre.hosts.reboot-single for host search-loader1002.eqiad.wmnet
  • 14:51 claime: cleaning up old shellbox files on mw1438
  • 14:43 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
  • 14:43 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1011.eqiad.wmnet
  • 14:39 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65955 and previous config saved to /var/cache/conftool/dbconfig/20240708-143925-arnaudb.json
  • 14:37 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1180 (T367781)', diff saved to https://phabricator.wikimedia.org/P65954 and previous config saved to /var/cache/conftool/dbconfig/20240708-143716-arnaudb.json
  • 14:37 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1180.eqiad.wmnet with reason: Maintenance
  • 14:36 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65953 and previous config saved to /var/cache/conftool/dbconfig/20240708-143654-arnaudb.json
  • 14:34 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1011.eqiad.wmnet
  • 14:31 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1011.eqiad.wmnet
  • 14:27 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:27 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:23 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mobileapps: apply
  • 14:22 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/mobileapps: apply
  • 14:22 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/mobileapps: apply
  • 14:21 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/mobileapps: apply
  • 14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:21 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65952 and previous config saved to /var/cache/conftool/dbconfig/20240708-142147-arnaudb.json
  • 14:21 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:21 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:20 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:20 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:18 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:17 bking@cumin2002: END (FAIL) - Cookbook sre.wdqs.reboot (exit_code=99)
  • 14:17 bking@cumin2002: START - Cookbook sre.wdqs.reboot
  • 14:17 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3068.esams.wmnet
  • 14:16 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3076.esams.wmnet
  • 14:14 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:14 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1216.eqiad.wmnet with reason: Maintenance
  • 14:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65951 and previous config saved to /var/cache/conftool/dbconfig/20240708-141432-marostegui.json
  • 14:13 claime: cleaning up old shellbox files on mw1446
  • 14:06 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168', diff saved to https://phabricator.wikimedia.org/P65950 and previous config saved to /var/cache/conftool/dbconfig/20240708-140640-arnaudb.json
  • 13:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65949 and previous config saved to /var/cache/conftool/dbconfig/20240708-135925-marostegui.json
  • 13:58 urbanecm@deploy1002: Finished scap: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 (duration: 10m 36s)
  • 13:53 urbanecm@deploy1002: phuedx, urbanecm: Continuing with sync
  • 13:51 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65948 and previous config saved to /var/cache/conftool/dbconfig/20240708-135132-arnaudb.json
  • 13:50 urbanecm@deploy1002: phuedx, urbanecm: Backport for lib: Update metrics-platform to 84ed8dcbe7c9 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1168 (T367781)', diff saved to https://phabricator.wikimedia.org/P65947 and previous config saved to /var/cache/conftool/dbconfig/20240708-135024-arnaudb.json
  • 13:50 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:50 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1168.eqiad.wmnet with reason: Maintenance
  • 13:50 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65946 and previous config saved to /var/cache/conftool/dbconfig/20240708-135002-arnaudb.json
  • 13:48 urbanecm@deploy1002: Started scap sync-world: Backport for lib: Update metrics-platform to 84ed8dcbe7c9
  • 13:47 urbanecm@deploy1002: Finished scap: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) (duration: 30m 38s)
  • 13:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213', diff saved to https://phabricator.wikimedia.org/P65945 and previous config saved to /var/cache/conftool/dbconfig/20240708-134418-marostegui.json
  • 13:42 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.rolling-operation (exit_code=0) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:39 urbanecm@deploy1002: tchin, jforrester, urbanecm: Continuing with sync
  • 13:34 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65944 and previous config saved to /var/cache/conftool/dbconfig/20240708-133456-arnaudb.json
  • 13:32 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:32 bking@cumin2002: END (FAIL) - Cookbook sre.elasticsearch.rolling-operation (exit_code=99) Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:32 urbanecm@deploy1002: tchin, jforrester, urbanecm: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:31 bking@cumin2002: START - Cookbook sre.elasticsearch.rolling-operation Operation.REBOOT (1 nodes at a time) for ElasticSearch cluster relforge: security update - bking@cumin2002 - T366555
  • 13:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65943 and previous config saved to /var/cache/conftool/dbconfig/20240708-132911-marostegui.json
  • 13:19 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165', diff saved to https://phabricator.wikimedia.org/P65942 and previous config saved to /var/cache/conftool/dbconfig/20240708-131948-arnaudb.json
  • 13:17 urbanecm@deploy1002: Started scap sync-world: Backport for EventStreamConfig: Add hive ingestion defaults (T367134), [wikifunctionswiki] Disable MobileFrontend in production (T349408)
  • 13:04 arnaudb@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65941 and previous config saved to /var/cache/conftool/dbconfig/20240708-130441-arnaudb.json
  • 13:03 arnaudb@cumin1002: dbctl commit (dc=all): 'Depooling db1165 (T367781)', diff saved to https://phabricator.wikimedia.org/P65940 and previous config saved to /var/cache/conftool/dbconfig/20240708-130333-arnaudb.json
  • 13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:03 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1015,1019,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 13:03 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 13:02 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 4:00:00 on db1165.eqiad.wmnet with reason: Maintenance
  • 12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:51 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host an-mariadb1002.eqiad.wmnet with OS bookworm
  • 12:51 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:48 vgutierrez: test bwlimit per url on cp4051 - T317799
  • 12:43 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65939 and previous config saved to /var/cache/conftool/dbconfig/20240708-124310-marostegui.json
  • 12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3067.esams.wmnet
  • 12:36 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3075.esams.wmnet
  • 12:35 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
  • 12:32 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on an-mariadb1002.eqiad.wmnet with reason: host reimage
  • 12:27 btullis@deploy1002: Finished deploy [airflow-dags/analytics@a2faba7]: (no justification provided) (duration: 00m 27s)
  • 12:27 btullis@deploy1002: Started deploy [airflow-dags/analytics@a2faba7]: (no justification provided)
  • 12:19 btullis@cumin1002: START - Cookbook sre.hosts.reimage for host an-mariadb1002.eqiad.wmnet with OS bookworm
  • 11:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65938 and previous config saved to /var/cache/conftool/dbconfig/20240708-115422-root.json
  • 11:47 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 262476
  • 11:47 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 262476
  • 11:39 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65937 and previous config saved to /var/cache/conftool/dbconfig/20240708-113917-root.json
  • 11:34 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 11:34 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2139.codfw.wmnet with reason: Maintenance
  • 11:27 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 11:26 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 11:26 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 11:25 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 11:25 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:25 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 11:24 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 11:24 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65936 and previous config saved to /var/cache/conftool/dbconfig/20240708-112411-root.json
  • 11:09 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65935 and previous config saved to /var/cache/conftool/dbconfig/20240708-110905-root.json
  • 10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3066.esams.wmnet
  • 10:55 fabfur@cumin1002: cookbooks.sre.cdn.roll-reboot finished rebooting cp3074.esams.wmnet
  • 10:54 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65934 and previous config saved to /var/cache/conftool/dbconfig/20240708-105400-root.json
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1213 (T367856)', diff saved to https://phabricator.wikimedia.org/P65933 and previous config saved to /var/cache/conftool/dbconfig/20240708-105348-marostegui.json
  • 10:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1213.eqiad.wmnet with reason: Maintenance
  • 10:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65932 and previous config saved to /var/cache/conftool/dbconfig/20240708-105325-marostegui.json
  • 10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_esams
  • 10:45 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_esams
  • 10:45 fabfur: rebooting A:cp-esams (T366555)
  • 10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270359
  • 10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270359
  • 10:43 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
  • 10:43 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
  • 10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 262476
  • 10:42 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 262476
  • 10:42 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 272432
  • 10:41 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 272432
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65931 and previous config saved to /var/cache/conftool/dbconfig/20240708-103854-root.json
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65930 and previous config saved to /var/cache/conftool/dbconfig/20240708-103818-marostegui.json
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'db2213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65929 and previous config saved to /var/cache/conftool/dbconfig/20240708-102347-root.json
  • 10:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210', diff saved to https://phabricator.wikimedia.org/P65928 and previous config saved to /var/cache/conftool/dbconfig/20240708-102311-marostegui.json
  • 10:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65927 and previous config saved to /var/cache/conftool/dbconfig/20240708-100804-marostegui.json
  • 10:06 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 10:02 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:00 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:58 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 09:55 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 09:50 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rest-gateway: sync
  • 09:50 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/rest-gateway: sync
  • 09:49 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/api-gateway: sync
  • 09:49 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/api-gateway: sync
  • 09:44 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/rest-gateway: sync
  • 09:44 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/rest-gateway: sync
  • 09:41 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/api-gateway: sync
  • 09:41 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/api-gateway: sync
  • 09:38 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: sync
  • 09:38 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: sync
  • 09:32 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: sync
  • 09:32 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: sync
  • 09:31 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifeeds: sync
  • 09:31 elukey@deploy1002: helmfile [staging] START helmfile.d/services/wikifeeds: sync
  • 09:17 arturo: aborrero@apt1002:~$ sudo -i reprepro --component thirdparty/k9s includedeb bookworm-wikimedia /home/aborrero/k9s_linux_amd64.deb (T366061)
  • 08:59 mvernon@cumin2002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-codfw
  • 08:56 mvernon@cumin2002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-codfw
  • 08:51 Dreamy_Jazz: Running `foreachwikiindblist group1.dblist extensions/CheckUser/maintenance/deleteReadOldRowsInCuChanges.php --batch-size=200` in a tmux session
  • 08:50 Dreamy_Jazz: Restarting MediaModeration scanning script - https://wikitech.wikimedia.org/wiki/MediaModeration
  • 08:42 arturo: update packages for thirdparty/kubeadm-k8s-1-25 bookworm-wikimedia in apt1002 (T369163)
  • 08:26 godog: re-enable business hours americas oncall - T369122
  • 07:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'configure' for AS: 270052
  • 07:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'configure' for AS: 270052
  • 06:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52455
  • 06:16 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52455
  • 06:15 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 137409
  • 06:14 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 137409
  • 06:13 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 27768
  • 06:13 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 27768
  • 06:11 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61512
  • 06:09 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61512
  • 06:08 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 269783
  • 06:08 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 269783
  • 06:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52320
  • 06:05 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52320
  • 06:05 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 7738
  • 06:04 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 7738
  • 06:04 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 52468
  • 06:03 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 52468
  • 06:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 270052
  • 06:01 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 270052
  • 05:59 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28008
  • 05:59 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28008
  • 05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 17072
  • 05:39 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 17072
  • 05:39 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 263522
  • 05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 263522
  • 05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61942
  • 05:38 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61942
  • 05:38 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 18013
  • 05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 18013
  • 05:37 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 268248
  • 05:37 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 268248
  • 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 61672
  • 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 61672
  • 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28352
  • 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28352
  • 05:36 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 999
  • 05:36 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 999
  • 05:35 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 4788
  • 05:34 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 4788
  • 05:34 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 132167
  • 05:33 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 132167
  • 05:33 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 6447
  • 05:32 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 6447
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1210 (T367856)', diff saved to https://phabricator.wikimedia.org/P65926 and previous config saved to /var/cache/conftool/dbconfig/20240708-053133-marostegui.json
  • 05:31 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 05:31 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1210.eqiad.wmnet with reason: Maintenance
  • 05:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65925 and previous config saved to /var/cache/conftool/dbconfig/20240708-053122-marostegui.json
  • 05:30 ayounsi@cumin1002: END (PASS) - Cookbook sre.network.peering (exit_code=0) with action 'email' for AS: 28306
  • 05:29 ayounsi@cumin1002: START - Cookbook sre.network.peering with action 'email' for AS: 28306
  • 05:24 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
  • 05:24 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2213.codfw.wmnet with reason: Long schema change
  • 05:24 marostegui: Deploy schema change on s5 codfw db2213 dbmaint T367856
  • 05:19 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2213 T369478', diff saved to https://phabricator.wikimedia.org/P65923 and previous config saved to /var/cache/conftool/dbconfig/20240708-051935-root.json
  • 05:18 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2123 to s5 primary T369478', diff saved to https://phabricator.wikimedia.org/P65922 and previous config saved to /var/cache/conftool/dbconfig/20240708-051840-root.json
  • 05:18 marostegui: Starting s5 codfw failover from db2213 to db2123 - T369478
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65921 and previous config saved to /var/cache/conftool/dbconfig/20240708-051615-marostegui.json
  • 05:16 marostegui@cumin1002: dbctl commit (dc=all): 'Remove db2123 from dump/slow', diff saved to https://phabricator.wikimedia.org/P65920 and previous config saved to /var/cache/conftool/dbconfig/20240708-051605-marostegui.json
  • 05:03 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2123 with weight 0 T369478', diff saved to https://phabricator.wikimedia.org/P65919 and previous config saved to /var/cache/conftool/dbconfig/20240708-050301-root.json
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 25 hosts with reason: Primary switchover s5 T369478
  • 04:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200', diff saved to https://phabricator.wikimedia.org/P65918 and previous config saved to /var/cache/conftool/dbconfig/20240708-045246-marostegui.json
  • 04:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65917 and previous config saved to /var/cache/conftool/dbconfig/20240708-043738-marostegui.json
  • 01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1200 (T367856)', diff saved to https://phabricator.wikimedia.org/P65916 and previous config saved to /var/cache/conftool/dbconfig/20240708-014044-marostegui.json
  • 01:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1200.eqiad.wmnet with reason: Maintenance
  • 01:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65915 and previous config saved to /var/cache/conftool/dbconfig/20240708-014022-marostegui.json
  • 01:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65914 and previous config saved to /var/cache/conftool/dbconfig/20240708-012515-marostegui.json
  • 01:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185', diff saved to https://phabricator.wikimedia.org/P65913 and previous config saved to /var/cache/conftool/dbconfig/20240708-011008-marostegui.json
  • 00:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65912 and previous config saved to /var/cache/conftool/dbconfig/20240708-005501-marostegui.json

2024-07-07

  • 21:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1185 (T367856)', diff saved to https://phabricator.wikimedia.org/P65911 and previous config saved to /var/cache/conftool/dbconfig/20240707-215014-marostegui.json
  • 21:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1185.eqiad.wmnet with reason: Maintenance
  • 21:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65910 and previous config saved to /var/cache/conftool/dbconfig/20240707-214952-marostegui.json
  • 21:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65909 and previous config saved to /var/cache/conftool/dbconfig/20240707-213445-marostegui.json
  • 21:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161', diff saved to https://phabricator.wikimedia.org/P65908 and previous config saved to /var/cache/conftool/dbconfig/20240707-211938-marostegui.json
  • 21:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65907 and previous config saved to /var/cache/conftool/dbconfig/20240707-210430-marostegui.json
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1161 (T367856)', diff saved to https://phabricator.wikimedia.org/P65906 and previous config saved to /var/cache/conftool/dbconfig/20240707-154059-marostegui.json
  • 15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1016,1020-1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance
  • 15:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1161.eqiad.wmnet with reason: Maintenance

2024-07-06

  • 18:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65905 and previous config saved to /var/cache/conftool/dbconfig/20240706-182625-marostegui.json
  • 18:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65904 and previous config saved to /var/cache/conftool/dbconfig/20240706-181117-marostegui.json
  • 17:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211', diff saved to https://phabricator.wikimedia.org/P65903 and previous config saved to /var/cache/conftool/dbconfig/20240706-175610-marostegui.json
  • 17:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65902 and previous config saved to /var/cache/conftool/dbconfig/20240706-174103-marostegui.json
  • 17:21 hnowlan@cumin1002: END (PASS) - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies (exit_code=0) rolling restart_daemons on A:swift-fe-eqiad
  • 17:18 hnowlan@cumin1002: START - Cookbook sre.swift.roll-restart-reboot-swift-ms-proxies rolling restart_daemons on A:swift-fe-eqiad
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2211 (T367856)', diff saved to https://phabricator.wikimedia.org/P65901 and previous config saved to /var/cache/conftool/dbconfig/20240706-124535-marostegui.json
  • 12:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 12:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2211.codfw.wmnet with reason: Maintenance
  • 07:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2201.codfw.wmnet with reason: Maintenance
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65900 and previous config saved to /var/cache/conftool/dbconfig/20240706-075448-marostegui.json
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65899 and previous config saved to /var/cache/conftool/dbconfig/20240706-073941-marostegui.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192', diff saved to https://phabricator.wikimedia.org/P65898 and previous config saved to /var/cache/conftool/dbconfig/20240706-072434-marostegui.json
  • 07:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65897 and previous config saved to /var/cache/conftool/dbconfig/20240706-070927-marostegui.json
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2192 (T367856)', diff saved to https://phabricator.wikimedia.org/P65896 and previous config saved to /var/cache/conftool/dbconfig/20240706-043535-marostegui.json
  • 04:35 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2192.codfw.wmnet with reason: Maintenance
  • 04:35 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65895 and previous config saved to /var/cache/conftool/dbconfig/20240706-043513-marostegui.json
  • 04:20 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65894 and previous config saved to /var/cache/conftool/dbconfig/20240706-042006-marostegui.json
  • 04:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178', diff saved to https://phabricator.wikimedia.org/P65893 and previous config saved to /var/cache/conftool/dbconfig/20240706-040459-marostegui.json
  • 03:49 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65892 and previous config saved to /var/cache/conftool/dbconfig/20240706-034952-marostegui.json
  • 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2178 (T367856)', diff saved to https://phabricator.wikimedia.org/P65891 and previous config saved to /var/cache/conftool/dbconfig/20240706-005648-marostegui.json
  • 00:56 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:56 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2178.codfw.wmnet with reason: Maintenance
  • 00:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65890 and previous config saved to /var/cache/conftool/dbconfig/20240706-005626-marostegui.json
  • 00:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65889 and previous config saved to /var/cache/conftool/dbconfig/20240706-004119-marostegui.json
  • 00:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171', diff saved to https://phabricator.wikimedia.org/P65888 and previous config saved to /var/cache/conftool/dbconfig/20240706-002612-marostegui.json
  • 00:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65887 and previous config saved to /var/cache/conftool/dbconfig/20240706-001105-marostegui.json

2024-07-05

  • 20:05 akosiaris@deploy1003: helmfile [staging] DONE helmfile.d/services/api-gateway: apply
  • 20:04 akosiaris@deploy1003: helmfile [staging] START helmfile.d/services/api-gateway: apply
  • 18:56 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2171 (T367856)', diff saved to https://phabricator.wikimedia.org/P65886 and previous config saved to /var/cache/conftool/dbconfig/20240705-185604-marostegui.json
  • 18:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2171.codfw.wmnet with reason: Maintenance
  • 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65885 and previous config saved to /var/cache/conftool/dbconfig/20240705-185542-marostegui.json
  • 18:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65884 and previous config saved to /var/cache/conftool/dbconfig/20240705-184034-marostegui.json
  • 18:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65883 and previous config saved to /var/cache/conftool/dbconfig/20240705-183428-root.json
  • 18:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157', diff saved to https://phabricator.wikimedia.org/P65882 and previous config saved to /var/cache/conftool/dbconfig/20240705-182527-marostegui.json
  • 18:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65881 and previous config saved to /var/cache/conftool/dbconfig/20240705-181923-root.json
  • 18:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65880 and previous config saved to /var/cache/conftool/dbconfig/20240705-181020-marostegui.json
  • 18:04 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65879 and previous config saved to /var/cache/conftool/dbconfig/20240705-180417-root.json
  • 17:56 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65878 and previous config saved to /var/cache/conftool/dbconfig/20240705-175653-ladsgroup.json
  • 17:49 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65877 and previous config saved to /var/cache/conftool/dbconfig/20240705-174912-root.json
  • 17:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65876 and previous config saved to /var/cache/conftool/dbconfig/20240705-174146-ladsgroup.json
  • 17:34 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65875 and previous config saved to /var/cache/conftool/dbconfig/20240705-173406-root.json
  • 17:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207', diff saved to https://phabricator.wikimedia.org/P65874 and previous config saved to /var/cache/conftool/dbconfig/20240705-172639-ladsgroup.json
  • 17:19 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65873 and previous config saved to /var/cache/conftool/dbconfig/20240705-171901-root.json
  • 17:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65872 and previous config saved to /var/cache/conftool/dbconfig/20240705-171131-ladsgroup.json
  • 17:03 marostegui@cumin1002: dbctl commit (dc=all): 'db1246 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65871 and previous config saved to /var/cache/conftool/dbconfig/20240705-170356-root.json
  • 17:00 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@73c6618]: (no justification provided) (duration: 00m 06s)
  • 17:00 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@73c6618]: (no justification provided)
  • 13:40 hashar@deploy1002: Finished deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484 (duration: 00m 06s)
  • 13:40 hashar@deploy1002: Started deploy [integration/docroot@18c8279]: Add AQS documentation to landing page - T368484
  • 12:57 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
  • 12:57 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 8:00:00 on db1246.eqiad.wmnet with reason: Long schema change
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2157 (T367856)', diff saved to https://phabricator.wikimedia.org/P65869 and previous config saved to /var/cache/conftool/dbconfig/20240705-125152-marostegui.json
  • 12:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2157.codfw.wmnet with reason: Maintenance
  • 12:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65868 and previous config saved to /var/cache/conftool/dbconfig/20240705-125130-marostegui.json
  • 12:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65867 and previous config saved to /var/cache/conftool/dbconfig/20240705-123623-marostegui.json
  • 12:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128', diff saved to https://phabricator.wikimedia.org/P65866 and previous config saved to /var/cache/conftool/dbconfig/20240705-122115-marostegui.json
  • 12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65865 and previous config saved to /var/cache/conftool/dbconfig/20240705-120608-marostegui.json
  • 11:57 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65864 and previous config saved to /var/cache/conftool/dbconfig/20240705-115703-ladsgroup.json
  • 11:53 dcausse: T369149: re-indexed wikidata P12861 (cirrus_rerender.rerender --wiki wikidatawiki allpages --namespace 120 --from-title P12861 --to-title P12861)
  • 11:41 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65863 and previous config saved to /var/cache/conftool/dbconfig/20240705-114157-ladsgroup.json
  • 11:30 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 11:29 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 4 days, 0:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 11:26 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65862 and previous config saved to /var/cache/conftool/dbconfig/20240705-112652-ladsgroup.json
  • 11:13 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db2207 (T352010)', diff saved to https://phabricator.wikimedia.org/P65861 and previous config saved to /var/cache/conftool/dbconfig/20240705-111322-ladsgroup.json
  • 11:13 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 11:13 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Maintenance
  • 11:11 ladsgroup@cumin1002: dbctl commit (dc=all): 'db1231 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65860 and previous config saved to /var/cache/conftool/dbconfig/20240705-111146-ladsgroup.json
  • 10:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 10:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1008.eqiad.wmnet with reason: Maintenance
  • 10:41 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) (duration: 21m 22s)
  • 10:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 10:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 10:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Define custom search-index-data-formatter-callback (T369149), Try looking up search index data formatters by data type (T369149)
  • 10:11 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 10:10 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:35 fabfur: running puppet on A:cp to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1052271 (T369345)
  • 09:26 XioNoX: netbox-dev2003: move from netbox-dev to netbox-next - T336275
  • 08:55 godog: silence NELNotReported NELByCountryNotReported until Tues - T369345
  • 08:54 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2128 (T367856)', diff saved to https://phabricator.wikimedia.org/P65858 and previous config saved to /var/cache/conftool/dbconfig/20240705-085406-marostegui.json
  • 08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2128.codfw.wmnet with reason: Maintenance
  • 08:53 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65857 and previous config saved to /var/cache/conftool/dbconfig/20240705-085329-marostegui.json
  • 08:38 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65856 and previous config saved to /var/cache/conftool/dbconfig/20240705-083821-marostegui.json
  • 08:23 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123', diff saved to https://phabricator.wikimedia.org/P65855 and previous config saved to /var/cache/conftool/dbconfig/20240705-082314-marostegui.json
  • 08:08 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65854 and previous config saved to /var/cache/conftool/dbconfig/20240705-080807-marostegui.json
  • 08:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 08:04 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1240.eqiad.wmnet with reason: Maintenance
  • 07:50 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:50 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:47 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 07:44 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 05:12 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 05:12 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 05:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65852 and previous config saved to /var/cache/conftool/dbconfig/20240705-051202-marostegui.json
  • 05:00 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2136', diff saved to https://phabricator.wikimedia.org/P65851 and previous config saved to /var/cache/conftool/dbconfig/20240705-050028-root.json
  • 04:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65850 and previous config saved to /var/cache/conftool/dbconfig/20240705-045655-marostegui.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2123 (T367856)', diff saved to https://phabricator.wikimedia.org/P65849 and previous config saved to /var/cache/conftool/dbconfig/20240705-045145-marostegui.json
  • 04:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2123.codfw.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65848 and previous config saved to /var/cache/conftool/dbconfig/20240705-044912-marostegui.json
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 04:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235', diff saved to https://phabricator.wikimedia.org/P65847 and previous config saved to /var/cache/conftool/dbconfig/20240705-044148-marostegui.json
  • 04:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65846 and previous config saved to /var/cache/conftool/dbconfig/20240705-042641-marostegui.json
  • 01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1235 (T364069)', diff saved to https://phabricator.wikimedia.org/P65845 and previous config saved to /var/cache/conftool/dbconfig/20240705-013250-marostegui.json
  • 01:32 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 01:32 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1235.eqiad.wmnet with reason: Maintenance
  • 01:32 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65844 and previous config saved to /var/cache/conftool/dbconfig/20240705-013229-marostegui.json
  • 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65843 and previous config saved to /var/cache/conftool/dbconfig/20240705-011721-marostegui.json
  • 01:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234', diff saved to https://phabricator.wikimedia.org/P65842 and previous config saved to /var/cache/conftool/dbconfig/20240705-010214-marostegui.json
  • 00:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65841 and previous config saved to /var/cache/conftool/dbconfig/20240705-004707-marostegui.json

2024-07-04

  • 22:04 mvolz@deploy1002: helmfile [eqiad] DONE helmfile.d/services/citoid: apply
  • 22:03 mvolz@deploy1002: helmfile [eqiad] START helmfile.d/services/citoid: apply
  • 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1234 (T364069)', diff saved to https://phabricator.wikimedia.org/P65840 and previous config saved to /var/cache/conftool/dbconfig/20240704-220227-marostegui.json
  • 22:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 22:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1234.eqiad.wmnet with reason: Maintenance
  • 22:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65839 and previous config saved to /var/cache/conftool/dbconfig/20240704-220205-marostegui.json
  • 22:01 mvolz@deploy1002: helmfile [codfw] DONE helmfile.d/services/citoid: apply
  • 22:00 mvolz@deploy1002: helmfile [codfw] START helmfile.d/services/citoid: apply
  • 21:59 mvolz@deploy1002: helmfile [staging] DONE helmfile.d/services/citoid: apply
  • 21:59 mvolz@deploy1002: helmfile [staging] START helmfile.d/services/citoid: apply
  • 21:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65838 and previous config saved to /var/cache/conftool/dbconfig/20240704-214658-marostegui.json
  • 21:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232', diff saved to https://phabricator.wikimedia.org/P65837 and previous config saved to /var/cache/conftool/dbconfig/20240704-213151-marostegui.json
  • 21:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65836 and previous config saved to /var/cache/conftool/dbconfig/20240704-211644-marostegui.json
  • 20:17 jdrewniak@deploy1002: Finished scap: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) (duration: 12m 14s)
  • 20:12 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Continuing with sync
  • 20:08 jdrewniak@deploy1002: jdlrobson, nmw03, jdrewniak, dreamyjazz: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 jdrewniak@deploy1002: Started scap sync-world: Backport for [July 4th] Reduce list of exclusions for dark mode (1.43.0-wmf.12), Remove modifications of wgCheckUserLogAdditionalRights (T346022), Add editcontentmodel to interface-admin for French Wikipedia (T369113)
  • 19:57 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_eqiad
  • 19:55 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_eqiad
  • 18:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1232 (T364069)', diff saved to https://phabricator.wikimedia.org/P65835 and previous config saved to /var/cache/conftool/dbconfig/20240704-182308-marostegui.json
  • 18:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 18:23 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1232.eqiad.wmnet with reason: Maintenance
  • 18:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65834 and previous config saved to /var/cache/conftool/dbconfig/20240704-182257-marostegui.json
  • 18:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65833 and previous config saved to /var/cache/conftool/dbconfig/20240704-180749-marostegui.json
  • 17:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219', diff saved to https://phabricator.wikimedia.org/P65832 and previous config saved to /var/cache/conftool/dbconfig/20240704-175242-marostegui.json
  • 17:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65831 and previous config saved to /var/cache/conftool/dbconfig/20240704-173735-marostegui.json
  • 17:10 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-worker1078.eqiad.wmnet
  • 16:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 16:19 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 16:15 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-worker1078.eqiad.wmnet
  • 16:14 btullis@cumin1002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
  • 16:14 btullis@cumin1002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 16:06 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 15:49 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 15:02 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:02 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1219 (T364069)', diff saved to https://phabricator.wikimedia.org/P65830 and previous config saved to /var/cache/conftool/dbconfig/20240704-143350-marostegui.json
  • 14:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 14:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1219.eqiad.wmnet with reason: Maintenance
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65829 and previous config saved to /var/cache/conftool/dbconfig/20240704-143327-marostegui.json
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65827 and previous config saved to /var/cache/conftool/dbconfig/20240704-141820-marostegui.json
  • 14:03 Lucas_WMDE: UTC afternoon backport+config window done
  • 14:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218', diff saved to https://phabricator.wikimedia.org/P65826 and previous config saved to /var/cache/conftool/dbconfig/20240704-140313-marostegui.json
  • 14:01 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65825 and previous config saved to /var/cache/conftool/dbconfig/20240704-140145-root.json
  • 13:57 claime: Enabling puppet on cp4037.ulsfo.wmnet to test 1050293 - T367949
  • 13:53 claime: disabling puppet on P:trafficserver::backend to merge 1049507 - T367949
  • 13:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65824 and previous config saved to /var/cache/conftool/dbconfig/20240704-134806-marostegui.json
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65823 and previous config saved to /var/cache/conftool/dbconfig/20240704-134656-root.json
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65822 and previous config saved to /var/cache/conftool/dbconfig/20240704-134639-root.json
  • 13:44 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) (duration: 08m 35s)
  • 13:41 claime: Enabling and running puppet on P:trafficserver::backend to merge 1050293 - T367949
  • 13:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on dbstore1007.eqiad.wmnet with reason: Maintenance
  • 13:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65821 and previous config saved to /var/cache/conftool/dbconfig/20240704-134105-marostegui.json
  • 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Continuing with sync
  • 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 dreamrimmer, lucaswerkmeister-wmde: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:36 claime: Enabling puppet on cp6016.drmrs.wmnet to test 1050293 - T367949
  • 13:36 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Remove "Create a book" link from sidebar on German Wikipedia (T368900)
  • 13:32 claime: disabling puppet on P:trafficserver::backend to merge 1050293 - T367949
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65820 and previous config saved to /var/cache/conftool/dbconfig/20240704-133150-root.json
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65819 and previous config saved to /var/cache/conftool/dbconfig/20240704-133133-root.json
  • 13:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65818 and previous config saved to /var/cache/conftool/dbconfig/20240704-132558-marostegui.json
  • 13:20 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 03s)
  • 13:20 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65817 and previous config saved to /var/cache/conftool/dbconfig/20240704-131643-root.json
  • 13:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65816 and previous config saved to /var/cache/conftool/dbconfig/20240704-131628-root.json
  • 13:11 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:11 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246', diff saved to https://phabricator.wikimedia.org/P65815 and previous config saved to /var/cache/conftool/dbconfig/20240704-131050-marostegui.json
  • 13:09 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:09 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:08 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:07 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65814 and previous config saved to /var/cache/conftool/dbconfig/20240704-130137-root.json
  • 13:01 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65813 and previous config saved to /var/cache/conftool/dbconfig/20240704-130122-root.json
  • 12:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65812 and previous config saved to /var/cache/conftool/dbconfig/20240704-125543-marostegui.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65811 and previous config saved to /var/cache/conftool/dbconfig/20240704-124632-root.json
  • 12:46 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65810 and previous config saved to /var/cache/conftool/dbconfig/20240704-124617-root.json
  • 12:36 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65808 and previous config saved to /var/cache/conftool/dbconfig/20240704-123127-root.json
  • 12:31 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65807 and previous config saved to /var/cache/conftool/dbconfig/20240704-123111-root.json
  • 12:27 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213', diff saved to https://phabricator.wikimedia.org/P65806 and previous config saved to /var/cache/conftool/dbconfig/20240704-122752-root.json
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1213 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65805 and previous config saved to /var/cache/conftool/dbconfig/20240704-121631-root.json
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db1185 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65804 and previous config saved to /var/cache/conftool/dbconfig/20240704-121621-root.json
  • 12:11 hashar@deploy1002: Finished scap: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) (duration: 07m 45s)
  • 12:06 hashar@deploy1002: hashar, d3r1ck01: Continuing with sync
  • 12:06 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:03 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
  • 12:02 hashar@deploy1002: Sync cancelled.
  • 12:02 hashar@deploy1002: hashar, d3r1ck01: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:56 hashar@deploy1002: Started scap sync-world: Backport for PermissionManager: Handle empty error array from TitleQuickPermissions (T369260)
  • 11:55 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1246 (T367856)', diff saved to https://phabricator.wikimedia.org/P65803 and previous config saved to /var/cache/conftool/dbconfig/20240704-115522-marostegui.json
  • 11:55 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 11:55 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1246.eqiad.wmnet with reason: Maintenance
  • 11:54 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
  • 11:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 11:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 11:40 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 11:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 11:14 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1185.eqiad.wmnet onto db1213.eqiad.wmnet
  • 11:13 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1213 db1185 T369250', diff saved to https://phabricator.wikimedia.org/P65802 and previous config saved to /var/cache/conftool/dbconfig/20240704-111324-root.json
  • 10:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1218 (T364069)', diff saved to https://phabricator.wikimedia.org/P65801 and previous config saved to /var/cache/conftool/dbconfig/20240704-105205-marostegui.json
  • 10:51 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1218.eqiad.wmnet with reason: Maintenance
  • 10:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65800 and previous config saved to /var/cache/conftool/dbconfig/20240704-105143-marostegui.json
  • 10:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65799 and previous config saved to /var/cache/conftool/dbconfig/20240704-103636-marostegui.json
  • 10:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207', diff saved to https://phabricator.wikimedia.org/P65798 and previous config saved to /var/cache/conftool/dbconfig/20240704-102129-marostegui.json
  • 10:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65797 and previous config saved to /var/cache/conftool/dbconfig/20240704-100622-marostegui.json
  • 09:53 topranks: Pushing updated BGP policy to cr2-eqord in Chicago to re-announce codfw IP ranges there T367439
  • 09:29 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
  • 09:24 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
  • 09:23 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1009.eqiad.wmnet with OS bullseye
  • 09:23 claime: Manual cleanup of puppet certs for renamed servers mw1417.eqiad.wmnet mw1418.eqiad.wmnet mw2300.codfw.wmnet
  • 09:22 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 09:21 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Maintenance
  • 09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 09:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
  • 09:16 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: remove old sretest2005 IP - ayounsi@cumin1002"
  • 09:13 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 09:05 hashar@deploy1002: rebuilt and synchronized wikiversions files: Revert "group2 wikis to 1.43.0-wmf.12" - T366957
  • 09:03 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
  • 09:00 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1009.eqiad.wmnet with reason: host reimage
  • 08:59 elukey: restart mcrouter on mwmaint1002
  • 08:56 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:45 fabfur: enable puppet on A:cp-ulsfo (T365718)
  • 08:45 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1009.eqiad.wmnet with OS bullseye
  • 08:44 fabfur@cumin1002: conftool action : set/pooled=yes; selector: name=cp4037.ulsfo.wmnet
  • 08:43 fabfur@cumin1002: conftool action : set/pooled=no; selector: name=cp4037.ulsfo.wmnet
  • 08:28 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:28 hashar@deploy1002: rebuilt and synchronized wikiversions files: group2 wikis to 1.43.0-wmf.12 refs T366957
  • 08:24 fabfur: temporarily disable puppet on A:cp-ulsfo to apply https://gerrit.wikimedia.org/r/c/operations/puppet/+/1051198 (T365718)
  • 08:10 isaranto@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_eqiad
  • 08:02 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_eqiad
  • 08:01 fabfur: start rebooting A:cp-eqiad (upload|text in parallel) for T366555
  • 07:52 root@cumin1002: END (PASS) - Cookbook sre.hardware.upgrade-firmware (exit_code=0) upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
  • 07:52 root@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1009.eqiad.wmnet
  • 07:41 root@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1009.eqiad.wmnet
  • 07:35 root@cumin1002: START - Cookbook sre.hardware.upgrade-firmware upgrade firmware for hosts cloudcephosd1009.eqiad.wmnet
  • 07:18 dcausse: closing the backport window
  • 07:15 dcausse: refreshing the wikitech search indices
  • 07:11 dcausse@deploy1002: Finished scap: Backport for cirrus: re-enable search updates on wikitech (duration: 08m 28s)
  • 07:06 dcausse@deploy1002: dcausse: Continuing with sync
  • 07:05 dcausse@deploy1002: dcausse: Backport for cirrus: re-enable search updates on wikitech synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:02 dcausse@deploy1002: Started scap sync-world: Backport for cirrus: re-enable search updates on wikitech
  • 07:01 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1207 (T364069)', diff saved to https://phabricator.wikimedia.org/P65794 and previous config saved to /var/cache/conftool/dbconfig/20240704-070100-marostegui.json
  • 07:00 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1207.eqiad.wmnet with reason: Maintenance
  • 07:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65793 and previous config saved to /var/cache/conftool/dbconfig/20240704-070038-marostegui.json
  • 06:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206', diff saved to https://phabricator.wikimedia.org/P65791 and previous config saved to /var/cache/conftool/dbconfig/20240704-063024-marostegui.json
  • 06:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65790 and previous config saved to /var/cache/conftool/dbconfig/20240704-061517-marostegui.json
  • 05:11 marostegui: Deploy schema change on db1231 s6 eqiad dbmaint T367856
  • 05:08 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
  • 05:08 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1231.eqiad.wmnet with reason: Long schema change
  • 05:03 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1231 T369020', diff saved to https://phabricator.wikimedia.org/P65789 and previous config saved to /var/cache/conftool/dbconfig/20240704-050334-marostegui.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1173 to s6 primary and set section read-write T369020', diff saved to https://phabricator.wikimedia.org/P65788 and previous config saved to /var/cache/conftool/dbconfig/20240704-050237-marostegui.json
  • 05:02 marostegui@cumin1002: dbctl commit (dc=all): 'Set s6 eqiad as read-only for maintenance - T369020', diff saved to https://phabricator.wikimedia.org/P65787 and previous config saved to /var/cache/conftool/dbconfig/20240704-050216-marostegui.json
  • 05:01 marostegui: Starting s6 eqiad failover from db1231 to db1173 - T369020
  • 04:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
  • 04:44 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1173 with weight 0 T369020', diff saved to https://phabricator.wikimedia.org/P65786 and previous config saved to /var/cache/conftool/dbconfig/20240704-044429-marostegui.json
  • 04:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369020
  • 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1206 (T364069)', diff saved to https://phabricator.wikimedia.org/P65785 and previous config saved to /var/cache/conftool/dbconfig/20240704-031151-marostegui.json
  • 03:11 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 03:11 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1206.eqiad.wmnet with reason: Maintenance
  • 03:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65784 and previous config saved to /var/cache/conftool/dbconfig/20240704-031129-marostegui.json
  • 02:56 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65783 and previous config saved to /var/cache/conftool/dbconfig/20240704-025622-marostegui.json
  • 02:43 ryankemper@cumin2002: END (PASS) - Cookbook sre.hadoop.reboot-workers (exit_code=0) for Hadoop analytics cluster
  • 02:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196', diff saved to https://phabricator.wikimedia.org/P65782 and previous config saved to /var/cache/conftool/dbconfig/20240704-024115-marostegui.json
  • 02:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
  • 02:31 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
  • 02:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65781 and previous config saved to /var/cache/conftool/dbconfig/20240704-022608-marostegui.json
  • 01:43 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 01:43 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1239.eqiad.wmnet with reason: Maintenance
  • 01:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65780 and previous config saved to /var/cache/conftool/dbconfig/20240704-014313-marostegui.json
  • 01:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65779 and previous config saved to /var/cache/conftool/dbconfig/20240704-012806-marostegui.json
  • 01:12 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233', diff saved to https://phabricator.wikimedia.org/P65778 and previous config saved to /var/cache/conftool/dbconfig/20240704-011258-marostegui.json
  • 00:57 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65777 and previous config saved to /var/cache/conftool/dbconfig/20240704-005750-marostegui.json
  • 00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 00:43 dzahn@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
  • 00:42 dzahn@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - dzahn@cumin1002"
  • 00:29 dzahn@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
  • 00:25 dzahn@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on parsoidtest1001.eqiad.wmnet with reason: host reimage
  • 00:15 dzahn@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye

2024-07-03

  • 23:47 tzatziki: removing 11 files for legal compliance
  • 23:23 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1196 (T364069)', diff saved to https://phabricator.wikimedia.org/P65776 and previous config saved to /var/cache/conftool/dbconfig/20240703-232302-marostegui.json
  • 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet,db1154.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1196.eqiad.wmnet with reason: Maintenance
  • 23:22 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65775 and previous config saved to /var/cache/conftool/dbconfig/20240703-232221-marostegui.json
  • 23:21 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65774 and previous config saved to /var/cache/conftool/dbconfig/20240703-232154-ladsgroup.json
  • 23:07 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65773 and previous config saved to /var/cache/conftool/dbconfig/20240703-230713-marostegui.json
  • 23:06 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65772 and previous config saved to /var/cache/conftool/dbconfig/20240703-230646-ladsgroup.json
  • 22:52 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195', diff saved to https://phabricator.wikimedia.org/P65771 and previous config saved to /var/cache/conftool/dbconfig/20240703-225206-marostegui.json
  • 22:51 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192', diff saved to https://phabricator.wikimedia.org/P65770 and previous config saved to /var/cache/conftool/dbconfig/20240703-225139-ladsgroup.json
  • 22:37 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65769 and previous config saved to /var/cache/conftool/dbconfig/20240703-223659-marostegui.json
  • 22:36 ladsgroup@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65768 and previous config saved to /var/cache/conftool/dbconfig/20240703-223632-ladsgroup.json
  • 22:36 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 21:56 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 21:40 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 21:40 ryankemper@cumin2002: END (FAIL) - Cookbook sre.hadoop.reboot-workers (exit_code=99) for Hadoop analytics cluster
  • 21:35 ryankemper@cumin2002: START - Cookbook sre.hadoop.reboot-workers for Hadoop analytics cluster
  • 20:13 cjming: end of UTC late backport window
  • 20:11 cjming@deploy1002: Finished scap: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) (duration: 08m 22s)
  • 20:10 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
  • 20:06 cjming@deploy1002: kgraessle, cjming: Continuing with sync
  • 20:05 cjming@deploy1002: kgraessle, cjming: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:05 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 20:04 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 20:03 cjming@deploy1002: Started scap sync-world: Backport for Remove QuickSurvey for Automoderator patroller workstream survey (T362969)
  • 19:56 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 19:55 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 19:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 19:49 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1195 (T364069)', diff saved to https://phabricator.wikimedia.org/P65766 and previous config saved to /var/cache/conftool/dbconfig/20240703-194055-marostegui.json
  • 19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1195.eqiad.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65765 and previous config saved to /var/cache/conftool/dbconfig/20240703-194033-marostegui.json
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 19:30 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65761 and previous config saved to /var/cache/conftool/dbconfig/20240703-192526-marostegui.json
  • 19:25 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 19:24 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host sretest2002.codfw.wmnet with OS bookworm
  • 19:19 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 19:16 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=93) for host sretest2002.codfw.wmnet with OS bookworm
  • 19:12 ebysans@deploy1002: Finished deploy [airflow-dags/analytics@d773cac]: (no justification provided) (duration: 00m 33s)
  • 19:11 ebysans@deploy1002: Started deploy [airflow-dags/analytics@d773cac]: (no justification provided)
  • 19:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186', diff saved to https://phabricator.wikimedia.org/P65760 and previous config saved to /var/cache/conftool/dbconfig/20240703-191019-marostegui.json
  • 19:08 SandraEbele_: deploying airflow dags
  • 18:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65759 and previous config saved to /var/cache/conftool/dbconfig/20240703-185511-marostegui.json
  • 18:54 cmooney@cumin1002: START - Cookbook sre.hosts.reimage for host sretest2002.codfw.wmnet with OS bookworm
  • 18:36 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 18:36 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 18:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 18:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 17:50 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 17:49 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 17:49 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 17:48 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 17:46 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 17:45 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 17:45 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 17:44 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 17:44 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 17:43 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 17:43 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 17:41 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 17:41 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 17:40 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 17:40 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 17:37 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 17:37 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 17:36 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 17:36 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65758 and previous config saved to /var/cache/conftool/dbconfig/20240703-173601-root.json
  • 17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 17:35 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 17:34 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:33 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 17:33 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 17:31 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 17:30 cdanis@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 17:29 cdanis@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 17:28 cdanis@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 17:28 cdanis@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 17:22 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 17:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65756 and previous config saved to /var/cache/conftool/dbconfig/20240703-172055-root.json
  • 17:19 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:19 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:17 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:17 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:15 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:11 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 17:10 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 17:10 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 17:09 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 17:08 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 17:07 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 17:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65755 and previous config saved to /var/cache/conftool/dbconfig/20240703-170549-root.json
  • 16:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65754 and previous config saved to /var/cache/conftool/dbconfig/20240703-165044-root.json
  • 16:47 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
  • 16:46 btullis@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on an-presto1004.eqiad.wmnet with reason: Cold booting to investigate RAM issue
  • 16:44 jhathaway: adding inbound email servers mx-in{1001,2001} to our MX record
  • 16:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65752 and previous config saved to /var/cache/conftool/dbconfig/20240703-163538-root.json
  • 16:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65751 and previous config saved to /var/cache/conftool/dbconfig/20240703-162032-root.json
  • 16:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2207 (re)pooling @ 1%: Repooling', diff saved to Unable to send diff to phaste and previous config saved to /var/cache/conftool/dbconfig/20240703-160521-root.json
  • 16:04 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:47 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1186 (T364069)', diff saved to https://phabricator.wikimedia.org/P65750 and previous config saved to /var/cache/conftool/dbconfig/20240703-154716-marostegui.json
  • 15:47 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1186.eqiad.wmnet with reason: Maintenance
  • 15:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65749 and previous config saved to /var/cache/conftool/dbconfig/20240703-154643-marostegui.json
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65748 and previous config saved to /var/cache/conftool/dbconfig/20240703-154142-arnaudb.json
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65747 and previous config saved to /var/cache/conftool/dbconfig/20240703-154121-arnaudb.json
  • 15:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 100%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65746 and previous config saved to /var/cache/conftool/dbconfig/20240703-154109-arnaudb.json
  • 15:32 kevinbazira@deploy1002: helmfile [ml-staging-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 15:31 sukhe: restart haproxy on dns1005
  • 15:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65744 and previous config saved to /var/cache/conftool/dbconfig/20240703-153136-marostegui.json
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65743 and previous config saved to /var/cache/conftool/dbconfig/20240703-152636-arnaudb.json
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65742 and previous config saved to /var/cache/conftool/dbconfig/20240703-152616-arnaudb.json
  • 15:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 75%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65741 and previous config saved to /var/cache/conftool/dbconfig/20240703-152603-arnaudb.json
  • 15:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184', diff saved to https://phabricator.wikimedia.org/P65740 and previous config saved to /var/cache/conftool/dbconfig/20240703-151628-marostegui.json
  • 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:14 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
  • 15:13 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: 208.80.152.129 v6 - ayounsi@cumin1002"
  • 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65739 and previous config saved to /var/cache/conftool/dbconfig/20240703-151131-arnaudb.json
  • 15:11 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65738 and previous config saved to /var/cache/conftool/dbconfig/20240703-151110-arnaudb.json
  • 15:10 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 50%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65737 and previous config saved to /var/cache/conftool/dbconfig/20240703-151057-arnaudb.json
  • 15:10 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 15:04 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1233 (T367856)', diff saved to https://phabricator.wikimedia.org/P65736 and previous config saved to /var/cache/conftool/dbconfig/20240703-150411-marostegui.json
  • 15:04 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 15:03 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1233.eqiad.wmnet with reason: Maintenance
  • 15:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65735 and previous config saved to /var/cache/conftool/dbconfig/20240703-150348-marostegui.json
  • 15:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65734 and previous config saved to /var/cache/conftool/dbconfig/20240703-150121-marostegui.json
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65733 and previous config saved to /var/cache/conftool/dbconfig/20240703-145625-arnaudb.json
  • 14:56 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65732 and previous config saved to /var/cache/conftool/dbconfig/20240703-145604-arnaudb.json
  • 14:55 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 25%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65731 and previous config saved to /var/cache/conftool/dbconfig/20240703-145552-arnaudb.json
  • 14:54 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
  • 14:51 fabfur@cumin1002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
  • 14:51 fabfur: start rebooting A:cp-drmrs (upload|text in parallel) for T366555
  • 14:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65730 and previous config saved to /var/cache/conftool/dbconfig/20240703-144841-marostegui.json
  • 14:46 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 14:45 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65729 and previous config saved to /var/cache/conftool/dbconfig/20240703-144119-arnaudb.json
  • 14:41 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65728 and previous config saved to /var/cache/conftool/dbconfig/20240703-144059-arnaudb.json
  • 14:40 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 10%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65727 and previous config saved to /var/cache/conftool/dbconfig/20240703-144046-arnaudb.json
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1006.eqiad.wmnet with OS bookworm
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1005.eqiad.wmnet with OS bookworm
  • 14:40 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host an-conf1004.eqiad.wmnet with OS bookworm
  • 14:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:38 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 14:38 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 14:35 sukhe: [correction of previous A:dnsbox run] sudo cumin -b1 -s60 "A:dnsbox" "run-puppet-agent"
  • 14:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229', diff saved to https://phabricator.wikimedia.org/P65726 and previous config saved to /var/cache/conftool/dbconfig/20240703-143334-marostegui.json
  • 14:33 sukhe: sudo cumin "A:dnsbox" "run-puppet-agent"
  • 14:32 sukhe: sudo cumin "A:wikidough" "run-puppet-agent"
  • 14:32 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.remove-downtime (exit_code=0) for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
  • 14:32 jayme@cumin1002: START - Cookbook sre.hosts.remove-downtime for kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet
  • 14:30 jayme@cumin1002: conftool action : set/pooled=yes; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
  • 14:27 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 14:27 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 14:26 arnaudb@cumin1002: dbctl commit (dc=all): 'db1191 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65725 and previous config saved to /var/cache/conftool/dbconfig/20240703-142614-arnaudb.json
  • 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1196 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65724 and previous config saved to /var/cache/conftool/dbconfig/20240703-142553-arnaudb.json
  • 14:25 arnaudb@cumin1002: dbctl commit (dc=all): 'db1197 (re)pooling @ 5%: post T365994 repool', diff saved to https://phabricator.wikimedia.org/P65723 and previous config saved to /var/cache/conftool/dbconfig/20240703-142541-arnaudb.json
  • 14:25 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:21 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
  • 14:18 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 14:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65722 and previous config saved to /var/cache/conftool/dbconfig/20240703-141826-marostegui.json
  • 14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
  • 14:17 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1013,1017,1021].eqiad.wmnet with reason: T365994
  • 14:17 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
  • 14:16 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 0:45:00 on db1154.eqiad.wmnet with reason: T365994
  • 14:11 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:09 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 14:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 14:07 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 14:04 topranks: rebooting lsw1-e2-eqiad to install updated JunOS version T365994
  • 14:01 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
  • 14:00 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on 22 hosts with reason: JunOS upgrade lsw1-e2-eqiad
  • 13:59 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
  • 13:59 bking@cumin2002: START - Cookbook sre.hosts.downtime for 4:00:00 on elastic[1091-1092].eqiad.wmnet,wdqs[1018,1020].eqiad.wmnet with reason: T348977
  • 13:58 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
  • 13:58 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:40:00 on lsw1-e2-eqiad,lsw1-e2-eqiad IPv6,ssw1-e1-eqiad.mgmt,ssw1-f1-eqiad.mgmt with reason: JunOS upgrade lsw1-e2-eqiad
  • 13:57 jayme@cumin1002: conftool action : set/pooled=no; selector: name=(wikikube-worker1007.eqiad.wmnet|wikikube-worker1021.eqiad.wmnet|kubernetes1060.eqiad.wmnet)
  • 13:56 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
  • 13:56 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1091*,elastic1092* for T348977 - bking@cumin2002
  • 13:56 jayme@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
  • 13:55 jayme@cumin1002: START - Cookbook sre.hosts.downtime for 1:20:00 on kubernetes1060.eqiad.wmnet,wikikube-worker[1007,1021].eqiad.wmnet with reason: Upgrade EVPN switches Eqiad row E-F to JunOS 22.2
  • 13:53 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
  • 13:52 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:50:00 on lsw1-e2-eqiad.mgmt with reason: prep JunOS upgrade lsw1-e2-eqiad
  • 13:48 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:48 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup (duration: 08m 38s)
  • 13:44 jayme: draining wikikube-worker1007.eqiad.wmnet wikikube-worker1021.eqiad.wmnet kubernetes1060.eqiad.wmnet for T365994
  • 13:43 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Continuing with sync
  • 13:42 logmsgbot: lucaswerkmeister-wmde@deploy1002 dcausse, lucaswerkmeister-wmde: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:39 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for noc: fail with a 404 when the selected wiki is nonexistent, CirrusSearch: add wgCirrusSearchIndexFieldsToCleanup
  • 13:38 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) (duration: 09m 28s)
  • 13:33 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Continuing with sync
  • 13:31 logmsgbot: lucaswerkmeister-wmde@deploy1002 kharlan, lucaswerkmeister-wmde: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1006.eqiad.wmnet with OS bookworm
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1005.eqiad.wmnet with OS bookworm
  • 13:29 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host an-conf1004.eqiad.wmnet with OS bookworm
  • 13:28 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for GlobalRenameQueue: Fix issues with wiki ID and row query (T369147)
  • 13:25 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) (duration: 08m 20s)
  • 13:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host parsoidtest1001.eqiad.wmnet with OS bullseye
  • 13:20 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host parsoidtest1001
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:19 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:19 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host parsoidtest1001
  • 13:18 arnaudb@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
  • 13:18 arnaudb@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db[1191,1196-1197].eqiad.wmnet with reason: T365994
  • 13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) 49.3.193.10.in-addr.arpa. on all recursors
  • 13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache 49.3.193.10.in-addr.arpa. on all recursors
  • 13:17 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) sretest2002.mgmt.codfw.wmnet on all recursors
  • 13:17 cmooney@cumin1002: START - Cookbook sre.dns.wipe-cache sretest2002.mgmt.codfw.wmnet on all recursors
  • 13:17 arnaudb@cumin1002: dbctl commit (dc=all): 'T365994 - depool db1191,db1196,db1197', diff saved to https://phabricator.wikimedia.org/P65721 and previous config saved to /var/cache/conftool/dbconfig/20240703-131715-arnaudb.json
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for PropertyValueExpertsModule: Turn on enableModuleContentVersion() (T369155)
  • 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:16 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 13:15 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes kawikisource --fix # T363243; 34 pages to fix, 34 were resolvable; 774 links to fix, 774 were resolvable, 0 were deleted
  • 13:15 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 13:14 Lucas_WMDE: lucaswerkmeister-wmde@mwmaint1002:~$ mwscript namespaceDupes mswikisource --fix # T369047; 6 pages to fix, 6 were resolvable; 76 links to fix, 73 were resolvable, 3 were deleted
  • 13:13 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 13:12 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) (duration: 10m 39s)
  • 13:07 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Continuing with sync
  • 13:04 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde, anzx: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:01 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for mswikisource: create author and translation namespaces and add namespace aliases (T369047), kawikisource: create author namespace, add namespace aliases and sitename (T363243)
  • 12:51 elukey@deploy1002: helmfile [eqiad] DONE helmfile.d/services/thumbor: sync
  • 12:47 elukey@deploy1002: helmfile [eqiad] START helmfile.d/services/thumbor: sync
  • 12:39 jiji@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:39 jiji@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 12:37 elukey@deploy1002: helmfile [codfw] DONE helmfile.d/services/thumbor: sync
  • 12:34 elukey@deploy1002: helmfile [codfw] START helmfile.d/services/thumbor: sync
  • 12:30 jiji@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 12:17 jiji@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 12:10 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Maint over', diff saved to https://phabricator.wikimedia.org/P65720 and previous config saved to /var/cache/conftool/dbconfig/20240703-121009-ladsgroup.json
  • 11:55 ladsgroup@deploy1002: Finished scap: Backport for rpc: Update function call in RunSingleJob (T363839) (duration: 08m 08s)
  • 11:55 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Maint over', diff saved to https://phabricator.wikimedia.org/P65719 and previous config saved to /var/cache/conftool/dbconfig/20240703-115504-ladsgroup.json
  • 11:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1184 (T364069)', diff saved to https://phabricator.wikimedia.org/P65718 and previous config saved to /var/cache/conftool/dbconfig/20240703-115211-marostegui.json
  • 11:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1184.eqiad.wmnet with reason: Maintenance
  • 11:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65717 and previous config saved to /var/cache/conftool/dbconfig/20240703-115149-marostegui.json
  • 11:50 ladsgroup@deploy1002: ladsgroup: Continuing with sync
  • 11:49 ladsgroup@deploy1002: ladsgroup: Backport for rpc: Update function call in RunSingleJob (T363839) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:47 ladsgroup@deploy1002: Started scap sync-world: Backport for rpc: Update function call in RunSingleJob (T363839)
  • 11:45 ladsgroup@deploy1002: Finished scap: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) (duration: 09m 28s)
  • 11:40 ladsgroup@deploy1002: volker-e, ladsgroup: Continuing with sync
  • 11:39 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Maint over', diff saved to https://phabricator.wikimedia.org/P65716 and previous config saved to /var/cache/conftool/dbconfig/20240703-113958-ladsgroup.json
  • 11:39 ladsgroup@deploy1002: volker-e, ladsgroup: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65715 and previous config saved to /var/cache/conftool/dbconfig/20240703-113642-marostegui.json
  • 11:35 ladsgroup@deploy1002: Started scap sync-world: Backport for Optimize static footer 'a Wikimedia project' icon further (T256190)
  • 11:27 ladsgroup@cumin1002: dbctl commit (dc=all): 'Depooling db1192 (T352010)', diff saved to https://phabricator.wikimedia.org/P65714 and previous config saved to /var/cache/conftool/dbconfig/20240703-112728-ladsgroup.json
  • 11:27 ladsgroup@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 11:27 ladsgroup@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1192.eqiad.wmnet with reason: Maintenance
  • 11:24 ladsgroup@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Maint over', diff saved to https://phabricator.wikimedia.org/P65713 and previous config saved to /var/cache/conftool/dbconfig/20240703-112452-ladsgroup.json
  • 11:21 cgoubert@deploy1002: Finished scap: mw-on-k8s: Move php.envvars to mediawiki-common - T365265 (duration: 05m 22s)
  • 11:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169', diff saved to https://phabricator.wikimedia.org/P65712 and previous config saved to /var/cache/conftool/dbconfig/20240703-112135-marostegui.json
  • 11:16 cgoubert@deploy1002: Started scap sync-world: mw-on-k8s: Move php.envvars to mediawiki-common - T365265
  • 11:15 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 11:15 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 11:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65711 and previous config saved to /var/cache/conftool/dbconfig/20240703-110627-marostegui.json
  • 10:38 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1229 (T367856)', diff saved to https://phabricator.wikimedia.org/P65710 and previous config saved to /var/cache/conftool/dbconfig/20240703-103839-marostegui.json
  • 10:38 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 10:38 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1229.eqiad.wmnet with reason: Maintenance
  • 10:33 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 10:32 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 09:49 logmsgbot: andrewtavis-wmde@deploy1002 Finished deploy [airflow-dags/wmde@d773cac]: (no justification provided) (duration: 00m 07s)
  • 09:49 logmsgbot: andrewtavis-wmde@deploy1002 Started deploy [airflow-dags/wmde@d773cac]: (no justification provided)
  • 09:31 mlitn@deploy1002: Finished scap: Backport for Handle campaigns where wikibase is not enabled (T369085) (duration: 12m 59s)
  • 09:27 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
  • 09:26 mlitn@deploy1002: mlitn: Continuing with sync
  • 09:26 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "testvm2008 - ayounsi@cumin1002"
  • 09:21 mlitn@deploy1002: mlitn: Backport for Handle campaigns where wikibase is not enabled (T369085) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:20 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 09:20 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 09:20 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.ganeti.makevm (exit_code=0) for new host testvm2008.wikimedia.org
  • 09:20 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2008.wikimedia.org with OS bookworm
  • 09:20 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 09:19 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65709 and previous config saved to /var/cache/conftool/dbconfig/20240703-091956-marostegui.json
  • 09:18 mlitn@deploy1002: Started scap sync-world: Backport for Handle campaigns where wikibase is not enabled (T369085)
  • 09:09 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2002.codfw.wmnet
  • 09:06 topranks: merge host firewall changes to set default DSCP marking (T339850)
  • 09:06 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
  • 09:02 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2008.wikimedia.org with reason: host reimage
  • 09:02 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2002.codfw.wmnet
  • 09:01 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 09:01 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 09:00 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifeeds: apply
  • 09:00 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifeeds: apply
  • 09:00 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifeeds: apply
  • 08:59 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/wikifeeds: apply
  • 08:58 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:58 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch2001.codfw.wmnet
  • 08:57 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:53 jayme: deployed istio (adding securityContext) to wikikube clusters - T362978
  • 08:51 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch2001.codfw.wmnet
  • 08:51 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1002.eqiad.wmnet
  • 08:49 Lucas_WMDE: RELEASE_NAME=r72z2aop helmfile --file /srv/deployment-charts/helmfile.d/services/mw-script/helmfile.yaml --environment eqiad --selector name=r72z2aop destroy # clean up broken mwscript-k8s run I did just to test something
  • 08:46 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2008.wikimedia.org with OS bookworm
  • 08:45 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:45 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.ganeti.makevm: created new VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:44 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1002.eqiad.wmnet
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.wipe-cache (exit_code=0) testvm2008.wikimedia.org on all recursors
  • 08:44 ayounsi@cumin1002: START - Cookbook sre.dns.wipe-cache testvm2008.wikimedia.org on all recursors
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 08:44 ayounsi@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:43 ayounsi@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Add records for VM testvm2008.wikimedia.org - ayounsi@cumin1002"
  • 08:43 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox: apply
  • 08:42 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox: apply
  • 08:42 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox: apply
  • 08:42 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet
  • 08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox: apply
  • 08:41 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:41 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-timeline: apply
  • 08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-timeline: apply
  • 08:41 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:41 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-timeline: apply
  • 08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 ayounsi@cumin1002: START - Cookbook sre.dns.netbox
  • 08:40 ayounsi@cumin1002: START - Cookbook sre.ganeti.makevm for new host testvm2008.wikimedia.org
  • 08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-media: apply
  • 08:40 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-media: apply
  • 08:40 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-syntaxhighlight: apply
  • 08:40 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-media: apply
  • 08:39 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-media: apply
  • 08:39 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:39 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/shellbox-constraints: apply
  • 08:39 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/shellbox-constraints: apply
  • 08:38 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/shellbox-constraints: apply
  • 08:35 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host kafka-stretch1001.eqiad.wmnet
  • 08:31 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1002.eqiad.wmnet
  • 08:22 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1002.eqiad.wmnet
  • 08:18 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1001.eqiad.wmnet
  • 08:11 hashar@deploy1002: rebuilt and synchronized wikiversions files: group1 wikis to 1.43.0-wmf.12 refs T366957
  • 08:11 marostegui@cumin1002: dbctl commit (dc=all): 'Give more weight to db2136 - running 10.11 T365805', diff saved to https://phabricator.wikimedia.org/P65707 and previous config saved to /var/cache/conftool/dbconfig/20240703-081059-marostegui.json
  • 08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 08:09 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host karapace1001.eqiad.wmnet
  • 08:09 brouberol@cumin1002: START - Cookbook sre.hosts.reboot-single for host karapace1001.eqiad.wmnet
  • 07:52 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1169 (T364069)', diff saved to https://phabricator.wikimedia.org/P65706 and previous config saved to /var/cache/conftool/dbconfig/20240703-075245-marostegui.json
  • 07:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:52 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1169.eqiad.wmnet with reason: Maintenance
  • 07:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65705 and previous config saved to /var/cache/conftool/dbconfig/20240703-074321-marostegui.json
  • 07:36 kart_: Updated MinT to 2024-07-02-060114-production (T364525)
  • 07:33 kartik@deploy1002: helmfile [eqiad] DONE helmfile.d/services/machinetranslation: apply
  • 07:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65704 and previous config saved to /var/cache/conftool/dbconfig/20240703-072814-marostegui.json
  • 07:23 kartik@deploy1002: helmfile [eqiad] START helmfile.d/services/machinetranslation: apply
  • 07:21 kartik@deploy1002: helmfile [codfw] DONE helmfile.d/services/machinetranslation: apply
  • 07:14 kartik@deploy1002: helmfile [codfw] START helmfile.d/services/machinetranslation: apply
  • 07:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216', diff saved to https://phabricator.wikimedia.org/P65702 and previous config saved to /var/cache/conftool/dbconfig/20240703-071306-marostegui.json
  • 07:12 kartik@deploy1002: helmfile [staging] DONE helmfile.d/services/machinetranslation: apply
  • 07:07 kartik@deploy1002: helmfile [staging] START helmfile.d/services/machinetranslation: apply
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65701 and previous config saved to /var/cache/conftool/dbconfig/20240703-065759-marostegui.json
  • 06:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65700 and previous config saved to /var/cache/conftool/dbconfig/20240703-062057-root.json
  • 06:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65699 and previous config saved to /var/cache/conftool/dbconfig/20240703-060552-root.json
  • 05:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65698 and previous config saved to /var/cache/conftool/dbconfig/20240703-055046-root.json
  • 05:35 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65697 and previous config saved to /var/cache/conftool/dbconfig/20240703-053541-root.json
  • 05:23 marostegui: Deploy schema change on db2207 s2 codfw dbmaint T367856
  • 05:23 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 05:22 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2207.codfw.wmnet with reason: Long schema change
  • 05:21 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2207 T369130', diff saved to https://phabricator.wikimedia.org/P65696 and previous config saved to /var/cache/conftool/dbconfig/20240703-052118-root.json
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65695 and previous config saved to /var/cache/conftool/dbconfig/20240703-052035-root.json
  • 05:20 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2204 to s2 primary T369130', diff saved to https://phabricator.wikimedia.org/P65694 and previous config saved to /var/cache/conftool/dbconfig/20240703-052029-root.json
  • 05:20 marostegui: Starting s2 codfw failover from db2207 to db2204 - T369130
  • 05:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
  • 05:06 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2204 with weight 0 T369130', diff saved to https://phabricator.wikimedia.org/P65693 and previous config saved to /var/cache/conftool/dbconfig/20240703-050647-root.json
  • 05:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s2 T369130
  • 05:05 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65692 and previous config saved to /var/cache/conftool/dbconfig/20240703-050523-root.json
  • 04:51 marostegui@cumin1002: dbctl commit (dc=all): 'Pool with small weight T365805', diff saved to https://phabricator.wikimedia.org/P65691 and previous config saved to /var/cache/conftool/dbconfig/20240703-045109-marostegui.json
  • 04:50 marostegui@cumin1002: dbctl commit (dc=all): 'db2129 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65690 and previous config saved to /var/cache/conftool/dbconfig/20240703-045018-root.json
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2216 (T364069)', diff saved to https://phabricator.wikimedia.org/P65689 and previous config saved to /var/cache/conftool/dbconfig/20240703-043335-marostegui.json
  • 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2216.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65688 and previous config saved to /var/cache/conftool/dbconfig/20240703-043312-marostegui.json
  • 04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65687 and previous config saved to /var/cache/conftool/dbconfig/20240703-041805-marostegui.json
  • 04:02 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203', diff saved to https://phabricator.wikimedia.org/P65686 and previous config saved to /var/cache/conftool/dbconfig/20240703-040258-marostegui.json
  • 03:47 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65685 and previous config saved to /var/cache/conftool/dbconfig/20240703-034751-marostegui.json
  • 01:17 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2203 (T364069)', diff saved to https://phabricator.wikimedia.org/P65684 and previous config saved to /var/cache/conftool/dbconfig/20240703-011701-marostegui.json
  • 01:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 01:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2203.codfw.wmnet with reason: Maintenance
  • 00:48 eileen: civicrm upgraded from 6e03cff2 to 84d6f5d1
  • 00:27 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-upload_drmrs
  • 00:16 brett@cumin2002: END (PASS) - Cookbook sre.cdn.roll-reboot (exit_code=0) rolling reboot on A:cp-text_drmrs
  • 00:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1225.eqiad.wmnet with reason: Maintenance
  • 00:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65683 and previous config saved to /var/cache/conftool/dbconfig/20240703-000506-marostegui.json
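
The entries above share a regular shape ("HH:MM actor: message"), and cookbook runs end with a line of the form "END (PASS|FAIL) - Cookbook <name> (exit_code=N)". As a minimal sketch, assuming that shape holds for every bullet, the following Python pulls out the failed cookbook runs. The names `ENTRY_RE`, `COOKBOOK_END_RE`, `Entry`, `parse_entry` and `failed_cookbooks` are hypothetical and are not part of any Wikimedia tooling.

```python
import re
from typing import Iterable, List, NamedTuple, Optional, Tuple

# Assumed shape of one log bullet: "• HH:MM actor: message".
# The actor is either "user@host" or a bare nickname such as "logmsgbot".
ENTRY_RE = re.compile(
    r"^\s*•\s+(?P<time>\d{2}:\d{2})\s+(?P<actor>[^\s:]+):\s+(?P<message>.*)$"
)

# Assumed shape of a cookbook completion message, taken from the entries above.
COOKBOOK_END_RE = re.compile(
    r"^END \((?P<status>PASS|FAIL)\) - Cookbook (?P<name>\S+) "
    r"\(exit_code=(?P<code>\d+)\)"
)


class Entry(NamedTuple):
    time: str
    actor: str
    message: str


def parse_entry(line: str) -> Optional[Entry]:
    """Split one bullet into time, actor and message; None if it does not match."""
    m = ENTRY_RE.match(line)
    if m is None:
        return None
    return Entry(m["time"], m["actor"], m["message"])


def failed_cookbooks(lines: Iterable[str]) -> List[Tuple[str, str, int]]:
    """Return (time, cookbook name, exit code) for every END (FAIL) entry."""
    failures = []
    for line in lines:
        entry = parse_entry(line)
        if entry is None:
            continue  # date headings, blank lines, anything not shaped like a bullet
        m = COOKBOOK_END_RE.match(entry.message)
        if m is not None and m["status"] == "FAIL":
            failures.append((entry.time, m["name"], int(m["code"])))
    return failures


if __name__ == "__main__":
    sample = [
        "  • 08:42 brouberol@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host kafka-stretch1001.eqiad.wmnet",
        "  • 08:31 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=1) for host karapace1002.eqiad.wmnet",
        "  • 08:09 brouberol@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host karapace1001.eqiad.wmnet",
    ]
    for time, name, code in failed_cookbooks(sample):
        print(time, name, code)
```

The same `parse_entry` helper could feed other filters, for example collecting the dbctl repooling steps per host. That would need its own pattern for the commit message, which is not sketched here.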

2024-07-02

  • 23:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65682 and previous config saved to /var/cache/conftool/dbconfig/20240702-234959-marostegui.json
  • 23:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222', diff saved to https://phabricator.wikimedia.org/P65681 and previous config saved to /var/cache/conftool/dbconfig/20240702-233452-marostegui.json
  • 23:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65680 and previous config saved to /var/cache/conftool/dbconfig/20240702-231945-marostegui.json
  • 22:58 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2202.codfw.wmnet with reason: Maintenance
  • 22:58 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65679 and previous config saved to /var/cache/conftool/dbconfig/20240702-225835-marostegui.json
  • 22:43 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65678 and previous config saved to /var/cache/conftool/dbconfig/20240702-224328-marostegui.json
  • 22:28 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188', diff saved to https://phabricator.wikimedia.org/P65677 and previous config saved to /var/cache/conftool/dbconfig/20240702-222820-marostegui.json
  • 22:13 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65676 and previous config saved to /var/cache/conftool/dbconfig/20240702-221312-marostegui.json
  • 22:05 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 22:05 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 22:05 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-parsoid: apply
  • 22:04 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-parsoid: apply
  • 22:04 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-parsoid: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-parsoid: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-mcrouter: apply
  • 22:03 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-jobrunner: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-int: apply
  • 22:02 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-api-ext: apply
  • 22:01 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-api-ext: apply
  • 22:01 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-api-ext: apply
  • 22:01 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-api-ext: apply
  • 21:58 rzl@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-debug: apply
  • 21:58 rzl@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-debug: apply
  • 21:58 rzl@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-debug: apply
  • 21:57 rzl@deploy1002: helmfile [codfw] START helmfile.d/services/mw-debug: apply
  • 21:54 rzl@deploy1002: Finished scap: T369080 (duration: 04m 13s)
  • 21:54 rzl@deploy1002: rzl: Continuing with sync
  • 21:52 rzl@deploy1002: rzl: T369080 synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:51 rzl@deploy1002: Started scap sync-world: T369080
  • 21:26 eileen: civicrm upgraded from 08e568e4 to 6e03cff2
  • 21:21 eileen: civicrm upgraded from 67bcfd72 to 08e568e4
  • 20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:50 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:49 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:45 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:45 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.provision (exit_code=99) for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:39 cmooney@cumin1002: START - Cookbook sre.hosts.provision for host sretest2002.mgmt.codfw.wmnet with reboot policy FORCED
  • 20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 20:35 cmooney@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:34 cmooney@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: add sretest2002 entries - cmooney@cumin1002"
  • 20:33 urbanecm@deploy1002: Finished scap: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) (duration: 11m 44s)
  • 20:31 cmooney@cumin1002: START - Cookbook sre.dns.netbox
  • 20:28 urbanecm@deploy1002: arlolra, urbanecm: Continuing with sync
  • 20:25 cmooney@cumin1002: END (FAIL) - Cookbook sre.hosts.dhcp (exit_code=99) for host wikikube-ctrl2002.codfw.wmnet
  • 20:24 urbanecm@deploy1002: arlolra, urbanecm: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:21 urbanecm@deploy1002: Started scap sync-world: Backport for Follow the defaults for Parsoid on MFE on officewiki (T363720)
  • 20:21 urbanecm@deploy1002: Finished scap: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) (duration: 16m 31s)
  • 20:16 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Continuing with sync
  • 20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 20:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 20:07 urbanecm@deploy1002: jdlrobson, arlolra, urbanecm: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:04 urbanecm@deploy1002: Started scap sync-world: Backport for [July 2nd] Mobile: Enable dark mode for all users for tier 1 wikis (T367151), Remove unused Linter configs (T343292)
  • 19:45 jhathaway: running another email inbound mx test on mx-in1001
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2188 (T364069)', diff saved to https://phabricator.wikimedia.org/P65675 and previous config saved to /var/cache/conftool/dbconfig/20240702-194027-marostegui.json
  • 19:40 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2188.codfw.wmnet with reason: Maintenance
  • 19:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65674 and previous config saved to /var/cache/conftool/dbconfig/20240702-194005-marostegui.json
  • 19:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65673 and previous config saved to /var/cache/conftool/dbconfig/20240702-192457-marostegui.json
  • 19:21 eileen: civicrm upgraded from 64f23ed0 to 67bcfd72
  • 19:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176', diff saved to https://phabricator.wikimedia.org/P65672 and previous config saved to /var/cache/conftool/dbconfig/20240702-190950-marostegui.json
  • 18:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65671 and previous config saved to /var/cache/conftool/dbconfig/20240702-185443-marostegui.json
  • 17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:40 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:39 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:36 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/airflow: apply
  • 17:34 bking@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/airflow: apply
  • 17:20 jforrester@deploy1002: Finished scap: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) (duration: 10m 06s)
  • 17:15 jforrester@deploy1002: jforrester: Continuing with sync
  • 17:14 jforrester@deploy1002: jforrester: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 17:10 jforrester@deploy1002: Started scap sync-world: Backport for Update OOUI to v0.50.3, Update OOUI to v0.50.3 (T369010)
  • 17:07 dani@deploy1002: helmfile [codfw] DONE helmfile.d/services/miscweb: apply
  • 17:07 dani@deploy1002: helmfile [codfw] START helmfile.d/services/miscweb: apply
  • 17:07 dani@deploy1002: helmfile [eqiad] DONE helmfile.d/services/miscweb: apply
  • 17:06 dani@deploy1002: helmfile [eqiad] START helmfile.d/services/miscweb: apply
  • 17:06 dani@deploy1002: helmfile [staging] DONE helmfile.d/services/miscweb: apply
  • 17:06 dani@deploy1002: helmfile [staging] START helmfile.d/services/miscweb: apply
  • 17:06 mutante: lists1004 - sudo systemctl start wmf_auto_restart_exim4 (T369017)
  • 16:54 ejegg: fundraising civicrm upgraded from 41c1bd78 to 64f23ed0
  • 16:16 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
  • 16:13 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-upload_drmrs
  • 16:02 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 16:01 brett@cumin2002: START - Cookbook sre.cdn.roll-reboot rolling reboot on A:cp-text_drmrs
  • 15:58 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1004.eqiad.wmnet
  • 15:57 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 15:51 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1004.eqiad.wmnet
  • 15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/dse-k8s-services/superset: apply
  • 15:50 brouberol@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/dse-k8s-services/superset: apply
  • 15:49 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_esams
  • 15:46 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_esams
  • 15:44 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 15:44 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 20:00:00 on kubernetes1051.eqiad.wmnet with reason: Hardware issue
  • 15:43 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2176 (T364069)', diff saved to https://phabricator.wikimedia.org/P65670 and previous config saved to /var/cache/conftool/dbconfig/20240702-154127-marostegui.json
  • 15:41 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2176.codfw.wmnet with reason: Maintenance
  • 15:41 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65669 and previous config saved to /var/cache/conftool/dbconfig/20240702-154105-marostegui.json
  • 15:25 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65668 and previous config saved to /var/cache/conftool/dbconfig/20240702-152558-marostegui.json
  • 15:17 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host testvm2007.codfw.wmnet with OS bookworm
  • 15:12 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/thumbor: sync
  • 15:12 elukey@deploy1002: helmfile [staging] START helmfile.d/services/thumbor: sync
  • 15:10 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174', diff saved to https://phabricator.wikimedia.org/P65667 and previous config saved to /var/cache/conftool/dbconfig/20240702-151050-marostegui.json
  • 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[2004-2006].codfw.wmnet
  • 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 15:05 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 15:03 ayounsi@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 15:02 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[2004-2006].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 14:58 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-misc: apply
  • 14:58 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 14:58 ayounsi@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on testvm2007.codfw.wmnet with reason: host reimage
  • 14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_esams
  • 14:55 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_esams
  • 14:55 fabfur: upgrading A:cp-esams to haproxy 2.8.10 (T367756)
  • 14:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65666 and previous config saved to /var/cache/conftool/dbconfig/20240702-145542-marostegui.json
  • 14:53 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-misc: apply
  • 14:53 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-misc: apply
  • 14:53 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-misc: apply
  • 14:52 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:52 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid analytics cluster: Reboot Druid nodes
  • 14:52 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-wikifunctions: apply
  • 14:52 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-wikifunctions: apply
  • 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubetcd[1004-1006].eqiad.wmnet
  • 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 14:51 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 14:50 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-wikifunctions: apply
  • 14:48 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubetcd[1004-1006].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 14:47 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[2004-2006].codfw.wmnet
  • 14:45 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 14:38 ayounsi@cumin1002: START - Cookbook sre.hosts.reimage for host testvm2007.codfw.wmnet with OS bookworm
  • 14:37 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubetcd[1004-1006].eqiad.wmnet
  • 14:28 dcaro@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host cloudcephosd1008.eqiad.wmnet
  • 14:19 dcaro@cumin1002: START - Cookbook sre.hosts.reboot-single for host cloudcephosd1008.eqiad.wmnet
  • 14:15 root@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 14:12 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:12 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:11 jiji@cumin1002: END (ERROR) - Cookbook sre.hosts.downtime (exit_code=97) for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:11 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on 6 hosts with reason: decom
  • 14:07 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 14:06 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org,service=recdns
  • 14:06 filippo@deploy1002: helmfile [eqiad] DONE helmfile.d/services/page-analytics: apply
  • 14:05 filippo@deploy1002: helmfile [eqiad] START helmfile.d/services/page-analytics: apply
  • 14:05 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 14:05 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 14:05 filippo@deploy1002: helmfile [codfw] DONE helmfile.d/services/page-analytics: apply
  • 14:05 filippo@deploy1002: helmfile [codfw] START helmfile.d/services/page-analytics: apply
  • 14:04 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Unbanning all hosts in search_eqiad
  • 14:04 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Unbanning all hosts in search_eqiad
  • 14:04 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org,service=recdns
  • 14:04 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 14:03 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 14:03 sukhe@puppetmaster1001: conftool action : set/pooled=yes; selector: name=dns6001.wikimedia.org
  • 14:02 sukhe: restart anycast-hc on dns6001
  • 14:01 sukhe@puppetmaster1001: conftool action : set/pooled=no; selector: name=dns6001.wikimedia.org
  • 13:58 effie: decom old eqiad and codfw kubetcd hosts
  • 13:46 jforrester@deploy1002: helmfile [eqiad] DONE helmfile.d/services/wikifunctions: apply
  • 13:44 jforrester@deploy1002: helmfile [eqiad] START helmfile.d/services/wikifunctions: apply
  • 13:44 jforrester@deploy1002: helmfile [codfw] DONE helmfile.d/services/wikifunctions: apply
  • 13:43 jforrester@deploy1002: helmfile [codfw] START helmfile.d/services/wikifunctions: apply
  • 13:42 jforrester@deploy1002: helmfile [staging] DONE helmfile.d/services/wikifunctions: apply
  • 13:42 jforrester@deploy1002: helmfile [staging] START helmfile.d/services/wikifunctions: apply
  • 13:41 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid analytics cluster: Reboot Druid nodes
  • 13:39 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.reboot-workers (exit_code=0) for Druid public cluster: Reboot Druid nodes
  • 13:35 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2030.codfw.wmnet|wikikube-worker2031.codfw.wmnet|wikikube-worker2032.codfw.wmnet|wikikube-worker2033.codfw.wmnet|wikikube-worker2034.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 13:35 claime: Pooling and uncordoning wikikube-worker2030.codfw.wmnet wikikube-worker2031.codfw.wmnet wikikube-worker2032.codfw.wmnet wikikube-worker2033.codfw.wmnet wikikube-worker2034.codfw.wmnet - T351074
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1222 (T367856)', diff saved to https://phabricator.wikimedia.org/P65665 and previous config saved to /var/cache/conftool/dbconfig/20240702-133100-marostegui.json
  • 13:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1222.eqiad.wmnet with reason: Maintenance
  • 13:30 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65664 and previous config saved to /var/cache/conftool/dbconfig/20240702-133038-marostegui.json
  • 13:30 Lucas_WMDE: UTC afternoon backport+config window done
  • 13:27 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) (duration: 10m 22s)
  • 13:22 claime: homer 'cr*codfw*' commit 'T351074'
  • 13:22 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Continuing with sync
  • 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[1001-1002].eqiad.wmnet
  • 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 13:21 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 13:20 logmsgbot: lucaswerkmeister-wmde@deploy1002 sgimeno, jforrester, lucaswerkmeister-wmde: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:18 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[1001-1002].eqiad.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 13:17 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for [wikifunctions] Grant wikifunctions-staff enum and converter rights (T366610 T367270), GrowthExperiments: add community updates module flag (T365877)
  • 13:16 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 13:15 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65663 and previous config saved to /var/cache/conftool/dbconfig/20240702-131531-marostegui.json
  • 13:14 logmsgbot: lucaswerkmeister-wmde@deploy1002 Finished scap: Backport for Enable EntitySchema data type on Wikidata (T332157) (duration: 10m 54s)
  • 13:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2032.codfw.wmnet with OS bullseye
  • 13:09 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[1001-1002].eqiad.wmnet
  • 13:09 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Continuing with sync
  • 13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:08 elukey@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'sync'.
  • 13:06 logmsgbot: lucaswerkmeister-wmde@deploy1002 lucaswerkmeister-wmde: Backport for Enable EntitySchema data type on Wikidata (T332157) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2033.codfw.wmnet with OS bullseye
  • 13:03 logmsgbot: lucaswerkmeister-wmde@deploy1002 Started scap sync-world: Backport for Enable EntitySchema data type on Wikidata (T332157)
  • 13:00 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197', diff saved to https://phabricator.wikimedia.org/P65662 and previous config saved to /var/cache/conftool/dbconfig/20240702-130024-marostegui.json
  • 12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2034.codfw.wmnet with OS bullseye
  • 12:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2031.codfw.wmnet with OS bullseye
  • 12:56 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2030.codfw.wmnet with OS bullseye
  • 12:55 jiji@cumin1002: conftool action : set/pooled=inactive; selector: name=kubemaster100[1-2].eqiad.wmnet
  • 12:49 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster100[1-2].eqiad.wmnet
  • 12:49 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
  • 12:46 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
  • 12:46 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[1001-1002].eqiad.wmnet with reason: decom
  • 12:45 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
  • 12:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65661 and previous config saved to /var/cache/conftool/dbconfig/20240702-124517-marostegui.json
  • 12:44 effie: decom eqiad old kubemasters - T353464
  • 12:42 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
  • 12:41 jayme@cumin1002: END (FAIL) - Cookbook sre.hosts.reboot-single (exit_code=99) for host kubernetes1051.eqiad.wmnet
  • 12:39 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
  • 12:36 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2032.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2033.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2034.codfw.wmnet with reason: host reimage
  • 12:34 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2031.codfw.wmnet with reason: host reimage
  • 12:33 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2030.codfw.wmnet with reason: host reimage
  • 12:25 brouberol@cumin1002: START - Cookbook sre.druid.reboot-workers for Druid public cluster: Reboot Druid nodes
  • 12:25 marostegui: Deploy schema change on db2129 s6 codfw dbmaint T367856
  • 12:25 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 12:24 jforrester@deploy1002: Finished scap: Backport for Reference widget: check for undefined config (T368736) (duration: 09m 59s)
  • 12:19 jforrester@deploy1002: jforrester: Continuing with sync
  • 12:19 jforrester@deploy1002: jforrester: Backport for Reference widget: check for undefined config (T368736) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 12:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2034.codfw.wmnet with OS bullseye
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2033.codfw.wmnet with OS bullseye
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2032.codfw.wmnet with OS bullseye
  • 12:18 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2031.codfw.wmnet with OS bullseye
  • 12:17 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2030.codfw.wmnet with OS bullseye
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2393 to wikikube-worker2034
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2034
  • 12:17 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2034
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
  • 12:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65660 and previous config saved to /var/cache/conftool/dbconfig/20240702-121638-root.json
  • 12:16 eoghan@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
  • 12:16 eoghan@cumin1002: START - Cookbook sre.hosts.downtime for 14 days, 0:00:00 on lists1001.wikimedia.org with reason: Pre-decommissioning lists1001
  • 12:16 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:15 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2393 to wikikube-worker2034 - cgoubert@cumin1002"
  • 12:14 jforrester@deploy1002: Started scap sync-world: Backport for Reference widget: check for undefined config (T368736)
  • 12:11 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 12:11 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2393 to wikikube-worker2034
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2392 to wikikube-worker2033
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2033
  • 12:09 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2033
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:09 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
  • 12:09 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 12:08 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2392 to wikikube-worker2033 - cgoubert@cumin1002"
  • 12:07 root@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 12:07 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 12:05 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 12:05 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2392 to wikikube-worker2033
  • 12:05 root@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1008.eqiad.wmnet with reason: host reimage
  • 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2365 to wikikube-worker2032
  • 12:04 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2032
  • 12:03 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2032
  • 12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 12:03 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
  • 12:01 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2365 to wikikube-worker2032 - cgoubert@cumin1002"
  • 12:01 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 12:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65659 and previous config saved to /var/cache/conftool/dbconfig/20240702-120133-root.json
  • 12:01 jayme@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 12:00 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 12:00 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:59 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:59 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2365 to wikikube-worker2032
  • 11:59 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2309 to wikikube-worker2031
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2031
  • 11:58 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2031
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:58 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
  • 11:58 jayme@deploy1002: helmfile [staging] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:58 jayme@deploy1002: helmfile [staging] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 11:57 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2309 to wikikube-worker2031 - cgoubert@cumin1002"
  • 11:55 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:55 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2309 to wikikube-worker2031
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.rename (exit_code=0) from mw2307 to wikikube-worker2030
  • 11:54 cgoubert@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host wikikube-worker2030
  • 11:52 cgoubert@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host wikikube-worker2030
  • 11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:52 cgoubert@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2174 (T364069)', diff saved to https://phabricator.wikimedia.org/P65658 and previous config saved to /var/cache/conftool/dbconfig/20240702-115026-marostegui.json
  • 11:50 cgoubert@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: Renaming mw2307 to wikikube-worker2030 - cgoubert@cumin1002"
  • 11:50 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 11:50 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2174.codfw.wmnet with reason: Maintenance
  • 11:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65657 and previous config saved to /var/cache/conftool/dbconfig/20240702-115003-marostegui.json
  • 11:48 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 11:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65656 and previous config saved to /var/cache/conftool/dbconfig/20240702-114627-root.json
  • 11:44 root@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 11:43 cgoubert@cumin1002: START - Cookbook sre.dns.netbox
  • 11:43 cgoubert@cumin1002: START - Cookbook sre.hosts.rename from mw2307 to wikikube-worker2030
  • 11:37 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid analytics cluster: Roll restart of Druid jvm daemons.
  • 11:36 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
  • 11:36 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2129.codfw.wmnet with reason: Long schema change
  • 11:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65655 and previous config saved to /var/cache/conftool/dbconfig/20240702-113457-marostegui.json
  • 11:31 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65654 and previous config saved to /var/cache/conftool/dbconfig/20240702-113122-root.json
  • 11:27 brouberol@cumin1002: END (PASS) - Cookbook sre.druid.roll-restart-workers (exit_code=0) for Druid public cluster: Roll restart of Druid jvm daemons.
  • 11:26 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host eventlog1003.eqiad.wmnet
  • 11:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db2129 T369021', diff saved to https://phabricator.wikimedia.org/P65653 and previous config saved to /var/cache/conftool/dbconfig/20240702-112616-root.json
  • 11:25 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db2214 to s6 primary T369021', diff saved to https://phabricator.wikimedia.org/P65652 and previous config saved to /var/cache/conftool/dbconfig/20240702-112518-marostegui.json
  • 11:24 marostegui: Starting s6 codfw failover from db2129 to db2214 - T369021
  • 11:24 jayme: switched wikikube production clusters from PSP to PSS for restricted namespaces - T273507
  • 11:23 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 11:22 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host eventlog1003.eqiad.wmnet
  • 11:22 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 11:22 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 11:21 jayme@cumin1002: START - Cookbook sre.hosts.reboot-single for host kubernetes1051.eqiad.wmnet
  • 11:21 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 11:21 claime: Uncordoning wikikube-ctrl2001.codfw.wmnet and wikikube-ctrl2002.codfw.wmnet
  • 11:20 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 11:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173', diff saved to https://phabricator.wikimedia.org/P65651 and previous config saved to /var/cache/conftool/dbconfig/20240702-111949-marostegui.json
  • 11:17 root@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1008.eqiad.wmnet with OS bullseye
  • 11:16 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65650 and previous config saved to /var/cache/conftool/dbconfig/20240702-111616-root.json
  • 11:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_eqiad
  • 11:12 cgoubert@cumin1002: conftool action : set/weight=10:pooled=yes; selector: name=(wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet),cluster=kubernetes,service=kubesvc
  • 11:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 11:12 claime: pooling and uncordoning wikikube-worker2025.codfw.wmnet|wikikube-worker2026.codfw.wmnet|wikikube-worker2027.codfw.wmnet|wikikube-worker2028.codfw.wmnet|wikikube-worker2029.codfw.wmnet - T351074
  • 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.decommission (exit_code=0) for hosts kubemaster[2001-2002].codfw.wmnet
  • 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 11:11 jiji@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 11:07 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
  • 11:07 marostegui@cumin1002: dbctl commit (dc=all): 'Set db2214 with weight 0 T369021', diff saved to https://phabricator.wikimedia.org/P65649 and previous config saved to /var/cache/conftool/dbconfig/20240702-110750-root.json
  • 11:07 jiji@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: kubemaster[2001-2002].codfw.wmnet decommissioned, removing all IPs except the asset tag one - jiji@cumin1002"
  • 11:07 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 27 hosts with reason: Primary switchover s6 T369021
  • 11:04 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65648 and previous config saved to /var/cache/conftool/dbconfig/20240702-110442-marostegui.json
  • 11:01 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65647 and previous config saved to /var/cache/conftool/dbconfig/20240702-110111-root.json
  • 10:56 jiji@cumin1002: START - Cookbook sre.dns.netbox
  • 10:50 jiji@cumin1002: START - Cookbook sre.hosts.decommission for hosts kubemaster[2001-2002].codfw.wmnet
  • 10:46 marostegui@cumin1002: dbctl commit (dc=all): 'db2165 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65646 and previous config saved to /var/cache/conftool/dbconfig/20240702-104605-root.json
  • 10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:42 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:42 pfischer@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 10:41 pfischer@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 10:35 brouberol@cumin1002: START - Cookbook sre.druid.roll-restart-workers for Druid public cluster: Roll restart of Druid jvm daemons.
  • 10:34 btullis@cumin1002: END (PASS) - Cookbook sre.hosts.reboot-single (exit_code=0) for host an-master1003.eqiad.wmnet
  • 10:32 brouberol@cumin1002: END (PASS) - Cookbook sre.k8s.reboot-nodes (exit_code=0) rolling reboot on A:dse-k8s-worker
  • 10:28 fabfur: upgrading A:cp-eqiad to haproxy 2.8.10 (T367756)
  • 10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_eqiad
  • 10:27 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_eqiad
  • 10:25 btullis@cumin1002: START - Cookbook sre.hosts.reboot-single for host an-master1003.eqiad.wmnet
  • 10:06 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 100% weight T363812', diff saved to https://phabricator.wikimedia.org/P65645 and previous config saved to /var/cache/conftool/dbconfig/20240702-100636-jynus.json
  • 10:02 claime: homer 'cr*codfw*' commit 'T351074'
  • 09:53 jiji@cumin1002: conftool action : set/pooled=no; selector: name=kubemaster200[1-2].codfw.wmnet
  • 09:52 elukey: updated volatile dir on puppetserver1001 with the new point release (12.6) for Bookworm
  • 09:48 jiji@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
  • 09:47 jiji@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on kubemaster[2001-2002].codfw.wmnet with reason: decom
  • 09:20 brouberol@cumin1002: START - Cookbook sre.k8s.reboot-nodes rolling reboot on A:dse-k8s-worker
  • 09:15 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 50% weight T363812', diff saved to https://phabricator.wikimedia.org/P65644 and previous config saved to /var/cache/conftool/dbconfig/20240702-091508-jynus.json
  • 08:57 jynus@cumin1002: dbctl commit (dc=all): 'Repool es1025 at 10% weight T363812', diff saved to https://phabricator.wikimedia.org/P65643 and previous config saved to /var/cache/conftool/dbconfig/20240702-085733-jynus.json
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1197 (T367856)', diff saved to https://phabricator.wikimedia.org/P65642 and previous config saved to /var/cache/conftool/dbconfig/20240702-084447-marostegui.json
  • 08:44 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1197.eqiad.wmnet with reason: Maintenance
  • 08:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65641 and previous config saved to /var/cache/conftool/dbconfig/20240702-084425-marostegui.json
  • 08:40 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp6009.*} and A:cp
  • 08:38 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp6009.*} and A:cp
  • 08:36 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
  • 08:34 hashar@deploy1002: rebuilt and synchronized wikiversions files: group0 wikis to 1.43.0-wmf.12 refs T366957
  • 08:34 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
  • 08:30 jayme@cumin1002: conftool action : set/pooled=inactive; selector: name=kubernetes1051.eqiad.wmnet
  • 08:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65640 and previous config saved to /var/cache/conftool/dbconfig/20240702-082918-marostegui.json
  • 08:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2031.*} and A:cp
  • 08:20 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2031.*} and A:cp
  • 08:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2030.*} and A:cp
  • 08:16 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 08:15 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2030.*} and A:cp
  • 08:15 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 08:14 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2028.*} and A:cp
  • 08:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188', diff saved to https://phabricator.wikimedia.org/P65639 and previous config saved to /var/cache/conftool/dbconfig/20240702-081411-marostegui.json
  • 08:13 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2028.*} and A:cp
  • 08:12 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on P{cp2027.*} and A:cp
  • 08:11 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on P{cp2027.*} and A:cp
  • 08:10 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2173 (T364069)', diff saved to https://phabricator.wikimedia.org/P65638 and previous config saved to /var/cache/conftool/dbconfig/20240702-081025-marostegui.json
  • 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on db2186.codfw.wmnet with reason: Maintenance
  • 08:10 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2173.codfw.wmnet with reason: Maintenance
  • 08:09 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65637 and previous config saved to /var/cache/conftool/dbconfig/20240702-080948-marostegui.json
  • 08:07 jayme: draining kubernetes1051.eqiad.wmnet
  • 08:07 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
  • 08:06 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
  • 08:01 jayme: cordon kubernetes1051.eqiad.wmnet because of several failed image pulls
  • 07:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65635 and previous config saved to /var/cache/conftool/dbconfig/20240702-075904-marostegui.json
  • 07:58 kharlan@deploy1002: Finished scap: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) (duration: 41m 45s)
  • 07:54 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65634 and previous config saved to /var/cache/conftool/dbconfig/20240702-075440-marostegui.json
  • 07:52 kharlan@deploy1002: kharlan: Continuing with sync
  • 07:51 kharlan@deploy1002: kharlan: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 07:39 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170', diff saved to https://phabricator.wikimedia.org/P65633 and previous config saved to /var/cache/conftool/dbconfig/20240702-073933-marostegui.json
  • 07:24 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65632 and previous config saved to /var/cache/conftool/dbconfig/20240702-072426-marostegui.json
  • 07:16 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
  • 07:06 kharlan@deploy1002: Started scap sync-world: Backport for Revert "QuickSurveys: Add testing survey configuration" (T368459)
  • 07:01 oblivian@deploy1002: Finished scap: Rebuilding images for change to the base image for httpd (duration: 26m 52s)
  • 06:59 XioNoX: update netboot bookworm image to pick up new point release
  • 06:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65631 and previous config saved to /var/cache/conftool/dbconfig/20240702-065831-root.json
  • 06:35 oblivian@deploy1002: Started scap sync-world: Rebuilding images for change to the base image for httpd
  • 06:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65629 and previous config saved to /var/cache/conftool/dbconfig/20240702-062820-root.json
  • 06:21 _joe_: rebuilding httpd-fcgi, mediawiki-httpd images T363342 T368640
  • 06:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65628 and previous config saved to /var/cache/conftool/dbconfig/20240702-061315-root.json
  • 05:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65627 and previous config saved to /var/cache/conftool/dbconfig/20240702-055809-root.json
  • 05:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65626 and previous config saved to /var/cache/conftool/dbconfig/20240702-054304-root.json
  • 05:27 marostegui@cumin1002: dbctl commit (dc=all): 'db1192 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65625 and previous config saved to /var/cache/conftool/dbconfig/20240702-052759-root.json
  • 05:25 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1192 T368371', diff saved to https://phabricator.wikimedia.org/P65624 and previous config saved to /var/cache/conftool/dbconfig/20240702-052543-root.json
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Promote db1209 to s8 primary and set section read-write T368371', diff saved to https://phabricator.wikimedia.org/P65623 and previous config saved to /var/cache/conftool/dbconfig/20240702-052447-marostegui.json
  • 05:24 marostegui@cumin1002: dbctl commit (dc=all): 'Set s8 eqiad as read-only for maintenance - T368371', diff saved to https://phabricator.wikimedia.org/P65622 and previous config saved to /var/cache/conftool/dbconfig/20240702-052408-marostegui.json
  • 05:23 marostegui: Starting s8 eqiad failover from db1192 to db1209 - T368371
  • 04:59 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 remove from API T368371', diff saved to https://phabricator.wikimedia.org/P65621 and previous config saved to /var/cache/conftool/dbconfig/20240702-045929-marostegui.json
  • 04:59 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
  • 04:58 marostegui@cumin1002: dbctl commit (dc=all): 'Set db1209 with weight 0 T368371', diff saved to https://phabricator.wikimedia.org/P65620 and previous config saved to /var/cache/conftool/dbconfig/20240702-045856-marostegui.json
  • 04:58 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on 33 hosts with reason: Primary switchover s8 T368371
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2170 (T364069)', diff saved to https://phabricator.wikimedia.org/P65619 and previous config saved to /var/cache/conftool/dbconfig/20240702-043349-marostegui.json
  • 04:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2170.codfw.wmnet with reason: Maintenance
  • 04:33 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65618 and previous config saved to /var/cache/conftool/dbconfig/20240702-043326-marostegui.json
  • 04:22 eileen: civicrm upgraded from f6af6380 to 41c1bd78
  • 04:18 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65617 and previous config saved to /var/cache/conftool/dbconfig/20240702-041819-marostegui.json
  • 04:07 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1188 (T367856)', diff saved to https://phabricator.wikimedia.org/P65616 and previous config saved to /var/cache/conftool/dbconfig/20240702-040705-marostegui.json
  • 04:06 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 04:06 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1188.eqiad.wmnet with reason: Maintenance
  • 04:06 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65615 and previous config saved to /var/cache/conftool/dbconfig/20240702-040643-marostegui.json
  • 04:03 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153', diff saved to https://phabricator.wikimedia.org/P65614 and previous config saved to /var/cache/conftool/dbconfig/20240702-040312-marostegui.json
  • 04:01 mwpresync@deploy1002: Pruned MediaWiki: 1.43.0-wmf.9 (duration: 01m 02s)
  • 03:54 mwpresync@deploy1002: Finished scap: testwikis wikis to 1.43.0-wmf.12 refs T366957 (duration: 51m 33s)
  • 03:51 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65613 and previous config saved to /var/cache/conftool/dbconfig/20240702-035135-marostegui.json
  • 03:51 eileen: civicrm upgraded from 52dc4f1d to f6af6380
  • 03:48 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65612 and previous config saved to /var/cache/conftool/dbconfig/20240702-034805-marostegui.json
  • 03:36 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182', diff saved to https://phabricator.wikimedia.org/P65611 and previous config saved to /var/cache/conftool/dbconfig/20240702-033628-marostegui.json
  • 03:21 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65610 and previous config saved to /var/cache/conftool/dbconfig/20240702-032121-marostegui.json
  • 03:03 mwpresync@deploy1002: Started scap sync-world: testwikis wikis to 1.43.0-wmf.12 refs T366957
  • 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2153 (T364069)', diff saved to https://phabricator.wikimedia.org/P65609 and previous config saved to /var/cache/conftool/dbconfig/20240702-004524-marostegui.json
  • 00:45 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:45 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2153.codfw.wmnet with reason: Maintenance
  • 00:45 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65608 and previous config saved to /var/cache/conftool/dbconfig/20240702-004502-marostegui.json
  • 00:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65607 and previous config saved to /var/cache/conftool/dbconfig/20240702-002955-marostegui.json
  • 00:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1038.eqiad.wmnet with OS bullseye
  • 00:16 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:15 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146', diff saved to https://phabricator.wikimedia.org/P65606 and previous config saved to /var/cache/conftool/dbconfig/20240702-001448-marostegui.json
  • 00:14 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 00:14 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 00:13 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"

2024-07-01

  • 23:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65605 and previous config saved to /var/cache/conftool/dbconfig/20240701-235941-marostegui.json
  • 23:57 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 23:55 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:54 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 23:54 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1038.eqiad.wmnet with reason: host reimage
  • 23:51 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1037.eqiad.wmnet with reason: host reimage
  • 23:41 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 23:40 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:39 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 23:36 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 23:34 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1036.eqiad.wmnet with reason: host reimage
  • 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1037.eqiad.wmnet with OS bullseye
  • 23:25 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1038.eqiad.wmnet with OS bullseye
  • 23:22 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 23:19 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1035.eqiad.wmnet with reason: host reimage
  • 23:05 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1036.eqiad.wmnet with OS bullseye
  • 23:02 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1035.eqiad.wmnet with OS bullseye
  • 22:54 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1038
  • 22:54 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1038
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 22:47 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 22:10 sbassett@deploy1002: Synchronized private/PrivateSettings.php: Un-deployed a PS.php mitigation for T341908 (duration: 07m 24s)
  • 21:59 bking@cumin2002: END (PASS) - Cookbook sre.elasticsearch.ban (exit_code=0) Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
  • 21:59 bking@cumin2002: START - Cookbook sre.elasticsearch.ban Banning hosts: elastic1089*,elastic1090*,elastic1104* for T348977 - bking@cumin2002
  • 21:58 bking@cumin2002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
  • 21:58 bking@cumin2002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on elastic[1089-1090,1104].eqiad.wmnet with reason: T348977
  • 21:55 maryum: deployed patch for T366991
  • 21:39 eileen: civicrm upgraded from f8b1f5c4 to 52dc4f1d
  • 21:39 eileen: tools upgraded from c51f6e62 to 95f10b20
  • 21:32 zabe: zabe@mwmaint1002:/tmp/upload$ mwscript importImages.php --wiki=commonswiki --comment-ext=txt --sleep=3600 --user=Yann . # T368703
  • 21:24 cjming: end of UTC late backport window
  • 21:23 cjming@deploy1002: Finished scap: Backport for extension-list: Add Metrics Platform (T366234) (duration: 28m 16s)
  • 21:16 cjming@deploy1002: cjming: Continuing with sync
  • 21:16 cjming@deploy1002: cjming: Backport for extension-list: Add Metrics Platform (T366234) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2146 (T364069)', diff saved to https://phabricator.wikimedia.org/P65604 and previous config saved to /var/cache/conftool/dbconfig/20240701-210534-marostegui.json
  • 21:05 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 21:05 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2146.codfw.wmnet with reason: Maintenance
  • 21:05 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65603 and previous config saved to /var/cache/conftool/dbconfig/20240701-210512-marostegui.json
  • 21:04 ejegg: fundraising civicrm upgraded from f9782670 to f8b1f5c4
  • 20:55 cjming@deploy1002: Started scap sync-world: Backport for extension-list: Add Metrics Platform (T366234)
  • 20:53 cjming@deploy1002: Finished scap: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) (duration: 09m 03s)
  • 20:50 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65602 and previous config saved to /var/cache/conftool/dbconfig/20240701-205003-marostegui.json
  • 20:47 cjming@deploy1002: cjming, pppery: Continuing with sync
  • 20:47 cjming@deploy1002: cjming, pppery: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:44 cjming@deploy1002: Started scap sync-world: Backport for Missing.php: don't redirect to unprefixed nan incubator (T86915)
  • 20:42 cjming@deploy1002: Finished scap: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) (duration: 10m 39s)
  • 20:36 cjming@deploy1002: cjming, jdlrobson: Continuing with sync
  • 20:35 ejegg: standalone SmashPig upgraded from c8993ec6 to 565c61e4
  • 20:34 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145', diff saved to https://phabricator.wikimedia.org/P65601 and previous config saved to /var/cache/conftool/dbconfig/20240701-203456-marostegui.json
  • 20:34 cjming@deploy1002: cjming, jdlrobson: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:31 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151), Change color of notification icon in dark-mode (T368120), Do not invert images that have been tagged with no invert classes (T368483)
  • 20:30 cjming@deploy1002: Sync cancelled.
  • 20:28 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:26 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
  • 20:23 cjming@deploy1002: Sync cancelled.
  • 20:23 cjming@deploy1002: jdlrobson, cjming: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 20:19 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65600 and previous config saved to /var/cache/conftool/dbconfig/20240701-201949-marostegui.json
  • 20:03 cjming@deploy1002: Started scap sync-world: Backport for [July 1st] Mobile: Enable dark mode for all tier 1 wikis (logged in) (T367151)
  • 19:33 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 19:19 dancy@deploy1002: Installation of scap version "4.91.0" completed for 233 hosts
  • 19:19 dancy@deploy1002: Installing scap version "4.91.0" for 233 hosts
  • 19:16 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 19:15 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
  • 19:14 dancy@deploy1002: Installing scap version "4.91.0" for 234 hosts
  • 19:13 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1041.eqiad.wmnet with reason: host reimage
  • 18:57 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 18:56 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 18:56 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:49 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.dns.netbox (exit_code=0)
  • 17:49 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 17:48 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.dns.netbox: added network and mgmt for dbproxy1028,9 - jclark@cumin1002"
  • 17:45 jclark@cumin1002: START - Cookbook sre.dns.netbox
  • 17:44 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:44 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:42 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:42 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 17:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 17:37 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 17:36 cmooney@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
  • 17:35 cmooney@cumin1002: START - Cookbook sre.hosts.downtime for 0:30:00 on cr2-codfw,ssw1-a[1,8]-codfw.mgmt with reason: reboot ssw1-d8-codfw
  • 17:27 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 17:27 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 17:16 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2145 (T364069)', diff saved to https://phabricator.wikimedia.org/P65599 and previous config saved to /var/cache/conftool/dbconfig/20240701-171609-marostegui.json
  • 17:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 17:15 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2145.codfw.wmnet with reason: Maintenance
  • 17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 17:08 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 17:05 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 17:04 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 16:51 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 16:51 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 16:51 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 16:38 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 16:35 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1039.eqiad.wmnet with reason: host reimage
  • 16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
  • 16:34 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
  • 16:33 dancy@deploy1002: Installing scap version "4.90.0" for 234 hosts
  • 16:30 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1182 (T367856)', diff saved to https://phabricator.wikimedia.org/P65598 and previous config saved to /var/cache/conftool/dbconfig/20240701-163010-marostegui.json
  • 16:30 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1182.eqiad.wmnet with reason: Maintenance
  • 16:29 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65597 and previous config saved to /var/cache/conftool/dbconfig/20240701-162948-marostegui.json
  • 16:22 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1039.eqiad.wmnet with OS bullseye
  • 16:21 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1039
  • 16:20 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1039
  • 16:18 urandom: restarting Cassandra (restbase2023-{a,b,c}) to troubleshoot storage utilization
  • 16:17 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1041.eqiad.wmnet with OS bullseye
  • 16:14 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65596 and previous config saved to /var/cache/conftool/dbconfig/20240701-161441-marostegui.json
  • 16:11 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1041
  • 16:11 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1041
  • 15:59 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156', diff saved to https://phabricator.wikimedia.org/P65595 and previous config saved to /var/cache/conftool/dbconfig/20240701-155934-marostegui.json
  • 15:44 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65594 and previous config saved to /var/cache/conftool/dbconfig/20240701-154427-marostegui.json
  • 15:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65593 and previous config saved to /var/cache/conftool/dbconfig/20240701-153758-root.json
  • 15:37 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:32 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:25 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_codfw
  • 15:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65592 and previous config saved to /var/cache/conftool/dbconfig/20240701-152253-root.json
  • 15:22 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:22 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_codfw
  • 15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:21 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 15:20 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 15:16 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 15:15 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 15:15 jclark@cumin1002: END (PASS) - Cookbook sre.puppet.sync-netbox-hiera (exit_code=0) generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 15:14 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/cirrus-streaming-updater: apply
  • 15:10 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 15:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65591 and previous config saved to /var/cache/conftool/dbconfig/20240701-150747-root.json
  • 15:07 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:07 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:05 akosiaris: reboot deploy1003 T364416
  • 15:04 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-page-content-change-enrich: apply
  • 15:03 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/mw-page-content-change-enrich: apply
  • 14:57 cgoubert@deploy1002: helmfile [eqiad] DONE helmfile.d/services/mw-web: apply
  • 14:56 cgoubert@deploy1002: helmfile [eqiad] START helmfile.d/services/mw-web: apply
  • 14:56 cgoubert@deploy1002: helmfile [codfw] DONE helmfile.d/services/mw-web: apply
  • 14:55 claime: deploying statsd-exporter for mw-web - T365265
  • 14:54 swfrench@deploy1002: helmfile [staging] DONE helmfile.d/services/commons-impact-analytics: apply
  • 14:54 cgoubert@deploy1002: helmfile [codfw] START helmfile.d/services/mw-web: apply
  • 14:52 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65590 and previous config saved to /var/cache/conftool/dbconfig/20240701-145242-root.json
  • 14:50 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 14:49 jclark@cumin1002: START - Cookbook sre.puppet.sync-netbox-hiera generate netbox hiera data: "Triggered by cookbooks.sre.hosts.reimage: Host reimage - jclark@cumin1002"
  • 14:48 dcausse@deploy1002: helmfile [eqiad] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:48 dcausse@deploy1002: helmfile [eqiad] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:48 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 14:44 swfrench@deploy1002: helmfile [staging] START helmfile.d/services/commons-impact-analytics: apply
  • 14:44 dcausse@deploy1002: helmfile [codfw] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 14:43 dcausse@deploy1002: helmfile [codfw] START helmfile.d/services/rdf-streaming-updater: apply
  • 14:40 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 14:40 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 14:37 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65589 and previous config saved to /var/cache/conftool/dbconfig/20240701-143736-root.json
  • 14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_codfw
  • 14:36 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_codfw
  • 14:35 fabfur: upgrading A:cp-codfw to haproxy 2.8.10 (T367756)
  • 14:32 jclark@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 14:31 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 14:27 jclark@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on cloudcephosd1040.eqiad.wmnet with reason: host reimage
  • 14:22 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65587 and previous config saved to /var/cache/conftool/dbconfig/20240701-142231-root.json
  • 14:16 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:16 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2141.codfw.wmnet with reason: Maintenance
  • 14:16 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65586 and previous config saved to /var/cache/conftool/dbconfig/20240701-141640-marostegui.json
  • 14:10 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 14:07 marostegui@cumin1002: dbctl commit (dc=all): 'db2158 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65585 and previous config saved to /var/cache/conftool/dbconfig/20240701-140725-root.json
  • 14:03 jclark@cumin1002: END (FAIL) - Cookbook sre.hosts.reimage (exit_code=99) for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 14:01 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65584 and previous config saved to /var/cache/conftool/dbconfig/20240701-140133-marostegui.json
  • 13:57 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/rdf-streaming-updater: apply
  • 13:56 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/rdf-streaming-updater: apply
  • 13:48 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 13:48 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 13:46 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130', diff saved to https://phabricator.wikimedia.org/P65583 and previous config saved to /var/cache/conftool/dbconfig/20240701-134626-marostegui.json
  • 13:42 jclark@cumin1002: START - Cookbook sre.hosts.reimage for host cloudcephosd1040.eqiad.wmnet with OS bullseye
  • 13:41 jclark@cumin1002: END (PASS) - Cookbook sre.network.configure-switch-interfaces (exit_code=0) for host cloudcephosd1040
  • 13:41 jclark@cumin1002: START - Cookbook sre.network.configure-switch-interfaces for host cloudcephosd1040
  • 13:37 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.reimage (exit_code=0) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 13:31 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65581 and previous config saved to /var/cache/conftool/dbconfig/20240701-133118-marostegui.json
  • 13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/api-gateway: sync
  • 13:30 elukey@deploy1002: helmfile [staging] START helmfile.d/services/api-gateway: sync
  • 13:30 elukey@deploy1002: helmfile [staging] DONE helmfile.d/services/rest-gateway: sync
  • 13:29 elukey@deploy1002: helmfile [staging] START helmfile.d/services/rest-gateway: sync
  • 13:29 urbanecm: mwmaint1002: [urbanecm@mwmaint1002 ~]$ foreachwiki DiscussionTools:FixTrailingWhitespaceIds (T356196)
  • 13:27 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 13:27 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:26 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:25 urbanecm@deploy1002: Finished scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196) (duration: 08m 46s)
  • 13:21 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 13:21 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 13:19 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_magru
  • 13:17 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 13:17 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_magru
  • 13:16 urbanecm@deploy1002: Started scap: Backport for FixTrailingWhitespaceIds: Don't crash on complex conflicts (T356196)
  • 13:16 urbanecm@deploy1002: Finished scap: Backport for Update interwiki map (T368862) (duration: 09m 01s)
  • 13:14 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 13:10 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 13:10 urbanecm@deploy1002: urbanecm: Backport for Update interwiki map (T368862) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 13:07 urbanecm@deploy1002: Started scap: Backport for Update interwiki map (T368862)
  • 12:56 jgiannelos@deploy1002: helmfile [staging] DONE helmfile.d/services/mobileapps: apply
  • 12:56 jgiannelos@deploy1002: helmfile [staging] START helmfile.d/services/mobileapps: apply
  • 12:56 dcausse@deploy1002: helmfile [staging] DONE helmfile.d/services/cirrus-streaming-updater: apply
  • 12:55 dcausse@deploy1002: helmfile [staging] START helmfile.d/services/cirrus-streaming-updater: apply
  • 12:55 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:54 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:51 claime: Running update-netboot-image bullseye for 11.10 release on puppetserver1001
  • 12:49 fabfur: upgrading A:cp-magru to haproxy 2.8.10 (T367756)
  • 12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_magru
  • 12:49 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_magru
  • 12:39 claime: Running update-netboot-image bullseye for 11.10 release
  • 12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:35 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 12:35 jayme@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:35 jayme@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:34 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:33 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:32 klausman@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:32 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:32 jayme@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:32 jayme@deploy1002: helmfile [aux-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:31 klausman@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'apply'.
  • 12:31 klausman@deploy1002: helmfile [ml-serve-codfw] DONE helmfile.d/admin 'apply'.
  • 12:30 jayme@deploy1002: helmfile [aux-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 12:29 klausman@deploy1002: helmfile [ml-serve-codfw] START helmfile.d/admin 'apply'.
  • 12:28 klausman@deploy1002: helmfile [ml-serve-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:27 klausman@deploy1002: helmfile [ml-serve-eqiad] START helmfile.d/admin 'apply'.
  • 12:23 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-reverted' for release 'main' .
  • 12:22 jayme@deploy1002: helmfile [codfw] DONE helmfile.d/admin 'apply'.
  • 12:21 jayme@deploy1002: helmfile [codfw] START helmfile.d/admin 'apply'.
  • 12:21 jayme@deploy1002: helmfile [eqiad] DONE helmfile.d/admin 'apply'.
  • 12:20 jayme@deploy1002: helmfile [eqiad] START helmfile.d/admin 'apply'.
  • 12:19 jayme@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'apply'.
  • 12:18 jayme@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'apply'.
  • 12:17 jayme@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'apply'.
  • 12:16 jayme@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'apply'.
  • 12:14 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-goodfaith' for release 'main' .
  • 12:12 daniel@deploy1002: Finished scap: Backport for REST: detect mismatching value types in json request (T305973) (duration: 32m 48s)
  • 12:09 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-editquality-damaging' for release 'main' .
  • 12:08 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-drafttopic' for release 'main' .
  • 12:06 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-draftquality' for release 'main' .
  • 12:04 daniel@deploy1002: daniel: Continuing with sync
  • 12:03 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articletopic' for release 'main' .
  • 12:01 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:01 cgoubert@cumin1002: END (ERROR) - Cookbook sre.hosts.reimage (exit_code=97) for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 12:00 daniel@deploy1002: daniel: Backport for REST: detect mismatching value types in json request (T305973) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 11:58 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'revscoring-articlequality' for release 'main' .
  • 11:51 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'readability' for release 'main' .
  • 11:49 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'ores-legacy' for release 'main' .
  • 11:46 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'llm' for release 'main' .
  • 11:45 klausman@deploy1002: helmfile [ml-serve-codfw] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:45 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'experimental' for release 'main' .
  • 11:43 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging FebinBellamy out of all services on: 2188 hosts
  • 11:43 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'articletopic-outlink' for release 'main' .
  • 11:43 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging FebinBellamy out of all services on: 2188 hosts
  • 11:41 slyngshede@cumin1002: END (PASS) - Cookbook sre.idm.logout (exit_code=0) Logging AKhatun out of all services on: 2188 hosts
  • 11:41 slyngshede@cumin1002: START - Cookbook sre.idm.logout Logging AKhatun out of all services on: 2188 hosts
  • 11:40 cgoubert@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 11:39 daniel@deploy1002: Started scap: Backport for REST: detect mismatching value types in json request (T305973)
  • 11:37 klausman@deploy1002: helmfile [ml-serve-eqiad] Ran 'sync' command on namespace 'article-descriptions' for release 'main' .
  • 11:37 cgoubert@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on wikikube-worker2026.codfw.wmnet with reason: host reimage
  • 11:33 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-text_drmrs
  • 11:30 fabfur@cumin1002: END (PASS) - Cookbook sre.cdn.roll-upgrade-haproxy (exit_code=0) rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 11:29 btullis@cumin1002: END (FAIL) - Cookbook sre.wikireplicas.update-views (exit_code=99)
  • 11:27 btullis@cumin1002: START - Cookbook sre.wikireplicas.update-views
  • 11:19 cgoubert@cumin1002: START - Cookbook sre.hosts.reimage for host wikikube-worker2026.codfw.wmnet with OS bullseye
  • 10:57 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:49 claime: running /usr/local/bin/apply-config-kartotherian on maps-master
  • 10:47 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:47 claime: running /usr/local/bin/apply-config-kartotherian on maps-replica
  • 10:46 elukey@deploy1002: helmfile [ml-staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:46 elukey@deploy1002: helmfile [ml-staging-codfw] START helmfile.d/admin 'sync'.
  • 10:43 claime: running puppet on maps servers
  • 10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-upload_drmrs
  • 10:39 fabfur@cumin1002: START - Cookbook sre.cdn.roll-upgrade-haproxy rolling upgrade of HAProxy on A:cp-text_drmrs
  • 10:38 elukey@deploy1002: helmfile [staging-codfw] DONE helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [staging-codfw] START helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [staging-eqiad] DONE helmfile.d/admin 'sync'.
  • 10:37 elukey@deploy1002: helmfile [staging-eqiad] START helmfile.d/admin 'sync'.
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2130 (T364069)', diff saved to https://phabricator.wikimedia.org/P65580 and previous config saved to /var/cache/conftool/dbconfig/20240701-102633-marostegui.json
  • 10:26 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 10:26 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2130.codfw.wmnet with reason: Maintenance
  • 10:26 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65579 and previous config saved to /var/cache/conftool/dbconfig/20240701-102611-marostegui.json
  • 10:23 fabfur: upgrading A:cp-drmrs to haproxy 2.8.10 (T367756)
  • 10:11 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65578 and previous config saved to /var/cache/conftool/dbconfig/20240701-101104-marostegui.json
  • 09:55 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116', diff saved to https://phabricator.wikimedia.org/P65577 and previous config saved to /var/cache/conftool/dbconfig/20240701-095557-marostegui.json
  • 09:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65576 and previous config saved to /var/cache/conftool/dbconfig/20240701-094547-root.json
  • 09:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 100%: Repooling', diff saved to https://phabricator.wikimedia.org/P65575 and previous config saved to /var/cache/conftool/dbconfig/20240701-094341-root.json
  • 09:40 marostegui@cumin1002: dbctl commit (dc=all): 'Repooling after maintenance db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65574 and previous config saved to /var/cache/conftool/dbconfig/20240701-094050-marostegui.json
  • 09:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65573 and previous config saved to /var/cache/conftool/dbconfig/20240701-093042-root.json
  • 09:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 75%: Repooling', diff saved to https://phabricator.wikimedia.org/P65572 and previous config saved to /var/cache/conftool/dbconfig/20240701-092835-root.json
  • 09:26 btullis@deploy1002: helmfile [dse-k8s-eqiad] DONE helmfile.d/admin 'apply'.
  • 09:15 btullis@deploy1002: helmfile [dse-k8s-eqiad] START helmfile.d/admin 'apply'.
  • 09:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65570 and previous config saved to /var/cache/conftool/dbconfig/20240701-091536-root.json
  • 09:14 urbanecm@deploy1002: Finished scap: Backport for JsonSchemaValidator: Measure duration (T365245) (duration: 22m 15s)
  • 09:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 50%: Repooling', diff saved to https://phabricator.wikimedia.org/P65569 and previous config saved to /var/cache/conftool/dbconfig/20240701-091329-root.json
  • 09:06 urbanecm@deploy1002: urbanecm: Continuing with sync
  • 09:06 urbanecm@deploy1002: urbanecm: Backport for JsonSchemaValidator: Measure duration (T365245) synced to the testservers (https://wikitech.wikimedia.org/wiki/Mwdebug)
  • 09:00 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65568 and previous config saved to /var/cache/conftool/dbconfig/20240701-090031-root.json
  • 08:58 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 25%: Repooling', diff saved to https://phabricator.wikimedia.org/P65567 and previous config saved to /var/cache/conftool/dbconfig/20240701-085824-root.json
  • 08:51 urbanecm@deploy1002: Started scap: Backport for JsonSchemaValidator: Measure duration (T365245)
  • 08:45 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65566 and previous config saved to /var/cache/conftool/dbconfig/20240701-084525-root.json
  • 08:43 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 10%: Repooling', diff saved to https://phabricator.wikimedia.org/P65565 and previous config saved to /var/cache/conftool/dbconfig/20240701-084318-root.json
  • 08:30 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65564 and previous config saved to /var/cache/conftool/dbconfig/20240701-083020-root.json
  • 08:28 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 5%: Repooling', diff saved to https://phabricator.wikimedia.org/P65563 and previous config saved to /var/cache/conftool/dbconfig/20240701-082813-root.json
  • 08:18 jynus@cumin1002: dbctl commit (dc=all): 'Depool es1025 for backups T363812', diff saved to https://phabricator.wikimedia.org/P65562 and previous config saved to /var/cache/conftool/dbconfig/20240701-081811-jynus.json
  • 08:15 marostegui@cumin1002: dbctl commit (dc=all): 'db1169 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65561 and previous config saved to /var/cache/conftool/dbconfig/20240701-081514-root.json
  • 08:13 marostegui@cumin1002: dbctl commit (dc=all): 'db1195 (re)pooling @ 1%: Repooling', diff saved to https://phabricator.wikimedia.org/P65560 and previous config saved to /var/cache/conftool/dbconfig/20240701-081307-root.json
  • 08:07 marostegui@cumin1002: END (PASS) - Cookbook sre.mysql.clone (exit_code=0) of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
  • 07:44 elukey: `apt-get clean` on build2001 to free some space in the root partition
  • 07:02 marostegui@cumin1002: dbctl commit (dc=all): 'Place db1195 in s1 T368871', diff saved to https://phabricator.wikimedia.org/P65559 and previous config saved to /var/cache/conftool/dbconfig/20240701-070243-marostegui.json
  • 06:36 marostegui@cumin1002: START - Cookbook sre.mysql.clone of db1169.eqiad.wmnet onto db1195.eqiad.wmnet
  • 06:36 marostegui@cumin1002: dbctl commit (dc=all): 'Depool db1169 T368871', diff saved to https://phabricator.wikimedia.org/P65558 and previous config saved to /var/cache/conftool/dbconfig/20240701-063601-root.json
  • 06:33 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db2116 (T364069)', diff saved to https://phabricator.wikimedia.org/P65557 and previous config saved to /var/cache/conftool/dbconfig/20240701-063344-marostegui.json
  • 06:33 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 06:33 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db2116.codfw.wmnet with reason: Maintenance
  • 05:02 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
  • 05:02 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2:00:00 on db1195.eqiad.wmnet with reason: Reboot
  • 04:56 marostegui: Failover m2 from db1195 to db1228 - T368494
  • 04:52 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
  • 04:51 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1:00:00 on db[2133,2160].codfw.wmnet,db[1195,1217,1228].eqiad.wmnet with reason: m2 switchover T368494
  • 04:50 marostegui: dbmaint eqiad Rebuild pagelinks table on s8 master T364069
  • 04:49 marostegui@cumin1002: dbctl commit (dc=all): 'Depooling db1156 (T367856)', diff saved to https://phabricator.wikimedia.org/P65556 and previous config saved to /var/cache/conftool/dbconfig/20240701-044945-marostegui.json
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 2 days, 0:00:00 on an-redacteddb1001.eqiad.wmnet,clouddb[1014,1018,1021].eqiad.wmnet,db1155.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: END (PASS) - Cookbook sre.hosts.downtime (exit_code=0) for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance
  • 04:49 marostegui@cumin1002: START - Cookbook sre.hosts.downtime for 1 day, 0:00:00 on db1156.eqiad.wmnet with reason: Maintenance


Archives

See Server Admin Log/Archives.