Maniphest T368366

Upgrade K8s docker images running in Wikimedia production on Buster to either Bullseye or Bookworm
Open, Medium, Public

Description

Hi folks!

Debian will soon stop supporting our dear Buster, so we should upgrade to either Bullseye or Bookworm. Debmonitor is now able to report the Debian version-id for a lot of Docker images in our registry:
https://debmonitor.wikimedia.org/images/
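
For spot-checking a single image outside Debmonitor, the Debian release can usually be read from the image's /etc/os-release file. The following is only a minimal sketch: the function name `debian_release` is hypothetical, it assumes the image ships /etc/os-release, and it relies on the standard Debian numbering (10 = Buster, 11 = Bullseye, 12 = Bookworm).

```shell
#!/bin/sh
# Hypothetical helper (not part of Debmonitor): read os-release content on
# stdin and print "<VERSION_ID> <codename>" for the releases in this task.
debian_release() {
    # os-release stores the value as VERSION_ID="12"; strip key and quotes.
    version_id=$(sed -n 's/^VERSION_ID=//p' | tr -d '"')
    case "$version_id" in
        10) echo "10 buster" ;;
        11) echo "11 bullseye" ;;
        12) echo "12 bookworm" ;;
        *)  echo "${version_id:-unknown} unknown" ;;
    esac
}
```

Assuming the image contains a `cat` binary, it could be fed with something like `docker run --rm --entrypoint cat <image> /etc/os-release | debian_release`, where `<image>` is a placeholder for a reference from the registry.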

Core images:

  • envoy
  • envoy-future
  • cfssl-issuer
  • coredns
  • helm-state-metrics
  • echoserver (not really a core image, but folks use it for testing, etc.)

Mediawiki-related:

  • mcrouter
  • prometheus-mcrouter-exporter

Misc:

  • fluent-bit (Removed since not used anymore)
  • haproxy

I will add more as I review Debmonitor's report in more detail.

Details

Repo | Branch | Lines +/-
generated-data-platform/datasets/image-suggestions | main | +3 -2
operations/deployment-charts | master | +6 -0
operations/puppet | production | +2 -2
operations/deployment-charts | master | +6 -1
operations/deployment-charts | master | +67 -2
operations/deployment-charts | master | +8 -2
operations/puppet | production | +1 -2
operations/deployment-charts | master | +2 -0
operations/docker-images/production-images | master | +7 -1
operations/puppet | production | +1 -0
operations/deployment-charts | master | +0 -3
operations/deployment-charts | master | +1 -0
operations/deployment-charts | master | +2 -2
operations/deployment-charts | master | +10 -3
operations/deployment-charts | master | +1 -0
operations/deployment-charts | master | +1 -0
operations/docker-images/production-images | master | +19 -7
operations/docker-images/production-images | master | +10 -3
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +14 -2
operations/docker-images/production-images | master | +7 -1
operations/docker-images/production-images | master | +14 -2
operations/docker-images/production-images | master | +8 -2
operations/software/cfssl-issuer | main | +1 -1

Event Timeline

Dupe of T362981, I think.

We can keep T362981 as a subtask; the other images need to be migrated as well.

Change #1049577 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] coredns: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049577

Change #1049578 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] envoy: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049578

Change #1049586 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] helm-state-metrics: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049586

Change #1049587 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] mcrouter: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049587

Change #1049588 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] prometheus-exporters: upgrade mcrouter and statsd to Bookworm

https://gerrit.wikimedia.org/r/1049588

Change #1049590 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] service-checker: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049590

Change #1049591 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] nutcracker: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049591

Change #1049825 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/software/cfssl-issuer@main] Makefile: use 'go install' instead of 'go get'

https://gerrit.wikimedia.org/r/1049825

Change #1049828 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] echoserver: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049828

Change #1049825 merged by Elukey:

[operations/software/cfssl-issuer@main] Makefile: use 'go install' instead of 'go get'

https://gerrit.wikimedia.org/r/1049825

Change #1049838 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] cfssl-issuer: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049838

Change #1049577 merged by Elukey:

[operations/docker-images/production-images@master] coredns: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049577

Change #1049578 merged by Elukey:

[operations/docker-images/production-images@master] envoy: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049578

Change #1049586 merged by Elukey:

[operations/docker-images/production-images@master] helm-state-metrics: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049586

Change #1049588 merged by Elukey:

[operations/docker-images/production-images@master] prometheus-exporters: upgrade mcrouter and statsd to Bookworm

https://gerrit.wikimedia.org/r/1049588

Change #1049590 merged by Elukey:

[operations/docker-images/production-images@master] service-checker: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049590

Change #1049591 merged by Elukey:

[operations/docker-images/production-images@master] nutcracker: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049591

Change #1049828 merged by Elukey:

[operations/docker-images/production-images@master] echoserver: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049828

Change #1049838 merged by Elukey:

[operations/docker-images/production-images@master] cfssl-issuer: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049838

Change #1050568 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: upgrade coredns to 1.8.7-2

https://gerrit.wikimedia.org/r/1050568

Change #1050569 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: upgrade cfssl-issuer's Docker image

https://gerrit.wikimedia.org/r/1050569

Change #1050570 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] api,rest-gateway: upgrade Envoy version

https://gerrit.wikimedia.org/r/1050570

Change #1050571 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: update helm-state-metrics' Docker image version

https://gerrit.wikimedia.org/r/1050571

Change #1049587 merged by Elukey:

[operations/docker-images/production-images@master] mcrouter: upgrade to Bookworm

https://gerrit.wikimedia.org/r/1049587

Change #1050568 merged by jenkins-bot:

[operations/deployment-charts@master] admin_ng: upgrade coredns to 1.8.7-2

https://gerrit.wikimedia.org/r/1050568

Change #1050569 merged by Elukey:

[operations/deployment-charts@master] admin_ng: upgrade cfssl-issuer's Docker image

https://gerrit.wikimedia.org/r/1050569

Change #1051111 had a related patch set uploaded (by JMeybohm; author: JMeybohm):

[operations/deployment-charts@master] cfssl-issuer: Add container securityContext

https://gerrit.wikimedia.org/r/1051111

Change #1051111 merged by jenkins-bot:

[operations/deployment-charts@master] cfssl-issuer: Add container securityContext

https://gerrit.wikimedia.org/r/1051111

Change #1051132 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] admin_ng: remove coredns image tag override for ml-staging-codfw

https://gerrit.wikimedia.org/r/1051132

Change #1051132 merged by Elukey:

[operations/deployment-charts@master] admin_ng: remove coredns image tag override for ml-staging-codfw

https://gerrit.wikimedia.org/r/1051132

Change #1050570 merged by Elukey:

[operations/deployment-charts@master] api,rest-gateway: upgrade Envoy version

https://gerrit.wikimedia.org/r/1050570

Change #1050571 merged by Elukey:

[operations/deployment-charts@master] admin_ng: update helm-state-metrics' Docker image version

https://gerrit.wikimedia.org/r/1050571

elukey triaged this task as Medium priority.

Built and rolled out the images listed in the description to staging envs. The next step is to roll them out to production (all clusters).

Caveats:

  • the envoy image is spread across a ton of containers since we use it as the mesh sidecar, so rolling out the change will be interesting :D (maybe we can just let the next deployments pick it up over time).

Change #1051402 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/docker-images/production-images@master] wmfdebug: Upgrade to Bookworm

https://gerrit.wikimedia.org/r/1051402

Change #1051740 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mw-mcouter: use bookworm images

https://gerrit.wikimedia.org/r/1051740

Change #1052080 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::builder: add mcrouter uid for docker-pkg

https://gerrit.wikimedia.org/r/1052080

Change #1052080 merged by Elukey:

[operations/puppet@production] role::builder: add mcrouter uid for docker-pkg

https://gerrit.wikimedia.org/r/1052080

From the mcrouter side of things, we hope to have T346690 sorted soon, which will mean that mw-* pods will no longer have a mcrouter container and will use the mw-mcrouter daemonset instead. In other words, rollout and deployment will be faster.

Unless anything unexpected comes up, we may roll out the bookworm images sometime next week or the week after.

Change #1051402 merged by Elukey:

[operations/docker-images/production-images@master] wmfdebug: Upgrade to Bookworm

https://gerrit.wikimedia.org/r/1051402

Change #1052263 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/deployment-charts@master] services: upgrade mesh's envoy Docker version

https://gerrit.wikimedia.org/r/1052263

Change #1052263 merged by jenkins-bot:

[operations/deployment-charts@master] services: upgrade mesh's envoy Docker version

https://gerrit.wikimedia.org/r/1052263

Change #1052691 had a related patch set uploaded (by Elukey; author: Elukey):

[operations/puppet@production] role::deployment_server::kubernetes: update Envoy's version

https://gerrit.wikimedia.org/r/1052691

Change #1052691 merged by Elukey:

[operations/puppet@production] role::deployment_server::kubernetes: update Envoy's version

https://gerrit.wikimedia.org/r/1052691

In T368523 we’re seeing an “unable to get local issuer certificate” error that may or may not be related to the new Envoy version; it’s not very urgent (only affects a test wiki) but I’d be very thankful if someone could take a look :)

Edit: Further investigation shows the failure is not related to envoy after all.

Change #1054367 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-debug

https://gerrit.wikimedia.org/r/1054367

Change #1054368 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-debug

https://gerrit.wikimedia.org/r/1054368

Change #1054367 merged by jenkins-bot:

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-debug

https://gerrit.wikimedia.org/r/1054367

Change #1054368 merged by jenkins-bot:

[operations/deployment-charts@master] mcrouter: test bookworm image on mw-api-int

https://gerrit.wikimedia.org/r/1054368

Change #1054507 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/puppet@production] kubernetes: update mcrouter images to bookworm

https://gerrit.wikimedia.org/r/1054507

Change #1054511 had a related patch set uploaded (by Effie Mouzeli; author: Effie Mouzeli):

[operations/deployment-charts@master] mw-mcrouter: use bookworm images

https://gerrit.wikimedia.org/r/1054511

Change #1054511 merged by jenkins-bot:

[operations/deployment-charts@master] mw-mcrouter: use bookworm images

https://gerrit.wikimedia.org/r/1054511

Change #1054507 merged by Effie Mouzeli:

[operations/puppet@production] kubernetes: update mcrouter images to bookworm

https://gerrit.wikimedia.org/r/1054507

Jdforrester-WMF renamed this task from "Upgrade K8s docker images to running in production on Buster with either Bullseye or Bookworm" to "Upgrade K8s docker images running in Wikimedia production on Buster to either Bullseye or Bookworm". Wed, Jul 17, 7:19 PM

I used this horrible bash script to get a breakdown of image versions deployed on a given cluster:

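# Print, for every namespace, how many containers run each image.
for ns in $(kubectl get ns --no-headers | cut -d " " -f 1); do
  echo -e "\nnamespace: $ns\n"
  # Collect init and regular container images, one per line, then count them.
  kubectl get pods -n "$ns" -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" | tr -s '[:space:]' '\n' | sort | uniq -c
done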

This is still painful but keeping a note in here anyway :)
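
If the per-namespace split is not needed, the same jsonpath query can be run once for the whole cluster and the counting kept in a small function. This is a sketch rather than a tested production tool; the name `count_images` is hypothetical, and it only assumes image references arrive on stdin separated by whitespace.

```shell
#!/bin/sh
# Hypothetical helper: read whitespace-separated image references on stdin
# and print "<count> <image>" lines, most frequently used image first.
count_images() {
    tr -s '[:space:]' '\n' |
        grep -v '^$' |                 # drop empty lines from stray whitespace
        sort | uniq -c |               # count identical image references
        awk '{print $1, $2}' |         # normalise uniq's padded output
        sort -k1,1nr -k2,2             # highest count first, then by name
}
```

A cluster-wide run would then look like `kubectl get pods --all-namespaces -o jsonpath="{.items[*].spec['initContainers', 'containers'][*].image}" | count_images`.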

Nice! I keep a collection of horrors like that one at https://wikitech.wikimedia.org/wiki/Kubernetes/Kubectl/Cheat_Sheet please feel free to extend! :)

Change #1051740 abandoned by Effie Mouzeli:

[operations/deployment-charts@master] mw-mcouter: use bookworm images

Reason:

already done

https://gerrit.wikimedia.org/r/1051740

Change #1058588 had a related patch set uploaded (by Elukey; author: Elukey):

[generated-data-platform/datasets/image-suggestions@main] blubber: update build syntax and use Bookworm and Golang 1.21

https://gerrit.wikimedia.org/r/1058588

Output of docker-report (k8s images), which is set to work only with Bullseye+ images:

Jul 29 15:37:59 build2001 docker-report-k8s[4134263]: 2024-07-29 15:37:59,515 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/blubber-buildkit:v0.12.0. The image is not supported.
Jul 29 15:48:35 build2001 docker-report-k8s[4134263]: 2024-07-29 15:48:35,508 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/generated-data-platform-datasets-image-suggestions:stable. The image is not supported.
Jul 29 16:39:50 build2001 docker-report-k8s[4134263]: 2024-07-29 16:39:50,697 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-libs-shellbox:video. The image is not supported.
Jul 29 16:39:51 build2001 docker-report-k8s[4134263]: 2024-07-29 16:39:51,181 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-multiversion:protoprod. The image is not supported.
Jul 29 16:45:51 build2001 docker-report-k8s[4134263]: 2024-07-29 16:45:51,655 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-services-geoshapes:2021-03-04-093059-publish. The image is not supported.
Jul 29 16:46:30 build2001 docker-report-k8s[4134263]: 2024-07-29 16:46:30,345 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-services-kartotherian:kartotherian. The image is not supported.
Jul 29 16:57:27 build2001 docker-report-k8s[4134263]: 2024-07-29 16:57:27,261 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/mediawiki-webserver:production. The image is not supported.
Jul 29 17:01:18 build2001 docker-report-k8s[4134263]: 2024-07-29 17:01:18,096 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/research-mwaddlink:test. The image is not supported.
Jul 29 17:03:55 build2001 docker-report-k8s[4134263]: 2024-07-29 17:03:55,491 WARNING[docker-report] Unable to create a report for docker-registry.wikimedia.org/wikimedia/wikimedia-portals:2024-07-29-122629-production. The image is not supported.
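
To turn warnings like the ones above into a plain list of images that still need attention, the image references can be extracted from the log text. This is a minimal sketch that assumes the message format stays exactly as shown in these lines; the function name `unsupported_images` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read docker-report log lines on stdin and print the
# image references flagged as "not supported", one per line, deduplicated.
unsupported_images() {
    sed -n 's/.*Unable to create a report for \([^ ]*\)\. The image is not supported\..*/\1/p' |
        sort -u
}
```

On the build host this might be fed from the journal, for example `journalctl -t docker-report-k8s | unsupported_images`, assuming the syslog identifier matches the `docker-report-k8s` tag visible in the log lines.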