Data Engineering and Event Platform Team (Group)
Archived · Public

Details

Description

See Data-Engineering instead.

This was for the subteam Data Engineering and Event Platform, part of Data Platform Engineering.

Recent Activity

Tue, Jun 18

Maintenance_bot removed a project from T325315: Add support for redirects in CirrusSearch: Patch-For-Review.
Tue, Jun 18, 9:31 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Data Engineering and Event Platform Team, MW-1.41-notes (1.41.0-wmf.16; 2023-07-04), Data-Engineering, Event-Platform, Discovery-Search (Current work)
ReleaseTaggerBot added a project to T325315: Add support for redirects in CirrusSearch: MW-1.43-notes (1.43.0-wmf.11; 2024-06-25).
Tue, Jun 18, 9:00 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Data Engineering and Event Platform Team, MW-1.41-notes (1.41.0-wmf.16; 2023-07-04), Data-Engineering, Event-Platform, Discovery-Search (Current work)
gerritbot added a comment to T325315: Add support for redirects in CirrusSearch.

Change #1047166 merged by jenkins-bot:

[mediawiki/extensions/EventBus@master] Defend against RedirectTarget getPage returning null

https://gerrit.wikimedia.org/r/1047166

Tue, Jun 18, 8:56 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Data Engineering and Event Platform Team, MW-1.41-notes (1.41.0-wmf.16; 2023-07-04), Data-Engineering, Event-Platform, Discovery-Search (Current work)
gerritbot added a project to T325315: Add support for redirects in CirrusSearch: Patch-For-Review.
Tue, Jun 18, 7:26 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Data Engineering and Event Platform Team, MW-1.41-notes (1.41.0-wmf.16; 2023-07-04), Data-Engineering, Event-Platform, Discovery-Search (Current work)
gerritbot added a comment to T325315: Add support for redirects in CirrusSearch.

Change #1047166 had a related patch set uploaded (by Ottomata; author: Ottomata):

[mediawiki/extensions/EventBus@master] Defend against RedirectTarget getPage returning null

https://gerrit.wikimedia.org/r/1047166

Tue, Jun 18, 7:25 PM · MW-1.43-notes (1.43.0-wmf.11; 2024-06-25), Data Engineering and Event Platform Team, MW-1.41-notes (1.41.0-wmf.16; 2023-07-04), Data-Engineering, Event-Platform, Discovery-Search (Current work)

Feb 12 2024

lbowmaker closed T304926: CI/CD Pipeline Design, a subtask of T304409: [Airflow] Implement CI/CD pipelines for shared infrastructure., as Resolved.
Feb 12 2024, 2:40 PM · Epic, Data Engineering and Event Platform Team, Data Pipelines
lbowmaker closed T333006: [NEEDS GROOMING] Support migration of simple (Hive > Hive) jobs, a subtask of T332997: Support for Product Analytics Data Pipelines Migration to Airflow, as Resolved.
Feb 12 2024, 2:03 PM · Data Engineering and Event Platform Team, Data Pipelines, Epic
lbowmaker closed T304929: CI/CD Pipeline Implementation, a subtask of T304409: [Airflow] Implement CI/CD pipelines for shared infrastructure., as Declined.
Feb 12 2024, 2:01 PM · Epic, Data Engineering and Event Platform Team, Data Pipelines

Nov 29 2023

calbon moved T332953: Migrate PipelineLib repos to GitLab from Watching to 2023-2024 Q3 Done on the Machine-Learning-Team board.
Nov 29 2023, 2:19 PM · Wikimedia-Developer-Portal, cloud-services-team, Data Engineering and Event Platform Team, Data-Platform-SRE, API Platform, Patch-For-Review, [DEPRECATED] wdwb-tech, Wikidata, Security-Team, SRE, Wikidata-Campsite, Anti-Harassment, Wikispeech, Structured-Data-Backlog, Platform Engineering, Wikipedia-iOS-App-Backlog, Wikipedia-Android-App-Backlog, Editing-team, Content-Transform-Team, Metrics Platform Backlog, Machine-Learning-Team, GitLab (Project Migration), Release-Engineering-Team (Priority Backlog 📥)

Nov 22 2023

Maintenance_bot removed a project from T342252: Migrate rdf-streaming-updater to connect to mw-on-k8s: Patch-For-Review.
Nov 22 2023, 10:31 AM · Data Engineering and Event Platform Team, SRE, serviceops, MW-on-K8s
gerritbot added a comment to T342252: Migrate rdf-streaming-updater to connect to mw-on-k8s.

Change 941900 abandoned by Clément Goubert:

[operations/deployment-charts@master] mw-api-int: Raise number of replicas to 10

Reason:

We've gone way beyond 10 replicas by now

https://gerrit.wikimedia.org/r/941900

Nov 22 2023, 10:28 AM · Data Engineering and Event Platform Team, SRE, serviceops, MW-on-K8s

Nov 10 2023

Aklapper set the color for Data Engineering and Event Platform Team to Red.
Nov 10 2023, 3:55 PM
lbowmaker archived Data Engineering and Event Platform Team.
Nov 10 2023, 2:52 PM
lbowmaker moved T336513: Figure out a way to automatize deployment of the spark assembly file from Backlog to SRE on the Data-Platform board.
Nov 10 2023, 2:47 PM · Data-Engineering, Data-Platform-SRE, Data-Platform, Data Pipelines
lbowmaker moved T262201: Gather all data-purge into a single job from Data Products & Metrics to Icebox (not considered in current quarter) on the Data-Engineering board.
Nov 10 2023, 2:26 PM · Data Pipelines, Data-Engineering
lbowmaker moved T349640: [Event Platform] eventutilities-python should convert pyflink Instants to python DateTimes from Incoming (new tickets) to Event Platform Maintenance (current quarter) on the Data-Engineering board.
Nov 10 2023, 1:43 PM · Patch-For-Review, Data-Engineering, Event-Platform
lbowmaker moved T340471: [Iceberg Migration] P.O.C. on Iceberg sensor using Snapshot metadata to keep status of updates from Incoming (new tickets) to Apache Iceberg Migration on the Data-Engineering board.
Nov 10 2023, 1:32 PM · Data-Engineering
lbowmaker moved T340466: [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates from Incoming (new tickets) to Apache Iceberg Migration on the Data-Engineering board.
Nov 10 2023, 1:32 PM · Data-Platform-SRE, Data-Engineering
lbowmaker moved T340463: [Iceberg Migration] P.O.C. on Iceberg sensor using Iceberg table to keep status of updates from Incoming (new tickets) to Apache Iceberg Migration on the Data-Engineering board.
Nov 10 2023, 1:32 PM · Data-Engineering
lbowmaker moved T338065: [Iceberg Migration] Implement mechanism for automatic Iceberg data deletion and optimization from Incoming (new tickets) to Apache Iceberg Migration on the Data-Engineering board.
Nov 10 2023, 1:32 PM · Dumps 2.0 (Kanban Board), Data-Engineering
lbowmaker moved T347690: [Iceberg Migration] Migrate pageview tables to Iceberg from Incoming (new tickets) to Apache Iceberg Migration on the Data-Engineering board.
Nov 10 2023, 1:32 PM · Data-Engineering
lbowmaker moved T333013: [Iceberg Migration] Apache Iceberg Migration from Incoming (new tickets) to Apache Iceberg Migration on the Data-Engineering board.
Nov 10 2023, 1:32 PM · Data-Engineering, Epic
lbowmaker edited projects for T333013: [Iceberg Migration] Apache Iceberg Migration, added: Data-Engineering; removed Data Pipelines.
Nov 10 2023, 1:31 PM · Data-Engineering, Epic
lbowmaker edited projects for T338065: [Iceberg Migration] Implement mechanism for automatic Iceberg data deletion and optimization, added: Data-Engineering; removed Data Pipelines.
Nov 10 2023, 1:31 PM · Dumps 2.0 (Kanban Board), Data-Engineering
lbowmaker edited projects for T340466: [Iceberg Migration] P.O.C. on Iceberg sensor using Postgres table to keep status of updates, added: Data-Engineering; removed Data Pipelines.
Nov 10 2023, 1:31 PM · Data-Platform-SRE, Data-Engineering
lbowmaker added a project to T340463: [Iceberg Migration] P.O.C. on Iceberg sensor using Iceberg table to keep status of updates: Data-Engineering.
Nov 10 2023, 1:31 PM · Data-Engineering
lbowmaker edited projects for T340471: [Iceberg Migration] P.O.C. on Iceberg sensor using Snapshot metadata to keep status of updates, added: Data-Engineering; removed Data Pipelines.
Nov 10 2023, 1:31 PM · Data-Engineering
lbowmaker edited projects for T347690: [Iceberg Migration] Migrate pageview tables to Iceberg, added: Data-Engineering; removed Data Pipelines.
Nov 10 2023, 1:31 PM · Data-Engineering

Nov 9 2023

TBurmeister added a subtask for T350911: Redesign Data Platform docs on Wikitech: T350914: Create Data Platform landing page(s).
Nov 9 2023, 7:37 PM · Epic, Data-Engineering, Goal, Tech-Docs-Team
TBurmeister moved T350911: Redesign Data Platform docs on Wikitech from Data Eng Backlog to Radar (External Teams) on the Data Engineering and Event Platform Team board.
Nov 9 2023, 7:32 PM · Epic, Data-Engineering, Goal, Tech-Docs-Team
TBurmeister added a project to T350911: Redesign Data Platform docs on Wikitech: Data Engineering and Event Platform Team.
Nov 9 2023, 7:32 PM · Epic, Data-Engineering, Goal, Tech-Docs-Team
Ahoelzl added a project to T349472: Search dag image_suggestions_weekly failed waiting for analytics_platform_eng.image_suggestions_search_index_delta/snapshot=2023-10-09: Data Engineering and Event Platform Team.
Nov 9 2023, 3:56 PM · Data Engineering and Event Platform Team, Discovery-Search (Current work), Structured-Data-Backlog, Image-Suggestions

Nov 8 2023

brouberol added a comment to T304615: Airflow scheduler and webserver logs should be readable by airflow instance admins.

Same goes for the following groups:

  • analytics-research-admins
  • airflow-search-admins
  • airflow-analytics-product-admins
  • airflow-wmde-admins
Nov 8 2023, 2:34 PM · Data-Engineering, Data-Platform-SRE
brouberol added a comment to T304615: Airflow scheduler and webserver logs should be readable by airflow instance admins.

Is this ticket still needed? I see that admin groups such as analytics-platform-eng-admins can run sudo journalctl -u airflow-* (https://gerrit.wikimedia.org/r/plugins/gitiles/operations/puppet/+/production/modules/admin/data/data.yaml#1034)

Nov 8 2023, 2:32 PM · Data-Engineering, Data-Platform-SRE
BTullis closed T347938: Prepare and check storage layer for fonwiki as Resolved.

This is done now. Once again it took two executions of the cookbook, but the openstack error was slightly different this time.
That error was:

keystoneauth1.exceptions.connection.ConnectFailure: Unable to establish connection to https://openstack.eqiad1.wikimediacloud.org:29001/v2/zones/55dc12de-d948-4c52-9ebb-f90705b10864/recordsets?name=nawiktionary.web.db.svc.wikimedia.cloud.: ('Connection aborted.', RemoteDisconnected('Remote end closed connection without response'))

However, after the second run everything seems fine.

btullis@tools-sgebastion-10:~$ sql fonwiki
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
Nov 8 2023, 12:00 PM · Data-Platform-SRE, Data Engineering and Event Platform Team, Data-Services, DBA
Stashbot added a comment to T347938: Prepare and check storage layer for fonwiki.

Mentioned in SAL (#wikimedia-operations) [2023-11-08T11:04:48Z] <btullis@cumin1001> Added views for new wiki: fonwiki T347938

Nov 8 2023, 11:04 AM · Data-Platform-SRE, Data Engineering and Event Platform Team, Data-Services, DBA

Nov 7 2023

Ottomata created T350732: workflow_utils conda gitlab CI templates broken.
Nov 7 2023, 8:00 PM · Patch-For-Review, Data Engineering and Event Platform Team (Sprint 4), Data-Engineering
BTullis added a comment to T349910: Enable the TagManager plugin for Matomo.

Adding Security-Team to request their review.
Has your team any concerns about our enabling the TagManager plugin for Matomo?
Thanks.

Has the code changed significantly in the last 6 months or so? If not, this would likely only warrant a privacy review - I'll tag Privacy Engineering so it gets in their queue.

No, the TagManager plugin was already a part of the Matomo codebase when it was originally installed. Thanks, we will wait on the privacy review.

Nov 7 2023, 4:04 PM · Data-Engineering, SecTeam-Processed, Privacy Engineering, Data-Platform-SRE
JAllemandou added a comment to T350617: Use hive metastore when registering views.

The mediawiki-history job was built at a time when our Spark and Hive integration was poor because of mismatched Hive versions. We worked around the issue by reading the Parquet files directly instead of their Hive counterparts. We should now be able to read from Hive and avoid the problem of schema changes.

Nov 7 2023, 10:52 AM · Data-Engineering

Nov 6 2023

Maintenance_bot removed a project from T350489: Airflow DAG mediawiki_history_denormalize failed with NPE: Patch-For-Review.
Nov 6 2023, 7:10 PM · Data-Engineering
Milimetric created T350617: Use hive metastore when registering views.
Nov 6 2023, 6:58 PM · Data-Engineering
CodeReviewBot added a comment to T350489: Airflow DAG mediawiki_history_denormalize failed with NPE.

milimetric merged https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/535

Nov 6 2023, 6:37 PM · Data-Engineering
xcollazo assigned T350489: Airflow DAG mediawiki_history_denormalize failed with NPE to Milimetric.
Nov 6 2023, 5:32 PM · Data-Engineering
gerritbot added a comment to T350489: Airflow DAG mediawiki_history_denormalize failed with NPE.

Change 971978 merged by Milimetric:

[analytics/refinery/source@master] Update project namespace map view

https://gerrit.wikimedia.org/r/971978

Nov 6 2023, 4:21 PM · Data-Engineering
CodeReviewBot added a comment to T350489: Airflow DAG mediawiki_history_denormalize failed with NPE.

milimetric opened https://gitlab.wikimedia.org/repos/data-engineering/airflow-dags/-/merge_requests/535

Nov 6 2023, 4:02 PM · Data-Engineering
Gehel moved T346046: [Search Update Pipeline] Source streams for private wikis from Incoming to Blocked/Waiting on the Discovery-Search (Current work) board.
Nov 6 2023, 3:57 PM · Patch-For-Review, Discovery-Search (Current work), Data-Engineering, CirrusSearch
gerritbot added a project to T350489: Airflow DAG mediawiki_history_denormalize failed with NPE: Patch-For-Review.
Nov 6 2023, 3:24 PM · Data-Engineering
gerritbot added a comment to T350489: Airflow DAG mediawiki_history_denormalize failed with NPE.

Change 971978 had a related patch set uploaded (by Milimetric; author: Milimetric):

[analytics/refinery/source@master] Update project namespace map view

https://gerrit.wikimedia.org/r/971978

Nov 6 2023, 3:24 PM · Data-Engineering
xcollazo added a comment to T350489: Airflow DAG mediawiki_history_denormalize failed with NPE.

Perhaps related to recent changes on mediawikihistory to pick up more info from Dumps?

CC @Milimetric

Nov 6 2023, 3:13 PM · Data-Engineering
gerritbot added a comment to T286344: Remove StreamConfig::INTERNAL_SETTINGS logic from EventStreamConfig and do it in EventLogging client instead.

Change 910471 abandoned by Ottomata:

[operations/deployment-charts@master] Remove deprecated all_settings streamconfigs param

Reason:

Done in a different patch

https://gerrit.wikimedia.org/r/910471

Nov 6 2023, 2:47 PM · Data Engineering and Event Platform Team, Data-Engineering, Metrics Platform Backlog (Metrics Platform Kanban), MW-1.41-notes (1.41.0-wmf.2; 2023-03-27), MW-1.40-notes (1.40.0-wmf.24; 2023-02-20), Patch-For-Review, Event-Platform