Wikimedia Performance Team

2022-2023

Status

  • 🔴 < 30% done
  • 🟡 < 70% done
  • 🟢 70 to 100% done

Q4 (April-June)

Goal | Status | Assignees | Notes
ResourceLoader: Implement support for Source Maps | 🟡 | Tim & Timo |
Create a runbook for Save Timing alerts | 🔴 | Aaron |
Run ForeignResourceManager verification on MediaWiki core commits | 🟢 | Timo |
Per-component/extension profiling of hooks and pre-send DeferredUpdates, with Grafana dashboards | 🟡 | Aaron |
Raise Grade A JavaScript requirement from ES5 (2009) to ES6 (2015) | 🟢 | Timo |
Move synthetic tests from AWS to bare metal | 🟢 | Peter |
Record long tasks in navtiming | 🟢 | Barakat |
Reduce complexity of LoadBalancer (LB) and LBFactory (LBF) | 🟡 | Amir, Aaron, Timo |
Increase retention of ArcLamp SVGs to 2 years | 🟢 | Timo |
Improve documentation of frontend (FE) and backend (BE) guidelines and best practices | 🟢 | Timo, Peter, Aaron |
Support serving production traffic via Kubernetes | 🟢 | Timo |
LoadMonitor connection weighting reimagined | 🔴 | Aaron, Tim |
Decommission coal and coal web | 🟢 | |
Reliably measure how fast a Wikipedia article would be without JavaScript | 🔴 | Peter |

Q3 (Jan-Mar)

Goal | Status | Assignees | Notes
Onboard two new team members | 🟢 | Aaron, Peter, Tim, Timo, Larissa | Two new team members join us at the end of February.
Synthetic testing on bare metal | 🟢 | Peter | In Q2 we evaluated in-house and external suppliers and chose Hetzner. The server is already available and accounted for in our budget.
Write blog posts about Multi-DC | 🟢 | Aaron | We plan to write two blog posts about Multi-DC: one for a non-technical audience and one for a technical audience.
Navtiming on Prometheus | 🟢 | Peter & Timo | Blocker: the Prometheus Python client turned out to be a bottleneck in our non-parallelized setup. We are currently exploring ways to reduce cardinality.
Add per-request flamegraph option to WikimediaDebug | 🟢 | Tim & Timo |
Create a runbook for Save Timing alerts | 🔴 | Aaron |
LoadMonitor connection weighting improvements | 🟢 | Aaron |
ResourceLoader: Implement support for Source Maps | 🟡 | Tim & Timo |
Per-component/extension profiling of hooks and pre-send DeferredUpdates, with Grafana dashboards | 🟢 | Aaron |
Support for various teams outside the performance scope | 🟡 | Tim |

Q2 (Oct-Dec)

Goal | Status | Assignees | Notes
Navtiming on Prometheus | 🟡 | Peter & Timo |
Expand navigation timing metrics to include user experience metrics, and modernise navigation timing | 🟢 | Peter & Timo |
Move synthetic tests to bare metal | 🔴 | Larissa & Peter | Moved to Q3.
Understand the status of SLOs on the Product side | 🟢 | Larissa | We have been talking with Suman and Desiree to restart the discussions.
BitBar: Add Firefox capabilities | 🟢 | Peter |
Multi-DC BagOStuff interfaces | | Aaron |
PHP 8.1 | 🟢 | Tim & Timo | Done from our side. Waiting on Service Ops, who are prioritizing mediawiki-on-k8s until the end of Q3 but will be able to tackle PHP 8.1 at the beginning of Q4 2022/2023.
Cross-DC query alerts | 🟢 | Aaron | The rate has not exceeded 1/sec in the last 7 days, so we consider this done.
Better LoadBalancer connection pooling | 🟢 | Aaron |
Update how we measure layout shift to reflect the Cumulative Layout Shift (CLS) metric | 🟢 | Peter |
Find someone to run user interviews | 🟡 | Larissa | Neither Desiree nor Marshal can help us at this time. Marshal suggested I run a couple of interviews on my own first, but we currently lack the bandwidth to develop a solid interview script and do the necessary pre-work.


2021-2022

See also internal 2021-2022 roadmap and internal Jan-Mar 2022 achievements.

Outreach:

  • Support product development by Inuka Team (Wikipedia Preview), Reading Web (NearbyPages and RelatedArticles), CPT (WebAuthn), Design Systems Team (WVUI/Vue.js), and WMDE (Kartographer-revid).
  • Participate in the SLO working group to help establish an SLO for MediaWiki Save Timing.
  • Participate in the W3C WebPerf WG, and provide feedback to the Chrome team on Google Web Vitals and Chrome bugs.
  • Organise four Web Perf Hero awards.

Insights:

  • Migrate our device lab to BitBar.
  • Evaluate and build proof-of-concept synthetic testing on bare metal instead of at AWS.
  • Write runbooks for investigating RUM alerts, WPT alerts, and WPR alerts.
  • Support SRE Observability in developing a new Prometheus-compatible MW-Stats client library.
  • On-going maintenance of WebPageTest, WebPageReplay, and Fresh-node.

Improvement:

  • Multi-DC: Deploy MainStash DB and migrate away from Redis-based MainStash (T212129).
  • Multi-DC: MariaDB-TLS tested and enabled for all wikis.
  • Multi-DC: CDN routing logic written and deployed to Beta and Prod behind feature flag.
  • ResourceLoader debug mode v2, reducing wait time on complex pages from ~1 minute to ~1 second.
  • Guidance and code review for DBA-led normalization of "templatelinks" MediaWiki database table, to reduce storage pressure and improve query performance. (T299417)
  • Support SRE ServiceOps with the MW-on-K8s project.
  • Develop precache-based GlobalUserEdit API for CentralAuth, following an incident.

2020-2021

See also internal 2020-2021 roadmap.

Outreach:

Insights:

  • Expand navtiming RUM metrics pipeline with new Layout Shift metric.
  • Set up Kobiton for our device lab, expanding it to include iOS in addition to Android.
  • Explore BitBar for our device lab.
  • Explore moving WPT/WPR infra away from AWS.

Improvement:

  • Multi-DC: Implement Multi-DC strategy for ChronologyProtector (T254634).
  • Multi-DC: Determine and start implementing strategy for MainStash DB (T212129).

2019-2020

See also 2019-20 Q1#Performance and internal 2019-2020 roadmap.

  • Outreach:
    • Support product launches by Parsing Team (Parsoid-PHP launch), Editing Team (DiscussionTools launch), Growth Team (GrowthExperiments launch), and Inuka Team (Wikipedia KaiOS app launch).
    • Support RelEng around establishing production error triage workflows and semi-automation thereof.
    • Organise the first Web Performance conference at FOSDEM (blogpost, recordings).
    • Organise WMF-wide frontend web performance training.
    • Provide performance expertise to Frontend Architecture Working Group (FAWG).
    • Get published in the Web Performance Calendar (2x: Measuring LT and FID, Big questions on RUM)
  • Insights:
    • Organise and oversee implementation of First Paint metric in WebKit for Apple Safari (blogpost).
    • Introduce detailed metrics from WANCache time spans for MediaWiki developers (T197849).
    • Explore new RUM metrics for navtiming pipeline, such as First Input Delay.
    • Participate in Chrome Origin trial for Element Timing and provide feedback on upcoming W3C standard (blogpost).
    • Release WikimediaDebug v2 (blogpost).
    • Create our own Mobile Device Lab.
    • On-going maintenance of WebPageTest, WebPageReplay, and XHGui (Migrate from Mongo to MySQL).
  • Improvements:
    • PHP7 Transition: Finish the transition from HHVM and support SRE with instrumentation, sampling, and benchmarking.
    • Multi-DC: Start work on MainStash DB.
    • Reduce MediaWiki backend startup time to recover from the PHP7 latency increase in certain areas (T233886, T189966).
    • Reduce frontend page startup cost in ResourceLoader (blogpost).

2018-2019

See also 2018-19 Q1, 2018-19 Q2, and internal 2018-2019 roadmap.

Insights:

  • Annual Plans/FY2019/TEC1: Current levels of service are maintained and/or improved.
  • Expand synthetic testing to more non-English wikis.
  • Introduce Excimer, a sampling profiler for PHP 7, to replace HHVM Xenon (T176916).
  • Introduce Fresnel, for performance testing in MediaWiki CI jobs (T133646).
  • Research, develop, and test new RUM metrics that better match user perception (T187299, Rossi 2019 paper).

Outreach:

Improvement:

  • Annual Plans/FY2019/TEC1: Improve MediaWiki availability and reduce read-only impact from data center switchovers.
  • Annual Plans/FY2019/TEC4: PHP7 Migration: Guide the work and support other teams.
  • Introduce support for packageFiles to ResourceLoader (T133462).
  • Introduce support for WebP compression format to Thumbor.
  • Reduce page load time by refactoring the startup module to need only one roundtrip instead of two (T192623).
  • Guidance, code review, and testing for the new AbuseFilter parser (development by Daimona) to improve Save Timing (T156095).

2017-2018

See also Annual Plan/2017-2018#Technology, 2017-18 Q3, 2017-18 Q4, and internal 2017-2019 roadmap.

Outreach:

  • Measure performance from Asia both before and after the Singapore data center comes online, including adding geographic oversampling capability to navtiming.
  • Publish in the Web Performance Calendar (Automate performance regression alerts).

Insights:

  • Program 1. Availability, performance, and maintenance.
    • All production sites and services maintain current levels of availability or better.
    • Maintain a comprehensive toolset to measure the performance of our platforms.
  • Enhance performance testing infrastructure using the Chrome Tracelog (T182510).
  • Review current research on performance perception (T165272).
  • Build sampling profiler for PHP 7 to replace HHVM Xenon (T176916). Includes creation of the new php-excimer extension.
  • Implement a new "Backend-Timing" metric on Apache PHP web servers as the first full measurement of MediaWiki latencies, backed by Prometheus (T131894).
  • Develop new "navtiming2" metric definitions, addressing what we learned since 2015, and enable use of stacked graphs (T104902).
  • Migrate WebPageTest hosting from Windows to Linux.

Improvement:

  • Support for HHVM-PHP7 migration and upgrade.
  • Expand support in Thumbor to private wikis.
  • Program 8. Multi-datacenter support.

2016-2017

See Annual Plan/2016-2017, "Program 4: Improve site performance", on Meta-Wiki.


This article is issued from MediaWiki. The text is licensed under Creative Commons - Attribution - Sharealike. Additional terms may apply for the media files.