< Wikimedia Performance Team
2022/2023
Status
- 🔴 < 30% done
- 🟡 < 70% done
- 🟢 70 to 100% done
Q4 (April-June)
Goals | Status | Assignees | Notes |
---|---|---|---|
ResourceLoader: Implement support for Source Maps | 🟡 | Tim & Timo | |
Create a runbook for Save Timing alerts | 🔴 | Aaron | |
Run ForeignResourceManager verification on MediaWiki core commits | 🟢 | Timo | |
Per component/extension profiling of hooks and pre-send DeferredUpdates with Grafana dashboards | 🟡 | Aaron | |
Raise Grade A JavaScript requirement from ES5 (2009) to ES6 (2015) | 🟢 | Timo | |
Move synthetic tests from AWS to bare metal | 🟢 | Peter | |
Record long tasks in navtiming | 🟢 | Barakat | |
Reduce complexity of LB and LBF | 🟡 | Amir, Aaron, Timo | |
Increase retention of ArcLamp SVGs to 2 years | 🟢 | Timo | |
Documentation improvement for FE and BE guidelines and best practices | 🟢 | Timo, Peter, Aaron | |
Support serving production traffic via Kubernetes | 🟢 | Timo | |
LoadMonitor connection weighting reimagined | 🔴 | Aaron, Tim | |
Decommission coal and coal web | 🟢 | | |
Reliably measure how fast a Wikipedia article would be without JavaScript | 🔴 | Peter | |
Q3 (Jan-Mar)
Goals | Status | Assignees | Notes |
---|---|---|---|
Onboard | 🟢 | Aaron, Peter, Tim, Timo, Larissa | Two new team members join us at the end of February |
Synthetic testing on bare metal | 🟢 | Peter | In Q2 we evaluated in-house and external suppliers. We ended up choosing Hetzner. The server is already available and accounted for in our budget |
Create blog entries about Multi-DC | 🟢 | Aaron | We plan to write two blog posts about Multi-DC: one for a non-technical audience and one for a technical audience. |
Navtiming on Prometheus | 🟢 | Peter & Timo | Blocker: the Prometheus Python client turned out to be a bottleneck in our non-parallelized setup. We are currently exploring ways to reduce the cardinality. |
Add per-request flamegraph option to WikimediaDebug | 🟢 | Tim & Timo | |
Create a runbook for Save Timing alerts | 🔴 | Aaron | |
LoadMonitor connection weighting improvements | 🟢 | Aaron | |
ResourceLoader: Implement support for Source Maps | 🟡 | Tim & Timo | |
Per component/extension profiling of hooks and pre-send DeferredUpdates with Grafana dashboards | 🟢 | Aaron | |
Out of perf scope support for various teams | 🟡 | Tim | |
Q2 (Oct-Dec)
Goals | Status | Assignees | Notes |
---|---|---|---|
Navtiming on Prometheus | 🟡 | Peter & Timo | |
Expand navigation timing metrics to include user experience metrics and modernise navigation timing | 🟢 | Peter & Timo | |
Move Synthetic tests to bare metal | 🔴 | Larissa & Peter | Moved to Q3 |
Understand the status of SLOs on Product side | 🟢 | Larissa | Have been talking with Suman and Desiree trying to restart the discussions |
Bitbar: Add firefox capabilities | 🟢 | Peter | |
Multi-DC BagOStuff interfaces | | Aaron | |
PHP 8.1 | 🟢 | Tim & Timo | Done from our side. Waiting on Service Ops. They are prioritizing mediawiki-on-k8s until the end of Q3, but will be able to tackle PHP 8.1 at the beginning of Q4 2022/2023 |
Cross-DC query Alerts | 🟢 | Aaron | We haven't seen the rate exceed 1/sec in the last 7 days, so we consider this done |
Better LoadBalancer connection pooling | 🟢 | Aaron | |
Update how we measure Layout Shift to reflect the CLS metric | 🟢 | Peter | |
Find someone to run user interviews | 🟡 | Larissa | Neither Desiree nor Marshal can help us at this time. Marshal suggested I run a couple of interviews on my own first, but we currently don't have the bandwidth to come up with a solid interview script and do the necessary pre-work |
2021-2022
See also internal 2021-2022 roadmap and internal Jan-Mar 2022 achievements.
Outreach:
- Support product development by Inuka Team (Wikipedia Preview), Reading Web (NearbyPages and RelatedArticles), CPT (WebAuthn), Design Systems Team (WVUI/Vue.js), and WMDE (Kartographer-revid).
- Participate in SLO working group to help establish an SLO around MediaWiki Save Timing.
- Participate in W3C WebPerf WG, provide feedback to Chrome team on Google Web Vitals and Chrome bugs.
- Organise four Web Perf Hero awards.
Insights:
- Migrate our device lab to BitBar.
- Evaluate and build proof-of-concept synthetic testing on bare metal instead of at AWS.
- Write runbooks for investigating RUM alerts, WPT alerts, and WPR alerts.
- Support SRE Observability in developing a new Prometheus-compatible MW-Stats client library.
- On-going maintenance of WebPageTest, WebPageReplay, and Fresh-node.
Improvement:
- Multi-DC: Deploy MainStash DB and migrate away from Redis-based MainStash (T212129).
- Multi-DC: MariaDB-TLS tested and enabled for all wikis.
- Multi-DC: CDN routing logic written and deployed to Beta and Prod behind feature flag.
- ResourceLoader debug mode v2, reduce wait time on complex pages from ~1 minute to ~1 second.
- Guidance and code review for DBA-led normalization of "templatelinks" MediaWiki database table, to reduce storage pressure and improve query performance. (T299417)
- Support to SRE ServiceOps for MW-on-K8s project.
- Develop precache-based GlobalUserEdit API for CentralAuth, following an incident.
2020-2021
See also internal 2020-2021 roadmap.
Outreach:
- Support product launches by Anti-Harassment Team (IPInfo extension) and CPT (API Portal skin, API Portal OAuth extension, changes to OAuth extension).
- Support development kick-off of Abstract Wikipedia (WikiLambda) through early check-in and 1-month team residency/matrixing in both directions.
- Organise the Web Performance devroom for FOSDEM 2021 (recordings).
- Organise the first Web Perf Hero award.
- Speak at the We Love Speed conference (recording).
- Get published in the Web Performance Calendar (4x: Human performance metrics, Profiling PHP at scale, Future of Web Vitals from a non-Googler, Setting up a device lab).
- Enable teams to create their own production error dashboards in Logstash with a template, written guide, and video presentation.
Insights:
- Expand navtiming RUM metrics pipeline with new Layout Shift metric.
- Kobiton setup for our device lab, expand to include iOS in addition to Android.
- Explore BitBar for our device lab.
- Explore moving WPT/WPR infra away from AWS.
Improvement:
2019-2020
See also 2019-20 Q1#Performance and internal 2019-2020 roadmap.
Outreach:
- Support product launches by Parsing Team (Parsoid-PHP launch), Editing Team (DiscussionTools launch), Growth Team (GrowthExperiments launch), and Inuka Team (Wikipedia KaiOS app launch).
- Support RelEng around establishing production error triage workflows and semi-automation thereof.
- Organise the first Web Performance conference at FOSDEM (blogpost, recordings).
- Organise WMF-wide frontend web performance training.
- Provide performance expertise to Frontend Architecture Working Group (FAWG).
- Get published in the Web Performance Calendar (2x: Measuring LT and FID, Big questions on RUM).
Insights:
- Organise and oversee implementation of First Paint metric in WebKit for Apple Safari (blogpost).
- Introduce detailed metrics from WANCache time spans for MediaWiki developers (T197849).
- Explore new RUM metrics for navtiming pipeline, such as First Input Delay.
- Participate in Chrome Origin trial for Element Timing and provide feedback on upcoming W3C standard (blogpost).
- Release WikimediaDebug v2 (blogpost).
- Create our own Mobile Device Lab.
- On-going maintenance of WebPageTest, WebPageReplay, and XHGui (Migrate from Mongo to MySQL).
Improvement:
- PHP7 Transition: Finish the transition from HHVM and support SRE with instrumentation, sampling, and benchmarking.
- Multi-DC: Start work on MainStash DB.
- Reduce MediaWiki backend startup time to reclaim PHP7 latency increase in certain areas (T233886, T189966).
- Reduce frontend page startup cost in ResourceLoader (blogpost).
2018-2019
See also 2018-19 Q1, 2018-19 Q2, and internal 2018-2019 roadmap.
Insights:
- Annual Plans/FY2019/TEC1: Current levels of service are maintained and/or improved.
- Expand synthetic testing to more non-English wikis.
- Introduce Excimer, sampling profiler for PHP 7 to replace HHVM Xenon (T176916).
- Introduce Fresnel, performance testing in MediaWiki CI jobs (T133646).
- Research, develop, and test new RUM metrics that better match user perception (T187299, Rossi 2019 paper).
Outreach:
- Design and implement the AS Report, to expand and formalize collaborations to leverage our influence with browser vendors and ISPs. (Announcement on Techblog).
- Initiate and work on Wikimedia Foundation becoming an official W3C member organization. This expands the Performance Team's participation in web standards and moves us from an "invited expert" (individual) to a represented membership organisation. (Announcement on wikimediafoundation.org)
- Publish the first post in the Perf Matters at Wikipedia series.
- Get published in the Web Performance Calendar (5x: Magic numbers, Comparing HAR, Measuring Wikipedia, Why perf matters, AVIF).
Improvement:
- Annual Plans/FY2019/TEC1: Improve MediaWiki availability and reduce read-only impact from data center switchovers.
- Annual Plans/FY2019/TEC4: PHP7 Migration: Guide the work and support other teams.
- Introduce support for packageFiles to ResourceLoader (T133462).
- Introduce support for WebP compression format to Thumbor.
- Reduce page load time by refactoring the startup module to need only one roundtrip instead of two (T192623).
- Guidance, CR and testing for new AbuseFilter parser (development by Daimona) to improve Save Timing (T156095).
2017-2018
See also Annual Plan/2017-2018#Technology, 2017-18 Q3, 2017-18 Q4, and internal 2017-2019 roadmap.
Outreach:
- Measure performance from Asia both before and after the Singapore data center comes online. Includes adding a capability to navtiming for geographic oversampling.
- Publish in the Web Performance Calendar (Automate performance regression alerts).
Insights:
- Program 1. Availability, performance, and maintenance.
- All production sites and services maintain current levels of availability or better.
- Maintain a comprehensive toolset to measure the performance of our platforms.
- Enhance performance testing infrastructure using the Chrome Tracelog (T182510).
- Review current research on performance perception (T165272).
- Build sampling profiler for PHP 7 to replace HHVM Xenon (T176916). Includes creation of the new php-excimer extension.
- Implement new "Backend-Timing" metric on Apache PHP web servers, as first full measurement of MediaWiki latencies. Backed by Prometheus. (T131894)
- Develop new "navtiming2" metric definitions, addressing what we learned since 2015, and enable use of stacked graphs (T104902).
- Migrate WebPageTest hosting from Windows to Linux.
Improvement:
- Support for HHVM-PHP7 migration and upgrade.
- Expand support in Thumbor to private wikis.
- Program 8. Multi-datacenter support.
2016-2017
See Annual Plan/2016-2017 Program 4: Improve site performance on Meta-Wiki.