This page documents the internals of the MachineVision extension, with a focus on the backend (PHP) logic.
Overview
When a new image is uploaded to Wikimedia Commons, the MachineVision extension triggers a delayed job request to ensure that the image is still present (i.e., not deleted), and if so, request and store image label suggestions generated by one or more machine vision labeling providers. These label suggestions are then filtered and served to reviewers on the Special:SuggestedTags page on Commons. Accepted label suggestions are saved to the image's structured data as depicts (P180) statements.
In addition to new uploads, lists of image file page titles may be passed to the maintenance script fetchSuggestions.php to have label suggestions retrieved and stored on demand.
When label suggestions are received for an image, an Echo event is fired to notify the uploader that image label suggestions are available for review, according to the uploader's notification preferences.
The extension is designed to support arbitrary machine vision providers (including issuing requests to multiple providers simultaneously), but the only provider for which support is currently implemented is Google Cloud Vision.
Concepts
Image
Images are recorded by their SHA1 hash in the machine_vision_image table. This means that if an uploaded image file is identical to one for which a record already exists in the database, it is treated as the same image for the MachineVision extension's purposes, and labels will not be requested again.
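The hash-based deduplication described above can be illustrated with a minimal Python sketch. This is not the extension's PHP code: the in-memory set is a stand-in for the machine_vision_image table, and `needs_labels` is a hypothetical helper name.

```python
import hashlib

# Stand-in for the machine_vision_image table: SHA1 hashes already recorded.
known_image_hashes = set()


def needs_labels(file_bytes: bytes) -> bool:
    """Return True only the first time a given file content is seen."""
    digest = hashlib.sha1(file_bytes).hexdigest()
    if digest in known_image_hashes:
        # Identical content was already recorded, so no new request is made.
        return False
    known_image_hashes.add(digest)
    return True
```

Under this sketch, re-uploading byte-identical content under a different title would not trigger a second labeling request.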
The extension only handles bitmap images and disregards all other file types.
Label
A label is stored as a Wikidata item ID (Q-number) in the machine_vision_label table. Human-readable labels associated with the item ID are fetched from Wikidata at the point of presentation to the end-user. A label will be associated with an image no more than once, even if the label is subsequently suggested by a different machine vision provider.
The distinction between labels and suggestions (below) is to ensure that a suggested label does not receive more than one set of votes (which may be inconsistent); labels should only be voted upon once regardless of the number of times they are suggested by different providers.
Suggestion
A suggestion (stored in the machine_vision_suggestion table) refers to a single instance of a label being suggested for an image. There may be more than one suggestion that refers to an image-label pair, that is, one for each provider that suggests the same label for a given image.
As of September 2020, since there has only ever been one provider configured (namely Google Cloud Vision), there should be a one-to-one relationship between labels and suggestions in practice.
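The relationship between labels and suggestions can be sketched as follows. This is an illustrative Python model only: the dictionary and set stand in for the machine_vision_label and machine_vision_suggestion tables, and the function name, provider names, and key layout are assumptions rather than the extension's actual schema.

```python
# (image_sha1, wikidata_id) -> review state; one entry per image-label pair.
labels = {}
# (image_sha1, wikidata_id, provider); one entry per provider that suggested it.
suggestions = set()


def record_suggestion(image_sha1: str, wikidata_id: str, provider: str) -> None:
    """Record a suggestion, creating the label only if it does not exist yet."""
    # setdefault keeps any existing review state, so votes are never duplicated.
    labels.setdefault((image_sha1, wikidata_id), 0)
    suggestions.add((image_sha1, wikidata_id, provider))
```

If two providers suggested the same item for the same image, this model would hold two suggestions but a single label, and therefore a single review state.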
Waiting period
A waiting period is enforced between upload time and the submission of an image to a machine vision provider for label suggestions. This is to reduce the likelihood of making a labeling request for an image that is soon to be deleted. As of September 2020, the waiting period is 48 hours. This value is configured in $wgMachineVisionNewUploadLabelingJobDelay.
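As a rough illustration of the waiting period, the following Python sketch computes the earliest time at which a labeling request could be made. It assumes the delay is expressed in seconds; the constant and function names are hypothetical and are not taken from the extension.

```python
from datetime import datetime, timedelta, timezone

# Assumption: the configured delay is a number of seconds (48 hours here).
NEW_UPLOAD_LABELING_JOB_DELAY = 48 * 60 * 60


def earliest_labeling_time(upload_time: datetime,
                           delay_seconds: int = NEW_UPLOAD_LABELING_JOB_DELAY) -> datetime:
    """Return the upload time shifted by the configured waiting period."""
    return upload_time + timedelta(seconds=delay_seconds)


uploaded = datetime(2020, 9, 1, 12, 0, tzinfo=timezone.utc)
release = earliest_labeling_time(uploaded)  # 2020-09-03 12:00 UTC
```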
Review state
Review state is a critical concept in the MachineVision extension, because it governs which images are presented on Special:SuggestedTags, and to which audiences. It is important to note that, in the extension's internal logic, review states apply to labels rather than to images (on which see "Review state and data model" under "Quirks and gotchas" below). The review states are represented as integers, with a default state of 0 (unreviewed). Possible states include the following:
- Unreviewed (0): The default label review state. The label may be presented in either the "popular" or "personal uploads" tab on Special:SuggestedTags.
- Accepted (1): The label was accepted by a contributor. A corresponding depicts statement should have been created, and the label should no longer appear on Special:SuggestedTags.
- Rejected (-1): The label was rejected by a contributor. It should no longer appear on Special:SuggestedTags.
- Withhold from "popular" (-2): The initial review state for a label which is unreviewed but should be withheld from the "popular" tab and only shown to its uploader in the "personal uploads" tab. A label may receive this review state based on the SafeSearch ratings of the image to which it pertains.
- Withhold from all (-3): The review state for a label pertaining to an image which should be withheld completely from Special:SuggestedTags.
- Not displayed (-4): A special review state assigned to labels when an attempt to display them fails because a human-readable label could not be found in the requested language. This results in the label no longer being shown on Special:SuggestedTags.
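The states listed above can be summarized in a small Python sketch. The integer values come from the list; the member names (other than WITHHOLD_ALL, which appears later on this page) and the `tabs_for` helper are illustrative assumptions, not identifiers from the extension.

```python
from enum import IntEnum


class ReviewState(IntEnum):
    UNREVIEWED = 0
    ACCEPTED = 1
    REJECTED = -1
    WITHHOLD_POPULAR = -2
    WITHHOLD_ALL = -3
    NOT_DISPLAYED = -4


def tabs_for(state: ReviewState) -> set:
    """Return the Special:SuggestedTags tabs in which a label may appear."""
    if state == ReviewState.UNREVIEWED:
        return {"popular", "personal uploads"}
    if state == ReviewState.WITHHOLD_POPULAR:
        # Still unreviewed, but only the uploader may see it.
        return {"personal uploads"}
    # Accepted, rejected, withheld-from-all, and not-displayed labels are hidden.
    return set()
```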
Concept mapping
To interpret suggested labels from Google Cloud Vision as Wikidata item IDs, we rely on a publicly available historical mapping between Freebase IDs and Wikidata item IDs. We take advantage of the fact that many Google entity IDs originated as Freebase IDs and have changed only in their format; for example, the Freebase ID m.123 would correspond to the Google entity ID /m/123. As part of the extension setup, these mappings must be retrieved from their public archive and loaded into the machine_vision_freebase_mapping table.
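The format conversion described above (m.123 versus /m/123) can be sketched in Python as follows. The function names are hypothetical, the dictionary stands in for the machine_vision_freebase_mapping table, and the Q-number in it is a placeholder rather than real mapping data.

```python
def freebase_to_google_id(freebase_id: str) -> str:
    """Convert a Freebase-style ID such as 'm.123' to '/m/123'."""
    return "/" + freebase_id.replace(".", "/", 1)


def google_to_freebase_id(google_id: str) -> str:
    """Convert a Google entity ID such as '/m/123' to 'm.123'."""
    return google_id.lstrip("/").replace("/", ".", 1)


# Placeholder mapping data: Freebase ID -> list of Wikidata item IDs.
freebase_mapping = {"m.123": ["Q1"]}


def wikidata_ids_for(google_id: str) -> list:
    """Look up Wikidata item IDs for a Google entity ID; empty if unmapped."""
    return freebase_mapping.get(google_to_freebase_id(google_id), [])
```

An entity ID with no entry in the mapping yields an empty list here, which corresponds to a label that cannot be interpreted as a Wikidata item.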
A drawback to the current setup is that these mappings date from 2013 (some years after Google's 2010 acquisition of Freebase) and are naturally becoming outdated over time, as new concepts are added by Google, and Wikidata items are added, updated, and deleted (see also "Redirects and deletions" under "Quirks and gotchas" below). Task T231105 has been filed to create a strategy for keeping our concept mappings up to date.
Priority
Images that belong to certain target classes are shown in the "popular" tab on Special:SuggestedTags in preference to general user uploads. To support this, images are assigned a numeric priority value. This value is stored in the machine_vision_image table as mvi_priority.
As of December 2020, images are prioritized based on whether they have been categorized: images with the "Uncategorized" template come first in the queue.
Label suggestion lifecycle
New uploads
In a handler for the UploadComplete hook, the MachineVision extension checks whether the uploaded file is a bitmap image and whether it is the first version of that file to be uploaded. If so, and if the extension is configured to request labels for new uploads, the extension creates a new FetchGoogleCloudVisionAnnotationsJob and enqueues it on the job queue. If a waiting period is configured, the job is created with a jobReleaseTimestamp value of the current time plus the configured waiting period. When the job is executed, if the file still exists (i.e., has not been deleted), a request for labels and SafeSearch annotations is created and sent to Google Cloud Vision via GoogleCloudVisionClient.
When a response is received, the label annotations from Google are mapped to Wikidata item IDs, and filters are applied (see "Image and label filtering" below). If any label suggestions remain after filtering, they are stored in the database, and an Echo event is fired to trigger a notification to the uploader that image label suggestions are available for review. Label suggestions are eventually served on Special:SuggestedTags and updated with their votes by reviewers.
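The decision made in the upload handler at the start of this lifecycle can be sketched in Python. This is a simplified model under stated assumptions: the `Upload` and `LabelingJob` classes, the `job_for_upload` function, and its parameters are hypothetical, timestamps are plain integers in seconds, and none of this reproduces the extension's PHP code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Upload:
    title: str
    is_bitmap: bool
    is_initial_version: bool


@dataclass
class LabelingJob:
    title: str
    release_timestamp: Optional[int]  # None means no waiting period applies


def job_for_upload(upload: Upload, now: int, request_labels: bool,
                   delay_seconds: int) -> Optional[LabelingJob]:
    """Return the job to enqueue for an upload, or None if none is needed."""
    if not request_labels:
        return None  # labeling of new uploads is switched off
    if not (upload.is_bitmap and upload.is_initial_version):
        return None  # non-bitmap files and re-uploads are ignored
    release = now + delay_seconds if delay_seconds > 0 else None
    return LabelingJob(upload.title, release)
```

The check that the file still exists is deliberately left out of this sketch, because it happens later, when the job actually runs.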
Custom image lists
The label suggestion lifecycle for suggestions fetched through fetchSuggestions.php for custom image lists is similar to that for new uploads. The main difference is that instead of scheduling annotation fetching jobs, fetchSuggestions.php directly invokes GoogleCloudVisionClient::fetchAnnotations for each image on the list.
Developer setup
Developer setup for the extension is well documented in the README file.
Quirks and gotchas
Image and label filtering
Label suggestions have multiple filters applied in GoogleCloudVisionClient before storage, and each operates differently from the others.
The first filtering pass, based on $wgMachineVisionWithholdImageList, is intended to withhold images completely from being shown on Special:SuggestedTags. If a label in $wgMachineVisionWithholdImageList is among the suggested labels returned for an image, the initial review state for all of that image's suggested labels is set to WITHHOLD_ALL, which has the effect of excluding the image completely: it is not shown in either the "popular" or "personal uploads" tab on Special:SuggestedTags. The labels are, however, retained in the database.
The second filtering pass, based on $wgMachineVisionGoogleSafeSearchLimits, conditionally withholds images from the "popular" tab. If an image receives a SafeSearch rating that exceeds the allowed value on any of the configured dimensions, it is withheld from the "popular" tab but still available to the uploader in the "personal uploads" tab on Special:SuggestedTags. All suggested labels are retained in the database.
The third and final pass, based on $wgMachineVisionWikidataIdBlacklist, is intended to discard specific label suggestions judged not to be useful to the projects. Suggestions corresponding to labels in $wgMachineVisionWikidataIdBlacklist are simply discarded before the remaining suggested labels are stored.
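The three passes can be summarized in a simplified Python sketch. It is an illustration under assumptions, not the logic of GoogleCloudVisionClient: the function and parameter names are hypothetical, SafeSearch ratings and limits are modeled as plain integers per dimension, and the rule that the first pass takes precedence over the second is an assumption.

```python
UNREVIEWED = 0
WITHHOLD_POPULAR = -2
WITHHOLD_ALL = -3


def filter_suggestions(label_ids, safe_search, withhold_image_list,
                       safe_search_limits, discard_list):
    """Return {label_id: initial review state} for the labels to be stored."""
    state = UNREVIEWED
    # Pass 1: any label on the withhold list hides the whole image.
    if any(label in withhold_image_list for label in label_ids):
        state = WITHHOLD_ALL
    # Pass 2: a rating above its limit hides the image from the "popular" tab.
    elif any(safe_search.get(dim, 0) > limit
             for dim, limit in safe_search_limits.items()):
        state = WITHHOLD_POPULAR
    # Pass 3: discarded labels are dropped; the rest share the same state.
    return {label: state for label in label_ids if label not in discard_list}
```

Only the third pass removes anything from the result. The first two passes change the initial review state but leave the labels in place, which matches the statement above that withheld labels are retained in the database.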
Review state and data model
There is a conceptual mismatch between the extension's data model and its presentation layer. Because the extension was written to support multiple providers, review state is a property of a label rather than an image. In practice, however, all labels for an image are reviewed at once on Special:SuggestedTags on an image-by-image basis, and there is only one labeling provider (Google). This means that in practice, the data model is unnecessarily complicated; an image's eligibility for presentation on Special:SuggestedTags must be derived from the review states of its various suggested labels rather than being stored as a property of the image itself. Besides being needlessly confusing, this created early problems with query performance.
Redirects and deletions
A common source of bugs is that values in the Freebase-Wikidata mappings may refer to a Wikidata item which has been redirected or deleted. The code attempts to resolve redirects as needed to mitigate the effects of outdated mappings, but it is not perfect.
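One way to think about the mitigation is as a lookup that follows redirects until it reaches a live item. The Python sketch below is a generic illustration, not the extension's implementation: the `redirects` dictionary and `deleted` set are stand-ins for information that would come from Wikidata, and the function name and hop limit are assumptions.

```python
def resolve_item(item_id, redirects, deleted, max_hops=10):
    """Follow redirects from item_id; return the live target or None."""
    seen = set()
    current = item_id
    while current in redirects:
        if current in seen or len(seen) >= max_hops:
            return None  # redirect loop or chain too long: give up
        seen.add(current)
        current = redirects[current]
    # A mapping that ends at a deleted item cannot be used.
    return None if current in deleted else current
```

A mapping whose target resolves to None would have to be skipped, which is one way an outdated mapping can result in a missing label.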