User:Fæ/Project list/DoD
Scope
This project covers images from the large archives of the U.S. Department of Defense (DoD), now primarily available to the public through http://dvidshub.net. Uploads prior to 2015 may be sourced to http://www.defenseimagery.mil, which is no longer available to the public, though almost all of its images can be found at DVIDS. The ingestion template {{milim}} is used for all uploads.
The number of DoD images on Commons exceeded 200,000 in 2015, about 1% of the media on Commons at that time. Some DoD categories are large, so there is pressure to ensure that large batch uploads are chosen for their added value. At the same time, it is recognized that because the DoD is a highly reliable source of verifiable public domain images, generic images, such as those of equipment or medical treatment, which are hard to obtain without concern for copyright or subject consent, may be of high educational value. For example, when the Ebola crisis was newsworthy in 2015, Category:Ebolavirus DoD images became useful for illustrating Wikipedia articles and presentations on the hazard containment, testing and prevention methods recommended by the Centers for Disease Control and Prevention.
In June 2016, DVIDS updated the design of its website. This dropped the user 'stars' rating system, which was useful for finding high quality images, for example Category:DVIDS photographs rated 5 stars. The licensing terms are now displayed more clearly against each image, with all being shown as public domain.
Licensing
The standard public domain template is not as useful or accurate as a specific license. For example, it is better to use {{PD-USGov-Military-National Guard}} for an image created by a National Guard employee than the generic {{PD-US-Gov}}. A diffusion project has been created for Category:PD US Government, see User:Fæ/code/PD-USGov.
Batch uploads use the VIRIN (photo image number) to work out which license is best. Refer to the table for the service affiliation codes at COM:VIRIN to see how the logic works.
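As a rough illustration of how a VIRIN could drive the license choice, the sketch below extracts the service affiliation letter and looks it up in a small table. The VIRIN layout assumed here (six-digit date, one service letter, five-character identifier, three or four digit sequence), the letter assignments and the function names are all assumptions made for illustration; the authoritative codes are in the table at COM:VIRIN, and this is not the actual batch upload code.

```python
import re

# Assumed VIRIN layout: YYMMDD-S-XXXXX-NNN, where S is the service
# affiliation code. Check COM:VIRIN before relying on this pattern.
VIRIN_RE = re.compile(r"(\d{6})-([A-Z])-([A-Z0-9]{5})-(\d{3,4})")

# Illustrative mapping only; the letter assignments are assumptions
# and should be verified against the table at COM:VIRIN.
LICENSE_BY_SERVICE = {
    "A": "{{PD-USGov-Military-Army}}",
    "F": "{{PD-USGov-Military-Air Force}}",
    "M": "{{PD-USGov-Military-Marines}}",
    "N": "{{PD-USGov-Military-Navy}}",
    "Z": "{{PD-USGov-Military-National Guard}}",
}

# Generic fallback named in the text above.
FALLBACK_LICENSE = "{{PD-US-Gov}}"


def service_code(virin):
    """Return the service affiliation letter, or None if the VIRIN is malformed."""
    match = VIRIN_RE.fullmatch(virin.strip())
    return match.group(2) if match else None


def license_for_virin(virin):
    """Pick the most specific license template known for this VIRIN."""
    return LICENSE_BY_SERVICE.get(service_code(virin), FALLBACK_LICENSE)
```

A malformed VIRIN or an unlisted service letter falls back to the generic template rather than raising an error, which keeps a batch run going while still leaving the file in the diffusion category for later review.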
There have been past discussions and debates in deletion requests about courtesy images and images from foreign nationals, with many files being deleted as a result. This seems to have been resolved by unambiguous statements from DVIDS about licensing and their website content. Refer to User:Fæ/email/DoD. Where files have been deleted in the past, these statements seem a good rationale for raising undeletion requests at COM:UNDEL.
Searches and filtering for quality
The DVIDS site can be searched by keyword, and can also show images by military unit or photographer. Choosing useful filters is essential in limiting the number of files uploaded to those with a greater chance of having reuse value.
Standard filters on batch uploads currently include:
- DVIDS search. This is the most basic filter, for example Warrior+Games+2016 gives 1,944 image matches. Adding a filter by recognized quality photographer like Roger Wollenberg brings this down to 185 matches.
- Skip images with no category matching. Keywords in the description are searched for using regular expressions and matched against a predefined list of acceptable existing categories. For example
(Coast Guard Air Station|CGAS) Cape Cod
finds matches suitable for Category:CGAS Cape Cod.
- Photo width in pixels. This is an easy way of skipping lower resolution files. For example, many recent photographs of the Warrior Games 2016 were taken with high resolution cameras, so a minimum width of 5,500 pixels can be a reasonable way of limiting uploads to higher quality photographs.
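The category and width filters can be sketched as follows. This is a minimal illustration, assuming the description text and pixel width have already been retrieved for each candidate image; the names `CATEGORY_PATTERNS`, `matching_categories` and `passes_filters` are hypothetical, and the real pattern list would be far longer.

```python
import re

# Each entry pairs a regular expression with an existing Commons category.
# Only the example from the text is included here.
CATEGORY_PATTERNS = [
    (re.compile(r"(Coast Guard Air Station|CGAS) Cape Cod"),
     "Category:CGAS Cape Cod"),
]

# Example threshold taken from the Warrior Games 2016 case above.
MIN_WIDTH = 5500


def matching_categories(description):
    """Return every predefined category whose pattern occurs in the description."""
    return [category
            for pattern, category in CATEGORY_PATTERNS
            if pattern.search(description)]


def passes_filters(description, width, min_width=MIN_WIDTH):
    """Keep an image only if it is wide enough and matches at least one category."""
    return width >= min_width and bool(matching_categories(description))
```

For instance, `passes_filters("A crew from CGAS Cape Cod trains offshore", 6000)` is true, while the same description with a width of 3000 is skipped.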
Checks and error handling
As of June 2016, Fæ's processes for uploading new DoD images involve several robust customized checks using pywikibot-core and additional modules available in Python. To enable this to work, there is always a locally cached copy of the image. Any image that fails these checks is skipped unless another action is specified.
- There is a text search for the intended Commons filename to ensure it has not been created previously.
- The VIRIN is checked to see if the source is given as a foreign national. These are skipped on the basis of having been controversial in past deletion requests.
- Commons is searched for text matches to the VIRIN. If there are any matches then the file has most likely been uploaded before, though possibly with a different set of EXIF data.
- Commons is searched for matches to the local SHA1 value. Any matches will be digitally identical files.
- The local file is loaded into memory using the Python Imaging Library, giving confidence that the file behaves like an image and is not obviously corrupted or truncated.
- The memory copy of the image has its last image row checked to ensure it is not all RGB(128,128,128), which is a feature of badly truncated or partial files.
- Immediately after upload the image text page is checked to see that it is longer than 300 characters. If not, the text page is regenerated. This guards against a rare outstanding bug for uploads, which may be related to server drop-outs. Refer to Phab:T113878.
- Post upload there is a check that the SHA1 value reported for the file by the Commons API matches that of the local file, calculated using the standard Python hashlib. If this fails there is one additional attempt to reupload.
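Two of the checks above, the SHA1 comparison and the grey last row test, can be sketched with the standard library alone. This is an illustrative sketch rather than the actual upload code: the function names are hypothetical, the pixel rows are assumed to have been decoded already (for example by the imaging library) into sequences of RGB tuples, and the lookup URL assumes the MediaWiki `list=allimages` query with its `aisha1` parameter.

```python
import hashlib
from urllib.parse import urlencode

# Colour that fills the missing rows of a badly truncated file.
GREY = (128, 128, 128)


def local_sha1(path, chunk_size=1 << 20):
    """Hash the cached file in chunks so large images need not fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def last_row_is_grey(rows):
    """True if the final pixel row exists and every pixel in it is RGB(128,128,128)."""
    if not rows or not rows[-1]:
        return False
    return all(tuple(pixel) == GREY for pixel in rows[-1])


def sha1_lookup_url(sha1_hex):
    """Build a Commons API query URL listing files with this SHA1 (assumed endpoint)."""
    params = {
        "action": "query",
        "list": "allimages",
        "aisha1": sha1_hex,
        "format": "json",
    }
    return "https://commons.wikimedia.org/w/api.php?" + urlencode(params)
```

Comparing `local_sha1(path)` with the value returned for the uploaded file gives the post upload check, and any result from the lookup URL before upload indicates a digitally identical file already on Commons.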
There is also a maintenance task that checks previously uploaded images for visible corruption, accessing 100px wide thumbnails on Commons to save on bandwidth. It checks the last image row colour, as above, and then attempts first to find a DVIDS source link and reupload the file, or otherwise adds the file to Category:Images from DoD (digital errors). Images that still appear corrupt after reuploading can be deduced to be corrupt on dvidshub.net, and are put up for speedy deletion.
Reports
You can surf through Category:Images from DoD uploaded by Fæ or follow the reports below for ideas.
Wikimedia usage
(Gallery of 24 files, images not reproduced, each captioned with a count, in descending order: 137, 107, 106, 64, 58, 54, 34, 33, 33, 33, 33, 25, 25, 21, 20, 19, 18, 16, 16, 16, 16, 16, 15, 15.)
Improvement suggestions
Ten randomly selected files with a single mainspace use on Wikimedia projects:
Up to ten randomly selected files with the lowest category counts in the project: