Maniphest T45646

"MediaWiki:Copyright" message allows raw HTML
Open, LowestPublic

Description

[[MediaWiki:Copyright]] still allows raw HTML input, which a rogue admin can abuse by adding <img src="http://my_host/index.php?title=Special:UserLogout"/> to
[[MediaWiki:Copyright]] so that everyone is forcibly logged out.

I talked to the person responsible for security a long time ago (about a year ago), but nothing seems to have been done to address this issue, and no bug has been filed.

Event Timeline

Restricted Application changed the edit policy from "Custom Policy" to "Custom Policy". · Dec 17 2014, 10:20 PM
matmarex changed the edit policy from "Custom Policy" to "Custom Policy".
matmarex changed Security from Software security bug to None.

Urgh, looks like that didn't work actually. Chris, can you fix this please?

I also agree that this should be public. (Just to be clear, it currently isn't, "matmarex changed Security from Security or Sensitive Bug to none" above is a lie.)

Jdforrester-WMF changed the visibility from "Custom Policy" to "Public (No Login Required)".
Jdforrester-WMF changed the edit policy from "Custom Policy" to "All Users".

Fine. I profoundly disagree that giving public step-by-step exploit instructions is a good idea as a way to convince others to patch security issues, however.

I also agree that this should be public. (Just to be clear, it currently isn't, "matmarex changed Security from Security or Sensitive Bug to none" above is a lie.)

It's not a lie. It's just that changing the "Security" dropdown to "none" doesn't reset the visibility or edit policy.

Fine. I profoundly disagree that giving public step-by-step exploit instructions is a good idea as a way to convince others to patch security issues, however.

The fact that a message allows raw HTML isn't in itself a security issue, or T2212 wouldn't be public. As is already noted, admins have many other ways to break things.

Is there any consensus about whether a fix for this needs to be made (i.e., throw it through the parser and make a workaround for the little things)? Or can we close it as invalid considering, as has been noted, this is the least of our concerns with admin powers.

This isn't an invalid bug. While not very security sensitive at this point in time, it is technical debt. We do not support the abilities provided by raw HTML in production. Any and all use cases should already be catered for by wikitext (or sanitised Parsoid HTML). As such, we should work towards replacing this message with a wikitext-only message.

I agree that we should continue to have messages not be raw HTML, but that is a more general issue, and it does not really justify having an individual bug for every message that needs to be parse-ified.

This isn't an invalid bug. While not very security sensitive at this point in time, it is technical debt. We do not support the abilities provided by raw HTML in production. Any and all use cases should already be catered for by wikitext (or sanitised Parsoid HTML). As such, we should work towards replacing this message with a wikitext-only message.

I think some people use the raw html of that message as an easy way to include analytics code. Enwikinews used to include it to add CC metadata tags so that it shows up in CC search engines. But honestly, I don't think that's a use case we should care about.

I agree it should probably be changed, and I agree it's pretty low priority, but I think it could legitimately have its own bug, given that it is one of the most prominent raw HTML messages and it's one of the raw HTML messages that is included on almost all pages.

Raw html messages are almost gone now (at least in core). It makes sense to track the remaining ones individually, which are the most complicated ones to fix.

I think some people use the raw html of that message as an easy way to include analytics code. Enwikinews used to include it to add CC metadata tags so that it shows up in CC search engines. But honestly, I don't think that's a use case we should care about.

The former is an invalid use case that would violate the Privacy policy.

The latter is something one should request on Phabricator for the software to provide (be it MediaWiki core, or an extension).

Other wikis may have different privacy policies :)

Yes, to be clear, I meant third parties might use it for analytics code (although I have no idea whether anyone does so in practice).

With the interface-admin proposal (T190015) progressing, I hope it is apparent that progress needs to be made on this task at the same time.

Here's an idea for how to solve this issue/vulnerability:

  1. Replace the copyright message with a new message copyrightfooter or similar, which would be standard parsed wikitext. Given the aggressive message caching built into MediaWiki, this shouldn't cause any undue additional load on the parser.
  2. Similarly, replace the wikimedia-copyright message in the WikimediaMessages extension with wikimedia-copyrightfooter.
  3. To handle the use case of wikis using this message to place Google Analytics tracking beacons etc, either:
    • Create a config variable $wgExtraHtml which can be set to some custom HTML served at the bottom of every page. This would shift the ability to add analytics scripts from wiki sysops to server admins. The only way to restore this ability to wiki sysops would be by writing a custom hook (or a very simple extension).
    • Create a config variable $wgAllowExtraHtml = false, which when set to true, appends the contents of MediaWiki:Extrahtml to the end of the page. This way, wiki sysops can continue to exert control over analytics scripts on wikis where this is desired, while providing safety by default for all other wikis.

Challenges:

  • Wikis could face legal implications if they are displaying the wrong copyright message for even a brief period of time. Some kind of migration step might be required in the MediaWiki updater that converts the HTML contents of copyright into wikitext in copyrightfooter. It might convert <a> links into appropriate internal/external links, and strip all other tags.
  • Some wikis use <a rel="license" ...> in this message. The parser cannot currently emit this attribute. A new magic word might need to be defined, like {{#licenselink:https://example.org/license|Example License}}.
  • Is there evidence to suggest that people are actually using this for analytics scripts? Or was that just a hypothetical?
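
As a rough illustration of the migration step mentioned in the first challenge, the sketch below turns <a> links into wikitext links and drops every other tag while keeping its text. It is a language-neutral model written in Python, not MediaWiki updater code. The function name html_to_wikitext and the rule that an href starting with "/" is an internal page link are assumptions made for this example only.

```python
from html.parser import HTMLParser


class _AnchorToWikitext(HTMLParser):
    """Collects text, turning <a href> into wikitext links and dropping other tags."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []      # finished output fragments
        self.href = None   # href of the <a> that is currently open, if any
        self.label = []    # text collected inside the open <a>

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.href = dict(attrs).get("href")
            self.label = []
        # Any other tag is dropped; its text content is still kept.

    def handle_endtag(self, tag):
        if tag == "a" and self.href is not None:
            label = "".join(self.label)
            if self.href.startswith("/"):
                # Assumption: a site-relative href names a wiki page.
                target = self.href.lstrip("/")
                if label == target:
                    self.out.append(f"[[{target}]]")
                else:
                    self.out.append(f"[[{target}|{label}]]")
            else:
                self.out.append(f"[{self.href} {label}]")
            self.href = None

    def handle_data(self, data):
        # Text inside an open link becomes its label; other text is output as-is.
        (self.label if self.href is not None else self.out).append(data)


def html_to_wikitext(html):
    parser = _AnchorToWikitext()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

A real migration would also need to handle unclosed links, wikis whose article path is not "/", and text that happens to contain wikitext syntax, so a converted message would still need human review before it replaces a legally significant footer.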

My more minimalist short-term solution would be to just create a static list of messages which are known to contain raw HTML, and on a title match / subpage match require editsitejs.
(In the longer term, maybe the parsing method should be a global property of the message, not decided by the caller, and edit permissions assigned automatically based on that.)
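
The title match / subpage match described above can be modelled roughly as follows. This is a Python sketch of the idea, not the code of any actual patch. The list contents, the function name required_right and the fallback to editinterface for ordinary messages are assumptions made for illustration.

```python
# Illustrative list only; a real deployment would maintain its own.
RAW_HTML_MESSAGES = {"copyright"}


def required_right(title, raw_html_messages=RAW_HTML_MESSAGES):
    """Return the right needed to edit a page in the MediaWiki namespace.

    `title` is the page name without the namespace prefix, for example
    "Copyright" or "Copyright/de".
    """
    # A language subpage such as "Copyright/de" belongs to the base message.
    base = title.split("/", 1)[0]
    # Assumption: message keys are the page name with a lowercase first
    # letter and underscores instead of spaces.
    key = (base[:1].lower() + base[1:]).replace(" ", "_")
    if key in raw_html_messages:
        return "editsitejs"
    return "editinterface"
```

For example, required_right("Copyright/de") yields "editsitejs", while an ordinary message such as "Sidebar" stays at "editinterface".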

In T45646#4447533, @Tgr wrote:

My more minimalist short-term solution would be to just create a static list of messages which are known to contain raw HTML, and on a title match / subpage match require editsitejs.

Heh, yes, I didn't think of that. That would certainly be a better short-term solution, particularly as these messages need to be changed extremely infrequently.

(In the longer term, maybe the parsing method should be a global property of the message, not decided by the caller, and edit permissions assigned automatically based on that.)

Ideally in the long term we would not have any raw HTML messages :)

In T45646#4447533, @Tgr wrote:

(In the longer term, maybe the parsing method should be a global property of the message, not decided by the caller, and edit permissions assigned automatically based on that.)

It's not that simple. Whether a message accessed as ->plain() or ->text() has raw HTML, wikitext, or plain text depends on how the output is subsequently used.

In T45646#4447303, @TTO wrote:
  • Create a config variable $wgExtraHtml which can be set to some custom HTML served at the bottom of every page. This would shift the ability to add analytics scripts from wiki sysops to server admins. The only way to restore this ability to wiki sysops would be by writing a custom hook (or a very simple extension).

We don't need a new variable for this. There are plenty of methods and hooks already for this to be done from LocalSettings.php and/or an extension. Including:

  • BeforePageDisplay hook - call $out->addScript() with raw HTML.
  • SkinAfterBottomScripts hook - append raw HTML to $text.
  • SkinBuildSidebar hook - add a key to $bar with raw HTML.

These can all be done with a one-liner from LocalSettings.php as well.

  • Wikis could face legal implications if they are displaying the wrong copyright message for even a brief period of time. Some kind of migration step might be required in the MediaWiki updater that converts the HTML contents of copyright into wikitext in copyrightfooter. It might convert <a> links into appropriate internal/external links, and strip all other tags.

Rather than converting one into the other, I'd recommend using the current logic (unchanged) during the migration step, but providing a way to disable it for security reasons.

For example, we would introduce the copyrightfooter message, and change the old copyright message to be empty/disabled by default (containing -). If at run-time the old message doesn't exist, display the new one as wikitext (not raw HTML). If the old one does have an override, use it as raw HTML but only if the feature wasn't disabled. E.g. wgAllowCopyrightHtml or some such, which would be true by default for one release cycle, during which it would trigger a deprecation warning for developers. Then, in the next release, the variable and the support for copyright would be removed.

This way:

  • No risk or complication during upgrade for developers (behaviour remains the same, and if your wiki depends on old copyright it will still work, and you get a warning.)
  • After the upgrade, they have until the next release to migrate at-ease from copyright to copyrightfooter and/or a hook-based approach.
  • Developers may disable wgAllowCopyrightHtml to start benefiting from the increased security as soon as possible (including WMF).
  • In the following release, the variable and support for copyright will be removed.
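
The run-time decision proposed above can be sketched as a small function. This is a Python model of the logic only, not MediaWiki code. The function name, the dictionary of local overrides and the allow_copyright_html parameter (standing in for the proposed wgAllowCopyrightHtml) are assumptions for illustration.

```python
import warnings


def resolve_copyright_footer(messages, allow_copyright_html=True):
    """Return (text, format) for the footer; format is 'html' or 'wikitext'.

    `messages` maps message keys to their local overrides. A missing key
    or the value '-' means the message is disabled.
    """
    old = messages.get("copyright", "-")
    if old != "-" and allow_copyright_html:
        # The old override still wins during the transition release,
        # but developers are told to migrate.
        warnings.warn(
            "raw HTML 'copyright' message is deprecated; use 'copyrightfooter'",
            DeprecationWarning,
        )
        return old, "html"
    # No old override, or raw HTML support switched off: parse as wikitext.
    return messages.get("copyrightfooter", ""), "wikitext"
```

With the flag switched off, an existing copyright override is ignored and the wikitext copyrightfooter message is used instead, which is the "secure as soon as possible" path from the list above.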

Why not simply change how the current message is used so that it does not output raw HTML? Wikis that haven't customized the message will continue to work. Wikis that use raw HTML will display visible HTML code, but that will also be very noticeable to site admins, who can then adapt the message. This avoids the problem of displaying a wrong copyright message: it would be correct but badly formatted, and only in cases where non-allowed HTML tags are used.

The deprecation/warning path for next release and final substitution on a later release can go unnoticed for those that skip versions, or upgrade only from LTS to LTS.

Change 449626 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Require editsitecss/editsitejs for editing raw messages

https://gerrit.wikimedia.org/r/449626

@Ciencia_Al_Poder In principle, we aim for site administrators to always be able to upgrade without immediate breakage. An administrator should then address any deprecation warnings before the next upgrade in order to stay free of breakage.

In other words: Breakage should not happen unless the removed behaviour was already deprecated/avoidable in the previous release, with the exception of core maintainers agreeing beforehand that deprecation is not needed (e.g. due to the behaviour being unused, rarely used, or infeasible/impossible to deprecate).

Okay, my comment was about the T45646#4447303 idea, which would be a major breakage in legal terms. Gerrit changeset 449626 would be fine, though, if it doesn't follow the path of renaming system messages.

Change 449689 had a related patch set uploaded (by Gergő Tisza; owner: Gergő Tisza):
[mediawiki/core@master] Replace raw HTML copyright footer message with wikitext one

https://gerrit.wikimedia.org/r/449689

Change 449626 merged by jenkins-bot:
[mediawiki/core@master] Require editsitecss/editsitejs for editing raw messages

https://gerrit.wikimedia.org/r/449626

What happened here with https://gerrit.wikimedia.org/r/c/449689 ?

Someone fixed this but then it got conflicted and notfixed?

Someone proposed a fix, however nobody reviewed it so it never got merged.

Change 449689 abandoned by SBassett:

[mediawiki/core@master] Replace raw HTML copyright footer message with wikitext one

Reason:

This is so out-of-date at this point (the Skin.php code isn't remotely similar these days) that it's likely worth a complete revisit.

https://gerrit.wikimedia.org/r/449689

Is <a rel="license" really the only reason to use raw HTML? If it is, is there really a reason to keep this raw HTML functionality?

Is <a rel="license" actually useful? It's nice to follow guidelines¹ for using semantic HTML attributes, but do they really do anything useful from the point of view of metadata analysis, search engines, language models, etc.? If not, then maybe it's just dead weight? In the last few days, I've asked several people who I thought would know how this rel="license" thing works, and no one really knows.

If <link rel="license" is enough and <a rel="license" is redundant, then the raw HTML feature on the copyright messages should be just removed. (I am not saying that it's enough. It's just a wild guess and it might be wrong.)

If <a rel="license" is really necessary, it shouldn't be written as raw HTML in messages. Ideally, it should not be in messages at all, but inserted automatically by the software.

The main current reason I care about this is T360497. By itself, that issue directly affects only translatewiki staff. However, it would be just great to get rid of this raw HTML usage to make life easier for all the translatewiki volunteers, who currently have to copy lots of markup in this message, as well as in all its variants in WikimediaMessages.


¹ Which guidelines, actually? Perhaps these? I was a bit surprised that they mention only <a> and not <link>. Maybe we don't need <link rel="license"> either?

The authoritative guideline is https://html.spec.whatwg.org/dev/links.html#link-type-license; the most thorough guideline (but quite dated) is https://microformats.org/wiki/rel-license I think. Based on these I'd say the <a rel="license" is entirely superfluous and should be removed.

Is <a rel="license" really the only reason to use raw HTML? If it is, is there really a reason to keep this raw HTML functionality?

We do need some functionality at https://wiki.documentfoundation.org/Main_Page to get some footer. There is no way to get the footer "ok" (either broken for logged in or for logged out visitors).

Looks like only wikitext formatting is needed?

Raw HTML:

Please note that all contributions to The Document Foundation Wiki are considered to be released under the <a href="https://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>, unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the <a href="https://www.libreoffice.org/download/license/">Mozilla Public License v2.0</a>. "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our <a href="/TradeMark Policy">trademark policy</a> (see <a href="/Project:Copyrights">Project:Copyrights</a> for details). LibreOffice was based on OpenOffice.org.<br />If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.

Wikitext:

Please note that all contributions to The Document Foundation Wiki are considered to be released under the [https://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-ShareAlike 3.0 Unported License], unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the GNU Lesser General Public License ([https://www.libreoffice.org/download/license/ LGPLv3]). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our [[TradeMark Policy|trademark policy]] (see [[Project:Copyrights]] for details). LibreOffice was based on OpenOffice.org.<br />If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.

Then it is broken the other way round: either you use wikitext and it is broken for logged-in users, or vice versa. Thanks, I had tested this already. ;-)

Is <a rel="license" really the only reason to use raw HTML? If it is, is there really a reason to keep this raw HTML functionality?

We do need some functionality at https://wiki.documentfoundation.org/Main_Page to get some footer. There is no way to get the footer "ok" (either broken for logged in or for logged out visitors).

If it is always wikitext, is there a way to make it OK? What functionality do you need there exactly?

If it is always wikitext, is there a way to make it OK?

Only-wikitext would work (hence no bug in mediawiki), but not for us, sadly.

What functionality do you need there exactly?

External link linking! We need to put our legal disclaimer and post address (In German "Impressum") to be linked for having it on one page for all services. (otherwise we end up changing 100+ pages just to get the Impressum updated)

Basically, at the moment it reads:

https://wiki.documentfoundation.org/Main_Page?uselang=de

Der Inhalt ist verfügbar unter der Lizenz the <a href="https://creativecommons.org/licenses/by-sa/3.0/">Creative Commons Attribution-ShareAlike 3.0 Unported License</a>, unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the <a href="https://www.libreoffice.org/download/license/">Mozilla Public License v2.0</a>. "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our <a href="/TradeMark Policy">trademark policy</a> (see <a href="/Project:Copyrights">Project:Copyrights</a> for details). LibreOffice was based on OpenOffice.org.<br/>If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here., sofern nicht anders angegeben.

or

https://wiki.documentfoundation.org/Main_Page?uselang=en

Please note that all contributions to The Document Foundation Wiki are considered to be released under the Creative Commons Attribution-ShareAlike 3.0 Unported License, unless otherwise specified. This does not include the source code of LibreOffice, which is licensed under the GNU Lesser General Public License (LGPLv3). "LibreOffice" and "The Document Foundation" are registered trademarks of their corresponding registered owners or are in actual use as trademarks in one or more countries. Their respective logos and icons are also subject to international copyright laws. Use thereof is explained in our trademark policy (see Project:Copyrights for details). LibreOffice was based on OpenOffice.org.
If you do not want your writing to be edited mercilessly and redistributed at will, then do not submit it here.

Please check both links yourself: the only difference is switching the MediaWiki user interface language (while logged out!), and in one case the markup becomes visible. There is at least one bug! Sadly, this MediaWiki instance (in a global project) uses German as its default language.

Again: I believe this is a corner case that hits us. We do need to provide $wgRightsPage and/or $wgRightsUrl, but we do need some special/individual text! Most wikis do work with the actual system.

If using custom text, then please simply allow changing the whole text, ignoring all part before and afterward, allowing external links and using HTML/CSS stuff. It should be a new server property ($wgRightsText or the like) and thus can only be changed by the sysadmin - best with i18n possibilities (within the server config)
If the sysadmin wants to fuck up the reader/editor, then he can add some stuff by JS, HTML, PHP, whatever - no security concerns by this way! (and I really do not like this MediaWiki-NS as too many people can blindly mess it up without communicating)

Two ideas for making progress on this:

  • Rather than convert the html to wikitext, one avenue here might be to pass the "raw html" through the sanitizer. Sanitizer::removeSomeTags() will prevent many bad things, while still allowing the external links that German wiki wants. It would be able to block the <img> tag of the original bug report. If third party wikis wanted to embed images in the footer they could do that with CSS rather than embedded <img> tags.
  • We could also make a site-wide configuration variable for "copyright is raw html", and default it to false, and set it to true only for german wikipedia. That would allow us to incrementally improve our security footing without necessarily breaking german wiki or third parties which might rely on this.
    • A variant of this would be to add a new message: MediaWiki:CopyrightWikitext. If that message is non-empty, it is used in place of MediaWiki:Copyright and rendered from wikitext. (Or vice-versa, maybe we'd want to use MediaWiki:CopyrightWikitext only if MediaWiki:Copyright was empty?) That would also allow wiki-by-wiki conversions so that only those wikis which actually need raw html use it.
  • We could also make a site-wide configuration variable for "copyright is raw html", and default it to false, and set it to true only for german wikipedia. That would allow us to incrementally improve our security footing without necessarily breaking german wiki or third parties which might rely on this.

So maybe something like:

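// Per-wiki setting: off by default, on only for dewiki.
$wmgCopyrightRawHtml = [
    'default' => false,
    'dewiki' => true,
];

global $wgDBname, $wgRawHtmlMessages;
foreach ( $wmgCopyrightRawHtml as $wiki => $enabled ) {
    // Treat 'copyright' as raw HTML only on wikis explicitly set to true.
    if ( $enabled === true && $wiki === $wgDBname ) {
        $wgRawHtmlMessages[] = 'copyright';
    }
}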

in IS.php and then remove 'copyright' as a default value for $wgRawHtmlMessages within various config-schema files?

  • We could also make a site-wide configuration variable for "copyright is raw html", and default it to false, and set it to true only for german wikipedia.

Why does dewiki need this? What in https://de.wikipedia.org/w/index.php?title=MediaWiki:Copyright&action=edit or https://de.wikipedia.org/w/index.php?title=MediaWiki:Wikimedia-copyright&action=edit requires raw HTML?

  • A variant of this would be to add a new message: MediaWiki:CopyrightWikitext. If that message is non-empty, it is used in place of MediaWiki:Copyright and rendered from wikitext. (Or vice-versa, maybe we'd want to use MediaWiki:CopyrightWikitext only if MediaWiki:Copyright was empty?) That would also allow wiki-by-wiki conversions so that only those wikis which actually need raw html use it.

This seems like the best idea to me. I think it will make for the easiest way to migrate them, and less potential to accidentally treat the message the wrong way.

Apparently it was already proposed back in 2018: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/449689

Rather than convert the html to wikitext, one avenue here might be to pass the "raw html" through the sanitizer.

That's simple to do but we'd end up with a custom MediaWiki page that behaves differently from any other MediaWiki page. Not great IMO.
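
To make the trade-off concrete, here is a minimal allow-list filter of the kind being discussed, written as a Python model. It is not MediaWiki's Sanitizer and does not reproduce what Sanitizer::removeSomeTags() does. The allowed tags and attributes, the href scheme check and the name filter_footer_html are all assumptions chosen for this sketch.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical allow-list: enough for links and light formatting.
ALLOWED_TAGS = {"a", "b", "i", "br"}
ALLOWED_ATTRS = {"a": {"href", "rel"}}


class _AllowListFilter(HTMLParser):
    """Re-emits only allow-listed tags and attributes; text is re-escaped."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            return  # e.g. <img>, <script>: dropped entirely
        kept = []
        for name, value in attrs:
            if value is None or name not in ALLOWED_ATTRS.get(tag, set()):
                continue  # drops onclick, style, and similar attributes
            if name == "href" and not value.startswith(("http://", "https://", "/")):
                continue  # rejects javascript: and other schemes
            kept.append(f' {name}="{escape(value, quote=True)}"')
        self.out.append(f"<{tag}{''.join(kept)}>")

    def handle_endtag(self, tag):
        if tag in ALLOWED_TAGS and tag != "br":
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def filter_footer_html(html):
    parser = _AllowListFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Under these assumptions the <img> tag from the original report is removed while external links survive, which also shows the objection raised above: the message would follow rules that apply to no other MediaWiki page.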

We could also make a site-wide configuration variable for "copyright is raw html", and default it to false, and set it to true only for german wikipedia. That would allow us to incrementally improve our security footing...

Security-wise we are OK I think since the message is listed in $wgRawHtmlMessages. It would be a usability improvement (would allow more people to edit the message on other wikis). IMO not worth the complexity.

A variant of this would be to add a new message: MediaWiki:CopyrightWikitext. If that message is non-empty, it is used in place of MediaWiki:Copyright and rendered from wikitext.

Apparently it was already proposed back in 2018: https://gerrit.wikimedia.org/r/c/mediawiki/core/+/449689

Yeah it just didn't get any reviews.

It's slightly more complicated because a hook is also involved, but only slightly.