Svoboda | Graniru | BBC Russia | Golosameriki | Facebook
BBC RussianHomePhabricator
Log In
Maniphest T186726

Security review WikibaseLexeme extension
Closed, ResolvedPublic

Description

Project Information

Description of the tool/project

The WikibaseLexeme extension sits on top of MediaWiki-extensions-WikibaseRepository and implements a new entity type that allows storing lexiographical data (information about words in all languages) in a more structured way than the default Wikibase entity types could. See https://www.mediawiki.org/wiki/Extension:WikibaseLexeme/Data_Model for this new entity type. The target audience is the Wikidata community, together with all existing Wiktionary communities.

The WikibaseLexeme extension provides all backend services required to make the new entity type work with the framework Wikibase provides, as well as a dedicated frontend for the new "Lexeme" namespace.

Description of how the tool will be used at WMF

The WikibaseLexeme extension will be activated on wikidata.org to be able to collect lexicographical data in a new namespace called "Lexeme". It will later be activated on existing Wiktionary wikis (one by one) to be able to use the data there.

Dependencies

Has this project been reviewed before?

No.

Working test environment

Post-deployment

The responsible team is Wikidata via @Lydia_Pintscher.

Event Timeline

Just to confirm this is ready for review? The [WIP] in the title of the bug confuses me.

p.s. in regards to

Please link or describe setup process for setting up a test environment. @WMDE-leszek, can you help with this?

Don't worry about that. Its a MW extension, I only need instructions if the setup is super weird.

Removed the point on describing the setup. Sorry, I forgot to do it earlier. Having installed Wikibase, the setup is really just wfLoadExtension, unless one wants to customize stuff.

Embarrassingly, README file in the extension is empty. See https://gerrit.wikimedia.org/r/#/c/413757/ adding the first draft.

Re whether this is still WIP, I take the liberty to say it is ready to be reviewed and remove the WIP.
@Lydia_Pintscher please pull the brake if what I say is not valid.

WMDE-leszek renamed this task from [WIP] Security review WikibaseLexeme extension to Security review WikibaseLexeme extension.Mar 2 2018, 8:32 AM

I added the WIP originally. The only open dependency is a separate security review of https://github.com/wmde/php-vuejs-templating, which needs a separate Application Security Reviews ticket. Everything else is resolved, so this is ready to go from my point of view. :-)

Review of WikidataLexeme & php-vuejs-templating (To be clear, I didn't look at vue.js in general, since that was already reviewed by Darian afaik).

  • "FormIdFormatter.php" line 69 & "SenseIdFormatter.php" line 73 - It looks like this is not properly escaped. However, it also looks like this is just a couple hard coded examples. Kind of confused by this class tbh.
  • "SensesView.php" line 127 - $sense->getId()->getSerialization() - should be html escaped I think.
  • "FormsView.php" line 86 - $this->textProvider->get( 'comma-separator' ) - Please avoid using messages as raw html in new code.
  • [This may be an issue with Wikidata in general] ClickJacking: Since this allows edit interaction directly on wikipage, it should take steps to prevent click jacking. Either javascript should detect when the page is being framed, and refuse to load the editing related js code (Since the editing related code only happens if js is enabled, its safe to detect this condition in JS), or the extension can call OutputPage::preventClickjacking() (Which will totally disables frames altogether for both js and non-js users).

In php-vuejs

Change 416406 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Add missing htmlspecialchars() to SensesView

https://gerrit.wikimedia.org/r/416406

I uploaded fixes for two issues:

  • https://gerrit.wikimedia.org/r/416406
  • https://github.com/wmde/php-vuejs-templating/pull/6
  • FormIdFormatter and SenseIdFormatter are dummy implementations. The escaping is fine. I created T188899: [Task] Remove hard-coded demo data from Lexeme code base to properly deal with these later.
  • comma-separator is a message from core, and intentionally contains HTML (but no wiki syntax). As far as I can see we are using it correctly. Yes, we are aware messages with HTML are problematic, and we avoid it whenever possible.
  • Click-jacking prevention is a general issue for all of Wikidata.
    • MediaWiki does have a global $wgEditPageFrameOptions setting for the "X-Frame-Options" HTTP header that defaults to "DENY". The code using this setting is \OutputPage::getFrameOptions().
    • There are methods that should be called to make sure all pages intentionally set or don't set this HTTP header:
      • \OutputPage::preventClickjacking() and \OutputPage::allowClickjacking().
      • Alternatively, \ParserOutput::preventClickjacking( true ) can be used.
    • TODO: Create an investigation ticket to better understand clickjacking prevention, and how and where Wikibase code needs to utilize it.

Change 416406 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Add missing htmlspecialchars() to SensesView

https://gerrit.wikimedia.org/r/416406

thiemowmde triaged this task as Medium priority.Mar 5 2018, 1:48 PM

Thanks a lot @Bawolff for the review. We believe we've addressed all the issues pointed out, as elaborated above by @thiemowmde. Could you please have another look whether we're good now?
As mentioned above, some of the issues were not "fixed". If you think those still need to be addressed, please say so (e.g. if you are convinced this is the time Wikibase in general should solve the click-jacking issue). We would appreciate any other remedy suggestion wherever you have one!

re: FormIdFormatter and SenseIdFormatter - I thought they might later be extended to a real implementation, which is why I was concerned, but as long as its just a dummy implementation that's eventually going away, that's all cool.

re: click-jacking: Yeah, it really is a wikidata issue. It does not block deployment of wikibaselexeme. As long as someone intends to look into the issue, I'm happy.

comma-separator is a message from core, and intentionally contains HTML (but no wiki syntax). As far as I can see we are using it correctly. Yes, we are aware messages with HTML are problematic, and we avoid it whenever possible.

In MediaWiki core, that message is always used as wfMessage( 'comma-separator' )->escaped(); WikibaseLexeme should similarly escape it. (In mediawiki core, the wfMessage()->escaped() format does not double encode entities, so   stays as   after escaping).


So I'd still like to see the comma-separator thing changed (which is very a minor issue), but otherwise this looks good and is good to go.

Change 418715 had a related patch set uploaded (by Thiemo Kreuz (WMDE); owner: Thiemo Kreuz (WMDE)):
[mediawiki/extensions/WikibaseLexeme@master] Escape HTML in comma-separator message in FormsView

https://gerrit.wikimedia.org/r/418715

Change 418715 merged by jenkins-bot:
[mediawiki/extensions/WikibaseLexeme@master] Escape HTML in comma-separator message in FormsView

https://gerrit.wikimedia.org/r/418715

Thanks @Bawolff. So the comma-separator escaping happened in https://gerrit.wikimedia.org/r/418715. Click jacking deserves its own ticket, so I've filed T189491 about it.

Hi @Bawolff, it's me again. With https://gerrit.wikimedia.org/r/418715 would you be able to claim the security review was done?

Yep, you are good.