Svoboda | Graniru | BBC Russia | Golosameriki | Facebook
BBC RussianHomePhabricator
Log In
Maniphest T341244

ParserOptions and Title::getPageViewLanguage may disagree on the lang/dir
Closed, ResolvedPublic

Description

Background

During a page view, Article.php is responsible for rendering the parser output.

  • The ParserOptions for a page view come from Article::getParserOptions (via WikiPage::makeParserOptions and ParserOptions::newFromContext), where ParserOptions::newFromContext takes care of user preferences such as language variant. Then, before invoking the ParserCache and Parser, Article.php runs the the onArticleParserOptions hook, which may call setTargetLanguage to render the article in a different language code (or variant).
  • Today, I believe, when ParserOutput formats its HTML in getText, it adds the mw-parser-output class, but it does not set any lang or dir attributes. It is currently the responsibility of the skin to do this.

At the Skin layer, and in the OutputPage class, there are a handful of calls to Title::getPageViewLanguage() that each independently try to determine what language the parser output is rendered in by taking the page language (from the page source/database) and applying a similar "current User" transformation to it for language variants as ParserOptions::newFromContext.

The "page view language" is then used to set the mw-content-ltr (or rtl) class name, and lang/dir attributes on the <div> that wraps the parser output (in Skin::wrapHTML, called from SkinTemplate). It is also used by OutputPage for mw.config wgPageContentLanguage and wgPageViewLanguage (ref T303375).

Problem

The "page view language" (from Title::getPageViewLanguage) has no influence on the actual rendering of the parser output. It does not factor into ParserOptions, or ParserCache key, or the ParserOutput. Instead, this method merely attempts to reverse engineering it. This is duplicative, risky, and we now know of at least one case where this is wrong (https://gerrit.wikimedia.org/r/c/mediawiki/extensions/WikimediaIncubator/+/935091, ref T299369, uses the onArticleParserOptions hook and ParserOptions::setTargetLanguage method).

Ideally, I think, the Skin layer should be informed by OutputPage/ParserOutput/Article.php what the language of the rendered content is. It shouldn't have to reverse engineer or guess anything. Injecting from there into the Skin layer would also reduce computation overhead from re-computing it.

Once the ParserOutput object is available, there is not an obvious way to obtain the target language.

Strawman proposal

  1. As far as I can tell, the result of Title::getPageViewLanguage is never used to "decide" anything. I suggest we instead let ParserOutput dictate the user/lang attributes the Skin should use when wrapping the parser output in <div class="mw-content-ltr">. The element can remain owned by the Skin so that it can customise this, but the values would be informed by ParserOutput metadata instead of from the Title object.
  1. The recently introduced ParserOutput::getLanguage might be suitable for this purpose (cc @cscott, change 836297. Otherwise, we may need to introduce a new method to expose this information.
  1. If the above are both feasible and considered improvements, then we should deprecate and remove Title::getPageViewLanguage. While this is still a relatively new method, it appears by design any caller will be misinformed. It is also a violation of current design principles, because it involves implied global state (it doesn't even take a User or RequestContext parameter currently). The replacement would be to defer to the Skin, which in turn defers to the combined ParserOutput object that Article.php injects into OutputPage.

Event Timeline

Krinkle created this task.
Krinkle updated the task description. (Show Details)

I agree that Title::getPageViewLanguage is broken by design.

In my mind, we need to have three things:

  1. Page language (on PageRecord): The language the page is (primarily?) written in. This is associated with the page itself, in the database or derived from the title.
  2. Target language (in ParserOptions): The language (variant) the user wants. This is derived from user options or information present in the request.
  3. Output Language (in ParserOutput): The language of the HTML generated by the parser. This is controleld by ContentHandler. It would typically be either the page language or the target language, but may also be a best effort approximation, or something else entirely.

Once the ParserOutput object is available, there is not an obvious way to obtain the target language.

The output language may be different from the target language. The relevant thing for the skin is the actual language of the output. For this reason, we recently added ParserOutput::getLanguage() 836297. However, this method still often returns null. In that case, it is probably safe to assume the wiki's content language.

Change 939783 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] [WIP] OutputPage: Add get/setContentLanguage, deprecate Title::getPageViewLanguage

https://gerrit.wikimedia.org/r/939783

Krinkle removed a project: Essential-Work.

Change 963193 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/core@master] parser: Improve ParserOutput docs and fix absoluteURLs default

https://gerrit.wikimedia.org/r/963193

Change 963193 merged by jenkins-bot:

[mediawiki/core@master] parser: Improve ParserOutput docs and fix absoluteURLs default

https://gerrit.wikimedia.org/r/963193

Change 967327 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/AbuseFilter@master] phpunit: Avoid hardcoding exact mw-parser-output class attribute

https://gerrit.wikimedia.org/r/967327

Change 967329 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/ProofreadPage@master] phpunit: Avoid hardcoding exact mw-parser-output class attribute

https://gerrit.wikimedia.org/r/967329

Change 967330 had a related patch set uploaded (by Krinkle; author: Krinkle):

[mediawiki/extensions/Wikibase@master] tests: Update hardcoded mw-parser-output in ApiFormatReferenceTest

https://gerrit.wikimedia.org/r/967330

Change 967327 merged by jenkins-bot:

[mediawiki/extensions/AbuseFilter@master] phpunit: Avoid hardcoding exact mw-parser-output class attribute

https://gerrit.wikimedia.org/r/967327

Change 967330 merged by jenkins-bot:

[mediawiki/extensions/Wikibase@master] tests: Update hardcoded mw-parser-output in ApiFormatReferenceTest

https://gerrit.wikimedia.org/r/967330

Change 967329 merged by jenkins-bot:

[mediawiki/extensions/ProofreadPage@master] phpunit: Avoid hardcoding exact mw-parser-output class attribute

https://gerrit.wikimedia.org/r/967329

Change 939783 merged by jenkins-bot:

[mediawiki/core@master] parser: Move lang/dir and mw-content-ltr to ParserOutput::getText

https://gerrit.wikimedia.org/r/939783