Add information about search engine crawling and indexing #795
base: gh-pages
✅ Deploy Preview for i18n-drafts ready!
After discussing with @r12a, we decided to limit this PR to the "When to use language negotiation" part and put "How to do it" (such as the last two paragraphs about The new article should probably be in Navigation. Also related to #521. I'll try to do this. |
jsahleen
left a comment
A couple suggestions/questions, but nothing that should block merging.
| <section id="searchengines"> | ||
| <h4>Search engine discovery challenges</h4> | ||
| <p>A major limitation of language negotiation is how search engines discover and index multilingual content. Search engine crawlers often do not send an <code class="kw" translate="no">Accept-Language</code> header when requesting pages, or may default to a specific language such as English. For example, <a href="https://developers.google.com/search/docs/specialty/international/locale-adaptive-pages" target="_blank">Googlebot frequently crawls without this header</a>. This means the crawler may only discover and index the default language version of a page. Consequently, other language versions remain invisible to users searching in those languages, significantly limiting the reach of multilingual content.</p> |
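A minimal sketch of the behaviour described in this paragraph, assuming a simplified server-side negotiation routine. The function name `negotiate`, the `AVAILABLE` list, and the choice of English as the default are all hypothetical and are not taken from the PR. The sketch also ignores wildcard (`*`) ranges. It only illustrates that a request without an `Accept-Language` header can never reach anything but the default version.

```python
from typing import List, Optional

# Hypothetical list of languages the site offers; the first entry is the default.
AVAILABLE = ["en", "de", "fr"]


def negotiate(accept_language: Optional[str], available: List[str] = AVAILABLE) -> str:
    """Pick a language from an Accept-Language value, else fall back to the default."""
    default = available[0]
    if not accept_language:
        # No header at all (how many crawlers behave): only the default is served.
        return default

    ranges = []
    for position, item in enumerate(accept_language.split(",")):
        parts = item.strip().split(";")
        tag = parts[0].strip().lower()
        if not tag:
            continue
        quality = 1.0  # a missing q parameter means q=1
        for param in parts[1:]:
            name, _, value = param.strip().partition("=")
            if name.strip().lower() == "q":
                try:
                    quality = float(value)
                except ValueError:
                    quality = 0.0
        # Sort key: highest quality first, then original order in the header.
        ranges.append((-quality, position, tag))

    for neg_quality, _, tag in sorted(ranges):
        if neg_quality == 0:
            continue  # q=0 marks a language as not acceptable
        primary = tag.split("-")[0]
        for lang in available:
            if tag == lang or primary == lang:
                return lang
    return default


if __name__ == "__main__":
    print(negotiate("de-CH,de;q=0.9,en;q=0.8"))  # a browser configured for German
    print(negotiate(None))                       # a crawler that sends no header
```

Under these assumptions, every header-less request resolves to the default language, which is why the other versions stay undiscovered unless they are reachable some other way.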
Suggest: "Another limitation of language negotiation has to do with the way search engines discover and index multilingual content."
questions/qa-when-lang-neg.en.html
Outdated
| <p>A major limitation of language negotiation is how search engines discover and index multilingual content. Search engine crawlers often do not send an <code class="kw" translate="no">Accept-Language</code> header when requesting pages, or may default to a specific language such as English. For example, <a href="https://developers.google.com/search/docs/specialty/international/locale-adaptive-pages" target="_blank">Googlebot frequently crawls without this header</a>. This means the crawler may only discover and index the default language version of a page. Consequently, other language versions remain invisible to users searching in those languages, significantly limiting the reach of multilingual content.</p> | ||
| <p>Furthermore, serving different content to search engines and users is considered "cloaking", a practice that search engines may penalize. While language negotiation itself is not inherently cloaking, improper implementation can be misconstrued as such.</p> | ||
| <p>To address these issues, search engines explicitly recommend using separate URLs for each language version of a page. This approach provides clear signals to search engines about available language variations. Combined with <code class="kw" translate="no">hreflang</code> annotations in the HTML <code class="kw" translate="no">head</code> or in XML sitemaps, separate URLs help search engines understand which language or regional version to show users based on their language and location settings.</p> | ||
| <p>If you implement language negotiation, you should also provide language-specific URLs (such as example.com/de/page.html for German and example.com/fr/page.html for French) that can be discovered and indexed by search engines. The language-generic URL (example.com/page.html) can still use content negotiation for direct visitor access, but the language-specific URLs ensure all versions are discoverable.</p> |
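As a rough illustration of the last two paragraphs, the sketch below builds `hreflang` link elements for the language-specific URLs named in the text. The helper name `hreflang_links`, the `https://` scheme, and the use of `x-default` for the language-generic URL are assumptions made for this example only; the PR text does not prescribe them.

```python
from html import escape
from typing import Dict, List, Optional

# Hypothetical mapping from language tag to language-specific URL,
# following the example.com/de/page.html pattern used in the paragraph above.
ALTERNATES = {
    "de": "https://example.com/de/page.html",
    "fr": "https://example.com/fr/page.html",
}

# Assumption: the language-generic (negotiated) URL is advertised as x-default.
GENERIC_URL = "https://example.com/page.html"


def hreflang_links(alternates: Dict[str, str], generic_url: Optional[str] = None) -> List[str]:
    """Return one <link rel="alternate"> element per language version."""
    links = [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}">'
        for lang, url in sorted(alternates.items())
    ]
    if generic_url:
        links.append(
            f'<link rel="alternate" hreflang="x-default" href="{escape(generic_url)}">'
        )
    return links


if __name__ == "__main__":
    # The same set of links would go into the head of every language version.
    print("\n".join(hreflang_links(ALTERNATES, GENERIC_URL)))
```

Each language version would carry the same set of links, so a crawler that lands on any one URL can find the others without sending an `Accept-Language` header.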
This only applies if you want the pages to be discoverable and indexable, right?
Yes, as I said above, I plan to remove this paragraph and the previous paragraph and move them into a new article.
Ah. Yes. Apologies. I wasn't clear on the intent.
Moved to #799
| <section id="answer"> | ||
| <h2>Answer</h2> |
These changes are unrelated to the title. You should pull them separately?
| correspond to one another any more.</p> | ||
| <section id="searchengines"> | ||
| <h4>Search engine discovery challenges</h4> |
This section makes a number of assumptions about how language-negotiated pages are structured, stored, and served. Those assumptions may not hold true in practice and are an important consideration in how a given site structures its language negotiation. For example, in my proposed article about language negotiation, I call out the need for domain-, path-, or query-based language identification. One reason for this is to expose per-page language/locale to robots and crawlers.
This also helps with things like offline links in a specific language (if I have a display ad in Spanish, I want the link to go to the Spanish-language experience page).
Do you have any specific suggestions regarding the text here? After discussing with Richard, we decided to limit the content of this document to the "when" (and a bit "why") aspects, while the "what" and "how" parts will be discussed in other articles.
Discussed in today's WG telecon: https://www.w3.org/2026/01/15-i18n-minutes.html#de51
Fix #794.