Overview of Google crawlers and fetchers (user agents)

Google uses crawlers and fetchers to perform actions for its products, either automatically or triggered by user request.

“Crawler” (sometimes also called a “robot” or “spider”) is a generic term for any program that is used to automatically discover and scan websites by following links from one web page to another. Google’s main crawler is called Googlebot.

Fetchers, like a browser, are tools that request a single URL when prompted by a user.

The following tables show the Google crawlers and fetchers used by various products and services, how they may appear in your referrer logs, and how to specify them in robots.txt.

  • The user agent token is used in the User-agent: line in robots.txt to match a crawler type when writing crawl rules for your site. Some crawlers have more than one token, as shown in the table; you need to match only one crawler token for a rule to apply. This list is not complete, but covers most crawlers you might see on your website.
  • The full user agent string is a full description of the crawler, and appears in the HTTP request and your web logs. Caution: The user agent string can be spoofed. Learn how to verify if a visitor is a Google crawler.
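The verification the caution above refers to follows a documented pattern: reverse-DNS the visiting IP, check that the name ends in googlebot.com or google.com, then forward-resolve that name and confirm it maps back to the same IP. A minimal sketch in Python (the function names are my own):

```python
import socket

# Reverse-DNS names for genuine Google crawlers end in one of these domains.
TRUSTED_SUFFIXES = (".googlebot.com", ".google.com")

def is_trusted_host(hostname: str) -> bool:
    """Check a reverse-DNS name against Google's documented domains."""
    return hostname.rstrip(".").lower().endswith(TRUSTED_SUFFIXES)

def verify_google_crawler(ip: str) -> bool:
    """Reverse lookup, domain check, then forward lookup back to the IP."""
    try:
        hostname, _, _ = socket.gethostbyaddr(ip)
    except OSError:
        return False
    if not is_trusted_host(hostname):
        return False
    try:
        # The forward lookup must return the original IP; otherwise the
        # reverse record could have been planted by a spoofer.
        forward_ips = {info[4][0] for info in socket.getaddrinfo(hostname, None)}
    except OSError:
        return False
    return ip in forward_ips
```

The forward-confirmation step matters: anyone who controls reverse DNS for an IP block can claim a googlebot.com name, but only Google can make that name resolve back to the visiting IP.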

Common crawlers

Google’s common crawlers are used for building Google’s search indices, performing other product-specific crawls, and for analysis. They always obey robots.txt rules and generally crawl from the IP ranges published in the googlebot.json object.
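The published ranges are JSON objects whose prefixes list mixes ipv4Prefix and ipv6Prefix entries. A sketch of checking a visitor IP against such a list with Python's ipaddress module (the two prefixes below are illustrative samples, not the full published list):

```python
import ipaddress

# Illustrative entries in the shape of the published "prefixes" array;
# the real list comes from the googlebot.json object.
prefixes = [
    {"ipv4Prefix": "66.249.64.0/27"},
    {"ipv6Prefix": "2001:4860:4801:10::/64"},
]

def in_published_ranges(ip: str) -> bool:
    """True if the address falls inside any published CIDR block."""
    addr = ipaddress.ip_address(ip)
    for entry in prefixes:
        cidr = entry.get("ipv4Prefix") or entry.get("ipv6Prefix")
        # Mixed-version membership tests simply return False, so v4 and
        # v6 prefixes can live in the same list without special-casing.
        if addr in ipaddress.ip_network(cidr):
            return True
    return False
```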

Googlebot Smartphone

  • Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)

Googlebot Desktop

  • Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; Googlebot/2.1; +http://www.google.com/bot.html) Chrome/W.X.Y.Z Safari/537.36
  • Rarely:
    • Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
    • Googlebot/2.1 (+http://www.google.com/bot.html)

    Googlebot Image

    Used for crawling image bytes for Google Images and products dependent on images.

    Googlebot News

    Googlebot News uses Googlebot for crawling news articles; however, it respects its historic user agent token, Googlebot-News.

    Googlebot Video

    Used for crawling video bytes for Google Video and products dependent on videos.

    Google Favicon

    Google StoreBot

    Google StoreBot crawls through certain types of pages, including, but not limited to, product details pages, cart pages, and checkout pages.

    • Desktop agent:
      Mozilla/5.0 (X11; Linux x86_64; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/79.0.3945.88 Safari/537.36
    • Mobile agent:
      Mozilla/5.0 (Linux; Android 8.0; Pixel 2 Build/OPD3.170816.012; Storebot-Google/1.0) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.138 Mobile Safari/537.36

    Google-InspectionTool

    Google-InspectionTool is the crawler used by Search testing tools such as the Rich Result Test and URL inspection in Search Console. Apart from the user agent and user agent token, it mimics Googlebot.

    User agent tokens:

    • Google-InspectionTool
    • Googlebot

    Full user agent strings:

    • Mobile:
      Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/W.X.Y.Z Mobile Safari/537.36 (compatible; Google-InspectionTool/1.0)
    • Desktop:
      Mozilla/5.0 (compatible; Google-InspectionTool/1.0)

    GoogleOther

    Generic crawler that may be used by various product teams for fetching publicly accessible content from sites. For example, it may be used for one-off crawls for internal research and development.

    Special-case crawlers

    The special-case crawlers are used by specific products where there’s an agreement between the crawled site and the product about the crawl process. For example, AdsBot ignores the global robots.txt user agent ( * ) with the ad publisher’s permission. The special-case crawlers may ignore robots.txt rules and so they operate from a different IP range than the common crawlers. The IP ranges are published in the special-crawlers.json object.

    APIs-Google

    Used by Google APIs to deliver push notification messages. Ignores the global user agent ( * ) in robots.txt.

    AdsBot Mobile Web Android

    Checks Android web page ad quality. Ignores the global user agent ( * ) in robots.txt.

    AdsBot Mobile Web

    Checks iPhone web page ad quality. Ignores the global user agent ( * ) in robots.txt.

    AdsBot

    Checks desktop web page ad quality. Ignores the global user agent ( * ) in robots.txt.

    AdSense

    The AdSense crawler visits your site to determine its content in order to provide relevant ads. Ignores the global user agent ( * ) in robots.txt.

    Mobile AdSense

    The Mobile AdSense crawler visits your site to determine its content in order to provide relevant ads. Ignores the global user agent ( * ) in robots.txt.

    User-triggered fetchers

    User-triggered fetchers are triggered by users to perform a product specific function. For example, Google Site Verifier acts on a user’s request. Because the fetch was requested by a user, these fetchers generally ignore robots.txt rules. The IP ranges the user-triggered fetchers use are published in the user-triggered-fetchers.json object.

    Feedfetcher

    Feedfetcher is used for crawling RSS or Atom feeds for Google Podcasts, Google News, and PubSubHubbub.

    Google Publisher Center

    Fetches and processes feeds that publishers explicitly supplied through the Google Publisher Center to be used in Google News landing pages.

    Google Read Aloud

    Upon user request, Google Read Aloud fetches and reads out web pages using text-to-speech (TTS).

    • Desktop agent:
      Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/41.0.2272.118 Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)
    • Mobile agent:
      Mozilla/5.0 (Linux; Android 7.0; SM-G930V Build/NRD90M) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.125 Mobile Safari/537.36 (compatible; Google-Read-Aloud; +https://support.google.com/webmasters/answer/1061943)

    Former agent (deprecated):

    Google Site Verifier

    Google Site Verifier fetches Search Console verification tokens upon user request.

    A note about Chrome/W.X.Y.Z in user agents

    Wherever you see the string Chrome/W.X.Y.Z in the user agent strings in the table, W.X.Y.Z is a placeholder that represents the version of the Chrome browser used by that user agent: for example, 41.0.2272.96. This version number increases over time to match the latest Chromium release version used by Googlebot.

    If you are searching your logs or filtering your server for a user agent with this pattern, use wildcards for the version number rather than specifying an exact version number.
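For example, a log search in Python can wildcard the version digits instead of pinning them (the sample log line uses a made-up version number):

```python
import re

# "[\d.]+" stands in for the W.X.Y.Z placeholder in the documented string.
GOOGLEBOT_DESKTOP = re.compile(
    r"compatible; Googlebot/2\.1;.*Chrome/[\d.]+ Safari/537\.36"
)

log_line = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "Googlebot/2.1; +http://www.google.com/bot.html) "
    "Chrome/112.0.5615.142 Safari/537.36"
)
```

The same pattern keeps matching as Googlebot's Chrome version advances, which an exact-version match would not.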

    User agents in robots.txt

    Where several user agents are recognized in the robots.txt file, Google follows the most specific one. If you want all of Google to be able to crawl your pages, you don’t need a robots.txt file at all. To block or allow all of Google’s crawlers’ access to some of your content, specify Googlebot as the user agent. For example, if you want all your pages to appear in Google Search and you want AdSense ads to appear on your pages, you don’t need a robots.txt file. Similarly, if you want to block some pages from Google altogether, blocking the Googlebot user agent will also block all of Google’s other user agents.

    But if you want more fine-grained control, you can get more specific. For example, you might want all your pages to appear in Google Search, but you don’t want images in your personal directory to be crawled. In this case, use robots.txt to disallow the Googlebot-Image user agent from crawling the files in your personal directory (while allowing Googlebot to crawl all files), like this:

    User-agent: Googlebot
    Disallow:

    User-agent: Googlebot-Image
    Disallow: /personal

    To take another example, say that you want ads on all your pages, but you don’t want those pages to appear in Google Search. Here, you’d block Googlebot, but allow the Mediapartners-Google user agent, like this:

    User-agent: Googlebot
    Disallow: /

    User-agent: Mediapartners-Google
    Disallow:
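Rules like these can be sanity-checked with Python's standard urllib.robotparser. One caveat: Python's parser applies the first matching group, whereas Google applies the most specific token, so the two only agree when the groups don't overlap, as here:

```python
from urllib import robotparser

# The Mediapartners-Google example from the text: block Search crawling,
# allow the AdSense crawler everywhere.
rules = """\
User-agent: Googlebot
Disallow: /

User-agent: Mediapartners-Google
Disallow:
"""

rp = robotparser.RobotFileParser()
rp.parse(rules.splitlines())

print(rp.can_fetch("Googlebot", "https://example.com/page.html"))             # False
print(rp.can_fetch("Mediapartners-Google", "https://example.com/page.html"))  # True
```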

    Controlling crawl speed

    Each Google crawler accesses sites for a specific purpose and at different rates. Google uses algorithms to determine the optimal crawl rate for each site. If a Google crawler is crawling your site too often, you can reduce the crawl rate.

    Retired Google crawlers

    The following Google crawlers are no longer in use, and are only noted here for historical reference.

    Duplex on the web

    Supported the Duplex on the web service.

    Web Light

    Checked for the presence of the no-transform header whenever a user clicked your page in search under appropriate conditions. The Web Light user agent was used only for explicit browse requests of a human visitor, and so it ignored robots.txt rules, which are used to block automated crawling requests.

    Mobile Apps Android

    Checked Android app page ad quality. Obeyed AdsBot-Google robots rules, but ignored the global user agent ( * ) in robots.txt.

    Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

    Last updated 2023-05-24 UTC.


    User Agent Strings

    A browser’s user agent string (UA) helps identify which browser is being used, what version, and on which operating system. When feature detection APIs are not available, use the UA to customize behavior or content to specific browser versions.

    Like all other browsers, Chrome for Android sends this information in the User-Agent HTTP header every time it makes a request to any site. It’s also available in the client through JavaScript using the navigator.userAgent call.

    Chrome for Android

    Chrome for Android reports its UA in the following formats, depending on whether the device is a phone or a tablet.

    Phone: Mozilla/5.0 (Linux; <Android Version>; <Build Tag etc.>) AppleWebKit/<WebKit Rev> (KHTML, like Gecko) Chrome/<Chrome Rev> Mobile Safari/<WebKit Rev>
    Tablet: Mozilla/5.0 (Linux; <Android Version>; <Build Tag etc.>) AppleWebKit/<WebKit Rev> (KHTML, like Gecko) Chrome/<Chrome Rev> Safari/<WebKit Rev>

    Here’s an example of the Chrome user agent string on a Galaxy Nexus:

    Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) Chrome/18.0.1025.133 Mobile Safari/535.19

    If you are parsing user agent strings using regular expressions, the following can be used to check against Chrome on Android phones and tablets:

    • Phone pattern: 'Android' + 'Chrome/[.0-9]* Mobile'
    • Tablet pattern: 'Android' + 'Chrome/[.0-9]* (?!Mobile)'
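In Python, the two patterns above combine directly (the sample UA strings are the Galaxy Nexus one from the text plus an illustrative tablet string):

```python
import re

# Phone UAs carry "Mobile" after the Chrome token; tablet UAs do not.
PHONE_RE = re.compile(r"Android.*Chrome/[.0-9]* Mobile")
TABLET_RE = re.compile(r"Android.*Chrome/[.0-9]* (?!Mobile)")

phone_ua = ("Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus Build/IMM76B) "
            "AppleWebKit/535.19 (KHTML, like Gecko) "
            "Chrome/18.0.1025.133 Mobile Safari/535.19")
tablet_ua = ("Mozilla/5.0 (Linux; Android 4.3; Nexus 7 Build/JSS15Q) "
             "AppleWebKit/537.36 (KHTML, like Gecko) "
             "Chrome/29.0.1547.72 Safari/537.36")
```

The negative lookahead in the tablet pattern is what keeps it from matching phone strings, where a space after the version number is always followed by "Mobile".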

    Chrome for iOS

    The UA in Chrome for iOS is the same as the Mobile Safari user agent, with CriOS/<ChromeRevision> instead of Version/<VersionNumber>.

    Here’s an example of the Chrome UA on iPhone:

    Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/602.1.50 (KHTML, like Gecko) CriOS/56.0.2924.75 Mobile/14E5239e Safari/602.1

    For comparison, the Mobile Safari UA:

    Mozilla/5.0 (iPhone; CPU iPhone OS 10_3 like Mac OS X) AppleWebKit/603.1.23 (KHTML, like Gecko) Version/10.0 Mobile/14E5239e Safari/602.1

    Up to Chrome 84, when the Request Desktop Site feature is enabled, the Desktop Safari UA is sent:

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_10_4) AppleWebKit/600.7.12 (KHTML, like Gecko) Version/8.0.7 Safari/600.7.12

    Starting from Chrome 85, when the Request Desktop Site feature is enabled, the UA is the same as the Desktop Safari UA with CriOS/ being added:

    Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_5) AppleWebKit/605.1.15 (KHTML, like Gecko) CriOS/85 Version/11.1.1 Safari/605.1.15

    WebView on Android

    The Android 4.4 (KitKat) Chromium-based WebView adds Chrome/_version_ to the user agent string. For comparison, the WebView UA in Android 4.3 and below carried no Chrome token at all:

    Mozilla/5.0 (Linux; U; Android 4.1.1; en-gb; Build/KLP) AppleWebKit/534.30 (KHTML, like Gecko) Version/4.0 Safari/534.30

    WebView UA in KitKat to Lollipop

    Mozilla/5.0 (Linux; Android 4.4; Nexus 5 Build/_BuildID_) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/30.0.0.0 Mobile Safari/537.36

    If you’re attempting to differentiate between the WebView and Chrome for Android, look for the presence of the Version/_X.X_ token in the WebView user agent string. Don’t rely on the specific Chrome version number (for example, 30.0.0.0), as the version number changes with each release.

    WebView UA in Lollipop and Above

    In the newer versions of WebView, you can differentiate the WebView by looking for the wv field as highlighted below.

    Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv) AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 Chrome/43.0.2357.65 Mobile Safari/537.36
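A sketch combining both signals from the two sections above (the function name is mine; the pre-KitKat WebView, which lacks a Chrome token entirely, is out of scope here):

```python
import re

def is_android_webview(ua: str) -> bool:
    """Detect a Chromium-based Android WebView: Lollipop+ adds "; wv)",
    while the KitKat WebView is recognizable by a Version/X.X token
    appearing alongside Chrome/."""
    if "; wv)" in ua:
        return True
    return bool(re.search(r"Version/\d+\.\d+.*Chrome/", ua))

lollipop_wv = ("Mozilla/5.0 (Linux; Android 5.1.1; Nexus 5 Build/LMY48B; wv) "
               "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 "
               "Chrome/43.0.2357.65 Mobile Safari/537.36")
kitkat_wv = ("Mozilla/5.0 (Linux; Android 4.4; Nexus 5 Build/KOT49H) "
             "AppleWebKit/537.36 (KHTML, like Gecko) Version/4.0 "
             "Chrome/30.0.0.0 Mobile Safari/537.36")
chrome_android = ("Mozilla/5.0 (Linux; Android 4.0.4; Galaxy Nexus "
                  "Build/IMM76B) AppleWebKit/535.19 (KHTML, like Gecko) "
                  "Chrome/18.0.1025.133 Mobile Safari/535.19")
```

Note that the check keys on the Version/ token and the wv flag, never on the Chrome version digits, per the warning above.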

    Content available under the CC-By 3.0 license
