Closing a site from indexing in Google and Yandex: how to implement it correctly

Promoting any site ultimately means getting it to the top of the search results. To achieve this, SEO specialists perform a fairly wide range of work, spending a lot of time and effort on it. One of the most important tasks is to get the site's pages indexed by search bots as quickly as possible, because a page can appear in search results only after it has been indexed.

However, as practice shows, there are situations when specialists need to close their pages from indexing or prohibit crawling them. At first glance, such an action may seem illogical, but in reality almost anyone can run into it. Which situations are we talking about? How can the robots.txt file be used here? What needs to be done to close site pages from indexing by Google and Yandex search bots? And for what reasons do problems most often arise when sites are checked by crawlers?

We will consider all these questions in detail in today's review. The information provided will help you understand the issue, decide for yourself whether this work is worth doing and, if necessary, implement it as quickly and efficiently as possible. So, let's take things in order.

Cases when you need to close a site page from indexing

If you have already worked in SEO, you probably know that a new site, and even its individual pages, do not start appearing in search results immediately after publication. The reason is quite simple: search bots cannot index all pages instantly. This work takes time and resources. Only after a crawler has checked your page is information about it added to the search engine's database; from that moment it starts appearing in search results and people can find it. In other words, if you see a set of pages in response to your query, each of them has already been crawled and indexed.

In parallel with target pages aimed at the audience, almost every site also has pages that host working documentation and related files of no use to visitors. They are needed by SEO specialists, site administrators and developers for their work. Most of these are temporary files, internal links, working documentation, pages that are still under development, and all sorts of service information. If such pages get into the index, they bring no direct benefit to the visitor, while navigating the site can become noticeably harder because of a more complex and confusing structure. In other words, the usability of the site as a whole suffers. It is therefore quite reasonable to close all these service pages from indexing.

In some cases, this solution is also reasonable for entire sites, most often sites that are still under development, when their content, design or structure is being changed. Experts recommend closing a site or its individual pages from indexing in the following cases:

  • The desire to maintain the site's position in search results. Experience shows that indexed service files, pages under development and other content that is of no use to users will eventually lead to a gradual drop in rankings. The search engine wants sites to publish only content that is valuable to the end user, so it is important to hide pages that do not meet this criterion.
  • The desire to speed up the indexing of pages that are valuable to your site. This is especially relevant when you are just launching the site and want its pages to get into search results as quickly as possible. You have probably heard of the crawl budget: a kind of limit on the number of pages a search engine will scan on each site. It is quite logical to spend it on content that will be useful to the audience, as this lets you start attracting visitors to the site as quickly as possible.
  • The desire to meet the search engine's requirements for unique content. Exclusively unique material on the pages is another mandatory requirement that search engines impose on sites. If you are currently testing your site on another domain, it is advisable to close it from indexing; otherwise, search bots will treat those pages as duplicates, and your site will gain nothing from that.
  • The desire to maintain high usability. If you decide to make adjustments to the design or structure of the site, it makes sense to hide it from search bots for a certain period. If you do not, the bots may notice the temporary deterioration in usability and lower the resource's position, even though you are actually trying to improve it.

As you can see, there are not that many reasons for hiding a site from indexing, yet any of you may face the need to do this work. That means it is important to understand how it can be implemented in practice.

Ways to manage the process of indexing site pages

A modern webmaster has two tools that help keep the indexing of site pages fully under control:

  1. Site map. Sitemap.xml is a solution that helps search bots navigate the structure of your site as easily as possible and understand what a particular document or folder contains. In this file you can specify how often content is updated and set crawl priorities for individual pages, effectively directing search bots around your site. Creating a sitemap is especially useful for fairly large sites with many pages: it would be extremely difficult for a search bot to check them all and set priorities correctly, or at least it would take a lot of time. A sitemap speeds this up considerably and improves accuracy. For landing pages and small resources, however, this work is not necessary.
  2. robots.txt. This is a specialized file where you write down the rules that search bots will follow. Here you can specify key parameters for crawling your site and set restrictions on checking certain pages (a short example is shown below).
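
To make this more concrete, here is a minimal robots.txt sketch that combines both tools: it sets a crawling rule and points bots to the sitemap. The example.com domain and the /admin/ path are placeholders, not part of any real configuration:

    # Minimal robots.txt sketch; the file is placed in the site root (https://example.com/robots.txt)
    User-agent: *                               # the rules below apply to all search bots
    Disallow: /admin/                           # hypothetical service section closed from crawling
    Sitemap: https://example.com/sitemap.xml    # location of the sitemap described above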

Along with these methods, you can also hide certain pages from indexing using HTML markup and the webmaster consoles, Yandex Webmaster and Google Search Console. The tools exist; you just need to know how to use them correctly.

What elements of the site should be hidden from indexing

In practice, using the robots.txt file, HTML markup and the sitemap, you can hide from indexing individual documents or files, specific pages, the site as a whole, certain links and even paragraphs of text. In particular, you can hide the following from search bots:

  • Pages intended for service use. This includes everything directly related to site administration: the corresponding sections of the site and service directories. An example is the login form for the control panel. A visitor has no need for it, since it will not help them learn about a product, place an order or find useful information; only the administrator needs it to manage the site. There is no reason to have it indexed, so the more sensible solution is to hide it from search bots.
  • Various forms provided on the site. These are technical elements intended for entering information, registering or placing orders, i.e. solutions that do not affect search at all. It also makes sense to close captchas, tag clouds, pop-up windows, the shopping cart and the "Favorites" section from indexing.
  • PDF documents. These can be files intended exclusively for internal use, as well as documents accessed by users: privacy policies, information about the product range, price lists, terms of user agreements. To decide whether to keep them in the index, analyze the search results in your niche. If PDF documents rank higher than regular pages for the same queries, it is advisable to close them, so that more important information remains visible to users. Internal documentation, however, should always be closed from the index.
  • Pages that are currently under development. If there is a page whose text or graphic content you are still working on, it is advisable to hide it from the index until the work is finished. At the same time, it is worth hiding duplicate pages to preserve the uniqueness of each page.
  • Web pages intended for printing. To improve usability, webmasters sometimes add extra functions such as printable versions of documents. This mechanism essentially creates duplicate content. If you do not close the printable versions from indexing, crawlers may index them by mistake while the main pages remain, so to speak, behind the scenes: for the search engine, copies of the main pages can become the priority.
  • Backup copy of the site. Most sites have one, as it allows functionality to be restored as quickly as possible if the main resource runs into technical difficulties; a 301 redirect is used for this. Such a copy should never be indexed, because in that case your pages would compete with each other for a place in the search results.
  • Pages showing on-site search results. They are useful for users looking for particular information on the site, but they have no significant impact on search itself, so it is better not to add them to the index at all.
  • Personal user data. This includes the information a user leaves when signing in to your site, filling out a feedback form or placing an order: full name, email address, phone number and other contact details, payment information, purchase history, etc.

These are just basic recommendations. This list can be expanded individually depending on the specifics of your site.

Closing the site from indexing by Google search bots

If you want to hide your site or individual pages from indexing by Google search bots, you can use robots.txt or HTML markup to implement this. You can also send the appropriate rules to Googlebot in an HTTP header.
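
In practice, the HTTP-header option usually means the X-Robots-Tag response header, which Google supports as an alternative to the robots meta tag. A rough sketch of such a response is shown below; the PDF content type and the exact set of directives are only illustrative:

    HTTP/1.1 200 OK
    Content-Type: application/pdf
    X-Robots-Tag: googlebot: noindex, nofollow

The header is normally added in the web server configuration (for example, in .htaccess on Apache), which makes it convenient for non-HTML files such as PDF documents that cannot carry a meta tag.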

Now let's look at the main points.

Blocking access to Google bots via robots.txt

Googlebot is the key search bot used by Google today. It indexes pages and also checks whether the site is optimized for mobile devices; as you probably know, mobile-friendliness is one of this search engine's mandatory requirements today. Along with Googlebot, dozens of other bots are used for narrower tasks. For example, Googlebot-Image and Googlebot-Video scan graphic and video content posted on the site, respectively, while Googlebot-News indexes news pages for inclusion in Google News.

To configure indexing of individual pages or the entire site for Google search bots via the robots.txt file, you will need the Disallow directive. Proceed as follows:

  1. To close the entire site, specify the following in the robots.txt file: "User-agent: Googlebot", followed by "Disallow: /".
  2. To close individual pages from indexing, specify "User-agent: Googlebot", then "Disallow: /page".
  3. To close a section of the site from crawling, write the following directives: "User-agent: Googlebot", then "Disallow: /catalogue".
  4. If you are faced with the task of closing the news section from indexing, the directives will look like this: "User-agent: Googlebot-News" and "Disallow: /news". The same approach is used if you want to close video content or images from crawling.

One more thing: if you want to close your pages from indexing for all search engines at once, do not specify a bot name after User-agent; put the "*" sign instead, and keep the slash "/" in the Disallow directive. A combined sketch is shown below.
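
As a rough sketch, a robots.txt implementing the rules above might look like this; the /page, /catalogue and /news paths are placeholders that you would replace with your own URLs:

    User-agent: Googlebot
    Disallow: /page          # a single page closed from crawling
    Disallow: /catalogue     # a whole section closed from crawling

    User-agent: Googlebot-News
    Disallow: /news          # the news section is kept out of Google News

    User-agent: *
    Disallow: /              # all other bots are not allowed to crawl the site

Note that a crawler follows the group with the most specific user-agent that matches it, so with this file Googlebot obeys only its own rules, while the final group applies to every other bot.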

Closing a site from indexing in Google via HTML markup

Using HTML markup, you can hide both an entire page and individual fragments from indexing by Google search bots. To do this, add the "noindex" or "none" directive to the "googlebot" meta tag. The following commands are available (a combined example follows the list):

  1. To keep the page's content out of Google search results and Google News, and to stop the bot from following its links, write the following meta tag: <meta name="googlebot" content="noindex, nofollow">.
  2. If the task is to hide the images on a page from indexing, the tag should look like this: <meta name="googlebot" content="noimageindex">.
  3. If your site contains content that quickly becomes outdated, which is especially relevant for news sites and online-store pages with promotional offers, it makes sense to add a command that removes the page from the index once it is no longer relevant. Suppose you have a page announcing that from August 20 to 31 you offer discounts on school backpacks. On September 1 the offer is no longer valid, so you want the page to leave the index on August 31. In this case the meta tag will look like this: <meta name="googlebot" content="unavailable_after: 31-Aug-2025 23:00:00 EST">.
  4. For links, two different rel attributes can be used to tell the search engine where a link originates. The rel="ugc" value is meant for user-generated content such as forums and comment sections, where the audience leaves comments and reviews that may contain links. It is hard to say in advance how trustworthy those links are and whether they will help your site or, on the contrary, harm it, so for safety it is better to let the search bot know where such a link came from. If the site carries advertising with links, adding rel="sponsored" tells the bot that the link is placed as part of an affiliate program.
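
A combined sketch of these options is shown below. All of the directives are gathered on a single page only for illustration; in practice you would normally keep just the one you need. The URLs and the date are placeholders:

    <!-- Page-level directives go in the <head> of the page -->
    <head>
      <meta name="googlebot" content="noindex, nofollow">
      <meta name="googlebot" content="noimageindex">
      <meta name="googlebot" content="unavailable_after: 31-Aug-2025 23:00:00 EST">
    </head>

    <!-- Link-level attributes go on the links themselves, inside the page body -->
    <a href="https://example.com/user-post" rel="ugc">link left in a comment</a>
    <a href="https://example.com/partner-offer" rel="sponsored">paid or affiliate link</a>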

Checking the page status through Google Search Console

If you use Google Search Console in your work, you can check the status of a page and find out whether it has been indexed by Google. Open the URL inspection tool in the vertical sidebar, enter the address of your page in the field that appears and run the check; the report also lets you view the crawled page. If the page has not been indexed, the report will state that this URL is not in the search engine's index.

If you wish, you can use Google Search Console to check all the pages of your site at once. To do this, go to the "Indexing" section, select the "Pages" report and review the crawl results. The system prepares a report for you, grouping the results by status and showing which pages were not indexed and why. If errors are detected, their cause is indicated, which helps you make the necessary adjustments as quickly as possible.

Closing the site from indexing by Yandex search bots

You can prohibit indexing of a site or its pages by Yandex search bots in different ways: through robots.txt, HTML markup, or by requiring authorization on the site.

Let's now consider each of these options in more detail so that you can do this work yourself.

Using robots.txt to close a page from indexing in the Yandex search engine

As with Google, the Yandex search engine has a main bot, YandexBot, and it is to this bot that you address all directives for closing the entire site or individual pages from indexing. There are also about ten other bots for narrower tasks: YandexMetrika works with Yandex Metrica, YandexMarket services Yandex Market, and YandexMedia indexes multimedia elements.

So if you are faced with the task of closing the entire site or individual pages from indexing, whether duplicates, service pages, pages with confidential data or all kinds of logs, the directives are written for the main bot. To do this, open the robots.txt file and add the appropriate rules; the Disallow directive is used here as well. The exact command depends on the task you are solving at the moment (a short sketch follows the list):

  1. If you need to close the entire site from Yandex search bots, specify the following parameters: "User-agent: Yandex", then "Disallow: /".
  2. You can prohibit crawling of certain product catalog pages with the command "User-agent: Yandex", then "Disallow: /catalogue".
  3. If you want to prohibit indexing of pages whose URLs contain parameters, the directive will look like this: "User-agent: Yandex", then "Disallow: /page?". The "page?" part tells the search bot not to visit pages that have parameters in their address.
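
A short sketch of such a robots.txt is shown below; the /catalogue and /page paths are placeholders:

    User-agent: Yandex
    Disallow: /catalogue     # a catalog section closed from crawling
    Disallow: /page?         # pages under /page whose addresses contain parameters

    # Variant that closes the entire site to Yandex bots instead:
    # User-agent: Yandex
    # Disallow: /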

Closing elements from Yandex indexing using HTML markup

HTML markup is useful in practice when you need to hide from indexing not an entire site but a separate section, page or specific elements. This is done by adding the "robots" meta tag to the "head" element of your markup with the "noindex", "nofollow" or "none" directive. Each of these values has its own purpose:

  1. noindex. Prevents search bots from indexing the text of the page, so a page carrying this directive will not appear in search results.
  2. nofollow. Prohibits search bots from following the links placed on the page.
  3. none. Closes the page completely, covering both its content and all of its links.

Suppose you want to hide a page from the Yandex bot. In this case, add the following tag to the page's HTML code: <meta name="yandexbot" content="noindex, nofollow">. If the task is to hide an entire section of your site from the search results, the same "noindex" directive has to be added to the HTML code of every page in that section. A minimal example is shown below.
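
In the page markup this could look as follows; whether you address the "yandexbot" tag only or all search engines via the general "robots" tag depends on your goal:

    <head>
      <!-- Closes the page for the Yandex bot only -->
      <meta name="yandexbot" content="noindex, nofollow">
      <!-- Or, to close the page for all search engines at once: -->
      <!-- <meta name="robots" content="none"> -->
    </head>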

If you need to hide a separate part of the text on a page from the bot, use the noindex element in the page's HTML code. The fragment then looks like this: <noindex>text that needs to be hidden from indexing</noindex>. Alternatively, the <noscript> tag can be used: in that case, part of the page is excluded from indexing and the fragment is also hidden from users, but this only works if the user's browser supports JavaScript.

If you want to keep the page in the index but prohibit the bot from following links placed inside the content, use the "nofollow" directive. In this case it is written directly in the HTML markup of the link, right inside the text where the link is placed (see the sketch below).
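
A short sketch of both techniques inside the page body is shown below. Since the noindex tag is non-standard HTML, Yandex also accepts it in the comment form <!--noindex-->...<!--/noindex--> to keep the markup valid; the URL here is a placeholder:

    <p>
      Visible text that remains open for indexing.
      <noindex>text fragment that needs to be hidden from Yandex indexing</noindex>
    </p>
    <a href="https://example.com/some-page" rel="nofollow">link the bot should not follow</a>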

Features of prohibiting indexing in Yandex through authorization

When you need to hide the main page of your site from bots, the most reliable solution is to put it behind authorization. This is more effective than the robots.txt file or HTML markup: even if you prohibit the bot from crawling the page using the options described above, it may still end up in the index because of third-party links pointing to it from other sites.
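
As an illustration only, assuming the site runs on Apache, a page or directory can be put behind HTTP Basic authorization via .htaccess. The sketch below relies on the standard mod_auth_basic module; the path to the password file (created with htpasswd) is hypothetical:

    # .htaccess sketch: the protected directory becomes inaccessible to visitors and search bots alike
    AuthType Basic
    AuthName "Restricted area"
    AuthUserFile /var/www/.htpasswd    # hypothetical path to the htpasswd file
    Require valid-user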

Checking the status of your page through Yandex Webmaster

Even after you have done all the work needed to close your site or individual pages from indexing in Yandex, it is still important to make sure the bots do not ignore the rules you have set. In practice, there have been cases where pages still appeared in search results after such work. It is therefore in your interest to check their status and confirm that they are not in the index. Yandex Webmaster can be used for this.

In particular, you need to go to the "Indexing" tab, and then select the "Page check" tool. In the appropriate field, enter the URL of the page and click the "Check" button. As a result, you will receive a report containing the following data:

  • server response time;
  • encoding;
  • HTTP request status code;
  • contents of the title and description tags;
  • date and exact time of the last scan of this page by bots;
  • page status at the time of the last bot pass through it;
  • current page status in search;
  • JavaScript code execution.

If you want to see information about all indexed pages, go to the "Indexing" tab and select the "Pages in search" tool. The report you receive will include a table with the list of page URLs, their statuses, titles and the date of the last bot visit. To find pages in this report that have been removed from search results, look at the "Status" column: you are interested in all pages marked "HTTP Error". Analyzing the error code will tell you what exactly went wrong.

The most common reasons for problems with website indexing

If you have not taken any action to hide certain pages of the site from indexing, yet the search engine still does not add them to the results, you need to find out why. Only then can you eliminate the problem and make sure the site gets into organic search. The most common reasons why a site is not indexed are the following:

  • The site is new. At the beginning of this review we noted that the system needs a certain amount of time to check a new site and add its pages to the results. If the process is dragging on, you can speed it up a little by requesting indexing of the pages through Google Search Console or Yandex Webmaster.
  • The site is blocked in the robots.txt file. Check this file and make sure you have not accidentally blocked the entire site or a specific page from indexing.
  • Privacy settings are enabled. This is especially relevant for sites built on a ready-made CMS: privacy settings are often activated by default and prohibit indexing.
  • There were problems with the server or hosting. If your site was unavailable at the moment the search bot tried to check it, the crawl will not be completed and the pages will not appear in the search results.
  • Access to the site is blocked in the .htaccess file. Check this file for rules that prohibit indexing.
  • Errors at the crawling stage. If the site has many technical flaws, bots will not be able to crawl all the pages efficiently and add them to the search engine's database. In this case, use the webmaster panels to identify the relevant problems.
  • No sitemap. This is relevant for large sites. As mentioned above, the sitemap.xml file is a kind of guide for the bot, pointing it to the pages that need indexing.

In most cases, eliminating these problems will ensure that the site is promptly crawled and its pages are added to the search results.

Summing up

We hope that the information in today's review has helped you better understand how search engine bots work. You can manage their behavior yourself by prohibiting the indexing of pages that bring no value to users: such pages only waste the crawl budget, make your own pages compete with each other in the search results and cause a number of related problems. Think about which pages should be hidden from indexing in your particular case and make all the necessary settings, following the recommendations in this article.

To simplify site promotion as a whole, get access to sites and services from different countries, avoid blocks when working with multiple accounts and tools that automate actions on the network, protect yourself from unauthorized access and keep your work confidential, connect mobile proxies from the MobileProxy.Space service. Follow the link https://mobileproxy.space/en/user.html?buyproxy to get to know this product in detail, take a free trial and see for yourself how simple, convenient and functional this solution is. A 24-hour technical support service is also at your disposal, and its specialists respond to user requests immediately.

Don't forget to check the available tariffs for such broad functionality and versatility.

