We have a development server at dev.example.com that is being indexed by Google. We use AWS Lightsail to duplicate the development server to our production environment in its entirety, so the same robots.txt file is used on both dev.example.com and www.example.com. dev.example.com is the staging URL and www.example.com is the official one, but Google is somehow indexing the staging URL. How can we prevent this?
To prevent Google from indexing your staging URL (dev.example.com) while allowing indexing for your production URL (www.example.com), you can take several steps:
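- Robots.txt: Make sure the robots.txt file served on the staging server (dev.example.com) disallows all crawling:
User-agent: *
Disallow: /
This instructs search engine crawlers not to crawl any content on the staging server. Because your deployment copies the same robots.txt to www.example.com, the file must differ per host (or be swapped during deployment); otherwise it would block the production site as well. Note that Disallow stops crawling but does not remove URLs that are already indexed.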
- Meta Robots Tag: In addition to the robots.txt file, you can include a meta robots tag in the <head> section of your staging site's HTML pages:
<meta name="robots" content="noindex, nofollow">
This tag tells search engines not to index the current page (noindex) and not to follow any links on the page (nofollow). Google only sees the tag on pages it is allowed to crawl, so if staging pages are already indexed, leave them crawlable until they drop out of the index before blocking them in robots.txt.
- Google Search Console: Add the staging site (dev.example.com) as a separate property in Google Search Console and use the Removals tool to request removal of its URLs from the Google index. The removal is temporary, so combine it with one of the steps above.
- Canonical Tags: Ensure that every page includes a canonical tag pointing to the corresponding page on the production site. Since both servers serve the same files, the staging pages then carry the same tag, which helps search engines understand that the www.example.com version is the preferred one for indexing.
<link rel="canonical" href="https://www.example.com/page">
By following these steps, you can effectively prevent Google from indexing your staging URL (dev.example.com) while allowing indexing for your production URL (www.example.com).
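Since the same files end up on both servers, one way to keep the two sites apart is to choose the robots.txt body and the meta robots tag from the request's hostname instead of from a static file. Below is a minimal Python sketch of that idea, not a drop-in implementation. The names STAGING_HOSTS, is_staging, robots_txt_for, meta_robots_for and can_crawl are hypothetical helpers made up for this illustration, and how you wire them into your web server or framework is left open. The can_crawl helper uses the standard library's urllib.robotparser to check how a crawler would read the generated file.

```python
from urllib.robotparser import RobotFileParser

# Assumption: the hostnames that should stay out of search results.
STAGING_HOSTS = {"dev.example.com"}

BLOCK_ALL = "User-agent: *\nDisallow: /\n"   # block every crawler
ALLOW_ALL = "User-agent: *\nDisallow:\n"     # empty Disallow allows everything
NOINDEX_META = '<meta name="robots" content="noindex, nofollow">'


def is_staging(host: str) -> bool:
    """Return True if the Host header value names a staging host."""
    # Drop an optional ":port" suffix and normalise case.
    return host.split(":")[0].strip().lower() in STAGING_HOSTS


def robots_txt_for(host: str) -> str:
    """Pick the robots.txt body to serve for this hostname."""
    return BLOCK_ALL if is_staging(host) else ALLOW_ALL


def meta_robots_for(host: str) -> str:
    """Return the meta robots tag for staging, or an empty string for production."""
    return NOINDEX_META if is_staging(host) else ""


def can_crawl(robots_txt: str, url: str, agent: str = "Googlebot") -> bool:
    """Check whether the given robots.txt text lets `agent` fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Print how each host's generated robots.txt would be interpreted.
    for host in ("dev.example.com", "www.example.com"):
        body = robots_txt_for(host)
        print(host, "crawlable:", can_crawl(body, f"https://{host}/page"))
```

Deciding by hostname means the deployed code can stay identical on both servers. Whether to serve the blocking robots.txt and the noindex tag at the same time is a judgment call: if staging pages are already in the index, it may be better to serve only the tag at first so Google can still crawl the pages and see it.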