Preventing Search Engine Indexing of Secure Pages

Many web sites have portions of their site configured to use SSL. This allows the transfer of information between the server and the browser to take place over an encrypted connection.

URLs of such pages begin with https rather than http to indicate the secure protocol.

You may experience serious canonicalization problems if the secure portions of your site have been fully indexed along with your standard site: search engines treat http://www.yourdomain.com/page.html and https://www.yourdomain.com/page.html as two separate documents, even though their content is identical.

These problems arise only when the secure pages live on the same subdomain as your standard pages.

Where your secure pages live on their own subdomain, that portion of your site can be excluded from indexing using the robots.txt file in the root folder of that subdomain.
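
For example, assuming a hypothetical secure subdomain such as secure.yourdomain.com, a robots.txt file served from the root of that subdomain could simply read:

User-agent: *
Disallow: /

This blocks all compliant crawlers from every page on the secure subdomain, while the robots.txt on www.yourdomain.com remains untouched.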

In some cases, only a single page within a site (such as a contact form or a payment form) requires the use of SSL. In such cases, it may seem simpler to keep the secure page within your standard site structure.

Only the protocol type would need to be changed in this case (from http to https), not the subdomain or directory.

However, this approach can result in a search engine indexing the secure page and following links from it. If those links are relative (e.g. a link to index.html), they will be interpreted as links to secure versions of your standard pages.
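
For instance, assuming a hypothetical secure payment page at https://www.yourdomain.com/payment.php containing the relative link:

<a href="index.html">Home</a>

the crawler would resolve it to https://www.yourdomain.com/index.html rather than http://www.yourdomain.com/index.html, handing it a secure duplicate of your home page to index.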

Google, and possibly other search engines, could regard this as duplicate content, reducing your ranking in their search results. Once the pages are indexed, Google will continue to visit them unless they are excluded by a robots.txt file or by a robots meta tag in the head of each file.


So how do you stop Google from visiting these pages?

If you find yourself in this position, it may seem as though there is no simple way out.

There is one, however: secure requests for the robots.txt file can be redirected to a secondary file that excludes web crawlers from your secure pages.

In order for this solution to work, you must be running an Apache web server with mod_rewrite enabled.

First, create a second robots.txt file, named robots_ssl.txt (or whatever filename you prefer), that blocks all spiders. Upload this file to the root level of your domain.

For further information about the robots.txt file (what it is and how it works), see The Web Robots Pages.

Here is an example of such a file:

User-agent: Googlebot
Disallow: /

User-agent: *
Disallow: /

You will then need to add the following directives to the .htaccess file in the root document folder of your web server:

RewriteEngine on
Options +FollowSymLinks
# Serve robots_ssl.txt in place of robots.txt for requests on the SSL port
RewriteCond %{SERVER_PORT} ^443$
RewriteRule ^robots\.txt$ robots_ssl.txt [L]

Essentially, these directives instruct the web server to answer any request for the robots.txt file made on port 443 (the port used for SSL connections, rather than port 80 for standard connections) with the contents of the robots_ssl.txt file created above, disallowing indexing.
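
If a proxy or SSL accelerator in front of Apache makes the port test unreliable, mod_rewrite in Apache 2.x also exposes an HTTPS variable; assuming your Apache version supports it, an equivalent condition would be:

RewriteCond %{HTTPS} on

The RewriteRule line stays exactly the same.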

In order to verify that this is working, enter the URL for your robots.txt file into your browser (http://www.yourdomain.com/robots.txt) to see your standard commands. Then enter the secure URL (https://www.yourdomain.com/robots.txt) and you should see the contents of the new robots_ssl.txt file.
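
You can also check from the command line with curl; the -k flag tells curl to skip certificate verification, which is handy when testing against a self-signed certificate:

curl http://www.yourdomain.com/robots.txt
curl -k https://www.yourdomain.com/robots.txt

The two requests should return different rules: your standard robots.txt for the first, and the blanket Disallow rules of robots_ssl.txt for the second.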


What if I can’t use mod_rewrite?

If you are unable to use mod_rewrite, either because you are not running an Apache web server or because the module is not enabled, all is not lost.

Using PHP, you can check whether SSL is in use and, if so, output a meta tag in the head of your documents to disallow indexing by search engines.

The following code, placed in the head of your document, will insert the meta tag if the HTTPS server variable is set and its value is on:

<?php
// Ask search engines not to index this page when it is served over SSL
if (isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on') {
    echo '<meta name="robots" content="noindex,follow">' . "\n";
}
?>
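
To show where the snippet sits, here is a minimal sketch of a complete page (the title and surrounding markup are hypothetical):

<html>
<head>
<title>Contact form</title>
<?php
// Emitted only for https:// requests; standard http:// pages remain indexable
if (isset($_SERVER['HTTPS']) && strtolower($_SERVER['HTTPS']) == 'on') {
    echo '<meta name="robots" content="noindex,follow">' . "\n";
}
?>
</head>
<body>
<!-- Page content goes here -->
</body>
</html>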

To verify this, view any page containing the code over SSL (https://www.yourdomain.com/filename.php) in your browser.

Using your browser's view source command, you should see the meta tag inserted in the head of the document.


About this SEO tutorial

This tutorial was written by , Forensic SEO & Social Semantic Web Consultant at SEO Workers and was published October 03, 2006.

Copyright reserved. Not to be reproduced.
