Google Crawling HTML Forms IS Harmful to Your Rankings
A couple of months ago Google officially announced it would be “exploring some HTML forms to try to discover new web pages.” I imagine more than a few SEOs were baffled by this decision, as I was, but probably not too concerned, since Google promised us all that “this change doesn’t reduce PageRank for your other pages” and that it would only increase your exposure in the engines.
During April I began to notice that a lot of our internal search pages were not only indexed but outranking the relevant pages for a user’s query. For instance, if you Googled “SubConscious Subs,” the first page to appear in the SERPs would be something like:
http://raleigh.ohsohandy.com/ads/search?q=tables
rather than the page for the establishment:
http://raleigh.ohsohandy.com/review/27571-sub-concious-subs
This wasn’t just a random occurrence; it was happening a lot. In addition to being far less relevant for the user, these landing pages weren’t optimized for placement in the search engines, so they were appearing around position #20 instead of, say, position #6. These local search pages even had PageRank, usually between 2 and 3.
Hmm, Just How Bad Is This Problem?
Eventually I realized just how often I was running into this in Google, noticed my recent, slow decline in traffic, and it occurred to me that this might be a real problem. I’ve never linked to any local search pages on OhSoHandy.com, and I couldn’t see that anyone else had either. I queried to find out how many search pages Google had indexed:
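A site: query scoped to the search path does the trick (something like the following, if you want to run the same check on your own site):

site:raleigh.ohsohandy.com inurl:/ads/search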
Whoa. 5,000+ pages of junk in the index, all with PageRank. I slept on it for a night, got up the next morning, and plugged in
User-agent: *
Disallow: /ads/search?q=*
in robots.txt (and threw in a meta robots noindex on those pages for good measure). Within a week we saw a big improvement in rankings as the properly optimized pages trumped the junk, and traffic is up 25% since the change, back to trending upward weekly instead of a stagnant, slow decline.
A Bit of Advice
The robots.txt disallow works, but it is slow to remove the URLs from Google’s index. I added the meta noindex tag to the search pages a week later and saw much faster results.
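The tag itself is the standard robots meta tag, placed in the <head> of each search results page:

<meta name="robots" content="noindex">

One caveat: a crawler can only see a noindex tag on pages it is allowed to fetch, so if the URLs are already disallowed in robots.txt, Googlebot may never re-crawl them to find the tag. Leaving the pages crawlable until they drop out of the index is the safer order of operations.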