I've just finished reading a PDF I discovered via Barry Schwartz at Search Engine Roundtable. The article briefly discussed a leaked copy of the Google Quality Rater Guidelines. It's a fascinating read. The document was originally found by PotPieGirl. According to Barry, the document is copyrighted and likely won't be available online for long, but you may still be able to catch it here.
In case you aren't able to find the article, or if you simply don't want to read all 125 pages, I thought I'd summarize some of the paragraphs on webspam that I found to be interesting. I'm not sure why these guidelines were written, but see the end of my article for a slight conspiracy theory of mine.
What is spam?
The guide states that anything intended to trick search engines while drawing in users is webspam. A page that merely looks junky isn't enough to call it spam; there has to be some deception present. Spam pages generally contain very little content that is useful to users.
How to detect spam:
The guidelines state that someone checking for spam can use Ctrl-A (or Command-A) to highlight any hidden text. It's no real surprise that a page with hidden text is considered spam. With that said, though, be careful how you build your pages. I wonder if there are instances where you can have hidden text without malicious intent and still be dinged as a potential spammer.
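To make that concrete, here's a minimal sketch of the classic hidden-text trick the select-all check is meant to catch. The keywords are made up for illustration:

    <!-- White text on a white background: invisible to visitors, but
         Ctrl-A/Command-A highlights it and search engines still index it. -->
    <div style="color: #ffffff; background-color: #ffffff;">
      cheap widgets best widgets buy widgets online discount widgets
    </div>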
The guide instructs the reader to use a Firefox add-on that disables CSS and JavaScript on a page. The rater can then see whether the content changes. So if a page uses CSS or JavaScript to hide content, that could be considered spammy. This made me concerned. I wonder if this is a factor in the Google algorithm? I can think of many instances where it's fine to hide text. For example, on one of my sites I have several quizzes that use jQuery's hide() to display only one question at a time.
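For what it's worth, here's roughly how a quiz like mine works; the markup and IDs below are just an illustration, not my actual code. Note that every question is still right there in the page source, so nothing is concealed from anyone who looks:

    <ul id="quiz">
      <li class="question">Question 1 goes here</li>
      <li class="question">Question 2 goes here</li>
      <li class="question">Question 3 goes here</li>
    </ul>
    <button id="next">Next question</button>

    <script src="https://code.jquery.com/jquery-1.7.1.min.js"></script>
    <script>
      // Show only the first question; the rest stay hidden until the
      // visitor clicks Next. No deception intended -- a rater who
      // disables JavaScript simply sees all the questions at once.
      var current = 0;
      var questions = $('#quiz .question');
      questions.hide().eq(0).show();

      $('#next').click(function () {
        if (current < questions.length - 1) {
          questions.eq(current).hide();
          current = current + 1;
          questions.eq(current).show();
        }
      });
    </script>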
Keyword Stuffing
The reader is also instructed to look at the source code to try to detect unusual amounts of keyword stuffing. I found this quote fascinating:
URLs may also contain keyword stuffing. These URLs are computer-generated based on the words in the query and are often formatted with many hyphens (dashes) in them. They are a strong spam signal.
I have a site that currently seems to be taking a Panda hit. It has thousands of pages with good content; however, the URLs are long strings of hyphenated words, many of which are page keywords. My intent wasn't to spam, but I wonder if the Google algorithm has flagged this as keyword stuffing?
Sneaky Redirects
If a page redirects to another with spam intent, it's considered sneaky. URLs that redirect through several pages can be a spam signal, and a redirect to a well-known page like Amazon can also be considered sneaky. I'm assuming that's because the site may set an affiliate cookie along with the redirect.
Similarly, cloaking is a no-no. If a page is cloaked with JavaScript or a 100% iframe so that the user sees a different page than a search engine would, that's obvious spam.
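To illustrate what the guide is describing (definitely not a recommendation), here's a bare-bones sketch of a JavaScript sneaky redirect; the destination URL is a placeholder. A crawler that ignores scripts indexes the keyword-rich page, while a real visitor is whisked off somewhere else the moment it loads:

    <!-- The page body below this gets indexed, but browsers never show it. -->
    <script>
      // replace() doesn't even leave the original page in the browser
      // history, so the visitor can't click Back to see what was there.
      window.location.replace('http://example.com/totally-different-page');
    </script>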
Duplicate Content
Copied content is identified as spam.
A page that contains only RSS feeds and PPC ads is considered spam. A page with content copied from Wikipedia or DMOZ and surrounded by ads is spam. A page that looks like a search page but contains only PPC ads is spam. However, if you have lyrics, quotes, or poetry that is duplicated elsewhere on the web, you're likely not going to be considered a spammer unless that content is surrounded by ads.
Conclusion of the article:
The article concluded that if a page looks like it exists mostly to make money, it's likely spam. If you removed all of the PPC ads and duplicate content and there wasn't much left, the page is probably spam.
My thoughts:
I think the "leaking" of this article is a brilliant, sneaky plan by Google. What Google wants is for webmasters to produce the best content possible. After reading this article, I see that some of my sites have things that could be picked up by a Google algorithm as spammy. I plan to make some changes ASAP. And as such, we both win: Google has found a way to convince me to change, and I end up with a site that gets more visitors.
I'd love to hear your thoughts on this leaked article!
Comments
Thank you for sharing this information. What bothers me is how subjective this is. I know of many websites which use dashes in the URL so that the name of the website is clearer. But it could happen that a rater just dismisses the site as spammy when in fact it is not. And I also know for a fact that people are told to try to include some keywords in the name of the website. So now that this appears to be changing, it's no wonder that many sites are going down in ranking.

The other thing that is bothersome is that one mistake could cause your whole site to go down. That's pretty tough. So I'm thinking that the alternative is to market your website as best you can with social media. Drive your own traffic.

I think that Google, in its efforts to satisfy users, is being too finicky. In many subjects I search, I still see junky and sketchy stuff (meaning very few related pages) at the top of the engines. What's up with that? I know people who have over 200 pages on one subject and still can't rank. What's your thought on the number of pages for the ideal site?

Also, do you think it's possible that the algorithm is rating the subjects of the site? For instance, you don't need to be a rocket scientist to see that the web is inundated with sites on Self Help. The larger Self Help sites, which cover everything but the "dog" (LOL), are at the top. And they stay on top. So should one even attempt to do a site on just one topic?
Any ideas on the best site subjects in this type of playing field?
Great comments Gail!
I think it's important to note that these guidelines are just guidelines. I believe they were created to help Google rank sites manually and then see if an algorithm can be built that produces the same results.
I'm undecided about the whole URL length thing. My thought is that Google would treat a crazy long URL as potential spam, but a few words likely won't hurt. For example, the URL of this post ends in google-quality-rating-guidelines. I think that's OK. But if I had called it something/google-ranking-rating-leaked-document-guidelines-2011-seo-webspam-website-quality-etc-etc, then it would be more obvious that I was just trying to stuff as many keywords as possible into the URL.
I'm going to run an experiment on the site that has a couple of thousand long, keyword-rich URLs. I'll use a canonical tag to point them to something much shorter and then measure my results.
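In case anyone wants to try the same thing, the canonical tag just goes in the head of each long-URL page and points at the shorter version you'd rather have indexed. The URLs below are placeholders, not my real pages:

    <!-- Served at http://example.com/best-cheap-blue-widgets-for-sale-online/ -->
    <head>
      <link rel="canonical" href="http://example.com/blue-widgets/" />
    </head>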
I don't think there's a limit on how many pages a site should have. If you have 200 pages on a subject but you're not ranking, then you need to figure out whether you've got a site-wide penalty or whether the content is really thin. There are lots of variables!
What Google wants us to do is stop selling stuff and leave that to their AdWords and AdSense so they can make more money. A "quality positive user experience," my donkey. Oddly enough, the "best webmasters possible" just happen to come from their biggest advertisers!
Thanks Doc… I see you're up at the 'forefront' once again… hope things are good for you and yours! 🙂
Thanks Jim! Still having a blast learning, applying and succeeding! Hope you are well too!
Aww, it’s a reunion here, hey Doc!
😉
-JoshZ
Hey Josh! LTNS! I should pop back in to “that other place” soon! I miss you guys!
You wrote: "…I think that the 'leaking' of this article is a brilliant sneaky plan by Google…"
and "…I'm not sure why these guidelines were written, but see the end of my article for a slight conspiracy theory of mine…"
Please stop this kind of theory.
All quality raters training for the exam receive this handbook.
If you don't read the manual, you will not pass the exam.
It's as simple as that!
Sorry, but there is no conspiracy, nor any big Google communication campaign…