I came across something very interesting in a Google Webmaster Central Office Hours video today. A participant asked John Mueller whether it was possible to remove bad links to your site by doing some tricky redirect work. You can listen to the question at the 18:28 mark of this video. I have also transcribed the important parts below:
Office Hours participant: John, is there another way around that?...Could you 301 your homepage to another page and that page is blocked by robots.txt and that page then redirects your website to /home.php?
John Mueller: You could theoretically do that. What would happen there is the pagerank from those incoming links would essentially remain on the robotted page. So, if that robotted page is within your website then essentially those problematic links...the signals from those links would remain within your website. So, that’s kind of a tricky situation when it comes to the home page. In a case like that it might make more sense to just move to a different domain completely. But, it’s something where if at all possible I’d recommend trying to stick to deleting those links, changing them or adding them to your disavow file.
Let's break down the scenario here. Let's say I have a site that has been affected by the Penguin algorithm and, as is often the case, the majority of the bad links are pointing toward the home page. Ideally, to recover, I would need to remove as many bad links as possible and disavow the remaining ones. (It's still up for debate whether disavowing is just as good as removing. I'll give my thoughts on that in another article soon.) However, if you were willing to give up ALL of your links to your home page, then this is what is being suggested:
- 301 redirect the home page to another page on my site. If I'm understanding this correctly, I could 301 redirect my home page to a page like example.com/temp.html by adding the following to my .htaccess file:
Redirect 301 /index.html http://www.example.com/temp.html
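Alternatively, here is a rough sketch of the same redirect using mod_rewrite. This assumes mod_rewrite is enabled and that the .htaccess file sits in the site's document root, so that a request for the bare "/" homepage is caught as well:

RewriteEngine On
# Send requests for the homepage to the temporary page with a 301
RewriteRule ^$ http://www.example.com/temp.html [R=301,L]
# Also catch explicit requests for /index.html
RewriteRule ^index\.html$ http://www.example.com/temp.html [R=301,L]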
Note: I am no expert in .htaccess. If you can suggest better examples, please do so in the comments and I'll give you credit!
This should do a few things:
- When a user lands on the home page, they will be redirected to example.com/temp.html.
- Google will attribute all of the link equity that was formerly given to the home page to example.com/temp.html.
- Google will transfer any link penalty or link-related devaluation (i.e., Penguin) to example.com/temp.html.

- Block /temp.html via robots.txt. To do this, you can add the following to your robots.txt file:
User-agent: *
Disallow: /temp.html
The idea of this is to tell Google not to crawl this page any more.
- Redirect /temp.html to a new page such as /home.html
Redirect 301 /temp.html http://www.example.com/home.html
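Again, that is mod_alias syntax; a mod_rewrite sketch of the same step, added beneath the rules in the sketch above, might be:

# Send requests for the robotted temporary page on to the new home page
RewriteRule ^temp\.html$ http://www.example.com/home.html [R=301,L]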
Will this work to remove the bad links to your site and help you escape Penguin?
Unless there's something I'm missing, I can't see how this will work. The reason for this is that, according to this interview with Matt Cutts, pages that are blocked by robots.txt can still gather PageRank. Matt says,
Now, robots.txt says you are not allowed to crawl a page, and Google therefore does not crawl pages that are forbidden in robots.txt. However, they can accrue PageRank, and they can be returned in our search results.
OK. So, our /temp.html page can accrue PageRank, but will it pass PageRank (and therefore pass the bad links) if it is redirected to /home.html? Or is it possible that a page that is blocked by robots.txt does not pass PageRank? I could not find any evidence of this. (Added: See the end of the article; apparently it is possible after all.) It's not exactly the same, but in the same Matt Cutts article he mentions that noindexed pages can pass PageRank. My gut instinct is that pages blocked by robots.txt will pass PageRank too.
Then I wondered if perhaps you could follow the above steps, but put a meta robots tag of "index, nofollow" on the /temp.html page so that links pointing to it would not pass PageRank to the rest of the site and, in particular, to /home.html. I still think that this will not work to avert a Penguin hit. The reason for this is that the Penguin algorithm will likely still see the bad links that are pointing to your site, but it will see them as pointing to a single page (/temp.html). This would isolate your "bad link juice" to just one page of your site.

When Penguin first rolled out, I really thought that it was an algorithm that affected a site on a page-by-page level. If this were true, then you would find that just that one particular page would not rank well. But this actually isn't the case. In several Webmaster Central Hangouts, John Mueller calls the Penguin algorithm a sitewide algorithm. At the 26:20 mark of this Webmaster Hangout I asked John Mueller if Penguin affected just certain keywords or pages:
Here is what John says:
Penguin is generally a sitewide algorithm, so it's not something that would be affecting specific keywords or specific pages within your website.
He makes a similar comment in this video at the 25:29 mark:
If our algorithms determine that your website is problematic, then we're looking at your whole website and treating it as being problematic. It's not tied to specific links...it's more done on a website basis.
John said that if the Penguin algorithm is looking at your site unfavorably and you don't clean up the link profile, then moving forward will be like driving a car with the handbrake on, or like having an anchor that is pulling you down.
My point in saying all of this is that the bad links will still be pointing at your site. Even though the page they are pointing at is blocked by robots.txt, it will still accrue PageRank, which means that the Penguin issues will still be there. And the Penguin issue will affect your whole site, not just that one page.
I believe this is what John meant when he said in the original video that I mentioned: "those problematic links…the signals from those links would remain within your website." Doing this robots.txt and redirect trickery really wouldn't cause the Penguin issue to go away.
Am I missing something?
This was a complicated post, and kudos to you if you followed my train of thought. I thought it was an interesting idea, but I just can't see how it would work. But am I missing something? If you can see a way to escape from Penguin by using a variation of the techniques above, then let me know in the comments. I still think that the only answer to Penguin recovery is a thorough pruning of bad links and the presence of a site that is able to attract links naturally.
Related: Penguin recovery is possible, but cleaning up links is NOT enough.
Added later:
Thank you to Jeff McRitchie, who pointed out this line from the Google quality guidelines under the topic of "link schemes":
Note that PPC (pay-per-click) advertising links that don’t pass PageRank to the buyer of the ad do not violate our guidelines. You can prevent PageRank from passing in several ways, such as:
- Adding a rel="nofollow" attribute to the <a> tag
- Redirecting the links to an intermediate page that is blocked from search engines with a robots.txt file
That confuses matters a little bit, doesn't it? So this means that you can stop PageRank from flowing to the rest of your site by blocking an intermediate page with robots.txt. However, we also know that pages blocked by robots.txt can still accrue PageRank, even though they wouldn't pass it on to the rest of the site. So, to me, the "bad equity" from those unnatural links would still be affecting your site, and thus Penguin would still be an issue.
I think Alan Bleiweiss summed things up nicely with his tweet tonight:
@stonetemple @seosmarty @Marie_Haynes @jimboykin sigh. Tricks are not a best practice. Redirecting home page? Crazy. Nonsensical.
— Alan Bleiweiss (@AlanBleiweiss) October 30, 2013
Comments
What if all the problematic links go to the http://www.domain.com version of your site? Since www is a subdomain, could you just 404 the http://www.domain.com version and use the domain.com version instead?
Good question, and I don’t know the answer to this 100%, but I likely would not try it. If you use the exact same content on the non-www as on the www, then Google will attribute the old links to the new site and say, “via this intermediate link”. I’m not sure if that transfers penalty issues as well, but I wouldn’t chance it.
I also think that it’s possible that Google can recognize that the www and non-www versions of a site are really the same site. If this is the case, then the distrust issues that come with Penguin would still be on your site. Again, it’s possible that it could work, but my gut says that Penguin would still be affecting the site.
And finally, you would lose any new link equity if people accidentally linked to the www rather than the non-www. You’d have to make sure that all new links came to the non-www and that’s hard to do if you are attracting links naturally.
Yes, it wouldn’t be without its problems even if it did work, but it might be something interesting to try if all else fails.
Another thing I have been thinking about: would it be better for most new sites (if they plan on doing anything even slightly manipulative) to use domain.com/home as their homepage? Then, if there are link-related algorithm problems in the future, they can 404 /home and use domain.com instead, and any natural links may well have gone to domain.com anyway, which would be kept.
I think this is something I would consider if I were working on a site where I was building manipulative links on purpose. But I think that Google is getting better and better at detecting which links are natural, so for me, I really just want to concentrate on earning links.