The following post discusses which links might still pass value when the Penguin 2.0 update hits the SERPs (or Penguin 4, as some have dubbed it).
Google is a system that is still, unofficially, in beta, albeit one that may be coming of age soon, once it finally tunes its algorithm to show the fairest results. In 10 years' time we will look back and laugh (or cry) at how a private business could get away with controlling the world's internet with an incomplete algorithm.
The PageRank part of the algorithm is, admittedly, well-intentioned: it is based on the offline citation model used by academics and writers.
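To make the "links as citations" idea concrete, here is a minimal sketch of the textbook PageRank power iteration: each page splits its score evenly among the pages it links to, and a damping factor (commonly set to 0.85) models a reader who sometimes jumps to a random page. This is a simplified classroom version for illustration only, not Google's production algorithm, and the page names are hypothetical.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Textbook PageRank via power iteration.

    `links` maps each page to the pages it links to (its outgoing "citations").
    Scores always sum to 1.0.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets the "random jump" share first.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score on, split evenly across its outgoing links.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page (no outgoing links): spread its score over all pages.
                for p in pages:
                    new_rank[p] += damping * rank[page] / n
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical mini-web: two blogs both cite the same paper.
    web = {"blog-a": ["cited-paper"], "blog-b": ["cited-paper"], "cited-paper": []}
    scores = pagerank(web)
    print(max(scores, key=scores.get))  # cited-paper
```

The page that collects the most "citations" ends up with the highest score, which is the behaviour the original model was designed to reward.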
There is talk in high places of Google applying this model strictly in the near future. In this post I'm going to explore why dialling up the algorithm so that all links must adhere to this model could be a disaster for Google, and which links I think Penguin 2.0 will reward; but first I want to look at how things stand now.
Google left the door open…but it’s closing fast
Many people in the SEO world talk about links being classed as votes from one page to another, but what does that really mean, and does Google really see links that way too?
Let's look at how easy it was to game Google just a year or so ago (and I'm sure some Black Hatters will say it still is). By taking a piece of content, spinning it into a thousand unique but often less legible versions and then submitting them to multiple article directories, blog networks and other sources, you could get a page ranked in many verticals. The other ingredients were relevant anchor text and a smattering of related keywords in titles and body copy, and in many verticals, as long as the site measured up in other areas (too many factors to list), it was job done.
What does that say about the effectiveness of the pre-2012 PageRank algorithm as a voting system? It says that although not all links were equal, any link was counted as a vote regardless of where it came from, which is why it was so easy to game. Link volume and keyword-rich anchor text all the way!
At this point, are you starting to see how Google could be regarded as irresponsible when it comes to looking after the interests of websites that play fair?
Fortunately, things have been changing, and fast. Since Penguin 1.0 (April 2012), the quality criteria a link must meet to pass value to a page on another site have changed. There have certainly been unnatural link warnings flying about, so Google now demonstrates that it recognises crap links and blog networks. This has prompted a mass tidying-up of link profiles and, understandably, an exodus from blog networks.
However, here's an argument that Google might not have link quality quite nailed yet: many SEOs talk about a percentage of keyword anchor text (for example 60%) above which a penalty is triggered. If this is true, it shows that even the much-feared, quality-focused Penguin was, or is, flawed. Why? Because what if we had 54% dodgy anchor text; does that mean my pile of directory links with keyword anchor text is all OK? In some cases you would still get the message of link death, as 60% isn't a fixed cut-off point, but it still raises the question.
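The threshold argument is simple arithmetic, so here is a minimal sketch of it. It assumes anchors are counted as "keyword" anchors only on an exact, case-insensitive match, and it uses the rumoured 60% figure as a hard cut-off purely for illustration; neither the matching rule nor the number is a confirmed Google value, and the link profile is made up.

```python
def keyword_anchor_share(anchors: list[str], keyword: str) -> float:
    """Fraction of anchors whose text exactly matches the keyword (case-insensitive)."""
    if not anchors:
        return 0.0
    target = keyword.strip().lower()
    matches = sum(1 for a in anchors if a.strip().lower() == target)
    return matches / len(anchors)


def trips_threshold(anchors: list[str], keyword: str, threshold: float = 0.60) -> bool:
    """Hypothetical hard cut-off: flag the profile only if the share exceeds the threshold."""
    return keyword_anchor_share(anchors, keyword) > threshold


# Hypothetical profile: 54 keyword-rich directory links out of 100.
profile = ["cheap insurance"] * 54 + ["Smiths Insurance"] * 30 + ["click here"] * 16
print(keyword_anchor_share(profile, "cheap insurance"))  # 0.54
print(trips_threshold(profile, "cheap insurance"))       # False
```

Under a pure percentage rule, the 54% profile sails through even though every one of those 54 links is just as dodgy as it would be at 61%, which is exactly the weakness described above.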
The next wave of link murder?
Clearly Google is getting its act together now, so let's assume Penguin 2.0 (or Penguin 4) is going to be more ruthless.
In a recent post, Eric Enge talks about the possibility of Google moving much closer to the citation model it originally introduced in its PageRank paper, with any deviation from that model treated as a problem. I'd like to explore that idea a bit more.
Let's look at what a citation is. The dictionary definitions are:
A quotation from or reference to a book, paper, or author, esp. in a scholarly work.
A mention of a praiseworthy act or achievement in an official report, esp. that of a member of the armed forces in wartime.
In scholarly writing, citations are used:
For the purpose of acknowledging the relevance of the works of others to the topic of discussion
To uphold intellectual honesty
To attribute prior or unoriginal work and ideas to the correct sources
To allow the reader to determine independently whether the referenced material supports the author's argument in the claimed way
To help the reader gauge the strength and validity of the material the author has used
So if citation-style links are pure gold, what about the rest?
How do the above definitions apply to the many millions of hyperlinks all over the internet? How many of them are, strictly speaking, 'citations'?
If a web page about Alsatian dogs links to a local kennels it happens to like, how does that fit the above definitions? Is that link still a vote? Will it still pass any power to the kennels' website? Will it be considered a Penguin-friendly link?
Based on this thinking it seems not, and this is why I think Google will treat links like this (which make up a vast proportion of the links online; let's call them 'non-citation' links) differently and still assign them some value. Here's why I think it can't just dump millions of links into the spam bucket.
If Google pulls the plug on anything that is not a citation then how will the search results look?
The answer to this question really depends on how much Google allows the PageRank algorithm to affect its search engine results pages. There are of course many other factors Google must take into account when ranking web pages; however, consider the following scenario, which I'm going to use to illustrate my point.
A guy called Tim works for an insurance firm in a tiny town in Yorkshire, England; let's call it www.Smithstinyinsuranceco.co.uk. Tim is mad about all things insurance and decides to write the ultimate guide to insurance. Jill, who holds a doctorate and lectures in finance at Stanford University, decides to link to Tim's article from the faculty's blog. This blog post gets shared socially in the intellectual community, linked to from a few more university web pages that discuss insurance and then, as a result, finally earns a link from the insurance page on Wikipedia.
Tim's web page has amassed genuine citations. By comparison, over the first ten years of Google's life the insurance sites ranking for the keyword 'insurance' have amassed thousands of crap links (which flew under the radar and helped them rank). Google then introduces its new, strict 'citation'-based PageRank algorithm with the Penguin 2.0 update in 2013 and, boom, Tim's site ranks number one for insurance (happy boss).
In reality there are too many other factors affecting search engine results for it to play out quite like this, but if hyperlinks are the foundation, then changing things drastically could rock the boat too much for the end user. Ultimately my Nan doesn't care about link manipulation; she just wants the best result when she searches for bingo offers, not a 'well linked to' scientific paper about bingo statistics!
So how could Google evaluate our more commonly found 'non-citation' hyperlinks?
The question to ask is: why would a link from one site to another that is not a 'citation' be of any use to Google in determining the value of the linked-to page?
Let's consider: are these valid reasons to assign value to a link?
1. The owner of the site recommends the service/product
Google's answer: So what? Unless your site is trusted, we don't care what products/services you recommend.
2. The owner of the site thinks that the article is useful further reading
Google's answer: So what? Unless we trust your opinion, we don't care what you think is further reading.
3. The site/page/link is relevant to the page it links to
Google's answer: So what? Unless the site is trusted, we don't care whether it is relevant.
Basically, the theme running through all three of Google's answers is trust. So how can trust be generated to create 'non-citation' link value?
If you have considered the above Google responses, you might already be thinking that content validation and site authority are going to play an important role in assigning value to links. This is why the Google+ authorship strategy is so important to Google, and it was even discussed openly by former Google CEO Eric Schmidt.
This means that even a link within content on an unknown blog could be verified via the author's G+ profile and assigned a fair value. Maybe some of the author's 'juice' (sounds iffy) could even be allocated to the domain authority/PR? This would make authors a valuable commercial resource in more ways than they might have imagined, but that's a blog post for another time.
Below is my shot at how links might be rated
Relevance, relevance, relevance…
As for relevance, everyone is banging on about it, and of course link relevance is important, but what really matters is understanding what 'relevance' means. Relevance is defined as "the condition of being relevant, or connected with the matter at hand".
Take the following example scenario. How would a machine-based algorithm assign relevance according to that definition?
Site name: www.teenagersaregreat.com
Article title: 20 things to do before you hit 20
Links to: www.learntodrivenow.co.uk
Is the site relevant to the linked site? No.
Is the article relevant to the linked site? No (or at least on the surface appears not to be).
Is the link relevant and useful to the reader? Yes! The article lists learning to drive as one of the things to do before hitting 20. A teenager could find the linked page useful, so it has relevance.
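One naive way a machine could score relevance at each of the three levels is plain word overlap (Jaccard similarity) between the linked page's topic terms and, in turn, the site name, the article title and the sentence containing the link. This is a toy under assumed inputs: the topic terms for the driving site and the sentence around the link are made up, and this is not how Google measures relevance. It simply shows how the context of a link can reveal a connection that the site and title hide.

```python
import re


def tokens(text: str) -> set[str]:
    """Lower-case alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity: size of the intersection over size of the union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical topic terms for www.learntodrivenow.co.uk
target_terms = {"learn", "learning", "drive", "driving", "lessons", "test"}

levels = {
    "site": "teenagers are great",
    "article": "20 things to do before you hit 20",
    # Assumed sentence surrounding the link inside the article
    "context": "Number 7: start learning to drive and book your first driving lessons",
}

for level, text in levels.items():
    print(level, round(jaccard(tokens(text), target_terms), 2))
# site 0.0
# article 0.0
# context 0.29
```

Real systems would use far richer signals (synonyms, co-occurrence, entities), but even this crude measure only finds the link relevant once it looks at the words around it.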
My point here is that Google is smart and can spot relevance that humans might not see on the surface; Bill Slawski's excellent article does a great job of outlining how this might work.
One member of Google's search quality team was quoted as saying:
“…getting a link from a high PR page used to always be valuable, today it’s more the relevance of the site’s theme in regards to yours, relevance is the new PR.”
I have to agree that a link from a high-authority, on-topic site within a tightly related article is going to be powerful, but that doesn't mean you should write off the rest. For example, what about a link from a PageRank 9 page that is totally off-topic, but where the link is relevant within the context of the article? I'll take that over a PR0 link from a relevant article all day long.
So to round up
Citations are likely to be officially crowned the most powerful links on the web when the Penguin pecks again (no surprise there, as they probably already are).
Our regular 'non-citation' hyperlinks will continue to pass power, but with contextual relevance, site and author authority, and social visibility as their primary weighting factors.
Links with neither site authority nor author authority will pass nothing, regardless of whether they are relevant.
I think the only saviour for small sites with lesser-known authors and no authority will be social visibility and engagement metrics, but ultimately well-engaged sites will build the other metrics anyway.
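To pull these predictions together, here is a toy scoring function that encodes them: no value when a link has neither site nor author authority, otherwise a weighted mix of contextual relevance, site authority, author authority and social visibility, with a bonus for genuine citations. Every field name, weight and the 0-to-1 scales are invented for illustration; Google has published nothing like this, so treat it as a way of stating the guesses precisely rather than a model of the real algorithm.

```python
from dataclasses import dataclass


@dataclass
class Link:
    """Hypothetical link signals, each assumed to be scaled to 0.0-1.0."""
    contextual_relevance: float
    site_authority: float
    author_authority: float
    social_visibility: float
    is_citation: bool = False


# Illustrative weights only (they sum to 1.0); nothing here comes from Google.
WEIGHTS = {"relevance": 0.4, "site": 0.25, "author": 0.2, "social": 0.15}
CITATION_MULTIPLIER = 1.5


def link_value(link: Link) -> float:
    """Score a link according to the speculative predictions above."""
    # Prediction: no site authority and no author authority means no value, however relevant.
    if link.site_authority == 0 and link.author_authority == 0:
        return 0.0
    score = (
        WEIGHTS["relevance"] * link.contextual_relevance
        + WEIGHTS["site"] * link.site_authority
        + WEIGHTS["author"] * link.author_authority
        + WEIGHTS["social"] * link.social_visibility
    )
    # Prediction: citations remain the most powerful links of all.
    return score * CITATION_MULTIPLIER if link.is_citation else score


orphan = Link(1.0, site_authority=0.0, author_authority=0.0, social_visibility=0.9)
small_blog = Link(0.8, site_authority=0.1, author_authority=0.2, social_visibility=0.7)
citation = Link(0.8, site_authority=0.1, author_authority=0.2, social_visibility=0.7,
                is_citation=True)

print(link_value(orphan))  # 0.0
print(link_value(small_blog) < link_value(citation))  # True
```

In this sketch a small blog with a lesser-known author still earns value mainly through relevance and social visibility, which is the "saviour" described in the last point.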