This is a guest post by Peter van der Graaf. Peter is a big fan of behavioral psychology and mathematics. He mainly helps his clients with internal SEO evangelism, link building strategies and international SEO efforts. The scale at which he and his partners perform search algorithm tests has the potential to yield great insights.
I was surprised to find out how much has been written about the Google Panda update and how little information has been shared so far about what is really happening.
Machine learning
Google’s algorithm for recognizing the characteristics of unnatural pages is periodically updated by a machine learning background job. This means it is not a live algorithm! The much-reported Panda versions 1.0 to 2.5 are algorithm changes that are first calculated on a training dataset. Combined with the existing learnings, they are then exported to the live Google environment as more static algorithm tests.
This means that while bounce rate (in this case: visitors returning to the search results quickly) isn’t used as a direct ranking factor, it is used to teach the Panda new tricks. Signals like bounce rate are fed as bamboo to the Panda background system, with the instruction to find out which patterns can be derived from the characteristics of thin content, unnatural text and excessive on-page advertising. The system picks combinations of attributes that together give a high degree of certainty about someone’s spammy activities.
For those familiar with “distributed tree learning”, look up the works of Google engineer Biswanath Panda, after whom the Panda update was named. He explains how continuously splitting sites into groups with similar attribute values helps you derive afterwards which attributes affected a certain outcome (like a high bounce rate) the most. It also gives some indication of the thresholds to be used, and it can signal when false positives or negatives are likely to occur.
Whether Panda will ever become a live (continuously updated) algorithm remains to be seen. It may even be that the derived tests become so effective that no further updates are required.
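To make the splitting idea a bit more concrete, here is a minimal single-machine sketch of one tree split. It is not Google's implementation and not the distributed method from Biswanath Panda's work. The attribute names (`ad_ratio`, `duplicate_ratio`), the toy values and the use of Gini impurity as the split criterion are all assumptions chosen for illustration.

```python
from typing import Dict, List, Tuple


def gini(labels: List[int]) -> float:
    """Gini impurity of a list of 0/1 labels (0.0 means a pure group)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2 * p * (1 - p)


def best_split(rows: List[Dict[str, float]], labels: List[int]) -> Tuple[str, float, float]:
    """Return (attribute, threshold, weighted impurity) of the best single split."""
    best = ("", 0.0, float("inf"))
    n = len(rows)
    for attr in rows[0]:
        values = sorted({r[attr] for r in rows})
        # Candidate thresholds are midpoints between consecutive distinct values.
        for lo, hi in zip(values, values[1:]):
            t = (lo + hi) / 2
            left = [y for r, y in zip(rows, labels) if r[attr] <= t]
            right = [y for r, y in zip(rows, labels) if r[attr] > t]
            score = (len(left) * gini(left) + len(right) * gini(right)) / n
            if score < best[2]:
                best = (attr, t, score)
    return best


# Hypothetical sites; every number here is made up for illustration.
sites = [
    {"ad_ratio": 0.10, "duplicate_ratio": 0.05},
    {"ad_ratio": 0.50, "duplicate_ratio": 0.10},
    {"ad_ratio": 0.20, "duplicate_ratio": 0.70},
    {"ad_ratio": 0.60, "duplicate_ratio": 0.80},
]
high_bounce = [0, 0, 1, 1]  # 1 = visitors returned to the results quickly

attr, threshold, impurity = best_split(sites, high_bounce)
print(attr, threshold, impurity)
```

In this toy data the split on `duplicate_ratio` separates the high-bounce sites cleanly, while no split on `ad_ratio` does. That is the sense in which splitting tells you which attribute mattered most and roughly where its threshold lies. A real tree would repeat the split inside each resulting group.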
Steep or sloping threshold?
Because Panda looks at large combinations of factors, it seems to be more certain of its outcome. While existing algorithms for unnatural behavior used a sloping threshold, in which increasing evidence gradually pushed you towards lower rankings, Panda currently appears to use a steep one.
Gradually increasing the degree of unnatural text left existing rankings intact for quite some time, but eventually resulted in a steep drop in rankings for all tested websites. Individual elements within the algorithm for thin content are hard to reverse engineer, but once you cross a certain point you are sure to be hit. Because signals are inspected in combinations that include link value attributes, not every site has the same threshold.
You might even argue that Panda has replaced a previous algorithm that had a sloping threshold, because many sites with thin content below the Panda threshold have returned to top-10 positions.
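The difference between the two kinds of threshold can be sketched in a few lines. This is only a toy model of the behavior described above. The function names, the 0-to-1 evidence scale, the default threshold of 0.6 and the way link authority shifts the threshold are all assumptions, not measured values.

```python
def sloping_penalty(evidence: float, max_penalty: float = 1.0) -> float:
    """Penalty grows in proportion to the evidence (clamped to the 0..1 range)."""
    evidence = min(max(evidence, 0.0), 1.0)
    return max_penalty * evidence


def steep_penalty(evidence: float, threshold: float = 0.6, max_penalty: float = 1.0) -> float:
    """No penalty until the threshold is crossed, then the full penalty at once."""
    return max_penalty if evidence >= threshold else 0.0


def site_threshold(base: float, link_authority: float, bonus: float = 0.3) -> float:
    """Hypothetical: a stronger link profile (0..1) raises the site's threshold."""
    return base + bonus * min(max(link_authority, 0.0), 1.0)


for evidence in (0.2, 0.5, 0.7):
    print(evidence, sloping_penalty(evidence), steep_penalty(evidence))
```

With the sloping model every bit of extra evidence costs a little ranking. With the steep model nothing happens until the threshold, and a site with a strong link profile (a higher value from `site_threshold`) can tolerate more before it drops.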
Domain, section or page based effect?
Panda affects large numbers of pages within the same domain. It doesn’t target long-tail keywords as such, but pages targeting these keywords tend to sit in sections with many low-quality pages.
Pages can be grouped into sections by many factors, such as the buildup of their block elements. Once a threshold is reached within such a section, all of its pages are affected, including ones of slightly higher quality.
Once you have been hit, recovering takes more effort than simply getting back to just below the threshold. Changing domains, however (with 301 redirects), seems to restore your rankings if you stay just below it. Merely changing URLs within the same domain doesn’t seem to have this effect.
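The section-wide effect can be illustrated with a small sketch. How Google actually groups pages is unknown. The code below simply uses the first URL path segment as a stand-in for a section. The per-page quality scores, the `quality_floor` and the `max_low_share` cutoff are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Set
from urllib.parse import urlparse


def section_of(url: str) -> str:
    """Assumed grouping rule: the first path segment identifies the section."""
    path = urlparse(url).path.strip("/")
    return path.split("/")[0] if path else ""


def affected_pages(page_quality: Dict[str, float],
                   quality_floor: float = 0.5,
                   max_low_share: float = 0.5) -> Set[str]:
    """Flag every page in a section once too many of its pages are low quality."""
    sections = defaultdict(list)
    for url, quality in page_quality.items():
        sections[section_of(url)].append((url, quality))

    hit = set()
    for pages in sections.values():
        low = sum(1 for _, quality in pages if quality < quality_floor)
        if low / len(pages) > max_low_share:
            # The whole section is hit, including its better pages.
            hit.update(url for url, _ in pages)
    return hit


# Made-up quality scores between 0 and 1.
pages = {
    "https://example.com/tags/a": 0.2,
    "https://example.com/tags/b": 0.3,
    "https://example.com/tags/c": 0.8,
    "https://example.com/guides/x": 0.9,
    "https://example.com/guides/y": 0.4,
}
hit = affected_pages(pages)
print(sorted(hit))
```

Here the decent page `/tags/c` is dragged down with its two weak neighbors, while the `guides` section stays untouched because only half of its pages fall below the floor.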
Solution against Panda plagues
Sites with large numbers of pages below a quality threshold are targeted by Panda. Be prepared for Panda claws if you use sentences in which only a couple of keywords differ from those on other pages, if you have a lot of content taken from other websites, if you make a lot of spelling or grammatical errors, or if you have excessive ads on your pages. Assuring quality for all pages might be hard, but make sure you do this for all pages that are important for your visitors and for Google. All pages below a reasonable quality level should be removed or excluded from the Google index (canonical tag, noindex, etc.).
Pages with sentences like “no results found for [keyword]” are often crawlable by Google. Pages like these can be mistaken for ill intent, so take them into account as well.
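One way to find pages whose sentences only swap a couple of keywords is to compare overlapping word sequences between pages. The sketch below uses three-word shingles and Jaccard similarity. That is a common near-duplicate technique, not something Panda is known to use. The example texts and the 0.5 cutoff are assumptions.

```python
import re
from itertools import combinations
from typing import Dict, List, Set, Tuple


def shingles(text: str, size: int = 3) -> Set[Tuple[str, ...]]:
    """Set of overlapping word sequences of the given size."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a: Set, b: Set) -> float:
    """Shared items divided by all items; 0.0 when both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(pages: Dict[str, str],
                    min_similarity: float = 0.5) -> List[Tuple[str, str, float]]:
    """Return page pairs whose shingle overlap reaches the cutoff."""
    pairs = []
    for (url_a, text_a), (url_b, text_b) in combinations(pages.items(), 2):
        similarity = jaccard(shingles(text_a), shingles(text_b))
        if similarity >= min_similarity:
            pairs.append((url_a, url_b, similarity))
    return pairs


# Hypothetical page texts.
pages = {
    "/hotels/paris": "Find the best cheap hotels in Paris and book your room online today",
    "/hotels/rome": "Find the best cheap hotels in Rome and book your room online today",
    "/guide": "Our editors walked every neighbourhood to compare prices and service",
}
duplicates = near_duplicates(pages)
print(duplicates)
```

The two hotel pages differ by a single keyword and are flagged as a pair, while the guide shares nothing with them. Pages that show up in many such pairs are candidates for rewriting, a canonical tag or noindex.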
If that doesn’t work, you can always build a Panda trap.
Hopefully this article has clarified some misconceptions. Note that this is the consensus of many search experts and represents the supposed current situation. If there is any proof to refute this article, please comment. We’re all more than willing to learn.


Great stuff Peter, thanks for sharing your insights. Probably the most accurate description of what Panda really is that’s been published on the interwebz so far.
So, the supposition is that on-page and off-page links (that is to say, inlinks) and link-based metrics play no role in Panda? Is it just correlation, then, that many of the sites pandalized also seem to have spammy backlinks?
twitter: @joshbachynski
Signals found by the Panda background system definitely include link profile signatures. I tried affecting various websites by increasing unnatural content at a steady pace. All sites that were affected only after an enormous amount of spam had one thing in common compared to the ones that were affected early on: they had a much more authoritative incoming link profile.
I wouldn’t say that it proves anything, but “links have an effect on the threshold” is definitely a conclusion you could derive from various tests. Anyone with similar or contradicting evidence?
Great post. Biswanath’s paper should be required reading for any SEO trying to understand the Panda algo. I think it also highlights how easy it is for ‘innocent’ sites to get nailed by the algo and how the recent change to ‘sessions’ will bring about the Panda penalty.
Panda has gotten more sophisticated and they’ve added Penguin to it. Now it’s extremely harsh for the newcomers especially. Breaking into the search engine market is harder than ever.