This is a guest post by Grosen Fris, SEO at OnlinePartners in Denmark
Google’s hreflang option for international SEO has been available for more than a year now, so we decided it was time to conduct a clinical SEO test to see if it works as promised.
In addition to testing whether Google’s hreflang option affects how your web sites perform in Google’s country-specific indexes, e.g. Google.co.uk and Google.dk, we also tested whether hreflang can be combined with canonical in case you have problems with duplicate content on your web sites.
Why test the combination of hreflang and canonical?
Hreflang is very interesting for web sites that have more or less identical English content spread across different sub domains or country code top-level domains (ccTLDs), e.g. mydomain.co.uk for the UK and mydomain.ie for Ireland.
You may gain the following advantages when you have, for example, several web shops that each target a specific country but whose content is almost 100% identical, and which therefore have major problems with duplicate content.
- You can get a better country-specific representation in Google’s search results, which many users no doubt appreciate. E.g. you get mydomain.co.uk to appear in search results in Google.co.uk instead of mydomain.com.
- You also let Google help you send the user to the most relevant web shop, which increases the likelihood that the user immediately sees the relevant currency and price and lands on the web shop from which delivery is possible. Imagine you have a web shop on mydomain.com and another on mydomain.co.uk, and let’s assume that it is mydomain.com that appears in the Google.co.uk search results. This would send the user to a web shop that might show the wrong currency and price, and perhaps shipment to the UK is not even possible from mydomain.com. Without hreflang you might need special features on each web shop that try to detect where in the world the user is located, based on e.g. the user’s IP address, and redirect the user from e.g. mydomain.com to mydomain.co.uk.
We also wanted to test hreflang in combination with canonical because Google, on the one hand, states that you should combine them if you have problems with duplicate content, while on the other hand many of the SEOs we have spoken with were not sure about this.
However, it does make sense to be able to combine hreflang and canonical.
- If you have domains with unique content targeted at different countries, then you do not need canonical. Here you only need hreflang, which lets you tell Google how all your various domains are linked together across many countries.
- If, on the other hand, you have identical content in the same language across multiple domains targeted at different countries where the same language is spoken, then it makes perfect sense to combine hreflang and canonical.
Test conducted on .com domain and related sub domains
We have used the following (sub)domains to conduct this test, and we encourage everyone to take a look at how they are set up.
- href-lang.com
- ie.href-lang.com
- au.href-lang.com
- uk.href-lang.com
Structure of a test web site:
When you look at a single test site, none of the pages have duplicate content. This is ensured by the use of gibberish English, i.e. English words automatically and randomly selected for each page. However, if you compare the test web sites with each other, you will see that they are 100% identical across the four test (sub)domains.
Each test web site is set up as follows.
- 5 levels:
- Home page
- Below the home page there are 3 levels and each has 9 sub-pages
- 5th and lowest level consists of link-out-pages
- The test web sites reside on 1 main .com domain and 3 related sub domains
- Hosted on an IP address related to Denmark (126.96.36.199). Test it yourself via ipligence.com/geolocation
- The only link building done for the test web sites is from web sites related to Denmark
- We deliberately chose to use sub domains instead of ccTLDs, because ccTLDs themselves give Google a strong signal of target country and language. That is not the case for a .com domain and related sub domains
- Since the site: command seems to be phased out by Google, it does not give you a good overview of the indexing of the test web sites, so we decided to submit all 4 test web sites to the same Google Webmaster Tools (GWT) account. We did not use GWT to “cheat” by setting a target country for each test web site inside GWT. We only used GWT to monitor the indexing of each test web site.
Structure of and content on a page
Each page contains the following:
- Meta description
- Hreflang and canonical
- Main headline wrapped in <h1> tag
- Sub headline wrapped in <h2> tag
- 1-3 paragraphs wrapped in <p> tag
- Navigation and outgoing links
Configuration of hreflang and canonical on a page
The configuration of hreflang and canonical on a page is as follows:
<link rel="alternate" hreflang="en" href="http://href-lang.com/chordospartium-pane.html" />
<link rel="alternate" hreflang="en-ie" href="http://ie.href-lang.com/chordospartium-pane.html" />
<link rel="alternate" hreflang="en-au" href="http://au.href-lang.com/chordospartium-pane.html" />
<link rel="alternate" hreflang="en-gb" href="http://uk.href-lang.com/chordospartium-pane.html" />
<link rel="canonical" href="http://href-lang.com/chordospartium-pane.html" />
Here you can see the complete setup of a page.
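If you want to reproduce this configuration for many pages, the tags can be generated and checked programmatically. The following is only a minimal Python sketch, not the tooling used for the test: the `ALTERNATES` mapping, `CANONICAL_HOST`, `build_head_links` and `HeadLinkParser` are hypothetical names introduced for illustration, and the host names are simply copied from the configuration above. It builds the five link tags for one page path and then parses them back to confirm what a crawler would read.

```python
from html import escape
from html.parser import HTMLParser

# Assumption: hreflang value -> host, copied from the configuration shown above.
ALTERNATES = {
    "en": "href-lang.com",
    "en-ie": "ie.href-lang.com",
    "en-au": "au.href-lang.com",
    "en-gb": "uk.href-lang.com",
}
# Assumption: the main .com domain is the canonical version of every page.
CANONICAL_HOST = "href-lang.com"


def build_head_links(path, alternates=ALTERNATES, canonical_host=CANONICAL_HOST):
    """Return the hreflang and canonical <link> tags for one page path."""
    path = "/" + path.lstrip("/")
    lines = []
    for hreflang, host in alternates.items():
        url = f"http://{host}{path}"
        lines.append(
            f'<link rel="alternate" hreflang="{escape(hreflang)}" href="{escape(url)}" />'
        )
    canonical_url = f"http://{canonical_host}{path}"
    lines.append(f'<link rel="canonical" href="{escape(canonical_url)}" />')
    return lines


class HeadLinkParser(HTMLParser):
    """Collect hreflang alternates and the canonical URL from HTML markup."""

    def __init__(self):
        super().__init__()
        self.alternates = {}
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attr = dict(attrs)
        rel = (attr.get("rel") or "").lower()
        if rel == "alternate" and attr.get("hreflang"):
            self.alternates[attr["hreflang"]] = attr.get("href")
        elif rel == "canonical":
            self.canonical = attr.get("href")


if __name__ == "__main__":
    markup = "\n".join(build_head_links("chordospartium-pane.html"))
    print(markup)

    parser = HeadLinkParser()
    parser.feed(markup)
    print(parser.alternates)
    print(parser.canonical)
```

The same parser can be fed the HTML source of any page you have saved locally, which makes it easy to spot a page where an alternate is missing or where the canonical points to the wrong host.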
Google indexing from start until now
We conducted site: searches in Google and we watched the indexing in GWT.
Initially, both the main domain and the sub domains were indexed in Google, but when the sub domains reached approx. 80-110 indexed pages, the indexing stopped and began to roll back. I assume this is because Google’s bot first crawls the pages on the test web sites, and a separate routine later analyzes other elements such as hreflang and canonical. Thus Google’s search results do not immediately reflect the use of hreflang and canonical. At the moment of writing this blog post, GWT states that it has reviewed approx. 870 of the 901 pages on each sub domain and that only approx. 16-31 pages on each sub domain are still indexed in Google. We expect that to be fully adjusted in the near future. All in all, what we saw in GWT related to the indexing of the 3 sub domains was as we expected.
Unfortunately the two screenshots below are in Danish, as it was not possible for me to change the GWT interface from Danish to English.
- Blue: Total pages indexed
- Red: Total pages reviewed
- Yellow: Total pages blocked from being indexed (e.g. via robots.txt)
- Purple: Total pages removed
However, the indexing of the main domain was a bit of a surprise. Due to the use of hreflang and canonical, it seems as if GWT perceived the 4 test web sites as one single web site. The 4 test web sites consist of 4 x 901 pages = 3,604 pages, and as this blog post is being written GWT states that 4,409 pages have been crawled and reviewed. That is about 800 pages (805, to be exact) more than actually exist on the 4 test web sites, and I have no immediate idea why GWT is so inaccurate on this specific number.
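For readers who want to check the arithmetic, here is a small Python sketch using only the numbers quoted above (the variable names are my own):

```python
PAGES_PER_SITE = 901    # pages on each test web site
SITE_COUNT = 4          # main domain + 3 sub domains
REPORTED_BY_GWT = 4409  # pages GWT reports as crawled and reviewed

expected_total = PAGES_PER_SITE * SITE_COUNT
surplus = REPORTED_BY_GWT - expected_total

print(expected_total)  # 3604
print(surplus)         # 805
```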
Below is a list of how many pages Google has reviewed so far for each test web site, the maximum number of pages that have been indexed, and how many pages are currently indexed in Google.
We have conducted tests in Google’s country-specific indexes via both real people and tools:
- Manual tests carried out by kind people in the SEO industry who are based on relevant geo-IPs (Australia, United Kingdom and USA)
- Manual tests through a VPN / proxy that is based in a relevant country (Canada)
- Software that measures the positions of a (sub)domain on selected keywords in specific Google country-indexes
The following search phrases were tested in Google’s different country-specific indexes. Please try for yourself by copying the search phrases from the fields below and pasting them into Google (consider including the double quotation marks, as this makes a test search in Google more accurate).
All test search phrases showed the expected (sub)domains in Google’s search results.
- Can you use Google’s hreflang for international SEO? Yes
- If you have problems with duplicate content, should you then combine hreflang with canonical? Yes
- If you do NOT have problems with duplicate content, should you then also combine hreflang with canonical? No
Finally, I would like to say that earlier it was not a good idea to let the three sub domains or equivalent ccTLDs be indexed in Google, because of the problems with duplicate content. At the same time it would be almost impossible to get anything other than the main .com domain to appear in search results, even when searching in Google’s country-specific indexes. But thanks to hreflang and canonical, this is now possible.
Please note that we also present the results from this test in this YouTube video.