Search Engines & SEO Blog

OpenLinkGraph: Functions & Capabilities

I want to start the OpenLinkGraph introduction by showing you the current web frontend. When you type in the domain you want to query, we show you a summary page for that domain. Here, we want to give you a quick but informative outline of the domain's backlink profile, so that you can get a feel for the domain in just a few seconds. At the top are the most important metrics: the total number of links found, as well as the number of distinct hostnames, domains, IP addresses and networks they come from. With some experience, you will be able to use these values, and their relationship to each other, to quickly get a basic idea of where to place the domain. Further down, we show the domain's most-linked-to URLs as well as the most commonly used link texts. These, as well as the graphs showing link-source countries and top-level domains, each have their own separate pages where you can sort them freely and get additional information.

The most powerful section can be reached through the "Links" tab: here, we list all the links we could find for the domain. Since this usually leads to extremely long and confusing lists, we integrated a number of useful filter options. You can precisely select and filter both the link source and the link target, each down to the individual page. This means it takes only seconds to answer queries such as "show all links for sistrix.de that come from a URL in the www.seomoz.org/blog directory". The other filter options pack a punch as well: a query by link type (text link, image link, 301, 302, canonical or meta refresh), for example, can often bring to light interesting insights into link-building strategies. Searching by link text can expose risky link buys, while filters for follow/nofollow, top-level domain and country of origin complete this section. It gets really interesting once you combine these filters: we designed the system so that you can use any number of filters at the same time. We then run all the links we found through those filters and return the matches.
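To illustrate how such filter combinations behave, here is a small conceptual sketch in Python. This is not the OpenLinkGraph API; all names and data are made up. The idea is simply that every active filter is a predicate, and a link is returned only if all of them match:

    # Conceptual sketch: combining link filters as AND-ed predicates.
    # All names are illustrative; this is not the OpenLinkGraph API.
    links = [
        {"source": "http://www.seomoz.org/blog/some-post",
         "target": "http://www.sistrix.de/", "type": "text",
         "rel": "follow", "tld": "org", "country": "us"},
        {"source": "http://example.com/",
         "target": "http://www.sistrix.de/news/", "type": "301",
         "rel": "follow", "tld": "com", "country": "de"},
    ]

    filters = [
        lambda l: l["source"].startswith("http://www.seomoz.org/blog"),
        lambda l: "sistrix.de" in l["target"],
    ]

    # A link is returned only if every active filter matches.
    result = [l for l in links if all(f(l) for f in filters)]
    print(result)

Adding another filter, say for link type or country, is just one more predicate in the list, which is what makes freely combinable filters cheap to offer.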

The other tabs in the OpenLinkGraph beta show summary evaluations of the linking hostnames for a domain, individual URLs, link texts, top-level domains and countries. For each of these evaluations, we show the total number of links returned, as well as the host, domain, IP and network popularity. The returned lists can be sorted by any of these five criteria. So much for this short introduction to the current state of the OpenLinkGraph. The next blog post will cover the size of the index and a comparison with other services.

Johannes Beus - 19.09.2011 15:16


OpenLinkGraph: the SISTRIX Link-Index

It has been nearly two years since we started gathering ideas and first drafts, and now we can finally show the first fruits of our labor: the SISTRIX OpenLinkGraph private beta went live this weekend and we have already received some valuable feedback from users. The determining factor for developing this tool was the realisation that only our own index, crawled and processed by ourselves, would give us the results we expect. On top of that, since handing its search over to Microsoft, Yahoo has decided to cease its own crawling ambitions. This means that the main trove of link data is disappearing, which made developing our own index unavoidable.

What might sound simple at first glance turned out to be hugely challenging: billions of websites need to be prioritised, crawled and processed, and the database needs to return results within seconds. With the number of servers supporting such a system, some of them break down on a daily basis, which makes it necessary to buffer their impact on the system. As you can imagine, this makes for enough complexity to be a lot of fun.
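To give an idea of the prioritisation part, here is a minimal sketch of a score-based crawl frontier. This is an assumption-laden toy, not our production system, which additionally needs politeness rules, deduplication, sharding and failure handling across many machines:

    import heapq

    # Minimal sketch of a prioritised crawl frontier: URLs with the
    # highest score (e.g. derived from known link popularity) are
    # fetched first. A min-heap is used, so scores are negated.
    frontier = []

    def schedule(url, score):
        heapq.heappush(frontier, (-score, url))

    def next_url():
        neg_score, url = heapq.heappop(frontier)
        return url

    schedule("http://example.com/", 120.0)
    schedule("http://example.org/deep/page", 3.5)
    print(next_url())  # -> http://example.com/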

The result of our work is a platform that can handle our current ideas and applications and is also prepared for future requirements: neither the index size nor the evaluation methods will push the system to any discernible limits, which means we will be able to enjoy it for quite a while. Since a full introduction to the OpenLinkGraph would be far too long for one blog post, I will use the next few days to preview the different parts of the system. If you are coming to Dmexco this week, drop by our booth D-69 to get a live preview of the tool and take home a beta invitation.

Johannes Beus - 19.09.2011 10:29


SEO Regulars' Table in Bonn on 29.09.2011

Hello, my name is Hanns Kronenberg, and on September 1st I became part of the SISTRIX team. Until now I did my blogging at seo-strategie.de; in the future I will surely write some blog posts here, too.

The chance to work together with Johannes on projects like the SISTRIX Toolbox and the OpenLinkGraph was tempting enough that I happily traded in my independence as an SEO consultant to become part of a larger whole.

I am especially happy to start my work here by, among other things, organising the new SEO Regulars' Table in Bonn. The last one is already some months in the past, and it really is high time to continue this treasured tradition. The date we have chosen is the 29th of September 2011, and the Regulars' Table will start, as usual, at seven o'clock in the evening.

If you want to sign up, please use this form. We will send you more detailed information a few days before the actual event. Please also sign up early enough, as there is a limit of 50 attendees.

Hanns Kronenberg - 08.09.2011 11:24


Goodbye, Yahoo SiteExplorer

Tomorrow it will be exactly 15 days until Yahoo is expected to shut off all API access to their backlink database. Those who have followed Microsoft's remarks on the matter over the past couple of months will have noticed that there are no plans to make this database available again in the future. This means that the principal source of backlink data will disappear overnight. Links have been the #1 ranking signal since Google's founding days, and that is not going to change anytime soon, even though other signals like user data or social-networking data are being used. This leaves us – especially me, as the Toolbox operator – with the question of which sources to use in the future.

It should not come as a surprise that I did extensive research on this subject and tested possible alternatives. For backlink data, two vendors immediately come to mind: Seomoz and Majesticseo. Seomoz has been tinkering with its own index since 2008, first under the name Linkscape, now with the Opensiteexplorer as a display tool. Majesticseo grew out of the concept of a decentralised search engine and, since they stopped merely hoarding links and started discarding stale ones in a segment they call the "fresh index", has become a serious competitor to Seomoz.

A comparison between the two that goes deeper than a subjective evaluation of each vendor's graphical features is actually harder than expected. So I decided on the following procedure: for five packets of five domains each (25 domains in all), I gathered link metrics from both Seomoz and Majesticseo. On the one hand, I compared the absolute number of links returned as an indicator of the size of the data pool and the depth of the crawl; on the other hand, I collected the domain-pop, meaning the number of different domains that link to the target domain, as an indicator of the breadth and diversity of the data. The five packets consist of five of the largest domains in Germany, five vertical portals, five pages that have something to do with Bonn, five country-specific Amazon subsidiaries and five SEO sites. You can see the chart with all the domains and results in Google Docs:



The results actually surprised me a bit: while Majesticseo clearly comes out on top as far as the absolute number of links is concerned (17 domains to 8), Seomoz takes home a clear victory on domain popularity (18 to 7). The problem with this result: neither Majesticseo nor Seomoz delivers backlink data with both the needed depth and breadth at the same time. Combining both services is not feasible, given the limited API access and the costs associated with it. This means the question of whether to "buy or build" was answered rather clearly, and we went to work on developing our own solution. But more on that next month...
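If you want to reproduce the tallying yourself: the "17 to 8" and "18 to 7" scores are simply per-domain wins counted across the 25 domains, once per metric. A minimal sketch with made-up numbers (not the measured values from the Google Docs chart):

    # Sketch of the per-domain win count; the figures are placeholders.
    # Each domain maps vendor -> (total links, domain-pop).
    measurements = {
        "example.de": {"seomoz": (1200, 340), "majesticseo": (5100, 280)},
        "example.com": {"seomoz": (800, 150), "majesticseo": (950, 120)},
    }

    wins = {"links": {"seomoz": 0, "majesticseo": 0},
            "domain_pop": {"seomoz": 0, "majesticseo": 0}}

    for domain, vendors in measurements.items():
        for idx, metric in enumerate(["links", "domain_pop"]):
            # Vendor with the larger value for this metric wins the domain.
            winner = max(vendors, key=lambda v: vendors[v][idx])
            wins[metric][winner] += 1

    print(wins)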

Johannes Beus - 31.08.2011 17:37


Google Panda reaches Germany

It has taken longer than expected, but today Google implemented the algorithm changes collectively known as "Panda" in Germany as well as in other non-English-speaking markets. Google's goal, as in the US and the UK, is to increase the quality of their search results: over the past couple of years, we have seen numerous projects settle into the grey area between sites of amazing quality and those that are total crap – projects whose rankings most search-engine users would not consider earned.

Based on the SISTRIX VisibilityIndex, we calculated the winners and losers of the Panda update for the German Google index. To do this, we evaluated a large number of search queries and compared them to their pre-Panda results. We start with those that lost the most in this update:

Sites affected by Google Panda in Germany (VisibilityIndex before and after)

 #  Domain                  Change  Before  After
 1  kelkoo.de               -86%    15.74     2.20
 2  alatest.de              -70%     9.65     2.86
 3  wikio.de                -65%    20.40     7.09
 4  online-artikel.de       -63%     9.02     3.31
 5  webnews.de              -62%    16.73     6.35
 6  suite101.de             -59%    64.17    26.20
 7  helpster.de             -59%    60.34    24.69
 8  dasverzeichnis.info     -59%    18.51     7.58
 9  openpr.de               -59%     5.90     2.43
10  yopi.de                 -56%    57.36    25.10
11  pressebox.de            -56%     6.82     3.00
12  informationsarchiv.net  -56%     7.96     3.51
13  eurip.com               -56%     5.51     2.44
14  tupalo.com              -55%     9.92     4.44
15  dooyoo.de               -55%   221.44    99.41
16  experto.de              -55%    33.43    15.02
17  flix.de                 -55%     9.90     4.47
18  indeed.de               -55%     9.67     4.38
19  geizkragen.de           -55%    45.18    20.51
20  moebel.de               -54%     7.25     3.33
21  zehn.de                 -54%     8.25     3.83
22  emagister.de            -53%     5.21     2.44
23  cosmiq.de               -52%    54.19    26.24
24  ciao.de                 -51%   399.53   197.26
25  tecchannel.de           -50%    14.61     7.26
26  preisvergleich.eu       -50%     5.38     2.69
27  news.de                 -50%    12.90     6.49
28  preisvergleich.org      -49%    10.55     5.37
29  whoswho.de              -49%    19.72    10.08
30  supportnet.de           -49%    11.99     6.14
31  hotfrog.de              -49%   113.03    58.13
32  twenga.de               -48%     5.76     2.97
33  frag-einen-anwalt.de    -47%    16.23     8.56
34  staedte-info.net        -47%    29.00    15.40
35  gutefrage.net           -46%   444.65   239.86


This list holds few huge surprises – for many of these domains, there was already speculation beforehand about whether they might get hit, so that they are now affected should come as a surprise to very few people. All in all, we can summarise the Panda update for Germany by saying that Google has reached its goal of increasing the quality of its index.
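For transparency: the "Change" column is nothing more than the relative difference between the two VisibilityIndex readings. A quick sketch in Python, using kelkoo.de's values from the table above:

    # Relative change between the pre- and post-Panda VisibilityIndex.
    before, after = 15.74, 2.20  # kelkoo.de, from the table above
    change = (after - before) / before
    print(f"{change:.0%}")  # -> -86%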

Everyone interested in a more extensive list of winners and losers of this first Panda update in Germany can simply send me an e-mail. Toolbox users will see updated values tomorrow, which will already fully reflect the Panda update. If you want to see detailed results, you can sign up for a free SISTRIX test account here; you will then be notified by e-mail once all the numbers are online.

Johannes Beus - 12.08.2011 17:09


Google Plus distribution

It has been more than a month since Google launched their Facebook competitor, Google Plus. The media response has been immense; even now, people still comment daily on the different aspects and the background of this new Google service. Judging by the messages Google sends me whenever someone adds me to a circle, the growth of Google Plus still seems to be picking up speed. In this post, I want to show two evaluations we did for Google Plus that should help take a closer look at this growth. First, since the start of Google Plus we have been monitoring a multitude of URLs for the number of +1 votes they receive. You can already find this data in the SISTRIX Labs as part of the Top URLs and the weekly winners. Here is the growth in votes for three interesting URLs:


While we do see some growth, it is nowhere near the order of magnitude you would expect from a service that, at the time of writing, has more than 40 million users and a button integrated into the Google SERPs: Google's homepage, currently the fourth most-"plussed" URL, gained only about 13,000 new votes over the past five weeks. For comparison: the Facebook homepage garnered more than 200,000 "Likes" in the same amount of time.

For our second evaluation, we crawled about 100 million pages from about 400,000 mostly German-speaking domains and analysed whether they contained the usual social media buttons from Facebook, Twitter and Google+. The results are astonishing:


The Google Plus button is already integrated on more domains than the Twitter button – even though Twitter has been on the market for considerably longer. While there is still a gap to Facebook, it is not an uncatchable lead. Looking forward, I think it could be especially hard for Twitter, which has not added any major new features in the past few years, to find a position between the two internet giants Facebook and Google that does not end in MySpace's fate.
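A note on methodology: detecting the three buttons in crawled HTML largely comes down to pattern-matching on the typical embed snippets. A minimal sketch – the patterns reflect common embed code of the time and are simplified, our actual crawler is more thorough:

    import re

    # Simplified detection of social buttons via their typical embed
    # markup (circa 2011). Real pages vary; this only catches the
    # most common patterns.
    PATTERNS = {
        "facebook": re.compile(r"facebook\.com/plugins/like|fb:like", re.I),
        "twitter": re.compile(r"platform\.twitter\.com/widgets|twitter-share-button", re.I),
        "google_plus": re.compile(r"apis\.google\.com/js/plusone|g:plusone", re.I),
    }

    def detect_buttons(html):
        return {name: bool(p.search(html)) for name, p in PATTERNS.items()}

    html = '<div class="g-plusone"></div>' \
           '<script src="https://apis.google.com/js/plusone.js"></script>'
    print(detect_buttons(html))
    # -> {'facebook': False, 'twitter': False, 'google_plus': True}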

Johannes Beus - 02.08.2011 15:11


IndexWatch 06/2011

Straight from the "better late than never" box: here are the IndexWatch numbers for June 2011. As always, we calculated both the winners and the losers based on their VisibilityIndex at the beginning and the end of the month. We start with the winners:


Parasite hosting seems to be a subject Google cannot really put a lid on: every few months, we see a mass host skyrocket to the top for numerous, mostly adult keywords, and it takes Google a few weeks – probably acting manually – to react. This time it is ohost.de; next month it will likely be some other host. A definite sign of how helpless Google is here came a few days ago, when they banned the whole co.cc domain with all its subdomains. Mastery in dealing with spam definitely looks different.

Seeing Youtube.com on the list might seem astonishing at first, so let's take a short look at the reasons. Back in the day, the Google SERPs were simple: there were 10 results and that was it. When Google noticed that hardly anyone used the image or video vertical searches on their own, they came up with Universal Search: whenever Google deems it useful, Universal Search results are shown in addition to the 10 regular results in the SERPs. A few weeks ago, Google changed some of the video results: they are no longer shown in addition to the 10 organic results but replace one of those 10 results in the SERPs. Another change is that you are no longer shown the familiar box with 3 or more videos, but only a single video with a thumbnail. In these cases, we decided to count the results as organic – which means they now count in the VisibilityIndex calculation, and that explains the strong increase for Youtube.

dpaq.de is an interesting phenomenon: behind this domain hides the URL shortener of the Deutsche Presse-Agentur (dpa). They decided to offer this service themselves because they do not trust the public options. Surprisingly, they implemented the usually correct method of blocking the whole domain for search engines via the robots.txt file. The problem: Google abides by the robots.txt and therefore does not know that all this domain contains is redirects to the actual content. Because a lot of very strong links point to the URLs, Google still ranks them – and not just anywhere: for keywords like easyjet, wettervorhersage or dönerhersteller, they rank in the top 10. SEO by accident – congratulations.
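For reference, blocking an entire domain for all crawlers via robots.txt – presumably what dpaq.de has in place – is just the classic two-liner:

    User-agent: *
    Disallow: /

Because crawling is forbidden, Googlebot never fetches the pages and never sees that they merely redirect, yet the URLs can still rank on the strength of their external links alone.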


Once again, we see numerous aggregators with next to no visible added value on the losing end of the Google index for June. Domains like urlplus.de were among the winners on last month's list; it looks like Google decided that the quality of the content did not justify the number of visitors sent their way and took countermeasures. Similar reasons explain the fall of domains like bizzinformation.de or robtex.com.

I was a little surprised by the large decrease in the visibility of taz.de, even if the content would sometimes justify such a loss. At first glance, I was not able to make out any SEO reasons why they should have lost such a large chunk of their visibility. They went from a high of more than 30 points down to less than 10 points at the time of writing – external traffic-measuring tools like Google Trends also show a distinct decline in visitor numbers since the start of the year.

Johannes Beus - 13.07.2011 21:44


IndexWatch 05/2011

Up until the beginning of this year, I did regular evaluations of the changes within the Google index for each month, listed winning and losing domains and commented on interesting cases. For some inexplicable reason, I stopped doing so these past months. Now I would like to pick up the torch again. The following charts are based, as always, on the SISTRIX VisibilityIndex. Holding with tradition, let's start with the winning domains in last month's Google index:


The top two contenders are domains whose visibility I believe to be unreasonably high. Urlpuls.de combines some scraped content about a domain with its Alexa data and a bit of network information to cook up pseudo-content, which is then published in a visually quite appealing fashion: according to Google's algorithm, this is enough to garner a visibility of 100 points and more. Just for comparison: domains like twitter.com, pcwelt.de or zeit.de have approximately the same visibility. I am sure that this domain will lose its rankings as quickly as it gained them – but it will likely be a manual change, which is not exactly evidence of Google's technological superiority. This is especially true when you consider that this is not an exceptional one-time matter, but something that happens rather regularly.

With Tradoria.de, we see a sort of internet shopping center gain massive ground over the last month: the VisibilityIndex more than doubled within the past two weeks. The domain is being found for more keywords, but the actual trigger is the improved ranking of the site: before, only 1.4% of keywords were in the top 10; now that number is up to 2.2%. This is still a low value, but they are moving in the right direction. It is also worth mentioning that Yate.co has gained ground during this time – it seems that the current changes to the Google index favor sites like these, something that should let the operators of these sites take a rather relaxed stance toward the introduction of Google Panda.


While mybestbrands.de saw phenomenal growth in the previous month, they have now been set back by quite a bit. It looks as though the Google algorithm decided that the site's content was not front-page material quite as often as before: they went from nearly 12% top-10 hits down to about 6%. Maybe their link-building efforts in the past were a little too "active"?

Motoso.de's SEO efforts suffered a setback last month: before, they managed to move forward slowly but surely, whereas now they seem to be headed back to the VisibilityIndex level they had at the beginning of the year. When I look at how much the number of ranking keywords has grown over the past months (and then plummeted last week) and compare that to the disproportionately weak VisibilityIndex growth, two possible explanations come to mind: on the one hand, there could be an on-page problem with providing and internally linking meaningful content; on the other, Google may simply not have enough trust in the domain, which one or two strong links could remedy.

Johannes Beus - 05.06.2011 22:42


Panda-Update: first reflections

While continental Europe sits on the edge of its seat waiting for the Panda update to be rolled out, both the US and British indexes are slowly getting a breather: there, the changes have already been made and the dust has settled somewhat. It is surely far too early for a final verdict on the Panda update, but I feel like putting up some thoughts and conclusions for discussion.

The Panda update has its roots in Google Caffeine. This is the codename Google used last year for the ground-up reworking of their search infrastructure. Since then, new pages and ranking signals can be added to the index much more quickly, and the total size of Google's index has grown massively. This has also become a problem for Google: pre-Caffeine, only a limited number of URLs from domains with questionable content were indexed, which worked as a sort of natural filter. It is this filter that Caffeine got rid of, which means that quite a lot of low-quality content has found its way into the index and can now be shown. Panda is Google's way of getting a grip on this problem by cleaning the rankings of content that may not clearly violate the spam guidelines but is still not up to par with Google's (and users') quality expectations.

Over the past few weeks, I have already written about domains affected by Panda. The first Panda update in the US surely caused the largest commotion, but the UK version and the second US round also gave us access to some interesting data sets. Since then, a discussion has been going on about the possible causes: some think that Google boosted brand searches, many are of the impression that Google rates the quality of the text on a page, while others believe that reducing the number of indexed pages could help. It is too early to work on solutions, seeing how we have yet to see a domain that manages to substantially bounce back after getting hit by Panda. Personally, I think that Google is assessing a multitude of signals at once; when a large enough share of these signals for a domain deviates strongly from the norm, a filter is activated. This makes it interesting to speculate about the possible signals used in evaluating a domain. The comment of a Google employee on one of the support forums tells us that, for the moment, Google seems content to use domain-wide signals:

"In addition, it's important for webmasters to know that low quality content on part of a site can impact a site's ranking as a whole."

For a closer look, it is interesting to delve into qype.co.uk and yelp.co.uk: both domains have the same business model, and while Qype got hit, Yelp was spared by Panda. Many of the metrics for the two domains are quite similar, but once you look at the time on site, the pageviews per user and the bounce rate, you start to notice differences. The following numbers come from Alexa, and while I am aware that they have to be taken with a grain of salt, we are comparing two very similar sites and the differences between them, which should be enough to make this work:

                Qype.co.uk    Yelp.co.uk
PV/User         3.15          4.39
Bounce %        61.7%         52.0%
Time on Site    2.6 min       4.8 min

You notice off the bat that Yelp has the better numbers, across the board. Similar “domain-pairs” (one hit by Panda, the other spared) show striking differences in these metrics as well. Here for example, we have Pricerunner.co.uk as well as Ciao.co.uk:

                Ciao.co.uk    Pricerunner.co.uk
PV/User         2.11          3.10
Bounce %        63.2%         47.6%
Time on Site    1.7 min       2.5 min

For all these metrics, I would advise not looking at the absolute numbers but at the deviations a domain shows in comparison to its competition. And while we certainly cannot be sure whether Google uses such metrics when deciding to apply a Panda filter, I feel confident that this theory has a lot going for it. It is certain, though, that numerous other signals also affect the outcome.
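If you want to formalise this "deviation from the competition" idea, it could look like the following sketch. To be clear, this is a speculative model of the theory above with made-up thresholds, not any known Google algorithm:

    # Speculative sketch: flag a domain when enough of its engagement
    # metrics deviate strongly from the average of its peer group.
    peers = {  # PV/user, bounce rate, minutes on site (Alexa-style)
        "qype.co.uk": (3.15, 0.617, 2.6),
        "yelp.co.uk": (4.39, 0.520, 4.8),
    }

    def deviations(domain):
        others = [v for d, v in peers.items() if d != domain]
        avg = [sum(col) / len(col) for col in zip(*others)]
        return [(val - a) / a for val, a in zip(peers[domain], avg)]

    # Bounce rate is bad when high; the other two are bad when low.
    signs = [-1, 1, -1]
    bad = [s * d for s, d in zip(signs, deviations("qype.co.uk"))]
    suspicious = sum(d > 0.15 for d in bad) >= 2  # thresholds made up
    print(deviations("qype.co.uk"), suspicious)

With the Alexa numbers from the tables above, qype.co.uk deviates negatively on all three metrics relative to its peer, which is exactly the pattern the theory would flag.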

Johannes Beus - 26.04.2011 12:28


Panda Vol. II: Ehow.com got hit this time

When Google announced at the beginning of this week that the Panda update would be applied to all English-speaking countries, they also mentioned changing some of the filters for the US. As the roughly 2% of US queries that are affected have a sizable impact on important domains, I'd like to throw updated visibility data into the discussion.

As with our other analyses, the following data is based on observing ranking information before and after the update. The keywords are chosen to reflect a cross-section of local search behavior, and we're quite confident we have a highly reliable data set. Let's get right into our findings.

Ehow.com got hit this time. They were among the sites the farmer update was aiming at but somehow survived the first round. As in the UK, they have lost massive visibility in the US this week:



Overall, the impact of this second Panda update in the US wasn't as big as the first time, but for affected domains the consequences can be harsh. Here is a list of further affected domains:

Losers

 #  Domain              Change  SISTRIX before  SISTRIX after  # KWs (before)  # KWs (after)
 1  ehow.com            -66%    411.56          138.70         489,294         228,592
 2  greatschools.org    -56%     37.38           16.56          36,574          15,697
 3  brighthub.com       -91%     17.45            1.49          59,574          16,004
 4  markosweb.com       -85%     18.54            2.71          13,057           5,867
 5  superpages.com      -68%     17.64            5.64          61,493          32,171
 6  medterms.com        -53%     22.46           10.49           9,637           4,416
 7  life123.com         -94%     11.70            0.72          60,281          20,764
 8  tech-faq.com        -57%     17.19            7.33          10,290           4,666
 9  spike.com           -64%     14.72            5.28          10,711           5,203
10  thefreecountry.com  -87%     10.05            1.30           5,099           1,153
11  managementhelp.org  -66%     13.25            4.54           2,661             959
12  videojug.com        -90%      9.50            0.91          19,767           6,582
13  10best.com          -87%      9.38            1.19          10,996           2,996
14  quintcareers.com    -70%     11.62            3.44           4,457           1,609
15  mortgageloan.com    -58%     13.51            5.73           4,320           1,624

As the freed-up spots in the Google SERPs aren't left blank, there are also winners from the update. Here is a table with the biggest winners:

Winners

 #  Domain              Change  SISTRIX before  SISTRIX after  # KWs (before)  # KWs (after)
 1  wiktionary.org      28%     126.18          161.18          30,817          33,685
 2  yelp.com             8%     133.36          143.62         170,209         174,139
 3  dailymotion.com     31%      27.27           35.80          64,191          71,795
 4  etsy.com             8%      94.85          102.29          79,669          81,986
 5  sears.com           12%      60.80           68.00          46,791          49,899
 6  huffingtonpost.com   8%      85.07           91.69         111,429         114,950
 7  latimes.com          9%      45.24           49.38          99,068         101,657
 8  boston.com          20%      20.30           24.44          38,359          39,569
 9  mashable.com        15%      19.53           22.42          25,079          26,786
10  tomshardware.com    13%      21.44           24.29          56,282          59,782
11  pcmag.com            9%      29.50           32.29          28,723          28,928
12  cbsnews.com          8%      36.33           39.07          55,828          56,955
13  computing.net       29%       8.53           11.01          21,682          24,174
14  reuters.com          9%      28.22           30.64          56,925          58,852
15  ask.com             12%      19.38           21.78          89,568          92,988

Like last time, Google seems to have reached its goal: ranking quality content better than before. It's interesting to see how similar businesses fare in the US and in the UK: Yelp.com is among the winners in the US, while Qype, a European company with a similar business, is one of the biggest losers in the UK. If you're interested in a deeper look at the affected domains, request a demo account for the SISTRIX Toolbox and we will supply you with unparalleled insights.

Update 04/18/2011:
Demand Media has published a statement pointing out that traffic to ehow.com hasn't declined by 66%. I'd like to emphasize that although our data usually correlates quite well with actual traffic numbers, it is a view from the outside, and Demand Media's own data is of course correct.

Johannes Beus - 16.04.2011 10:24

