Dear Apple,

With the usual relish of a kid at Christmas, I went to pick up the iPad Pro, which launched today. My faithful and trusted Mac Air has been around the block. The “A”, “S” and “H” keys are showing signs that I do not cut my fingernails often enough, but otherwise it is working fine.

The new iPad Pro was hailed as the missing link for businessmen. The device that would finally get those stubborn desktop and laptop users hooked onto tablets.

Anyway – having gone to the candy store and partaken of the fruit, I thought you might be interested to know why I am back to using my Mac Air to write this blog post. Perhaps we can fix my initial problems before my 14-day money-back guarantee runs out, because I REALLY have been looking forward to something that replaces the Mac Air.

My initial concern, before leaving the store, was the lack of a USB port. I speak at conferences all around the world and they generally ask for me to bring my presentation in on a USB stick. They absolutely do not want me to rely on the web and I have seen many a presenter scuppered for such a rash and dangerous approach… so I agree with them. So how will I get my presentations from my device onto the presentation laptop? I am sure that will become clear… so let’s move on…

I bought the iPad Pro with 128GB and the fancy keyboard. The keyboard was one reason I had not migrated in the past, so this sounded like the panacea. But it seems connecting a keyboard doth not a laptop make 🙁 Most of the time, the silly iPad on-screen keyboard pops up even with my keyboard attached. Why? WHY? But that was the least of my challenges.

My largest problem so far is that I can no longer embed LastPass into my browser. That means that I cannot easily log in securely, with unique and complex passwords, to anything on the web any more. I CAN download the LastPass app, but that is not a realistic solution day-to-day. I have to log in to dozens, scores, of systems every day and I use multiple browser plugins. I need a REAL operating system. I assumed you had figured that out if you had worked out that there is an incredibly substantial group of people not moving to tablets. You did some research as to why, surely?

Now – let’s talk about mice. I am fine with the track-pad on the Mac Air. It works. But where has it gone on that “fully functional keyboard”? I assume I can go out and buy a Bluetooth mouse – but let’s not pretend that touching the screen on its side is a substitute for a mouse or track pad. It just isn’t.

But the REAL issue? I cannot download normal software! How do you expect to say it is a substitute for a laptop if I cannot download any of my software?

So I decided to try to write a blog post to get used to the system.

But I couldn’t log on to my WordPress from my browser any more, because I can’t install browser extensions.

So I downloaded my password app instead, but I am now reduced to cutting and pasting every time – which also seems to require using the fingerprint button twice each time.

It just is not a replacement for a laptop. I fear it will be back to the candy store until the new Mac Air comes out. If you plan to lose the USB ports on that as well, then you may lose a lot more than that.

Yours faithfully,

Dixon Jones.

Remember what you read online

Here is a great way to dramatically improve memory when reading articles online.

This week I finally got to sit down and do something constructive with Bryan Eisenberg, the founder of the Web Analytics Association (now renamed the Digital Analytics Association). He’s been around forever and it amazes me that so few businesses are able to get to grips with his essentially obvious mantra: that business online needs to be underpinned with good decisions based on good data. Recently, the mantra has changed a bit though, as we all start to realise that some of the power within the mass of data that some companies have managed to harness lies not in analysis – but in knowing what the “Big Idea” is that you want to get out of the Big Data.

But that’s not the full point of this post. Bryan gave me a bunch of posts he had written on the subject. As I started to read, it dawned on me that I really wasn’t taking in the words like I used to. I think ADT (attention deficit trait) is growing in all of us involved on the web. Age doesn’t help and the glass of red wine was the final straw. I closed the laptop and left it for a few days.

Well I am glad to report that I have now taken in the articles and found an excellent way to aid memory retention online. I downloaded an app from iTunes onto my iPhone that turns web pages into audio. The idea for the app is that you can turn any news feed into a podcast. That’s useful in itself. But the real “deep” recall occurs when you read AND listen at the same time.

By reading and having computer generated spoken word, you combine two senses, dramatically improving recall. We all use PowerPoint (or in my case “Prezi”) to help us elaborate and communicate in different ways, because a picture can say a thousand words, but we rarely combine senses when reading online. I heartily recommend that you change that now.

Using different senses online to communicate is one of the reasons I use to broadcast My Search Kingdom Radio Show. (Next show is on Thursday). Using those senses to RECEIVE information is just as important.

If you want to look at the iPhone app I have started using, it’s called “Speakably”. It looks new, so it is not perfect yet, but it does the job. What’s more, you can speed up the spoken word. Going at double speed won’t work by listening alone, but it is pretty easy to read fast with the audio running at the same time.

It turns out this is especially effective. A few months back I was at the Glasgow Science Museum. A great day out, by the way. In there, on the top floor, is a whole interactive experience about sound. One exhibit lets you hear something spoken very fast. It is hard to understand. Then you hear it at normal speed and of course it makes sense. The “Wow” moment comes when you hear it fast a second time. Now your brain has a pattern to follow and this time it all makes sense at the faster speed.

Put computerised audio over web pages when you want to read fast and take in the content. You’ll be amazed at how much faster… and more effective… your reading will be.

If you want to read the posts Bryan sent me:

Pass the word. Please share the post if you find it useful.

At SES, Bryan and I are heading up the “Big Data” Meet the Experts table at lunch in San Francisco. This year I have been pushing Majestic SEO within a new but growing conference circuit based around Big Data and Predictive Analytics and along the way I have found some great ways to mine Big Data.

Some stuff about Google’s Crawler

How Google crawls and retrieves data from the web, and some common SEO issues that arise. Straight from the Googler’s mouth.

Pierre Far is a Googler. I expect he’d appreciate that I pointed him out on G+. He spoke a bit at ThinkVisibility about the crawler and some of the issues that face the whole information gathering and retrieval process. His pictures weren’t as pretty as the “How Majestic Works” infographic, but there was some useful substance in there.

For example: did you know that Google only checks robots.txt about once per day, to help keep the load off your server? And that having a +1 button on your site can override robots.txt? These are some of the things that he brought up in his very interesting presentation. I made some notes as I went along. I hope they are legible…

Google sets a conservative crawl rate per server. So too many domains or URLs will reduce crawl rate per URL

If you use shared hosting, then this could easily be problematic for you. If you do not know how many other websites are on the same IP number as you, then you may be surprised. You can easily check this by putting your domain or IP number into Majestic’s neighbourhood checker to see how many other websites we saw on the same IP number. This site is currently on a server with 10 sites, but there could be hundreds. More importantly, if one site has a massive number of URLs… and it is not yours… then you could be losing crawl opportunities, just because a big site that is not connected to you in any way sits on the same IP number. You can’t really go complaining to Google about this. You bought the cheap hosting, and this is one of the sacrifices you made.
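Absent a neighbourhood-checker tool, the grouping itself is easy to sketch. Here is a minimal illustration (mine, not Majestic’s): resolve a list of domains and bucket them by the IP they share. The `resolve` parameter is injectable so the logic works without live DNS.

```python
import socket
from collections import defaultdict

def group_by_ip(domains, resolve=socket.gethostbyname):
    """Group domains by the IP address they resolve to.

    Domains landing in the same bucket share a server -- and
    potentially share Googlebot's per-server crawl budget.
    """
    neighbours = defaultdict(list)
    for domain in domains:
        try:
            ip = resolve(domain)
        except OSError:
            continue  # skip domains that will not resolve
        neighbours[ip].append(domain)
    return dict(neighbours)
```

With the default resolver this does live DNS lookups, so it can only check domains you already know about; a true neighbourhood check needs a reverse-IP index like Majestic’s.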

If a CMS has huge duplication, Google then knows, and this is how it notifies you of duplicates on WMT.

This is interesting because it is more efficient to realize a site has duplicate URLs at this point than after Google has had to analyze all the data and deduplicate on your behalf.

Google then picks URLs in a chosen order

I asked Pierre what factors affected which URLs were selected. In truth, I asked whether deep links to URLs were likely to prioritize those URLs for a higher crawl rate than other pages. Of course I believe deep links will change this priority, but I had to ask. I was just given:

Change Rate of page content will change this.

Which is not quite what I asked – but nice to know.

Google checks Robots.txt about once per day. Not every visit.

This was interesting to me. Majestic checks more often, and you would be surprised at how much simply checking robots.txt annoys some people. Maybe less is more.
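For anyone writing their own crawler, the once-a-day pattern is simple to copy. A sketch (the class name and TTL are my assumptions, not Google’s implementation): cache the parsed robots.txt per host and only re-fetch when the cached copy is older than the TTL.

```python
import time
import urllib.robotparser

class DailyRobotsCache:
    """Fetch and parse robots.txt at most once per TTL per host,
    mirroring Googlebot's roughly once-a-day check."""

    def __init__(self, ttl=24 * 3600, fetcher=None, clock=time.time):
        self.ttl = ttl
        self.clock = clock
        self.fetcher = fetcher or self._fetch
        self._cache = {}  # host -> (fetched_at, parser)

    @staticmethod
    def _fetch(host):
        parser = urllib.robotparser.RobotFileParser(
            "https://%s/robots.txt" % host)
        parser.read()  # the only network round-trip, once per TTL
        return parser

    def can_fetch(self, agent, url):
        host = url.split("/")[2]
        fetched_at, parser = self._cache.get(host, (None, None))
        if parser is None or self.clock() - fetched_at > self.ttl:
            parser = self.fetcher(host)
            self._cache[host] = (self.clock(), parser)
        return parser.can_fetch(agent, url)
```

Every `can_fetch` call between refreshes hits the cache, so the crawled site only sees one robots.txt request a day.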

Google then crawls the URLs and sends feedback to scheduler.

If a server spikes with 500 errors, Googlebot backs off. Also (as with Majestic), firewalls etc. can block the bot. This can – after a few days – create a state in Google that says the site is dead. The jQuery blog had this issue.

If 503 error on robots.txt they stop crawling.

OK. Don’t do that then 🙂

The biggest and smallest ISPs alike can block Googlebot at the ISP level.

It was good to see that other crawlers face this issue. Because ISPs need to protect their bandwidth, the fact that you want Google to visit your site does not necessarily mean it will be so. Firewalls at the ISP may block bots even before they see your home page. They may (more likely) start throttling bots. So if your pages are taking a long time to get indexed, this may be a factor.

Strong recommendation – set up email notifications in Webmaster Tools.

Pierre did not understand why we were not all doing this. If Google has crawling errors – or other things that they would like to warn us about – then an email notification trumps waiting for us to log back in to Webmaster Tools. I’ll be setting mine up right after this post.

Getting better and better at seeing .js files.

At least – I think that’s what he said.

Soft error pages create an issue and so Google tries hard to detect those.

If they can’t, the soft error page still uses up a crawl slot (at the expense of another URL crawl, maybe). If you don’t know what a soft error is: it is when an error page returns a 200 response instead of a 400-class (usually 404) response. You can “ping” a random non-existent URL on your site to check this, using Receptional’s free HTTP header checker if you want.
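If you’d rather script that check than use a web tool, the probe is only a few lines. A sketch using just the standard library (the helper names are my own):

```python
import uuid
import urllib.error
import urllib.parse
import urllib.request

def classify(status):
    """Interpret the status code returned for a URL that cannot exist."""
    if status == 200:
        return "soft 404 -- error pages are eating crawl slots"
    if 400 <= status < 500:
        return "hard 404 -- correct behaviour"
    return "unexpected status %d" % status

def soft_404_check(site_root):
    """Request a random, effectively guaranteed-nonexistent URL."""
    probe = urllib.parse.urljoin(site_root, "/" + uuid.uuid4().hex)
    try:
        status = urllib.request.urlopen(probe).getcode()
    except urllib.error.HTTPError as err:
        status = err.code  # urllib raises on 4xx/5xx; the code is here
    return probe, status, classify(status)
```

A healthy server answers the random URL with a 404 (or another 4xx); a 200 means your error page is a soft 404.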

Google then analyses the content. If it is noindex, then that’s it.

There was a question from the audience: “Is Google keeping up with the growth of the web?” Pierre likes to think they are, but admitted it was hard to tell.

Serving the data back to you:

Google receives your incoming query and searches the Index.

Err – yes. Google does not try to scan the whole web in real time. Non-techies don’t realize this, it seems.

Magic produces ordered links.

No questions allowed on the magic!

On displaying result, Google needs to:

  • Pick a URL
  • Pick a title: usually the title tag, sometimes changed based on the user query. This is win-win for everyone
  • Generate a snippet: usually created from content on the page, but he strongly recommends using rich snippets.
  • Generate site-links: whether these appear depends on the query and the result. If you see a bad site-link issue (wrong link), check for a canonicalisation issue.

A +1 button can override Robots.txt, on the basis that it is a stronger signal than Robots.txt.

Question from the audience: “Why are rich snippets so volatile?” Google has noticed people spamming rich snippets recently, so he said maybe that was a reason for increased testing.

Pierre was completely unable to talk about using +1 as a ranking signal. (whether by policy or because it was not his part of the ship)

Q: “How can we prioritize the crawl to get new content spidered?” A: Pierre threw it back. Do some simple maths: 1 URL/second is 86,400 per day. Google is unlikely to hit your site continually for 24 hours, so large amounts of new content can take time to crawl.
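That arithmetic is worth writing down, because the numbers surprise people. A quick sketch (the one-URL-per-second rate is Pierre’s example; the two-hour figure is my own illustrative assumption):

```python
def days_to_crawl(new_urls, urls_per_second=1.0, active_hours=24):
    """Days needed to fetch a batch of URLs at a polite crawl rate."""
    per_day = urls_per_second * active_hours * 3600
    return new_urls / per_day

# Flat out at 1 URL per second, a crawler covers 86,400 URLs a day.
# But if Googlebot only hits your server for, say, two hours a day,
# half a million new pages take over two months to work through.
full_day = days_to_crawl(86_400)                    # 1.0 (day)
realistic = days_to_crawl(500_000, active_hours=2)  # ~69 days
```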

Q: “What error message should you use if your site comes offline for a while?” A: 503, but be careful if only some of your site is offline not to serve a 503 on robots.txt.
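That advice is easy to get wrong in a blanket maintenance handler. A framework-free sketch of the rule (the function shape is mine): everything answers 503 with a Retry-After hint, except robots.txt, which must keep answering 200.

```python
def maintenance_response(path, in_maintenance=True):
    """Return (status, headers, body) for a request during downtime.

    A 503 with Retry-After tells crawlers to back off politely, but a
    503 on robots.txt makes Googlebot stop crawling the site entirely,
    so that one path must keep working.
    """
    if path == "/robots.txt":
        return 200, {"Content-Type": "text/plain"}, "User-agent: *\nDisallow:"
    if in_maintenance:
        return 503, {"Retry-After": "3600"}, "Down for maintenance"
    return 200, {"Content-Type": "text/html"}, "<p>Normal page</p>"
```

The same shape works as a rewrite rule or middleware in whatever stack actually serves your site.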

OK – that was about it. Thanks Pierre for the help.

Oh – nearly forgot – Pierre would like to point out that all this is in the Google Webmaster Documentation.

Yahoo Site Explorer Alternatives

Yahoo Site Explorer is going offline. Here is a list of the main Yahoo site explorer alternatives. There are a few to choose from with different strengths and weaknesses.

So Yahoo Site Explorer finally died. It took most of the year – but many SEOs had found stronger alternatives a while back. However – if you only just woke up to the fact that your search world just collapsed, there really are only a few realistic alternatives to choose from.

Majestic Site Explorer

I want everyone to know that I am biased here, but Majestic’s Site Explorer has the biggest data source (way more than Yahoo, with 3.5 trillion URLs), the freshest data (updated more than once a day) and the fastest site (because the data is not scraped on the fly and is optimized on the web front end).

Unlike Yahoo – which only gave 1,000 backlinks even though it reported that it knew of more – Majestic gives you many thousands depending on your subscription level, or ALL of the backlinks in its advanced reports. Getting data on your own sites is free. You can also get headline link counts for all sites for free as well, but the real magic – all the links, with all the anchor text, in any order you really want – does require a subscription. There has been a huge investment in the Majestic SEO project and they do (in my biased opinion) have the world’s best site explorer.


Blekko

Great at the moment for getting some of the links. Blekko has a much smaller database – but they would argue they concentrate on quality over quantity. I am not sure the truth isn’t a bit more about the difficulty of crawling and storing the whole web, but with a huge investment recently from Yandex, they might overcome these issues in time. That said – with Yandex being a full search engine, there may be pressure on Blekko to stop releasing all the backlink data at some point.


Open Site Explorer

The SEOMoz offering. A great tool for sure, with extra bells and whistles in their Domain Authority and other added-value metrics. They use the Amazon Cloud, which is not cheap and has had some outages recently – but they probably have the largest user base of all the alternatives in this list and many people swear by their data.


I think they are pretty good – but you’d best speak German for best results.

Most other alternatives use the data from these sites – or derivatives thereof.

Prediction Technology

My agency, Receptional, has been involved in some interesting projects over the years. It seems that when other agencies or consultancies are asked a question that just cannot be answered without a bit of clever programming, they bow out of the race. We find ourselves building ad-hoc tools for clients which do not cost the earth, but produce interesting data designed exclusively to answer the question at hand.

We think that this is something we would like to get involved in further. It’s interesting stuff. To that end, I am going to want to really start looking at technologies which “scientifically” try to make predictions and forecasts in new ways.
To that end, I will be starting a new blog soon – called – and the first post is nearly ready to go. The point of the blog will be to report on new predictive analytics technologies, and if you want to be in RIGHT AT THE START then you could do worse than following @PredictionTech on Twitter. If you have technologies that you think PredictionTech should be reporting on and investigating, then send them through via that Twitter profile for consideration. If you want to write for PredictionTech (it’s unpaid, but may get you to some cool conferences) then also let us know.

Do you think the project will have legs? What would you like PredictionTech to look at?


Could Twitter Build a Better Full Text Search Engine?

Twitter has different quality signals to Google within its data. Could Twitter use this to build a better search engine? This article puts the case for Twitter and argues that Twitter might have the edge in creating a very different Index which Google could not easily replicate.

It isn’t beyond the realms of possibility and if I was Twitter I would be thinking about it.

If you search for a keyword on Twitter, it used to be that Twitter would just look at the text in Twitter content. Then they started to look at the text in the URLs. I noticed later that they were also able to parse URL text via redirects, and now of course they have their own URL shortening system, which gives them even more data.
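Expanding a shortened link is just a matter of walking the Location headers. A sketch of that hop-following logic; `head` is any callable returning `(status, location)`, so the real HTTP call (a HEAD request) can be swapped in and the logic tested without a network.

```python
def resolve_chain(url, head, max_hops=10):
    """Follow redirects hop by hop and return the full URL chain.

    This is roughly what a Twitter-style indexer must do to see the
    real landing page behind a t.co or bit.ly style shortened link.
    """
    chain = [url]
    for _ in range(max_hops):
        status, location = head(url)
        if status not in (301, 302, 303, 307, 308) or not location:
            break  # reached the final URL (or a non-redirect response)
        url = location
        chain.append(url)
    return chain
```

The `max_hops` cap matters in practice: redirect loops are common in the wild, and a crawler that follows them forever burns its own budget.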

But is there an opportunity here for Twitter to create a better algorithm for sentiment analysis, and a better algorithm for crawl priorities, which would give them a search engine edge over the likes of Google and Bing? I think there is, and I think that Twitter knows that the Twitter Fire Hose (the name given to the API that can dump every Twitter post out to a third party, if only the third party had the bandwidth to take it) has the ability to be a game changer in search. This is why they would have held out and were ultimately prepared to walk away from the deal with Google.

The thing is, I don’t think Google had been properly leveraging the data. It was being used for “realtime search”, but it does not look like Google was confident enough in the deal to let the link mentions within the Fire Hose feed influence the weighting of its main index. If they had – and if they had let the weight of a page rise (and eventually fall) based on these links and the clout (or Klout) of the people shouting out on Twitter – then the results could have looked very different on Google.

Twitter – of course – have this data all the time. It could be the basis for prioritising a crawler, and that’s half the battle in developing a search agent that needs to scale up to the size needed to index the web. Indexing full text is a massive task – as the Majestic-12 project has found – but you do not have to do it all at once. Twitter already index the URL and maybe the page title. What about the first paragraph only? Or the snippet from when the article appears on a blog home page? Basically, enough to get the drift of the article.

Now – combine this with sentiment – which they can get from the context of the Tweets much more readily than Google can from crawling review sites – and Twitter really do have something potentially special.

All that remains would be the business case of trying to create yet another search engine. However – one of the factors in such a decision must be the knowledge that Google is developing its own competitor to Twitter and if Twitter are not careful, they will find their advantages erode from underneath them. So they need to fight back and soon.

Attracting Affiliates

How do you find and attract affiliates? This article shows you a cool technique that works for me.

I only just noticed a video interview (above) that I did with Dr. Ralph Wilson of Web Marketing Today on how to attract affiliates. In it, one of the things I talk about is looking for affiliate link signatures to track down your competitors’ affiliates. I should probably explain how you might do that. For example, let’s look at Amazon affiliates. I just searched the web and found an Amazon affiliate link for a self-help book. A right click on the image lets me copy the link, so that I can see how the affiliate links to Amazon. It is a link like this:

Looking at another book on the same site we can see this link:

Between the two, we can see how the links differ. If we want to find all the people who are affiliates of a particular book at Amazon, say the “Anger Control Workbook”, then we are looking for links that START with … MajesticSEO’s link analysis lets you sort by URL – or even by partial URL. So you can use MajesticSEO to find affiliates of a certain genre and use this to attract them to your program.
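Once you have a link export in hand, the “links that start with…” filter is a one-pass scan. A hypothetical sketch (the URLs below are made up; Amazon affiliate IDs do travel in a `tag=` query parameter):

```python
from urllib.parse import urlparse, parse_qs

def affiliate_tags(link_targets, product_path):
    """Map affiliate tag -> links, for links whose path starts with
    the product path we are interested in."""
    tags = {}
    for url in link_targets:
        parsed = urlparse(url)
        if not parsed.path.startswith(product_path):
            continue  # a different product; ignore it
        tag = parse_qs(parsed.query).get("tag", ["(no tag)"])[0]
        tags.setdefault(tag, []).append(url)
    return tags
```

Feed it the target URLs from a backlink export and each key is an affiliate ID – i.e. an affiliate you could try to recruit to your own program.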

Here’s another example, more suited to most of us in small businesses. Mary Ferrin writes and runs one of her affiliates is and you can see that his page links to: For a much more reasonable cost, I can find other affiliates linking to that page with the same ?family prefix by forcing an analysis in Majestic for links that ONLY contain “” in them.

Hey presto! Here are her other affiliates!

Guess what you should do next with this data as a Merchant? Check these against your own affiliate list to see who you haven’t got. Since I own Murder Mystery Games… guess what I plan to do next week?

3 miles of Data Cabling

We are growing our office. We are not a large company, but we hope to be… so we had an extension built. We then all moved into that, whilst the main office building was totally gutted. Even the second floor was rebuilt and the roof raised to meet modern-day building regulations. Apparently we aren’t allowed to only employ dwarves any longer. It’s heightist.

Today I had a look at the work in progress… LAN cabling lining the floorboards, roofing and the walls.

Apparently… nearly 3 miles of it!

I asked if this would mean we could dispense with the central heating. Apparently not.

So is this necessary? Well, without it, any business would need 10 times as many meetings, which would mean loads more travel. So I bet that if it were “carbon tested”, so to speak, then yes.

Linkscape vs Majestic

There are very few “link maps” in the world commercially available to the public. A link map is the hardest element of Google’s search algorithm to replicate. There are really only two companies with commercial link maps available now to the masses. This article helps you choose between the two.

Who has the best backlink data in the world today? Discounting Yahoo, there are only two world-class systems being developed that I can see. They are Majestic – which has been quietly crawling links since 2004 and is only now revealing its hand – and Linkscape – probably the best known in the US – which has had considerable investment from the Rand Foundation (SEOMoz).

I’ve been impressed with both and thought it was time to really put both systems to the test. Which one is better, and which one is priced right?

To clarify – I am looking at the PAID versions of both systems. I covered the following areas:

  • Index Size
  • General Look and Feel
  • Manipulating data
  • Pricing
  • Global reach

Index Size

Both sides could shout about the size of their index. Indeed, Majestic certainly is shouting, claiming that they now have 539 billion URLs indexed – which they say compares to only 170 billion indexed by Yahoo and only 38 billion indexed by Linkscape. In fact, Linkscape’s meta description puts their own number higher, at 54 billion+, but even at this level Majestic’s data (if true) is 10 TIMES the size of Linkscape’s at the moment and about half the size of Google’s. So let’s test this with a few examples – from popular to unknown.
Small site test: (Software association of New Hampshire)

I chose this one for several reasons. The first is that I have never heard of them; I just went through the DMoz directory randomly, starting with a state I’ve never been to. The second is that they 301 the www onto the non-www, which avoids a potential flaw in the results. Third, the site does not have an architecture that is built upon multiple subdomains.

Majestic found: 5,127 external backlinks from 882 referring domains, with 229 unique anchor texts.
Linkscape found: 25 external links from 6 domains & subdomains. Linkscape only shows the top 50 anchor texts in this report.

Well on this basis – Majestic is absolutely crucifying Linkscape – but let’s be careful… Majestic may be giving so much data that we are not comparing like with like.

Big Site Test: (The UK’s most well known news brand)

Large sites will be especially interesting to compare because they tend to have many subdomains. I tried to find a big site without significant subdomains, but even Wikipedia uses them for language, so I think we need to accept that any link analysis tool needs to cope with subdomains. So what did we find with the BBC?

SEOMoz found:  16,424,105 links from 315,686 domains/subdomains
Majestic found:  345,383,557 links from 598,475 domains.

Again, Majestic shows considerably more backlinks. Majestic’s data, though, includes 23 million image links, 22 million nofollow links, 1 million …, 15 million DELETED links and 2.9 million mentions (links in plain text, without a hyperlink). On the other hand, SEOMoz’s number appears to count subdomains as separate domains, instead of limiting their advertised number to the number of Top Level Domains (TLDs).

If we take all of Majestic’s deleted links out – and even if SEOMoz’s data had already excluded these (which it doesn’t) – then I think we can safely say that Majestic’s index is considerably more developed than Linkscape’s at the moment.

How can Majestic’s index be so much larger? Majestic started indexing in 2004. That’s a lot of crawling time that Linkscape needs to catch up on. In addition, Majestic’s method of collecting data was ingenious – using distributed crawlers, similar to the BitTorrent idea, with multiple partners using their spare computer downtime to crawl the web. This has given Majestic considerable processing power at a relatively low cost.

General Look and Feel

Majestic’s hands-down win on index size is entirely reversed when it comes to look and feel, where Linkscape is considerably better. Linkscape looks usable – whilst Majestic looks like it was built by a techie who never quite got around to thinking about it all from the user’s point of view.


Linkscape lays the data out logically, with a dashboard containing the most important information readily displayed and intuitive tabs to drill down to the referring domains or the URL anchor text. When you delve into the “links to domain” tab, SEOMoz lets you filter the result on the fly. This is an especially nice feature. For example, you can easily hide or include particular types of links. To do this with Majestic, you need to go right back to the options menu and force a new analysis of the data. You can get the same sorts of data, but it just takes more effort in Majestic and looks better in Linkscape.

By comparison – Majestic tries to display Top anchors, top referring domains and top pages all on the same page, offering a drill down on each table. It’s all too much data for a single screen. This has now also been augmented with some new graphs – which are nice… but MORE DATA! I also think people will be confused between the two graphs on this dashboard – entitled: “External backlinks discovery for” and “Referring domains discovery for”. I know the difference – but I guess you’ll have to look twice… and I would prefer if these defaulted to cumulative graphs.

Manipulating Data

The thing that strikes me between the two systems is that Linkscape only gives you detailed data about the 50 most common anchor text phrases and the 50 most important links. Looking at my example site, I also found that all the most important links were internal! That may be so – but if I want internal link data I can use Xenu Link Sleuth… it’s external data that I want – and by comparison, Majestic gave me so much that I immediately needed to start filtering out what I felt might not be appropriate.
Majestic gives 200 results per page on screen to SEOMoz’s 50. You can drill down to up to 3,000 of SEOMoz’s results, page by page – but this makes it hard to extract the data.

On both systems, you can export the data to a CSV file and then you get the whole lot! This is incredibly powerful, except that Linkscape limits its data to just under 3,000 URLs, whilst Majestic gives you the complete data dump if you want it all. There is, however, a considerable learning curve here for using Majestic. To get the data you REALLY want, you need to manipulate the “options” and then force a new analysis… THEN you need to download the data into a CSV. That gives you vastly superior information to SEOMoz’s, but it does take a while to be able to see the data from different perspectives.
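The CSV route also lets you re-slice the data offline instead of forcing a new analysis each time. A sketch using only the standard library – the column names (`SourceURL`, `LinkType`) are placeholders of mine, so check the header row of your own export:

```python
import csv
import io

def live_followed_links(csv_text, exclude=("Deleted", "NoFollow")):
    """Reduce an exported backlink CSV to live, followed links."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row["SourceURL"] for row in reader
            if row["LinkType"] not in exclude]
```

Change the `exclude` tuple to slice out image links, mentions, or whatever other link types your export flags.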

Majestic also has some useful tools for power users. You can, for example, group your different accounts (SEOMoz calls them reports) into sub-folders. SEOMoz lets you compare two competitors side by side, but Majestic’s folders allow you to compare a whole industry sector, if you have enough funds to collect all the data.


Pricing

I am not going to go into pricing for the real high-end users, who may be spending several thousand every month to use the data. For mere mortals, the pricing models are very different.

Comparing the prices is like comparing apples and oranges.

Linkscape is part of my SEOMoz Gold membership, which starts from 25 reports a month for about $80. When I run a report, I get the data for that domain at that point in time, and I get to keep it for as long as I want, provided I remain a member of SEOMoz. By contrast, on Majestic I buy access to a domain’s data for a given amount of time – from 7 days upwards.

Majestic similarly uses a “credits” system to get around the international pricing issues, but the price of a domain can vary dramatically. In the examples I used, the small site cost just a couple of credits, whilst analysing the BBC would cost 600 credits for seven days’ access (or 3,000 for a year’s).

So which is cheaper actually depends on what sites you are analyzing and how you are using the system. If you only have $20 though… you probably only have Majestic as an option.


Both systems are function-rich and I have probably missed a few functions. If either Linkscape or Majestic think I’ve missed a trick here, they both know how to contact me and I will correct the table below – but only for functions available at the date of posting.




  • Your own domain for free
  • Domain Quality Estimate – Linkscape: MozRank (trying); Majestic: ACRank (needs work)
  • External links list
  • Internal links list
  • Links to URL
  • Ability to filter on the fly
  • Filter by images
  • Filter noscripts
  • Filter nofollow
  • Filter offscreen links
  • Filter same IP number
  • Filter same IP block
  • Filter same subdomain
  • Filter same root domain
  • Filter by frame
  • Filter by redirect (301s shown)
  • Filter deleted links
  • Filter in/out alt text
  • Filter mentions (not tracked by Linkscape)
  • Filter by specific anchor text
  • Filter by crawl date
  • Filter by URL text
  • By given IP range

Linkscape is considerably more intuitive at the present time, but there is much more depth of data at Majestic and, for professionals, the learning curve will be worth the effort. By contrast, though, SEOMoz has a huge variety of other tools available within its membership fee, which you will still need for Internet marketing even if you do go for Majestic.