Bing Hack: Seeing Geo-targeted results

Bing launched yesterday – a few days before everyone thought it would – and I have been pleasantly surprised by the generally positive feedback from people within the industry. Last time Microsoft launched a search engine, I can safely say the response wasn’t QUITE so good. In the intervening years they have really tried to engage with the industry and it seems that this has paid off a bit.

A huge amount of work seems to have gone into user intent and user behaviour. This means that results can vary dramatically depending on the nature of the query. The decision to add images to the results, or local listings, or currency exchange rates, depends on what Microsoft can glean about you.

This means they have used some novel GeoTargeting features which are a little unusual – meaning that most people around the world are going to see different results EVEN WHEN THEY TRY TO SEE RESULTS FROM OTHER COUNTRIES.

Let me explain…

The first two “layers” of Geo-targeting are cookie based. One defines the country and one defines a much narrower area like your town. To change the country, you would think that you just need to click on the country button in the top right.

But this doesn’t truly give you US results if you are in the UK….

The second cookie-related setting can be modified by typing in a very generic term like “plumbers”. You should get a local result. From there you have a button that lets you change your location. This will allow you to draw local results from anywhere. (I switched mine from Ealing to New York.) But this still doesn’t guarantee that you see proper US results in the main SERPs.


To properly see UK results from the US you need a third level of Geotargeting: IP location. Looking at the Bing SERPs from a computer hosted in the country you want to review does make a difference. The easiest way to do this is to look through a US proxy from the UK, or a UK proxy from the US. Before you do that, though, you should probably disable all your cookies, or at least set them to the country you are trying to view.

But even THIS isn’t perfect, we found. There is a FOURTH layer which even a proxy server doesn’t fix. This fourth layer has some unusual idiosyncrasies.

Andy and I worked on this today and it looks like the Accept-Language header setting is working differently in the US than in the UK. Changing the Accept-Language header is a bit of a pain, as it gets configured when your browser gets installed, although in Firefox I imagine that the “Tamper Data” plugin is about to get quite popular, as you can change the setting using this plugin.

We used this plugin, with cookies disabled, through a UK and through a US proxy, to see whether IP location or the Accept-Language header took precedence. The results were geeky but interesting:


[Table with columns: IP Location | Accept language setting | Result in Bing]

The interesting element here is that on a US IP address, the Accept-Language setting takes precedence, but from a UK IP address, the IP address takes precedence over the Accept-Language setting.

So it seems that Bing does not trust an IP address reported as coming from the US as much as it trusts one from outside the US.
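To make that precedence rule concrete, here is a minimal Python sketch of the behaviour described above. It only covers the UK/US combinations discussed in this post; the function name and the way the header value is mapped to a country are assumptions made for illustration, not anything Bing documents.

```python
def observed_result_country(ip_country: str, accept_language: str) -> str:
    """Return which country's results the test in this post suggests Bing shows.

    Hypothetical helper: it encodes only the rule observed for UK and US
    IP addresses, not Bing's real (undocumented) logic.
    """
    # Assumption: "en-US" stands for a US browser, anything else for a UK one.
    header_country = "US" if accept_language.lower().startswith("en-us") else "UK"
    if ip_country == "US":
        # On a US IP address the Accept-Language setting took precedence.
        return header_country
    # On a UK IP address the IP location took precedence.
    return ip_country


# Print the four combinations of IP location and header setting.
for ip in ("US", "UK"):
    for lang in ("en-US", "en-GB"):
        print(ip, lang, "->", observed_result_country(ip, lang))
```

Under this model, three of the four combinations lead to UK results, which is why getting a true US view from the UK takes more than a proxy.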

Summary and Conclusion

If you are in the UK and you REALLY want to see what the Americans are really seeing on Bing then you need to do all of the following:

1. Set the country in Bing AND

2. Set the town in Bing AND

3. View through a US based proxy server AND

4. Change the accept language settings by installing a Firefox plugin

If you are in the US, then you only need to do one of the last two steps to be able to see a regional result from outside the US.
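Steps 3 and 4 can also be scripted rather than done by hand in a browser. The sketch below uses only Python's standard library. The proxy address is a placeholder you would have to replace with a real one, the helper names are made up for this example, and the country and town cookies from steps 1 and 2 are not handled (urllib sends no cookies unless you add a cookie handler, which is close to the cookies-disabled setup used in the test).

```python
import urllib.parse
import urllib.request

BING_SEARCH_URL = "https://www.bing.com/search"


def build_search_request(query: str, accept_language: str) -> urllib.request.Request:
    """Build a search request that carries an explicit Accept-Language header."""
    url = BING_SEARCH_URL + "?" + urllib.parse.urlencode({"q": query})
    return urllib.request.Request(url, headers={"Accept-Language": accept_language})


def opener_via_proxy(proxy_url: str) -> urllib.request.OpenerDirector:
    """Return an opener that routes HTTP and HTTPS traffic through one proxy."""
    proxies = {"http": proxy_url, "https": proxy_url}
    return urllib.request.build_opener(urllib.request.ProxyHandler(proxies))


# Usage (needs a working proxy, so it is left commented out):
# opener = opener_via_proxy("http://us-proxy.example:8080")  # placeholder address
# html = opener.open(build_search_request("plumbers", "en-US")).read()
```

Swapping the header value between "en-US" and "en-GB" while keeping the proxy fixed is the scripted equivalent of the Tamper Data test.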

BrowseRank: Microsoft’s challenge to PageRank. (Part 1)

Microsoft have published an 8-page, two-column paper coming out of China, describing an approach which may well eventually beat Google’s PageRank and then some. Because the paper is so long, I thought I would do what Graywolf did years ago for the PageRank algorithm paper, by summarizing it all in easy to digest terms. (Added: Thanks Michael for sending me the link, see comments.)

PageRank was Larry Page’s brainchild (hence “Page”) and was the making of Google. It used the basic notion that a link from one web page to another can act as an endorsement of the landing page’s value. It worked brilliantly. Mostly. But Microsoft think they have a better approach. Here’s what their paper says.

The paper proposes a new method for working out a page’s importance – called BrowseRank. The article starts by summarizing the theory behind PageRank which I won’t reiterate, although they do also note that PageRank uses a discrete-time Markov process – which basically refers to the way its bot link-walks around the web, collecting data. It highlights two fundamental weaknesses of the model. First – it points out that the link graph that is generated is not especially reliable as a data source, because people can create pages with tons of links automatically and quickly. (Now who would do a thing like that?) They also suggest that the PageRank model uses a random “walk” along the links. (I’m not so sure about “random”. Maybe in the early days.) They also observe that the original PageRank theory took no account of the amount of time that a websurfer spends on a web page – something seen as a significant flaw by the writers of the China paper.

It is these drawbacks that BrowseRank seeks to overcome in what they say is a “more powerful mathematical model”. Instead of using the link graph as the basis for data collection, they propose to utilize a user browsing graph, generated from user behaviour. They will collect the data either through web browsers or at the server level. (MY ADD: Let’s face it – they have plenty of people using Microsoft’s Internet Explorer, so even if this was decided by the courts to be something that required an opt-in, they would have a HUGE data set. Even if they didn’t, then any sites that wanted to be seen only need to opt in themselves… probably through adding tracking code or by allowing the server to pass logs to Microsoft. Either way – they have a potentially great data source here.) They do propose to store the data by anonymous user, you will be glad to hear! (MY ADD: They do not need to personalize it, but my guess is that they will need to do something to filter by IP number to avoid spammers… but even as I type, I think they could get around that.)

So – with this data collection methodology, the basic calculation of a page’s importance becomes obvious: The more visits to the web page combined with the longer people spend on those pages, the more important the page. (MY ADD: Time to start writing long, engaging copy and maybe if you are black hat, reduce the page’s usability so people take ages to figure out how to move off the page). The writers call this very “Web 2.0” in concept, with user behaviour defining “implicit voting” on page importance.
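As a rough illustration of that basic idea (and only that: this is a simplification, not the paper's formula), the sketch below adds up the time spent on each page across all visits, so that both more visits and longer visits push a page up. The input format and the function name are assumptions made for the example.

```python
from collections import defaultdict


def rank_by_attention(visits):
    """Rank pages by total seconds spent on them.

    `visits` is an iterable of (url, seconds_on_page) pairs, one per visit.
    Total time equals number of visits times average time per visit.
    """
    total_seconds = defaultdict(float)
    for url, seconds in visits:
        total_seconds[url] += seconds
    # Highest total first.
    return sorted(total_seconds.items(), key=lambda item: -item[1])


print(rank_by_attention([("a", 30), ("b", 5), ("a", 60)]))
```

A score this naive is easy to game, which is one reason the paper goes on to use a proper Markov model instead.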

So now you get the basics – but the paper gets into much more depth… (I am only on page 2, column 1 of an eight page document here, and page 1 was half taken up with the author details).

Leveraging the data using BrowseRank

Here’s where they start to get clever. Once you have this data, you can’t use a discrete-time Markov process to leverage it. I’ll try to explain each bit separately as I go through it, but here’s the terminology they are using:

  • They use BrowseRank to compute the “stationary probability distribution of the continuous Markov Process” (Explained as best as I can guess below)
  • They use a “noise” model to represent the Markov Process Observations (explained as best as I can guess below) and
  • They use an embedded Markov Chain based technology to speed up calculation of the stationary distribution (Again explained below)
  • My interpretation of all those is below. (TEMPORARY NOTE: As I plan to go through this paper in detail, I will probably amend these as I understand them more. By the time I have written the last part of these blog posts, I’ll delete this sentence and keep the content static thereafter, except for typos.)

    Stationary probability distribution of the continuous Markov Process

    What the heck is that??? Well look – it seems to me that a Markov process basically is a link-walking bot. This database will not be based on link walking – so it needs to find a way to put a percentage, if you will, on the probability that a person clicks on each link on a page. The highest POSSIBLE combined total for all of a page’s links would be 100%, but this would be impossible in practice as the back button and people going nowhere from a page would also need to be included. So for every link on the page, there is a percentage chance that a user will click, with the total percentage representing the “Stationary Probability Distribution”. Hope that makes sense? In layman’s terms, it means that they don’t NEED to use a bot to linkwalk for this part.

    Noise Model to represent the Markov Process Observations

    I think this means that the system will look at the content on the page and relate it to the content on the pages that they link to (and content indicators like anchor text etc) to get some indication of where a user might link next, which would be valuable where a page does not have enough user behaviour data to provide a meaningful map.

    An embedded Markov Chain based technology to speed up calculation of the stationary distribution

    Relying entirely on user behaviour to map the whole web would take too long… so they are going to use a bot anyway! But now, instead of the bot linking randomly, it will link based on the probability distribution from pages where there is sufficient confidence in the distribution (and presumably sufficient confidence in the page value).
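For comparison with the guesses above, here is the textbook meaning of these terms as a small Python sketch. In a continuous-time Markov process, the stationary distribution is the long-run share of time spent in each state (here, each page). A standard way to get it is to find the stationary distribution of the embedded chain (which page follows which, ignoring time) and then weight each page by its average stay time. This is a sketch under stated assumptions, not the paper's algorithm: the session format, the function name and the 0.85 damping factor (borrowed from PageRank) are all mine.

```python
from collections import defaultdict


def browse_rank_sketch(sessions, damping=0.85, iterations=100):
    """Estimate the long-run share of time spent on each page.

    `sessions` is a list of browsing sessions, each a list of
    (page, seconds_on_page) pairs in the order they were visited.
    """
    pages = sorted({page for session in sessions for page, _ in session})
    clicks = defaultdict(lambda: defaultdict(int))  # clicks[a][b] = times b followed a
    stay_total = defaultdict(float)
    stay_count = defaultdict(int)
    for session in sessions:
        for i, (page, seconds) in enumerate(session):
            stay_total[page] += seconds
            stay_count[page] += 1
            if i + 1 < len(session):
                clicks[page][session[i + 1][0]] += 1

    # Power iteration on the embedded chain (transitions only, no timing).
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            total_out = sum(clicks[page].values())
            if total_out == 0:
                # Session ended here: spread this page's weight evenly.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
            else:
                for target, count in clicks[page].items():
                    new_rank[target] += damping * rank[page] * count / total_out
        rank = new_rank

    # Weight each page by its mean stay time, then normalise to sum to 1.
    weighted = {p: rank[p] * stay_total[p] / stay_count[p] for p in pages}
    norm = sum(weighted.values())
    return {p: w / norm for p, w in weighted.items()}


example = [[("a", 10), ("b", 100)], [("b", 100), ("a", 10)]]
print(browse_rank_sketch(example))
```

In the example, pages "a" and "b" are visited equally often, so the score difference comes entirely from people staying ten times longer on "b".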

    I’ll expand on this and go through the rest of the paper on subsequent blog posts over the next few days, once I have digested the paper. We still need to cover:

    • User Data Behaviour – how they plan to define user behaviour, including their modelling and assumptions. (Real MATHS here, that I can’t cut and paste because your browser may not support the symbols!)
    • The Continuous-time Markov Model: More maths
    • The Algorithm itself: Horrendous amounts of Maths
    • Experimental results.

    More soon! The actual paper is available in PDF form at

    The authors include: Yu-Ting Liu and Ying Zhang.

    Integrated Search. The future of search.

    Last week I was privileged enough to be invited to Microsoft’s “Live Search Symposium” in a posh private venue in Knightsbridge. There were only a hundred or so guests, but when the guests include Danny Sullivan, Dave Naylor and the tech boffins at the British Library you know you are in the right sort of place.

    (By the way – how short can you make YOUR domain name? The British Library is… Do you think their DNS could drop the www? That would be very cool. Anyway… I digress.)

    The symposium was exceedingly slick. MUCH better than I am used to, frankly, from Microsoft. They really put text based SERPs into perspective as being… frankly the very start of search. I know we have been bandying about universal or blended search for over a year now, but in the UK at least, we really haven’t made the leap. Microsoft seem pretty joined up in their thinking about how that leap will change their fortunes in search. Microsoft are not thinking “SERPs”. They are thinking vertical search and multiple media. They may still be weaker than Google in the organic results, but what they have been building is an infrastructure powerful enough to break Google’s market up entirely by encouraging different people to build different ways to search, based on different audiences. Danny reported on one such example, the Indiana Jones Search Engine, straight out of the meeting, and if you haven’t spent 30 minutes engaging with it you really should! But Microsoft have gone way further with their new “Silverlight” product (a bit like Flash on steroids). I am no developer, but seeing how the dots connect almost makes me wish I was.

    Microsoft are not just paying lip service to these joined up dots either. They have created dozens of viral videos which must have cost a fortune! (Sorry… the video below is probably still loading… bear with Microsoft…)

    What Dixon Jones laughs at during working hours

    There are several of these designed primarily (it would appear) for the UK market.

    Probably the most impressive thing I saw to show what integrated search could already be was a site which was pulling news, image and data feeds in real time as the London elections were going on. Once built, the system was functioning and updating seamlessly straight through the election period and is still current now. You need to download Silverlight to see it, mind, but wow – that’s going to challenge the very core of the news providers. Apparently anyone with programming skills and some time on their hands could have built it, using Microsoft’s freely available APIs. It didn’t have to be MSN, but obviously they wanted to see how far they could take the technology.

    When you have search that is so rich in any given vertical or topic, built by millions of enthusiasts in thousands of genres and styles… what does Google become by comparison? Just a directory of search engines – because the “most authoritative” source and “richest experience” on (say) bungee jumping will be a site that drags in every valuable news search, image search, map locations, addresses, forums and blogs on the subject in the most entertaining way… and that way will be based on Microsoft’s Silverlight technology and Microsoft’s APIs. Not Google’s.

    The better Google gets at returning the “best site” at the head of search, the more Microsoft technology driven sites will be at the far end. That leaves Google with the long tail and no place to go.

    I’m not saying Microsoft’s strategy will work. But it’s definitely a strategy that wasn’t thought up and developed to this level of sophistication on the back of a napkin. It might just work.