Saturday, 22 October 2011

Google Puts A Price On Privacy


Earlier this week, Google made a significant change, purportedly to better protect the search privacy of users. In reality, it specifically, and deliberately, left a gaping hole open to benefit its bottom line. If you pay-to-play, Google will share its search data with you.
Google’s a big company that goes after revenue in a variety of ways that some critics feel put users second. However, I’m struggling to think of other examples where Google has acted in such a crass, it’s-all-about-the-revenue manner as it has this week. The best comparison I can think of is when Google decided to allow Chinese censorship. Yes, this is in the same league.
It’s in that league because Google is a company that prides itself on doing right by the user. Yet in this case, it seems perfectly happy to sell out privacy, if you’re an advertiser. That’s assuming you believe that Caller ID-like information that’s being blocked (except for advertisers) is a privacy issue.
Google doesn’t, as best I can tell. Instead, the blocking is a pesky side effect to a real privacy enhancement Google made, a side effect Google doesn’t seem to want to cure for anyone but advertisers.
If it had taken a more thoughtful approach, ironically, Google could have pushed many sites across the web to become more secure themselves. It missed that opportunity.
I’ll cover all of this below, in detail. It’s a long article. If you prefer a short summary, skip to the last two sections, “Why Not Get Everyone To Be Secure” and “Moving Forward.”

Default Encrypted Search Begins

Let’s talk particulars. On Tuesday, Google announced that by default, it would encrypt the search sessions of anyone signed in to Google.com. This means that when someone searches, no one can see the results that Google is sending back to them.
That’s good. Just as you might want your Gmail account encrypted, so that no one can see what you’re emailing, you may also want the search results that Google is communicating back to you to be kept private.
That’s especially so because those search results are getting more personalized and potentially could be hacked. The EFF, in its post about Google’s change, pointed to two papers (here and here) about this.

Encryption Can Break Caller ID

There’s a side effect to encryption that involves what are called “referrers.” When someone clicks on a link from one web site that leads to another, most browsers pass along referrer data, which is sort of like a Caller ID for the internet. The destination web site can see where the person came from.
When someone comes from an encrypted site, this referrer information isn’t passed on unless they are going to another encrypted site. That means when Google moved to encrypted search, it was blocking this Caller ID on its end for virtually all the sites that it lists, since most of them don’t run encrypted or “secure” servers themselves.
This is a crucial point. Encryption — providing a secure web site — doesn’t block referrers if someone goes from one secure web site to another. Consider it like this:
  • Unsecure >>> passes referrer to >>> Unsecure
  • Secure >>> passes referrer to >>> Secure
  • Secure /// does NOT pass referrer to /// Unsecure
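
For those who think in code, the three cases above can be sketched as a tiny function. This is only an illustration of the conventional browser behavior described here: the function name is made up for this example, and the fourth combination (unsecure to secure) isn't in the list above, though browsers ordinarily pass the referrer in that direction too.

```python
def passes_referrer(source_secure: bool, destination_secure: bool) -> bool:
    """Return True if a browser following the conventional rule sends a referrer.

    Hypothetical helper for illustration only; not any browser's actual code.
    """
    # The only hop that drops the referrer is secure -> unsecure.
    return not (source_secure and not destination_secure)


if __name__ == "__main__":
    label = {True: "Secure", False: "Unsecure"}
    # The three cases listed in the article, in the same order.
    for src, dst in [(False, False), (True, True), (True, False)]:
        verdict = "passes referrer to" if passes_referrer(src, dst) else "does NOT pass referrer to"
        print(f"{label[src]} {verdict} {label[dst]}")
```

The point to notice is that a single condition does all the work. Encryption by itself never blocks the referrer; the downgrade from secure to unsecure does.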

Google’s Referrer Problem

If everyone on the web ran secure servers, aside from the web being a more secure place in the way that Google itself wants it to be, the referrer hypocrisy that Google committed this week wouldn’t be an issue.
The vast majority of sites don’t run secure servers, of course. That posed a problem for Google. Referrers from search engines are unique. For as long as we’ve had search engines, over 15 years, the links people click on from search engine results have contained the search terms people have used.
For publishers, this has made search marketing incredibly powerful. They are able to tell exactly what terms were used when someone found their web site, at a search engine like Yahoo, Bing or Google.
Moving to secure searching meant that Google was suddenly, dramatically, no longer going to send this information to publishers, because as I’ve covered, virtually none of those publishers were running secure servers. As a result, Google almost certainly realized there was going to be backlash.
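
To make that concrete, here is a minimal sketch of how a publisher’s analytics might pull the search terms out of a referrer. It assumes the terms travel in a query-string parameter named `q`, which is the parameter Google’s result URLs use; the function name and the sample URLs are invented for illustration.

```python
from typing import Optional
from urllib.parse import parse_qs, urlparse


def search_terms_from_referrer(referrer: str, param: str = "q") -> Optional[str]:
    """Return the search terms carried in a referrer URL, or None if absent.

    Assumption: the terms live in the `param` query-string parameter.
    """
    query = parse_qs(urlparse(referrer).query)
    values = query.get(param)
    # parse_qs drops blank values by default, so an empty q= also yields None.
    return values[0] if values else None


if __name__ == "__main__":
    # Hypothetical referrers, as they might appear in a server log.
    print(search_terms_from_referrer("http://www.google.com/search?q=blue+widgets&hl=en"))
    print(search_terms_from_referrer("http://www.google.com/"))
```

When the parameter is missing or empty, the function returns `None`, which is the situation publishers face once the terms stop arriving.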

Putting A Price On Privacy

Google could have endured the backlash, saying that if publishers still wanted this data, they could move to secure servers. Instead, it deliberately chose to override how referrers are passed, so that they would continue to be provided to just its advertisers.
Backlash Google would endure, but apparently not from those who made Google nearly $10 billion last quarter alone.
To solve this, Google changed from the standard way that referrers are supposed to be passed to its own unique system, which works like this:
  • Secure /// does NOT pass referrer to /// Unsecure unless…
  • Secure >>> passes referrer if ADVERTISER to >>> Unsecure
Let me be very clear. Google has designed things so that Caller ID still works for its advertisers, but not anyone else, even though the standard for secure services isn’t supposed to allow this. It broke the standard, deliberately, to prevent advertiser backlash.
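
Put as code, the difference comes down to one line. The sketch below covers only the secure-to-unsecure hop from the two bullets above. Both function names are hypothetical, and this models the behavior as I’ve described it, not Google’s actual implementation.

```python
def standard_secure_to_unsecure(is_advertiser_click: bool) -> bool:
    """Conventional rule: a secure page never passes a referrer to an unsecure one."""
    # Who paid for the click makes no difference under the standard behavior.
    return False


def google_secure_to_unsecure(is_advertiser_click: bool) -> bool:
    """Behavior described in this article: the referrer survives only for ad clicks."""
    return is_advertiser_click


if __name__ == "__main__":
    for is_ad in (False, True):
        kind = "ad click" if is_ad else "organic click"
        print(kind, "| standard:", standard_secure_to_unsecure(is_ad),
              "| Google:", google_secure_to_unsecure(is_ad))
```

Under the standard rule the answer doesn’t depend on who is receiving the click. Under the rule described here, it depends on nothing else.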

The PR Plan For Publisher Backlash: It’s A Tiny Loss!

Google still knew there would be backlash from another group of publishers, those who have received this Caller ID referrer data from Google’s “free” or “organic” or “editorial” or “SEO” listings. What was the solution for that problem?
Here, Google seems to have a three-fold approach. First, suggest that only a tiny amount of data is being withheld. Some scoffed at Google’s estimate that I reported, that this would impact less than 10% of query data. But so far that seems to be holding true.
For example, here was our second most popular keyword sending us traffic from Google yesterday, according to Google Analytics:
“Not Provided” is what Google reports in cases when it now blocks referrers — or technically, it still provides referrers but is specifically stripping search terms out of them.
Our number two keyword! And yet, we received nearly 15,000 keyword-related visits from Google yesterday. These terms that were withheld amounted to only 2.6% of them.
On my personal blog, this is in about the 2% range. SEOmoz reported around 2%, as well.
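
If you want to check the figure for your own site, the arithmetic is simple. The sketch below assumes you’ve exported keyword visit counts from your analytics package and that withheld terms are grouped under a single label, written here as `(not provided)`. The label spelling, the function name and the sample counts are all assumptions for illustration; the counts were picked to land on the 2.6% figure above, using a round 15,000 visits.

```python
from typing import Dict


def not_provided_share(keyword_visits: Dict[str, int], label: str = "(not provided)") -> float:
    """Return the fraction of keyword visits whose search terms were withheld."""
    total = sum(keyword_visits.values())
    if total == 0:
        # No keyword visits at all, so nothing was withheld.
        return 0.0
    return keyword_visits.get(label, 0) / total


if __name__ == "__main__":
    # Illustrative numbers only, not real analytics data.
    sample = {"(not provided)": 390, "all other keywords": 14610}
    print(f"{not_provided_share(sample):.1%}")
```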
These low figures will make it easier for Google to gloss over publisher concerns, especially when they’re almost all being voiced by those in the SEO industry. The industry has a bad name, so if it’s against something, that can almost seem like a ringing endorsement for good.
Ars Technica had some comments like this, in response to its story on the Google change:
I’m playing the saddest song in the world on the smallest violin in the world. Poor, poor, SEO leaches
I AM completely unsympathetic. The sooner these SEO leeches, parasites, spammers and scammers die die die the better off the web will be.
Don’t make this mistake. This is not just an SEO issue. This is a user privacy issue. SEOs are simply the harbingers spotting Google’s hypocrisy around privacy.

The Data’s Still Around!

The second bit of PR messaging was to reassure publishers that plenty of search data can be found in another way, through the Google Webmaster Central service.
This is true. Google does provide search query data through this service, and it’s warmly welcomed by many site owners.
However, Google also provides search query data to its advertisers through the AdWords system. That’s the publisher equivalent to Google Webmaster Central.
Since advertisers can get data through AdWords, just as publishers can use Google Webmaster Central, why does Google still need to deliberately override how referrers would normally be blocked just for advertisers?
Google argued in its blog post that advertisers needed referrer data “to measure the effectiveness of their campaigns and to improve the ads and offers they present to you.” Outside of conversion tracking to the keyword level and retargeting, that doesn’t hold up, to me. I’ll get back to these.

Google Said Referrer Data Was Better

By the way, Google is on record as saying the data in Google Webmaster Central for publishers is not as good as referrer data.
This comes from an online exchange between Matt Cutts, the head of Google’s web spam team, who also acts as a liaison on many publisher issues, and Gabriel Weinberg, the founder of the tiny Duck Duck Go search engine, who was challenging Google over providing referrer information.
Weinberg wrote:
So now that we know what is going on, why allow this personal information to leak? As far as I can tell, the only reason is so Webmasters can do better at Google SEO. And that reason can be wholly mitigated through the use of Google’s Webmaster Tools.
Cutts responded (and I’ll bold the key part):
Google’s webmaster tools only provide a sampling of the data. We used to provide info for only 100 queries. Now we provide it for more queries, but it’s still a sample.
Please don’t make the argument that the data in our webmaster console is equivalent to the data that websites can currently find in their server logs, because that’s not the case.
In January of this year, data from Google Webmaster Central was deemed inferior to referrer data. In October, it’s repositioned as an acceptable alternative to blocking referrers.

Referrers Are Private!

Google’s third and most important method of countering backlash is to make out that referrer data is somehow so private that it can no longer be provided to publishers. If you read closely, however, you understand that Google never actually takes this position. Rather, it’s implied.
Google’s blog post on the change made no mention — none — that this move was done because referrers had private information that might leak out. It was only about protecting the search results themselves:
As search becomes an increasingly customized experience, we recognize the growing importance of protecting the personalized search results we deliver.
Remember those studies I mentioned? Those were all about search results, not about referrers.
Referrers only get mentioned in Google’s post as a heads-up to publishers that they’ll be lost, and not because they’re also private and need to be protected but rather — well, Google doesn’t explain why. The implication is that they just have to go.
As I’ve read stories in the broader press, I’ve seen the assumption that Google is blocking referrers because it considers them to be private. Heck, I came away from my initial interview with Google when the news broke thinking the same thing.
It’s no wonder. Because Google has deliberately broken security to pass referrers to advertisers but not publishers, it had to lump that qualification into the overall security story. It made referrer blocking seem like it was done to protect privacy, rather than the troublesome side effect it really was.

But You Didn’t Say They Were Private Before

To emphasize how not-private Google has viewed referrer data, consider two issues.
The first was in 2009, when Google made a change to its search results that broke referrers from being passed. Publishers were upset, and Google restored referrers.
Cutts, who keep in mind is one of the people Google had talk about this week’s encryption change, tweeted “yay” about the restoration. Clearly, he didn’t see any privacy issues being lost by it then. He was happy Google went out of its way to bring referrers back.
Think 2009 is too far back? OK, at the beginning of this year, Duck Duck Go, aside from buying a billboard to attack Google on privacy grounds, launched an illustrated guide to alleged Google privacy issues, including concerns over referrer data.
In reaction to that, Cutts pushed back on referrers being a problem:
Referrers are a part of the way the web has worked since before Google existed. They’re a browser-level feature more than something related to specific websites.
When he was further challenged on the issue by Duck Duck Go’s founder Weinberg, Cutts specifically did not include referrers in a list of things that seemed to be private:
That was the same day we announced SSL search, which prevents referrers to http sites….
The fact is that Google has a good history of supporting privacy, from fighting overly broad subpoenas from the DOJ to SSL Search to creating a browser plugin to opt out of personalized advertising.
On a personal note here, I like Matt Cutts. I’m not trying to single him out unfairly by citing this stuff. He’s just a Googler extremely close to the issue and knowledgeable about it, and even when he’s speaking in a semi-official manner, what he says reflects back on what’s true with Google.
Personally, I get the impression he might not agree with the referrer blocking for publishers but is going to put the best spin he can on a decision that his company made. Just my gut feel, and no special knowledge here. I could be wrong. Maybe I can get him to share more later.

Google Change Benefits Google

I think it’s fair to say that Google has not agreed with the view that referrers are private, nor has it clearly said referrers were blocked to protect privacy.
So why do it? One reason is that it makes Google more competitive. If someone lands on your web site, and you know the search term they used, you can then target them in various ways across the web with ads you believe reflect that search interest. All you need is the initial term.
This is called “retargeting,” and Google’s a leading provider of retargeted ads. When it cuts the referrers out, except for its own advertisers, Google makes it harder for its competitors to offer retargeting services. Search marketers already understand this. Wait until Google’s anti-trust enemies clue in. They’ll be swooping in on this one (and we’ll have more to say on it in the future).
Another benefit is that it prevents anyone but Google’s own advertisers from doing keyword-level conversion tracking. With search referrers, you can determine what someone who searched for a particular term later did on your site. What further pages did they go to? Did they purchase a product or service? Without the search terms, you can’t do this degree of analysis.
That is, of course, unless you buy an ad. Conversion tracking at the keyword level turns into another sales feature for Google.
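
Here’s a rough sketch of what keyword-level conversion tracking amounts to, and what’s lost when the terms are stripped. Each visit is assumed to be recorded as a pair of the search term (or `None` if it was withheld) and whether the visitor converted. The function name, the `(not provided)` bucket label and the sample visits are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Tuple


def conversion_rate_by_keyword(
    visits: Iterable[Tuple[Optional[str], bool]]
) -> Dict[str, float]:
    """Group visits by search term and return the conversion rate for each term."""
    totals: Dict[str, int] = defaultdict(int)
    conversions: Dict[str, int] = defaultdict(int)
    for keyword, converted in visits:
        # Visits with no search term all collapse into one opaque bucket.
        key = keyword if keyword else "(not provided)"
        totals[key] += 1
        if converted:
            conversions[key] += 1
    return {key: conversions[key] / totals[key] for key in totals}


if __name__ == "__main__":
    # Made-up visits: (search term or None, did the visitor buy something?)
    sample = [("blue widgets", True), ("blue widgets", False),
              (None, True), ("red widgets", False)]
    print(conversion_rate_by_keyword(sample))
```

Every visit that arrives without its terms falls into the same bucket, so a publisher can still see that those visitors converted but not what they were looking for.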

Didn’t Think Or Don’t Care?

I think the biggest reason Google hasn’t fixed the broken referrer problem is that it either just didn’t care about publishers or didn’t really think through the issues.
Either is bad. The latter has some weight. Consider the last time that Google broke referrers. Cutts explained that the impact just hadn’t been considered:
[Cutts] says the team didn’t think about the referrer aspect. So they stopped. They’ve paused it until they can find out how to keep the referrers.
Surely someone had to have thought about the impact this time? Someone decided that it was a good idea to keep passing referrer information to advertisers. Someone decided that, for whatever reason (and it wasn’t privacy), publishers couldn’t keep getting this information. But what that reason is remains unclear.

Why Not Get Everyone To Be Secure?

What I do know is that Google missed a huge opportunity to make the entire web much more secure. Google could have declared that it was shifting its default search for everyone, not just logged-in users, to be secure. Privacy advocates would have loved this even more than the current change which, using Google’s own figures, protects less than 10% of Google.com searchers.
Google could have also said that if anyone wanted to continue receiving referrer data, they needed to shift to running secure servers themselves. Remember, referrers pass from secure server to secure server.
Millions of sites quickly adopted Google +1 buttons in the hopes they might get more traffic from Google. Those same millions would have shifted — and quickly — over to secure servers in order to continue receiving referrer data.
Better protection across the web for everyone, while maintaining the unwritten contract between search engines and the publishers that support them to provide referrer data. That would have been a good solution. Instead, we got Google providing protection for a sliver of those searching, withholding data from the majority of sites that support it and solving problems only for its advertisers.

Moving Forward

I’m expecting to talk further to Google about these issues, which I raised with the company right after writing my initial story. I’m still waiting for them to find anyone appropriate higher up in the company to respond. Fingers crossed. The best I could get so far was this statement:
We’ve tried to strike a balance here — improving privacy for signed in users while also continuing to provide substantial query data to webmasters.
To conclude, I think the move to secure searching is great. I’d like to see more of it.
As for referrers, there are some who do believe that they are private. Chris Soghoian is a leading advocate about this, and I’d recommend anyone who wants to understand more to read the blog post he wrote about an FTC complaint he filed over the issue. Read the complaint, too. Also see Duck Duck Go’s DontTrack.us site.
In terms of Google blocking referrers, it already blocks tons of stuff it considers private from its search suggestions. Conceivably, it could use the same technology to filter search referrers, to help publishers and protect users.
But aside from that, if Google thinks this needs to be done for privacy reasons, then it needs to block referrers for everyone and not still allow them to work for advertisers. That move is one of the most disturbing, hypocritical things I’ve ever seen Google do. It also needs to take the further step and stop its own Chrome browser from passing them.
If blocking referrers isn’t a privacy issue, then Google needs to provide referrer data to all publishers, not just those who advertise.
