What Google Is

Posted: July 15th, 2012 | Filed under: TechCrunch

Editor’s note: Benjy Weinberger is the engineering site lead for foursquare’s San Francisco office. He previously worked on infrastructure and revenue engineering at Twitter, and before that on search and ad engineering at Google for eight years.

No, really, what is Google? TechCrunch co-editor Alexia Tsotsis recently posted an interesting piece about Google’s focus, or rather the perceived lack of it. Google has its fingers in so many pies that there are quite a few angles from which to consider the above question.

The title of Alexia’s post says it all: “Remember When Google Was a Search Engine?” For consumers, Google is, or at least used to be, a search company. On the other hand, for investors, and cynics, Google is an ad network. That is, after all, where the money comes from.

But, as a former Googler and unabashed fan of the company (take this as both full disclosure and a disclaimer), I have a different perspective. For me, Google is, and always has been, a systems company.

Systems First

Most startups begin by focusing on the product: user experience, design, features, marketing and so on. These companies rely primarily on hosted or off-the-shelf systems infrastructure, and focus their engineering resources on the front-end elements, the things that make their company unique.

But some of these startups enjoy massive growth, and their traffic increases to the point where they can no longer scale with general-purpose systems. This is an important inflection point in a company’s life: you either hire a bunch of engineers with systems experience to develop the custom technology you need to scale, or you sell the company and let someone else worry about it.

Google, however, had a very different technology trajectory. It did systems first. This isn’t really that surprising: the front-end user experience in a search engine, at least back in 1998, was dirt simple, an HTML form with a single input box and a ‘Search’ button.

The tricky parts of search were crawling the web, indexing the content and retrieving relevant results very quickly. These problems required an ability to run complex computations in parallel on large numbers of computers, while being resilient to failure of any one of them. In other words, web search is fundamentally a distributed systems problem, as well as, more obviously, an Information Retrieval (IR) problem.
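To make the indexing and retrieval steps concrete, here is a minimal sketch in Python of an inverted index built shard by shard and then merged, loosely in the spirit of splitting work across machines. It is an illustration only, not Google's implementation: the function names, the sample documents and the "every query term must match" rule are all assumptions, and it ignores ranking, crawling and failure handling entirely.

```python
from collections import defaultdict

def index_shard(docs):
    """Build an inverted index (term -> set of doc ids) for one shard of documents."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

def merge_indexes(shard_indexes):
    """Combine per-shard indexes into a single index."""
    merged = defaultdict(set)
    for index in shard_indexes:
        for term, doc_ids in index.items():
            merged[term] |= doc_ids
    return merged

def search(index, query):
    """Return the ids of documents that contain every term in the query."""
    terms = query.lower().split()
    if not terms:
        return set()
    postings = [index.get(term, set()) for term in terms]
    return set.intersection(*postings)

# Hypothetical documents, split into two shards as if held on two machines.
SHARDS = [
    {"d1": "distributed systems at scale", "d2": "web search engine"},
    {"d3": "distributed web crawler"},
]

index = merge_indexes(index_shard(shard) for shard in SHARDS)
matches = search(index, "distributed web")
```

Each shard can be indexed independently, which is what makes the work easy to parallelize. In a real system a shard that fails would simply be re-run elsewhere.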

As a result, Google focused on systems from day one. It hired the best and the brightest, such as the now-renowned Jeff Dean and Sanjay Ghemawat, legendary Bell Labs pioneers Rob Pike and Ken Thompson, and many other incredibly talented systems engineers, both famous and anonymous (note: I don’t count myself in that number. I was just lucky to get to work with these folks).

The outcome was that distributed systems are a core part of Google’s DNA, even more so than search.

The Google Iceberg

Once Google had its formidable systems in place, many applications suggested themselves, applications that in some cases only Google was able to build. Most of what consumers see of Google, from search to Gmail to ads to Google Docs to book scanning to YouTube, is the one-tenth of the iceberg that sticks out of the water.

What connects these seemingly disparate products is the submerged nine-tenths: Google’s planet-scale distributed systems. Even seemingly left-field projects, such as the self-driving car, benefit from Google’s unrivaled data-crunching ability.

There are other companies with world-class systems proficiency, such as Amazon, Yahoo! and Microsoft. But Google casts an unusually long shadow over the rest of Silicon Valley. The bulk of the technologies that power so many startups out there, from distributed filesystems to MapReduce to NoSQL databases, were primarily invented at Google. And the company has served as such a wellspring of talent for startups that its technical influence has spread wide, despite the company being a meager contributor to the open-source world (*).

Trimming from the Middle

Of course not everything Google does is driven by a technology-first attitude. Android and Google+, for example, address strategic threats to Google’s core business, and Google obviously has to pursue them. But the technology behind even the less successful of these is first-rate.

While Google’s product karma is hit-and-miss, the company’s systems prowess gives both management and employees confidence that they can solve hard problems no one else can tackle, including moonshot problems such as augmented reality glasses and self-driving cars. Whether Google should be tackling these problems is a matter of opinion, but doing so is endemic to the company.

Between these two extremes, however, are the middle-ground projects, and it’s these, neither strategic nor epic, that Larry Page is trying to pare down as CEO. If Google doesn’t need it, and Google isn’t uniquely positioned to do it, then why do it?

What binds all the different Google efforts together, then, is not an overarching plan, but an underlying technology platform. This may not form a coherent vision, but great things will continue to come from it. As well as no small number of duds.

(*) Note: Huge credit goes to Yahoo!, Facebook, Twitter and other companies for creating open-source versions of these technologies, both for their own use and for the benefit of the community at large. Google publishes many papers on these technologies, but keeps its own implementations proprietary (its technology stack is too tightly integrated to open-source just parts of it), requiring the open-source community to re-implement them from the publications.




Ad Targeting Is Hard

Posted: July 14th, 2012 | Filed under: TechCrunch

Editor’s note: Benjy Weinberger is the engineering site lead for foursquare’s San Francisco office. He previously worked on infrastructure and revenue engineering at Twitter, and before that on search and ad engineering at Google for eight years.

Microsoft recently announced that it’s taking a huge $6.2 billion writedown over the failed aQuantive acquisition. This news, and the scrutiny of Facebook’s business model following its IPO drama, show that, in online advertising, it’s all about the targeting.

As this Reuters analysis explains, there’s so much online advertising space that merely putting billboards up all over the internet is no longer a lucrative business. Meanwhile, Google AdWords remains phenomenally successful, generating over $36B in revenue in 2011. The key difference? Targeting. Google’s sophisticated ad-targeting algorithms greatly increase the relevance of ads to the user, and therefore the likelihood of the user clicking on an ad. This is what makes AdWords so much more effective than banner ads.

So why isn’t everyone just improving their targeting? Unfortunately, it’s not that simple. Ad targeting is a difficult artificial intelligence (AI) problem, and while you may not agree that it’s a worthy one, it does require a lot of technical heavy lifting. Here’s why:

The Algorithm

A targeting algorithm takes everything you know about the impression – search keywords, location, demographics, previous user activity, time of day, the previous CTR (clickthrough rate) of the ad and so on – and uses that to choose, from among millions of candidate ads, the one to show. And it has to do this in a fraction of a second. This is not a trivial problem. Can you think, offhand, how you’d do it? If so, I’d like to talk to you about a Data Scientist role at foursquare…
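As a rough illustration of the shape of the problem, here is a toy sketch in Python that scores each candidate ad by keyword overlap with the search query, weighted by the ad's historical CTR, and picks the highest scorer. Everything in it is an assumption made for illustration: the `Ad` fields, the scoring formula and the sample ads are hypothetical, and a linear scan over three ads says nothing about how a production system handles millions of candidates in a fraction of a second.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Ad:
    ad_id: str
    keywords: frozenset
    historical_ctr: float  # past clickthrough rate, between 0 and 1

def score(ad, query_terms):
    """Toy relevance score: share of query terms the ad matches, times its past CTR."""
    if not query_terms:
        return 0.0
    overlap = len(ad.keywords & query_terms) / len(query_terms)
    return overlap * ad.historical_ctr

def choose_ad(ads, query):
    """Return the highest-scoring ad for the query, or None if there are no ads."""
    query_terms = frozenset(query.lower().split())
    return max(ads, key=lambda ad: score(ad, query_terms), default=None)

# Hypothetical candidate ads.
ADS = [
    Ad("running-shoes", frozenset({"running", "shoes"}), 0.05),
    Ad("pizza-delivery", frozenset({"pizza", "delivery"}), 0.08),
    Ad("hiking-boots", frozenset({"hiking", "boots", "shoes"}), 0.03),
]

best = choose_ad(ADS, "cheap running shoes")
```

Note that the pizza ad has the best historical CTR but loses here because it matches none of the query terms. A real targeting system would fold in many more signals, such as location, time of day and user history, and would learn the weights rather than hard-code them.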

Ad targeting is a relevance problem somewhat similar to web search: given a huge repository of information, and whatever we know about what the user is looking for, find the most relevant information and return it. While the algorithms are not the same (indeed, Google has two entirely separate divisions, one for each problem, for both technical and ethical reasons), the difficulty is similar.

Basically, to even begin to tackle ad targeting, you need top-notch data scientists with PhDs in Machine Learning, Information Retrieval or other AI fields. If you’ve spent any time at all at a startup you know how hard it is to hire these people.

The Data

Even once you have an algorithm, it’s not much use without data. The more you know about the user, the more precise your algorithm can be. This is not just for the obvious reason that you need something to target by, but also because you need to train your algorithm. Machine Learning algorithms are so called because they adapt through an iterative process: you feed them a set of training data, along with the expected results, and they slowly increase their precision, in a manner analogous to human learning.
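The iterative process can be sketched with a very small click-prediction model. The Python below uses logistic regression trained by stochastic gradient descent, which is one common technique and not necessarily what any particular ad system uses. The two features (a constant bias and a keyword-match flag), the five training examples and the learning rate are all made up for illustration.

```python
import math

def predict(weights, features):
    """Predicted click probability: logistic function of the weighted feature sum."""
    z = sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))

def log_loss(weights, examples):
    """Average log loss over the examples; lower means better predictions."""
    total = 0.0
    for features, clicked in examples:
        p = min(max(predict(weights, features), 1e-12), 1 - 1e-12)
        total -= clicked * math.log(p) + (1 - clicked) * math.log(1 - p)
    return total / len(examples)

def train(examples, passes=500, learning_rate=0.1):
    """Repeatedly nudge the weights to shrink the error on each training example."""
    weights = [0.0] * len(examples[0][0])
    for _ in range(passes):
        for features, clicked in examples:
            error = predict(weights, features) - clicked
            weights = [w - learning_rate * error * x
                       for w, x in zip(weights, features)]
    return weights

# Hypothetical training data: ([bias, keyword_match], clicked).
EXAMPLES = [
    ([1.0, 1.0], 1),
    ([1.0, 1.0], 1),
    ([1.0, 1.0], 0),
    ([1.0, 0.0], 0),
    ([1.0, 0.0], 0),
]

untrained_loss = log_loss([0.0, 0.0], EXAMPLES)
weights = train(EXAMPLES)
trained_loss = log_loss(weights, EXAMPLES)
```

Comparing `trained_loss` with `untrained_loss` shows the model improving as it sees the data repeatedly. With only five examples the result means little, which is the point of the paragraph above: the technique needs a lot of data to be useful.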

The kind of data you can gather depends largely on the consumer service you provide: Google knows a lot about your current intent, via your search keywords. Facebook knows a lot about your context, via your social activity. So far, intent appears to be more valuable than context when it comes to ad targeting. But the holy grail is to have both, which partly explains Google+.

For precise targeting you need a lot of data, particularly about current intent, and this is hard to come by for any but the most successful services.

The Systems

Assuming you have the algorithms and the data, you still have the problem of how to apply them efficiently. You can’t let your user wait around while you laboriously figure out which ads to show. Ad systems are typically expected to return a result within a few hundred milliseconds. It takes very large, very complex distributed systems to pull this off. Google’s SmartASS system, for example, is one of the best-engineered systems I’ve ever encountered. Systems of this sophistication are hard to build.

The Virtuous Trinity

All too often, online advertising is a zero-sum game. The more intrusive display ads are to the user, the more benefit the advertiser perceives. And in a CPI (cost-per-impression) paradigm, the ad publisher is firmly on the side of the advertiser.

But with strong targeting and CPC (cost-per-click) billing, a virtuous trinity emerges: the more relevant an ad is, the happier the user is, and the more likely they are to click on the ad. This gets the advertiser more engagement, and the ad publisher (who gets paid per click) more money. All three participants are incentivized by better targeting, and all share in the created value. Creating this virtuous trinity, rather than spamming the web with banner ads, is how to truly succeed in online advertising.
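The publisher's side of this incentive comes down to simple arithmetic: under CPC billing, expected revenue per impression is the clickthrough rate times the price per click. The short Python sketch below works that through with made-up numbers. The CTRs and the $0.50 click price are purely hypothetical and are not figures from any real ad network.

```python
def revenue_per_thousand_impressions(ctr, cost_per_click):
    """Expected publisher revenue from 1,000 impressions under CPC billing."""
    return 1000 * ctr * cost_per_click

# Hypothetical numbers: same price per click, different targeting quality.
untargeted = revenue_per_thousand_impressions(ctr=0.002, cost_per_click=0.50)
targeted = revenue_per_thousand_impressions(ctr=0.02, cost_per_click=0.50)
uplift = targeted / untargeted
```

With the click price held fixed, a tenfold improvement in CTR means ten times the revenue from the same ad space, while the advertiser still pays only for users who actually clicked.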

To get ad targeting right you need a combination of cutting-edge algorithms, sophisticated systems and mountains of relevant data. Putting all these in place requires a world-class engineering team, the right product, and a lot of users. Not many companies have all these assets.

Microsoft does, though, which makes this recent news somewhat perplexing. I guess that just having the ability to do something isn’t enough. Whether you actually get out there and do it or not is the $6.2B question.