Thursday, 25 February 2010

This stuff is tough

Yesterday's news that the European Commission has opened a preliminary inquiry into competition complaints from three companies has generated a lot of questions about how Google's ranking works. Here, Amit Singhal, a Google Fellow responsible for ranking, who has worked in search for almost 20 years, explains the principles behind our algorithm.

Pop quiz. Get ready. You're only going to have a few milliseconds to answer this question, so look sharp. Here goes: "know the way to San Jose?" Now display the answer on a screen that’s about 14 inches wide and 12 inches tall. Find the answer from among billions and billions of documents. Wait a second - is this for directions or are we talking about the song? Too late. Just find the answer and display it. Now on to the next question. Because you'll have to answer hundreds of millions each day to do well at this test. And in case you find yourself getting too good at it, don’t worry: at least 20% of those questions you get every day you’ll have never seen before. Sound hard? Welcome to the wild world of search at Google. More specifically, welcome to the world of ranking.

Google ranking is a collection of algorithms used to seek out relevant and useful results for a user's query. There's a ton that goes into building a state-of-the-art ranking system like ours. Our algorithms use hundreds of different signals to pick the top results for any given query. Signals are indicators of relevance, and they include items as simple as the words on a webpage or more complex calculations such as the authoritativeness of other sites linking to any given page. Those signals and our algorithms are in constant flux, and are constantly being improved. On average, we make one or two changes to them every day.

Lately, I’ve been reading about whether regulators should look into dictating how search engines like Google conduct their ranking. While the debate unfolds about government-regulated search, let me provide some general thinking behind our approach to ranking. Future ranking experts (inside or outside government) might find it helpful. Our philosophy has three main elements:

1. Algorithmically-generated results.
2. No query left behind.
3. Keep it simple.

After nearly two decades, I’ve lost count of how many times I've been asked why Google chooses to generate its search results algorithmically. Here's how we see it: the web is built by people. You are the ones creating pages and linking to pages. We are utilizing all this human contribution through our algorithms to order and rank our results. We think that's a much better solution than a hand-arranged one. Other search engines approach this differently -- selecting some results one at a time, manually curating what you see on the page. We believe that approach, which relies heavily on an individual's tastes and preferences, just doesn't produce the quality and relevance of ranking that our algorithms do. And given the hundreds of millions of queries we have to handle every day, it wouldn't be feasible to handle each by hand anyway.
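To make the idea of combining signals concrete, here is a toy sketch in Python. It is not Google's ranking algorithm: the two signals (query-word overlap and an inbound-link count), the weights, the `Page` type and the function names are all hypothetical, chosen only to illustrate how several signals might be folded into a single ordering without anyone arranging results by hand.

```python
from dataclasses import dataclass


@dataclass
class Page:
    url: str
    text: str
    inbound_links: int  # hypothetical stand-in for link-based authority


def term_match(query: str, page: Page) -> float:
    """Fraction of query words that also appear in the page text."""
    words = query.lower().split()
    if not words:
        return 0.0
    page_words = set(page.text.lower().split())
    return sum(w in page_words for w in words) / len(words)


def score(query: str, page: Page,
          w_text: float = 0.7, w_links: float = 0.3,
          max_links: int = 100) -> float:
    """Weighted sum of two made-up signals; the weights are assumptions."""
    authority = min(page.inbound_links, max_links) / max_links
    return w_text * term_match(query, page) + w_links * authority


def rank(query: str, pages: list[Page]) -> list[Page]:
    """Order pages by descending score for the given query."""
    return sorted(pages, key=lambda p: score(query, p), reverse=True)


pages = [
    Page("c.example", "cooking recipes", 100),
    Page("b.example", "san jose song lyrics", 90),
    Page("a.example", "way to san jose directions", 10),
]
for page in rank("way to san jose", pages):
    print(page.url)
```

Changing a weight or adding a signal changes the ordering for every query at once, which is the practical difference from editing one results page at a time.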

This brings me to the next point: leaving no query behind. Usually once I've explained to people the thinking behind algorithmically-generated results, some will ask me, "But what if you do a search, and the results you see are just plain lousy? Why wouldn't you just go in there by hand and change them?" The part of this question that's valid is in terms of lousy results. It happens. It happens all the time. Every day we get the right answers for people, and every day we get stumped. And we love getting stumped. Because more often than not, a broken query is just a symptom of a potential improvement to be made to our ranking algorithm. Improving the underlying algorithm not only improves that one query, it improves an entire class of queries, and often for all languages around the world in over 100 countries. I should add, however, that we do have clear written policies for websites that are included in our results, and we do take action on sites that are in violation of our policies or for a small number of other reasons (such as legal requirements, child porn, spam, viruses/malware, etc.). But those cases are quite different from the notion of rearranging the page you see one result at a time.

Finally, simplicity. This seems pretty obvious. Isn't it the desire of all system architects to keep their systems simple? We work very hard to keep our system simple without compromising on the quality of results. This is an ongoing effort, and a worthy one. Our commitment to simplicity has allowed us to innovate quickly, and it shows.

Ultimately, search is nowhere near a solved problem. Although I've been at this for almost two decades now, I'd still guess that search isn't quite out of its infancy yet. The science is probably just about at the point where we're crawling. Soon we'll walk. I hope that in my lifetime, I'll see search enter its adolescence.

In the meantime, we're working hard at our ongoing pop quizzes. Here's one last one: "search engine." In 0.14 seconds from among a few hundred million pages, our initial results are: AltaVista, Dogpile Web Search, Bing and Ask.com. I guess I'd better get back to work.

Posted by: Amit Singhal, Google Fellow

Update 2 March, 10:30am
First of all, let me thank everyone for their kind comments and honest views in this discussion. Gary, I love search. After having done search for almost 20 years, I still come into work every morning like a kid going to a candy store. Alongside my passion for search, one fact that keeps me so excited is that what was science fiction in search research twenty years ago is now coming to fruition at Google. The semantic systems we have built are something I didn't expect to build in my lifetime. Secondly, Google has given me an environment where researchers like me can practice search in its pure algorithmic form. I can't put into words how incredibly satisfying this combination is for a search geek like me :-)

Posted by: Amit Singhal, Google Fellow

Wednesday, 24 February 2010

Serious threat to web in Italy

Cross-posted from the Official Google Blog

In late 2006, students at a school in Turin, Italy filmed and then uploaded a video to Google Video that showed them bullying an autistic schoolmate. The video was totally reprehensible and we took it down within hours of being notified by the Italian police. We also worked with the local police to help identify the person responsible for uploading it and she was subsequently sentenced to 10 months community service by a court in Turin, as were several other classmates who were also involved. In these rare but unpleasant cases, that's where our involvement would normally end.

But in this instance, a public prosecutor in Milan decided to indict four Google employees: David Drummond, Arvind Desikan, Peter Fleischer and George Reyes (who left the company in 2008). The charges brought against them were criminal defamation and a failure to comply with the Italian privacy code. To be clear, none of the four Googlers charged had anything to do with this video. They did not appear in it, film it, upload it or review it. None of them know the people involved or were even aware of the video's existence until after it was removed.

Nevertheless, a judge in Milan today convicted 3 of the 4 defendants — David Drummond, Peter Fleischer and George Reyes — for failure to comply with the Italian privacy code. All 4 were found not guilty of criminal defamation. In essence this ruling means that employees of hosting platforms like Google Video are criminally responsible for content that users upload. We will appeal this astonishing decision because the Google employees on trial had nothing to do with the video in question. Throughout this long process, they have displayed admirable grace and fortitude. It is outrageous that they have been subjected to a trial at all.

But we are deeply troubled by this conviction for another equally important reason. It attacks the very principles of freedom on which the Internet is built. Common sense dictates that only the person who films and uploads a video to a hosting platform could take the steps necessary to protect the privacy and obtain the consent of the people they are filming. European Union law was drafted specifically to give hosting providers a safe harbor from liability so long as they remove illegal content once they are notified of its existence. The belief, rightly in our opinion, was that a notice and take down regime of this kind would help creativity flourish and support free speech while protecting personal privacy. If that principle is swept aside and sites like Blogger, YouTube and indeed every social network and any community bulletin board, are held responsible for vetting every single piece of content that is uploaded to them — every piece of text, every photo, every file, every video — then the Web as we know it will cease to exist, and many of the economic, social, political and technological benefits it brings could disappear.

These are important points of principle, which is why we and our employees will vigorously appeal this decision.

Posted by Matt Sucherman, VP and Deputy General Counsel - Europe, Middle East and Africa

Tuesday, 23 February 2010

Committed to competing fairly

As Google has grown, we've not surprisingly faced more questions about our role in the advertising ecosystem and our overall approach to competition. This kind of scrutiny goes with the territory when you are a large company. However, we've always worked hard to ensure that our success is earned the right way -- through technological innovation and great products, rather than by locking in our users or advertisers, or creating artificial barriers to entry.

The European Commission has notified us that it has received complaints from three companies: a UK price comparison site, Foundem, a French legal search engine called ejustice.fr, and Microsoft's Ciao! from Bing. While we will be providing feedback and additional information on these complaints, we are confident that our business operates in the interests of users and partners, as well as in line with European competition law.

Given that these complaints will generate interest in the media, we wanted to provide some background to them. First, search. Foundem - a member of an organisation called ICOMP which is funded partly by Microsoft - argues that our algorithms demote their site in our results because they are a vertical search engine and so a direct competitor to Google. ejustice.fr's complaint seems to echo these concerns.

We understand how important rankings can be to websites, especially commercial ones, because a higher ranking typically drives higher volumes of traffic. We are also the first to admit that our search is not perfect, but it's a very hard computer science problem to crack. Imagine having to rank the 272 million possible results for a popular query like "iPod" on a 14 by 12 inch computer screen in just a few milliseconds. It's a challenge we face millions of times each day.

Our algorithms aim to rank first what people are most likely to find useful and we have nothing against vertical search sites -- indeed many vertical search engines like Moneysupermarket.com, Opodo and Expedia typically rank high in Google's results. For more information on this issue check out our guidelines for webmasters and advertisers, and for an independent analysis of Foundem's ranking issues please read this report by Econsultancy.

Regarding Ciao!, they were a long-time AdSense partner of Google's, with whom we always had a good relationship. However, after Microsoft acquired Ciao! in 2008 (renaming it Ciao! from Bing) we started receiving complaints about our standard terms and conditions. They initially took their case to the German competition authority, but it now has been transferred to Brussels.

Though each case raises slightly different issues, the question they ultimately pose is whether Google is doing anything to choke off competition or hurt our users and partners. This is not the case. We always try to listen carefully if someone has a real concern and we work hard to put our users' interests first and to compete fair and square in the market. We believe our business practices reflect those commitments.

Monday, 15 February 2010

Working with European academics

Google grew out of an academic experiment and we continue to value a strong dialogue with universities around the globe. While we do significant in-house research and engineering, we also maintain strong relations with leading academic institutions world-wide pursuing research in areas of common interest.

And each year, faculty members across a broad range of computing disciplines visit Google to explore the latest research and technology results and discuss the challenges the community faces. We recently hosted the third Google EMEA Faculty Summit in our Zurich office, the largest of our engineering centres in the region. One hundred computer science academics from 62 leading universities throughout Europe, Africa and the Middle East joined over 80 Google engineers for three days of exciting dialogue. As our VP of Research Alfred Spector put it in his keynote: "We need to maintain strong relationships with the academic community, we can't be an island unto ourselves."

The event featured contributions from recipients of Google's support, including Dr. Andy Hopper of Cambridge University, whose research group recently received a Focused Research Award for their work on 'Computing for the Future of the Planet', and Dr. Frank Stajano, also of Cambridge and the newest addition to Google's Visiting Faculty Program. Attendees also heard about Google Transit – which started out as a "20% time" project and achieved fruition thanks to Faculty member Hannah Bast, who, prior to her current position at the University of Freiburg, spent a year on sabbatical working with the Google Zurich team as part of our Visiting Faculty programme.

However, the real aim of the event is to provide Google employees and academics maximum opportunities for networking, discussion and collaboration. Attendees participated in day-long 'stream' discussions on themes ranging from Privacy and Security – with the participation of leading researchers such as Professor Ross Anderson – to Natural Language Technologies, featuring NLP expert Fred Jelinek. The groups also looked at mobile applications and, more generally, the current challenges that our search and advertising engineers are working on.

But our relations with universities are of course not simply an annual conference. Our University Relations initiatives support university research, technological innovation and the teaching and learning experience through a variety of programs. We offer awards through the Faculty Research Awards program and fund specific research in areas of study that are of key interest to Google as well as the research community, through our Google Focused Research Awards program. Through the Google Visiting Faculty Program, faculty are invited for 6-12 month periods to join Google research teams on projects of mutual interest. We covered some of our other initiatives in an earlier post.

We will be posting more about our work in this area over the coming months, and we are certainly interested in expanding our collaborations in EMEA.

Post by Vicky Greaves, University Programmes Specialist

Friday, 12 February 2010

Tech talk: the future of browsers


A browser is just that: a browser. Nothing special. Right?
Ever thought about how much time you actually spend working 'inside' your browser? We search, chat, email and collaborate in a browser. And like most of you, in our spare time, we shop, bank, read news and keep in touch with friends - all using a browser. Hell, you probably spend more time inside your browser than inside your car!

Since Google engineers spend so much time online, they began seriously thinking about what kind of browser could exist if you started from scratch and built on the best elements out there. They realized that what was needed was not just a browser, but also a modern platform for web pages and applications: and that is what Google has set out to build. The result: Google Chrome.

On the surface, we designed a browser window that is streamlined and simple. Like the classic Google homepage, Google Chrome is clean and fast. It gets out of your way and gets you where you want to go.

Under the hood, Google engineers were able to build the foundation of a browser that runs today's complex web applications much better. By keeping each tab in an isolated "sandbox", we were able to prevent one tab from crashing another and provide improved protection from rogue sites. We improved speed and responsiveness across the board. And we keep on adding stunning features to Google Chrome.

Want to have a peek under the hood? Google invites you to a Tech Talk by Chrome product manager Anders Sandholm, and you'll understand why a browser is so much more than a window on the Internet. Spend your lunchtime with us and you will be better informed when, in a couple of weeks, you are asked to make a choice about which browser you want to spend your time in.

If you want to attend, please register here. While this event is primarily aimed at policy makers from EU institutions, we'll be happy to welcome a wider audience if we have enough chairs.

When: Thursday February 25, 12:15 - 13:45 hours CET (Sandwich lunch provided).
Where: Google Brussels - Chaussée D'Etterbeek 180 - Steenweg op Etterbeek 180, 2nd floor, 1040 Brussels

About our Tech Talks: Ever wondered how exactly Google is tackling the big technology problems that the online world faces? Want to take a look behind the curtain of our engineering operations and learn from the people who actually work on the Google products and services day-in, day-out? Here's your chance: the Google Brussels TechTalks.

Posted by Alain Van Gaever, Telecom Policy Manager and Matthias Graf, European Head of Engineering Communications

Wednesday, 10 February 2010

Think big with a gig

Google today made an exciting announcement of its intention to build and test ultra-high speed broadband networks in the United States. While we have no similar plans for Europe, we think this is the type of action needed to make the Internet faster and better. The European Commission has endorsed a new digital strategy which puts at its center the construction of faster broadband for all Europeans. Significant private and public sector investments will be required to meet this ambitious goal. We'll be returning to this theme in coming months.

Imagine sitting in a rural health clinic, streaming three-dimensional medical imaging over the web and discussing a unique condition with a specialist in Paris or London. Or procuring a high-definition, full-length feature film in less than five minutes. Or collaborating with classmates around the world while watching live 3-D video of a university lecture. Universal, ultra high-speed Internet access will make all this and more possible. We've urged the FCC to look at new and creative ways to get there in its National Broadband Plan – and today we're announcing an experiment of our own.

Google is planning to build and test ultra-high speed broadband networks in one or more trial locations across the United States.

In our U.S. experiment, we'll deliver Internet speeds more than 100 times faster than what most Europeans have access to today with 1 gigabit per second, fiber-to-the-home connections. We'll offer service at a competitive price to at least 50,000 and potentially up to 500,000 people.
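A rough back-of-the-envelope calculation shows what a hundredfold speed-up means in practice. The sketch below is illustrative only: the 25 GB film size is a hypothetical figure, the 10 Mbit/s baseline is simply 1 gigabit per second divided by 100, and the `transfer_seconds` helper ignores protocol overhead and network congestion.

```python
def transfer_seconds(size_gb: float, rate_mbps: float) -> float:
    """Ideal transfer time for size_gb (decimal gigabytes) at rate_mbps (megabits/s)."""
    megabits = size_gb * 8 * 1000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / rate_mbps


FILM_GB = 25          # hypothetical size of a high-definition feature film
GIGABIT_MBPS = 1000   # 1 gigabit per second, as in the trial
BASELINE_MBPS = 10    # assumed baseline: one hundredth of the trial speed

fast = transfer_seconds(FILM_GB, GIGABIT_MBPS)
slow = transfer_seconds(FILM_GB, BASELINE_MBPS)
print(f"At 1 Gbit/s:  {fast / 60:.1f} minutes")
print(f"At 10 Mbit/s: {slow / 3600:.1f} hours")
```

Under these assumptions the same film takes a few minutes on the trial network and several hours on the baseline connection.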

Our goal is to experiment with new ways to help make Internet access better and faster for everyone. Here are some specific things that we have in mind:

  • Next generation apps: We want to see what developers and users can do with ultra high-speeds, whether it's creating new bandwidth-intensive "killer apps" and services, or other uses we can't yet imagine.

  • New deployment techniques: We'll test new ways to build fiber networks, and to help inform and support deployments elsewhere, we'll share what we learn with the world.

  • Openness and choice: We'll operate an "open access" network, giving users the choice of multiple service providers. And consistent with our past advocacy, we'll manage our network in an open, non-discriminatory, and transparent way.

Like our WiFi network in Mountain View, the purpose of this project is to experiment and learn. Many network providers are making real progress to expand and improve high-speed Internet access, but there's still more to be done. We don't think we have all the answers – but through our trial, we hope to make a meaningful contribution to the shared goal of delivering faster and better Internet for everyone.

As a first step, today we're putting out a request for information (RFI) to help identify interested communities. We welcome responses from local government, as well as members of the public. If you'd like to respond, click here to learn more, or check out our video:

We'll collect responses until March 26th, and will announce our target communities later this year. Stay tuned.

Posted by Minnie Ingersoll and James Kelly, Product Managers

Monday, 1 February 2010

An engineer's perspective on privacy

Ever wondered what data Google's search engine collects and why we retain search logs for certain periods of time? Here's a hint: it's not to personalise advertising as many people wrongly assume.

Our first ever Brussels Tech Talk last week was about this and other questions on online privacy, given that it was Data Protection Day. Dr Alma Whitten, Google's engineering lead for privacy, addressed a full room of policy makers and other interested stakeholders. Alma demonstrated how we harness the power of data to "learn from the good guys, fight the bad guys, and invent the future." You can watch the video of the talk, and follow along with her presentation below:

While the technology is complicated, the explanation is simple: log data enable our engineers to refine algorithms for the benefit of all search users. If people click on the top results for a given query, it signals that we are doing something right. If people are hitting 'next page' or typing in another query, we learn something is wrong. Every time you search the web, you benefit from what Google has learned from millions of previous searches. However, far from being a solved problem, search science is still in its infancy. By launching hundreds of innovations in search during the last year alone, we're constantly trying to improve search so that, among more than a trillion unique URLs, you'll hopefully find the website that contains the answer you were looking for among the first results.
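As a simplified illustration of how such log signals could be summarised, the Python sketch below computes, for each query, the share of searches that ended in a click on a top result. The log format, the action names and the `satisfaction_by_query` function are hypothetical and are not a description of Google's actual logs or systems.

```python
from collections import defaultdict

# Hypothetical action labels for a single search
TOP_CLICK = "top_click"      # user clicked one of the top results
NEXT_PAGE = "next_page"      # user asked for the next page of results
REFORMULATE = "reformulate"  # user typed another query instead


def satisfaction_by_query(log):
    """Return {query: fraction of searches that ended in a top-result click}."""
    counts = defaultdict(lambda: [0, 0])  # query -> [top clicks, total searches]
    for query, action in log:
        counts[query][1] += 1
        if action == TOP_CLICK:
            counts[query][0] += 1
    return {q: good / total for q, (good, total) in counts.items()}


sample_log = [
    ("ipod", TOP_CLICK),
    ("ipod", TOP_CLICK),
    ("ipod", NEXT_PAGE),
    ("way to san jose", REFORMULATE),
]
for query, share in satisfaction_by_query(sample_log).items():
    print(f"{query}: {share:.2f}")
```

A query with a low share would be a candidate for closer inspection, in the spirit of learning that "something is wrong" when people page onwards or rephrase.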

We aim to always balance innovative product development with a serious respect for users' privacy. For us, this process starts with providing transparency and allowing users control. Alma explained the ways we're working to provide our users with more transparency and choice: things like the Ads Preferences Manager, Google Dashboard, and Data Liberation Front. And she referred to the challenges engineers face to achieve transparency and control with respect to different categories of data such as logged-in vs. unauthenticated data.

In the coming months, the Brussels office will be hosting more TechTalks, where other Google engineers will share what they're working on, how they approach solving some exciting challenges, and the opportunities they see coming up. We'll announce the talks on this blog. Stay tuned.

Posted by Sebastian Müller, European Policy Manager

P.S. The video's sound quality could be better - we're arranging for superior recording equipment for the next Brussels TechTalk.