Google Research Blog
The latest news from Research at Google
Large-scale graph computing at Google
Monday, June 15, 2009
Posted by Grzegorz Czajkowski, Systems Infrastructure Team
If you squint the right way, you will notice that graphs are everywhere. For example, social networks, popularized by
, are graphs that describe relationships among people. Transportation routes create a graph of physical connections among geographical locations. Paths of disease outbreaks form a graph, as do games among soccer teams, computer network topologies, and citations among scientific papers. Perhaps the most pervasive graph is the web itself, where documents are vertices and links are edges. Mining the web has become an important branch of information technology, and at least one
major Internet company
has been founded upon this graph.
Despite differences in structure and origin, many graphs out there have two things in common: each of them keeps growing in size, and there is a seemingly endless number of facts and details people would like to know about each one. Take, for example, geographic locations. A relatively simple analysis of a standard map (a graph!) can provide the shortest route between two cities. But progressively more sophisticated analysis could be applied to richer information such as speed limits, expected traffic jams, roadworks and even weather conditions. In addition to the shortest route, measured as sheer distance, you could learn about the most scenic route, or the most fuel-efficient one, or the one which has the most rest areas. All these options, and more, can all be extracted from the graph and made useful — provided you have the right tools and inputs. The web graph is similar. The web contains billions of documents, and that number increases daily. To help you find what you need from that vast amount of information, Google extracts more than 200 signals from the web graph, ranging from the language of a webpage to the number and quality of other pages pointing to it.
In order to achieve that, we have created scalable infrastructure, named Pregel, to mine a wide range of graphs. In Pregel, programs are expressed as a sequence of iterations. In each iteration, a vertex can, independently of other vertices, receive messages sent to it in the previous iteration, send messages to other vertices, modify its own and its outgoing edges' states, and mutate the graph's topology (experts in parallel processing will recognize that the
Bulk Synchronous Parallel Model
Currently, Pregel scales to billions of vertices and edges, but this limit will keep expanding. Pregel's applicability is harder to quantify, but so far we haven't come across a type of graph or a practical graph computing problem which is not solvable with Pregel. It computes over large graphs much faster than alternatives, and the application programming interface is easy to use. Implementing
, for example, takes only about 15 lines of code. Developers of dozens of Pregel applications within Google have found that "thinking like a vertex," which is the essence of programming in Pregel, is intuitive.
We've been using Pregel internally for a while now, but we are beginning to share information about it outside of Google.
will be speaking at the joint industrial track between
this August on the very subject. In case you aren't able to join us there, here's a spoiler:
The seven bridges of Königsberg
— inspiration for
famous theorem that established the basics of graph theory — spanned the Pregel river.
Adaptive Data Analysis
Automatic Speech Recognition
Electronic Commerce and Algorithms
Google Science Fair
Google Voice Search
High Dynamic Range Imaging
Internet of Things
Natural Language Processing
Natural Language Understanding
Optical Character Recognition
Public Data Explorer
Security and Privacy
Site Reliability Engineering
Give us feedback in our
Official Google Blog
Public Policy Blog
Lat Long Blog
Ads Developer Blog
Android Developers Blog