Information Blog: 2013


Conference Report USENIX Annual Technical Conference ATC 2013

Posted by gilogo at 11:15 AM Labels: 2013, annual, atc, computer, conference, report, technical, usenix

Posted by Murray Stokely, Google Storage Analytics Team

This year marks Google’s eleventh consecutive year as a sponsor of the USENIX Annual Technical Conference (ATC), just one of the co-located events at USENIX Federated Conference Week (FCW), which combines numerous conferences and workshops covering fields such as Autonomic Computing, Feedback Computing and much more in an intensive week of research, trends, and community interaction.

ATC provides a broad forum for computing systems research with an emphasis on implementations and experimental results. In addition to the Googlers presenting publications, we had two members on the program committee of ATC and several keynote speakers, invited speakers, panelists, committee members, and participants at the other co-located events at FCW.

In the paper Janus: Optimal Flash Provisioning for Cloud Storage Workloads, Googler Christoph Albrecht and co-authors demonstrated a system that allows users to make informed flash memory provisioning and partitioning decisions in cloud-scale distributed file systems that include both flash storage and disk tiers. As flash memory is still expensive, it is best to use it only for workloads that can make good use of it. Janus creates long-term workload characterizations based on RPC samples and file age metadata. It uses these workload characterizations to formulate and solve an optimization problem that maximizes the reads sent to the flash tier. Based on evaluations from workloads using Janus, in use at Google for the past 6 months, the authors conclude that the recommendation system is quite effective, with flash hit rates using the optimized recommendations 47-76% higher than the option of using the flash as an unpartitioned tier.

In packetdrill: Scriptable Network Stack Testing, from Sockets to Packets, Google’s Neal Cardwell and co-authors showcased a portable, open-source scripting tool that enables testing the correctness and performance of network protocols. Despite their importance in modern computer systems, network protocols often undergo only ad hoc testing before their deployment, in large part due to their complexity. Furthermore, new algorithms have unforeseen interactions with other features, so testing has only become more daunting as TCP has evolved. The packetdrill tool was instrumental in the development of three new features for Linux TCP (Early Retransmit, Fast Open, and Loss Probes) and allowed the authors to find and fix 10 bugs in Linux. Furthermore, the team uses packetdrill in all phases of the development process for the kernel used in one of the world’s largest Linux installations. In the hope that sharing packetdrill with the community will make the process of improving Internet protocols an easier one, the source code and test scripts for packetdrill have been made freely available.

There were also refereed publications with Google co-authors at some of the other co-located events at FCW, notably NicPic: Scalable and Accurate End-Host Rate Limiting, which outlines a system that enables accurate network traffic scheduling in a scalable fashion, and AGILE: Elastic Distributed Resource Scaling for Infrastructure-as-a-Service, a system that efficiently handles dynamic application workloads, reducing both penalties and user dissatisfaction.

Google is proud to support the academic community through conference participation and sponsorship. In particular, we are happy to mention one of the other interesting papers from this year’s USENIX FCW, co-authored by former Google PhD fellowship recipient Ashok Anand, MiG: Efficient Migration of Desktop VM Using Semantic Compression.

USENIX is a supporter of open access, so the papers and videos from the talks are available on the conference website.

2013 Google PhD Fellowships 5 Years of Supporting the Future of Computer Science

Posted by gilogo at 8:19 PM Labels: 2013, 5, computer, fellowships, future, google, of, phd, science, supporting, the, years

Posted by Michael Rennaker, Google University Relations

We are extremely excited to announce the 2013 Global Google PhD Fellows. From all around the globe, these 39 PhD students represent the fifth class in the program’s history, a select group recognized by Google researchers and their institutions as some of the most promising young academics in the world. As we welcome the newest class of PhD Fellows, we take a look back at the program’s roots and hear from two past recipients.

In 2009, Google launched its PhD Fellowship Program, created to recognize and support outstanding graduate students pursuing work in computer science, related disciplines or promising research areas. In its inaugural year, 13 United States PhD students were awarded fellowships, drawn from an extremely competitive pool of applicants. The global program now covers Europe, China, India and Australia and continues to draw some of the best young researchers, reflecting Google’s commitment to building strong relations with the academic community.

Among those first recipients of the fellowship award are 2009 PhD Fellow Roxana Geambasu, Visiting Professor in the Computer Science Department at Columbia University, and 2010 European Doctoral Fellow Roland Angst, Visiting Assistant Professor at Stanford University and affiliated with the Max Planck Center for Visual Computing and Communication. As early recipients of the award, Roxana and Roland reflect on the impact that the Google Fellowship program had on their careers.

For Roxana, the fellowship provided the tools and connections that helped lay the foundation for her academic career. She believes industrial fellowship programs are very important, as they give students an opportunity to interact more closely with industry.

“Beyond the financial support, I think that the fellowship impacted my career in many important ways. First, the Google fellowships are regarded as highly competitive, so receiving the award was probably a big plus on my resume when I was interviewing for faculty positions.”

“Second, the award yielded a mentor within Google, Brad Chen, with whom I’ve kept in touch ever since, as well as opportunities to visit the campus, deliver talks and meet Google engineers. Brad and I continue to meet at conferences and discuss my work, his work and (of late) the work of my students; it’s through that relationship I’m exposed to new people from Google and gain valuable advice about faculty award opportunities.”

Roland Angst credits the award with the ability to lighten his teaching load and instead focus on his research, which ultimately prepared him for his future academic career. Like Roxana, Roland states that the fellowship also gave him the opportunity to establish connections with people working on related topics in industry.

“In my view, programs such as the Google Fellowship Awards represent an important and integral link between industry and universities. Firstly, such programs increase the awareness in the academic world for relevant problems in industry. Secondly, these programs allow the IT industry to express their gratitude to the educational services provided by the universities on which the IT industry heavily relies on.”

We welcome the latest recipients of the Global Google PhD Fellowships for 2013 with great excitement and high expectations. We are very happy to support these excellent PhD students, recognized for their incredible innovation, creativity and leadership, and we offer them our sincere congratulations.

Googler Moti Yung elected as 2013 ACM Fellow

Posted by gilogo at 1:18 AM Labels: 2013, acm, as, computer, elected, fellow, googler, moti, yung

Posted by Alfred Spector, VP of Engineering

Yesterday, the Association for Computing Machinery (ACM) released the list of those who have been elected ACM Fellows in 2013. I am excited to announce that Google Research Scientist Moti Yung is among the distinguished individuals receiving this honor.

Moti was chosen for his contributions to computer science and cryptography that have provided fundamental knowledge to the field of computing security. We are proud of the breadth and depth of his contributions, and believe they serve as motivation for computer scientists worldwide.

On behalf of Google, I congratulate our colleague, who joins the 17 ACM Fellows and other professional society awardees at Google in exemplifying our extraordinarily talented people. You can read a more detailed summary of Moti’s accomplishments below, including the official citation from ACM.

Dr. Moti Yung: Research Scientist
For contributions to cryptography and its use in security and privacy of systems

Moti has made key contributions to several areas of cryptography including (but not limited to!) secure group communication, digital signatures, traitor tracing, threshold cryptosystems and zero knowledge proofs. Moti’s work often seeds a new area in theoretical cryptography as well as finding applications broadly. For example, in 1992, Moti co-developed a protocol by which users can jointly compute a group key using their own private information that is secure against coalitions of rogue users. This work led to the growth of the broadcast encryption research area and has applications to pay-TV, network communication and sensor networks.
Moti is also a long-time leader of the security and privacy research communities, having mentored many of the leading researchers in the field, and serving on numerous program committees. A prolific author, Moti routinely publishes 10+ papers a year, and has been a key contributor to principled and consistent anonymization practices and data protection at Google.

Influential Papers for 2013

Posted by gilogo at 9:04 AM Labels: 2013, computer, for, influential, papers

Posted by Corinna Cortes and Alfred Spector, Google Research

Googlers across the company actively engage with the scientific community by publishing technical papers, contributing open-source packages, working on standards, introducing new APIs and tools, giving talks and presentations, participating in ongoing technical debates, and much more. Our publications offer technical and algorithmic advances, feature aspects we learn as we develop novel products and services, and shed light on some of the technical challenges we face at Google. Below are some of the especially influential papers co-authored by Googlers in 2013. In the coming weeks we will be offering a more in-depth look at some of these publications.

Algorithms

Online Matching and Ad Allocation, by Aranyak Mehta [Foundations and Trends in Theoretical Computer Science]
Matching is a classic problem with a rich history and a significant impact, both on the theory of algorithms and in practice. There has recently been a surge of interest in the online version of the matching problem, due to its application in the domain of Internet advertising. The theory of online matching and allocation has played a critical role in the design of algorithms for ad allocation. This monograph provides a survey of the key problems and algorithmic techniques in this area, and provides a glimpse into their practical impact.

Computer Vision

Fast, Accurate Detection of 100,000 Object Classes on a Single Machine, by Thomas Dean, Mark Ruzon, Mark Segal, Jonathon Shlens, Sudheendra Vijayanarasimhan, Jay Yagnik [Proceedings of IEEE Conference on Computer Vision and Pattern Recognition]
In this paper, we show how to use hash table lookups to replace the dot products in a convolutional filter bank with the number of lookups independent of the number of filters. We apply the technique to evaluate 100,000 deformable-part models requiring over a million (part) filters on multiple scales of a target image in less than 20 seconds using a single multi-core processor with 20GB of RAM.

Distributed Systems

Photon: Fault-tolerant and Scalable Joining of Continuous Data Streams, by Rajagopal Ananthanarayanan, Venkatesh Basker, Sumit Das, Ashish Gupta, Haifeng Jiang, Tianhao Qiu, Alexey Reznichenko, Deomid Ryabkov, Manpreet Singh, Shivakumar Venkataraman [SIGMOD]
In this paper, we talk about Photon, a geographically distributed system for joining multiple continuously flowing streams of data in real-time with high scalability and low latency. The streams may be unordered or delayed. Photon fully tolerates infrastructure degradation and datacenter-level outages without any manual intervention while joining every event exactly once. Photon is currently deployed in production, processing millions of events per minute at peak with an average end-to-end latency of less than 10 seconds.

Omega: flexible, scalable schedulers for large compute clusters, by Malte Schwarzkopf, Andy Konwinski, Michael Abd-El-Malek, John Wilkes [SIGOPS European Conference on Computer Systems (EuroSys)]
Omega addresses the need for increasing scale and speed in cluster schedulers using parallelism, shared state, and lock-free optimistic concurrency control. The paper presents a taxonomy of design approaches and evaluates Omega using simulations driven by Google production workloads.
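
To make the idea of optimistic concurrency control over shared state concrete, here is a minimal single-threaded Python sketch. It is not Omega's implementation: the names CellState, try_commit and schedule are hypothetical, and the single whole-state version counter is a deliberate simplification chosen for brevity. Each scheduler works from a snapshot, and a commit succeeds only if the shared state has not changed since that snapshot was taken; otherwise the scheduler retries with fresh state.

```python
class CellState:
    """Toy shared cluster state: free CPUs per machine plus a version counter."""

    def __init__(self, free_cpus):
        self.free_cpus = dict(free_cpus)
        self.version = 0

    def snapshot(self):
        # Schedulers plan against a private copy instead of holding a lock.
        return self.version, dict(self.free_cpus)

    def try_commit(self, seen_version, machine, cpus):
        """Apply a placement only if nothing changed since the snapshot."""
        if seen_version != self.version or self.free_cpus[machine] < cpus:
            return False  # conflict: the caller must re-plan
        self.free_cpus[machine] -= cpus
        self.version += 1
        return True


def schedule(state, cpus, max_retries=3):
    """Place a task on the first machine that fits, retrying on conflicts."""
    for _ in range(max_retries):
        version, view = state.snapshot()
        fits = [m for m, free in sorted(view.items()) if free >= cpus]
        if not fits:
            return None
        if state.try_commit(version, fits[0], cpus):
            return fits[0]
    return None


state = CellState({"m1": 4, "m2": 4})
stale_version, _ = state.snapshot()                  # another scheduler's old view
first = schedule(state, 3)                           # commits on m1
conflict = state.try_commit(stale_version, "m1", 1)  # rejected: view is stale
second = schedule(state, 3)                          # m1 is too full, so m2
```

The rejected commit is the interesting case: the stale scheduler is never blocked while planning, it simply pays for a retry when its view turns out to be outdated.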

Human-Computer Interaction

FFitts Law: Modeling Finger Touch with Fitts’ Law, by Xiaojun Bi, Yang Li, Shumin Zhai [Proceedings of the SIGCHI Conference on Human Factors in Computing Systems (CHI 2013)]
Fitts’ law is a cornerstone of graphical user interface research and evaluation. It can precisely predict cursor movement time given an on-screen target’s location and size. In the era of finger-touch-based mobile computing, the conventional form of Fitts’ law loses its power because the targets are often smaller than the finger width. Researchers at Google, Xiaojun Bi, Yang Li, and Shumin Zhai, devised finger Fitts’ law (FFitts law) to fix this fundamental problem.
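
For reference, the conventional form of Fitts’ law is commonly written in its Shannon formulation as MT = a + b log2(D/W + 1), where D is the distance to the target and W is its width. The Python sketch below illustrates only that conventional form, not the FFitts model from the paper. The function name is hypothetical, and the coefficients a and b are made-up placeholder values rather than fitted data.

```python
import math


def fitts_movement_time(distance, width, a=0.1, b=0.15):
    """Predict movement time (seconds) with the Shannon form of Fitts' law.

    a and b are illustrative placeholders; in practice they are fitted by
    regression to measured pointing data.
    """
    if distance < 0 or width <= 0:
        raise ValueError("distance must be >= 0 and width must be > 0")
    index_of_difficulty = math.log2(distance / width + 1)  # in bits
    return a + b * index_of_difficulty


# Shrinking the target raises the predicted time for the same distance.
large_target = fitts_movement_time(distance=300, width=100)
small_target = fitts_movement_time(distance=300, width=20)
```

Because the width appears only through the ratio D/W, this form has no term for the imprecision of the finger itself, which is the gap the abstract says FFitts law addresses.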

Information Retrieval

Top-k Publish-Subscribe for Social Annotation of News, by Alexander Shraer, Maxim Gurevich, Marcus Fontoura, Vanja Josifovski [Proceedings of the 39th International Conference on Very Large Data Bases]
The paper describes how scalable, low latency content-based publish-subscribe systems can be implemented using inverted indices and modified top-k document retrieval algorithms. The feasibility of this approach is demonstrated in the application of annotating news articles with social updates (such as Google+ posts or tweets). This application is cast as publish-subscribe, where news articles are treated as subscriptions (continuous queries) and social updates as published items with large update frequency.

Machine Learning

Ad Click Prediction: a View from the Trenches, by H. Brendan McMahan, Gary Holt, D. Sculley, Michael Young, Dietmar Ebner, Julian Grady, Lan Nie, Todd Phillips, Eugene Davydov, Daniel Golovin, Sharat Chikkerur, Dan Liu, Martin Wattenberg, Arnar Mar Hrafnkelsson, Tom Boulos, Jeremy Kubica [KDD]
How should one go about making predictions in extremely large scale production systems? We provide a case study for ad click prediction, and illustrate best practices for combining rigorous theory with careful engineering and evaluation. The paper contains a mix of novel algorithms, practical approaches, and some surprising negative results.

Learning kernels using local Rademacher complexity, by Corinna Cortes, Marius Kloft, Mehryar Mohri [Advances in Neural Information Processing Systems (NIPS 2013)]
This paper shows how the notion of local Rademacher complexity, which leads to sharp learning guarantees, can be used to derive algorithms for the important problem of learning kernels. It also reports the results of several experiments with these algorithms which yield performance improvements in some challenging tasks.

Efficient Estimation of Word Representations in Vector Space, by Tomas Mikolov, Kai Chen, Greg S. Corrado, Jeffrey Dean [ICLR Workshop 2013]
We describe a simple and speedy method for training vector representations of words. The resulting vectors naturally capture the semantics and syntax of word use, such that simple analogies can be solved with vector arithmetic. For example, the vector difference between man and woman is approximately equal to the difference between king and queen, and vector displacements between any given country’s name and its capital are aligned. We provide an open source implementation as well as pre-trained vector representations at http://word2vec.googlecode.com
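
The analogy arithmetic described above can be sketched in a few lines of Python. The three-dimensional vectors below are invented by hand purely for illustration (real learned vectors have far more dimensions and come from training on text), and the function names are hypothetical rather than part of the released implementation.

```python
import math

# Hand-made toy vectors, invented for illustration only.
VECTORS = {
    "man":   [1.0, 0.0, 0.2],
    "woman": [1.0, 1.0, 0.2],
    "king":  [1.0, 0.0, 0.9],
    "queen": [1.0, 1.0, 0.9],
    "apple": [0.0, 0.3, 0.1],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    return dot / (norm_u * norm_v)


def analogy(a, b, c, vectors=VECTORS):
    """Answer 'a is to b as c is to ?' via vec(b) - vec(a) + vec(c)."""
    target = [vb - va + vc
              for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])]
    # Exclude the query words themselves, then pick the nearest neighbor.
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))


result = analogy("man", "woman", "king")
```

With these toy vectors the displacement from man to woman equals the displacement from king to queen, so the nearest remaining word to the target is "queen".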

Large-Scale Learning with Less RAM via Randomization, by Daniel Golovin, D. Sculley, H. Brendan McMahan, Michael Young [Proceedings of the 30th International Conference on Machine Learning (ICML)]
We show how a simple technique -- using limited precision coefficients and randomized rounding -- can dramatically reduce the RAM needed to train models with online convex optimization methods such as stochastic gradient descent. In addition to demonstrating excellent empirical performance, we provide strong theoretical guarantees.
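
As a rough illustration of randomized rounding, the Python sketch below rounds a value to a coarse grid, choosing the upper or lower grid point at random so that the result is correct in expectation. It is a generic textbook version under assumed parameters, not the encoding used in the paper; the function name and the grid resolution of 0.25 are hypothetical.

```python
import math
import random


def randomized_round(value, resolution, rng=random):
    """Round value to a multiple of resolution, up or down at random.

    The probability of rounding up equals the fractional distance from the
    lower grid point, so the expected result equals value (unbiased).
    """
    scaled = value / resolution
    lower = math.floor(scaled)
    prob_up = scaled - lower
    rounded = lower + (1 if rng.random() < prob_up else 0)
    return rounded * resolution


# A coefficient of 0.30 stored on a grid of step 0.25 becomes 0.25 or 0.5,
# but the average over many roundings stays close to 0.30.
rng = random.Random(0)
samples = [randomized_round(0.30, 0.25, rng) for _ in range(20000)]
mean = sum(samples) / len(samples)
```

Deterministic rounding would always map 0.30 to 0.25 and introduce a systematic bias; the random choice trades that bias for variance that averages out over many updates.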

Machine Translation

Source-Side Classifier Preordering for Machine Translation, by Uri Lerner, Slav Petrov [Proc. of EMNLP 13]
When translating from one language to another, it is important to not only choose the correct translation for each word, but to also put the words in the correct word order. In this paper we present a novel approach that uses a syntactic parser and a feature-rich classifier to perform long-distance reordering. We demonstrate significant improvements over alternative approaches on a large number of language pairs.

Natural Language Processing

Token and Type Constraints for Cross-Lingual Part-of-Speech Tagging, by Oscar Tackstrom, Dipanjan Das, Slav Petrov, Ryan McDonald, Joakim Nivre [Transactions of the Association for Computational Linguistics (TACL 13)]
Knowing the parts of speech (verb, noun, etc.) of words is important for many natural language processing applications, such as information extraction and machine translation. Constructing part-of-speech taggers typically requires large amounts of manually annotated data, which is missing in many languages and domains. In this paper, we introduce a method that instead relies on a combination of incomplete annotations projected from English with incomplete crowdsourced dictionaries in each target language. The result is a 25 percent error reduction compared to the previous state of the art.

Universal Dependency Annotation for Multilingual Parsing, by Ryan McDonald, Joakim Nivre, Yoav Goldberg, Yvonne Quirmbach-Brundage, Dipanjan Das, Kuzman Ganchev, Keith Hall, Slav Petrov, Hao Zhang, Oscar Tackstrom, Claudia Bedini, Nuria Bertomeu Castello, Jungmee Lee [Association for Computational Linguistics]
This paper discusses a public release of syntactic dependency treebanks (https://code.google.com/p/uni-dep-tb/). Syntactic treebanks are manually annotated data sets containing full syntactic analysis for a large number of sentences (http://en.wikipedia.org/wiki/Dependency_grammar). Unlike other syntactic treebanks, the universal data set tries to normalize syntactic phenomena across languages when it can to produce a harmonized set of multilingual data. Such a resource will help large scale multilingual text analysis and evaluation.

Networks

B4: Experience with a Globally Deployed Software Defined WAN, by Sushant Jain, Alok Kumar, Subhasree Mandal, Joon Ong, Leon Poutievski, Arjun Singh, Subbaiah Venkata, Jim Wanderer, Junlan Zhou, Min Zhu, Jonathan Zolla, Urs Hölzle, Stephen Stuart, Amin Vahdat [Proceedings of the ACM SIGCOMM Conference]
This paper presents the motivation, design, and evaluation of B4, a Software Defined WAN for our data center to data center connectivity. We present our approach to separating the network’s control plane from the data plane to enable rapid deployment of new network control services. Our first such service, centralized traffic engineering, allocates bandwidth among competing services based on application priority, dynamically shifting communication patterns, and prevailing failure conditions.

Policy

When the Cloud Goes Local: The Global Problem with Data Localization, by Patrick Ryan, Sarah Falvey, Ronak Merchant [IEEE Computer]
Ongoing efforts to legally define cloud computing and regulate separate parts of the Internet are unlikely to address underlying concerns about data security and privacy. Data localization initiatives, led primarily by European countries, could actually bring the cloud to the ground and make the Internet less secure.

Robotics

Cloud-based robot grasping with the Google object recognition engine, by Ben Kehoe, Akihiro Matsukawa, Sal Candido, James Kuffner, Ken Goldberg [IEEE Int’l Conf. on Robotics and Automation]
What if robots were not limited by onboard computation, algorithms did not need to be implemented on every class of robot, and model improvements from sensor data could be shared across many robots? With wireless networking and rapidly expanding cloud computing resources this possibility is rapidly becoming reality. We present a system architecture, implemented prototype, and initial experimental data for a cloud-based robot grasping system that incorporates a Willow Garage PR2 robot with onboard color and depth cameras, Google’s proprietary object recognition engine, the Point Cloud Library (PCL) for pose estimation, Columbia University’s GraspIt! toolkit and OpenRAVE for 3D grasping and our prior approach to sampling-based grasp analysis to address uncertainty in pose.

Security, Cryptography, and Privacy

Alice in Warningland: A Large-Scale Field Study of Browser Security Warning Effectiveness, by Devdatta Akhawe, Adrienne Porter Felt [USENIX Security Symposium]
Browsers show security warnings to keep users safe. How well do these warnings work? We empirically assess the effectiveness of browser security warnings, using more than 25 million warning impressions from Google Chrome and Mozilla Firefox.

Social Systems

Arrival and departure dynamics in Social Networks, by Shaomei Wu, Atish Das Sarma, Alex Fabrikant, Silvio Lattanzi, Andrew Tomkins [WSDM]
In this paper, we consider the natural arrival and departure of users in a social network, and show that the dynamics of arrival, which have been studied in some depth, are quite different from the dynamics of departure, which are not as well studied. We show unexpected properties of a node’s local neighborhood that are predictive of departure. We also suggest that, globally, nodes at the fringe are more likely to depart, and subsequent departures are correlated among neighboring nodes in tightly-knit communities.

All the news that’s fit to read: a study of social annotations for news reading, by Chinmay Kulkarni, Ed H. Chi [In Proc. of CHI2013]
As news reading becomes more social, how do different types of annotations affect people’s selection of news articles? This crowdsourcing experiment shows that strangers’ opinions, unsurprisingly, have no persuasive effect, while, surprisingly, unknown branded companies still do. What works best are friend annotations, which help users decide what to read and provide social context that improves engagement.

Software Engineering

Does Bug Prediction Support Human Developers? Findings from a Google Case Study, by Chris Lewis, Zhongpeng Lin, Caitlin Sadowski, Xiaoyan Zhu, Rong Ou, E. James Whitehead Jr. [International Conference on Software Engineering (ICSE)]
"Does Bug Prediction Support Human Developers?" was a study that investigated whether software engineers changed their code review habits when presented with information about where bug-prone code might be lurking. Much to our surprise, we found that developer behavior didn’t change at all! We went on to suggest features that bug prediction algorithms need in order to fit with developer workflows, which will hopefully result in more supportive algorithms being developed in the future.

Speech Processing

Statistical Parametric Speech Synthesis Using Deep Neural Networks, by Heiga Zen, Andrew Senior, Mike Schuster [Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP)]
Conventional approaches to statistical parametric speech synthesis use decision tree-clustered context-dependent hidden Markov models (HMMs) to represent probability densities of speech given text. This paper examines an alternative scheme in which the mapping from an input text to its acoustic realization is modeled by a deep neural network (DNN). Experimental results show that DNN-based speech synthesizers can produce more natural-sounding speech than conventional HMM-based ones using similar model sizes.

Accurate and Compact Large Vocabulary Speech Recognition on Mobile Devices, by Xin Lei, Andrew Senior, Alexander Gruenstein, Jeffrey Sorensen [Interspeech]
In this paper we describe the neural network-based speech recognition system that runs in real time on Android phones. With the neural network acoustic model replacing the previous Gaussian mixture model and a compressed language model using on-the-fly rescoring, the word-error-rate is reduced by 27% while the storage requirement is reduced by 63%.

Statistics

Pay by the Bit: An Information-Theoretic Metric for Collective Human Judgment, by Tamsyn P. Waterhouse [Proc CSCW]
There’s a lot of confusion around quality control in crowdsourcing. For the broad problem subtype we call collective judgment, I discovered that information theory provides a natural and elegant metric for the value of contributors’ work, in the form of the mutual information between their judgments and the questions’ answers, each treated as random variables.
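
To illustrate the quantity involved, the Python sketch below estimates the mutual information, in bits, between a contributor's judgments and the correct answers from observed pairs. It is a simple plug-in estimate from empirical counts rather than the paper's full method, and the function name and sample data are made up for the example.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Estimate I(J; A) in bits from observed (judgment, answer) pairs."""
    n = len(pairs)
    joint = Counter(pairs)
    judgments = Counter(j for j, _ in pairs)
    answers = Counter(a for _, a in pairs)
    mi = 0.0
    for (j, a), count in joint.items():
        p_joint = count / n
        # p(j, a) / (p(j) * p(a)), written with raw counts.
        ratio = p_joint * n * n / (judgments[j] * answers[a])
        mi += p_joint * math.log2(ratio)
    return mi


# A contributor who always matches the answer conveys one full bit per
# yes/no question; one whose judgments are independent of it conveys none.
perfect = mutual_information([("yes", "yes"), ("no", "no")] * 50)
uninformative = mutual_information(
    [("yes", "yes"), ("yes", "no"), ("no", "yes"), ("no", "no")] * 25
)
```

Note that a contributor who is consistently wrong would also score highly here, since their judgments still determine the answer; the metric rewards information rather than raw agreement.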

Structured Data Management

F1: A Distributed SQL Database That Scales, by Jeff Shute, Radek Vingralek, Bart Samwel, Ben Handy, Chad Whipkey, Eric Rollins, Mircea Oancea, Kyle Littlefield, David Menestrina, Stephan Ellner, John Cieslewicz, Ian Rae, Traian Stancescu, Himani Apte [VLDB]
In recent years, conventional wisdom has been that when you need a highly scalable, high throughput data store, the only viable options are NoSQL key/value stores, and you need to work around the lack of transactional consistency, indexes, and SQL. F1 is a hybrid database we built that combines the strengths of traditional relational databases with the scalability of NoSQL systems, showing it’s not necessary to compromise on database functionality to achieve scalability and high availability. The paper describes the F1 system, how we use Spanner underneath, and how we’ve designed schema and applications to hide the increased commit latency inherent in distributed commit protocols.

Google Research Awards Summer 2013

Posted by gilogo at 8:56 AM Labels: 2013, awards, computer, google, research, summer

Posted by Maggie Johnson, Director of Education & University Relations

Another round of the Google Research Awards is complete. This is our biannual open call for proposals on computer science-related topics including machine learning and structured data, policy, human computer interaction, and geo/maps. Our grants cover tuition for a graduate student and provide both faculty and students the opportunity to work directly with Google scientists and engineers.

This round, we received 550 proposals from 50 countries. After expert reviews and committee discussions, we decided to fund 105 projects. The subject areas that received the highest level of support were human-computer interaction, systems and machine learning. In addition, 19% of the funding was awarded to universities outside the U.S.

We noticed some new areas emerging in this round of proposals. In particular, we saw increased interest in neural networks and accessibility-related projects, along with some innovative ideas in robotics. One project features the use of Android-based multi-robot systems, which are significantly more complex than single robot systems. Faculty researchers are also looking to explore novel uses of Google Glass, such as an indoor navigation system for blind users, and how Glass can facilitate social interactions.

Congratulations to the well-deserving recipients of this round’s awards. If you are interested in applying for the next round (deadline is October 15), please visit our website for more information.
