
Automatically making sense of data



While the availability and size of data sets across a wide range of sources, from medical to scientific to commercial, continue to grow, there are relatively few people trained in the statistical and machine learning methods required to test hypotheses, make predictions, and otherwise create interpretable knowledge from this data. But what if one could automatically discover human-interpretable trends in data in an unsupervised way, and then summarize these trends in textual and/or visual form?

To help make progress in this area, Professor Zoubin Ghahramani and his group at the University of Cambridge received a Google Focused Research Award in support of The Automatic Statistician project, which aims to build an "artificial intelligence for data science".

So far, the project has mostly focused on finding trends in time series data. For example, suppose we measure the levels of solar irradiance over time, as shown in this plot:

[Figure: Solar irradiance over time.]

This time series clearly exhibits several sources of variation: it is approximately periodic (with a period of about 11 years, known as the Schwabe cycle), but with notably low levels of activity in the late 1600s. It would be useful to automatically discover these kinds of regularities (as well as irregularities), both to further basic scientific understanding and to make more accurate forecasts.

We can model such data using non-parametric statistical models based on Gaussian processes. Such methods require specifying a kernel function, which characterizes the class of underlying functions that can accurately model the data (e.g., is it periodic? is it smooth? is it monotonic?). While the parameters of this kernel function are estimated from data, the form of the kernel itself is typically specified by hand, relying on the knowledge and experience of a trained data scientist.
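To make this concrete, here is a minimal sketch of fitting a Gaussian process with a hand-specified composite kernel, using scikit-learn rather than the project's own software; the data and kernel choices below are illustrative stand-ins for the solar irradiance series:

```python
# Hand-specified composite kernel for quasi-periodic data: a periodic
# component (the ~11-year cycle) modulated by a slowly varying RBF
# envelope, plus white observation noise.
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import RBF, ExpSineSquared, WhiteKernel

# Toy stand-in for the solar irradiance series (year vs. measurement).
rng = np.random.default_rng(0)
years = np.arange(1700.0, 2000.0).reshape(-1, 1)
y = np.sin(2 * np.pi * years.ravel() / 11.0) + 0.1 * rng.standard_normal(len(years))

# The *form* of the kernel is chosen by hand; only its parameters
# are then estimated from the data by maximum marginal likelihood.
kernel = (ExpSineSquared(length_scale=1.0, periodicity=11.0)
          * RBF(length_scale=50.0)            # slowly changing amplitude
          + WhiteKernel(noise_level=0.1))     # observation noise

gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True)
gp.fit(years, y)
print(gp.kernel_)  # the fitted kernel with estimated parameters
```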

Prof. Ghahramani's group has developed an algorithm that can automatically discover a good kernel by searching through an open-ended space of sums and products of kernels, as well as other compositional operations. After model selection and fitting, the Automatic Statistician translates each kernel into a textual description of the main trends in the data in an easy-to-understand form.
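The search itself can be pictured as a greedy loop over kernel expressions. The following is a hypothetical sketch, not the group's actual implementation: it expands the current best kernel by summing or multiplying in base kernels and keeps whichever expansion scores best (a real system would also penalize model complexity, e.g. with a BIC-style criterion):

```python
# Greedy compositional kernel search (illustrative sketch only).
from sklearn.gaussian_process import GaussianProcessRegressor
from sklearn.gaussian_process.kernels import (RBF, ExpSineSquared,
                                              RationalQuadratic)

BASE_KERNELS = [RBF(), ExpSineSquared(), RationalQuadratic()]

def score(kernel, X, y):
    """Log marginal likelihood of the data under the fitted kernel."""
    gp = GaussianProcessRegressor(kernel=kernel, normalize_y=True).fit(X, y)
    return gp.log_marginal_likelihood()

def greedy_kernel_search(X, y, depth=3):
    best = max(BASE_KERNELS, key=lambda k: score(k, X, y))
    best_score = score(best, X, y)
    for _ in range(depth - 1):
        # Expand the current expression by sums and products of base kernels.
        candidates = ([best + b for b in BASE_KERNELS] +
                      [best * b for b in BASE_KERNELS])
        scored = [(score(k, X, y), k) for k in candidates]
        top_score, top = max(scored, key=lambda sk: sk[0])
        if top_score <= best_score:
            break  # no expansion improves on the current kernel
        best, best_score = top, top_score
    return best
```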

The compositional structure of the space of statistical models neatly maps onto compositionally constructed sentences, allowing for the automatic description of the statistical models produced by any kernel. For example, in a product of kernels, one kernel can be mapped to a standard noun phrase (e.g. ‘a periodic function’) and the other kernels to appropriate modifiers of this noun phrase (e.g. ‘whose shape changes smoothly’, ‘with growing amplitude’). The end result is an automatically generated 5-15 page report describing the patterns in the data, with figures and tables supporting the main claims. Here is an extract of the report produced by their system for the solar irradiance data:
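As a toy illustration of that mapping (the phrases and table entries below are invented, not the system's actual grammar), a product of kernels can be rendered by treating one factor as the head noun phrase and the remaining factors as modifiers:

```python
# Toy kernel-to-text mapping (all phrases invented for illustration).
NOUN_PHRASE = {
    "ExpSineSquared": "a periodic function",
    "RBF": "a smooth function",
    "DotProduct": "a linearly increasing function",
}
MODIFIER = {
    "ExpSineSquared": "with periodic structure",
    "RBF": "whose shape changes smoothly",
    "DotProduct": "with linearly growing amplitude",
}

def describe_product(factor_names):
    """Render a product of kernels as a compositional English phrase."""
    head, *rest = factor_names
    return " ".join([NOUN_PHRASE[head]] + [MODIFIER[f] for f in rest])

print(describe_product(["ExpSineSquared", "RBF", "DotProduct"]))
# -> a periodic function whose shape changes smoothly with linearly growing amplitude
```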
[Figure: Extract of the report for the solar irradiance data, automatically generated by the Automatic Statistician.]
The Automatic Statistician is currently being generalized to find patterns in other kinds of data, such as multidimensional regression problems and relational databases. A web-based demo of a simplified version of the system was launched in August 2014; it allowed a user to upload a dataset and receive an automatically produced analysis a few minutes later. An expanded version of the service will be launched in early 2015 (we will post details when available). We believe this will have many applications for anyone interested in data science.

Making Sense of MOOC Data



In order to further evolve the open education system and online platforms, Google’s course design and development teams continually experiment with massive open online courses. Recently, at the Association for Computing Machinery’s Learning@Scale conference in Atlanta, GA, several members of our team presented findings about our online courses. Our research focuses on learners’ goals and activities, as well as self-evaluation as an assessment tool. In this post, I will present highlights from our research and describe how we’ve applied it to our current course, Making Sense of Data.

Google’s five online courses over the past two years have provided an opportunity for us to identify learning trends and refine instructional design. As we posted previously, learners register for online courses for a variety of reasons. During registration, we ask learners to identify their primary goal for taking the class. We found that just over half (52.5%) of 41,000 registrants intended to complete the Mapping with Google course; the other half aimed to learn portions of the curriculum without earning a certificate. Next, we measured how well participants achieved those goals by observing various interaction behaviors in the course, such as watching videos, viewing text lessons, and completing activities. We found that 42.4% of 21,000 active learners (those who did something in the course other than register) achieved the goals they selected during registration. Similarly, for our Introduction to Web Accessibility course, we found that 56.1% of 4,993 registrants intended to complete the course; based on their interactions with course materials, we measured that 49.5% of 1,037 active learners achieved their goals.
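For concreteness, a computation along these lines might look like the following sketch, where the goal labels, event names, and data layout are all invented for illustration:

```python
# Hypothetical goal-achievement computation from interaction logs.
def achieved_goal(goal, events):
    """Did a learner's observed activity satisfy their stated goal?"""
    if goal == "complete_course":
        return "submitted_final_project" in events
    if goal == "learn_specific_topics":
        return any(e.startswith("viewed_lesson") for e in events)
    return False

def goal_achievement_rate(learners):
    # "Active" learners did something in the course other than register.
    active = [l for l in learners if l["events"]]
    hits = sum(achieved_goal(l["goal"], l["events"]) for l in active)
    return hits / len(active) if active else 0.0

learners = [
    {"goal": "complete_course", "events": ["viewed_lesson_1", "submitted_final_project"]},
    {"goal": "learn_specific_topics", "events": ["viewed_lesson_3"]},
    {"goal": "complete_course", "events": []},  # registered, never returned
]
print(f"{goal_achievement_rate(learners):.1%}")  # 100.0% of active learners
```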

Although imperfect, these numbers are more accurate measures of course success than completion rates. Because students come to the course for many different reasons, course designers should make it easier for learners to meet a variety of objectives. Since many participants in online courses may just want to learn a few new things, we can help them by releasing all course content at the outset of the course and enabling them to search for specific topics of interest. We are exploring other ways of personalizing courses to help learners achieve individual goals.

Our research also indicates that learners who complete activities are more likely to complete the course than peers who complete no activities. Activities include auto-graded multiple-choice or short-answer questions that encourage learners to practice skills from the course and receive instant feedback. In the Mapping with Google course, learners who completed at least sixty percent of course activities were much more likely to submit final projects than peers who finished fewer activities. This leads us to believe that, as course designers, we should pay more attention to creating effective, relevant activities than to course content itself. We hypothesize that learners also use activities’ instant feedback to help them decide whether to spend time reviewing the associated content. In this scenario, we believe that learners could benefit from encountering activities before course content.

As technological solutions for assessing qualitative work are still evolving, an active area of our research involves self-evaluation. We are also intrigued by previous research showing links between self-evaluation and enhanced metacognition. In several courses, we have asked learners to submit projects aligned with course objectives, calibrate themselves by evaluating sample work, and then apply a rubric to assess their own work. Course staff graded a random sample of project submissions, then compared the learners’ scores with their own. In general, we found moderate agreement on Advanced Power Searching (APS) case studies (55.1% within 1 point of each other on a 16-point scale), with increased agreement on the Mapping projects (71.6% within 2 points of each other on a 27-point scale). We also observed that students submitted high-quality projects overall: course staff scored 73% of APS assignments a B (80%) or above, and evaluated 94% of Mapping projects as a B or above.
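The agreement measure reported above is straightforward to compute; here is a small sketch with made-up scores:

```python
# Share of submissions where self-score and staff score agree within
# k points (scores below are made up for illustration).
def agreement_rate(self_scores, staff_scores, k):
    pairs = list(zip(self_scores, staff_scores))
    close = sum(abs(s - t) <= k for s, t in pairs)
    return close / len(pairs)

# e.g. an APS-style 16-point scale, agreement within 1 point
self_scores = [14, 12, 16, 9, 13]
staff_scores = [13, 12, 14, 10, 11]
print(f"{agreement_rate(self_scores, staff_scores, k=1):.1%}")  # 60.0%
```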

What changed between the two courses to allow for higher agreement on the Mapping course? The most important change seems to be the more objective criteria in the Mapping project rubric. We also suspect that we haven’t given enough weight to teaching learners how to evaluate their own work. We plan to keep experimenting with self-evaluation in future courses.


Since we are dedicated to experimenting with courses, we have not only applied these findings to the Making Sense of Data course, but we have also chosen to experiment with new open-source software and tools. We’re exploring the following aspects of online education in this class:

  • Placing activities before content
  • Reduced use of videos
  • Final project that includes self-reflection without scores
  • New open-source technologies, including authoring the course in edX Studio and importing it into cbX (running on Google’s App Engine platform), as well as Oppia explorations

We hope that our research and the open-source technologies we’re using will inspire educators and researchers to continue to evolve the next generation of online learning platforms.

Making Sense of Data with Google


In September 2013, Google announced that it was joining forces with edX to contribute to their open-source platform, Open edX. Since then, we’ve been working together to expand this open education ecosystem. We’re pleased to announce our first online course built using Open edX: Making Sense of Data showcases the collaborative technology of Google and edX, using cbX to run Open edX courses on Google App Engine.

The world is filled with lots of information; learning to make sense of it all helps us to gain perspective and make decisions. We’re pleased to share tools and techniques to structure, visualize, and analyze information in our latest self-paced, online course: Making Sense of Data.

Making Sense of Data is intended for anybody who works with data on a daily basis, such as students, teachers, journalists, and small business owners, and who wants to learn more about how to apply that information to practical problems. Participants will learn about the data process, create and use Fusion Tables (an experimental tool), and look for patterns and relationships in data. Knowledge of statistics or experience with programming is not required.

Like past courses, participants engage with course material through a combination of video and text lessons, activities, and projects. In this course, we will also introduce some new features that help create a more engaging participant experience. For example, participants will be able to access instant hangouts and live chats from the course web page for quick help or for direct feedback. As with all of our MOOCs, you’ll learn from Google experts and collaborate with participants worldwide. You’ll also have the opportunity to complete a final project and apply the skills you’ve learned to earn a certificate.

Making Sense of Data runs from March 18 - April 4, 2014. Visit g.co/datasense to learn more and register today. We look forward to seeing you make sense of all the information out there!