Information Blog: data


Download Cracked Android Apps Apk

Posted by gilogo at 7:49 AM Labels: android, apk, app, data, download, game, mod

Download latest version of the best apps and games apk in apkmatters.com.. The most up to date site to download cracked modded apps and games android full last version for free. Download and install appeven app apk for android, ios and windows pc/ laptop or mac computer. free tweaked & hacked apps with appeven.

Hot free download crack traceparts dvd

Lucky patcher apk download latest version for android

Ridhanrc.com - infinite painter premium 6.1.35 apk

Tweakbox app download for android, ios, pc & mac for free. tweakbox app latest version an play store market which allows you to install third party apps 2018.. Download acmarket apk which is an android appstore for downloading cracked games and apps. acmarket app latest version for android, ios, pc & mac 2018.. Download cracked android apps and games free and fast from acmarket..

Zapya Android Apk Free Download

Posted by gilogo at 12:00 AM Labels: android, apk, app, data, download, game, mod

Zapya apk free download android mobile. install zapya for pc windows and iphone latest version. zapya app is a file and video transfer/sharing tool.. Download zapya apk an instant file sharing application. zapya apk is the best file sharing application. download zapya apk for android now.. Feel free to download zapya apk latest version for android or windows os devices from this page. how to use zapya apk for android..

Zapya apk free download for android/iphone/windows pc

Zapya apk download zapya xender shareit for pc - ramen 10

Zapya free download for pc cracked download [latest

Zapya apk is one of the best cross-platform file sharing app for android users. zapya apk is very simple and easy to use. download zapya apk to share all type of files with any size.. Download zapya apk file v5.5 (us) (com.dewmobile.kuaiya.apk). transfer any files such as app, photo, music, video to another device without aid of mobile network nor wi-fi routers.. Download zapya 5.5.2 zapya is a tool with which you can send files to other users quickly and easily. install any apk on your android device. 2.2.6 ..

The Big Data Brain Drain: Why Science is in Trouble

Posted by gilogo at 10:44 PM Labels: big, brain, computer, data, drain, in, is, science, the, trouble, why

I don't often blog about academia, but my colleague, Mark Wilson, has brought a fascinating blog article, The Big Data Brain Drain: Why Science is in Trouble, to my attention. It makes the case that academic science is in trouble for two reasons: the skills it requires (i.e., the ability to analyse big data) are very much in demand by industry and commerce, and the skills required to create scientific software for this analysis are not well rewarded inside academia. I encourage you to read the article yourself.

from The Universal Machine http://universal-machine.blogspot.com/

Automatically making sense of data

Posted by gilogo at 1:25 AM Labels: automatically, computer, data, making, of, sense

Posted by Kevin Murphy, Research Scientist and David Harper, Head of University Relations, EMEA

While the availability and size of data sets across a wide range of sources, from medical to scientific to commercial, continue to grow, there are relatively few people trained in the statistical and machine learning methods required to test hypotheses, make predictions, and otherwise create interpretable knowledge from this data. But what if one could automatically discover human-interpretable trends in data in an unsupervised way, and then summarize these trends in textual and/or visual form?

To help make progress in this area, Professor Zoubin Ghahramani and his group at the University of Cambridge received a Google Focused Research Award in support of The Automatic Statistician project, which aims to build an "artificial intelligence for data science".

So far, the project has mostly been focussing on finding trends in time series data. For example, suppose we measure the levels of solar irradiance over time, as shown in this plot:

This time series clearly exhibits several sources of variation: it is approximately periodic (with a period of about 11 years, known as the Schwabe cycle), but with notably low levels of activity in the late 1600s. It would be useful to automatically discover these kinds of regularities (as well as irregularities), to help further basic scientific understanding, as well as to help make more accurate forecasts in the future.

We can model such data using non-parametric statistical models based on Gaussian processes. Such methods require the specification of a kernel function which characterizes the nature of the underlying function that can accurately model the data (e.g., is it periodic? is it smooth? is it monotonic?). While the parameters of this kernel function are estimated from data, the form of the kernel itself is typically specified by hand, and relies on the knowledge and experience of a trained data scientist.

Prof. Ghahramani’s group has developed an algorithm that can automatically discover a good kernel by searching through an open-ended space of sums and products of kernels, as well as other compositional operations. After model selection and fitting, the Automatic Statistician translates each kernel into a text description of the main trends in the data in an easy-to-understand form.
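To make "sums and products of kernels" concrete, here is a minimal sketch in plain Python. The squared-exponential and periodic forms are standard textbook kernels; the function names, the hyperparameter values, and the particular composite are assumptions chosen for illustration, and this is not the Automatic Statistician's code or its search procedure.

```python
import math

def squared_exponential(lengthscale):
    """Smooth kernel: nearby inputs have highly correlated outputs."""
    def k(x, y):
        return math.exp(-((x - y) ** 2) / (2.0 * lengthscale ** 2))
    return k

def periodic(period, lengthscale):
    """Kernel for functions that repeat every `period` units."""
    def k(x, y):
        s = math.sin(math.pi * abs(x - y) / period)
        return math.exp(-2.0 * s ** 2 / lengthscale ** 2)
    return k

def add(k1, k2):
    """Sum of kernels: the data is a sum of two independent components."""
    return lambda x, y: k1(x, y) + k2(x, y)

def multiply(k1, k2):
    """Product of kernels: one component modulates the other."""
    return lambda x, y: k1(x, y) * k2(x, y)

# An "approximately periodic" component (a periodic shape that may drift
# slowly) plus a smooth long-term trend. All values here are illustrative.
approx_periodic = multiply(periodic(period=11.0, lengthscale=1.0),
                           squared_exponential(lengthscale=50.0))
composite = add(approx_periodic, squared_exponential(lengthscale=200.0))

def gram_matrix(kernel, xs):
    """Covariance matrix of the kernel evaluated at all pairs of inputs."""
    return [[kernel(a, b) for b in xs] for a in xs]

years = [1650.0, 1661.0, 1672.0, 1655.5]
K = gram_matrix(composite, years)
```

Under this composite, two years one full period apart (1650 and 1661) are more strongly correlated than two years half a period apart (1650 and 1655.5), which is the kind of structure a kernel search would have to discover from data.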

The compositional structure of the space of statistical models neatly maps onto compositionally constructed sentences allowing for the automatic description of the statistical models produced by any kernel. For example, in a product of kernels, one kernel can be mapped to a standard noun phrase (e.g. ‘a periodic function’) and the other kernels to appropriate modifiers of this noun phrase (e.g. ‘whose shape changes smoothly’, ‘with growing amplitude’). The end result is an automatically generated 5-15 page report describing the patterns in the data with figures and tables supporting the main claims. Here is an extract of the report produced by their system for the solar irradiance data:

Extract of the report for the solar irradiance data, automatically generated by the automatic statistician.
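The mapping from a product of kernels to a noun phrase plus modifiers can be sketched as a simple lookup. The kernel labels and the pairing of each label with a phrase are hypothetical; only the three example phrases come from the text above, and the real system's grammar is certainly richer than this.

```python
# Hypothetical kernel labels mapped to the example phrases quoted above.
NOUN_PHRASES = {
    "periodic": "a periodic function",
}
MODIFIERS = {
    "smooth": "whose shape changes smoothly",
    "linear": "with growing amplitude",
}

def describe_product(kernel_labels):
    """Describe a product of kernels: the first becomes the noun phrase,
    the remaining ones become modifiers of that noun phrase."""
    head, *rest = kernel_labels
    parts = [NOUN_PHRASES[head]] + [MODIFIERS[label] for label in rest]
    return " ".join(parts)

sentence = describe_product(["periodic", "smooth", "linear"])
```

Because each kernel in the product contributes one phrase, any product the search produces yields a grammatical description without hand-written text for that specific combination.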

The Automatic Statistician is currently being generalized to find patterns in other kinds of data, such as multidimensional regression problems, and relational databases. A web-based demo of a simplified version of the system was launched in August 2014. It allowed a user to upload a dataset, and to receive an automatically produced analysis after a few minutes. An expanded version of the service will be launched in early 2015 (we will post details when available). We believe this will have many applications for anyone interested in Data Science.

Making Sense of MOOC Data

Posted by gilogo at 9:32 AM Labels: computer, data, making, mooc, of, sense

Posted by Julia Wilkowski, Staff Instructional Designer

In order to further evolve the open education system and online platforms, Google’s course design and development teams continually experiment with massive, open online courses. Recently, at the Association for Computing Machinery’s Learning@Scale conference in Atlanta, GA, several members of our team presented findings about our online courses. Our research focuses on learners’ goals and activities as well as self-evaluation as an assessment tool. In this post, I will present highlights from our research as well as how we’ve applied this research to our current course, Making Sense of Data.

Google’s five online courses over the past two years have provided an opportunity for us to identify learning trends and refine instructional design. As we posted previously, learners register for online courses for a variety of reasons. During registration, we ask learners to identify their primary goal for taking the class. We found that just over half (52.5%) of 41,000 registrants intended to complete the Mapping with Google course; the other half aimed to learn portions of the curriculum without earning a certificate. Next we measured how well participants achieved those goals by observing various interaction behaviors in the course, such as watching videos, viewing text lessons, and activity completion. We found that 42.4% of 21,000 active learners (who did something in the course other than register) achieved the goals they selected during registration. Similarly, for our Introduction to Web Accessibility course, we found that 56.1% of 4,993 registrants intended to complete the course. Based on their interactions with course materials, we measured that 49.5% of 1,037 active learners achieved their goals.

Although imperfect, these numbers are more accurate measures of course success than completion rates. Because students come to the course for many different reasons, course designers should make it easier for learners to meet a variety of objectives. Since many participants in online courses may just want to learn a few new things, we can help them by releasing all course content at the outset of the course and enabling them to search for specific topics of interest. We are exploring other ways of personalizing courses to help learners achieve individual goals.

Our research also indicates that learners who complete activities are more likely to complete the course than peers who completed no activities. Activities include auto-graded multiple-choice or short-answer questions that encourage learners to practice skills from the course and receive instant feedback. In the Mapping with Google course, learners who completed at least sixty percent of course activities were much more likely to submit final projects than peers who finished fewer activities. This leads us to believe that as course designers, we should be paying more attention to creating effective, relevant activities than focusing so heavily on course content. We hypothesize that learners also use activities’ instant feedback to help them determine whether they should spend time reviewing the associated content. In this scenario, we believe that learners could benefit from experiencing activities before course content.

As technological solutions for assessing qualitative work are still evolving, an active area of our research involves self-evaluation. We are also intrigued by previous research showing the links between self-evaluation and enhanced metacognition. In several courses, we have asked learners to submit projects aligned with course objectives, calibrate themselves by evaluating sample work, then apply a rubric to assess their own work. Course staff graded a random sample of project submissions then compared the learners’ scores with course staff’s scores. In general, we found a moderate agreement on Advanced Power Searching (APS) case studies (55.1% within 1 point of each other on a 16-point scale), with an increased agreement on the Mapping projects (71.6% within 2 points of each other on a 27-point scale). We also observed that students submitted high quality projects overall, with course staff scoring 73% of APS assignments a B (80%) or above; similarly, course staff evaluated 94% of Mapping projects as a B or above.
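The agreement figures above are of the form "share of projects where the learner's score and the staff score differ by at most k points". A minimal sketch of that computation follows; the scores are invented for illustration and are not course data.

```python
def agreement_rate(self_scores, staff_scores, tolerance):
    """Fraction of projects whose two scores differ by at most `tolerance`."""
    if len(self_scores) != len(staff_scores) or not self_scores:
        raise ValueError("need two non-empty score lists of equal length")
    close = sum(1 for s, t in zip(self_scores, staff_scores)
                if abs(s - t) <= tolerance)
    return close / len(self_scores)

# Invented scores on a 16-point rubric.
learner = [12, 15, 9, 16, 11]
staff = [13, 12, 9, 14, 11]

within_one = agreement_rate(learner, staff, tolerance=1)
within_two = agreement_rate(learner, staff, tolerance=2)
```

Note that the tolerance should be read relative to the scale: 1 point on a 16-point rubric is a tighter band than 2 points on a 27-point rubric, so the two reported percentages are not directly comparable.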

What changed between the two courses that allowed for a higher agreement with the mapping course? The most important change seems to be more objective criteria for the mapping project rubric. We also believe that we haven’t given enough weight to teaching learners how to evaluate their own work. We plan to keep experimenting with self-evaluation in future courses.

Since we are dedicated to experimenting with courses, we have not only applied these findings to the Making Sense of Data course, but we have also chosen to experiment with new open-source software and tools. We’re exploring the following aspects of online education in this class:

Placing activities before content
Reduced use of videos
Final project that includes self-reflection without scores
New open-source technologies, including authoring the course using edX studio and importing it into cbX (running on Google’s AppEngine platform) as well as Oppia explorations

We hope that our research and the open-source technologies we’re using will inspire educators and researchers to continue to evolve the next generation of online learning platforms.

The reusable holdout: Preserving validity in adaptive data analysis

Posted by gilogo at 3:41 AM Labels: adaptive, analysis, computer, data, holdout, in, preserving, reusable, the, validity

Posted by Moritz Hardt, Research Scientist

Machine learning and statistical analysis play an important role at the forefront of scientific and technological progress. But with all data analysis, there is a danger that findings observed in a particular sample do not generalize to the underlying population from which the data were drawn. A popular XKCD cartoon illustrates that if you test sufficiently many different colors of jelly beans for correlation with acne, you will eventually find one color that correlates with acne at a p-value below the infamous 0.05 significance level.

Image credit: XKCD

Unfortunately, the problem of false discovery is even more delicate than the cartoon suggests. Correcting reported p-values for a fixed number of multiple tests is a fairly well understood topic in statistics. A simple approach is to multiply each p-value by the number of tests, but there are more sophisticated tools. However, almost all existing approaches to ensuring the validity of statistical inferences assume that the analyst performs a fixed procedure chosen before the data are examined. For example, “test all 20 flavors of jelly beans”. In practice, however, the analyst is informed by data exploration, as well as the results of previous analyses. How did the scientist choose to study acne and jelly beans in the first place? Often such choices are influenced by previous interactions with the same data. This adaptive behavior of the analyst leads to an increased risk of spurious discoveries that are neither prevented nor detected by standard approaches. Each adaptive choice the analyst makes multiplies the number of analyses that could follow; it is often difficult or impossible to describe and analyze the exact experimental setup ahead of time.
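The simple correction mentioned above, multiplying each p-value by the number of tests, is usually called the Bonferroni correction. A minimal sketch, with made-up p-values standing in for the 20 jelly bean colors:

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capping at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

# Made-up p-values for 20 colors; one dips below the 0.05 level.
raw = [0.04] + [0.30] * 19
adjusted = bonferroni(raw)
discoveries = [p for p in adjusted if p < 0.05]
```

After correction the one "significant" color no longer clears the 0.05 level. The correction only works because the number of tests is fixed in advance, which is exactly the assumption that adaptive analysis breaks.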

In The Reusable Holdout: Preserving Validity in Adaptive Data Analysis, a joint work with Cynthia Dwork (Microsoft Research), Vitaly Feldman (IBM Almaden Research Center), Toniann Pitassi (University of Toronto), Omer Reingold (Samsung Research America) and Aaron Roth (University of Pennsylvania), to appear in Science tomorrow, we present a new methodology for navigating the challenges of adaptivity. A central application of our general approach is the reusable holdout mechanism that allows the analyst to safely validate the results of many adaptively chosen analyses without the need to collect costly fresh data each time.

The curse of adaptivity

A beautiful example of how false discovery arises as a result of adaptivity is Freedman’s paradox. Suppose that we want to build a model that explains “systolic blood pressure” in terms of hundreds of variables quantifying the intake of various kinds of food. In order to reduce the number of variables and simplify our task, we first select some promising looking variables, for example, those that have a positive correlation with the response variable (systolic blood pressure). We then fit a linear regression model on the selected variables. To measure the goodness of our model fit, we crank out a standard F-test from our favorite statistics textbook and report the resulting p-value.

Inference after selection: We first select a subset of the variables based on a data-dependent criterion and then fit a linear model on the selected variables.

Freedman showed that the reported p-value is highly misleading - even if the data were completely random with no correlation whatsoever between the response variable and the data points, we’d likely observe a significant p-value! The bias stems from the fact that we selected a subset of the variables adaptively based on the data, but we never account for this fact. There is a huge number of possible subsets of variables that we selected from. The mere fact that we chose one test over the other by peeking at the data creates a selection bias that invalidates the assumptions underlying the F-test.
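The selection step alone is enough to manufacture an apparent signal. The simulation below is a simplified illustration, not Freedman's construction: it skips the regression and F-test and looks only at how strong the best-looking variable appears when everything is pure noise. Sample sizes and the random seed are arbitrary assumptions.

```python
import random

def correlation(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5

def noise(n, rng):
    """n independent standard normal draws."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]

rng = random.Random(0)
n_samples, n_variables = 50, 200

# Response and all candidate variables are independent noise.
response = noise(n_samples, rng)
variables = [noise(n_samples, rng) for _ in range(n_variables)]

# Adaptive step: keep whichever variable looks best on this very sample.
selected_strength = max(abs(correlation(v, response)) for v in variables)

# What an unselected noise variable looks like on a large fresh sample.
fresh_strength = abs(correlation(noise(2000, rng), noise(2000, rng)))
```

The selected variable shows a sizeable correlation with the response even though none exists, while the fresh measurement stays near zero. Any test applied after the selection inherits that bias unless it accounts for the 200 candidates that were examined.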

Freedman’s paradox carries an important lesson. Significance levels of standard procedures do not capture the vast number of analyses one can choose to carry out or to omit. For this reason, adaptivity is one of the primary explanations of why research findings are frequently false, as was argued by Gelman and Loken, who aptly refer to adaptivity as the “garden of the forking paths”.

Machine learning competitions and holdout sets

Adaptivity is not just an issue with p-values in the empirical sciences. It affects other domains of data science just as well. Machine learning competitions are a perfect example. Competitions have become an extremely popular format for solving prediction and classification problems of all sorts.

Each team in the competition has full access to a publicly available training set which they use to build a predictive model for a certain task such as image classification. Competitors can repeatedly submit a model and see how the model performs on a fixed holdout data set not available to them. The central component of any competition is the public leaderboard which ranks all teams according to the prediction accuracy of their best model so far on the holdout. Every time a team makes a submission they observe the score of their model on the same holdout data. This methodology is inspired by the classic holdout method for validating the performance of a predictive model.

Ideally, the holdout score gives an accurate estimate of the true performance of the model on the underlying distribution from which the data were drawn. However, this is only the case when the model is independent of the holdout data! In contrast, in a competition the model generally incorporates previously observed feedback from the holdout set. Competitors work adaptively and iteratively with the feedback they receive. An improved score for one submission might convince the team to tweak their current approach, while a lower score might cause them to try out a different strategy. But the moment a team modifies their model based on a previously observed holdout score, they create a dependency between the model and the holdout data that invalidates the assumption of the classic holdout method. As a result, competitors may begin to overfit to the holdout data that supports the leaderboard. This means that their score on the public leaderboard continues to improve, while the true performance of the model does not. In fact, unreliable leaderboards are a widely observed phenomenon in machine learning competitions.
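A small simulation shows how feedback from a fixed holdout can be exploited. In this sketch every submission is a random guess, so no model has any real skill; the competitor simply keeps the guesses that happened to score above 50% on the leaderboard and combines them by majority vote. The sizes, the seed, and the attack itself are assumptions chosen for illustration.

```python
import random

rng = random.Random(0)
n_holdout, n_fresh, n_submissions = 200, 1000, 300
n_total = n_holdout + n_fresh

# Binary labels; the first n_holdout points back the leaderboard,
# the rest play the role of fresh data the competitor never sees.
labels = [int(rng.random() < 0.5) for _ in range(n_total)]

def accuracy(predictions, truth):
    return sum(p == t for p, t in zip(predictions, truth)) / len(truth)

kept = []
for _ in range(n_submissions):
    guess = [int(rng.random() < 0.5) for _ in range(n_total)]
    leaderboard_score = accuracy(guess[:n_holdout], labels[:n_holdout])
    if leaderboard_score > 0.5:  # adapt: keep only what the holdout liked
        kept.append(guess)

# Majority vote over the kept guesses.
votes = [sum(column) for column in zip(*kept)]
combined = [1 if 2 * v > len(kept) else 0 for v in votes]

holdout_accuracy = accuracy(combined[:n_holdout], labels[:n_holdout])
fresh_accuracy = accuracy(combined[n_holdout:], labels[n_holdout:])
```

The combined model scores well above chance on the holdout while remaining at chance on fresh data, so the leaderboard improvement reflects nothing but leaked information about the holdout labels.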

Reusable holdout sets

A standard proposal for coping with adaptivity is simply to discourage it. In the empirical sciences, this proposal is known as pre-registration and requires the researcher to specify the exact experimental setup ahead of time. While possible in some simple cases, it is in general too restrictive as it runs counter to today’s complex data analysis workflows.

Rather than limiting the analyst, our approach provides means of reliably verifying the results of an arbitrary adaptive data analysis. The key tool for doing so is what we call the reusable holdout method. As with the classic holdout method discussed above, the analyst is given unfettered access to the training data. What changes is that there is a new algorithm in charge of evaluating statistics on the holdout set. This algorithm ensures that the holdout set maintains the essential guarantees of fresh data over the course of many estimation steps.

The limit of the method is determined by the size of the holdout set - the number of times that the holdout set may be used grows roughly as the square of the number of collected data points in the holdout, as our theory shows.

Armed with the reusable holdout, the analyst is free to explore the training data and verify tentative conclusions on the holdout set. It is now entirely safe to use any information provided by the holdout algorithm in the choice of new analyses to carry out, or the tweaking of existing models and parameters.

A general methodology

The reusable holdout is only one instance of a broader methodology that is, perhaps surprisingly, based on differential privacy—a notion of privacy preservation in data analysis. At its core, differential privacy is a notion of stability requiring that any single sample should not influence the outcome of the analysis significantly.

Example of a stable learning algorithm: Deletion of any single data point does not affect the accuracy of the classifier much.

A beautiful line of work in machine learning shows that various notions of stability imply generalization. That is, any sample estimate computed by a stable algorithm (such as the prediction accuracy of a model on a sample) must be close to what we would observe on fresh data.

What sets differential privacy apart from other stability notions is that it is preserved by adaptive composition. Combining multiple algorithms that each preserve differential privacy yields a new algorithm that also satisfies differential privacy albeit at some quantitative loss in the stability guarantee. This is true even if the output of one algorithm influences the choice of the next. This strong adaptive composition property is what makes differential privacy an excellent stability notion for adaptive data analysis.

In a nutshell, the reusable holdout mechanism is simply this: access the holdout set only through a suitable differentially private algorithm. It is important to note, however, that the user does not need to understand differential privacy to use our method. The user interface of the reusable holdout is the same as that of the widely used classical method.
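One way to picture "access the holdout set only through a suitable algorithm" is a wrapper that answers with the training estimate whenever training and holdout roughly agree, and otherwise returns a noised holdout estimate. The sketch below is written in the spirit of the paper's Thresholdout procedure but is a simplification under stated assumptions: the class name, threshold, and noise scale are illustrative, and the limit on how many holdout answers may be given is left out.

```python
import random

def laplace(scale, rng):
    """Laplace noise with the given scale (difference of two exponentials)."""
    if scale == 0:
        return 0.0
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class ReusableHoldout:
    """Simplified, hypothetical wrapper around holdout estimates."""

    def __init__(self, threshold=0.04, noise_scale=0.01, seed=None):
        self.threshold = threshold      # illustrative value
        self.noise_scale = noise_scale  # illustrative value
        self.rng = random.Random(seed)

    def query(self, train_value, holdout_value):
        """Answer one statistic, given its training and holdout estimates."""
        gap = abs(train_value - holdout_value)
        noisy_threshold = self.threshold + laplace(self.noise_scale, self.rng)
        if gap <= noisy_threshold:
            # No sign of overfitting: reveal nothing new about the holdout.
            return train_value
        # Estimates disagree: answer from the holdout, with added noise.
        return holdout_value + laplace(self.noise_scale, self.rng)

# With the noise switched off the behaviour is easy to follow.
checker = ReusableHoldout(threshold=0.04, noise_scale=0.0)
agrees = checker.query(train_value=0.80, holdout_value=0.79)
overfit = checker.query(train_value=0.90, holdout_value=0.70)
```

The analyst calls `query` the same way they would read off a holdout score, which matches the point that the interface is unchanged. The noise and the thresholding are what limit how much each answer reveals about the holdout set.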

Reliable benchmarks

A closely related work with Avrim Blum dives deeper into the problem of maintaining a reliable leaderboard in machine learning competitions (see this blog post for more background). While the reusable holdout could directly be used for this purpose, it turns out that a variant of the reusable holdout, which we call the Ladder algorithm, provides even better accuracy.

This method is not just useful for machine learning competitions, since there are many problems that are roughly equivalent to that of maintaining an accurate leaderboard in a competition. Consider, for example, a performance benchmark that a company uses to test improvements to a system internally before deploying them in a production system. As the benchmark data set is used repeatedly and adaptively for tasks such as model selection, hyper-parameter search and testing, there is a danger that eventually the benchmark becomes unreliable.
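The core idea behind a Ladder-style leaderboard can be sketched in a few lines: a new score is reported only when it beats the previous best by at least a fixed step, and the reported value is rounded to that step. This is a rough reading of the idea rather than the published algorithm in full; the class name and step size are assumptions.

```python
class Ladder:
    """Hypothetical leaderboard that only reports clear improvements."""

    def __init__(self, step_size=0.01):
        self.step_size = step_size
        self.best = float("inf")  # best reported loss so far

    def submit(self, holdout_loss):
        """Return the leaderboard value after a submission with this loss."""
        if holdout_loss < self.best - self.step_size:
            # Clear improvement: report it, rounded to the step size.
            steps = round(holdout_loss / self.step_size)
            self.best = steps * self.step_size
        # Otherwise the submitter learns nothing new about the holdout.
        return self.best

board = Ladder(step_size=0.01)
first = board.submit(0.30)    # first submission sets the bar
second = board.submit(0.295)  # too small an improvement: unchanged
third = board.submit(0.27)    # clear improvement: reported
```

Small fluctuations in holdout loss, which are the main thing an adaptive competitor can exploit, never show up on the board, so repeated submissions leak far less information.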

Conclusion

Modern data analysis is inherently an adaptive process. Attempts to limit what data scientists will do in practice are ill-fated. Instead we should create tools that respect the usual workflow of data science while at the same time increasing the reliability of data-driven insights. It is our goal to continue exploring ways to make validation more reliable and to build benchmarks that track true performance more accurately than existing methods.

See through the clouds with Earth Engine and Sentinel 1 Data

Posted by gilogo at 9:39 AM Labels: 1, and, clouds, computer, data, earth, engine, see, sentinel, the, through, with

Posted by Luc Vincent, Engineering Director, Geo Imagery

This year the Google Earth Engine team attended the European Geosciences Union General Assembly meeting in Vienna, Austria to engage with a number of European geoscientific partners. This was just the first of a series of European summits the team has attended over the past few months, including, most recently, the IEEE Geoscience and Remote Sensing Society meeting held last week in Milan, Italy.

Noel Gorelick presenting Google Earth Engine at EGU 2015.

We are very excited to be collaborating with many European scientists from esteemed institutions such as the European Commission Joint Research Centre, Wageningen University, and University of Pavia. These researchers are utilizing the Earth Engine geospatial analysis platform to address issues of global importance in areas such as food security, deforestation detection, urban settlement detection, and freshwater availability.

Thanks to the enlightened free and open data policy of the European Commission and European Space Agency, we are pleased to announce the availability of Copernicus Sentinel-1 data through Earth Engine for visualization and analysis. Sentinel-1, a radar imaging satellite with the ability to see through clouds, is the first of at least 6 Copernicus satellites going up in the next 6 years.

Sentinel-1 data visualized using Earth Engine, showing Vienna (left) and Milan (right).

Wind farms seen off the Eastern coast of England.

This radar data offers a powerful complement to other optical and thermal data from satellites like Landsat, that are already available in the Earth Engine public data catalog. If you are a geoscientist interested in accessing and analyzing the newly available EC/ESA Sentinel-1 data, or anything else in our multi-petabyte data catalog, please sign up for Google Earth Engine.

We look forward to further engagements with the European research community and are excited to see what the world will do with the data from the European Union’s Copernicus program satellites.

Doing Data Science with coLaboratory

Posted by gilogo at 5:04 AM Labels: colaboratory, computer, data, doing, science, with

Posted by Kayur Patel, Kester Tong, Mark Sandler, and Corinna Cortes, Google Research

Building products and making decisions based on data is at the core of what we do at Google. Increasingly common among fields such as journalism and government, this data-driven mindset is changing the way traditionally non-technical organizations do work. In order to bring this approach to even more fields, Google Research is excited to be a partner in the coLaboratory project, a new tool for data science and analysis, designed to make collaborating on data easier.

Created by Google Research, Matthew Turk (creator of the yt visualization package), and the IPython/Jupyter development team, coLaboratory merges successful open source products with Google technologies, enabling multiple people to collaborate directly through simultaneous access and analysis of data. This provides a big improvement over ad-hoc workflows involving emailing documents back and forth.

Setting up an environment for collaborative data analysis can be a hurdle, as requirements vary among different machines and operating systems, and installation errors can be cryptic. The coLaboratory Chrome App addresses this hurdle: one click installs coLaboratory, IPython, and a large set of popular scientific Python libraries (with more on the way). Furthermore, because we use Portable Native Client (PNaCl), coLaboratory runs at native speeds and is secure, allowing new users to start working with IPython faster than ever.

In addition to ease of installation, coLaboratory enables collaboration between people with different skill sets. One example of this would be interactions between programmers who write complex logic in code and non-programmers who are more familiar with GUIs. As shown below, a programmer writes code (step 1) and then annotates that code with simple markup to create an interactive form (step 2). The programmer can then hide the complexity of code to show only the form (step 3), which allows a non-programmer to re-run the code by changing the slider and dropdowns in the form (step 4). This interaction allows programmers to write complex logic in code and allows non-programmers to manipulate that logic through simple GUI hooks.

For more information about this project please see our talks on collaborative data science and zero dependency python. In addition to our external partners in the coLaboratory project, we would like to thank everyone at Google who contributed: the Chromium Native Client team, the Google Drive team, the Open Source team, and the Security team.

Making Sense of Data with Google

Posted by gilogo at 8:47 PM Labels: computer, data, google, making, of, sense, with

Posted by John Atwood, Program Manager

In September 2013, Google announced joining forces with edX to contribute to their open source platform, Open edX. Since then we’ve been working together to expand this open education ecosystem. We’re pleased to announce our first online course built using Open edX. Making Sense of Data showcases the collaborative technology of Google and edX using cbX to run Open edX courses on Google App Engine.

The world is filled with lots of information; learning to make sense of it all helps us to gain perspective and make decisions. We’re pleased to share tools and techniques to structure, visualize, and analyze information in our latest self-paced, online course: Making Sense of Data.

Making Sense of Data is intended for anybody who works with data on a daily basis, such as students, teachers, journalists, and small business owners, and who wants to learn more about how to apply that information to practical problems. Participants will learn about the data process, create and use Fusion Tables (an experimental tool), and look for patterns and relationships in data. Knowledge of statistics or experience with programming is not required.

Like past courses, participants engage with course material through a combination of video and text lessons, activities, and projects. In this course, we will also introduce some new features that help create a more engaging participant experience. For example, participants will be able to access instant hangouts and live chats from the course web page for quick help or for direct feedback. As with all of our MOOCs, you’ll learn from Google experts and collaborate with participants worldwide. You’ll also have the opportunity to complete a final project and apply the skills you’ve learned to earn a certificate.

Making Sense of Data runs from March 18 to April 4, 2014. Visit g.co/datasense to learn more and register today. We look forward to seeing you make sense of all the information out there!

Opening up Course Builder data

Posted by gilogo at 4:28 PM Labels: builder, computer, course, data, opening, up

Posted by John Cox and Pavel Simakov, Course Builder Team, Google Research

Course Builder is an experimental, open source platform for delivering massive online open courses. When you run Course Builder, you own everything from the production instance to the student data that builds up while your course is running.

Part of being open is making it easy for you to access and work with your data. Earlier this year we shipped a tool called ETL (short for extract-transform-load) that you can use to pull your data out of Course Builder, run arbitrary computations on it, and load it back. We wrote a post that goes into detail on how you can use ETL to get copies of your data in an open, easy-to-read format, as well as write custom jobs for processing that data offline.

Now we’ve taken the next step and added richer data processing tools to ETL. With them, you can build data processing pipelines that analyze large datasets with MapReduce. Inside Google we’ve used these tools to learn from the courses we’ve run. We provide example pipelines ranging from the simple to the complex, along with formatters to convert your data into open formats (CSV, JSON, plain text, and XML) that play nice with third-party data analysis tools.
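To give a feel for what such a pipeline does, here is a tiny map-and-reduce job written in plain Python. It does not use Course Builder's ETL or MapReduce APIs; the record layout, field names, and event types are invented for illustration, and a real job would read exported data rather than an inline string.

```python
import csv
import io
import json
from collections import defaultdict

# Hypothetical export: one JSON object per line (field names are invented).
exported = """\
{"student": "s1", "event": "activity_completed", "unit": 1}
{"student": "s2", "event": "video_watched", "unit": 1}
{"student": "s1", "event": "activity_completed", "unit": 2}
{"student": "s2", "event": "activity_completed", "unit": 1}
"""

def map_record(line):
    """Map step: emit (student, 1) for every completed activity."""
    record = json.loads(line)
    if record["event"] == "activity_completed":
        yield record["student"], 1

def reduce_counts(pairs):
    """Reduce step: sum the emitted values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

pairs = (pair
         for line in exported.splitlines() if line.strip()
         for pair in map_record(line))
totals = reduce_counts(pairs)

# Format the result as CSV so other analysis tools can read it.
buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(["student", "activities_completed"])
for student in sorted(totals):
    writer.writerow([student, totals[student]])
csv_text = buffer.getvalue()
```

The same map and reduce functions would apply unchanged to a much larger export; only the way records are read and distributed would differ.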

We hope that adding robust data processing features to Course Builder will not only provide direct utility to organizations that need to process data to meet their internal business goals, but also make it easier for educators and researchers to gauge the efficacy of the massive online open courses run on the Course Builder platform.

Released Data Set: Features Extracted From YouTube Videos for Multiview Learning

Posted by gilogo at 9:15 AM Labels: computer, data, extracted, features, for, from, learning, multiview, released, set, videos, youtube

Posted by Omid Madani, Senior Software Engineer

“If it looks like a duck, swims like a duck, and quacks like a duck, then it probably is a duck.”

The “duck test”.

Performance of machine learning algorithms, supervised or unsupervised, is often significantly enhanced when a variety of feature families, or multiple views of the data, are available. For example, in the case of web pages, one feature family can be based on the words appearing on the page, and another can be based on the URLs and related connectivity properties. Similarly, videos contain both audio and visual signals where in turn each modality is analyzed in a variety of ways. For instance, the visual stream can be analyzed based on the color and edge distribution, texture, motion, object types, and so on. YouTube videos are also associated with textual information (title, tags, comments, etc.). Each feature family complements others in providing predictive signals to accomplish a prediction or classification task, for example, in automatically classifying videos into subject areas such as sports, music, comedy, games, and so on.

We have released a dataset of over 100k feature vectors extracted from public YouTube videos. These videos are labeled by one of 30 classes, each class corresponding to a video game (with some amount of class noise): each video shows gameplay of a video game, for teaching purposes for example. Each instance (video) is described by three feature families (textual, visual, and auditory), and each family is broken into subfamilies yielding up to 13 feature types per instance. Neither video identities nor class identities are released.

We hope that this dataset will be valuable for research on a variety of multiview related machine learning topics, including multiview clustering, co-training, active learning, classifier fusion and ensembles.
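As one example of classifier fusion across views, the sketch below averages per-class scores from three separately trained classifiers, one per feature family. The class names and scores are invented, and plain averaging is only one of many possible fusion rules.

```python
def fuse(view_scores):
    """Average per-class scores across views; return (fused, top class)."""
    classes = view_scores[0].keys()
    fused = {c: sum(scores[c] for scores in view_scores) / len(view_scores)
             for c in classes}
    predicted = max(fused, key=fused.get)
    return fused, predicted

# Invented scores from three hypothetical per-view classifiers.
textual = {"game_a": 0.6, "game_b": 0.4}
visual = {"game_a": 0.3, "game_b": 0.7}
auditory = {"game_a": 0.2, "game_b": 0.8}

fused, predicted = fuse([textual, visual, auditory])
```

Here the textual view alone would pick game_a, but the visual and auditory views outvote it, which is the "duck test" intuition: several weak, complementary signals agree on the answer.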

The data and more information can be obtained from the UCI machine learning repository (multiview video dataset), or from here.
