
Cloud storage on the Raspberry Pi



I recently started playing around with cloud storage on the Raspberry Pi. I'm already using it as a web server and a media server, so why not this as well?

I started hacking up something to make Dropbox work and then decided that was stupid and found some open source solutions instead.

I found two options that were easy to install and seemed to work decently.

The first one is ownCloud


If you already have Apache installed and running, this one is easy (if not, see here).

All you have to do is install the required packages, download the files, and do some simple setup.

sudo apt-get update && sudo apt-get install curl libcurl3 php5-curl sqlite3 php5 php5-sqlite php5-gd php-xml-parser php5-intl

cd /var/www/
sudo wget http://download.owncloud.org/community/owncloud-5.0.2.tar.bz2
sudo tar -xvf owncloud-5.0.2.tar.bz2
sudo chown -R www-data:www-data owncloud
sudo nano /etc/apache2/sites-enabled/000-default
Then change the line AllowOverride None to AllowOverride All in the /var/www/ section (see the example block below).
sudo a2enmod rewrite
sudo a2enmod headers
sudo service apache2 restart
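
For reference, after that edit the /var/www/ block in 000-default should end up looking something like this (the exact surrounding directives may differ slightly depending on your Raspbian/Apache version; the only line that needs to change is AllowOverride):

<Directory /var/www/>
        Options Indexes FollowSymLinks MultiViews
        AllowOverride All
        Order allow,deny
        allow from all
</Directory>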

Now just go to http://ip-address/owncloud/ from a separate computer, or http://localhost/owncloud/ on the Raspberry Pi itself, and pick a username and password.

Pros:

  • Can access local data from your external drives easily.
  • Can access your Dropbox data.
  • Doesn't require its own service (it runs under Apache).
  • Password protected (it stores a salted hash of the password!).
Cons:
  • Unencrypted SQLite database that anyone can download.
  • Slow!
  • Doesn't automatically sort through your data the way its features imply; you have to upload everything manually.


The second one is Seafile


This one is also pretty easy to install and has a Raspberry Pi-specific version.
I put mine in a directory in my home folder, but you could put it anywhere. For more security, consider creating a dedicated user for Seafile.

sudo apt-get update && sudo apt-get install python2.7 python-setuptools python-simplejson python-imaging sqlite3
cd
mkdir cloud
cd cloud
wget http://seafile.googlecode.com/files/seafile-server_1.5.1_pi.tar.gz
tar -xzf seafile-server_*
mkdir installed
mv seafile-server_* installed
cd seafile-server-*
./setup-seafile.sh

The setup script will install everything and ask a few simple questions to configure things.
To run the Seafile server, do the following from the seafile-server directory:

sudo ./seafile.sh start
sudo ./seahub.sh start
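
To make sure the web interface actually came up, a quick check from the Pi itself is to poke the default port (8000) and confirm that something answers:

curl -I http://localhost:8000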

This has to be done every time your Raspberry Pi restarts, which is easy to automate with cron:
crontab -e

then add:
@reboot sudo /home/pi/cloud/seafile-server-1.5.1/seafile.sh start
@reboot sudo /home/pi/cloud/seafile-server-1.5.1/seahub.sh start

Then save and exit.
For more help, see here.
Now just go to http://ip-address:8000 from a separate computer, or http://localhost:8000 on the Raspberry Pi itself.

Pros:

  • Fast!
  • Doesn't run if you don't want it to.
  • Doesn't expose all of its files through the normal web server.
  • Requires username and password.
Cons:
  • Takes time to start up, and sometimes seahub.sh doesn't start properly (see the workaround sketch below).
  • Unencrypted SQL database (though I haven't found a way to easily grab this database).
  • Can't access your Dropbox data.
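
Since seahub sometimes fails on the first attempt at boot, one workaround is to point the @reboot cron entries at a small wrapper script that retries it. This is just a sketch for my layout (/home/pi/cloud/seafile-server-1.5.1), and it assumes seahub.sh returns a non-zero exit code when it fails to start; adjust the paths and retry count to taste.

#!/bin/sh
# /home/pi/cloud/start-seafile.sh
# Start seafile, then retry seahub a few times if it does not come up.
SEAFILE_DIR=/home/pi/cloud/seafile-server-1.5.1

"$SEAFILE_DIR/seafile.sh" start

for i in 1 2 3; do
    # assumes seahub.sh exits non-zero when it fails to start
    "$SEAFILE_DIR/seahub.sh" start && break
    echo "seahub failed to start (attempt $i), retrying in 30 seconds..."
    sleep 30
done

With that in place (and chmod +x on the script), the two @reboot lines above can be replaced by a single entry: @reboot sudo /home/pi/cloud/start-seafile.sh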

After a quick trial with the two of them, I'm going with Seafile. I haven't removed ownCloud yet, but I'm seriously considering it due to the speed and security issues.

Please let me know your experiences with the two of them and any more options to try out!


Consider donating to further my tinkering.



Facilitating Genomics Research with Google Cloud Platform



The understanding of the origin and progression of cancer remains in its infancy. However, due to rapid advances in the ability to accurately read and identify (i.e. sequence) the DNA of cancerous cells, the knowledge in this field is growing rapidly. Several comprehensive sequencing studies have shown that alterations of single base pairs within the DNA, known as Single Nucleotide Variants (SNVs), or duplications, deletions and rearrangements of larger segments of the genome, known as Structural Variations (SVs), are the primary causes of cancer and can influence what drugs will be effective against an individual tumor.

However, one of the major roadblocks hampering progress is the availability of accurate methods for interpreting genome sequence data. Due to the sheer volume of genomics data (the entire genome of just one person produces more than 100 gigabytes of raw data!), the ability to precisely localize a genomic alteration (SNV or SV) and resolve its association with cancer remains a considerable research challenge. Furthermore, preliminary benchmark studies conducted by the International Cancer Genome Consortium (ICGC) and The Cancer Genome Atlas (TCGA) have discovered that different mutation calling software run on the same data can result in detection of different sets of mutations. Clearly, optimization and standardization of mutation detection methods is a prerequisite for realizing personalized medicine applications based on a patient’s own genome.

The ICGC and TCGA are working to address this issue through an open community-based collaborative competition, run in conjunction with leading research institutions: the Ontario Institute for Cancer Research, University of California Santa Cruz, Sage Bionetworks, IBM-DREAM, and Oregon Health & Science University. Together, they are running the DREAM Somatic Mutation Calling Challenge, in which researchers from across the world “compete” to find the most accurate SNV and SV detection algorithms. By creating a living benchmark for mutation detection, the DREAM Challenge aims to improve standard methods for identifying cancer-associated mutations and rearrangements in tumor and normal samples from whole-genome sequencing data.

Given Google’s recent partnership with the Global Alliance for Genomics and Health, we are excited to provide cloud computing resources on Google Cloud Platform for competitors in the DREAM Challenge, enabling scientists who do not have ready access to large local computer clusters to participate with open access to contest data as well as credits that can be used for Google Compute Engine virtual machines. By leveraging the power of cloud technologies for genomics computing, contestants have access to powerful computational resources and a platform that allows the sharing of data. We hope to democratize research, foster the open access of data, and spur collaboration.

In addition to the core Google Cloud Platform infrastructure, the Google Genomics team has implemented a simple web-based API to store, process, explore, and share genomic data at scale. We have made the Challenge datasets available through the Google Genomics API. The challenge includes both simulated tumor data for which the correct answers are known and real tumor data for which the correct answers are not known.
Genomics API Browser showing a particular cancer variant position (highlighted) in dataset in silico #1 that was missed by many challenge participants.
Although submissions for the simulated data can be scored immediately, the winners on the real tumor data will not immediately be known when the challenge closes. This is a consequence of the fact that current DNA sequencing technology does not provide 100% accurate data, which adds to the complexity of the problem these algorithms are attempting to tackle. Therefore, to identify the winners, researchers must turn to alternative laboratory technologies to verify whether a particular mutation found in the sequencing data is real (or likely to be). As such, additional data will be collected after the Challenge is complete in order to determine the winner. The organizers will re-sequence DNA from the cells of the real tumor using an independent sequencing technology (Ion Torrent), specifically examining regions overlapping the positions of the cancer mutations submitted by the contest participants.

As an analogy, a "scratched magnifying glass" is used to examine the genome the first time around. The second time around, a "stronger magnifying glass with scratches in different places" is used to look at the specific locations in the genome reported by the challenge participants. By combining the data collected by those two different "magnifying glasses", and then comparing that against the cancer mutations submitted by the contest participants, the winner will then be determined.

We believe we are at the beginning of a transformation in medicine and basic research, driven by advances in genome sequencing and computing at scale. With the DREAM Challenge, we are all excited to be part of bringing researchers around the world to focus on this particular cancer research problem. To learn more about how to participate in the challenge, register here.

Collaborative Mathematics with SageMathCloud and Google Cloud Platform



(cross-posted on the Google for Education blog and Google Cloud Platform blog)

Modern mathematics research is distinguished by its openness. The notion of "mathematical truth" depends on theorems being published with proof, letting the reader understand how new results build on the old, all the way down to basic mathematical axioms and definitions. These new results become tools to aid further progress.

Nowadays, many of these tools come either in the form of software or theorems whose proofs are supported by software. If new tools produce unexpected results, researchers must be able to collaborate and investigate how those results came about. Trusting software tools means being able to inspect and modify their source code. Moreover, open source tools can be modified and extended when research veers in new directions.

In an attempt to create an open source tool to satisfy these requirements, University of Washington Professor William Stein built SageMathCloud (or SMC). SMC is a robust, low-latency web application for collaboratively editing mathematical documents and code. This makes SMC a viable platform for mathematics research, as well as a powerful tool for teaching any mathematically-oriented course. SMC is built on top of standard open-source tools, including Python, LaTeX, and R. In 2013, William received a Google Research Award, which provided Google Cloud Platform credits for SMC development. This allowed William to extend SMC to use Google Compute Engine as a hosting platform, achieving better scalability and global availability.
SMC allows users to interactively explore 3D graphics with only a browser
SMC has its roots in 2005, when William started the Sage project in an attempt to create a viable free and open source alternative to existing closed-source mathematical software. Rather than starting from scratch, Sage was built by making the best existing open-source mathematical software work together transparently and filling in any gaps in functionality.

During the first few years, Sage grew to have about 75K active users, while the developer community matured with well over 100 contributors to each new Sage release and about 500 developers contributing peer-reviewed code.

Inspired by Google Docs, William and his students built the first web-based interface to Sage in 2006, called The Sage Notebook. However, The Sage Notebook was designed for a small number of users; it worked well for a small group (such as a single class), but soon became difficult to maintain for larger groups, let alone the whole web.

As the growth of new users for Sage began to stall in 2010, due largely to installation complexity, William turned his attention to finding ways to expand Sage's availability to a broader audience. Based on his experience teaching his own courses with Sage, and feedback from others doing the same, William began building a new web-hosted version of Sage that could scale to the next generation of users.

The result is SageMathCloud, a highly distributed multi-datacenter application that creates a viable way to do computational mathematics collaboratively online. SMC uses a wide variety of open source tools, from languages (CoffeeScript, node.js, and Python) to infrastructure-level components (especially Cassandra, ZFS, and bup) and a number of in-browser toolkits (such as CodeMirror and three.js).

Latency is critical for collaborative tools: like an online video game, everything in SMC is interactive. The initial versions of SMC were hosted at UW, at which point the distance between Seattle and far-away continents was a significant issue, even for the fastest networks. The global coverage of Google Cloud Platform provides a low-latency connection to SMC users around the world that is both fast and stable. It's not uncommon for long-running research computations to last days or even weeks, and here the robustness of Google Compute Engine, with machines live-migrating during maintenance, is crucial. Without it, researchers would often face multiple restarts and delays, or would invest in engineering around the problem, taking time away from the core research.

SMC sees use across a number of areas, especially:

  • Teaching: any course with a programming or math software component, where you want all your students to be able to use that component without dealing with the installation pain. Also, SMC allows students to easily share files, and even work together in real time. There are dozens of courses using SMC right now.
  • Collaborative Research: all co-authors of a paper can work together in an SMC project, both writing the paper there and doing research-level computations.

Since SMC launched in May 2013, more than 20,000 monthly active users have started using Sage via SMC. We look forward to seeing if SMC has an impact on the number of active users of Sage, and are excited to learn about the collaborative research and teaching that it makes possible.