The Internet in Real-Time
How Quickly Data is Generated

Click the animation to open the full version (via Penny Stocks Lab).
Map Reduce: A really simple introduction
Ever since Google published its research paper on MapReduce, you have been hearing about it here and there. If you have until now considered MapReduce a mysterious buzzword and ignored it, know that it is not. The basic concept is really very simple, and in this tutorial I try to explain it in the simplest way that I can. Note that I have intentionally left out some deeper details to keep it friendly to a beginner.
Read the full article: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/
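To make the basic concept concrete, here is a minimal single-machine sketch of the classic word count in Python. It is not taken from the linked article and it does not use Hadoop; the function names (map_phase, shuffle, reduce_phase, word_count) are made up for illustration. It only mimics the three steps a MapReduce framework would normally distribute across many machines.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by key.

    A real framework does this between map and reduce; here it is a dict.
    """
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence on a list of strings."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    docs = ["the quick brown fox", "the lazy dog", "the fox"]
    print(word_count(docs))
```

Because each call to map_phase looks at only one document and each call to reduce_phase looks at only one word, both steps could in principle run in parallel, which is the point of the model.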
Building an R Hadoop System

After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about 2 weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience and the steps to achieve that are presented at http://www.rdatamining.com/tutorials/rhadoop. Hopefully it will make it easier for R users who are new to Hadoop to try RHadoop. Note that I tried this on Mac only, and some steps might be different for Windows.
1. Install Hadoop
2. Run Hadoop
3. Install R
4. Install RHadoop
5. Run R jobs on Hadoop
6. What’s Next
I have not tested this yet, but I will in the next 2 or 3 weeks!
YouTube Wordcount MapReduce in R
http://www.youtube.com/watch?v=hSrW0Iwghtw
The Open Compute Project


Facebook has been able to quantify energy efficiency gains of 38% for new servers conforming to the specs of The Open Compute Project, said Matt Corddry, Director of Hardware Engineering at Facebook, speaking at the Open Server Summit in Santa Clara, California. Moreover, the new servers deliver a 24% cost savings compared to generic OEM servers.
The Open Compute Project, which Facebook launched in April 2011, has resulted in vastly simplified Compute Servers, Storage JBODs and an innovative Open Rack System.
Source: http://www.convergedigest.com/2013/10/open-server-summit-open-compute.html

Hadoop – delivery – distillation
elasticsearch

visualize logs and time-stamped data
elasticsearch works seamlessly with kibana to let you see and interact with your data
manage events and logs
elasticsearch works seamlessly with logstash to collect, parse, index, and search logs
search your hadoop data and get real-time results
deep api integration makes searching data in hadoop easy

Web server log sizes
For example, on a site that used to get around 2 million page views per day, the log size had 133,910,121 entries for only one week, and consumed over 38 gigabytes!
Source: http://2bits.com/drupal-performance/reducing-size-and-io-load-apaches-web-server-log-files.html
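The figures quoted above can be turned into a few rough per-day and per-entry numbers. The sketch below is only back-of-the-envelope arithmetic: it assumes "38 gigabytes" means 38 × 10^9 bytes (the source says "over 38", so this is a lower bound) and that the week in question really saw about 2 million page views per day. The function name log_stats is made up for illustration.

```python
# Figures quoted from the 2bits.com article
ENTRIES_PER_WEEK = 133_910_121
PAGE_VIEWS_PER_DAY = 2_000_000

# Assumption: decimal gigabytes, and exactly 38 of them (a lower bound)
LOG_BYTES = 38 * 10**9


def log_stats(entries_per_week, log_bytes, page_views_per_day):
    """Derive rough per-day and per-entry figures from weekly log totals."""
    entries_per_day = entries_per_week / 7
    return {
        "entries_per_day": entries_per_day,
        "entries_per_page_view": entries_per_day / page_views_per_day,
        "bytes_per_entry": log_bytes / entries_per_week,
    }


if __name__ == "__main__":
    for name, value in log_stats(
        ENTRIES_PER_WEEK, LOG_BYTES, PAGE_VIEWS_PER_DAY
    ).items():
        print(f"{name}: {value:,.1f}")
```

Under these assumptions the log grows by roughly 19 million entries per day, which is about 9.6 log entries per page view at around 284 bytes each. Each page view produces several log entries because images, stylesheets and scripts are requested separately.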
Other interesting links…
http://zoompf.com/blog/2009/11/performance-questions-to-ask-hosting-providers-log-file-access
Facebook’s newest data center – in Luleå, Sweden
On the edge of the Arctic Circle, where the River Lule meets the Gulf of Bothnia, lies a very important building. Facebook’s newest data center – in Luleå, Sweden – is now handling live traffic from around the world.
Source: http://www.facebook.com/luleadatacenter







What is Hadoop? Other big data terms like MapReduce? Cloudera’s CEO talks us through big data trends
Facebook High-Tech Cold Storage Data Center
Photos weren’t taken by me!
I wish!
Images by Taylor Hatmaker for ReadWrite
Facebook (arguably) owns more data than God.
Photos stolen from http://kellisis.wordpress.com/2013/10/17/photo-tour-inside-facebooks-new-high-tech-cold-storage-data-center-2/




