Category Archives: Big Data

Map Reduce: A really simple introduction

Ever since Google published its research paper on MapReduce, you have been hearing about it here and there. If you have until now considered MapReduce a mysterious buzzword and ignored it, know that it is not. The basic concept is really very simple, and in this tutorial I try to explain it in the simplest way that I can. Note that I have intentionally left out some deeper details to make it really friendly to a beginner.

Read the full article: http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/
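
To give a feel for how simple the basic concept is, here is a minimal word-count sketch in plain Python. It is not taken from the linked article and it does not use Hadoop. The function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are made up for this illustration, and everything runs in a single process, whereas a real framework would spread the map and reduce calls across many machines.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key (the framework does this between steps)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce step: combine all values seen for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over an iterable of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["the quick brown fox", "the lazy dog", "The fox"])
print(counts)
```

Each call to `map_phase` looks at only one line and each call to `reduce_phase` looks at only one word, which is what allows a framework such as Hadoop to run those calls in parallel.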

Building an R Hadoop System

After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about 2 weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience and the steps to achieve that are presented at http://www.rdatamining.com/tutorials/rhadoop. Hopefully it will make it easier for R users who are new to Hadoop to try RHadoop. Note that I tried this on Mac only, and some steps might be different on Windows.

1. Install Hadoop
2. Run Hadoop
3. Install R
4. Install RHadoop
5. Run R jobs on Hadoop
6. What’s Next

Source: http://www.rdatamining.com/tutorials/rhadoop

I have not tested this yet, but I will in the next two to three weeks!

YouTube Wordcount MapReduce in R
http://www.youtube.com/watch?v=hSrW0Iwghtw

The Open Compute Project

Facebook has been able to quantify energy efficiency gains of 38% for new servers conforming to the specs of The Open Compute Project, said Matt Corddry, Director of Hardware Engineering at Facebook, speaking at the Open Server Summit in Santa Clara, California. Moreover, the new servers deliver a 24% cost savings compared to generic OEM servers.

The Open Compute Project, which Facebook launched in April 2011, has resulted in vastly simplified Compute Servers, Storage JBODs and an innovative Open Rack System.

Source: http://www.convergedigest.com/2013/10/open-server-summit-open-compute.html

 

elasticsearch

visualize logs and time-stamped data

elasticsearch works seamlessly with kibana to let you see and interact with your data

manage events and logs

elasticsearch works seamlessly with logstash to collect, parse, index, and search logs

search your hadoop data and get real-time results

deep api integration makes searching data in hadoop easy