Screencast: Building a Fast Data Front End for Hadoop
Date: This event took place live on June 24, 2015.
Presented by: John Hugg
Duration: Approximately 60 minutes.
Massive increases in both the volume and velocity of data have led to interactive, real-time applications built on fast streaming data. These fast data applications are often the front ends to Big Data (data at rest) and require integration between Fast + Big. To provide maximum value, they need a data pipeline that can compute real-time analytics on fast-moving data, make transactional decisions against state, and ultimately deliver data at high speed to long-term Hadoop-based analytics stores such as those from Cloudera, Hortonworks, and MapR.
The new challenge is building applications that tap fast data and integrate seamlessly with the value contained in data stores, combining machine learning and dynamic processing. A variety of approaches are employed, including Apache Storm, Spark Streaming, Samza, and in-memory database technology.
During this webcast you will learn:
- The pros and cons of the various approaches used to create fast data applications
- The pros and cons of Lambda and Kappa architectures compared to more traditional approaches
- The tradeoffs and advantages surrounding the resurgence of ACID and SQL
- How integration with the Hadoop ecosystem can reduce latency and improve transactional intelligence
About John Hugg, Founding Engineer
John Hugg is a Software Developer at VoltDB. He has spent his entire career working with databases and information management. In 2008, he was lured away from a Ph.D. program by Mike Stonebraker to work on what became VoltDB. As the first engineer on the product, he liaised with a team of academics at MIT, Yale, and Brown who were building H-Store, VoltDB’s research prototype. Then he helped build the world-class engineering team at VoltDB to continue development of the open source and commercial products.
Magnus Daum and Stefan Lucks have created two PostScript files with identical MD5 hashes: one is a letter of recommendation, the other a security clearance.
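To check a collision like this yourself, hashing both files should print the same digest. A minimal sketch in base R, with placeholder file names standing in for the two PostScript files:

# MD5 digest of each file; identical digests demonstrate the collision.
tools::md5sum(c("letter_of_rec.ps", "order.ps"))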
I use Redis for some personal stuff that I’m developing…
I’m running RedisDesktopManager 0.7.6… the new version, 0.7.6.9, looks awesome!
Here are a few things we can use and read about MySQL and JSON.
Small benchmark tests of reading 20,000 records
Krasimir Tsonev has run small benchmark tests of reading 20,000 records from a MySQL database and from a JSON file.
I’m just going to give the *end* results. You can read the whole article on Krasimir’s blog about the MySQL vs. JSON file data storage benchmark results.
ab -n 30 -c 30 http://localhost/mqsql.php
Concurrency Level: 30
Time taken for tests: 30.518 seconds
ab -n 30 -c 30 http://localhost/json.php
Concurrency Level: 30
Time taken for tests: 3.384 seconds
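In other words, with the same 30 concurrent requests, the JSON file version finishes roughly nine times faster: 30/3.384 ≈ 8.9 requests per second versus 30/30.518 ≈ 1.0 for MySQL.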
MySQL query to JSON
The MySQL query (reconstructed here; the username/email columns are illustrative) builds the JSON string with CONCAT and GROUP_CONCAT:

SELECT CONCAT("[",
  GROUP_CONCAT(
    CONCAT("{username:'", username, "'"),
    CONCAT(",email:'", email, "'}")
  ),
"]") AS json FROM users;

The JSON result comes back as a single string along the lines of:

[{username:'alice',email:'alice@example.com'},{username:'bob',email:'bob@example.com'}]
I got this from http://www.thomasfrank.se/mysql_to_json.html.
The Internet in Real-Time
How Quickly Data is Generated
Click the animation to open the full version (via Penny Stocks Lab).
Ever since Google published its research paper on MapReduce, you have been hearing about it here and there. If you have until now considered MapReduce a mysterious buzzword and ignored it, know that it’s not. The basic concept is really simple, and in this tutorial I try to explain it in the simplest way I can. Note that I have intentionally left out some deeper details to keep it beginner-friendly.
Read the full article at http://ksat.me/map-reduce-a-really-simple-introduction-kloudo/
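To make the idea concrete, here is a toy word count in plain R (illustrative data, no Hadoop involved) that mimics the map, shuffle, and reduce phases:

# Input: a few lines of text.
lines <- c("the quick brown fox", "the lazy dog", "the fox")

# Map: split every line into words, emitting one (word, 1) pair per word.
words <- unlist(lapply(lines, function(l) strsplit(l, " ")[[1]]))

# Shuffle: group the 1s under their word key.
groups <- split(rep(1, length(words)), words)

# Reduce: sum the counts within each group.
counts <- sapply(groups, sum)
counts
# brown   dog   fox  lazy quick   the
#     1     1     2     1     1     3

A real MapReduce framework runs these phases in parallel across many machines, but the data flow is the same.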
After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about two weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience and the steps to get there are presented at http://www.rdatamining.com/tutorials/rhadoop. Hopefully this will make it easier for R users who are new to Hadoop to try RHadoop. Note that I tried this on Mac only, and some steps might differ on Windows.
1. Install Hadoop
2. Run Hadoop
3. Install R
4. Install RHadoop
5. Run R jobs on Hadoop (a minimal example follows this list)
6. What’s Next
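For step 5, a first job can be tiny. The sketch below assumes a working Hadoop installation and the rmr2 package; it uses the squares example common in rmr2 tutorials, pushing a vector into HDFS, squaring each value in a map-only job, and reading the result back:

library(rmr2)

# Write a small vector into HDFS.
small.ints <- to.dfs(1:1000)

# Map-only MapReduce job: emit (v, v^2) for every input value.
result <- mapreduce(input = small.ints,
                    map = function(k, v) keyval(v, v^2))

# Read the key/value pairs back from HDFS.
squares <- from.dfs(result)
head(squares$val)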
I haven’t tested it yet, but I will in the next two or three weeks!
YouTube: Wordcount MapReduce in R