Category Archives: Big Data

NSA data collection about ‘population control’ not law enforcement – whistleblower

FBI whistleblower Jesselyn Radack joins RT America’s Simone Del Rosario to discuss the growing concern that the NSA is collecting so much data that it can no longer be effective in preventing terror. Radack says the terror attacks of 9/11 created a ‘blank check’ wherein the usual constraints on surveillance were removed, including probable cause and the necessity of getting a warrant before conducting domestic data collection.

Microsoft’s Software is Malware

Microsoft Back Doors

Microsoft Sabotage

The wrongs in this section are not precisely malware, since they do not involve making the program run in a way that hurts the user. But they are a lot like malware, since they are technical Microsoft actions that harm the users of specific Microsoft software.

Microsoft Surveillance

Microsoft DRM

Microsoft Jails

Microsoft Tyrants

As this page shows, if you do want to clean your computer of malware, the first software to delete is Windows.

Building a Fast Data Front End for Hadoop

Screencast: Building a Fast Data Front End for Hadoop.

Date: This event took place live on June 24, 2015
Presented by: John Hugg
Duration: Approximately 60 minutes.
Cost: Free


Massive increases in both the volume and velocity of data have led to the development of interactive, real-time applications on fast streaming data. These fast data applications are often the front ends to Big Data (data at rest) and require integration between Fast + Big. To provide maximum value they require a data pipeline with the ability to compute real-time analytics on fast moving data, to make transactional decisions against state, and ultimately deliver data at high speeds to long-term Hadoop-based analytics stores like Cloudera, Hortonworks and MapR.

The new challenge is building applications that tap fast data and seamlessly integrate with the value contained in data stores — combining machine learning and dynamic processing. A variety of approaches are employed including Apache Storm, Spark Streaming, Samza, and in-memory database technology.

During this webcast you will learn:

  • The pros and cons of the various approaches used to create fast data applications
  • The pros and cons of Lambda and Kappa architectures compared to more traditional approaches
  • The tradeoffs and advantages surrounding the resurgence of ACID and SQL
  • How integration with the Hadoop ecosystem can reduce latency and improve transactional intelligence
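To make the “front end to Big Data” idea concrete, here is a minimal sketch in Python (my own toy code, not VoltDB’s API): each incoming event updates in-memory state, a per-event decision is made against that state, and aggregates are periodically flushed toward a batch store (a plain list standing in for an HDFS sink).

```python
from collections import Counter

class FastDataFrontEnd:
    """Toy fast-data front end: per-event decisions against in-memory
    state, with periodic flushes of aggregates toward a batch store."""

    def __init__(self, flush_every=3):
        self.counts = Counter()   # real-time analytics state
        self.flushed = []         # stand-in for an HDFS/Hive sink
        self.flush_every = flush_every
        self.seen = 0

    def on_event(self, event):
        # Transactional decision against current state: flag heavy users.
        self.counts[event["user"]] += 1
        decision = "throttle" if self.counts[event["user"]] > 2 else "ok"
        self.seen += 1
        if self.seen % self.flush_every == 0:
            self.flush()
        return decision

    def flush(self):
        # In a real pipeline this snapshot would be appended to HDFS.
        self.flushed.append(dict(self.counts))

frontend = FastDataFrontEnd()
events = [{"user": u} for u in ["a", "b", "a", "a", "c", "a"]]
decisions = [frontend.on_event(e) for e in events]
print(decisions)
```

The point of the sketch is the division of labor: decisions happen synchronously on hot, in-memory state, while the durable analytics store only ever sees batched aggregates.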

About John Hugg, Founding Engineer

John Hugg is a Software Developer at VoltDB. He has spent his entire career working with databases and information management. In 2008, he was lured away from a Ph.D. program by Mike Stonebraker to work on what became VoltDB. As the first engineer on the product, he liaised with a team of academics at MIT, Yale, and Brown who were building H-Store, VoltDB’s research prototype. Then he helped build the world-class engineering team at VoltDB to continue development of the open source and commercial products.



MySQL vs/and/plus/more JSON

Here are a few things that we can use / read about MySQL and JSON.

Small benchmark test: reading 20,000 records

Krasimir Tsonev has made a small benchmark test of reading 20,000 records from a MySQL database and from a JSON file.

I’ll just include the *end* results here. You can read the full article on Krasimir’s blog about his MySQL vs. JSON file data-storing benchmark results.


ab -n 30 -c 30 http://localhost/mqsql.php
Concurrency Level:      30
Time taken for tests:   30.518 seconds



ab -n 30 -c 30 http://localhost/json.php
Concurrency Level:      30
Time taken for tests:   3.384 seconds
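If you want to reproduce the spirit of the test without the PHP/Apache setup, here is a rough Python sketch (my own, with an in-memory SQLite database standing in for MySQL, so the absolute numbers will differ from Krasimir’s):

```python
import json
import sqlite3
import tempfile
import time

# Build a throwaway dataset of 20,000 rows in both storage formats.
rows = [{"id": i, "value": "item-%d" % i} for i in range(20000)]

with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump(rows, f)
    json_path = f.name

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER, value TEXT)")
db.executemany("INSERT INTO items VALUES (?, ?)",
               [(r["id"], r["value"]) for r in rows])
db.commit()

# Time a full read from the JSON file...
t0 = time.perf_counter()
with open(json_path) as f:
    json_rows = json.load(f)
t_json = time.perf_counter() - t0

# ...and a full read from the database.
t0 = time.perf_counter()
db_rows = db.execute("SELECT id, value FROM items").fetchall()
t_db = time.perf_counter() - t0

print("json file: %.4fs, sqlite: %.4fs" % (t_json, t_db))
```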

MySQL query to JSON
username   email
mike       [email protected]
jane       [email protected]
stan       [email protected]

The MySQL query

SELECT GROUP_CONCAT(
  CONCAT("{username:'", username, "',email:'", email, "'}")
) AS json FROM users;

The JSON result

     {username:'mike',email:'[email protected]'},
     {username:'jane',email:'[email protected]'},
     {username:'stan',email:'[email protected]'}

Got this from
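An alternative to building the JSON string inside SQL is to fetch plain rows and serialize them in application code. A small Python sketch of that approach (my own; sqlite3 stands in for MySQL, and the example.com addresses are made-up placeholders since the originals are obfuscated):

```python
import json
import sqlite3

# sqlite3 stands in for MySQL; the addresses are placeholder values.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE users (username TEXT, email TEXT)")
db.executemany(
    "INSERT INTO users VALUES (?, ?)",
    [("mike", "mike@example.com"),
     ("jane", "jane@example.com"),
     ("stan", "stan@example.com")],
)

# Fetch rows, then serialize to JSON in application code instead of SQL.
cursor = db.execute("SELECT username, email FROM users")
users = [{"username": u, "email": e} for u, e in cursor]
print(json.dumps(users, indent=2))
```

Serializing in the application also gets you properly quoted, standards-compliant JSON for free, which hand-built CONCAT strings do not guarantee.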







Map Reduce: A really simple introduction

Ever since Google published its research paper on MapReduce, you have been hearing about it here and there. If you have until now considered MapReduce a mysterious buzzword and ignored it, know that it’s not. The basic concept is really very simple, and in this tutorial I try to explain it in the simplest way that I can. Note that I have intentionally left out some deeper details to make it really friendly to a beginner.

Read the full article
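The basic concept really can fit in a few lines. Here is a minimal word-count sketch in Python (my own illustration, not Google’s implementation): a map phase emits (word, 1) pairs, a shuffle groups them by key, and a reduce phase sums each group.

```python
from collections import defaultdict

def map_phase(documents):
    # Map: emit a (word, 1) pair for every word in every document.
    for doc in documents:
        for word in doc.split():
            yield word, 1

def shuffle(pairs):
    # Shuffle: group all values by key, as the framework does between phases.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: collapse each key's list of values into a single result.
    return {word: sum(ones) for word, ones in groups.items()}

docs = ["big data is big", "data is data"]
counts = reduce_phase(shuffle(map_phase(docs)))
print(counts)  # {'big': 2, 'data': 3, 'is': 2}
```

The reason this scales is that the map calls are independent of each other, and each reduce only touches one key’s group, so both phases can be spread across many machines.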

Building an R Hadoop System

After reading documents and tutorials on MapReduce and Hadoop and playing with RHadoop for about two weeks, I have finally built my first R Hadoop system and successfully run some R examples on it. My experience and the steps to achieve that are presented below. Hopefully it will make it easier to try RHadoop for R users who are new to Hadoop. Note that I tried this on Mac only, and some steps might be different on Windows.

1. Install Hadoop
2. Run Hadoop
3. Install R
4. Install RHadoop
5. Run R jobs on Hadoop
6. What’s Next


Not yet tested, but I will in the next two or three weeks!

YouTube Wordcount MapReduce in R