Category Archives: Big Data

The Infrastructure Behind Twitter: Scale

Overview of Twitter Fleet

Twitter came of age when hardware from physical enterprise vendors ruled the data center. Since then we’ve continually engineered and refreshed our fleet to take advantage of the latest open standards in technology and hardware efficiency in order to deliver the best possible experience.

Dashboard for VMware, SNMP, REST API and more

Simple dashboard system for sysadmins with modules for VMware, SNMP, REST API and more

SysAdminBoard is a simple dashboard system written in Python, HTML and JavaScript and served on a bundled CherryPy web server. It was originally written to reformat SNMP data for the Panic Statusboard iPad app, but has since become a fully stand-alone project that can grab data from a variety of sources and render charts and graphs in a web browser.
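As a sketch of the kind of reformatting such a dashboard does, the snippet below (hypothetical, not taken from the SysAdminBoard codebase) turns raw SNMP-style counter samples into per-second rates that a browser chart could plot:

```python
# Hypothetical sketch: SNMP interface counters only ever increase, so a
# dashboard typically converts successive (timestamp, counter) samples
# into per-second rates before charting them.
def counter_to_rates(samples):
    """samples: list of (timestamp, counter_value) tuples, ascending by time."""
    rates = []
    for (t0, v0), (t1, v1) in zip(samples, samples[1:]):
        rates.append((v1 - v0) / (t1 - t0))
    return rates

samples = [(0, 100), (10, 400), (20, 1000)]
print(counter_to_rates(samples))  # [30.0, 60.0]
```

A real module would also handle counter wrap-around and missing polls; this only shows the basic rate calculation a chart consumes.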

NSA data collection about ‘population control’ not law enforcement – whistleblower

FBI whistleblower Jesselyn Radack joins RT America’s Simone Del Rosario to discuss the growing concern that the NSA is collecting so much data that it can no longer be effective in preventing terror. Radack says the terror attacks of 9/11 created a ‘blank check’ wherein the usual constraints on surveillance were removed, including probable cause and the necessity of getting a warrant before conducting domestic data collection.

Microsoft’s Software is Malware

Microsoft Back Doors

Microsoft Sabotage

The wrongs in this section are not precisely malware, since they do not involve making the program run in a way that hurts the user. But they are a lot like malware, since they are technical Microsoft actions that harm the users of specific Microsoft software.

Microsoft Surveillance

Microsoft DRM

Microsoft Jails

Microsoft Tyrants

As this page shows, if you do want to clean your computer of malware, the first software to delete is Windows.

Building a Fast Data Front End for Hadoop

Screencast: Building a Fast Data Front End for Hadoop.

Date: This event took place live on June 24, 2015
Presented by: John Hugg
Duration: Approximately 60 minutes.
Cost: Free


Massive increases in both the volume and velocity of data have led to the development of interactive, real-time applications on fast streaming data. These fast data applications are often the front ends to Big Data (data at rest) and require integration between Fast + Big. To provide maximum value they require a data pipeline with the ability to compute real-time analytics on fast moving data, to make transactional decisions against state, and ultimately deliver data at high speeds to long-term Hadoop-based analytics stores like Cloudera, Hortonworks and MapR.

The new challenge is building applications that tap fast data and seamlessly integrate with the value contained in data stores — combining machine learning and dynamic processing. A variety of approaches are employed including Apache Storm, Spark Streaming, Samza, and in-memory database technology.
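To make the fast/big split concrete, here is a minimal hypothetical Python sketch (not any of the frameworks named above) of a front end that keeps real-time per-key state for decisions while buffering events for delivery to a long-term store:

```python
from collections import defaultdict

# Hypothetical sketch of a "fast data" front end: keep running per-key
# counts in memory (data in motion, available for real-time decisions)
# while batching raw events toward a long-term store (data at rest).
class FastFrontEnd:
    def __init__(self, flush_every=3):
        self.counts = defaultdict(int)  # real-time state, queryable immediately
        self.batch = []                 # events buffered for the big-data store
        self.flushed = []               # stand-in for an HDFS/warehouse sink
        self.flush_every = flush_every

    def ingest(self, event):
        self.counts[event["user"]] += 1      # transactional decision point
        self.batch.append(event)
        if len(self.batch) >= self.flush_every:
            self.flushed.extend(self.batch)  # would be a bulk Hadoop-side write
            self.batch.clear()

fe = FastFrontEnd()
for u in ["mike", "jane", "mike", "stan"]:
    fe.ingest({"user": u})
print(fe.counts["mike"], len(fe.flushed))  # 2 3
```

The point of the split is that queries hit the small in-memory state at stream speed, while the analytics store receives efficient batches rather than one write per event.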

During this webcast you will learn:

  • The pros and cons of the various approaches used to create fast data applications
  • The pros and cons of Lambda and Kappa architectures compared to more traditional approaches
  • The tradeoffs and advantages surrounding the resurgence of ACID and SQL
  • How integration with the Hadoop ecosystem can reduce latency and improve transactional intelligence

About John Hugg, Founding Engineer

John Hugg is a Software Developer at VoltDB. He has spent his entire career working with databases and information management. In 2008, he was lured away from a Ph.D. program by Mike Stonebraker to work on what became VoltDB. As the first engineer on the product, he liaised with a team of academics at MIT, Yale, and Brown who were building H-Store, VoltDB’s research prototype. Then he helped build the world-class engineering team at VoltDB to continue development of the open source and commercial products.



MySQL vs/and/plus/more JSON

Here are a few things we can use and read about MySQL and JSON.

Small benchmark test of reading 20,000 records

Krasimir Tsonev has made a small benchmark test of reading 20,000 records from a MySQL database and from a JSON file.

I’m just going to put the *end* results here. You can read the full article on Krasimir’s blog about the MySQL vs JSON file data storage benchmark results.


ab -n 30 -c 30 http://localhost/mqsql.php
Concurrency Level:      30
Time taken for tests:   30.518 seconds



ab -n 30 -c 30 http://localhost/json.php
Concurrency Level:      30
Time taken for tests:   3.384 seconds

MySQL query to JSON
username    email
mike        [email protected]
jane        [email protected]
stan        [email protected]

The MySQL query

SELECT CONCAT("{username:'", username, "',email:'", email, "'}") AS json FROM users;

The JSON result

     {username:'mike',email:'[email protected]'},
     {username:'jane',email:'[email protected]'},
     {username:'stan',email:'[email protected]'}
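The same result can also be produced on the application side instead of concatenating strings inside SQL. A minimal Python sketch (hypothetical; `rows` stands in for a fetched result set):

```python
import json

# Sketch: building the JSON in the application rather than in the query.
# `rows` stands in for the tuples a database cursor would return.
rows = [("mike", "[email protected]"),
        ("jane", "[email protected]"),
        ("stan", "[email protected]")]

records = [{"username": u, "email": e} for u, e in rows]
print(json.dumps(records))
```

Newer MySQL releases (5.7 and later) also ship native JSON functions such as JSON_OBJECT, which produce properly quoted JSON without hand-built string concatenation.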

Got this from