Category Archives: Big Data

bigdata – Global Science Research (GSR)

About us

Global Science Research (GSR) was founded to optimize marketing strategies with the power of big data and psychological sciences. Our innovative methods produce consumer insight on a revolutionary scale, empowering our clients to understand their consumers, markets, and competitors more deeply and accurately than ever before. GSR’s team is formed of experienced business analysts and leading academics, all of whom are practiced in helping companies utilize the power of psychology in the customer journey.

The Infrastructure Behind Twitter: Scale


  • Hadoop: We have multiple clusters storing over 500 PB, divided into four groups (real time, processing, data warehouse and cold storage). Our biggest cluster is over 10k nodes. We run 150k applications and launch 130M containers per day.

  • Manhattan (the backend for Tweets, Direct Messages, Twitter accounts, and more): We run several clusters for different use cases, such as large multi-tenant clusters, smaller clusters for less common use cases, read-only clusters, and read/write clusters for heavy-write/heavy-read traffic patterns. The read-only cluster handles tens of millions of QPS, whereas a read/write cluster handles millions of QPS. The highest-performance cluster, our observability cluster, which ingests in every datacenter, handles over tens of millions of writes.

  • Graph: Our legacy Gizzard/MySQL-based sharded cluster for storing our graphs. Flock, our social graph, can manage peaks of over tens of millions of QPS, with our MySQL servers averaging 30k to 45k QPS.

  • Blobstore: Our image, video and large file store, where we store hundreds of billions of objects.

  • Cache: Our Redis and Memcache clusters: caching our users, timelines, tweets and more.

  • SQL: This includes MySQL, PostgreSQL and Vertica. MySQL/PostgreSQL are used where we need strong consistency, such as managing ad campaigns and the ads exchange, as well as internal tools. Vertica is a column store often used as a backend for Tableau, supporting sales and user organisations.
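To put the Hadoop figures above in per-second terms, here is a small back-of-the-envelope sketch. It uses only the two daily totals quoted in the list (150k applications and 130M containers per day) and assumes, purely for illustration, that the load is spread evenly over 24 hours; real traffic is unlikely to be that uniform, and the variable names are my own.

```python
# Back-of-the-envelope conversion of the quoted daily Hadoop totals.
# Assumption: load is spread evenly across the day (real load will be burstier).

SECONDS_PER_DAY = 24 * 60 * 60


def per_second(per_day: float) -> float:
    """Convert a daily count into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


# Figures quoted in the list above.
containers_per_day = 130_000_000
applications_per_day = 150_000

containers_per_second = per_second(containers_per_day)
applications_per_second = per_second(applications_per_day)
# Average number of containers launched by each application.
containers_per_application = containers_per_day / applications_per_day

print(f"{containers_per_second:,.0f} containers launched per second on average")
print(f"{applications_per_second:.2f} applications started per second on average")
print(f"{containers_per_application:,.0f} containers per application on average")
```

Under that even-load assumption, the quoted totals work out to roughly 1,500 container launches per second and a little under 900 containers per application.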

The Infrastructure Behind Twitter: Scale

Overview of Twitter Fleet

Twitter came of age when hardware from physical enterprise vendors ruled the data center. Since then we’ve continually engineered and refreshed our fleet to take advantage of the latest open standards in technology and hardware efficiency in order to deliver the best possible experience.

dashboard for VMware, SNMP, REST API and more

Simple dashboard system for sysadmins with modules for VMware, SNMP, REST API and more

SysAdminBoard is a simple dashboard system written in Python, HTML and JavaScript and served on a simple CherryPy web server (included). It was originally written to reformat SNMP data for the Panic Statusboard iPad App, but has since become a fully stand-alone project that can grab data from a variety of sources and render charts and graphs in a web browser.
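The general pattern described here (poll a data source, expose the result over HTTP, let the browser draw the chart) can be sketched in a few lines. This is not SysAdminBoard's code: it swaps CherryPy for Python's standard-library `http.server` so it runs without extra packages, and `read_source`, `DashboardHandler`, `make_server` and the `/metrics` path are hypothetical names chosen for illustration.

```python
# Minimal sketch of a dashboard backend: one JSON endpoint a browser can poll.
# Assumption: the standard library's http.server stands in for CherryPy, and
# read_source() fakes what would really be an SNMP, VMware or REST API query.
import json
import random
from http.server import BaseHTTPRequestHandler, HTTPServer


def read_source() -> dict:
    """Hypothetical data source; returns a single made-up CPU reading."""
    return {"cpu_percent": round(random.uniform(0, 100), 1)}


class DashboardHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only one route is served; anything else is a 404.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = json.dumps(read_source()).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the console quiet; remove to see per-request logs


def make_server(port: int = 8080) -> HTTPServer:
    """Build (but do not start) a server bound to localhost."""
    return HTTPServer(("127.0.0.1", port), DashboardHandler)
```

Calling `make_server().serve_forever()` would start it, after which a page's JavaScript could fetch `/metrics` on a timer and feed the values into a charting library.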

NSA data collection about ‘population control’ not law enforcement – whistleblower

FBI whistleblower Jesselyn Radack joins RT America’s Simone Del Rosario to discuss the growing concern that the NSA is collecting so much data that it can no longer be effective in preventing terror. Radack says the terror attacks of 9/11 created a ‘blank check’ wherein the usual constraints on surveillance were removed, including probable cause and the necessity of getting a warrant before conducting domestic data collection.