The Infrastructure Behind Twitter: Scale


  • Hadoop: We have multiple clusters storing over 500 PB, divided into four groups (real time, processing, data warehouse, and cold storage). Our biggest cluster is over 10k nodes. We run 150k applications and launch 130M containers per day.

  • Manhattan (the backend for Tweets, Direct Messages, Twitter accounts, and more): We run several clusters for different use cases: large multi-tenant clusters, smaller clusters for less common use cases, read-only clusters, and read/write clusters for heavy-write/heavy-read traffic patterns. The read-only cluster handles tens of millions of QPS, whereas a read/write cluster handles millions of QPS. The highest-performance cluster, our observability cluster, which ingests in every datacenter, handles tens of millions of writes.

  • Graph: Our legacy Gizzard/MySQL-based sharded cluster for storing our graphs. Flock, our social graph, can handle peaks of tens of millions of QPS, averaging 30k to 45k QPS per MySQL server.

  • Blobstore: Our image, video, and large file store, where we keep hundreds of billions of objects.

  • Cache: Our Redis and Memcache clusters, which cache our users, timelines, Tweets, and more.

  • SQL: This includes MySQL, PostgreSQL, and Vertica. MySQL and PostgreSQL are used where we need strong consistency: managing ad campaigns and the ad exchange, as well as internal tools. Vertica is a column store often used as a backend for Tableau, supporting sales and user organisations.
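
To put the Hadoop figures above in more familiar units, the daily totals can be converted into average rates. This is only a back-of-the-envelope sketch: it assumes that "per day" applies to both the 150k applications and the 130M containers, and that load is spread evenly across the day, which real traffic is not. The variable and function names are illustrative.

```python
# Back-of-the-envelope conversion of the daily Hadoop figures into averages.
# Assumption: both figures are per-day totals and load is uniform over 24 hours.

SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Convert a per-day total into an average per-second rate."""
    return per_day / SECONDS_PER_DAY


containers_per_day = 130_000_000   # "130M containers per day"
applications_per_day = 150_000     # "150k applications" (assumed per day)

containers_per_second = per_second(containers_per_day)
containers_per_application = containers_per_day / applications_per_day

print(f"{containers_per_second:,.0f} containers launched per second on average")
print(f"{containers_per_application:,.0f} containers per application on average")
```

Under these assumptions, the clusters launch roughly 1,500 containers per second, and an application uses on the order of 870 containers on average. Peak rates are likely to be higher than these averages.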