What databases do the World Wide Web's biggest sites run on?

Database, Database Design, Web Applications, Scalability

Database Problem Overview


This question is meant to collect the databases, and their configurations, that the major websites use. It should be a useful reference for anyone thinking of scaling a website to the size of Twitter, Facebook or even Google.

Please keep your answers to a minimum and be sure to cite any sources used.

EDIT:

Also, please bold both the website name and the database for easier scanning.

Database Solutions


Solution 1 - Database

Facebook.com

Currently running 610 (soon to be 1000) Hadoop nodes in a single cluster with a Hive datastore. Both Hive and Cassandra have been open-sourced by Facebook.

Facebook stats:

  • More than 200 million active users
  • More than 100 million users log on to Facebook at least once each day
  • More than 30 million users update their statuses at least once each day
  • Average user has 120 friends on the site

Solution 2 - Database

Stack Overflow - SQL Server.

Jeff Atwood wrote a nice blog post on this:

https://blog.stackoverflow.com/2008/09/what-was-stack-overflow-built-with/

Solution 3 - Database

LinkedIn.com

  • Oracle (Relational Database)
  • MySQL (Relational Database)

The databases are replicated on multiple servers for high availability. Each service uses its own domain-specific database.

LinkedIn stats:

  • 22 million members
  • 4+ million unique visitors/month
  • 40 million page views/day
  • 2 million searches/day

Solution 4 - Database

Flickr uses MySQL.

YouTube uses MySQL, but is moving to Google's Bigtable.

Myspace uses SQL Server.

Wikipedia uses MySQL.

Solution 5 - Database

Microsoft.com

  • SQL Server (no surprise there)

Microsoft.com stats:

  • 250 million unique visits/month.
  • 70 million page views/day.
  • 15,000 connections/second.
  • Maintains an average of 35,000 concurrent connections to a total of 80 Web servers.
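
As a rough sanity check, the figures above can be turned into per-second and per-server averages. This is only back-of-the-envelope arithmetic on the quoted numbers; the variable names are made up for illustration, and real traffic is bursty, so peaks will be well above these averages.

```python
# Back-of-the-envelope averages from the Microsoft.com figures quoted above.
SECONDS_PER_DAY = 24 * 60 * 60

page_views_per_day = 70_000_000
concurrent_connections = 35_000
web_servers = 80

# Average load, assuming traffic were spread evenly (it is not in practice).
avg_page_views_per_second = page_views_per_day / SECONDS_PER_DAY
avg_connections_per_server = concurrent_connections / web_servers

print(f"{avg_page_views_per_second:.0f} page views/second on average")
print(f"{avg_connections_per_server:.1f} concurrent connections per web server")
```

Note that these are averages per web server, not per database server; the quoted stats say nothing about how many SQL Server instances sit behind them.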

Solution 6 - Database

Yahoo.com

  • PostgreSQL (modified) - A client can connect to any node in the cluster (or to a policy-restricted subset). A query flows from the client to the server it chose to connect to. The SQL compiler on that node compiles and optimizes the query on that single node (no parallelism).

Yahoo.com stats:

  • 24 billion events a day
  • 2-petabyte database, claimed to be the largest (Mar 2008)

Solution 7 - Database

Digg

  • MySQL (Relational Database) for scaling out reads
  • MemcacheDB (Key-Value Store) for scaling out writes

Both data stores are distributed across multiple servers.
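
The answer does not say how Digg spreads data across its servers. One common approach is to hash each key and take the result modulo the number of servers. The sketch below illustrates that general technique only; the server names, keys and the `shard_for` helper are hypothetical, not Digg's actual setup.

```python
import hashlib


def shard_for(key, servers):
    """Pick a server for `key` using a stable hash (modulo sharding)."""
    # sha256 gives the same digest on every run and every machine,
    # unlike Python's built-in hash(), which is randomized per process.
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["db-0", "db-1", "db-2", "db-3"]  # hypothetical server names
placement = {key: shard_for(key, servers) for key in ("user:1", "user:2", "story:42")}
print(placement)
```

A drawback of plain modulo sharding is that adding or removing a server remaps most keys; consistent hashing is a common way to reduce that reshuffling.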

Digg stats:

  • 30M users
  • 26M uniques per month
  • 2 billion requests a month
  • 13,000 requests a second, peak at 27,000 requests a second.

Solution 8 - Database

Twitter.com

  • MySQL (Relational Database).
  • Cassandra (Multi-dimensional, distributed key-value store). Twitter describes itself as just "beginning to use Cassandra at Twitter" (see second source).

In May 2008, Twitter had one MySQL instance for writes, with multiple MySQL slave instances for reads.
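
A single-writer, multiple-reader layout like this is usually paired with some routing logic in the application or a proxy. The following is a minimal sketch of that idea, not Twitter's code: the `ReadWriteRouter` class, the server names and the "anything that is not a SELECT is a write" rule are all simplifying assumptions.

```python
import itertools


class ReadWriteRouter:
    """Send writes to one primary and spread reads across replicas."""

    def __init__(self, primary, replicas):
        if not replicas:
            raise ValueError("at least one replica is required")
        self.primary = primary
        # Round-robin over the replicas, forever.
        self._replicas = itertools.cycle(replicas)

    def route(self, sql):
        # Simplification: only plain SELECT statements count as reads;
        # everything else (INSERT, UPDATE, DELETE, DDL, ...) goes to the primary.
        parts = sql.split(None, 1)
        first_word = parts[0].upper() if parts else ""
        if first_word == "SELECT":
            return next(self._replicas)
        return self.primary


router = ReadWriteRouter("primary-db", ["replica-1", "replica-2"])
targets = [
    router.route("INSERT INTO statuses (body) VALUES ('hello')"),
    router.route("SELECT * FROM statuses"),
    router.route("SELECT * FROM statuses"),
    router.route("SELECT * FROM statuses"),
]
print(targets)  # ['primary-db', 'replica-1', 'replica-2', 'replica-1']
```

With asynchronous replication, a read sent to a replica immediately after a write may not see that write yet, so real routers often pin such reads to the primary.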

Twitter stats:

  • Total Users: 1+ million
  • Total Active Users: 200,000 per week
  • Total Twitter Messages: 3 million/day
  • 5% of Twitter users account for 75% of all activity
  • 72.5% of all users joined during the first five months of 2009

Sources:

Solution 9 - Database

Solution 10 - Database

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | niktech | View Question on Stack Overflow
Solution 1 - Database | niktech | View Answer on Stack Overflow
Solution 2 - Database | ACP | View Answer on Stack Overflow
Solution 3 - Database | niktech | View Answer on Stack Overflow
Solution 4 - Database | Mohammed Nasman | View Answer on Stack Overflow
Solution 5 - Database | Fredrik Mörk | View Answer on Stack Overflow
Solution 6 - Database | KahWee Teng | View Answer on Stack Overflow
Solution 7 - Database | niktech | View Answer on Stack Overflow
Solution 8 - Database | niktech | View Answer on Stack Overflow
Solution 9 - Database | stribika | View Answer on Stack Overflow
Solution 10 - Database | duffymo | View Answer on Stack Overflow