Difference between scaling horizontally and vertically for databases

DatabaseDatabase DesignNosqlScalability

Database Problem Overview


I have come across many NoSQL databases and SQL databases. There are varying parameters to measure the strength and weaknesses of these databases and scalability is one of them. What is the difference between horizontally and vertically scaling these databases?

Database Solutions


Solution 1 - Database

Horizontal scaling means that you scale by adding more machines into your pool of resources whereas Vertical scaling means that you scale by adding more power (CPU, RAM) to an existing machine.

An easy way to remember this is to think of a machine on a server rack, we add more machines across the horizontal direction and add more resources to a machine in the vertical direction.

                  Horizontal Scaling/Vertical Scaling Visualisation

In the database world, horizontal-scaling is often based on the partitioning of the data i.e. each node contains only part of the data, in vertical-scaling the data resides on a single node and scaling is done through multi-core i.e. spreading the load between the CPU and RAM resources of that machine.

With horizontal-scaling it is often easier to scale dynamically by adding more machines into the existing pool - Vertical-scaling is often limited to the capacity of a single machine, scaling beyond that capacity often involves downtime and comes with an upper limit.

Good examples of horizontal scaling are Cassandra, MongoDB, Google Cloud Spanner .. and a good example of vertical scaling is MySQL - Amazon RDS (The cloud version of MySQL). It provides an easy way to scale vertically by switching from small to bigger machines. This process often involves downtime.

In-Memory Data Grids such as GigaSpaces XAP, Coherence etc.. are often optimized for both horizontal and vertical scaling simply because they're not bound to disk. Horizontal-scaling through partitioning and vertical-scaling through multi-core support.

You can read more on this subject in my earlier posts: Scale-out vs Scale-up and The Common Principles Behind the NOSQL Alternatives

Solution 2 - Database

Scaling horizontally ===> Thousands of minions will do the work together for you.

Scaling vertically ===> One big hulk will do all the work for you.

enter image description here

Solution 3 - Database

Let's start with the need for scaling that is increasing resources so that your system can now handle more requests than it earlier could.

When you realise your system is getting slow and is unable to handle the current number of requests, you need to scale the system.

This provides you with two options. Either you increase the resources in the server which you are using currently, i.e, increase the amount of RAM, CPU, GPU and other resources. This is known as vertical scaling.

Vertical scaling is typically costly. It does not make the system fault tolerant, i.e if you are scaling application running with single server, if that server goes down, your system will go down. Also the amount of threads remains the same in vertical scaling. Vertical scaling may require your system to go down for a moment when process takes place. Increasing resources on a server requires a restart and put your system down.

Another solution to this problem is increasing the amount of servers present in the system. This solution is highly used in the tech industry. This will eventually decrease the request per second rate in each server. If you need to scale the system, just add another server, and you are done. You would not be required to restart the system. Number of threads in each system decreases leading to high throughput. To segregate the requests, equally to each of the application server, you need to add load balancer which would act as reverse proxy to the web servers. This whole system can be called as a single cluster. Your system may contain a large number of requests which would require more amount of clusters like this.

Hope you get the whole concept of introducing scaling to the system.

Solution 4 - Database

There is an additional architecture that wasn't mentioned - SQL-based database services that enable horizontal scaling without the complexity of manual sharding. These services do the sharding in the background, so they enable you to run a traditional SQL database and scale out like you would with NoSQL engines like MongoDB or CouchDB. Two services I am familiar with are http://www.enterprisedb.com">EnterpriseDB</a> for PostgreSQL and http://www.xeround.com">Xeround</a> for MySQL. I saw an in-depth http://xeround.com/blog/2011/07/scaling-mysql-database-in-the-cloud">post</a> by Xeround which explains why scale-out on SQL databases is difficult and how they do it differently - treat this with a grain of salt as it is a vendor post. Also check out Wikipedia's http://en.wikipedia.org/wiki/Cloud_database">Cloud Database entry, there is a nice explanation of SQL vs. NoSQL and service vs. self-hosted, a list of vendors and scaling options for each combination. ;)

Solution 5 - Database

Yes scaling horizontally means adding more machines, but it also implies that the machines are equal in the cluster. MySQL can scale horizontally in terms of Reading data, through the use of replicas, but once it reaches capacity of the server mem/disk, you have to begin sharding data across servers. This becomes increasingly more complex. Often keeping data consistent across replicas is a problem as replication rates are often too slow to keep up with data change rates.

Couchbase is also a fantastic NoSQL Horizontal Scaling database, used in many commercial high availability applications and games and arguably the highest performer in the category. It partitions data automatically across cluster, adding nodes is simple, and you can use commodity hardware, cheaper vm instances (using Large instead of High Mem, High Disk machines at AWS for instance). It is built off the Membase (Memcached) but adds persistence. Also, in the case of Couchbase, every node can do reads and writes, and are equals in the cluster, with only failover replication (not full dataset replication across all servers like in mySQL).

Performance-wise, you can see an excellent Cisco benchmark: http://blog.couchbase.com/understanding-performance-benchmark-published-cisco-and-solarflare-using-couchbase-server

Here is a great blog post about Couchbase Architecture: http://horicky.blogspot.com/2012/07/couchbase-architecture.html

Solution 6 - Database

Traditional relational databases were designed as client/server database systems. They can be scaled horizontally but the process to do so tends to be complex and error prone. NewSQL databases like NuoDB are memory-centric distributed database systems designed to scale out horizontally while maintaining the SQL/ACID properties of traditional RDBMS.

For more information on NuoDB, read their technical white paper.

Solution 7 - Database

SQL databases like Oracle, db2 also support Horizontal scaling through Shared disk cluster. For example Oracle RAC, IBM DB2 purescale or Sybase ASE Cluster edition. New node can be added to Oracle RAC system or DB2 purescale system to achieve horizontal scaling.

But the approach is different from noSQL databases (like mongodb, CouchDB or IBM Cloudant) is that the data sharding is not part of Horizontal scaling. In noSQL databases data is shraded during horizontal scaling.

Solution 8 - Database

The accepted answer is spot on the basic definition of horizontal vs vertical scaling. But unlike the common belief that horizontal scaling of databases is only possible with Cassandra, MongoDB, etc I would like to add that horizontal scaling is also very much possible with any traditional RDMS; that too without using any third party solutions.

I know of many companies, specially SaaS based companies that do this. This is done using simple application logic. You basically take a set of users and divide them over multiple DB servers. So for example, you would typically have a "meta" database/table that would store clients, DB server/connection strings, etc and a table that stores client/server mapping.

Then simply direct requests from each client to the DB server they are mapped to.

Now some may say this is akin to horizontal partitioning and not "true" horizontal scaling and they will be right in some ways. But the end result is that you have scaled your DB over multiple Db servers.

The only difference between the two approaches to horizontal scaling is that one approach (MongoDB, etc) the scaling is done by the DB software itself. In that sense you are "buying" the scaling. In the other approach (for RDBMS horizontal scaling), the scaling is built by application code/logic.

Solution 9 - Database

Adding lots of load balancers creates extra overhead and latency and that is the drawback for scaling out horizontally in nosql databases. It is like the question why people say RPC is not recommended since it is not robust.

I think in a real system we should use both sql and nosql databases to utilize both multicore and cloud computing capabilities of today's systems.

On the other hand, complex transactional queries has high performance if sql databases such as oracle being used. NoSql could be used for bigdata and horizontal scalability by sharding.

Solution 10 - Database

You have a company and there is only 1 worker but you got 1 new project at that time you hire new candidate -- this is horizontal scaling. where new candidate is new machines and project is new traffic/calls to your api's.

Where as 1 project with an IIT/NIT guy handling all request to your api/traffic. If any time more request to your api's then fire him and replacing him with a high IQ NIT/IIT guy -- this is vertical scaling.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionLondon guyView Question on Stackoverflow
Solution 1 - DatabaseNati ShalomView Answer on Stackoverflow
Solution 2 - DatabaseSankarganesh EswaranView Answer on Stackoverflow
Solution 3 - DatabaseyatharthaView Answer on Stackoverflow
Solution 4 - DatabaseDina KaiserView Answer on Stackoverflow
Solution 5 - Databasescalabl3View Answer on Stackoverflow
Solution 6 - DatabaseMichael WaclawiczekView Answer on Stackoverflow
Solution 7 - DatabaseDebasishView Answer on Stackoverflow
Solution 8 - DatabaseDharmendar Kumar 'DK'View Answer on Stackoverflow
Solution 9 - Databasefarshad-nshView Answer on Stackoverflow
Solution 10 - DatabaseJithin Kumar SView Answer on Stackoverflow