What are horizontal and vertical partitions in a database, and what is the difference?
Database Problem Overview
I read that SELECT is a horizontal partition of the relation into two sets of tuples, and PROJECT is a vertical partition of the relation into two relations.
However, I don't understand what that means. Can you explain it in layman's terms?
Database Solutions
Solution 1 - Database
This is not a complete answer to the question, but it answers what is asked in the question title. The general meaning of horizontal and vertical database partitioning is:
Horizontal partitioning involves putting different rows into different tables. Perhaps customers with ZIP codes less than 50000 are stored in CustomersEast, while customers with ZIP codes greater than or equal to 50000 are stored in CustomersWest. The two partition tables are then CustomersEast and CustomersWest, while a view with a union might be created over both of them to provide a complete view of all customers.
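A minimal sketch of the CustomersEast/CustomersWest split, assuming SQLite through Python's standard `sqlite3` module. The `id`, `name` and `zip` columns, the sample customers and the view name `Customers` are made up for illustration. The view uses `UNION ALL` because the two partitions never overlap, so there are no duplicates to remove.

```python
import sqlite3

# In-memory database; column names and sample rows are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Both partitions have exactly the same columns; only the rows differ.
for table in ("CustomersEast", "CustomersWest"):
    cur.execute(f"CREATE TABLE {table} (id INTEGER PRIMARY KEY, name TEXT, zip INTEGER)")

customers = [(1, "Alice", 10001), (2, "Bob", 94105), (3, "Carol", 30301), (4, "Dave", 60601)]

# Route each row by the ZIP-code rule: < 50000 goes east, >= 50000 goes west.
for cid, name, zip_code in customers:
    table = "CustomersEast" if zip_code < 50000 else "CustomersWest"
    cur.execute(f"INSERT INTO {table} (id, name, zip) VALUES (?, ?, ?)", (cid, name, zip_code))

# A view with a union over both partitions gives the complete set of customers.
cur.execute(
    "CREATE VIEW Customers AS "
    "SELECT id, name, zip FROM CustomersEast "
    "UNION ALL "
    "SELECT id, name, zip FROM CustomersWest"
)

east = cur.execute("SELECT name FROM CustomersEast ORDER BY id").fetchall()
west = cur.execute("SELECT name FROM CustomersWest ORDER BY id").fetchall()
everyone = cur.execute("SELECT id, name, zip FROM Customers ORDER BY id").fetchall()

print(east)                   # [('Alice',), ('Carol',)]
print(west)                   # [('Bob',), ('Dave',)]
print(everyone == customers)  # True
```

Each partition holds only some of the rows, yet querying the view looks like querying one big table.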
Vertical partitioning involves creating tables with fewer columns and using additional tables to store the remaining columns. Normalization also involves this splitting of columns across tables, but vertical partitioning goes beyond that and partitions columns even when already normalized.
See more details here.
Solution 2 - Database
A projection creates a subset of the attributes in a relation, hence a "vertical partition".
A selection creates a subset of the tuples in a relation, hence a "horizontal partition".
Given a table r as:
a : b : c : d : e
-----------------
1 : 2 : 3 : 4 : 5
1 : 2 : 3 : 4 : 5
2 : 2 : 3 : 4 : 5
2 : 2 : 3 : 4 : 5
An expression such as
PROJECT a, b (SELECT a=1 (r))
-- SELECT a, b FROM r WHERE a=1
would "do" this:
a : b | c : d : e
-----------------
1 : 2 | 3 : 4 : 5
1 : 2 | 3 : 4 : 5
================= <-- horizontal partition (by SELECTION)
2 : 2 | 3 : 4 : 5
2 : 2 | 3 : 4 : 5
      ^-- vertical partition (by PROJECTION)
Resulting in:
a : b
------
1 : 2
1 : 2
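The same expression can be sketched with two tiny Python functions over the table r above. The function names `select` and `project` and the row-as-dict representation are illustrative choices, not a library API. Like SQL, the sketch keeps duplicate rows; relational algebra proper works on sets, so the duplicates collapse to a single tuple (in SQL you would ask for that with `SELECT DISTINCT`).

```python
# Table r from above; each row maps attribute name -> value.
ATTRS = ("a", "b", "c", "d", "e")
r = [dict(zip(ATTRS, row)) for row in [
    (1, 2, 3, 4, 5),
    (1, 2, 3, 4, 5),
    (2, 2, 3, 4, 5),
    (2, 2, 3, 4, 5),
]]

def select(relation, predicate):
    """SELECT (sigma): keep only the rows (tuples) for which predicate is true."""
    return [row for row in relation if predicate(row)]

def project(relation, attrs):
    """PROJECT (pi): keep only the listed columns (attributes) of every row."""
    return [tuple(row[attr] for attr in attrs) for row in relation]

# PROJECT a, b (SELECT a=1 (r))
result = project(select(r, lambda row: row["a"] == 1), ("a", "b"))
print(result)       # [(1, 2), (1, 2)]

# Set semantics, as in relational algebra (SQL: SELECT DISTINCT a, b FROM r WHERE a=1)
print(set(result))  # {(1, 2)}
```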
Solution 3 - Database
Necromancing.
I think the existing answers are too abstract.
So here is my attempt at a more practical explanation:
Partitioning, from a developer's point of view, is all about performance.
More exactly, it's about what happens when you have large amounts of data in your tables, and you still want to query the data fast.
Here are some excerpts from slides by Bill Karwin about what exactly horizontal partitioning is all about:
The above is bad, because:
The solution: HORIZONTAL PARTITIONING
Horizontal partitioning divides a table into multiple tables. Each table then contains the same number of columns, but fewer rows.
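To make the performance point concrete, here is a minimal in-memory sketch assuming rows partitioned by year. The rows, the year column and the function names are hypothetical; real databases do this "partition pruning" inside the query planner. The sketch only counts how many rows each approach has to look at.

```python
from collections import defaultdict

# Hypothetical rows: (id, year, payload), spread evenly over 2010..2014.
rows = [(i, 2010 + i % 5, f"row-{i}") for i in range(1000)]

def query_unpartitioned(table, year):
    """One big table: every row must be examined."""
    scanned, hits = 0, []
    for row in table:
        scanned += 1
        if row[1] == year:
            hits.append(row)
    return hits, scanned

# Horizontally partitioned by year: same columns, fewer rows per partition.
partitions = defaultdict(list)
for row in rows:
    partitions[row[1]].append(row)

def query_partitioned(parts, year):
    """Only the partition that can contain the year is read (pruning)."""
    part = parts.get(year, [])
    return list(part), len(part)

hits_a, scanned_a = query_unpartitioned(rows, 2012)
hits_b, scanned_b = query_partitioned(partitions, 2012)
print(hits_a == hits_b)      # True
print(scanned_a, scanned_b)  # 1000 200
```

Both queries return the same rows, but the partitioned one reads a fifth of the data.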
The difference: Query Performance and simplicity
Now, on the difference between horizontal and vertical partitioning:
"Tribbles" can also accumulate in columns. Example:
The solution to that problem is VERTICAL PARTITIONING
Proper normalization is ONE form of vertical partitioning
To quote TechNet:
> Vertical partitioning divides a table into multiple tables that contain fewer columns.
>
> The two types of vertical partitioning are normalization and row splitting:
>
> Normalization is the standard database process of removing redundant columns from a table and putting them in secondary tables that are linked to the primary table by primary key and foreign key relationships.
>
> Row splitting divides the original table vertically into tables with fewer columns. Each logical row in a split table matches the same logical row in the other tables as identified by a UNIQUE KEY column that is identical in all of the partitioned tables. For example, joining the row with ID 712 from each split table re-creates the original row. Like horizontal partitioning, vertical partitioning lets queries scan less data. This increases query performance. For example, a table that contains seven columns of which only the first four are generally referenced may benefit from splitting the last three columns into a separate table. Vertical partitioning should be considered carefully, because analyzing data from multiple partitions requires queries that join the tables.
>
> Vertical partitioning also could affect performance if partitions are very large.
That sums it up nicely.
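A minimal sketch of the row-splitting example from the quote, assuming SQLite via Python's `sqlite3` module. The table names `ItemsHot`/`ItemsCold`, the columns `c1` to `c6` and the values are hypothetical. The seven columns are `id` plus `c1` to `c6`; the first four stay together and the last three move to a second table that repeats the same unique key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Frequently referenced columns (id, c1, c2, c3) in one table,
# rarely referenced ones (c4, c5, c6) in another, both keyed by id.
cur.execute("CREATE TABLE ItemsHot (id INTEGER PRIMARY KEY, c1 TEXT, c2 TEXT, c3 TEXT)")
cur.execute("CREATE TABLE ItemsCold (id INTEGER PRIMARY KEY, c4 TEXT, c5 TEXT, c6 TEXT)")

original = (712, "a", "b", "c", "d", "e", "f")
cur.execute("INSERT INTO ItemsHot VALUES (?, ?, ?, ?)", original[:4])
cur.execute("INSERT INTO ItemsCold VALUES (?, ?, ?, ?)", (original[0],) + original[4:])

# Everyday queries touch only the narrow table.
hot = cur.execute("SELECT c1, c2, c3 FROM ItemsHot WHERE id = 712").fetchone()

# Joining the row with ID 712 from each split table re-creates the original row.
rebuilt = cur.execute(
    "SELECT h.id, h.c1, h.c2, h.c3, c.c4, c.c5, c.c6 "
    "FROM ItemsHot AS h JOIN ItemsCold AS c ON c.id = h.id "
    "WHERE h.id = 712"
).fetchone()

print(hot)                  # ('a', 'b', 'c')
print(rebuilt == original)  # True
```

The join is the price mentioned in the quote: any query needing both column groups has to combine the tables again.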
Now, on SELECT vs. PROJECT:
This SO post describes the difference as follows:
> Select Operation: This operation is used to select rows from a table (relation) that specifies a given logic, which is called as a predicate. The predicate is a user-defined condition to select rows of user's choice.
>
> Project Operation: If the user is interested in selecting the values of a few attributes, rather than selecting all attributes of the Table (Relation), then one should go for PROJECT Operation.
SELECT is also the name of an actual SQL statement, while PROJECT is a term used only in relational algebra. Note that the SQL SELECT statement does both: its WHERE clause performs the relational-algebra selection, and its column list performs the projection.
Judging from your posting this on SO and not on MathOverflow, I would suggest you don't read relational algebra books if you just want to learn SQL for developing applications.
If you are in dire need of a recommendation for a good book about (advanced) SQL, here is one:
> SQL Antipatterns: Avoiding the Pitfalls of Database Programming
> Bill Karwin
> ISBN-13: 978-1934356555
> ISBN-10: 1934356557
That's the one book about SQL worth reading.
Most other books about SQL that I've seen out there can be summed up by this cynical statement about Photoshop books:
> There are more books about Photoshop than people actually using Photoshop.
Solution 4 - Database
The distinction between horizontal and vertical comes from the traditional tabular view of a database. A database can be split vertically (storing different tables and columns in separate databases) or horizontally (storing rows of the same table in multiple database nodes).
Horizontal partitioning across nodes is often referred to as database sharding.
# Example of vertical partitioning
fetch_user_data(user_id) -> db["USER"].fetch(user_id)
fetch_photo(photo_id) -> db["PHOTO"].fetch(photo_id)
# Example of horizontal partitioning
fetch_user_data(user_id) -> user_db[user_id % 2].fetch(user_id)
Find more details here: https://medium.com/@jeeyoungk/how-sharding-works-b4dec46b3f6
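A runnable version of the horizontal (sharding) pseudocode, under the assumption that each "database node" is simply an in-memory dict. The names `fetch_user_data` and `user_db` come from the pseudocode; `NUM_SHARDS`, `shard_for` and `store_user` are hypothetical helpers added for the sketch. Routing uses the same `user_id % 2` rule, so even IDs land on node 0 and odd IDs on node 1.

```python
# Hypothetical in-memory "database nodes": each shard maps user_id -> row.
NUM_SHARDS = 2
user_db = [dict() for _ in range(NUM_SHARDS)]

def shard_for(user_id):
    # Same routing rule as the pseudocode: user_id % 2 picks the node.
    return user_db[user_id % NUM_SHARDS]

def store_user(user_id, data):
    shard_for(user_id)[user_id] = data

def fetch_user_data(user_id):
    # Returns None if the user does not exist on its shard.
    return shard_for(user_id).get(user_id)

for uid in range(1, 7):
    store_user(uid, {"id": uid, "name": f"user{uid}"})

print(fetch_user_data(5))            # {'id': 5, 'name': 'user5'}
print([sorted(s) for s in user_db])  # [[2, 4, 6], [1, 3, 5]]
```

Every node holds rows of the same "table" with the same shape; only which rows live where differs.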
Solution 5 - Database
Consider a single table in a database; it has some rows and columns.
There are two ways you could pick data: you could pick some rows, or you could pick some columns (well, OK, three ways: you could pick some rows, and within those pick some columns).
You can think of select as picking some rows - that's horizontal (and not picking the rest, hence partitioning)
You can think of project as picking some columns - that's vertical (and not picking the rest)