Sample database for exercise

SqlTest DataSample Data

Sql Problem Overview


I would like to play with a larger database to test my knowledge on SQL.

Is there a huge .sql out there that I can use to play with SQL queries?

Sql Solutions


Solution 1 - Sql

You could try the classic MySQL world database.

The world.sql file is available for download here:

http://dev.mysql.com/doc/index-other.html

Just scroll down to Example Databases and you will find it.

Solution 2 - Sql

This is an online database but you can try with the stackoverflow database: https://data.stackexchange.com/stackoverflow/query/new

You also can download its dumps here:

https://archive.org/download/stackexchange

Solution 3 - Sql

Check out CodePlex for Microsoft SQL Server Community Projects & Samples

3rd party edit

On top of the link above you might look at

Solution 4 - Sql

Why not download the English Wikipedia? There are compressed SQL files of various sizes, and it should certainly be large enough for you

The main articles are XML, so inserting them into the db is a bit more of a problem, but you might find there are other files there that suit you. For example, the inter-page links SQL file is 2.3GB compressed. Have a look at https://en.wikipedia.org/wiki/Wikipedia:Database_download for more info.

Oskar

Solution 5 - Sql

This is what I am using for learning sql: employees-db

> this is a sample database with an integrated test suite, used to test > your applications and database servers

3rd party edit

According to launchpad.net the database has moved to github.

> The database contains about 300,000 employee records with 2.8 million > salary entries. The export data is 167 MB, which is not huge, but > heavy enough to be non-trivial for testing. > > The data was generated, and as such there are inconsistencies and > subtle problems. Rather than removing them, we decided to leave the > contents untouched, and use these issues as data cleaning exercises.

Solution 6 - Sql

If you want a big database of real data to play with, you could sign up for the Netflix Prize contest and get access to their data, which is pretty large (a few gigs of entries).

3rd party edit

The URL above does not contain the dataset anylonger (october 2016). The wikipedia page about the Netflix Prize reports that a law suit was settled regarding privacy concerns.

Solution 7 - Sql

You want huge?

Here's a small table: create table foo (id int not null primary key auto_increment, crap char(2000));

insert into foo(crap) values ('');

-- each time you run the next line, the number of rows in foo doubles. insert into foo( crap ) select * from foo;

run it twenty more times, you have over a million rows to play with.

Yes, if he's looking for looks of relations to navigate, this is not the answer. But if by huge he means to test performance and his ability to optimize, this will do it. I did exactly this (and then updated with random values) to test an potential answer I had for another question. (And didn't answer it, because I couldn't come up with better performance than what that asker had.)

Had he asked for "complex", I'd have gien a differnt answer. To me,"huge" implies "lots of rows".

Because you don't need huge to play with tables and relations. Consider a table, by itself, with no nullable columns. How many different kinds of rows can there be? Only one, as all columns must have some value as none can be null.

Every nullable column multiples by two the number of different kinds of rows possible: a row where that column is null, an row where it isn't null.

Now consider the table, not in isolation. Consider a table that is a child table: for every child that has an FK to the parent, that, is a many-to-one, there can be 0, 1 or many children. So we multiply by three times the count we got in the previous step (no rows for zero, one for exactly one, two rows for many). For any grandparent to which the parent is a many, another three.

For many-to-many relations, we can have have no relation, a one-to-one, a one-to-many, many-to-one, or a many-to-many. So for each many-to-many we can reach in a graph from the table, we multiply the rows by nine -- or just like two one-to manys. If the many-to-many also has data, we multiply by the nullability number.

Tables that we can't reach in our graph -- those that we have no direct or indirect FK to, don't multiply the rows in our table.

By recursively multiplying the each table we can reach, we can come up with the number of rows needed to provide one of each "kind", and we need no more than those to test every possible relation in our schema. And we're nowhere near huge.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
Questionn00ki3View Question on Stackoverflow
Solution 1 - Sqlkarim79View Answer on Stackoverflow
Solution 2 - SqliberckView Answer on Stackoverflow
Solution 3 - SqlbendeweyView Answer on Stackoverflow
Solution 4 - Sqluser87335View Answer on Stackoverflow
Solution 5 - SqlEng.FouadView Answer on Stackoverflow
Solution 6 - SqlDan LewView Answer on Stackoverflow
Solution 7 - SqltpdiView Answer on Stackoverflow