Fill database tables with a large amount of test data

Mysql, Database

Mysql Problem Overview


I need to load a table with a large amount of test data. This is to be used for testing performance and scaling.

How can I easily create 100,000 rows of random/junk data for my database table?

Mysql Solutions


Solution 1 - Mysql

You could also use a stored procedure. Consider the following table as an example:

CREATE TABLE your_table (id int NOT NULL PRIMARY KEY AUTO_INCREMENT, val int);

Then you could add a stored procedure like this:

DELIMITER $$
CREATE PROCEDURE prepare_data()
BEGIN
  DECLARE i INT DEFAULT 0;

  WHILE i < 100000 DO
    INSERT INTO your_table (val) VALUES (i);
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;

When you call it, you'll have 100k records:

CALL prepare_data();
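
Row-by-row inserts like this can be slow when every statement is committed individually. As a minimal sketch, disabling autocommit around the call usually speeds it up considerably:

SET autocommit = 0;
CALL prepare_data();
COMMIT;
SET autocommit = 1;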

Solution 2 - Mysql

For multiple-row cloning (data duplication), you could use a procedure like this, which inserts a copy of the row with id = 1 on each iteration:

DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;
 
  WHILE i < 100000 DO
    INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
    SELECT `user_id`, `page_id`, `name`, `description`, `created`
    FROM `table`
    WHERE id = 1;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;
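
Note that this procedure clones the row with id = 1, so the table must already contain that row. A minimal sketch of a seed insert, with made-up values:

INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
VALUES (1, 1, 'test name', 'test description', NOW());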

Solution 3 - Mysql

Here is a solution using pure math and SQL. Each statement inserts a copy of every existing row, with x offset by the current row count, so the table doubles on every run:

create table t1(x int primary key auto_increment);
insert into t1 () values (),(),();

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 1265 rows affected (0.01 sec)
Records: 1265  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 2530 rows affected (0.02 sec)
Records: 2530  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 5060 rows affected (0.03 sec)
Records: 5060  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 10120 rows affected (0.05 sec)
Records: 10120  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 20240 rows affected (0.12 sec)
Records: 20240  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 40480 rows affected (0.17 sec)
Records: 40480  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 80960 rows affected (0.31 sec)
Records: 80960  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 161920 rows affected (0.57 sec)
Records: 161920  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 323840 rows affected (1.13 sec)
Records: 323840  Duplicates: 0  Warnings: 0

mysql> insert into t1 (x) select x + (select count(*) from t1) from t1;
Query OK, 647680 rows affected (2.33 sec)
Records: 647680  Duplicates: 0  Warnings: 0
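
If you would rather not repeat the statement by hand, the same doubling can be wrapped in a loop. This is a sketch using the same t1 table and a hypothetical double_until procedure that keeps doubling until a target row count is reached (it assumes t1 already contains at least one row, or the loop would never terminate):

DELIMITER $$
CREATE PROCEDURE double_until(IN target INT)
BEGIN
  DECLARE n INT;
  -- capture the current row count, then double until we reach the target
  SELECT COUNT(*) INTO n FROM t1;
  WHILE n < target DO
    INSERT INTO t1 (x) SELECT x + n FROM t1;
    SET n = n * 2;
  END WHILE;
END$$
DELIMITER ;
CALL double_until(100000);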

Solution 4 - Mysql

If you want more control over the data, try something like this (in PHP):

<?php
// connection details elided
$conn = mysqli_connect(...);
$num = 100000;

$sql = 'INSERT INTO `table` (`col1`, `col2`, ...) VALUES ';
for ($i = 0; $i < $num; $i++) {
  mysqli_query($conn, $sql . generate_test_values($i));
}
?>

where the function generate_test_values returns a string formatted like "('val1', 'val2', ...)". If this takes too long, you can batch the rows so you're not making so many database calls, e.g.:

for ($i = 0; $i < $num; $i += 10) {
  $values = array();
  for ($j = 0; $j < 10; $j++) {
    $values[] = generate_test_values($i + $j);
  }
  mysqli_query($conn, $sql . join(", ", $values));
}

This runs only 10,000 queries, each adding 10 rows.
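
For reference, the statement the batched loop builds is just an ordinary multi-row INSERT; a sketch with placeholder column names and values:

INSERT INTO `table` (`col1`, `col2`) VALUES
  ('val1', 'val2'),
  ('val3', 'val4'),
  ('val5', 'val6');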

Solution 5 - Mysql

Try filldb.

You can either post your schema or use an existing one, generate dummy data, then export it from the site and import it into your database.

Solution 6 - Mysql

create table mydata as select * from information_schema.columns;
insert into mydata select * from mydata;
-- repeating the insert 11 times will give you at least 6 million rows in the table.

I wanted to offer some explanation of this code, because the answer above is far more useful once you understand what it does.

The first line creates a table called mydata and copies both its column layout and its contents from information_schema.columns, the system table in which MySQL stores metadata about every column on the server. That gives you a fully populated table, with all of its columns defined for you automatically, which is very handy.

The second line is an INSERT statement that selects everything already in mydata and inserts it back into the same table, doubling the row count each time it runs. The last line is just a comment suggesting you repeat the insert if you want to generate more data.

In my testing, one execution of this script generated 6,956 rows of data, so if you need a quick way to generate some records, this isn't a bad method. For more advanced testing, however, you may want to ALTER the table to add an auto-incrementing primary key, both because a database without a primary key is a sad database and because the duplicate entries this method produces can otherwise lead to unpredictable results. I'm not offering this as "the answer", but as context so you know what the code does before you execute it; most people aren't fans of running code they don't understand, even from a trusted source.
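
As a hedged sketch of that last suggestion, a single statement is enough to add an auto-incrementing primary key to the generated table:

ALTER TABLE mydata ADD COLUMN id INT NOT NULL AUTO_INCREMENT PRIMARY KEY FIRST;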

Solution 7 - Mysql

I really like Percona's mysql_random_data_load utility; more details about it are available in its documentation.

mysql_random_data_load is a utility that connects to a MySQL database and fills the specified table with random data. If foreign keys are present in the table, they will also be filled correctly.

This utility has a cool feature: the speed of data generation can be limited.

For example, to generate 30,000 records in the sakila.film_actor table at a speed of 500 records per second, you need the following command:

mysql_random_data_load sakila film_actor 30000 --host=127.0.0.1 --port=3306 --user=my_user --password=my_password --qps=500 --bulk-size=1

I have successfully used this tool to simulate a workload in a test environment by running it in multiple threads at different speeds for different tables.

Solution 8 - Mysql

This is a more performant modification of @michalzuber's answer (Solution 2). The only difference is removing the WHERE id = 1 clause, so that the inserts compound on each iteration.

The number of records produced is multiplied by 2^n, where n is the number of iterations.

So a table that starts with a single row holds 2^10 = 1,024 records after 10 iterations, 2^20 = 1,048,576 records after 20 iterations, and so on.

DELIMITER $$
CREATE PROCEDURE insert_test_data()
BEGIN
  DECLARE i INT DEFAULT 1;

  WHILE i <= 10 DO
    INSERT INTO `table` (`user_id`, `page_id`, `name`, `description`, `created`)
    SELECT `user_id`, `page_id`, `name`, `description`, `created`
    FROM `table`;
    SET i = i + 1;
  END WHILE;
END$$
DELIMITER ;
CALL insert_test_data();
DROP PROCEDURE insert_test_data;
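
As a quick sanity check after running it (assuming the table started with a single row):

SELECT COUNT(*) FROM `table`;
-- expect 1024 rows: one seed row doubled 10 times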

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Mohammad Umair | View Question on Stackoverflow
Solution 1 - Mysql | Daniel Vassallo | View Answer on Stackoverflow
Solution 2 - Mysql | michalzuber | View Answer on Stackoverflow
Solution 3 - Mysql | Daniil Iaitskov | View Answer on Stackoverflow
Solution 4 - Mysql | gmarcotte | View Answer on Stackoverflow
Solution 5 - Mysql | ganesh konathala | View Answer on Stackoverflow
Solution 6 - Mysql | Max | View Answer on Stackoverflow
Solution 7 - Mysql | Ivan Gusev | View Answer on Stackoverflow
Solution 8 - Mysql | Bernard Wiesner | View Answer on Stackoverflow