Tools for Generating Mock Data?

Testing, Data Generation

Testing Problem Overview


I'm looking for recommendations of a good, free tool for generating sample data for the purpose of loading into test databases. By analogy, something that produces "lorem ipsum" text for any RDBMS. Features I'm looking for include:

  • Flexibility to generate data for an existing table definition.
  • Ability to generate small and large data sets (1 million rows or more).
  • Generate in SQL script format (INSERT statements) or else in a flat file format suitable for bulk import (which is usually faster).
  • A command-line interface for easy scripting.
  • Extensible, open source, written in a dynamic language (these are nice-to-haves, not strong requirements).
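
To make the two output formats in the list above concrete, here is a minimal sketch that renders the same generated rows either as INSERT statements or as a CSV flat file for bulk import. It is not taken from any of the tools discussed here; the table name `customers`, its columns, and all function names are assumptions made up for illustration.

```python
import csv
import io
import random
import string

# Hypothetical table and column names, chosen only for illustration.
TABLE = "customers"
COLUMNS = ["id", "name", "balance"]


def make_rows(count, seed=0):
    """Return `count` rows of pseudo-random values for the columns above."""
    rng = random.Random(seed)  # fixed seed keeps the output reproducible
    rows = []
    for i in range(1, count + 1):
        name = "".join(rng.choice(string.ascii_lowercase) for _ in range(8))
        balance = round(rng.uniform(0, 1000), 2)
        rows.append((i, name, balance))
    return rows


def sql_literal(value):
    """Render a Python value as a SQL literal (strings quoted and escaped)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def to_insert_statements(rows):
    """Render rows as one INSERT statement per row."""
    cols = ", ".join(COLUMNS)
    statements = []
    for row in rows:
        values = ", ".join(sql_literal(v) for v in row)
        statements.append(f"INSERT INTO {TABLE} ({cols}) VALUES ({values});")
    return statements


def to_csv(rows):
    """Render rows as CSV text with a header line, for a bulk-import tool."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(COLUMNS)
    writer.writerows(rows)
    return buf.getvalue()


if __name__ == "__main__":
    rows = make_rows(3)
    print("\n".join(to_insert_statements(rows)))
    print(to_csv(rows), end="")
```

The CSV form is the one you would hand to a database's bulk loader; the INSERT form is easier to read and to replay by hand.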

PS: I did search for a duplicate question on StackOverflow, but I didn't find one. If there is one, I'll be grateful to get a pointer to it.


Thanks for the great responses everyone! I should amend my requirements that I use Mac OS X as my primary development environment, not Windows (though I did say command-line interface is desirable, and that practically rules out Windows). The Windows-specific suggestions will no doubt be useful to other readers of this question, though, so thanks.


Here is my conclusion:

  • GenerateData:
      • PHP web app interface, not command line
      • limited to generating 200 records (or pay $20 for a license to generate 5,000 records)
  • RedGate SQL Data Generator
      • not free, price $295
      • requires Windows, .NET, SQL Server
  • Visual Studio 2008 Database Edition
      • requires Windows
      • requires costly MSDN or ISV subscription
  • Banner Datatect
      • not free, price $595
      • requires Windows (?)
      • no support for MySQL (?)
      • GUI, not command line or scriptable
  • Ruby Faker gem
      • way too slow to use ActiveRecord for bulk data load
  • Super Smack
      • chiefly a load-testing tool, with a random data generator built in
      • pretty simple to use nevertheless
      • overall a good runner-up tool
  • Databene Benerator
      • best solution for my needs
      • XML scripts, compatible with DbUnit
      • open source (GPL) Java code
      • command-line usage
      • access many databases directly via JDBC

Testing Solutions


Solution 1 - Testing

Take a look at databene benerator, a test data generator that looks close to your requirements.

  • it can generate data for an existing table definition (or even anonymize production data)
  • it can generate large data sets (unlimited size)
  • it supports various input formats (CSV, Flat Files, DBUnit) and output formats (CSV, Flat Files, DBUnit, XML, Excel, Scripts)
  • it can be used on the command line or through a maven plugin
  • it's open source and customizable

I would give it a try.

BTW, a list of similar products is available on databene benerator's web site.

Solution 2 - Testing

This looks quite promising: generatedata.com (http://www.generatedata.com/). Open-source, has lots of built-in data types.

There are several others listed here: Test (Sample) Data Generators (http://www.webresourcesdepot.com/test-sample-data-generators/). I don't have experience with any of them, but a few on that list look like they could be pretty decent.

Solution 3 - Testing

Try http://www.mockaroo.com

This is a tool my company made to help test our own applications. We've made it free for anyone to use. It's basically the Forgery ruby gem with a web app wrapped around it. You can generate data in CSV, txt, or SQL formats. Hope this helps.

Solution 4 - Testing

I know you said you were looking for a free tool, but this is one case where I would suggest that spending $295 will pay you back quickly in time saved. I've been using the RedGate tool SQL Data Generator for the last year and it is, in short, an awesome tool. It allows for setting dependencies between columns and generates realistic data for business objects such as phone numbers, URLs, names, etc. I can honestly state that this tool has paid for itself time and time again.

Solution 5 - Testing

If you are willing to use something MySQL-specific, you could take a look at Super Smack. It is currently maintained by Tony Bourke.

Super Smack allows you to generate random data to insert into your database tables. It is customizable, allowing you to use the packaged words.dat file, or any test data of your choice.
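
As a rough illustration of the word-list idea (this is not Super Smack's code or its file format), the sketch below builds random text values from a plain word file. It assumes one word per line, which may not match the layout of words.dat; the file name `words.txt` and the function names are made up for the example.

```python
import random
import tempfile
from pathlib import Path


def load_words(path):
    """Read a word list, assuming one word per line (blank lines skipped)."""
    with open(path, encoding="utf-8") as handle:
        return [line.strip() for line in handle if line.strip()]


def random_phrase(words, rng, length=3):
    """Join `length` randomly chosen words into one space-separated value."""
    return " ".join(rng.choice(words) for _ in range(length))


if __name__ == "__main__":
    # A tiny stand-in word list; a real run would point at a larger file.
    with tempfile.TemporaryDirectory() as tmp:
        word_file = Path(tmp) / "words.txt"
        word_file.write_text("alpha\nbravo\ncharlie\ndelta\n", encoding="utf-8")
        words = load_words(word_file)
        rng = random.Random(42)  # fixed seed for repeatable test data
        for _ in range(3):
            print(random_phrase(words, rng))
```

Swapping in your own word file changes the flavor of the generated text without touching the generator itself.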

One of the nice things about Super Smack is that its command-line interface is highly customizable. There are some fairly decent examples of usage in the book High Performance MySQL, which is also excerpted here.

Not sure if that is along the lines of what you are looking for, but just a thought.

Solution 6 - Testing

A Ruby script with one of the available fake data generators should do you just fine.

http://faker.rubyforge.org/ is one such gem. Unfortunately, this doesn't fulfill all your requirements.

Here is another: http://random-data.rubyforge.org/

And a tutorial for using Faker: http://www.rubyandhow.com/how-to-generate-fake-names-addresses-in-ruby/


RE: Flexibility to generate data for an existing table definition. Combine the Faker gem with one of the available ORMs. ActiveRecord would probably be easiest.

Solution 7 - Testing

Normally very costly, but if you are a small ISV you can get Visual Studio 2008 Database Edition very cheaply; see the Empower and BizSpark promotions. It provides a lot more functionality than just generating test data (integration with SCC, unit testing, DB refactoring, etc.).

As I like the fact that RedGate tools are so easy to learn, I would still look at SQL Data Generator.

Solution 8 - Testing

A tool that really should not be missing from the list is the Data Generator from Datanamic. It populates databases directly or generates insert scripts, has a large collection of pre-installed generators, and supports multiple databases...

http://www.datanamic.com/datagenerator/index.html

Solution 9 - Testing

I know you're not looking for actual lorem ipsum text; but in case anyone else searches for an actual lorem ipsum generator and finds this thread: lipsum.com does a great job of it.

Solution 10 - Testing

Not free, but Visual Studio 2008 Database Edition is a good alternative, and it provides a lot more functionality (integration with SCC, unit testing, DB refactoring, etc.).

Solution 11 - Testing

I use a tool called Datatect:

  1. Generates data to flat files or any ODBC compliant database.
  2. Extensible via VBScript.
  3. Referentially aware; will populate foreign keys with values from parent table.
  4. Data is context aware; city, state and phone numbers for given zip codes, first names and titles with gender.
  5. Can create custom, complex data types.
  6. Generate over 2 billion proper names, business names, street addresses, cities, states, and zip codes.
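
For readers unfamiliar with what "referentially aware" means in item 3, here is a minimal sketch of the idea: generate the parent rows first, then give every child row a foreign key drawn from the parent keys that actually exist. This is not how Datatect is implemented; the `customers`/`orders` shape and all names are hypothetical.

```python
import random


def make_parents(count):
    """Generate parent rows with sequential primary keys."""
    return [{"id": i, "name": f"customer_{i}"} for i in range(1, count + 1)]


def make_children(count, parents, rng):
    """Generate child rows whose foreign key always points at a real parent."""
    parent_ids = [parent["id"] for parent in parents]
    children = []
    for i in range(1, count + 1):
        children.append(
            {
                "id": i,
                "customer_id": rng.choice(parent_ids),  # valid foreign key
                "amount": rng.randint(1, 500),
            }
        )
    return children


if __name__ == "__main__":
    rng = random.Random(7)  # fixed seed for repeatable test data
    customers = make_parents(3)
    orders = make_children(5, customers, rng)
    for order in orders:
        print(order)
```

Because the child generator only ever picks from existing parent ids, the resulting rows load without foreign key violations as long as the parent table is loaded first.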

I've used this tool to generate as many as 40,000,000 rows of data to a SQL Server database, and 8,000,000 rows of data to an Oracle database.

I am in no way affiliated with Banner Systems, just a satisfied customer.

Solution 12 - Testing

Here is a list of such tools (both free and commercial): http://c2.com/cgi/wiki?TestDataGenerator

Solution 13 - Testing

For OS X there is Data Creator (US $7). The download is free for testing purposes, so you can use it to evaluate the software and its features.

It requires OS X Lion or later. It can generate many different field types and has a custom export mode plus some presets (TSV, CSV, HTML table, web page with a table inside).

http://www.tensionsoftware.com/osx/datacreator/

It is also available on the App Store:

https://itunes.apple.com/us/app/data-creator/id491686136?mt=12

Solution 14 - Testing

You can use DbSchema (www.dbschema.com). It's a database management tool, and it has a Random Data Generator to populate your database.

Solution 15 - Testing

Not a direct answer to your question, but this can be helpful for certain kinds of data:

Fake Name Generator (http://www.fakenamegenerator.com/) can be useful, not for everything, but for user accounts or stuff like that. AFAIK they provide support for bulk orders.

Solution 16 - Testing

+1 for Benerator: I tried 3 or 4 of the other tools on offer (including dbmonster) but found Benerator to be very quick, to deliver realistic data and to be flexible. I also got very quick & helpful feedback from the tool's creator when I posted on the forum.

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | Bill Karwin | View Question on Stackoverflow
Solution 1 - Testing | Pascal Thivent | View Answer on Stackoverflow
Solution 2 - Testing | Chad Birch | View Answer on Stackoverflow
Solution 3 - Testing | mockaroodev | View Answer on Stackoverflow
Solution 4 - Testing | KevDog | View Answer on Stackoverflow
Solution 5 - Testing | jonstjohn | View Answer on Stackoverflow
Solution 6 - Testing | brendanjerwin | View Answer on Stackoverflow
Solution 7 - Testing | Ian Ringrose | View Answer on Stackoverflow
Solution 8 - Testing | user2072139 | View Answer on Stackoverflow
Solution 9 - Testing | Jenn D. | View Answer on Stackoverflow
Solution 10 - Testing | bastos.sergio | View Answer on Stackoverflow
Solution 11 - Testing | Patrick Cuff | View Answer on Stackoverflow
Solution 12 - Testing | IgorJ | View Answer on Stackoverflow
Solution 13 - Testing | RPT | View Answer on Stackoverflow
Solution 14 - Testing | user2143407 | View Answer on Stackoverflow
Solution 15 - Testing | dr. evil | View Answer on Stackoverflow
Solution 16 - Testing | davek | View Answer on Stackoverflow