Parsing SQL code in C#


C# Problem Overview


I want to parse SQL code using C#.

Specifically, is there any freely available parser which can parse SQL code and generate a tree or any other structure out of it? It should also generate the proper tree for nested structures.

It should also return which kind of statement the node of this tree represents.

For example, if the node contains a loop condition, then it should report that the node is of a "loop" type.

Or is there any way by which I can parse the code in C# and generate a tree of the type I want?

C# Solutions


Solution 1 - C#

Specifically for Transact-SQL (Microsoft SQL Server), you can use the Microsoft.SqlServer.Management.SqlParser.Parser namespace available in Microsoft.SqlServer.Management.SqlParser.dll, an assembly that is included with SQL Server and can be freely distributed.

Here's an example method for tokenizing a T-SQL string into a sequence of tokens:

using System.Collections.Generic;
using Microsoft.SqlServer.Management.SqlParser.Parser;

IEnumerable<TokenInfo> ParseSql(string sql)
{
    ParseOptions parseOptions = new ParseOptions();
    Scanner scanner = new Scanner(parseOptions);

    int state = 0,
        start,
        end,
        lastTokenEnd = -1,
        token;

    bool isPairMatch, isExecAutoParamHelp;

    List<TokenInfo> tokens = new List<TokenInfo>();

    scanner.SetSource(sql, 0);

    while ((token = scanner.GetNext(ref state, out start, out end, out isPairMatch, out isExecAutoParamHelp)) != (int)Tokens.EOF)
    {
        TokenInfo tokenInfo =
            new TokenInfo()
            {
                Start = start,
                End = end,
                IsPairMatch = isPairMatch,
                IsExecAutoParamHelp = isExecAutoParamHelp,
                Sql = sql.Substring(start, end - start + 1),
                Token = (Tokens)token,
            };

        tokens.Add(tokenInfo);

        lastTokenEnd = end;
    }

    return tokens;
}

Note that the TokenInfo class is just a simple class with the above-referenced properties.
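A minimal sketch of such a TokenInfo class might look like the following. The property names and types are inferred from how ParseSql populates them, not taken from the library, and the snippet assumes a reference to Microsoft.SqlServer.Management.SqlParser.dll for the Tokens type.

```csharp
using Microsoft.SqlServer.Management.SqlParser.Parser; // provides the Tokens enumeration

// Plain data holder for one scanned token. This class is not part of the
// SqlParser assembly; its shape is inferred from the ParseSql method above.
public class TokenInfo
{
    public int Start { get; set; }                // index of the token's first character
    public int End { get; set; }                  // index of its last character (inclusive, as the Substring length implies)
    public bool IsPairMatch { get; set; }         // flag reported by Scanner.GetNext
    public bool IsExecAutoParamHelp { get; set; } // flag reported by Scanner.GetNext
    public string Sql { get; set; }               // the token's text, cut from the input string
    public Tokens Token { get; set; }             // token kind, cast from the scanner's int result
}
```

With this in place, the result of ParseSql can be filtered by the Token property, for example to find every TOKEN_BEGIN in a script.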

Tokens is an enumeration defined in the same parser namespace; it includes constants like TOKEN_BEGIN, TOKEN_COMMIT, TOKEN_EXISTS, etc.
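The scanner only yields a flat token list, while the question asks for a tree. As far as I know, the same assembly also exposes a full parser through Parser.Parse, whose result holds a SqlScript made of SqlCodeObject nodes. The sketch below walks that tree and records each node's type name. Treat the member names used here (Parser.Parse, ParseResult.Script, SqlCodeObject.Children, SqlCodeObject.Sql) as assumptions to verify against your version of the assembly; SqlTreeDump, Describe and Walk are hypothetical names introduced for this illustration.

```csharp
using System.Collections.Generic;
using Microsoft.SqlServer.Management.SqlParser.Parser;
using Microsoft.SqlServer.Management.SqlParser.SqlCodeDom;

// Hypothetical helper, not part of the library.
public static class SqlTreeDump
{
    // Returns one indented line per node: the node's CLR type name, then its SQL text.
    public static List<string> Describe(string sql)
    {
        ParseResult result = Parser.Parse(sql);   // assumed entry point of the full parser
        var lines = new List<string>();
        Walk(result.Script, 0, lines);
        return lines;
    }

    // Depth-first walk; the indentation reflects how deeply a node is nested.
    private static void Walk(SqlCodeObject node, int depth, List<string> lines)
    {
        lines.Add(new string(' ', depth * 2) + node.GetType().Name + ": " + node.Sql);
        if (node.Children == null) return;
        foreach (SqlCodeObject child in node.Children)
            Walk(child, depth + 1, lines);
    }
}
```

Here the type name of each node plays the role of the "kind of statement" the question asks about, and nested statements show up as deeper indentation.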

Solution 2 - C#

Scott Hanselman recently featured the Irony project, which includes a sample SQL parser.

Solution 3 - C#

[Warning: answer may no longer apply as of 2021]

Use Microsoft Entity Framework (EF).

It has an "Entity SQL" parser which builds an expression tree:

using System.Data.EntityClient;
...
EntityConnection conn = new EntityConnection(myContext.Connection.ConnectionString);
conn.Open();
EntityCommand cmd = conn.CreateCommand();
cmd.CommandText = @"Select t.MyValue From MyEntities.MyTable As t";
var queryExpression = cmd.Expression;
....
conn.Close();

Or something like that; check the MSDN documentation for the details.

And it's all on Ballmer's tick :-)

There is also a parser on The Code Project called SQL Parser.

Good luck.

Solution 4 - C#

You may take a look at a commercial component: General SQL Parser at http://www.sqlparser.com. It supports the SQL syntax of Oracle, T-SQL, DB2 and MySQL.

Solution 5 - C#

Try ANTLR. There are a bunch of SQL grammars available for it.

Solution 6 - C#

VSTS 2008 Database Edition GDR includes assemblies that handle SQL parsing and script generation, and you can reference them from your project. Database Edition uses the parser to build an in-memory model of your database from the script files, then uses the script generator to generate SQL scripts from that model. I think there are just two assemblies you need to reference in your project. If you don't have Database Edition, you can install the trial version to get the assemblies, or there might be another way to obtain them without installing it. For details, see the article "Data Dude: Getting to the Crown Jewels".

Solution 7 - C#

Try GOLD Parser; it's a powerful and easy-to-learn BNF engine. You can search the existing grammars for what you need (e.g., the SQL ANSI 89 grammar).

I started using this for HQL parsing (the NHibernate query language, very similar to SQL), and it's awesome.

UPDATE: The NH dev team has since implemented HQL parsing with ANTLR (which is harder to use, but more powerful AFAIK).

Solution 8 - C#

As Diego suggested, grammars are the way to go IMHO. I've tried Coco/R before, but it is too simple for complex SQL. ANTLR has a number of ready-made grammars.

Someone even tried to build a SQL engine; check the code of SharpHSQL - An SQL engine written in C# to see if there's something in it for you.
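To make the grammar-based suggestions more concrete, here is a hand-written recursive-descent sketch for a tiny, made-up subset of SQL (WHILE loops, BEGIN ... END blocks, and semicolon-terminated statements). It is not ANTLR, GOLD or Coco/R output, and every name in it (NodeKind, Node, ToySqlParser) is hypothetical. It only illustrates the shape of tree the question asks for: nested nodes that each report their kind.

```csharp
using System;
using System.Collections.Generic;

// Kinds of node this toy parser produces (hypothetical names).
public enum NodeKind { Script, Loop, Block, Statement }

public sealed class Node
{
    public NodeKind Kind { get; }
    public string Text { get; }   // loop condition or statement text; empty otherwise
    public List<Node> Children { get; } = new List<Node>();

    public Node(NodeKind kind, string text = "") { Kind = kind; Text = text; }
}

public static class ToySqlParser
{
    public static Node Parse(string source)
    {
        // Crude tokenizer: whitespace-separated words, with ';' as its own token.
        var tokens = new Queue<string>(
            source.Replace(";", " ; ").Split((char[])null, StringSplitOptions.RemoveEmptyEntries));
        var root = new Node(NodeKind.Script);
        while (tokens.Count > 0) root.Children.Add(ParseStatement(tokens));
        return root;
    }

    // statement := WHILE condition block | block | words ';'
    private static Node ParseStatement(Queue<string> tokens)
    {
        if (Is(tokens.Peek(), "WHILE"))
        {
            tokens.Dequeue();
            var condition = new List<string>();
            while (tokens.Count > 0 && !Is(tokens.Peek(), "BEGIN"))
                condition.Add(tokens.Dequeue());
            var loop = new Node(NodeKind.Loop, string.Join(" ", condition));
            loop.Children.Add(ParseBlock(tokens));   // the loop body nests under the loop node
            return loop;
        }
        if (Is(tokens.Peek(), "BEGIN")) return ParseBlock(tokens);

        // Simple statement: everything up to the terminating ';'.
        var words = new List<string>();
        while (tokens.Count > 0 && tokens.Peek() != ";")
            words.Add(tokens.Dequeue());
        if (tokens.Count > 0) tokens.Dequeue();      // consume ';'
        return new Node(NodeKind.Statement, string.Join(" ", words));
    }

    // block := BEGIN statement* END [';']
    private static Node ParseBlock(Queue<string> tokens)
    {
        if (tokens.Count == 0 || !Is(tokens.Dequeue(), "BEGIN"))
            throw new FormatException("Expected BEGIN");
        var block = new Node(NodeKind.Block);
        while (tokens.Count > 0 && !Is(tokens.Peek(), "END"))
            block.Children.Add(ParseStatement(tokens));
        if (tokens.Count == 0) throw new FormatException("Expected END");
        tokens.Dequeue();                                                // consume END
        if (tokens.Count > 0 && tokens.Peek() == ";") tokens.Dequeue();  // optional ';'
        return block;
    }

    private static bool Is(string token, string keyword) =>
        string.Equals(token, keyword, StringComparison.OrdinalIgnoreCase);
}
```

Calling ToySqlParser.Parse("WHILE @i < 3 BEGIN SELECT @i; SET @i = @i + 1; END") yields a Script node with one Loop child whose Text is "@i < 3" and whose single Block child holds two Statement nodes. A real SQL grammar is far larger, which is why the ready-made grammars and parser generators mentioned in these answers are usually the better starting point.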

Attributions

All content for this solution is sourced from the original question on Stack Overflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stack Overflow
Question | Archie | View Question on Stack Overflow
Solution 1 - C# | Andrey Belykh | View Answer on Stack Overflow
Solution 2 - C# | user423430 | View Answer on Stack Overflow
Solution 3 - C# | TFD | View Answer on Stack Overflow
Solution 4 - C# | James | View Answer on Stack Overflow
Solution 5 - C# | Andrew Peters | View Answer on Stack Overflow
Solution 6 - C# | Mehmet Aras | View Answer on Stack Overflow
Solution 7 - C# | Diego Jancic | View Answer on Stack Overflow
Solution 8 - C# | Robert Cutajar | View Answer on Stack Overflow