T-SQL: Opposite to string concatenation - how to split string into multiple records

Sql ServerSql Server-2005Tsql

Sql Server Problem Overview


> Possible Duplicate:
> Split string in SQL

I have seen a couple of questions related to string concatenation in SQL. I wonder how would you approach the opposite problem: splitting coma delimited string into rows of data:

Lets say I have tables:

userTypedTags(userID,commaSeparatedTags) 'one entry per user
tags(tagID,name)

And want to insert data into table

userTag(userID,tagID) 'multiple entries per user

Inspired by Which tags are not in the database? question

EDIT

Thanks for the answers, actually more then one deserves to be accepted but I can only pick one, and the solution presented by Cade Roux with recursions seems pretty clean to me. It works on SQL Server 2005 and above.

For earlier version of SQL Server the solution provided by miies can be used. For working with text data type wcm answer will be helpful. Thanks again.

Sql Server Solutions


Solution 1 - Sql Server

There are a wide varieties of solutions to this problem documented here, including this little gem:

CREATE FUNCTION dbo.Split (@sep char(1), @s varchar(512))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + 1, CHARINDEX(@sep, @s, stop + 1)
      FROM Pieces
      WHERE stop > 0
    )
    SELECT pn,
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 512 END) AS s
    FROM Pieces
  )

Solution 2 - Sql Server

You can also achieve this effect using XML, as seen here, which removes the limitation of the answers provided which all seem to include recursion in some fashion. The particular use I've made here allows for up to a 32-character delimiter, but that could be increased however large it needs to be.

create FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
    RETURN
    (
        SELECT r.value('.','VARCHAR(MAX)') as Item
        FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(REPLACE(REPLACE(@s,'& ','&amp; '),'<','&lt;'), @sep, '</r><r>') + '</r></root>') as valxml) x
        CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
    )

Then you can invoke it using:

SELECT * FROM dbo.Split(' ', 'I hate bunnies')

Which returns:

-----------
|I        |
|---------|
|hate     |
|---------|
|bunnies  |
-----------


I should note, I don't actually hate bunnies... it just popped into my head for some reason.
The following is the closest thing I could come up with using the same method in an inline table-valued function. DON'T USE IT, IT'S HORRIBLY INEFFICIENT! It's just here for reference sake.

CREATE FUNCTION [dbo].[Split] (@sep VARCHAR(32), @s VARCHAR(MAX))
RETURNS TABLE
AS
	RETURN
	(
        SELECT r.value('.','VARCHAR(MAX)') as Item
        FROM (SELECT CONVERT(XML, N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>') as valxml) x
		CROSS APPLY x.valxml.nodes('//root/r') AS RECORDS(r)
	)

Solution 3 - Sql Server

I use this function (SQL Server 2005 and above).

create function [dbo].[Split]
(
	@string nvarchar(4000),
	@delimiter nvarchar(10)
)
returns @table table
(
	[Value] nvarchar(4000)
)
begin
	declare @nextString nvarchar(4000)
	declare @pos int, @nextPos int

	set @nextString = ''
	set @string = @string + @delimiter

	set @pos = charindex(@delimiter, @string)
	set @nextPos = 1
	while (@pos <> 0)
	begin
		set @nextString = substring(@string, 1, @pos - 1)
		
		insert into @table
		(
			[Value]
		)
		values
		(
			@nextString
		)

		set @string = substring(@string, @pos + len(@delimiter), len(@string))
		set @nextPos = @pos
		set @pos = charindex(@delimiter, @string)
	end
	return
end

Solution 4 - Sql Server

For the particular case of splitting strings into words I've come across another solution for SQL Server 2008.

with testTable AS
(
SELECT 1 AS Id, N'how now brown cow' AS txt UNION ALL
SELECT 2, N'she sells sea shells upon the sea shore' UNION ALL
SELECT 3, N'red lorry yellow lorry' UNION ALL
SELECT 4, N'the quick brown fox jumped over the lazy dog'
)

SELECT display_term, COUNT(*) As Cnt
 FROM testTable
CROSS APPLY sys.dm_fts_parser('"' + txt + '"', 1033, 0,0)
GROUP BY display_term
HAVING COUNT(*) > 1
ORDER BY Cnt DESC

Returns

display_term                   Cnt
------------------------------ -----------
the                            3
brown                          2
lorry                          2
sea                            2

Solution 5 - Sql Server

Slight modification of the solution above so it works with variable-length delimiters.

create FUNCTION dbo.fn_Split2 (@sep nvarchar(10), @s nvarchar(4000))
RETURNS table
AS
RETURN (
    WITH Pieces(pn, start, stop) AS (
      SELECT 1, 1, CHARINDEX(@sep, @s)
      UNION ALL
      SELECT pn + 1, stop + (datalength(@sep)/2), CHARINDEX(@sep, @s, stop + (datalength(@sep)/2))
      FROM Pieces
      WHERE stop > 0
    )
    SELECT pn,
      SUBSTRING(@s, start, CASE WHEN stop > 0 THEN stop-start ELSE 4000 END) AS s
    FROM Pieces
  )

NB: I've used datalength() since len() reports incorrectly if there are trailing spaces.

Solution 6 - Sql Server

Here's a Split function that is compatible with SQL Server versions prior to 2005.

CREATE FUNCTION dbo.Split(@data nvarchar(4000), @delimiter nvarchar(100))  
RETURNS @result table (Id int identity(1,1), Data nvarchar(4000)) 
AS  
BEGIN 
	DECLARE @pos   INT
	DECLARE @start INT
	DECLARE @len   INT
	DECLARE @end   INT

	SET @len   = LEN('.' + @delimiter + '.') - 2
	SET @end   = LEN(@data) + 1
	SET @start = 1
	SET @pos   = 0

	WHILE (@pos < @end)
	BEGIN
		SET @pos = CHARINDEX(@delimiter, @data, @start)
		IF (@pos = 0) SET @pos = @end

		INSERT @result (data) SELECT SUBSTRING(@data, @start, @pos - @start)
		SET @start = @pos + @len
	END

	RETURN
END

Solution 7 - Sql Server

Using CLR, here's a much simpler alternative that works in all cases, yet 40% faster than the accepted answer:

using System;
using System.Collections;
using System.Data.SqlTypes;
using System.Text.RegularExpressions;
using Microsoft.SqlServer.Server;

public class UDF
{
    [SqlFunction(FillRowMethodName="FillRow")]
    public static IEnumerable RegexSplit(SqlString s, SqlString delimiter)
    {
        return Regex.Split(s.Value, delimiter.Value);
    }

    public static void FillRow(object row, out SqlString str)
    {
        str = new SqlString((string) row);
    }
}

Of course, it is still 8 times slower than PostgreSQL's regexp_split_to_table.

Solution 8 - Sql Server

SELECT substring(commaSeparatedTags,0,charindex(',',commaSeparatedTags))

will give you the first tag. You can proceed similarly to get the second one and so on by combining substring and charindex one layer deeper each time. That's an immediate solution but it works only with very few tags as the query grows very quickly in size and becomes unreadable. Move on to functions then, as outlined in other, more sophisticated answers to this post.

Solution 9 - Sql Server

I wrote this awhile back. It assumes the delimiter is a comma and that the individual values aren't bigger than 127 characters. It could be modified pretty easily.

It has the benefit of not being limited to 4,000 characters.

Good luck!

ALTER Function [dbo].[SplitStr] ( 
        @txt text 
) 
Returns @tmp Table 
        ( 
                value varchar(127)
        ) 
as 
BEGIN 
        declare @str varchar(8000) 
                , @Beg int 
                , @last int 
                , @size int 
        
        set @size=datalength(@txt) 
        set @Beg=1 
        
        
        set @str=substring(@txt,@Beg,8000) 
        IF len(@str)<8000 set @Beg=@size 
        ELSE BEGIN 
                set @last=charindex(',', reverse(@str)) 
                set @str=substring(@txt,@Beg,8000-@last) 
                set @Beg=@Beg+8000-@last+1 
        END 
        
        declare @workingString varchar(25) 
                , @stringindex int 
        
        
        
        while @Beg<=@size Begin 
                WHILE LEN(@str) > 0 BEGIN 
                        SELECT @StringIndex = CHARINDEX(',', @str) 
                        
                        SELECT 
                                @workingString = CASE 
                                        WHEN @StringIndex > 0 THEN SUBSTRING(@str, 1, @StringIndex-1) 
                                        ELSE @str 
                                END 
                
                        INSERT INTO 
                                @tmp(value)
                        VALUES 
                                (cast(rtrim(ltrim(@workingString)) as varchar(127)))
                        SELECT @str = CASE 
                                WHEN CHARINDEX(',', @str) > 0 THEN SUBSTRING(@str, @StringIndex+1, LEN(@str)) 
                                ELSE '' 
                        END 
                END 
                set @str=substring(@txt,@Beg,8000) 
        
                if @Beg=@size set @Beg=@Beg+1 
                else IF len(@str)<8000 set @Beg=@size 
                ELSE BEGIN 
                        set @last=charindex(',', reverse(@str)) 
                        set @str=substring(@txt,@Beg,8000-@last) 
                        set @Beg=@Beg+8000-@last+1 
        
                END 
        END     

        return
END 

Solution 10 - Sql Server

I up-voted "Nathan Wheeler" answer as I found "Cade Roux" answer did not work above a certain string size.

Couple of points

-I found adding the DISTINCT keyword improved performance for me.

-Nathan's answer only works if your identifiers are 5 characters or less, of course you can adjust that...If the items you are splitting are INT identifiers as I am you can us same as me below:

CREATE FUNCTION [dbo].Split
(
	@sep VARCHAR(32), 
	@s VARCHAR(MAX)
)
RETURNS 
	@result TABLE (
		Id INT NULL
	)   
AS
BEGIN
	DECLARE @xml XML
	SET @XML = N'<root><r>' + REPLACE(@s, @sep, '</r><r>') + '</r></root>'
	
	INSERT INTO @result(Id)
	SELECT DISTINCT r.value('.','int') as Item
	FROM @xml.nodes('//root//r') AS RECORDS(r)

	RETURN
END

Solution 11 - Sql Server

I usually do this with the following code:

create function [dbo].[Split](@string varchar(max), @separator varchar(10))
returns @splited table ( stringPart varchar(max) )
with execute as caller
as
begin
    declare @stringPart varchar(max);
    set @stringPart = '';
    
    while charindex(@separator, @string) > 0
    begin
        set @stringPart = substring(@string, 0, charindex(@separator, @string));
        insert into @splited (stringPart) values (@stringPart);
        set @string = substring(@string, charindex(@separator, @string) + len(@separator), len(@string) + 1);
    end
    
    return;
end
go

You can test it with this query:

declare @example varchar(max);
set @example = 'one;string;to;rule;them;all;;';

select * from [dbo].[Split](@example, ';');

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content TypeOriginal AuthorOriginal Content on Stackoverflow
QuestionkristofView Question on Stackoverflow
Solution 1 - Sql ServerCade RouxView Answer on Stackoverflow
Solution 2 - Sql ServerNathan WheelerView Answer on Stackoverflow
Solution 3 - Sql Serveruser39603View Answer on Stackoverflow
Solution 4 - Sql ServerMartin SmithView Answer on Stackoverflow
Solution 5 - Sql ServerRoryView Answer on Stackoverflow
Solution 6 - Sql ServerTomalakView Answer on Stackoverflow
Solution 7 - Sql ServersayapView Answer on Stackoverflow
Solution 8 - Sql ServerYann SemetView Answer on Stackoverflow
Solution 9 - Sql ServerwcmView Answer on Stackoverflow
Solution 10 - Sql ServerDarrenView Answer on Stackoverflow
Solution 11 - Sql ServerMarekView Answer on Stackoverflow