Regex pattern inside SQL Replace function?

Sql Server, Regex

Sql Server Problem Overview


SELECT REPLACE('<strong>100</strong><b>.00 GB', '%^(^-?\d*\.{0,1}\d+$)%', '');

I want to strip any markup between the two parts of the number using the regex above, but it does not seem to work. I'm not sure whether the regex syntax is wrong, because I also tried a simpler pattern, '%[^0-9]%', just to test, and that didn't work either. Does anyone know how I can achieve this?

Sql Server Solutions


Solution 1 - Sql Server

You can use PATINDEX to find the index of the first occurrence of a pattern in a string, then use STUFF to replace the matched character with another string.
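As a minimal sketch of the PATINDEX plus STUFF idea on a single value (the variable names @s and @pos are illustrative, not from the original answer), the following removes one offending character per pass until none remain:

```sql
-- Sample input taken from the question.
DECLARE @s varchar(50) = '<strong>100</strong><b>.00 GB';

-- Position of the first character that is neither a digit nor a dot (0 if none).
DECLARE @pos int = PATINDEX('%[^0-9.]%', @s);

WHILE @pos > 0
BEGIN
    -- Delete exactly one character at that position.
    SET @s = STUFF(@s, @pos, 1, '');
    SET @pos = PATINDEX('%[^0-9.]%', @s);
END;

SELECT @s AS cleaned;  -- 100.00
```

The same loop body is what the row-by-row version applies to each cell.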

Loop through each row and replace each illegal character with what you want; in your case, replace every non-numeric character with an empty string. The inner loop handles cells that contain more than one illegal character.

DECLARE @counter int

SET @counter = 0

-- Use <= so the row with the highest ID is processed too
WHILE(@counter <= (SELECT MAX(ID_COLUMN) FROM Table))
BEGIN  
   
    WHILE 1 = 1
    BEGIN
		DECLARE @RetVal varchar(50)
	    
		SET @RetVal =  (SELECT Column = STUFF(Column, PATINDEX('%[^0-9.]%', Column),1, '')
		FROM Table
		WHERE ID_COLUMN = @counter)
		
		IF(@RetVal IS NOT NULL)		  
		  UPDATE Table SET
		  Column = @RetVal
		  WHERE ID_COLUMN = @counter
		ELSE
			break
	END
   
    SET @counter = @counter + 1
END

Caution: this is slow! A varchar column may add to the cost, so trimming it with LTRIM and RTRIM may help a bit. Regardless, it is slow.

Credit goes to this StackOverFlow answer.

EDIT Credit also goes to @srutzky

Edit (by @Tmdean) Instead of doing one row at a time, this answer can be adapted to a more set-based solution. It still loops as many times as the largest number of non-numeric characters in any single row, so it's not ideal, but I think it should be acceptable in most situations.

WHILE 1 = 1 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
        FROM Table)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, '')
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;
    
    IF @@ROWCOUNT = 0 BREAK;
END;

You can also improve efficiency quite a lot if you maintain a bit column in the table that indicates whether the field has been scrubbed yet. (NULL represents "Unknown" in my example and should be the column default.)

DECLARE @done bit = 0;
WHILE @done = 0 BEGIN
    WITH q AS
        (SELECT ID_Column, PATINDEX('%[^0-9.]%', Column) AS n
        FROM Table
        WHERE COALESCE(Scrubbed_Column, 0) = 0)
    UPDATE Table
    SET Column = STUFF(Column, q.n, 1, ''),
        Scrubbed_Column = 0
    FROM q
    WHERE Table.ID_Column = q.ID_Column AND q.n != 0;

    IF @@ROWCOUNT = 0 SET @done = 1;

    -- if Scrubbed_Column is still NULL, then the PATINDEX
    -- must have given 0
    UPDATE table
    SET Scrubbed_Column = CASE
        WHEN Scrubbed_Column IS NULL THEN 1
        ELSE NULLIF(Scrubbed_Column, 0)
    END;
END;

If you don't want to change your schema, this is easy to adapt to store intermediate results in a table valued variable which gets applied to the actual table at the end.
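A rough sketch of that table-variable variant follows. The temp table #Demo, the table variable @work, and the column name Val are all hypothetical stand-ins for the real table and column; only rows that actually need scrubbing are copied, and the real table is touched once at the end.

```sql
-- Hypothetical demo table standing in for the real one.
CREATE TABLE #Demo (ID_Column int PRIMARY KEY, Val varchar(50));
INSERT INTO #Demo (ID_Column, Val)
VALUES (1, '<strong>100</strong><b>.00 GB'), (2, '42.5');

-- Copy only the rows that contain an unwanted character.
DECLARE @work TABLE (ID_Column int PRIMARY KEY, Val varchar(50));
INSERT INTO @work (ID_Column, Val)
SELECT ID_Column, Val FROM #Demo WHERE Val LIKE '%[^0-9.]%';

-- Scrub the working copy, one character per row per pass.
WHILE 1 = 1
BEGIN
    UPDATE @work
    SET Val = STUFF(Val, PATINDEX('%[^0-9.]%', Val), 1, '')
    WHERE Val LIKE '%[^0-9.]%';

    IF @@ROWCOUNT = 0 BREAK;
END;

-- Apply the cleaned values to the real table in a single update.
UPDATE d
SET d.Val = w.Val
FROM #Demo AS d
JOIN @work AS w ON w.ID_Column = d.ID_Column;

SELECT ID_Column, Val FROM #Demo ORDER BY ID_Column;
DROP TABLE #Demo;
```

With the sample rows, row 1 should end up as 100.00 and row 2 stays 42.5 because it was never copied into @work.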

Solution 2 - Sql Server

Instead of stripping out the found character by its sole position, using Replace(Column, BadFoundCharacter, '') could be substantially faster. Additionally, instead of just replacing the one bad character found next in each column, this replaces all those found.

WHILE 1 = 1 BEGIN
    UPDATE dbo.YourTable
    SET Column = Replace(Column, Substring(Column, PatIndex('%[^0-9.-]%', Column), 1), '')
    WHERE Column LIKE '%[^0-9.-]%'
    If @@RowCount = 0 BREAK;
END;

I am convinced this will work better than the accepted answer, if only because it does fewer operations. There are other ways that might also be faster, but I don't have time to explore those right now.

Solution 3 - Sql Server

In a general sense, SQL Server does not support regular expressions and you cannot use them in the native T-SQL code.

You could write a CLR function to do that. See here, for example.
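To illustrate why the query in the question returns its input unchanged, here is a small sketch (column aliases are illustrative). REPLACE searches for its second argument as a literal string, while PATINDEX and LIKE understand only a limited wildcard syntax (%, _, [ ] and [^ ]), not full regular expressions:

```sql
-- REPLACE looks for the literal text '%[^0-9]%', which does not occur
-- in the input, so the string comes back unchanged.
SELECT REPLACE('<strong>100</strong><b>.00 GB', '%[^0-9]%', '') AS unchanged;

-- PATINDEX does interpret the wildcard pattern: the first character that
-- is neither a digit nor a dot is the leading '<' at position 1.
SELECT PATINDEX('%[^0-9.]%', '<strong>100</strong><b>.00 GB') AS first_bad_position;
```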

Solution 4 - Sql Server

For those who are looking for a performant and easy solution and are willing to enable CLR:

create database TestSQLFunctions
go
use TestSQLFunctions
go
alter database TestSQLFunctions set trustworthy on

EXEC sp_configure 'clr enabled', 1
RECONFIGURE WITH OVERRIDE
go

CREATE ASSEMBLY [SQLFunctions]
AUTHORIZATION [dbo]
FROM 
WITH PERMISSION_SET = SAFE

go

CREATE FUNCTION RegexReplace(
	@input nvarchar(max),
	@pattern nvarchar(max),
	@replacement nvarchar(max)
) RETURNS nvarchar  (max)
AS EXTERNAL NAME SQLFunctions.[SQLFunctions.Regex].Replace; 

go

-- outputs This is a test 
select dbo.RegexReplace('This is a test 12345','[0-9]','')

Content of the DLL: (image not included)

Solution 5 - Sql Server

Here is a function I wrote to accomplish this based off of the previous answers.

CREATE FUNCTION dbo.RepetitiveReplace
(
    @P_String VARCHAR(MAX),
    @P_Pattern VARCHAR(MAX),
    @P_ReplaceString VARCHAR(MAX),
    @P_ReplaceLength INT = 1
)
RETURNS VARCHAR(MAX)
BEGIN
    DECLARE @Index INT;

    -- Get starting point of pattern
    SET @Index = PATINDEX(@P_Pattern, @P_String);

    while @Index > 0
    begin
        -- replace the matching character(s) at the index
        SET @P_String = STUFF(@P_String, PATINDEX(@P_Pattern, @P_String), @P_ReplaceLength, @P_ReplaceString);
        SET @Index = PATINDEX(@P_Pattern, @P_String);
    end

    RETURN @P_String;
END;
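A usage sketch, assuming dbo.RepetitiveReplace has been created as defined above. When calling a scalar function, every argument must be supplied, so DEFAULT is passed explicitly to pick up @P_ReplaceLength = 1:

```sql
-- Strip everything except digits and the decimal point.
SELECT dbo.RepetitiveReplace(
           '<strong>100</strong><b>.00 GB',  -- input string
           '%[^0-9.]%',                      -- PATINDEX pattern for unwanted characters
           '',                               -- replacement
           DEFAULT                           -- replace one character per match
       ) AS cleaned;                         -- 100.00
```

Note that the replacement string must not itself match the pattern, otherwise the loop inside the function never terminates.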

Gist: https://gist.github.com/jkdba/ca13fe8f2a9855c4bdbfd0a5d3dfcda2

Edit:

Originally I had a recursive function here, but that does not play well with SQL Server: it has a nesting limit of 32 levels, so any attempt to make 32 or more replacements with the function failed with the error below. Rather than making a server-level change to allow deeper nesting (which could be dangerous, for example by allowing never-ending loops), switching to a WHILE loop makes a lot more sense.

Maximum stored procedure, function, trigger, or view nesting level exceeded (limit 32).

Solution 6 - Sql Server

I stumbled across this post looking for something else, but thought I'd mention a solution I use which is far more efficient, and which really should be the default implementation of any function used with a set-based query: a cross-applied table function. The topic seems to be still active, so hopefully this is useful to someone.

Example runtimes for a few of the answers so far, covering the recursive set-based queries and scalar functions, on a 1m-row test set removing the characters from a random newid: 34s to 2m05s for the WHILE loop examples, and 1m3s to {forever} for the function examples.

Using a table function with cross apply achieves the same goal in 10s. You may need to adjust it to suit your needs such as the max length it handles.

Function:

CREATE FUNCTION [dbo].[RemoveChars](@InputUnit VARCHAR(40))
RETURNS TABLE
AS
RETURN
	(
		WITH Numbers_prep(Number) AS
			(
				SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1 UNION ALL SELECT 1
			)
		,Numbers(Number) AS
			(
				SELECT TOP (ISNULL(LEN(@InputUnit),0))
					row_number() OVER (ORDER BY (SELECT NULL))
				FROM Numbers_prep a
					CROSS JOIN Numbers_prep b
			)
		SELECT
			OutputUnit
		FROM
			(
				SELECT
					substring(@InputUnit,Number,1)
				FROM  Numbers
				WHERE substring(@InputUnit,Number,1) like '%[0-9]%'
				ORDER BY Number
				FOR XML PATH('')
			) Sub(OutputUnit)
	)

Usage:

UPDATE t
SET column = o.OutputUnit
FROM ##t t
CROSS APPLY [dbo].[RemoveChars](t.column) o
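For a quick check without a table, the function can also be applied to a literal. This sketch assumes dbo.RemoveChars exists exactly as defined above; the alias names v and raw_value are illustrative:

```sql
-- Apply the inline table function to a single literal value.
SELECT v.raw_value, o.OutputUnit
FROM (VALUES ('<strong>100</strong><b>.00 GB')) AS v(raw_value)
CROSS APPLY dbo.RemoveChars(v.raw_value) AS o;
-- OutputUnit: 10000
```

Because the function as written keeps only characters matching [0-9], the decimal point is dropped as well; changing the character class to [0-9.] would keep it.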

Solution 7 - Sql Server

Wrapping the solution inside a SQL function could be useful if you want to reuse it. I'm even doing it at the cell level, that's why I'm putting this as a different answer:

CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(300))
RETURNS VARCHAR(300)
BEGIN
	DECLARE @str VARCHAR(300) = @string;
	DECLARE @Pattern VARCHAR (20) = '%[^a-zA-Z0-9]%';
	DECLARE @Len INT;
	SELECT @Len = LEN(@String);	
	WHILE @Len > 0 
	BEGIN
		SET @Len = @Len - 1;
		IF (PATINDEX(@Pattern,@str) > 0)
			BEGIN
				SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,'');	
			END
		ELSE
		BEGIN
			BREAK;
		END
	END		
	RETURN @str
END

A faster approach for large strings would look something like this:

CREATE FUNCTION [dbo].[fnReplaceInvalidChars] (@string VARCHAR(MAX))
RETURNS VARCHAR(MAX)
BEGIN
    DECLARE @str VARCHAR(MAX) = @string;
    DECLARE @Pattern VARCHAR (MAX) = '%[^a-zA-Z0-9]%';
    WHILE PATINDEX(@Pattern,@str) > 0
    BEGIN
        SELECT @str = STUFF(@str, PATINDEX(@Pattern,@str),1,''); 
    END     
    RETURN @str
END

Solution 8 - Sql Server

I created this function to clean up a string containing non-numeric characters in a time field. The time contained question marks when the minutes had not been entered, something like 20:??. The function loops through each character and replaces every character that is not a digit or a colon (such as ?) with a 0:

 CREATE FUNCTION [dbo].[CleanTime]
(
	-- Add the parameters for the function here
	@intime nvarchar(10) 
)
RETURNS nvarchar(5)
AS
BEGIN
	-- Declare the return variable here
	DECLARE @ResultVar nvarchar(5)
	DECLARE @char char(1)
	-- Add the T-SQL statements to compute the return value here
	DECLARE @i int = 1
	WHILE @i <= LEN(@intime)
	BEGIN
	SELECT @char =  CASE WHEN substring(@intime,@i,1) like '%[0-9:]%' THEN substring(@intime,@i,1) ELSE '0' END
	SELECT @ResultVar = concat(@ResultVar,@char)   
	set @i  = @i + 1       
	END;
	-- Return the result of the function
	RETURN @ResultVar

END
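A short usage sketch, assuming dbo.CleanTime has been created as defined above:

```sql
-- Question marks are replaced with zeros; digits and the colon are kept.
SELECT dbo.CleanTime('20:??') AS cleaned;    -- 20:00

-- A value that is already clean passes through unchanged.
SELECT dbo.CleanTime('07:15') AS unchanged;  -- 07:15
```

Since the return type is nvarchar(5), anything beyond the fifth character of the input is cut off.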
        

Solution 9 - Sql Server

I think this solution is faster and simpler. I always use a recursive CTE because WHILE is so slow on MSSQL. I use it in the projects I work on, including large databases.

/*
Function:			dbo.kSql_ReplaceRegExp
Create Date:        20.02.2021
Author:             Karcan Ozbal

Description:        The given string value will be replaced according to the given regexp/pattern.

Parameter(s):       @Value		 : Value/Text to REPLACE.
					@RegExp		 : The regexp/pattern to be used for REPLACE operation.

Usage:				select dbo.kSql_ReplaceRegExp('2T3EST5','%[0-9]%')
Output:				'TEST'
*/
ALTER FUNCTION [dbo].[kSql_ReplaceRegExp](
	@Value nvarchar(max),
	@RegExp nvarchar(50)
)
RETURNS nvarchar(max)
AS
BEGIN
	DECLARE @Result nvarchar(max)

	;WITH CTE AS (
		SELECT NUM = 1, VALUE = @Value, IDX = PATINDEX(@RegExp, @Value)
		UNION ALL
		SELECT NUM + 1, VALUE = REPLACE(VALUE, SUBSTRING(VALUE,IDX,1),''), IDX = PATINDEX(@RegExp, REPLACE(VALUE, SUBSTRING(VALUE,IDX,1),'')) 
		FROM CTE
		WHERE IDX > 0
	)
	SELECT TOP(1) @Result = VALUE 
	FROM CTE 
	ORDER BY NUM DESC
	OPTION (maxrecursion 0)

	RETURN @Result
END

Solution 10 - Sql Server

If you are doing this just for a parameter coming into a Stored Procedure, you can use the following:

declare @badIndex int
set @badIndex = PatIndex('%[^0-9]%', @Param)
while @badIndex > 0
begin
    -- both statements must sit inside the loop body, otherwise @badIndex never changes
    set @Param = Replace(@Param, Substring(@Param, @badIndex, 1), '')
    set @badIndex = PatIndex('%[^0-9]%', @Param)
end

Solution 11 - Sql Server

I thought this was clearer:

ALTER FUNCTION [dbo].[func_ReplaceChars](
    @Value nvarchar(max),
    @Chars nvarchar(50)
)
RETURNS nvarchar(max)
AS
BEGIN
	DECLARE @cLen int = len(@Chars);
	-- SUBSTRING is 1-based, so start at 1 and include the last character
	DECLARE @curChar int = 1;

	WHILE @curChar<=@cLen
	BEGIN
		set @Value = replace(@Value,substring(@Chars,@curChar,1),'');

		set @curChar = @curChar + 1;
	END;

    RETURN @Value
END

Solution 12 - Sql Server

I think a simpler and faster approach is to iterate over every character code (0 to 255):

DECLARE @i int
SET @i = 0

WHILE(@i < 256)
BEGIN  

    IF char(@i) NOT IN ('0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '.')      
      
      UPDATE Table SET Column = replace(Column, char(@i), '')
        
    SET @i = @i + 1

END

Attributions

All content for this solution is sourced from the original question on Stackoverflow.

The content on this page is licensed under the Attribution-ShareAlike 4.0 International (CC BY-SA 4.0) license.

Content Type | Original Author | Original Content on Stackoverflow
Question | JanT | View Question on Stackoverflow
Solution 1 - Sql Server | Mukus | View Answer on Stackoverflow
Solution 2 - Sql Server | ErikE | View Answer on Stackoverflow
Solution 3 - Sql Server | Szymon | View Answer on Stackoverflow
Solution 4 - Sql Server | MichaelD | View Answer on Stackoverflow
Solution 5 - Sql Server | jkdba | View Answer on Stackoverflow
Solution 6 - Sql Server | SQLGobbleDeGook | View Answer on Stackoverflow
Solution 7 - Sql Server | Ivan Rascon | View Answer on Stackoverflow
Solution 8 - Sql Server | Nordin | View Answer on Stackoverflow
Solution 9 - Sql Server | Karcan | View Answer on Stackoverflow
Solution 10 - Sql Server | goddess_elli | View Answer on Stackoverflow
Solution 11 - Sql Server | Rene Bottern | View Answer on Stackoverflow
Solution 12 - Sql Server | Gregorio | View Answer on Stackoverflow