SQL Server Parsing a string into a table

Recently someone asked how to break apart multiple values stored together in a single field.  I’ve written a number of variations of functions that perform this task in SQL Server.

Purpose

Some of you may ask why would you ever do such a thing.  You should just normalize your tables and store that delimited data in a child table.  Let me start by saying I don’t generally recommend you design tables with comma delimited values exclusively.  In some scenarios I would both store a summary of delimited values in a column AND create a child table with each entry.  In most cases it just makes sense to have the child table only.  As always it depends on your requirements.  Deciding what level of de-normalization is in your database is complicated.  Ideally, start with a normalized database, and only de-normalize where performance requires it.

Another purpose of this function is in receiving delimited values in a stored procedure call and then turn around and use each delimited value for creating a child table record.

Single Column as String

A string with a single array or values is the easiest and most common example to be parsed.  The example below converts into a table of nvarchar(256).  But you can modify to return a varchar or nvarchar of any length per column.  I set it as this length since that is what I needed in my scenario at max.  You want to return a table of the smallest type in memory as possible.  Later I will show conversion of other types. This function is for converting unicode strings to a table, hence the UString in the name. You get to define the delimited when calling the function. The delimited can be any length as you can see in the examples.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertUStringToTable]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[XQS_ConvertUStringToTable]
GO

/*  TESTS ...
select * from [dbo].[XQS_ConvertUStringToTable]('tag1;tag2;tag3', ';')
select * from [dbo].[XQS_ConvertUStringToTable]('tag1|-|tag2|-|tag3', '|-|')
*/
create FUNCTION [dbo].[XQS_ConvertUStringToTable](@RawData nvarchar(max), @RowSeparator varchar(5))
RETURNS @T TABLE (Col1 nvarchar(256))
AS
BEGIN

	if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator))
	BEGIN
		RETURN
	END

declare @pos int
declare @piece nvarchar(256)

	-- Need to tack a delimiter onto the end of the input string if one doesn't exist
	if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator
	set @RawData = @RawData + @RowSeparator

	set @pos =  charindex(@RowSeparator, @RawData)
	while @pos <> 0 
	BEGIN
		if (@pos > 1)
		BEGIN
			set @piece = left(@RawData, @pos - 1)

			insert into @T(Col1) Values (@piece)
		END
		set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '')
		set @pos = charindex(@RowSeparator, @RawData)
	END
	RETURN	
END
GO

Single Column as Integers

As I mentioned previously if you know your datatype, return a table of that type instead of casting the values after returning the table of values.  The only difference in this next example is that it returns a table of integers instead. As you can see the insert statement varies from the previous code snippet by adding a cast.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertUStringToIntegerTable]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[XQS_ConvertUStringToIntegerTable]
GO
/*  TESTS ...
select * from [dbo].[XQS_ConvertUStringToIntegerTable]('1;1234567890;56789', ';')
select * from [dbo].[XQS_ConvertUStringToIntegerTable]('1|-|1234567890|-|56789', '|-|')
*/
create FUNCTION [dbo].[XQS_ConvertUStringToIntegerTable](@RawData nvarchar(max), @RowSeparator nvarchar(10))
RETURNS @T TABLE (ID int)
AS
BEGIN

	if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator))
	BEGIN
		RETURN
	END

declare @pos int
declare @piece nvarchar(80)

	-- Need to tack a delimiter onto the end of the input string if one doesn't exist
	if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator
	set @RawData = @RawData + @RowSeparator

	set @pos =  charindex(@RowSeparator, @RawData)
	while @pos <> 0 
	BEGIN
		if (@pos > 1)
		BEGIN
			set @piece = left(@RawData, @pos - 1)

			insert into @T(ID) Values (cast(@piece as int))
		END
		set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '')
		set @pos = charindex(@RowSeparator, @RawData)
	END
	RETURN	
END
GO

Single column as GUIDs

Converting to a table of unique identifier isn’t much different.  Except now we can take in varchar(max) instead since there are no unicode characters in GUIDs.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertStringToUniqueIDTable]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[XQS_ConvertStringToUniqueIDTable]
GO
/*  TESTS ...
select * from [dbo].XQS_ConvertStringToUniqueIDTable('{36266EB8-2C50-4131-B3C7-B62BC9DA25D9};ECC75250-A498-4314-8E51-5722C89608B3;{EEA31C53-7B9B-4da1-9823-8B0F60D8F7DA}', ';')
select * from [dbo].XQS_ConvertStringToUniqueIDTable('36266EB8-2C50-4131-B3C7-B62BC9DA25D9|-|{ECC75250-A498-4314-8E51-5722C89608B3}|-|{EEA31C53-7B9B-4da1-9823-8B0F60D8F7DA}', '|-|')
*/
create FUNCTION [dbo].[XQS_ConvertStringToUniqueIDTable](@RawData varchar(max), @RowSeparator varchar(5))
RETURNS @T TABLE (ID uniqueidentifier)
AS
BEGIN

	if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator))
	BEGIN
		RETURN
	END

declare @pos int
declare @piece varchar(40)

	-- Need to tack a delimiter onto the end of the input string if one doesn't exist
	if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator
	set @RawData = @RawData + @RowSeparator

	set @pos =  charindex(@RowSeparator, @RawData)
	while @pos <> 0 
	BEGIN
		if (@pos > 1)
		BEGIN
			set @piece = left(@RawData, @pos - 1)

			insert into @T(ID) Values (cast(@piece as uniqueidentifier))
		END
		set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '')
		set @pos = charindex(@RowSeparator, @RawData)
	END
	RETURN	
END
GO

Multiple columns of string

Sometimes you may have to parse out multiple columns per logical item/row in the source string.  You have to define a set number of columns on the output table.  My generic version of this function returns a table of 3 string columns.  For specific purposes you may want to create a copy of this with different return types or a different number of returned columns.  The code logic would be just about the same no matter the number of or types of columns, except for the casts would vary as in the code samples above.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertNameValueUStringToTable]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[XQS_ConvertNameValueUStringToTable]
GO
/*  TESTS ...
select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1|42;2|43;1|44', ';', '|')
select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1|42|x;2|43;1|44|y', ';', '|')

select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1;42|-|2;43|-|1;44', '|-|', ';')
select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1;42;x|-|2;43|-|1;44;y', '|-|', ';')
*/
create FUNCTION [dbo].[XQS_ConvertNameValueUStringToTable](@RawData nvarchar(max), @RowSeparator nvarchar(10), @ColSeparator nvarchar(10))
RETURNS @T TABLE (Col1 nvarchar(128), Col2 nvarchar(1024) NULL, Col3 nvarchar(1024) NULL)
AS
BEGIN

	if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator))
	BEGIN
		RETURN
	END

declare @pos int, @piece nvarchar(256)
declare @subpos int, @subpos2 int
declare @pieceCol1 nvarchar(128), @pieceCol2 nvarchar(1024), @pieceCol3 nvarchar(1024)

	-- Need to tack a delimiter onto the end of the input string if one doesn't exist
	if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator
	set @RawData = @RawData + @RowSeparator

	set @pos =  charindex(@RowSeparator, @RawData)
	while @pos <> 0 
	BEGIN
		if (@pos > 1)
		BEGIN
			set @piece = left(@RawData, @pos - 1)

			set @subpos = charindex(@ColSeparator, @piece)
			set @pieceCol1 = left(@piece, @subpos - 1)
			set @subpos2 = charindex(@ColSeparator, @piece, @subpos+len(@ColSeparator))
			if (@subpos2 > @subpos)
			BEGIN
				set @pieceCol2=substring(@piece, @subpos+len(@ColSeparator), @subpos2 - @subpos - len(@ColSeparator))
				set @pieceCol3=substring(@piece, @subpos2+len(@ColSeparator), len(@piece)-@subpos2)
			END
			ELSE
			BEGIN
				set @pieceCol2=substring(@piece, @subpos+len(@ColSeparator), len(@piece)-@subpos)
				set @pieceCol3=null
			END

			insert into @T(Col1, Col2, Col3) Values (@pieceCol1, @pieceCol2, @pieceCol3)
		END
		set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '')
		set @pos = charindex(@RowSeparator, @RawData)
	END
	RETURN	
END
GO

Mulitple columns of specific types

This is a variation of the previous sample, except with integer types for the three columns.  Just substitute the casts and delcared variable types per column as needed in your case.

if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_SCR_ConvertGroupAccessUStringToTable]') and xtype in (N'FN', N'IF', N'TF'))
drop function [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable]
GO

create FUNCTION [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable](@GroupAccess nvarchar(max))
RETURNS @T TABLE (AccessType int, GroupID int, PermissionID int NULL)
AS
BEGIN
/* each access record separated by semi-colon (;).  Each column in each record separated by a pipe (|) */
/* The permissionID 'column' is optional. */
/* @GroupAccess Sample: '1|42;2|43;1|44' (accessType|groupID|permissionID) */
/*                   OR: '1|42|23;2|43|23;1|44|23' */
/*  TESTS ...
select * from [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable]('1|42;2|43;1|44') 
select * from [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable]('1|42|23;2|43|23;1|44|23') 
*/

declare @pos int, @piece nvarchar(256)
declare @subpos int, @subpos2 int
declare @pieceAccessType nvarchar(64), @accessType int
declare @pieceGroupID nvarchar(64), @groupID int
declare @piecePermissionID nvarchar(64), @permissionID int

	if (len(rtrim(ltrim(@GroupAccess)))  ';'
	set @GroupAccess = @GroupAccess  + ';'

	set @pos =  charindex(';', @GroupAccess)
	while @pos <> 0 
	BEGIN
		if (@pos > 1)
		BEGIN
			set @piece = left(@GroupAccess, @pos - 1)

			set @subpos = charindex('|', @piece)
			set @pieceAccessType = left(@piece, @subpos - 1)
			set @accessType = cast(@pieceAccessType as int)
			set @subpos2 = charindex('|', @piece, @subpos+1)
			if (@subpos2 > @subpos)
			BEGIN
				set @pieceGroupID=substring(@piece, @subpos+1, @subpos2 - @subpos - 1)
				set @piecePermissionID=substring(@piece, @subpos2+1, len(@piece)-@subpos2)
				set @groupID = cast(@pieceGroupID as int)
				set @permissionID =  cast(@piecePermissionID as int)
			END
			ELSE
			BEGIN
				set @pieceGroupID=substring(@piece, @subpos+1, len(@piece)-@subpos)
				set @groupID = cast(@pieceGroupID as int)
				set @permissionID = null
			END

			insert into @T(AccessType, GroupID, PermissionID) Values (@accessType, @groupID, @permissionID)
		END
		set @GroupAccess = stuff(@GroupAccess, 1, @pos, '')
		set @pos =  charindex(';', @GroupAccess)
	END
	RETURN	
END
GO
About these ads

4 comments

  1. [...] Recently someone asked how to break apart multiple values stored together in a single field.  I’ve written a number of variations of functions that perform this task in SQL Server. Purpose S…  [...]

  2. Kevin G. Boles · · Reply

    Did you actually try these on a reasonably sized table? They are unbelievably poorly performing. See here (http://www.sqlservercentral.com/articles/Tally+Table/72993/) and here (http://www.sqlservercentral.com/Forums/Topic1101315-203-1.aspx) for an article and discussions on MANY methods for splitting with an incredible array of benchmarking, etc. This type of splitter discussed here should just not be used.

    1. Good point. I should have clarified when to use these or when not to. I would not use these as part of a calculation in a where clause, nor as part of a calculated column.

      I have used these as a means to pass in associations to child records of a stored procedure. For instance, say we have a post table, tag table, and an association table between for a many to many between posts and tags. I would also store a varchar on the post table with the passed in delimited string of tags. When saving a post i would parse the tag ids to associate and save each. This is betterthan a separate db connection per tag to associate. On returning a post or list of posts I just return the columns on the table. I can load the names for each tag id from a cache in the app code.

      Edit: I have not used this in a batch job. Where i have used these they were not a performance issue. But thanks for the links. I read it over once and it looks promising. I’ll read it over more closely later.

      1. Kevin G. Boles · ·

        Assuming you are on SQL 2008 or above the things you describe are much better done with Table Valued Parameters.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.

Join 76 other followers

%d bloggers like this: