Recently someone asked how to break apart multiple values stored together in a single field. I’ve written a number of variations of functions that perform this task in SQL Server.
Some of you may ask why would you ever do such a thing. You should just normalize your tables and store that delimited data in a child table. Let me start by saying I don’t generally recommend you design tables with comma delimited values exclusively. In some scenarios I would both store a summary of delimited values in a column AND create a child table with each entry. In most cases it just makes sense to have the child table only. As always it depends on your requirements. Deciding what level of de-normalization is in your database is complicated. Ideally, start with a normalized database, and only de-normalize where performance requires it.
Another purpose of this function is in receiving delimited values in a stored procedure call and then turn around and use each delimited value for creating a child table record.
A string with a single array or values is the easiest and most common example to be parsed. The example below converts into a table of nvarchar(256). But you can modify to return a varchar or nvarchar of any length per column. I set it as this length since that is what I needed in my scenario at max. You want to return a table of the smallest type in memory as possible. Later I will show conversion of other types. This function is for converting unicode strings to a table, hence the UString in the name. You get to define the delimited when calling the function. The delimited can be any length as you can see in the examples.
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertUStringToTable]') and xtype in (N'FN', N'IF', N'TF')) drop function [dbo].[XQS_ConvertUStringToTable] GO /* TESTS ... select * from [dbo].[XQS_ConvertUStringToTable]('tag1;tag2;tag3', ';') select * from [dbo].[XQS_ConvertUStringToTable]('tag1|-|tag2|-|tag3', '|-|') */ create FUNCTION [dbo].[XQS_ConvertUStringToTable](@RawData nvarchar(max), @RowSeparator varchar(5)) RETURNS @T TABLE (Col1 nvarchar(256)) AS BEGIN if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator)) BEGIN RETURN END declare @pos int declare @piece nvarchar(256) -- Need to tack a delimiter onto the end of the input string if one doesn't exist if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator set @RawData = @RawData + @RowSeparator set @pos = charindex(@RowSeparator, @RawData) while @pos <> 0 BEGIN if (@pos > 1) BEGIN set @piece = left(@RawData, @pos - 1) insert into @T(Col1) Values (@piece) END set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '') set @pos = charindex(@RowSeparator, @RawData) END RETURN END GO
As I mentioned previously if you know your datatype, return a table of that type instead of casting the values after returning the table of values. The only difference in this next example is that it returns a table of integers instead. As you can see the insert statement varies from the previous code snippet by adding a cast.
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertUStringToIntegerTable]') and xtype in (N'FN', N'IF', N'TF')) drop function [dbo].[XQS_ConvertUStringToIntegerTable] GO /* TESTS ... select * from [dbo].[XQS_ConvertUStringToIntegerTable]('1;1234567890;56789', ';') select * from [dbo].[XQS_ConvertUStringToIntegerTable]('1|-|1234567890|-|56789', '|-|') */ create FUNCTION [dbo].[XQS_ConvertUStringToIntegerTable](@RawData nvarchar(max), @RowSeparator nvarchar(10)) RETURNS @T TABLE (ID int) AS BEGIN if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator)) BEGIN RETURN END declare @pos int declare @piece nvarchar(80) -- Need to tack a delimiter onto the end of the input string if one doesn't exist if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator set @RawData = @RawData + @RowSeparator set @pos = charindex(@RowSeparator, @RawData) while @pos <> 0 BEGIN if (@pos > 1) BEGIN set @piece = left(@RawData, @pos - 1) insert into @T(ID) Values (cast(@piece as int)) END set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '') set @pos = charindex(@RowSeparator, @RawData) END RETURN END GO
Converting to a table of unique identifier isn’t much different. Except now we can take in varchar(max) instead since there are no unicode characters in GUIDs.
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertStringToUniqueIDTable]') and xtype in (N'FN', N'IF', N'TF')) drop function [dbo].[XQS_ConvertStringToUniqueIDTable] GO /* TESTS ... select * from [dbo].XQS_ConvertStringToUniqueIDTable('{36266EB8-2C50-4131-B3C7-B62BC9DA25D9};ECC75250-A498-4314-8E51-5722C89608B3;{EEA31C53-7B9B-4da1-9823-8B0F60D8F7DA}', ';') select * from [dbo].XQS_ConvertStringToUniqueIDTable('36266EB8-2C50-4131-B3C7-B62BC9DA25D9|-|{ECC75250-A498-4314-8E51-5722C89608B3}|-|{EEA31C53-7B9B-4da1-9823-8B0F60D8F7DA}', '|-|') */ create FUNCTION [dbo].[XQS_ConvertStringToUniqueIDTable](@RawData varchar(max), @RowSeparator varchar(5)) RETURNS @T TABLE (ID uniqueidentifier) AS BEGIN if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator)) BEGIN RETURN END declare @pos int declare @piece varchar(40) -- Need to tack a delimiter onto the end of the input string if one doesn't exist if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator set @RawData = @RawData + @RowSeparator set @pos = charindex(@RowSeparator, @RawData) while @pos <> 0 BEGIN if (@pos > 1) BEGIN set @piece = left(@RawData, @pos - 1) insert into @T(ID) Values (cast(@piece as uniqueidentifier)) END set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '') set @pos = charindex(@RowSeparator, @RawData) END RETURN END GO
Sometimes you may have to parse out multiple columns per logical item/row in the source string. You have to define a set number of columns on the output table. My generic version of this function returns a table of 3 string columns. For specific purposes you may want to create a copy of this with different return types or a different number of returned columns. The code logic would be just about the same no matter the number of or types of columns, except for the casts would vary as in the code samples above.
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_ConvertNameValueUStringToTable]') and xtype in (N'FN', N'IF', N'TF')) drop function [dbo].[XQS_ConvertNameValueUStringToTable] GO /* TESTS ... select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1|42;2|43;1|44', ';', '|') select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1|42|x;2|43;1|44|y', ';', '|') select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1;42|-|2;43|-|1;44', '|-|', ';') select * from [dbo].[XQS_ConvertNameValueUStringToTable]('1;42;x|-|2;43|-|1;44;y', '|-|', ';') */ create FUNCTION [dbo].[XQS_ConvertNameValueUStringToTable](@RawData nvarchar(max), @RowSeparator nvarchar(10), @ColSeparator nvarchar(10)) RETURNS @T TABLE (Col1 nvarchar(128), Col2 nvarchar(1024) NULL, Col3 nvarchar(1024) NULL) AS BEGIN if (len(rtrim(ltrim(@RawData))) < len(@RowSeparator)) BEGIN RETURN END declare @pos int, @piece nvarchar(256) declare @subpos int, @subpos2 int declare @pieceCol1 nvarchar(128), @pieceCol2 nvarchar(1024), @pieceCol3 nvarchar(1024) -- Need to tack a delimiter onto the end of the input string if one doesn't exist if right(rtrim(@RawData),len(@RowSeparator)) <> @RowSeparator set @RawData = @RawData + @RowSeparator set @pos = charindex(@RowSeparator, @RawData) while @pos <> 0 BEGIN if (@pos > 1) BEGIN set @piece = left(@RawData, @pos - 1) set @subpos = charindex(@ColSeparator, @piece) set @pieceCol1 = left(@piece, @subpos - 1) set @subpos2 = charindex(@ColSeparator, @piece, @subpos+len(@ColSeparator)) if (@subpos2 > @subpos) BEGIN set @pieceCol2=substring(@piece, @subpos+len(@ColSeparator), @subpos2 - @subpos - len(@ColSeparator)) set @pieceCol3=substring(@piece, @subpos2+len(@ColSeparator), len(@piece)-@subpos2) END ELSE BEGIN set @pieceCol2=substring(@piece, @subpos+len(@ColSeparator), len(@piece)-@subpos) set @pieceCol3=null END insert into @T(Col1, Col2, Col3) Values (@pieceCol1, @pieceCol2, @pieceCol3) END set @RawData = stuff(@RawData, 1, @pos + len(@RowSeparator)-1, '') set @pos = charindex(@RowSeparator, @RawData) END RETURN END GO
This is a variation of the previous sample, except with integer types for the three columns. Just substitute the casts and delcared variable types per column as needed in your case.
if exists (select * from dbo.sysobjects where id = object_id(N'[dbo].[XQS_SCR_ConvertGroupAccessUStringToTable]') and xtype in (N'FN', N'IF', N'TF')) drop function [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable] GO create FUNCTION [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable](@GroupAccess nvarchar(max)) RETURNS @T TABLE (AccessType int, GroupID int, PermissionID int NULL) AS BEGIN /* each access record separated by semi-colon (;). Each column in each record separated by a pipe (|) */ /* The permissionID 'column' is optional. */ /* @GroupAccess Sample: '1|42;2|43;1|44' (accessType|groupID|permissionID) */ /* OR: '1|42|23;2|43|23;1|44|23' */ /* TESTS ... select * from [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable]('1|42;2|43;1|44') select * from [dbo].[XQS_SCR_ConvertGroupAccessUStringToTable]('1|42|23;2|43|23;1|44|23') */ declare @pos int, @piece nvarchar(256) declare @subpos int, @subpos2 int declare @pieceAccessType nvarchar(64), @accessType int declare @pieceGroupID nvarchar(64), @groupID int declare @piecePermissionID nvarchar(64), @permissionID int if (len(rtrim(ltrim(@GroupAccess))) ';' set @GroupAccess = @GroupAccess + ';' set @pos = charindex(';', @GroupAccess) while @pos <> 0 BEGIN if (@pos > 1) BEGIN set @piece = left(@GroupAccess, @pos - 1) set @subpos = charindex('|', @piece) set @pieceAccessType = left(@piece, @subpos - 1) set @accessType = cast(@pieceAccessType as int) set @subpos2 = charindex('|', @piece, @subpos+1) if (@subpos2 > @subpos) BEGIN set @pieceGroupID=substring(@piece, @subpos+1, @subpos2 - @subpos - 1) set @piecePermissionID=substring(@piece, @subpos2+1, len(@piece)-@subpos2) set @groupID = cast(@pieceGroupID as int) set @permissionID = cast(@piecePermissionID as int) END ELSE BEGIN set @pieceGroupID=substring(@piece, @subpos+1, len(@piece)-@subpos) set @groupID = cast(@pieceGroupID as int) set @permissionID = null END insert into @T(AccessType, GroupID, PermissionID) Values (@accessType, @groupID, @permissionID) END set @GroupAccess = stuff(@GroupAccess, 1, @pos, '') set @pos = charindex(';', @GroupAccess) END RETURN END GO
Good point. I should have clarified when to use these or when not to. I would not use these as part of a calculation in a where clause, nor as part of a calculated column.
I have used these as a means to pass in associations to child records of a stored procedure. For instance, say we have a post table, tag table, and an association table between for a many to many between posts and tags. I would also store a varchar on the post table with the passed in delimited string of tags. When saving a post i would parse the tag ids to associate and save each. This is betterthan a separate db connection per tag to associate. On returning a post or list of posts I just return the columns on the table. I can load the names for each tag id from a cache in the app code.
Edit: I have not used this in a batch job. Where i have used these they were not a performance issue. But thanks for the links. I read it over once and it looks promising. I’ll read it over more closely later.
Assuming you are on SQL 2008 or above the things you describe are much better done with Table Valued Parameters.
Did you actually try these on a reasonably sized table? They are unbelievably poorly performing. See here (http://www.sqlservercentral.com/articles/Tally+Table/72993/) and here (http://www.sqlservercentral.com/Forums/Topic1101315-203-1.aspx) for an article and discussions on MANY methods for splitting with an incredible array of benchmarking, etc. This type of splitter discussed here should just not be used.