T-SQL: Sorting Numeric Values That Are Stored As Strings

Following are some tricks that I have found useful for sorting numeric data that is stored as strings. Normally, string values of "2", "1", and "11" will sort as "1", "11", "2". The examples below will sort this sequence as "1", "2", "11" instead.

One caveat; I have not performed extensive performance testing or examined query plans to get an idea of the efficiency of these queries. Be sure to perform your own evaluations before using these in production environments.

For the first example, set up a sample table and some relatively clean test data. The data is "clean" because it is uncomplicated by leading zeros or spaces. For our sample table, there are two columns with numeric data stored as strings: Volume and Series.

CREATE TABLE #Book
(
BookID int identity(1,1) NOT NULL PRIMARY KEY,
Title nvarchar(200) NOT NULL DEFAULT(”),
Volume nvarchar(20) NOT NULL DEFAULT(”),
Series nvarchar(20) NOT NULL DEFAULT(”)
)

INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ’13’, ”)
INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ‘1’, ‘3’)
INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ‘1’, ‘new’)
INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ‘5’, ”)
INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ‘1’, ‘1’)
INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ‘4’, ”)

If you know that you have clean data such as this, this simple query will sort the string values in the Volume and Series columns as numerics.

SELECT        *
FROM        #Book
ORDER BY
        Title,
        RIGHT(SPACE(20) + Volume, 20),
        RIGHT(SPACE(20) + Series, 20)

As you can see, this was achieved by padding each column with its maximum length (20 characters) in spaces, and then sorting by the rightmost 20 characters of each value. Here are the results:

BookID   Title                    Volume   Series
5        Journal of Sloths        1        1
2        Journal of Sloths        1        3
3        Journal of Sloths        1        new
6        Journal of Sloths        4       
4        Journal of Sloths        5       
1        Journal of Sloths        13

For the next example, add some ‘dirty’ data to the Volume column. This new data is numeric values with leading zeros.

INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ‘012’, ”)
INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ’02’, ”)

The following query builds on the first example and handles the dirty data by removing the leading zeros. This is accomplished by converting the values to numeric and then back to nvarchar during the sort operation, like so:

SELECT        *
FROM        #Book
ORDER BY
Title,
        RIGHT(SPACE(20) + CONVERT(nvarchar(20), CONVERT(int, Volume)), 20),
        RIGHT(SPACE(20) + Series, 20)

The results:

BookID   Title                    Volume   Series
5        Journal of Sloths        1        1
2        Journal of Sloths        1        3
3        Journal of Sloths        1        new
8        Journal of Sloths        02       
6        Journal of Sloths        4       
4        Journal of Sloths        5       
7        Journal of Sloths        012       
1        Journal of Sloths        13       

Now add some ‘dirty’ data to the Series column.

INSERT #Book (Title, Volume, Series) VALUES (‘Journal of Sloths’, ‘1’, ’02’)

The prevoius method of converting the Series column to a numeric value and then back to nvarchar is shown here.

SELECT        *
FROM        #Book
ORDER BY
        Title,
        RIGHT(SPACE(20) + CONVERT(nvarchar(20), CONVERT(int, Volume)), 20),
        RIGHT(SPACE(20) + CONVERT(nvarchar(20), CONVERT(int, Series)), 20)

However, in this case, the result is the following error: "Conversion failed when converting the nvarchar value ‘new’ to data type int." This happens because the Series column includes a value of "new" in addition to all of the numeric values. The value "new" cannot be converted to a numeric, so the result is an error.

The solution is to add CASE statements to the ORDER BY clause so that conversions to numeric are only performed if the values are actually numeric. The value is not numeric it is sorted as-is. Here is the final query.

SELECT        *
FROM        #Book
ORDER BY
        Title,
        RIGHT(SPACE(20) +
                CASE WHEN ISNUMERIC(Volume) = 1 THEN CONVERT(nvarchar(20),
                CONVERT(int, Volume)) ELSE Volume END, 20),
        RIGHT(SPACE(20) +
                CASE WHEN ISNUMERIC(Series) = 1 THEN CONVERT(nvarchar(20),
                CONVERT(int, Series)) ELSE Series END, 20)

And the final set of results:

BookID   Title                    Volume   Series
5        Journal of Sloths        1        1
9        Journal of Sloths        1        02
2        Journal of Sloths        1        3
3        Journal of Sloths        1        new
8        Journal of Sloths        02       
6        Journal of Sloths        4       
4        Journal of Sloths        5       
7        Journal of Sloths        012       
1        Journal of Sloths        13       

Obviously, the more logic that is added to the ORDER BY clause, the more performance will be affected. Choose the simplest query that satisfies the requirements imposed by your data, and always remember to evaluate the performance under real-world conditions.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s

%d bloggers like this: