22 June 2010

Querying Extended ASCII Characters in SQL Server

Part of a project requires conversion of ADABAS to SQL Server.  ADABAS hearkens from the day when storage space was a precious commodity, so the "Packed" data types were invented.  These compress the values stored in the column, to maximize storage utilization.

When converting packed ADABAS File fields to SQL Server relational table columns, some of the packed data was not unpacked(?) correctly, resulting in some interesting characters appearing in SQL Server.  The entire data content of a field needs not be packed; ADABAS allows you to leave the first N characters unpacked, and then pack the remainder of the field, and other such options.

It was my job to find all records across the entire database (we're talking millions of records per table) that contain ASCII characters that do not appear on a standard, 108 key, US English, QWERTY keyboard.  Constructing a query that iterates through all tables and columns that are varchar data type is easy.  However, the SQL Management Studio query editors don't display extended ASCII characters.

The solution was pretty simple.  Cast a byte value to a character type, to specify the extended character ranges.

SELECT RecordID
FROM MyTable
WHERE ((patindex('%[' + char(0) + '-' + char(31) + ']%', ColumnName COLLATE Latin1_General_BIN2) <> 0)
      OR (patindex('%[' + char(127) + '-' + char(255) + ']%', ColumnName COLLATE Latin1_General_BIN2) <> 0))

This selects records from the table where the number of extended ASCII (key codes 0-31, and 127-255) characters in a specific column is not 0.

13 comments:

  1. Works great and processes very quickly too

    ReplyDelete
  2. could you please help me, as i am trying to achieve the same using oracle SQL

    ReplyDelete
  3. Also once we find out the rows satisfying the condition, i need to replace it with # character

    ReplyDelete
    Replies
    1. This should suit your needs. Replace "COLUMN" with the table column name, and then replace "0001" with the character code you wish to replace.

      SELECT TRANSLATE(COLUMN, UNISTR('\0001\'),'#')

      References:
      http://docs.oracle.com/cd/B19306_01/server.102/b14200/functions204.htm
      http://www.techonthenet.com/oracle/functions/translate.php

      Delete
    2. This comment has been removed by the author.

      Delete
    3. Thanks Mike for the suggestion.

      to be more precise, i am using Cognos BI tool where the SQL code will be runned.

      if we take the same example,

      SELECT RecordID
      FROM MyTable
      WHERE ((patindex('%[' + char(0) + '-' + char(31) + ']%', ColumnName COLLATE Latin1_General_BIN2) > 0)
      OR (patindex('%[' + char(127) + '-' + char(255) + ']%', ColumnName COLLATE Latin1_General_BIN2) > 0))

      Note Removed lt;

      the code is working perfectly as it is identifying the rows which has ascii characters (0-31) and (127-255).

      Now i need a separate column, where all the special characters should be replaced with '#' w.r.t Column

      Thanks in advance

      Delete
    4. could you please help in getting the new column.

      Delete
  4. Hi all,

    i'm new to SQL and working on SQL Server.

    i'm struck in getting the logic for selecting records from the table where there are extended ASCII (key codes 0-31, and 127-255) characters in a specific column and replacing all the extended ASCII (key codes 0-31, and 127-255) characters with '#' character.

    from above code, i am getting the rows which have extended ASCII (key codes 0-31, and 127-255) characters in a specific column.
    i need an another column, to replace the same.

    Could you please help in the task.

    Thanks in advance.

    ReplyDelete
    Replies
    1. can anyone give ideas to get the new column which is replacing all the extended ASCII (key codes 0-31, and 127-255) characters with '#' character.

      Delete
    2. Kirian, I didn't see your comment until today! I hope you found the help you needed. I would love to have provided assistance.

      Delete
    3. This comment has been removed by the author.

      Delete
  5. This is exactly what I needed. Thank You VERY MUCH!

    ReplyDelete

Please provide details, when posting technical comments. If you find an error in sample code or have found bad information/misinformation in a post, please e-mail me details, so I can make corrections as quickly as possible.