Thursday, January 17, 2008

On REVERSing comma-separated set of words

Here is a task [beware it's in Russian :-)]. There is a given string: '5,Z,0,q,1,b,A,Z' we have to get 'Z,A,b,1,q,0,Z,5' as a result of some SQL statement, in other words we have to put the coma separated set of words in the reversed order. The author of the question proposed to use REGEXP to solve it.

There are two options. One is simple:

SQL> COLUMN o FORMAT A30
SQL> COLUMN r FORMAT A30
SQL> VAR s VARCHAR2(30)
SQL> EXEC :s := '5,Z,0,q,1,b,A,Z';

PL/SQL procedure successfully completed.

SQL> 
SQL>  SELECT :s o
  2        , SUBSTR(SYS_CONNECT_BY_PATH(w, ','), 2) r
  3     FROM (
  4           SELECT REGEXP_INSTR(:s, '([^,])+', 1, LEVEL) p
  5                , LEVEL l
  6                , REGEXP_SUBSTR(:s, '([^,])+', 1, LEVEL) w
  7             FROM dual
  8          CONNECT BY REGEXP_SUBSTR(:s, '([^,])+', 1, LEVEL) IS NOT NULL
  9          )
 10    WHERE CONNECT_BY_ISLEAF = 1
 11    START WITH REGEXP_INSTR(:s, ',([^,])+$', 1) + 1 = p
 12  CONNECT BY PRIOR l = l + 1
 13  /

O                              R
------------------------------ ------------------------------
5,Z,0,q,1,b,A,Z                Z,A,b,1,q,0,Z,5


Of course instead of using REGEXP one can try to solve the same task using SUBSTR/INSTR functions.

The second is is a bit trickier. It uses an undocumented but a well known REVERSE function. The REVERSE function is used for reverse indexes, basically it reverses the byte order of a passed value. However, it won't work correctly for the multibyte character sets. So, to solve that limitation we have to ensure that the character values passed to the reverse function aren't multibyte ones. For that we use ASCIISTR and UNISTR functions to get the strings converted back and forth.

SQL> EXEC :s := 'é,ô,ÿ';

PL/SQL procedure successfully completed.

SQL> COLUMN dump_ok FORMAT A40
SQL> COLUMN dump_bad FORMAT A40
SQL> COLUMN ok FORMAT A10
SQL> COLUMN bad FORMAT A10
SQL> COLUMN o FORMAT A10
SQL> SELECT :s o
  2       , ok
  3       , DUMP(ok, 16) dump_ok
  4       , bad
  5       , DUMP(bad, 16) dump_bad
  6    FROM (
  7         SELECT TRANSLATE(
  8                  RTRIM(
  9                    UNISTR(
 10                      REVERSE(
 11                        SYS_CONNECT_BY_PATH(
 12                          REVERSE(
 13                            ASCIISTR(
 14                              REGEXP_SUBSTR(:s, '([^,])+', 1, LEVEL)
 15                            )
 16                          )
 17                        , ','
 18                        )
 19                      )
 20                    )
 21                  , ','
 22                  )
 23                  USING CHAR_CS
 24                ) ok
 25              , RTRIM(
 26                  REVERSE(
 27                    SYS_CONNECT_BY_PATH(
 28                      REVERSE(
 29                         REGEXP_SUBSTR(:s, '([^,])+', 1, LEVEL)
 30                      )
 31                    , ','
 32                    )
 33                  )
 34                , ','
 35                ) bad
 36           FROM dual
 37          WHERE CONNECT_BY_ISLEAF = 1
 38        CONNECT BY REGEXP_SUBSTR(:s, '([^,])+', 1, LEVEL) IS NOT NULL
 39         )
 40  /

O          OK         DUMP_OK                                  BAD        DUMP_BAD
---------- ---------- ---------------------------------------- ---------- --------------------------
é,ô,ÿ      ÿ,ô,é      Typ=1 Len=8: c3,bf,2c,c3,b4,2c,c3,a9     ¿,¿,¿      Typ=1 Len=5: bf,2c,b4,2c,a9

SQL> SELECT DUMP(:s, 16) FROM dual;

DUMP(:S,16)
------------------------------------
Typ=1 Len=8: c3,a9,2c,c3,b4,2c,c3,bf


So, as you can see without appropriate conversion, we get bad "characters" in the string.

Additionally one has to make sure that the difference between the character set and national character set is understood. That can be done using TRANSLATE ... USING function.

Below is an output on nls settings and version used:

SQL> SELECT parameter, value FROM nls_database_parameters WHERE parameter LIKE '%CHARACTERSET%';

PARAMETER                      VALUE
------------------------------ ------------------------------
NLS_NCHAR_CHARACTERSET         AL16UTF16
NLS_CHARACTERSET               AL32UTF8

SQL> SELECT * FROM v$version WHERE ROWNUM = 1;

BANNER
----------------------------------------------------------------
Oracle Database 10g Enterprise Edition Release 10.2.0.3.0 - Prod

No comments: