How to decipher Oracle internal datatype storage

What started out as an investigation into how the optimizer deals with predicates that are outside the known range of values became something else when I tried to determine just what Oracle believes the low and high values of that range to be. I didn't expect to have anything to add to the topic, as it has been rather well covered; I just wanted to understand it better by creating a few examples that demonstrate what can happen. I have not yet gotten that far. One of the first things I wanted to know is what Oracle believes the low and high values to be. These can be seen in both DBA_TAB_COLUMNS and DBA_TAB_COL_STATISTICS, in the LOW_VALUE and HIGH_VALUE columns. The DBA_TAB_COL_STATISTICS view is preferred, as these columns are maintained in DBA_TAB_COLUMNS only for
backward compatibility with Oracle 7.
[code language="SQL"]
SQL> desc dba_tab_col_statistics
 Name              Null?    Type
 ----------------- -------- ------------------------------------
 OWNER                      VARCHAR2(128)
 TABLE_NAME                 VARCHAR2(128)
 COLUMN_NAME                VARCHAR2(128)
 NUM_DISTINCT               NUMBER
 LOW_VALUE                  RAW(1000)
 HIGH_VALUE                 RAW(1000)
 DENSITY                    NUMBER
 NUM_NULLS                  NUMBER
 NUM_BUCKETS                NUMBER
 LAST_ANALYZED              DATE
 SAMPLE_SIZE                NUMBER
 GLOBAL_STATS               VARCHAR2(3)
 USER_STATS                 VARCHAR2(3)
 NOTES                      VARCHAR2(63)
 AVG_COL_LEN                NUMBER
 HISTOGRAM                  VARCHAR2(15)
 SCOPE                      VARCHAR2(7)
[/code]
The LOW_VALUE and HIGH_VALUE values are stored as RAW, so they must be using Oracle's internal storage format for whichever datatype the column is. Oracle does supply conversion routines via the DBMS_STATS package, but these routines are deployed as procedures. Since Oracle 12c allows functions to be defined inline in a SQL statement, these procedures can be wrapped and used in queries written for a 12c database. Using the DBMS_STATS conversion procedures in databases earlier than 12c requires creating functions so that the values can be returned to a SQL statement. While that method will work, it is often not desirable, and may not even be possible, particularly in a production database. By 'not even be possible' I mean not that it cannot be done, but that creating new database objects is probably not allowed in many databases. To create a SQL statement that can show the high and low values, it will be necessary to use some other means. Let's start off by creating some data to work with.
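Before building the test data, here is what the 12c inline-function approach looks like. This is only a sketch: the function name raw_to_num is my own, and it simply wraps the NUMBER overload of the DBMS_STATS.CONVERT_RAW_VALUE procedure so it can be called from SQL. The example raw value C30E0B49 is an internal NUMBER representation that will turn up again below.
[code language="SQL"]
-- 12c and later only: declare a function inline in the query's WITH clause
-- (raw_to_num is my own name for this sketch)
with
   function raw_to_num(p_raw in raw) return number
   is
      v_num number;
   begin
      dbms_stats.convert_raw_value(p_raw, v_num);
      return v_num;
   end;
select raw_to_num(hextoraw('C30E0B49')) converted_value
from dual
/
[/code]
On earlier releases the same function body would have to be created as a standalone (or packaged) function before it could be called from SQL, which is exactly the kind of object creation that may not be permitted.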
[code language="SQL"]
define chars='ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789abcdefghijklmnopqrstuvwxyz'

create table low_high
as
select
   id
   , mod(id,128) n1
   , substr('&&chars',mod(id,42)+1, 20) c1
   , sysdate-(mod(id,1000)+1) d1
from (
   select level id
   from dual
   connect by level <= 128 * 1024
)
/

exec dbms_stats.gather_table_stats(ownname => user, tabname => 'LOW_HIGH', method_opt => 'for all columns size auto')
[/code]
Now that we have a table, let's take a look at the ranges of values. Note: I am using the _TAB_COLUMNS views for some queries just to simplify the SQL for demonstration.
[code language="SQL"]
col low_value format a40
col high_value format a40

prompt
prompt NUMERIC
prompt

select column_name, low_value, high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
and data_type = 'NUMBER'
/

prompt
prompt VARCHAR2
prompt

select column_name, low_value, high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
and data_type = 'VARCHAR2'
/

prompt
prompt DATE
prompt

select column_name, low_value, high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
and data_type = 'DATE'
/

NUMERIC

COLUMN LOW_VALUE                                HIGH_VALUE
------ ---------------------------------------- ----------------------------------------
ID     C102                                     C30E0B49
N1     80                                       C2021C

2 rows selected.

VARCHAR2

COLUMN LOW_VALUE                                HIGH_VALUE
------ ---------------------------------------- ----------------------------------------
C1     303132333435363738396162636465666768696A 666768696A6B6C6D6E6F70717273747576777879

1 row selected.

DATE

COLUMN LOW_VALUE                                HIGH_VALUE
------ ---------------------------------------- ----------------------------------------
D1     7871030D121C04                           78730C07121C04

1 row selected.
[/code]
Clearly the values being stored for LOW_VALUE and HIGH_VALUE are of little use to us in their current format. What can we do? For the NUMBER and character datatypes (VARCHAR2, VARCHAR, CHAR) the UTL_RAW package can be used to get the actual values. Here is an example of converting some of these to a human-readable format.
[code language="SQL"]
col low_value format 999999999999
col high_value format 999999999999

select column_name
   , utl_raw.cast_to_number(low_value) low_value
   , utl_raw.cast_to_number(high_value) high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
and data_type = 'NUMBER'
/

col low_value format a20
col high_value format a20

select column_name
   , utl_raw.cast_to_varchar2(low_value) low_value
   , utl_raw.cast_to_varchar2(high_value) high_value
from user_tab_columns
where table_name = 'LOW_HIGH'
and data_type = 'VARCHAR2'
/

COLUMN                            LOW_VALUE    HIGH_VALUE
------------------------------ ------------- -------------
N1                                         0           127
ID                                         1        131072

2 rows selected.

COLUMN                         LOW_VALUE            HIGH_VALUE
------------------------------ -------------------- --------------------
C1                             0123456789abcdefghij fghijklmnopqrstuvwxy

1 row selected.
[/code]
These values can be verified, as shown here with the N1 column:
[code language="SQL"]
SQL> select min(n1), max(n1) from low_high;

   MIN(N1)    MAX(N1)
---------- ----------
         0        127

1 row selected.
[/code]
So far I have done this only with simple versions of these datatypes. Variations such as NVARCHAR2, BINARY_FLOAT and others may require different handling. What is missing? The DATE column has not yet been handled. Converting the raw date format to a readable date is not so straightforward, as there does not seem to be any conversion function available for that (If you know of one, please write about it in the comments section of this article).
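Even without a supplied function, nothing stops us from decoding the bytes by hand. The internal DATE format is widely documented as seven bytes: century+100, year-in-century+100, month, day, hour+1, minute+1, second+1. Assuming that layout, a manual decode can be sketched in plain SQL; note that this sketch leans on the implicit conversion of a RAW value to its hexadecimal string representation when it is passed to SUBSTR.
[code language="SQL"]
-- A sketch only: decode the 7-byte internal DATE format by hand,
-- assuming the layout century+100, year+100, month, day, hour+1, minute+1, second+1
select column_name
   , to_date(
          lpad( (to_number(substr(low_value, 1,2),'XX') - 100) * 100
               + to_number(substr(low_value, 3,2),'XX') - 100, 4, '0')  -- year
       || lpad( to_number(substr(low_value, 5,2),'XX')    , 2, '0')     -- month
       || lpad( to_number(substr(low_value, 7,2),'XX')    , 2, '0')     -- day
       || lpad( to_number(substr(low_value, 9,2),'XX') - 1, 2, '0')     -- hour
       || lpad( to_number(substr(low_value,11,2),'XX') - 1, 2, '0')     -- minute
       || lpad( to_number(substr(low_value,13,2),'XX') - 1, 2, '0')     -- second
     , 'YYYYMMDDHH24MISS') low_value
from user_tab_columns
where table_name = 'LOW_HIGH'
and data_type = 'DATE'
/
[/code]
Working through the D1 low value of 7871030D121C04 shown earlier by hand: 0x78 = 120 gives century 20, 0x71 = 113 gives year 13, 0x03 is March, 0x0D = 13 is the day, and 0x12/0x1C/0x04 less one each give 17:27:03, i.e. 2013-03-13 17:27:03.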