For the distinct values, it’s just a call to the NDV (number of distinct values) functions. Since a CTAS reads all the rows anyway, this should be straightforward to calculate on the fly.
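As a rough sketch of the idea, the distinct-value counts could ride along with the same full scan the CTAS performs (the table and column names here are invented for illustration):

```sql
-- Hypothetical example: the full scan that feeds the CTAS could also
-- produce row and distinct-value counts with no extra pass.
SELECT COUNT(*)                AS num_rows,
       COUNT(DISTINCT cust_id) AS cust_id_ndv
FROM   sales;
```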

For histograms, there may be additional work needed to first determine what buckets to use and then populate them. Nonetheless, calculating this during the CTAS avoids at least one pass through the data, so should certainly save I/O.
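Once the bucket boundaries are known (which is the extra work mentioned above, since it needs the column's range), populating them is a single grouped pass. A hedged sketch using Oracle's WIDTH_BUCKET function, with invented table, column, and range values:

```sql
-- Assumes the value range (here 0..10000) and bucket count (10) were
-- determined beforehand; the grouping itself needs only one pass.
SELECT WIDTH_BUCKET(amount, 0, 10000, 10) AS bucket_no,
       COUNT(*)                           AS bucket_rows
FROM   sales
GROUP  BY WIDTH_BUCKET(amount, 0, 10000, 10)
ORDER  BY bucket_no;
```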

Other than histograms, the significant table and column statistics can be calculated with analytic functions as part of the select, and a multitable insert could be constructed to insert the table data into the target table and the calculated statistics into another table (one row, of course). You could then use that statistical data as the basis for setting statistics on the table and columns.

It’d be pretty straightforward for the table, I expect, and a pain for the columns.
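The multitable-insert idea above might look something like the following sketch. All names here are invented, and this is an outline of the technique rather than a tested implementation; the one statistics row could then be fed to DBMS_STATS.SET_TABLE_STATS and SET_COLUMN_STATS:

```sql
-- One pass: every row goes to the copy, and exactly one row (rn = 1)
-- carries the analytically computed statistics into the stats table.
INSERT ALL
  WHEN 1 = 1 THEN
    INTO sales_copy  (sale_id, amount) VALUES (sale_id, amount)
  WHEN rn = 1 THEN
    INTO sales_stats (num_rows, amount_ndv, amount_min, amount_max)
    VALUES (num_rows, amount_ndv, amount_min, amount_max)
SELECT sale_id,
       amount,
       ROW_NUMBER() OVER (ORDER BY NULL) AS rn,
       COUNT(*)               OVER ()    AS num_rows,
       COUNT(DISTINCT amount) OVER ()    AS amount_ndv,
       MIN(amount)            OVER ()    AS amount_min,
       MAX(amount)            OVER ()    AS amount_max
FROM   sales;
```

The pain with the columns is visible even in this toy version: every column needs its own set of analytic expressions and its own slot in the statistics row.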

However, one reason why it might not be worth doing is that estimating statistics can often be both very fast and very accurate, particularly when a table has just been created and you don’t need to worry about the variability in the number of rows per block that comes about from normal delete operations. Block-based low percentage estimation is worth a look.
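For reference, block sampling of the kind described is already available through DBMS_STATS (the table name is invented):

```sql
-- A low-percentage, block-based estimate: reads whole blocks,
-- so a 1% sample costs roughly 1% of the table's I/O.
BEGIN
  DBMS_STATS.GATHER_TABLE_STATS(
    ownname          => USER,
    tabname          => 'SALES',
    estimate_percent => 1,
    block_sample     => TRUE);
END;
/
```

The caveat in the paragraph above applies: on a freshly created table the rows-per-block count is uniform, so a block sample extrapolates well; on a table with a history of deletes it can skew.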

The same goes for distinct values, unless the new approximate-NDV method is used, which relies on an advanced hashing technique to approximate the number of distinct values quite accurately.
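For what it's worth, later Oracle releases (12c onward) expose a similar hash-based approximation directly in SQL; the names below are invented for illustration:

```sql
-- Hash-sketch approximation of NDV in a single scan,
-- without the memory cost of an exact COUNT(DISTINCT).
SELECT APPROX_COUNT_DISTINCT(amount) AS approx_amount_ndv
FROM   sales;
```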

My feature wish list includes read-only partitions in a table.

For now, in my data warehouse, I assign partitions to their own tablespaces so that I can make those tablespaces read only. It would be great to be able to mark individual partitions read only, and RMAN should recognize that too, to improve backup and recovery.
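The workaround described above, sketched with invented names (a range-partitioned table with one tablespace per partition, each of which can then be frozen):

```sql
-- Each partition gets its own tablespace...
CREATE TABLE sales_hist (
  sale_id   NUMBER,
  sale_date DATE
)
PARTITION BY RANGE (sale_date) (
  PARTITION p2007 VALUES LESS THAN (DATE '2008-01-01') TABLESPACE ts_2007,
  PARTITION p2008 VALUES LESS THAN (DATE '2009-01-01') TABLESPACE ts_2008
);

-- ...so a closed historical partition can be frozen, and RMAN
-- (with backup optimization) can skip it in subsequent backups.
ALTER TABLESPACE ts_2007 READ ONLY;
```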

regards
