Update Statistics Informix

Introduction

The optimizer uses the statistics gathered to determine the best access path of all queries. These are especially important when joining multiple tables in a query. Statistics should be regenerated any time there is significant activity in the database or a table.
Statistics can be generated at three levels for tables: LOW, MEDIUM, and HIGH.

Information collected by UPDATE STATISTICS

There are two types of data the optimizer will make use of when generating a query plan.

  1. Statistics: Statistics are generated any time UPDATE STATISTICS in low mode is run, or when UPDATE STATISTICS medium or high mode is run and the phrase "distributions only" is omitted. The statistical information is stored in the system catalog tables.
  2. Data distributions: Data distributions provide more sophisticated information to the optimizer. Distributions are built any time the command UPDATE STATISTICS medium or UPDATE STATISTICS high is executed. Both statistics and data distributions are not collected in real time, but are only refreshed when the UPDATE STATISTICS command is executed. The difference between UPDATE STATISTICS high and medium is the number of rows sampled: high samples the entire table, while medium samples only a subset of rows.

Statistics are generated by scanning tables and indexes. When walking an index, the entire leaf level is walked. While walking the leaf pages, if an index item is marked for deletion, it will be submitted to the b-tree cleaner. This ensures the indexes remain balanced and compact. After the scan of the table or index is complete, the data collected is inserted in the appropriate system tables.

Performance and usability improvements in UPDATE STATISTICS

The major areas of improvement in UPDATE STATISTICS are:

  • Improved parallelism.
  • Increased I/O throughput.
  • Plan information for the DBA.

Tuning UPDATE STATISTICS to run more efficiently.

In order to improve the speed at which data distributions are built it is imperative to tune the sorting of data, especially on large tables.

  1. First way: If you want to change the default sort memory, use the environment variable DBUPSPACE
  2. Second way: The second way to improve the sort speed is by giving UPDATE STATISTICS more than 50 MB of memory to use. You can do this by by setting the PDQPRIORITY for UPDATE STATISTICS, allowing UPDATE STATISTICS to use the Memory Grant Manager (MGM) memory that you have configured.
  3. The third way of improving the sort speed of UPDATE STATISTICS is to take advantage of the parallel sort package. To accomplish this, ensure the environment variables PSORT_NPROCS and DBSPACETEMP are properly configured for your computer.
Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-Share Alike 2.5 License.