Generating Statistics
Use the ANALYZE TABLE command to generate statistics for tables and columns. Use the optional NoScan clause to improve query performance by preventing a scan of files on HDFS. This option gathers only the following statistics:
Number of files
Size of files in bytes
ANALYZE TABLE tablename [PARTITION(partcol1[=val1], partcol2[=val2], ...)] COMPUTE STATISTICS [NoScan];
The following example views statistics for all partitions in the employees table. The query also uses the NoScan clause to improve performance:
ANALYZE TABLE employees PARTITION (dt) COMPUTE STATISTICS [NoScan];
Generating Column-level Statistics:
Use the following syntax to generate statistics for columns in the employees table:
ANALYZE TABLE tablename [PARTITION(partcol1[1=val1], partcol2[=val2], ...)] COMPUTE STATISTICS FOR COLUMNS [NoScan];
The following example generates statistics for all columns in the employees table:
ANALYZE TABLE employees PARTITION (dt) COMPUTE STATISTICS FOR COLUMNS;
Tip | |
---|---|
See Using the Cost-Based Optimizer for Optimal Performance for more information and recommended settings. |