Skip to main content

Datometry Documentation

Specify Statistics Collection Sampling Percentage

You can specify the Teradata statistics collection sampling on Azure Synapse using the Hyper-Q hint default_stats_sample.

Hyper-Q translates the Teradata COLLECT STATISTICS command to CREATE STATISTICS and UPDATE STATISTICS on Azure Synapse. There is a significant difference between Teradata and Azure Synapse statistics collection. To accommodate these differences, Hyper-Q observes the following behavior when creating or updating statistics.

  • When updating statistics with no specified sampling options, Hyper-Q uses the Teradata RESAMPLE option.

  • When creating or updating statistics with an explicit sampling percentage, Hyper-Q will use that percentage.

  • When updating statistics with either the SAMPLE or SYSTEM SAMPLE options, but no specified percentage, Hyper-Q uses the percentage specified using the default_stats_sample hint.

  • When creating statistics with no options specified, or with either the SAMPLE or SYSTEM SAMPLE options, but no specified percentage, Hyper-Q uses the percentage specified using the default_stats_sample hint.

  • Hyper-Q respects the percentage specified for USING SAMPLE n PERCENT

To perform statistics collection using sampling, the Teradata COLLECT STATISTICS USING command uses one of the following options: SAMPLE, SYSTEM SAMPLE, SAMPLE n PERCENT, and NO SAMPLE. If specifying a percentage, Hyper-Q uses SAMPLE n PERCENT.

When creating statistics for the first time, Hyper-Q uses the sampling percentage specified using the default_stats_sample hint. If not specified, the default value is 20 percent.

The syntax for default_stats_sample is:

default_stats_sample = sampling_percentage

where:

Parameter

Description

sampling_percentage

The percentage of statistics sampling you specify determines if the Teradata COLLECT STATISTICS command resamples or does a full scan of the specified table. If you specify 100, Hyper-Q does a full scan. If not specified, the default value is 20 percent.

Hint Scope

The default_stats_sample hint can be specified at the request, session, and global level. To learn more about Hyper-Q SQL hints, and how to use them within a SQL query, see About Hyper-Q SQL Hints.

Example 40. Specify Statistics Collection Sampling as a Global Setting

You can specify default_stats_sample as a global setting in the Hyper-Q configuration file (dtm.ini). This example species a sampling of 5 percent.

"gateway".default_stats_sample=5

To learn more about specifying a hint as a gateway option in the Hyper-Q configuration file, see Gateway Configuration Parameters.



Example 41. Specify Statistics Creation Using Sampling as a Query Hint

You can specify default_stats_sample as a query hint at either the session or request level. The following example species a sampling of 10 percent.

COLLECT STATISTICS 
-- @dtm_default_stats_sample=10 for session
COLUMN (column_name) AS statistics_name on table_name;

Hyper-Q translates the above Teradata COLLECT STATISTICS statement as the following CREATE STATISTICS on Azure Synapse.

CREATE STATISTICS statistics_name
ON schema_name.table_name ([column_name] ) 
WITH SAMPLE 10 PERCENT 

To learn more about Hyper-Q SQL hints, and how to use them within a SQL query, see About Hyper-Q SQL Hints.