The SCHEDULE_RANDOM_REPLICA query option fine-tunes the scheduling algorithm that decides which host processes each HDFS data block or Kudu tablet, with the goal of reducing the chance of CPU hotspots.
By default, Impala estimates how much work each host has done for the query, and selects the host that has the lowest workload. This algorithm is intended to reduce CPU hotspots that arise when the same host is selected to process multiple data blocks or tablets. Use the SCHEDULE_RANDOM_REPLICA query option if hotspots still arise for some combinations of queries and data layout.
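
The least-workload selection described above can be pictured with a small model. This is only an illustrative sketch, not Impala's scheduler code: the function name `assign_blocks`, the `random_tiebreak` flag, and the input layout are hypothetical, and treating the option as random tie-breaking among equally loaded replica hosts is an assumption made for the illustration.

```python
import random
from collections import defaultdict


def assign_blocks(block_replicas, random_tiebreak=False, rng=None):
    """Assign each block to the replica host with the least work so far.

    block_replicas maps a block id to (size_in_bytes, [replica hosts]).
    random_tiebreak is a hypothetical stand-in for SCHEDULE_RANDOM_REPLICA:
    when several hosts share the lowest workload, pick one at random
    instead of always taking the first (assumed behavior, for illustration).
    """
    rng = rng or random.Random()
    assigned_bytes = defaultdict(int)  # estimated work per host for this query
    assignment = {}
    for block_id, (size, hosts) in block_replicas.items():
        lowest = min(assigned_bytes[h] for h in hosts)
        candidates = [h for h in hosts if assigned_bytes[h] == lowest]
        host = rng.choice(candidates) if random_tiebreak else candidates[0]
        assignment[block_id] = host
        assigned_bytes[host] += size
    return assignment, dict(assigned_bytes)


blocks = {
    "b1": (128, ["h1", "h2"]),
    "b2": (128, ["h1", "h2"]),
    "b3": (64, ["h1", "h3"]),
}
default_assignment, default_load = assign_blocks(blocks)
random_assignment, random_load = assign_blocks(
    blocks, random_tiebreak=True, rng=random.Random(0)
)
```

In this model, a fixed tie-breaking rule always favors the host listed first when workloads are equal, which is the kind of pattern that could still concentrate work for certain data layouts; randomizing the choice spreads those ties across hosts.
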
The SCHEDULE_RANDOM_REPLICA query option applies only to tables and partitions that are not enabled for HDFS caching.
Type: Boolean; recognized values are 1 and 0, or true and false; any other value is interpreted as false
Default: false
Added in: Impala 2.5.0
Related information:
Using HDFS Caching with Impala (Impala 2.1 or higher only), Avoiding CPU Hotspots for HDFS Cached Data, REPLICA_PREFERENCE Query Option (Impala 2.7 or higher only)