Key Default Type Description
cluster.io-pool.size
(none) Integer The size of the IO executor pool used by the cluster to execute blocking IO operations (Master as well as TaskManager processes). By default, it uses 4 * the number of CPU cores (hardware contexts) that the cluster process has access to. Increasing the pool size allows more IO operations to run concurrently.
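The default sizing rule can be sketched in plain Java. This is not Flink's internal code. The class `IoPoolSizing` and its methods are hypothetical names, and the sketch only mirrors the rule stated above: an explicit value wins, otherwise the pool gets 4 * the available cores.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Hypothetical helper illustrating how cluster.io-pool.size could be resolved.
final class IoPoolSizing {
    private IoPoolSizing() {}

    // Returns the configured size if set, otherwise 4 * the number of cores.
    // Rejecting non-positive sizes is an assumption of this sketch.
    static int resolve(Integer configuredSize, int availableCores) {
        if (configuredSize != null) {
            if (configuredSize <= 0) {
                throw new IllegalArgumentException("cluster.io-pool.size must be positive");
            }
            return configuredSize;
        }
        return 4 * availableCores;
    }

    // Creates a fixed-size pool for blocking IO using the resolved size.
    static ExecutorService createIoPool(Integer configuredSize) {
        int cores = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(resolve(configuredSize, cores));
    }
}
```

On a process with access to 8 cores and no explicit setting, `resolve(null, 8)` yields 32 threads.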
cluster.registration.error-delay
10000 Long The pause in milliseconds after a registration attempt failed with an exception other than a timeout.
cluster.registration.initial-timeout
100 Long Initial registration timeout between cluster components in milliseconds.
cluster.registration.max-timeout
30000 Long Maximum registration timeout between cluster components in milliseconds.
cluster.registration.refused-registration-delay
30000 Long The pause in milliseconds after a registration attempt was refused.
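The four registration options describe a retry loop. The sketch below assumes, beyond what the table states, that the timeout doubles after every timed-out attempt until it reaches cluster.registration.max-timeout, and that an exception or a refusal causes a fixed pause before the next attempt. `RegistrationBackoff` and its `Outcome` enum are hypothetical names, not Flink classes.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of the registration retry timing (not Flink's actual classes).
final class RegistrationBackoff {
    enum Outcome { TIMEOUT, ERROR, REFUSED }

    static final long INITIAL_TIMEOUT_MS = 100;   // cluster.registration.initial-timeout
    static final long MAX_TIMEOUT_MS = 30_000;    // cluster.registration.max-timeout
    static final long ERROR_DELAY_MS = 10_000;    // cluster.registration.error-delay
    static final long REFUSED_DELAY_MS = 30_000;  // cluster.registration.refused-registration-delay

    private RegistrationBackoff() {}

    // Assumed rule: after a timeout, double the timeout, capped at the maximum.
    static long nextTimeout(long currentTimeoutMs) {
        return Math.min(2 * currentTimeoutMs, MAX_TIMEOUT_MS);
    }

    // Pause before the next attempt, depending on how the previous one ended.
    static long pauseAfter(Outcome outcome) {
        switch (outcome) {
            case ERROR:
                return ERROR_DELAY_MS;
            case REFUSED:
                return REFUSED_DELAY_MS;
            default:
                return 0; // assumed: a timed-out attempt is retried at once with a longer timeout
        }
    }

    // Timeouts used by the first n attempts if every attempt times out.
    static List<Long> timeoutsForAttempts(int n) {
        List<Long> timeouts = new ArrayList<>();
        long timeout = INITIAL_TIMEOUT_MS;
        for (int i = 0; i < n; i++) {
            timeouts.add(timeout);
            timeout = nextTimeout(timeout);
        }
        return timeouts;
    }
}
```

Under this assumed doubling rule and the default values, the timeouts run 100, 200, 400, ..., 25600 ms and reach the 30000 ms cap on the tenth attempt.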
cluster.services.shutdown-timeout
30000 Long The shutdown timeout in milliseconds for cluster services such as executors.
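A common way to honor such a timeout with a standard `java.util.concurrent` executor is sketched below. This is a generic pattern, not Flink's shutdown code, and `ServiceShutdown.shutdownGracefully` is a hypothetical name.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;

// Hypothetical helper: bounded, graceful shutdown of an executor service.
final class ServiceShutdown {
    private ServiceShutdown() {}

    // Stops accepting new work, waits up to timeoutMs for running tasks,
    // then interrupts whatever is still running. Returns true if the
    // executor terminated within the timeout.
    static boolean shutdownGracefully(ExecutorService executor, long timeoutMs) {
        executor.shutdown();
        try {
            if (executor.awaitTermination(timeoutMs, TimeUnit.MILLISECONDS)) {
                return true;
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag for callers
        }
        executor.shutdownNow();
        return false;
    }
}
```

Passing 30000 as `timeoutMs` matches the default of cluster.services.shutdown-timeout.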
heartbeat.interval
10000 Long Time interval in milliseconds between heartbeat RPC requests from the sender to the receiver side.
heartbeat.rpc-failure-threshold
2 Integer The number of consecutive failed heartbeat RPCs after which a heartbeat target is marked as unreachable. Failed heartbeat RPCs can detect dead targets faster than the heartbeat timeout, because a dead target no longer receives the RPCs. The detection time is heartbeat.interval * heartbeat.rpc-failure-threshold. In environments with a flaky network, setting this value too low can produce false positives. In this case, we recommend increasing this value, but not beyond heartbeat.timeout / heartbeat.interval. The mechanism can be disabled by setting this option to -1.
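The arithmetic in this description can be checked with the default values. `HeartbeatMath` is a hypothetical helper that only restates the two formulas above. Presumably the upper bound exists because, beyond it, interval * threshold would exceed heartbeat.timeout and the timeout would detect the dead target first.

```java
// Hypothetical helpers reproducing the arithmetic in the
// heartbeat.rpc-failure-threshold description.
final class HeartbeatMath {
    private HeartbeatMath() {}

    // Time until a target is marked unreachable via failed RPCs:
    // heartbeat.interval * heartbeat.rpc-failure-threshold.
    // Returns -1 when the mechanism is disabled (threshold == -1).
    static long rpcFailureDetectionTimeMs(long intervalMs, int failureThreshold) {
        if (failureThreshold == -1) {
            return -1;
        }
        return intervalMs * failureThreshold;
    }

    // Recommended upper bound for the threshold: heartbeat.timeout / heartbeat.interval.
    static long maxRecommendedThreshold(long timeoutMs, long intervalMs) {
        return timeoutMs / intervalMs;
    }
}
```

With the defaults (interval 10000 ms, threshold 2, timeout 50000 ms), detection through failed RPCs takes 20000 ms, and the threshold should not exceed 5.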
heartbeat.timeout
50000 Long Timeout in milliseconds for requesting and receiving heartbeats, for both the sender and the receiver side.
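A minimal sketch of timeout-based liveness tracking, assuming a target counts as timed out once no heartbeat has been recorded for heartbeat.timeout milliseconds. `HeartbeatTracker` is a hypothetical class, not Flink's heartbeat manager; timestamps are passed in explicitly so the logic stays deterministic.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker: records the last heartbeat per target and
// reports targets that have been silent for longer than the timeout.
final class HeartbeatTracker {
    private final long timeoutMs;
    private final Map<String, Long> lastHeartbeatMs = new HashMap<>();

    HeartbeatTracker(long timeoutMs) {
        this.timeoutMs = timeoutMs;
    }

    // Remembers when the given target was last heard from.
    void recordHeartbeat(String targetId, long nowMs) {
        lastHeartbeatMs.put(targetId, nowMs);
    }

    // Targets whose last heartbeat is more than timeoutMs in the past.
    List<String> timedOutTargets(long nowMs) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Long> entry : lastHeartbeatMs.entrySet()) {
            if (nowMs - entry.getValue() > timeoutMs) {
                result.add(entry.getKey());
            }
        }
        return result;
    }
}
```

With a 50000 ms timeout, a target last heard from at 0 ms is reported at 60000 ms, while one heard from at 30000 ms is not.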
jobmanager.execution.failover-strategy
"region" String This option specifies how the job computation recovers from task failures. Accepted values are:
  • 'full': Restarts all tasks to recover the job.
  • 'region': Restarts all tasks that could be affected by the task failure.
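The difference between the two strategies can be illustrated with a simplified model. The sketch assumes that tasks connected by pipelined data exchanges form one region (computed here with union-find), that any exchange not listed as pipelined separates regions, and that 'region' restarts only the failed task's region. Flink's actual region strategy may restart further regions, for example when data produced by another region is no longer available. `FailoverSketch` is a hypothetical class.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Simplified, hypothetical model of the 'full' and 'region' failover strategies.
final class FailoverSketch {
    private FailoverSketch() {}

    // Union-find lookup with path compression.
    private static String find(Map<String, String> parent, String task) {
        String root = task;
        while (!parent.get(root).equals(root)) {
            root = parent.get(root);
        }
        String current = task;
        while (!current.equals(root)) {
            String next = parent.get(current);
            parent.put(current, root);
            current = next;
        }
        return root;
    }

    // tasks: all task ids; pipelinedEdges: {producer, consumer} pairs joined by pipelined exchanges.
    static Set<String> tasksToRestart(String strategy, Set<String> tasks,
                                      List<String[]> pipelinedEdges, String failedTask) {
        if (strategy.equals("full")) {
            return new TreeSet<>(tasks); // every task is restarted
        }
        if (!strategy.equals("region")) {
            throw new IllegalArgumentException("Unknown failover strategy: " + strategy);
        }
        Map<String, String> parent = new HashMap<>();
        for (String t : tasks) {
            parent.put(t, t);
        }
        for (String[] edge : pipelinedEdges) {
            parent.put(find(parent, edge[0]), find(parent, edge[1])); // merge the two regions
        }
        String failedRoot = find(parent, failedTask);
        Set<String> region = new TreeSet<>();
        for (String t : tasks) {
            if (find(parent, t).equals(failedRoot)) {
                region.add(t);
            }
        }
        return region;
    }
}
```

For tasks A, B, C, D with pipelined exchanges A-B and C-D only, a failure of A restarts {A, B} under 'region' and all four tasks under 'full'.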