Failure detection - Ruby SDK

This page shows how to do the following:

Raise and Handle Exceptions
Deliberately Fail Workflows
Set Workflow timeouts
Set Workflow retries
Set Activity timeouts
Set Activity Retry Policy
Heartbeat an Activity
Set Heartbeat timeouts

Raise and Handle Exceptions

In each Temporal SDK, error handling is implemented idiomatically, following the conventions of the language. Temporal uses several error classes internally. For example, the Ruby SDK uses CancelledError to handle a Workflow cancellation. Do not raise or otherwise implement these classes yourself, because they are tied to Temporal platform logic.

The one Temporal error class that you will typically raise deliberately is ApplicationError. In fact, any other exceptions that are raised from your Ruby code in a Temporal Activity will be converted to an ApplicationError internally. This way, an error's type, severity, and any additional details can be sent to the Temporal Service, indexed by the Web UI, and even serialized across language boundaries.

In other words, these two code samples do the same thing:

class MyError < StandardError
end

class SomethingThatFails < Temporalio::Activity::Definition
  def execute(details)
    Temporalio::Activity::Context.current.logger.info(
      "We have a problem."
    )
    raise MyError.new('Simulated failure')
  end
end

class SomethingThatFails < Temporalio::Activity::Definition
  def execute(details)
    Temporalio::Activity::Context.current.logger.info(
      "We have a problem."
    )
    raise Temporalio::Error::ApplicationError.new('Simulated failure', type: 'MyError')
  end
end

Either approach works, so choose the one that suits your implementation. One reason to use the Temporal ApplicationError class is that it accepts an additional non_retryable parameter, which lets you tell Temporal not to retry the error automatically. This is useful for failing deliberately on bad input data, rather than waiting for a timeout to elapse:

class SomethingThatFails < Temporalio::Activity::Definition
  def execute(details)
    Temporalio::Activity::Context.current.logger.info(
      "We have a problem."
    )
    raise Temporalio::Error::ApplicationError.new('Simulated failure', non_retryable: true)
  end
end

Alternatively, you can specify a list of non-retryable errors in your Activity Retry Policy.
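The two ways of marking an error as non-retryable can be pictured with a small plain-Ruby sketch. It does not call the SDK: `should_retry?` and its keyword names are hypothetical and only model the decision described above, assuming errors are matched by their type name (as in the `type: 'MyError'` example).

```ruby
# Hypothetical model of the retry decision; not the SDK's implementation.
# error_type: the type name carried by the error, such as 'MyError'.
# non_retryable: the flag set on the error itself when it is raised.
# non_retryable_error_types: type names listed in the Retry Policy.
def should_retry?(error_type, non_retryable: false, non_retryable_error_types: [])
  # A flag set on the error itself always stops retries.
  return false if non_retryable

  # Otherwise the error is retried unless its type is listed in the policy.
  !non_retryable_error_types.include?(error_type)
end

should_retry?('MyError')                                         # retried
should_retry?('MyError', non_retryable: true)                    # not retried
should_retry?('MyError', non_retryable_error_types: ['MyError']) # not retried
```

Setting the flag on the error keeps the decision next to the code that detects the problem, while listing types in the Retry Policy lets the caller decide without changing the Activity.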

Failing Workflows

One of the core design principles of Temporal is that an Activity Failure never directly causes a Workflow Failure. A Workflow should return as Failed only when you fail it deliberately. By default, Temporal retries an Activity until it reaches a timeout threshold, and the Activity does not return a failure to your Workflow until that threshold or another non-retryable condition is met. At that point, you can handle the error returned by your Activity the way you would in any other program. For example, you could implement a Saga Pattern that uses rescue blocks to "unwind" some of the steps your Workflow has performed up to the point of Activity Failure.
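The Saga Pattern mentioned above can be sketched in plain Ruby. This is a minimal illustration, not SDK code: `run_saga` and the step hashes are hypothetical, and the lambdas stand in for Activity calls that a real Workflow would make.

```ruby
# Plain-Ruby sketch of a saga: each completed step registers a compensation,
# and a rescue block runs the compensations in reverse order if a later step fails.
# The :action and :compensate lambdas are hypothetical stand-ins for Activities.
def run_saga(steps)
  compensations = []
  steps.each do |step|
    step[:action].call
    # Register the undo only after the step has succeeded.
    compensations << step[:compensate]
  end
  :completed
rescue StandardError
  # Unwind the completed steps, most recent first.
  compensations.reverse_each(&:call)
  :compensated
end

log = []
steps = [
  { action: -> { log << 'reserve' }, compensate: -> { log << 'release' } },
  { action: -> { log << 'charge' },  compensate: -> { log << 'refund' } },
  { action: -> { raise 'shipping failed' }, compensate: -> { log << 'cancel shipment' } }
]
result = run_saga(steps)
```

Here the third step fails, so only the two completed steps are compensated. In a Workflow, the rescue block could also raise an ApplicationError after unwinding if the Workflow should end as Failed.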

You will only fail a Workflow by manually raising an ApplicationError from the Workflow code. You could do this in response to an Activity Failure, if the failure of that Activity means that your Workflow should not continue:

class SagaWorkflow < Temporalio::Workflow::Definition
  def execute(details)
    Temporalio::Workflow.execute_activity(Activities::SomethingThatFails, details, start_to_close_timeout: 30)
  rescue StandardError
    # Raising an ApplicationError from Workflow code fails the Workflow Execution
    raise Temporalio::Error::ApplicationError.new('Fail the Workflow')
  end
end

This works differently in a Workflow than raising exceptions from Activities. In an Activity, any Ruby exceptions or custom exceptions are converted to a Temporal ApplicationError. In a Workflow, any exceptions that are raised other than an explicit Temporal ApplicationError will only fail that particular Workflow Task and be retried. This includes any typical Ruby RuntimeErrors that are raised automatically. These errors are treated as bugs that can be corrected with a fixed deployment, rather than a reason for a Temporal Workflow Execution to return unexpectedly.

Workflow timeouts

Each Workflow timeout controls the maximum duration of a different aspect of a Workflow Execution.

Workflow Execution Timeout: Limits how long the full Workflow Execution can run.
Workflow Run Timeout: Limits the duration of an individual run of a Workflow Execution.
Workflow Task Timeout: Limits the time allowed for a Worker to process a Workflow Task.

Set these values as keyword parameter options when starting a Workflow.

result = my_client.execute_workflow(
  MyWorkflow, 'some-input',
  id: 'my-workflow-id', task_queue: 'my-task-queue',
  execution_timeout: 5 * 60
)

Workflow retries

A Retry Policy works together with the timeouts to give you fine-grained control over how an execution behaves on failure.

Use a Retry Policy to automatically retry Workflow Executions on failure. Workflow Executions do not retry by default, and Retry Policies should be used with Workflow Executions only in certain situations.

The retry_policy can be set when calling start_workflow or execute_workflow.

result = my_client.execute_workflow(
  MyWorkflow, 'some-input',
  id: 'my-workflow-id', task_queue: 'my-task-queue',
  retry_policy: Temporalio::RetryPolicy.new(max_interval: 10)
)

Activity timeouts

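Each Activity Timeout controls a different aspect of how long an Activity Execution can take:

Schedule-To-Close Timeout: Limits the total duration of an Activity Execution, including retries.
Start-To-Close Timeout: Limits the duration of a single Activity Task Execution.
Schedule-To-Start Timeout: Limits how long an Activity Task can wait in a Task Queue before a Worker picks it up.

At least one of start_to_close_timeout or schedule_to_close_timeout is required.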

Temporalio::Workflow.execute_activity(
  MyActivity,
  { greeting: 'Hello', name: },
  start_to_close_timeout: 5 * 60
)

Activity Retry Policy

By default, Activities use a system Retry Policy. You can override it by specifying a custom Retry Policy.

To create an Activity Retry Policy in Ruby, set the retry_policy parameter when executing an Activity.

Temporalio::Workflow.execute_activity(
  MyActivity,
  { greeting: 'Hello', name: },
  start_to_close_timeout: 5 * 60,
  retry_policy: Temporalio::RetryPolicy.new(max_interval: 10)
)
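To see what a `max_interval` of 10 seconds means in practice, the following plain-Ruby sketch approximates how the delay between retries grows. It is not SDK code: `retry_delay` is a hypothetical helper, and it assumes the usual Temporal defaults for the other fields (an initial interval of 1 second and a backoff coefficient of 2.0).

```ruby
# Approximates the delay before each retry under exponential backoff.
# Assumed defaults: initial_interval of 1 second, backoff_coefficient of 2.0.
# Hypothetical helper for illustration; the Temporal Service computes the real delays.
def retry_delay(attempt, initial_interval: 1.0, backoff_coefficient: 2.0, max_interval: 10.0)
  # attempt is 1-based: the delay after the first failed attempt is initial_interval.
  delay = initial_interval * (backoff_coefficient**(attempt - 1))
  # The delay never grows beyond max_interval.
  [delay, max_interval].min
end

delays = (1..6).map { |attempt| retry_delay(attempt) }
# delays is [1.0, 2.0, 4.0, 8.0, 10.0, 10.0]
```

Without a cap, the fifth delay would be 16 seconds. With `max_interval: 10`, every delay from the fifth retry onward stays at 10 seconds.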

Override the retry interval with `next_retry_delay`

If you raise an application-level error, you can override the Retry Policy's delay by specifying a new delay.

raise Temporalio::Error::ApplicationError.new(
  'Some error',
  type: 'SomeErrorType',
  next_retry_delay: 3 * Temporalio::Activity::Context.current.info.attempt
)

Heartbeat an Activity

A Heartbeat is a periodic signal from the Worker to the Temporal Service indicating the Activity is still alive and making progress.

Heartbeats are used to detect Worker failure.
Cancellations are delivered via Heartbeats.
Heartbeats may contain custom progress details.

class MyActivity < Temporalio::Activity::Definition
  def execute
    # This is a naive loop simulating work, but similar heartbeat logic
    # applies to other scenarios as well
    loop do
      # Send heartbeat
      Temporalio::Activity::Context.current.heartbeat
      # Sleep before heartbeating again
      sleep(3)
    end
  end
end
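Because Heartbeats may contain custom progress details, a retried Activity can use them to skip work that an earlier attempt already finished. The plain-Ruby sketch below models that idea without calling the SDK. `process_items` is hypothetical, the `heartbeat` callable stands in for the SDK's heartbeat call, and `last_details` stands in for the details recorded by a previous attempt, assuming those details are made available to the next attempt.

```ruby
# Plain-Ruby sketch of resuming from heartbeat progress details.
# heartbeat: hypothetical callable standing in for the SDK's heartbeat call.
# last_details: index reported by a previous attempt, or nil on the first attempt.
def process_items(items, last_details:, heartbeat:)
  # Resume after the last index that a previous attempt reported, if any.
  start_index = last_details.nil? ? 0 : last_details + 1
  processed = []
  items.each_with_index do |item, index|
    next if index < start_index

    processed << item.upcase # placeholder for real work
    heartbeat.call(index)    # report progress so a retry can pick up from here
  end
  processed
end

reported = []
process_items(%w[a b c d], last_details: 1, heartbeat: ->(index) { reported << index })
# Only 'c' and 'd' are processed, and indexes 2 and 3 are reported.
```

Reporting the index after each item keeps the recorded progress close to the real progress, at the cost of one heartbeat per item.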

Heartbeat Timeout

The Heartbeat Timeout sets the maximum duration between Heartbeats before the Temporal Service considers the Activity failed.

Temporalio::Workflow.execute_activity(
  MyActivity,
  { greeting: 'Hello', name: },
  start_to_close_timeout: 5 * 60,
  heartbeat_timeout: 5
)
