Failure detection - Ruby SDK
This page shows how to do the following:
- Raise and Handle Exceptions
- Deliberately Fail Workflows
- Set Workflow timeouts
- Set Workflow retries
- Set Activity timeouts
- Set Activity Retry Policy
- Heartbeat an Activity
- Set Heartbeat timeouts
Raise and Handle Exceptions
In each Temporal SDK, error handling is implemented idiomatically, following the conventions of the language.
Temporal uses several different error classes internally, for example CancelledError in the Ruby SDK, which handles a Workflow cancellation.
You should not raise or otherwise implement these manually, as they are tied to Temporal platform logic.
The one Temporal error class that you will typically raise deliberately is ApplicationError.
In fact, any other exception raised from your Ruby code in a Temporal Activity is converted to an ApplicationError internally.
This way, an error's type, severity, and any additional details can be sent to the Temporal Service, indexed by the Web UI, and even serialized across language boundaries.
In other words, these two code samples do the same thing:
class MyError < StandardError
end

class SomethingThatFails < Temporalio::Activity::Definition
  def execute(details)
    Temporalio::Activity::Context.current.logger.info(
      "We have a problem."
    )
    raise MyError.new('Simulated failure')
  end
end

class SomethingThatFails < Temporalio::Activity::Definition
  def execute(details)
    Temporalio::Activity::Context.current.logger.info(
      "We have a problem."
    )
    raise Temporalio::Error::ApplicationError.new('Simulated failure', type: 'MyError')
  end
end
Depending on your implementation, you may decide to use either method.
One reason to use the Temporal ApplicationError class is that it lets you set an additional non_retryable parameter.
This way, you can tell Temporal not to retry an error automatically.
This can be useful for deliberately failing a Workflow due to bad input data, rather than waiting for a timeout to elapse:
class SomethingThatFails < Temporalio::Activity::Definition
  def execute(details)
    Temporalio::Activity::Context.current.logger.info(
      "We have a problem."
    )
    raise Temporalio::Error::ApplicationError.new('Simulated failure', non_retryable: true)
  end
end
Alternatively, you can list non-retryable error types in your Activity Retry Policy.
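To make the conversion and the non-retryable check concrete, here is a minimal plain-Ruby model that runs without the SDK. The names SketchFailure, to_failure, and retryable? are hypothetical and exist only for this illustration; the sketch assumes, as the two samples above suggest, that the exception's class name becomes the failure type.

```ruby
# Plain-Ruby model only; none of these names come from the Temporal SDK.
SketchFailure = Struct.new(:message, :type, :non_retryable, keyword_init: true)

class MyError < StandardError; end

# Model of the conversion: the exception's class name becomes the failure type.
def to_failure(error, non_retryable: false)
  SketchFailure.new(
    message: error.message,
    type: error.class.name,
    non_retryable: non_retryable
  )
end

# A failure is retried unless it is flagged non-retryable or its type is
# listed among the policy's non-retryable types.
def retryable?(failure, non_retryable_types: [])
  !failure.non_retryable && !non_retryable_types.include?(failure.type)
end

failure = to_failure(MyError.new('Simulated failure'))
retryable?(failure)                                       # retried by default
retryable?(failure, non_retryable_types: [MyError.name])  # blocked by the policy list
```

Both routes lead to the same outcome: the flag is decided where the error is raised, while the type list is decided by the caller that configures the Retry Policy.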
Failing Workflows
One of the core design principles of Temporal is that an Activity Failure never directly causes a Workflow Failure. A Workflow should return as Failed only when you fail it deliberately.
The default retry policy associated with Temporal Activities is to retry them until reaching a certain timeout threshold.
Activities will not actually return a failure to your Workflow until this condition or another non-retryable condition is met.
At this point, you can decide how to handle an error returned by your Activity the way you would in any other program.
For example, you could implement a Saga Pattern that uses rescue blocks to "unwind" some of the steps your Workflow has performed up to the point of Activity Failure.
You will only fail a Workflow by manually raising an ApplicationError from the Workflow code.
You could do this in response to an Activity Failure, if the failure of that Activity means that your Workflow should not continue:
class SagaWorkflow < Temporalio::Workflow::Definition
  def execute(details)
    Temporalio::Workflow.execute_activity(Activities::SomethingThatFails, details, start_to_close_timeout: 30)
  rescue StandardError
    # The Activity has failed for good, so fail the Workflow deliberately.
    raise Temporalio::Error::ApplicationError.new('Fail the Workflow')
  end
end
This works differently in a Workflow than raising exceptions from Activities.
In an Activity, any Ruby exceptions or custom exceptions are converted to a Temporal ApplicationError.
In a Workflow, any exception other than an explicit Temporal ApplicationError fails only that particular Workflow Task, which is then retried.
This includes any typical Ruby RuntimeError exceptions that are raised automatically.
These errors are treated as bugs that can be corrected with a fixed deployment, rather than a reason for a Temporal Workflow Execution to return unexpectedly.
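The Saga-style unwinding mentioned earlier in this section can be sketched in plain Ruby. This is an illustration of the control flow only: run_saga and the step names are hypothetical, the steps are lambdas rather than Activities, and a real Workflow would raise an ApplicationError where this sketch raises a RuntimeError.

```ruby
# Plain-Ruby sketch of Saga-style unwinding; not Temporal SDK code.
def run_saga(steps, log)
  compensations = []
  steps.each do |step|
    step[:action].call(log)
    # Register the undo only after the action has succeeded.
    compensations << step[:compensate]
  end
  :completed
rescue StandardError => e
  # Unwind the completed steps in reverse order, then fail deliberately.
  compensations.reverse_each { |undo| undo.call(log) }
  raise "Saga failed: #{e.message}"
end

steps = [
  { action: ->(log) { log << 'reserve' }, compensate: ->(log) { log << 'undo reserve' } },
  { action: ->(log) { log << 'charge' },  compensate: ->(log) { log << 'undo charge' } },
  { action: ->(_log) { raise 'shipping unavailable' }, compensate: ->(log) { log << 'undo ship' } }
]

log = []
begin
  run_saga(steps, log)
rescue RuntimeError => e
  log << e.message
end
```

The step that fails never registers its compensation, so only work that actually completed is undone.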
Workflow timeouts
Each Workflow timeout controls the maximum duration of a different aspect of a Workflow Execution.
- Workflow Execution Timeout: Limits how long the full Workflow Execution can run.
- Workflow Run Timeout: Limits the duration of an individual run of a Workflow Execution.
- Workflow Task Timeout: Limits the time allowed for a Worker to process a Workflow Task.
Set these values as keyword parameter options when starting a Workflow.
result = my_client.execute_workflow(
  MyWorkflow, 'some-input',
  id: 'my-workflow-id', task_queue: 'my-task-queue',
  execution_timeout: 5 * 60
)
Workflow retries
A Retry Policy can work in cooperation with the timeouts to provide fine controls to optimize the execution experience.
Use a Retry Policy to automatically retry Workflow Executions on failure.
Workflow Executions do not retry by default, and Retry Policies should be used with Workflow Executions only in certain situations.
The retry_policy can be set when calling start_workflow or execute_workflow.
result = my_client.execute_workflow(
  MyWorkflow, 'some-input',
  id: 'my-workflow-id', task_queue: 'my-task-queue',
  retry_policy: Temporalio::RetryPolicy.new(max_interval: 10)
)
Activity timeouts
Each Activity Timeout controls a different aspect of how long an Activity Execution can take.
At least one of start_to_close_timeout or schedule_to_close_timeout is required.
Temporalio::Workflow.execute_activity(
  MyActivity,
  { greeting: 'Hello', name: },
  start_to_close_timeout: 5 * 60
)
Activity Retry Policy
By default, Activities use a system Retry Policy. You can override it by specifying a custom Retry Policy.
To create an Activity Retry Policy in Ruby, set the retry_policy parameter when executing an Activity.
Temporalio::Workflow.execute_activity(
  MyActivity,
  { greeting: 'Hello', name: },
  start_to_close_timeout: 5 * 60,
  retry_policy: Temporalio::RetryPolicy.new(max_interval: 10)
)
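To see what a max_interval of 10 seconds does to the retry schedule, the following plain-Ruby sketch computes the delay before each retry. It assumes Temporal's documented defaults (a 1-second initial interval, a backoff coefficient of 2.0, and a maximum interval of 100 times the initial interval) and ignores any jitter the service may apply. The retry_delay method is hypothetical, not part of the SDK.

```ruby
# Delay (in seconds) before the retry that follows the given failed attempt.
# Hypothetical illustration of exponential backoff with a cap; not SDK code.
def retry_delay(attempt, initial_interval: 1.0, backoff_coefficient: 2.0, max_interval: nil)
  # Assumed default cap: 100 times the initial interval.
  max_interval ||= 100 * initial_interval
  uncapped = initial_interval * (backoff_coefficient**(attempt - 1))
  [uncapped, max_interval].min
end

# With max_interval: 10, the delay doubles until it reaches the cap.
delays = (1..6).map { |attempt| retry_delay(attempt, max_interval: 10) }
puts delays.inspect
```

Without a cap, doubling quickly produces long waits, which is why lowering max_interval is a common adjustment for Activities that should be retried promptly.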
Override the retry interval with next_retry_delay
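If you raise an application-level error, you can override the delay that the Retry Policy would otherwise apply before the next attempt by passing next_retry_delay. In this example, the delay grows by 3 seconds with each attempt.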
raise Temporalio::Error::ApplicationError.new(
  'Some error',
  type: 'SomeErrorType',
  next_retry_delay: 3 * Temporalio::Activity::Context.current.info.attempt
)
Heartbeat an Activity
A Heartbeat is a periodic signal from the Worker to the Temporal Service indicating the Activity is still alive and making progress.
- Heartbeats are used to detect Worker failure.
- Cancellations are delivered via Heartbeats.
- Heartbeats may contain custom progress details.
class MyActivity < Temporalio::Activity::Definition
  def execute
    # This is a naive loop simulating work, but similar heartbeat logic
    # applies to other scenarios as well
    loop do
      # Send heartbeat
      Temporalio::Activity::Context.current.heartbeat
      # Sleep before heartbeating again
      sleep(3)
    end
  end
end
Heartbeat Timeout
The Heartbeat Timeout sets the maximum duration between Heartbeats before the Temporal Service considers the Activity failed.
Temporalio::Workflow.execute_activity(
  MyActivity,
  { greeting: 'Hello', name: },
  start_to_close_timeout: 5 * 60,
  heartbeat_timeout: 5
)
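The check that a Heartbeat Timeout implies can be modeled in a few lines of plain Ruby. This is a conceptual sketch only: HeartbeatMonitor is a hypothetical class, times are plain numbers of seconds, and the real check happens inside the Temporal Service rather than in your code.

```ruby
# Hypothetical model of the timeout check; not SDK or server code.
class HeartbeatMonitor
  def initialize(heartbeat_timeout:, started_at:)
    @heartbeat_timeout = heartbeat_timeout
    @last_heartbeat_at = started_at
  end

  # Record a heartbeat received at time `now` (in seconds).
  def heartbeat(now)
    @last_heartbeat_at = now
  end

  # True when more than the timeout has elapsed since the last heartbeat.
  def timed_out?(now)
    now - @last_heartbeat_at > @heartbeat_timeout
  end
end

monitor = HeartbeatMonitor.new(heartbeat_timeout: 5, started_at: 0)
monitor.heartbeat(3)     # the Activity heartbeats after 3 seconds
monitor.timed_out?(6)    # false: 3 seconds since the last heartbeat
monitor.timed_out?(9)    # true: 6 seconds of silence exceeds the 5-second timeout
```

The model also shows why the heartbeat interval in the Activity (3 seconds in the earlier loop) should stay comfortably below the Heartbeat Timeout (5 seconds here).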