
[Data] - Actor retry if there's a failure in __init__ #59105

Merged
alexeykudinkin merged 8 commits into ray-project:master from goutamvenkat-anyscale:goutam/actor_init_retries
Dec 5, 2025

Conversation

@goutamvenkat-anyscale
Contributor

Description

Ray Core doesn't restart actors if __init__ fails, so Ray Data has to retry actor init failures manually.


@goutamvenkat-anyscale requested a review from a team as a code owner December 2, 2025 04:54
@goutamvenkat-anyscale changed the title from "[Data] - Actor init max retries" to "[Data] - Actor init retry if there's a failure" Dec 2, 2025
@goutamvenkat-anyscale added the "data" (Ray Data-related issues) and "go" (add ONLY when ready to merge, run all tests) labels Dec 2, 2025
Contributor

@gemini-code-assist gemini-code-assist bot left a comment


Code Review

This pull request introduces a valuable feature for handling actor initialization failures by adding a configurable retry mechanism. The implementation is clear, and the accompanying tests effectively validate the new logic. I have a couple of suggestions to enhance robustness and clarity.

Signed-off-by: Goutam <[email protected]>
Signed-off-by: Goutam <[email protected]>
@goutamvenkat-anyscale changed the title from "[Data] - Actor init retry if there's a failure" to "[Data] - Actor retry if there's a failure in __init__" Dec 2, 2025
Signed-off-by: Goutam <[email protected]>
Comment on lines +600 to +601
last_exception = e
attempt += 1
Contributor


What I meant by my previous comment was that we must log every exception, not only the last one.
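
To illustrate the point, here is a minimal sketch of a retry loop that logs each failure before recording it. It assumes a plain callable standing in for actor construction; `create_with_retries`, `factory`, and `max_attempts` are hypothetical names for this example, not the code or config merged in this PR, and it does not use any Ray API.

```python
import logging

logger = logging.getLogger(__name__)


def create_with_retries(factory, max_attempts=3):
    """Call factory() until it succeeds or max_attempts is exhausted.

    Hypothetical helper for illustration only; not the code from this PR.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be >= 1")

    attempt = 0
    last_exception = None
    while attempt < max_attempts:
        try:
            return factory()
        except Exception as e:
            # Log every failure as it happens, not only the final one.
            logger.warning(
                "Init attempt %d/%d failed: %r", attempt + 1, max_attempts, e
            )
            last_exception = e
            attempt += 1

    # All attempts failed: surface the last error as the cause.
    raise RuntimeError(
        f"Init failed after {max_attempts} attempts"
    ) from last_exception
```

Chaining the final error with `from last_exception` keeps the original traceback visible, while the per-attempt warning means earlier failures are not lost if they differ from the last one.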

Signed-off-by: Goutam <[email protected]>
@alexeykudinkin alexeykudinkin enabled auto-merge (squash) December 5, 2025 18:29
@alexeykudinkin alexeykudinkin merged commit 0b94fee into ray-project:master Dec 5, 2025
7 checks passed
@goutamvenkat-anyscale goutamvenkat-anyscale deleted the goutam/actor_init_retries branch January 13, 2026 01:43
peterxcli pushed a commit to peterxcli/ray that referenced this pull request Feb 25, 2026
…59105)


Signed-off-by: Goutam <[email protected]>
Signed-off-by: peterxcli <[email protected]>