Fix attaching Replicated DBs when the interserver host changed after restarting #93779
Workflow [PR], commit [0aa9484] Summary: ❌
Pull request overview
This PR fixes an issue where Replicated databases fail to attach after a ClickHouse restart if the interserver host changed. The problem occurs because the host_id (which includes the host address) is stored in the replica_path in ZooKeeper, and a mismatch after restart causes an error. The solution is to update the host_id in ZooKeeper when the UUID matches but the host_id differs.
Changes:
- Added logic to parse and compare the UUID from the stored host_id in ZooKeeper
- When the host_id does not match but the UUID does, update the host_id in ZooKeeper instead of throwing an error
- Added integration test to verify the fix works when interserver_http_host changes after database creation
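The attach-time decision described in the list above can be expressed as a small pure function with three outcomes. This is a minimal sketch, not the code in DatabaseReplicated.cpp: it assumes the stored host_id ends with `:<uuid>` (so the database UUID is whatever follows the last colon), and the names `HostIdCheck`, `uuidSuffix` and `checkStoredHostId` are hypothetical.

```cpp
#include <string>

// Hypothetical outcome of comparing the host_id stored under replica_path
// in ZooKeeper with the host_id this server computes after a restart.
enum class HostIdCheck
{
    Match,        // identical: nothing to do
    UpdateHostId, // same database UUID, different address: overwrite the stored host_id
    Conflict,     // different (or unparsable) UUID: a different replica owns this name
};

// Assumption: host_id looks like "<host>:<port>:<uuid>". A UUID never contains
// ':', so it is the suffix after the LAST colon, even for IPv6 hosts.
inline std::string uuidSuffix(const std::string & host_id)
{
    const auto pos = host_id.rfind(':');
    return pos == std::string::npos ? std::string{} : host_id.substr(pos + 1);
}

inline HostIdCheck checkStoredHostId(const std::string & stored_host_id, const std::string & current_host_id)
{
    if (stored_host_id == current_host_id)
        return HostIdCheck::Match;

    const std::string stored_uuid = uuidSuffix(stored_host_id);
    if (!stored_uuid.empty() && stored_uuid == uuidSuffix(current_host_id))
        return HostIdCheck::UpdateHostId;

    // In the PR this case still ends in a REPLICA_ALREADY_EXISTS exception.
    return HostIdCheck::Conflict;
}
```

Keeping the decision separate from the ZooKeeper calls makes each of the three outcomes easy to unit-test on its own.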
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| src/Databases/DatabaseReplicated.cpp | Added a parseHostID function that extracts the UUID from the host_id string, plus logic to update the host_id in ZooKeeper when the UUID matches but the address has changed |
| tests/integration/test_replicated_database_interserver_host/test.py | Refactored the config update logic into a helper function and added a test case for the interserver host change scenario |
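A UUID-extracting helper like the parseHostID mentioned in the table could look roughly like this. It is a sketch under stated assumptions, not the function from the PR: it assumes host_id has the form `<address>:<uuid>` where the address part may itself contain colons (for example an IPv6 host), and `ParsedHostId` and `parseHostIdSketch` are hypothetical names. Splitting at the last colon and validating the canonical 8-4-4-4-12 UUID text avoids misreading a port or an IPv6 fragment as the UUID.

```cpp
#include <cctype>
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Hypothetical result type; the real parseHostID may return something else.
struct ParsedHostId
{
    std::string address; // "<host>:<port>" part, may contain ':' (IPv6)
    std::string uuid;    // canonical 36-character UUID text
};

// Assumption: host_id = "<address>:<uuid>". Split at the LAST colon, because
// an IPv6 host contains colons but a UUID never does.
inline std::optional<ParsedHostId> parseHostIdSketch(const std::string & host_id)
{
    const auto pos = host_id.rfind(':');
    if (pos == std::string::npos || pos == 0)
        return std::nullopt;

    std::string uuid = host_id.substr(pos + 1);

    // Canonical UUID text: 8-4-4-4-12 hex digits separated by '-'.
    if (uuid.size() != 36)
        return std::nullopt;
    for (std::size_t i = 0; i < uuid.size(); ++i)
    {
        const bool dash_pos = (i == 8 || i == 13 || i == 18 || i == 23);
        if (dash_pos ? uuid[i] != '-' : !std::isxdigit(static_cast<unsigned char>(uuid[i])))
            return std::nullopt;
    }
    return ParsedHostId{host_id.substr(0, pos), std::move(uuid)};
}
```

Returning `std::nullopt` for malformed input lets the caller fall back to the original "replica already exists" error rather than overwriting ZooKeeper data it cannot interpret.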
tests/integration/test_replicated_database_interserver_host/test.py
Replace replica_host_id in the log. Co-authored-by: Copilot
…st.py Update test query. Co-authored-by: Copilot
```cpp
}

if (uuid_in_keeper != db_uuid)
    throw Exception(
```
Could we make the message clearer? For example:
```cpp
throw Exception(
    ErrorCodes::REPLICA_ALREADY_EXISTS,
    "Replica {} of shard {} of replicated database at {} already exists. Replica host ID ('{}') does not match current host ID ('{}'). "
    "A replica with the same name exists but with different node identity.",
    replica_name, shard_name, zookeeper_path, replica_host_id, host_id);
```
I don't see much difference, and this would introduce the new terms "name" and "node identity".
src/Databases/DatabaseReplicated.cpp
```cpp
    replica_host_id,
    host_id);

// After restarting, InterserverIOAddress might change.
```
Could you list the possible address-change scenarios?
tiandiwonder left a comment:
Is there an issue for this, such as an on-call issue?
There is a cross-link: #89693. This is part of that issue.
Update the comment to explain why the InterserverIOAddress might change.
Tidy parseHostID
test_scheduler_cpu_preemptive/test.py::test_independent_pools[cpu-slot-preemption-timeout-60s]
2bc5af7
Cherry pick #93779 to 25.10: Fix attaching Replicated DBs when the interserver host changed after restarting
Cherry pick #93779 to 25.11: Fix attaching Replicated DBs when the interserver host changed after restarting
Cherry pick #93779 to 25.12: Fix attaching Replicated DBs when the interserver host changed after restarting
Backport #93779 to 25.10: Fix attaching Replicated DBs when the interserver host changed after restarting
Backport #93779 to 25.11: Fix attaching Replicated DBs when the interserver host changed after restarting
Backport #93779 to 25.12: Fix attaching Replicated DBs when the interserver host changed after restarting
Cherry pick #93779 to 25.3: Fix attaching Replicated DBs when the interserver host changed after restarting
Cherry pick #93779 to 25.8: Fix attaching Replicated DBs when the interserver host changed after restarting
Backport #93779 to 25.8: Fix attaching Replicated DBs when the interserver host changed after restarting
Backport #93779 to 25.3: Fix attaching Replicated DBs when the interserver host changed after restarting
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
Fix attaching Replicated DBs when the interserver host changed after restarting.
After ClickHouse restarts, the interserver host may change. However, Replicated databases store the host_id, which includes the host, in the replica_path in ZooKeeper. When the database is attached after a restart, the stored host_id no longer matches the current one and an error is thrown.
In this PR, if the host_id does not match but the UUID is the same, we update the replica_path data to the new host_id.
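The behavior described above can be simulated end to end with an in-memory map standing in for ZooKeeper. This is a minimal sketch, not ClickHouse code: `FakeKeeper`, `uuidOf` and `attachReplica` are hypothetical, the "first attach" branch is a simplification of replica creation, and the host_id is assumed to end with `:<uuid>`.

```cpp
#include <map>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for ZooKeeper: node path -> node data.
using FakeKeeper = std::map<std::string, std::string>;

// Assumption: the database UUID is the suffix after the last ':' of a host_id.
inline std::string uuidOf(const std::string & host_id)
{
    const auto pos = host_id.rfind(':');
    return pos == std::string::npos ? std::string{} : host_id.substr(pos + 1);
}

// Sketch of attaching after a restart: if the replica node already exists with
// a different host_id but the same database UUID, overwrite the node data with
// the new host_id instead of failing.
inline void attachReplica(FakeKeeper & keeper, const std::string & replica_path, const std::string & host_id)
{
    auto it = keeper.find(replica_path);
    if (it == keeper.end())
    {
        keeper.emplace(replica_path, host_id); // simplified: first attach registers the replica
        return;
    }

    if (it->second == host_id)
        return; // address unchanged

    const std::string stored_uuid = uuidOf(it->second);
    if (stored_uuid.empty() || stored_uuid != uuidOf(host_id))
        throw std::runtime_error("replica already exists with host_id " + it->second);

    it->second = host_id; // same database, new interserver address
}
```

Attaching first as `host-a:9000:u1` and then, after a simulated restart, as `host-b:9000:u1` leaves the node holding the new host_id, while an attach with a different UUID still fails and leaves the node untouched.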
Documentation entry for user-facing changes