Skip to content

Refreshable MV: DatabaseReplicated support, coordination among replicas#60669

Merged
al13n321 merged 41 commits intomasterfrom
mvrere
Oct 5, 2024
Merged

Refreshable MV: DatabaseReplicated support, coordination among replicas#60669
al13n321 merged 41 commits intomasterfrom
mvrere

Conversation

@al13n321
Copy link
Member

@al13n321 al13n321 commented Mar 1, 2024

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

Refreshable materialized views are now supported in Replicated databases.

Documentation entry for user-facing changes

  • Documentation is written (mandatory for new features)

@robot-ch-test-poll robot-ch-test-poll added the pr-improvement Pull request with some product improvements label Mar 1, 2024
@robot-ch-test-poll
Copy link
Contributor

robot-ch-test-poll commented Mar 1, 2024

This is an automated comment for commit 9c95c47 with description of existing statuses. It's updated for the latest CI running

✅ Click here to open a full report in a separate page

Successful checks
Check nameDescriptionStatus
AST fuzzerRuns randomly generated queries to catch program errors. The build type is optionally given in parenthesis. If it fails, ask a maintainer for help✅ success
BuildsThere's no description for the check yet, please add it to tests/ci/ci_config.py:CHECK_DESCRIPTIONS✅ success
ClickBenchRuns [ClickBench](https://github.com/ClickHouse/ClickBench/) with instant-attach table✅ success
Compatibility checkChecks that clickhouse binary runs on distributions with old libc versions. If it fails, ask a maintainer for help✅ success
Docker keeper imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docker server imageThe check to build and optionally push the mentioned image to docker hub✅ success
Docs checkBuilds and tests the documentation✅ success
Fast testNormally this is the first check that is ran for a PR. It builds ClickHouse and runs most of stateless functional tests, omitting some. If it fails, further checks are not started until it is fixed. Look at the report to see which tests fail, then reproduce the failure locally as described here✅ success
Flaky testsChecks if new added or modified tests are flaky by running them repeatedly, in parallel, with more randomization. Functional tests are run 100 times with address sanitizer, and additional randomization of thread scheduling. Integration tests are run up to 10 times. If at least once a new test has failed, or was too long, this check will be red. We don't allow flaky tests, read the doc✅ success
Install packagesChecks that the built packages are installable in a clear environment✅ success
Integration testsThe integration tests report. In parenthesis the package type is given, and in square brackets are the optional part/total tests✅ success
Performance ComparisonMeasure changes in query performance. The performance test report is described in detail here. In square brackets are the optional part/total tests✅ success
Stateful testsRuns stateful functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stateless testsRuns stateless functional tests for ClickHouse binaries built in various configurations -- release, debug, with sanitizers, etc✅ success
Stress testRuns stateless functional tests concurrently from several clients to detect concurrency-related errors✅ success
Style checkRuns a set of checks to keep the code style clean. If some of tests failed, see the related log from the report✅ success
Unit testsRuns the unit tests for different release types✅ success
Upgrade checkRuns stress tests on server version from last release and then tries to upgrade it to the version from the PR. It checks if the new server can successfully startup without any errors, crashes or sanitizer asserts✅ success

@fm4v
Copy link
Member

fm4v commented Apr 29, 2024

RMV trigger with APPEND throws exception in system.view_refreshes. Works without APPEND:

CREATE MATERIALIZED VIEW test_rmv
REFRESH EVERY 1 HOUR APPEND
ENGINE = Memory
AS SELECT
    now() AS a,
    number AS b
FROM numbers(2);

SELECT *
FROM system.view_refreshes;


Row 1:
──────
database:                 default
view:                     test_rmv
status:                   Scheduled
last_success_time:        ᴺᵁᴸᴸ
last_success_duration_ms: ᴺᵁᴸᴸ
last_refresh_time:        2024-04-29 11:57:21
last_refresh_replica:     
next_refresh_time:        2024-04-29 12:00:00
exception:                Code: 393. DB::Exception: There is no query or query context has expired. (THERE_IS_NO_QUERY), Stack trace (when copying this message, always include the lines below):

0. ./build_docker/./src/Common/Exception.cpp:101: DB::Exception::Exception(DB::Exception::MessageMasked&&, int, bool) @ 0x000000000c99f6db
1. DB::Exception::Exception(PreformattedMessage&&, int) @ 0x00000000078069ac
2. DB::Exception::Exception<>(int, FormatStringHelperImpl<>) @ 0x0000000007814d8b
3. ./build_docker/./src/Interpreters/Context.cpp:2396: DB::Context::getQueryContext() const @ 0x0000000010321089
4. ./build_docker/./src/Analyzer/Passes/QueryAnalysisPass.cpp:7414: DB::(anonymous namespace)::QueryAnalyzer::resolveTableFunction(std::shared_ptr<DB::IQueryTreeNode>&, DB::(anonymous namespace)::IdentifierResolveScope&, DB::(anonymous namespace)::QueryExpressionsAliasVisitor&, bool) @ 0x0000000010a7434a
5. ./build_docker/./src/Analyzer/Passes/QueryAnalysisPass.cpp:7772: DB::(anonymous namespace)::QueryAnalyzer::resolveQueryJoinTreeNode(std::shared_ptr<DB::IQueryTreeNode>&, DB::(anonymous namespace)::IdentifierResolveScope&, DB::(anonymous namespace)::QueryExpressionsAliasVisitor&) @ 0x0000000010a76d1c
6. ./build_docker/./src/Analyzer/Passes/QueryAnalysisPass.cpp:8021: DB::(anonymous namespace)::QueryAnalyzer::resolveQuery(std::shared_ptr<DB::IQueryTreeNode> const&, DB::(anonymous namespace)::IdentifierResolveScope&) @ 0x0000000010a64e91
7. ./build_docker/./src/Analyzer/Passes/QueryAnalysisPass.cpp:0: DB::QueryAnalysisPass::run(std::shared_ptr<DB::IQueryTreeNode>&, std::shared_ptr<DB::Context const>) @ 0x0000000010a629e5
8. ./build_docker/./src/Analyzer/QueryTreePassManager.cpp:183: DB::QueryTreePassManager::run(std::shared_ptr<DB::IQueryTreeNode>) @ 0x0000000010a612a3
9. ./build_docker/./src/Interpreters/InterpreterSelectQueryAnalyzer.cpp:0: DB::(anonymous namespace)::buildQueryTreeAndRunPasses(std::shared_ptr<DB::IAST> const&, DB::SelectQueryOptions const&, std::shared_ptr<DB::Context const> const&, std::shared_ptr<DB::IStorage> const&) (.llvm.11603548317326480995) @ 0x0000000010cf40bd
10. ./build_docker/./src/Interpreters/InterpreterSelectQueryAnalyzer.cpp:132: DB::InterpreterSelectQueryAnalyzer::InterpreterSelectQueryAnalyzer(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&) @ 0x0000000010cf2e51
11. ./build_docker/./src/Interpreters/InterpreterSelectQueryAnalyzer.cpp:170: DB::InterpreterSelectQueryAnalyzer::getSampleBlock(std::shared_ptr<DB::IAST> const&, std::shared_ptr<DB::Context const> const&, DB::SelectQueryOptions const&) @ 0x0000000010cf4bfd
12. ./build_docker/./src/Storages/StorageMaterializedView.cpp:0: DB::StorageMaterializedView::prepareRefresh(bool, std::shared_ptr<DB::Context>, std::optional<DB::StorageID>&) const @ 0x0000000011661e09
13. ./build_docker/./src/Storages/MaterializedView/RefreshTask.cpp:528: DB::RefreshTask::refreshTask() @ 0x0000000012158fa2
14. ./base/glibc-compatibility/musl/clock_gettime.c:62: DB::BackgroundSchedulePool::threadFunction() @ 0x000000000fbf2765
15. ./build_docker/./src/Core/BackgroundSchedulePool.cpp:0: void std::__function::__policy_invoker<void ()>::__call_impl<std::__function::__default_alloc_func<ThreadFromGlobalPoolImpl<false, true>::ThreadFromGlobalPoolImpl<DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0>(DB::BackgroundSchedulePool::BackgroundSchedulePool(unsigned long, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, StrongTypedef<unsigned long, CurrentMetrics::MetricTag>, char const*)::$_0&&)::'lambda'(), void ()>>(std::__function::__policy_storage const*) @ 0x000000000fbf3803
16. ./base/base/../base/wide_integer_impl.h:817: void* std::__thread_proxy[abi:v15000]<std::tuple<std::unique_ptr<std::__thread_struct, std::default_delete<std::__thread_struct>>, void ThreadPoolImpl<std::thread>::scheduleImpl<void>(std::function<void ()>, Priority, std::optional<unsigned long>, bool)::'lambda0'()>>(void*) @ 0x000000000ca5972d
17. ? @ 0x000075b44ee94ac3
18. ? @ 0x000075b44ef26850
 (version 24.4.1.1793)
retry:                    0
progress:                 nan
read_rows:                0
read_bytes:               0
total_rows:               0
total_bytes:              0
written_rows:             0
written_bytes:            0
result_rows:              0
result_bytes:             0

version: 24.4.1.1793

@fm4v
Copy link
Member

fm4v commented Apr 29, 2024

Something wrong with next_refresh_time when using WEEK interval, some cases are correct, some are not.
More cases here: https://docs.google.com/document/d/11XkpCRmrXKt9UVpW3S3KT50sRMlNDErk2kz2bDXtW0I/edit?usp=sharing

CREATE MATERIALIZED VIEW
test_rmv_schedule
REFRESH EVERY _N_ WEEK
ENGINE = Memory
EMPTY
AS SELECT now() as a, number as b FROM numbers(2);


current_time: 2024-04-29

Interval: EVERY 1 WEEK
next_refresh_time 2024-05-06 CORRECT
----
Interval: EVERY 2 WEEK
next_refresh_time 2024-05-06 WRONG
----
Interval: EVERY 3 WEEK
next_refresh_time 2024-05-20 CORRECT
----
Interval: EVERY 4 WEEK
next_refresh_time 2024-05-06 WRONG
----
Interval: EVERY 5 WEEK
next_refresh_time 2024-06-03 CORRECT
----
Interval: EVERY 20 WEEK
next_refresh_time 2024-06-03 WRONG
---
Interval: EVERY 21 WEEK
next_refresh_time 2024-09-23 CORRECT

version: 24.4.1.1793

@fm4v
Copy link
Member

fm4v commented Apr 30, 2024

I can't quite understand how two and more units in EVERY work. There is no such problem with AFTER.

More cases: https://docs.google.com/document/d/1ncSaWYNMzUKkqUWpIoYlsJVQrhVC6fbEPTQhmka8cPQ/edit?usp=sharing

Interval: EVERY 1 YEAR
next_refresh_time 2025-01-01 CORRECT
predicted 2025-01-01 00:00:00
----
Interval: EVERY 1 YEAR 0 MONTH
next_refresh_time 2025-01-01 CORRECT
predicted 2025-01-01 00:00:00
Correct: True
—-
Interval: EVERY 1 YEAR 1 MONTH
next_refresh_time 2025-04-01 WRONG
predicted 2025-02-01 00:00:00
Correct: False
----
Interval: EVERY 1 YEAR 2 MONTH
next_refresh_time 2024-11-01 WRONG
predicted 2025-03-01 00:00:00
Correct: False
----
Interval: EVERY 1 YEAR 3 MONTH
next_refresh_time 2025-01-01 WRONG
predicted 2025-04-01 00:00:00

@fm4v
Copy link
Member

fm4v commented Apr 30, 2024

Should the same units be accepted like "EVERY 1 YEAR 2 YEAR"? Now they just being added together.

CREATE MATERIALIZED VIEW
test_rmv_schedule
REFRESH EVERY 1 YEAR 2 YEAR
ENGINE = Memory
EMPTY
AS SELECT now() as a, number as b FROM numbers(2)

----
current_time: 2024-04-30 15:24:14
Interval: EVERY 1 YEAR 2 YEAR
next_refresh_time 2027-01-01 00:00:00
CREATE MATERIALIZED VIEW
test_rmv_schedule
REFRESH AFTER 1 YEAR 2 YEAR
ENGINE = Memory
EMPTY
AS SELECT now() as a, number as b FROM numbers(2)

----
current_time: 2024-04-30 15:24:15
Interval: AFTER 1 YEAR 2 YEAR
next_refresh_time 2027-04-30 15:24:15

@al13n321 al13n321 force-pushed the mvrere branch 2 times, most recently from 5113db3 to 52e42bb Compare June 11, 2024 05:08
@al13n321
Copy link
Member Author

RMV trigger with APPEND throws exception in system.view_refreshes.

Fixed, thanks for catching it!

Should the same units be accepted like "EVERY 1 YEAR 2 YEAR"? Now they just being added together.

Yeah, it wasn't clear what the user would expect, so I added them together because it's easy and seems natural (since different units are added together). Your comment made it more clear what the user expects :) , so added a check to reject it.


I can't quite understand how two and more units in EVERY work.

Maybe it was confusing because you tested it on Monday :) . EVERY 1 WEEK means refresh every Monday, even if the materialized view was created on a Wednesday. So the first refresh can be anywhere between 0 and 1 weeks from the view creation, depending on when it was created. Similarly, EVERY 2 WEEKS means Mondays of even-numbered weeks (counting from 1969-12-29), and the first refresh can be anywhere between 0 and 2 weeks from view creation.

This is the only reasonable-seeming way I could think of to generalize the notion of "start of each hour"/"every Monday"/"1st of every month" etc to ~arbitrary time intervals. Lmk if you have possibly-better ideas.

I can't quite understand how two and more units in EVERY work.

They're converted to seconds or months and added together. Refresh happens when current time is divisible by that. Added 2 examples in docs/en/sql-reference/statements/create/view.md.


(Unnecessarily long explanation below, feel free to skip.)

The desired behavior for EVERY 1 WEEK is "every Monday". This has the properties: (a) it doesn't depend on when the materialized view was created, (b) consecutive refreshes are exactly 1 week apart. And non-properties: (c) the first refresh doesn't happen exactly 1 week after view creation, it can be anywhere between 0 and 1 week, (d) the schedule is not aligned with months or years.

I would expect EVERY 2 WEEK to have the same two properties and two non-properties, and also to fall on Mondays. There are exactly two possible ways to achieve that: all even-numbered Mondays or all odd-numbered Mondays. I just picked one.

The generalization to SECOND/MINUTE/HOUR/DAY/WEEK (not MONTH/YEAR) is: convert to seconds (say, n seconds), do a refresh when the number of seconds since 1969-12-29 (which is a Monday) is divisible by n. So:

  • EVERY 1 <anything> works as expected.
  • EVERY 2/3/4/5/6/10/12/15/20/30/60 SECOND/MINUTE work as expected. They're divisors of 60, so it's automatically aligned with minutes or hours.
  • EVERY 2/3/4/6/8/12 HOUR works as I guess people would expect: refresh at the start of every even-numbered/divisible-by-3-numbered/etc hour in the day.
  • When the interval is not a divisor of some bigger unit, refreshes won't be aligned with that bigger unit. Alternatively, we could forcefully align them to the next-bigger unit, making refresh interval non-uniform; e.g. EVERY 5 HOUR would refresh at 0, 5, 10, 15, and 20 hours after the start of each day, with a 4-hour interval between 20:00 and 00:00 of next day. That's what the original refreshable MV implementation did, but I didn't like it and changed it, i.e. traded property (b) for property (d). Lmk if you think we should change it back (and what to do with EVERY 59 MINUTES then).
  • It doesn't matter which units were used to specify the interval: EVERY 1 WEEK is the same as EVERY 7 DAY or EVERY 168 HOUR etc.
  • When the interval is just weird, like EVERY 1 DAY 1 SECOND, I'm guessing the user probably doesn't have any specific expectation for how exactly it'll work, so they won't be very surprised?

For MONTH/YEAR: convert to n months, refresh when current month number (since 1970-01) is divisible by n. Mixing seconds-based with months-based intervals is not allowed (e.g. 1 YEAR 2 WEEKS - Exception: Interval shouldn't contain both calendar units and clock units (e.g. months and days)). Refreshing on day >28 of the month (e.g. OFFSET 30 DAY) is not allowed because February.

If you have suggestions for how and which of those things to document, based on what you found confusing - lmk. I'm hoping that the two new examples in view.md are enough to prompt the reader to think about this the intended way - absolute timestamps being divisible by time intervals.

void InterpreterCreateQuery::validateMaterializedViewColumnsAndEngine(const ASTCreateQuery & create, const TableProperties & properties, const DatabasePtr & database)
{
/// This is not strict validation, just catches common errors that would make the view not work.
/// It's possible to circumvent these checks by ALTERing the view or target table after creation.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why don't we validate the view on ALTER as well?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We probably should, I just documented this along the way.

@clickhouse-gh
Copy link
Contributor

clickhouse-gh bot commented Jul 23, 2024

Dear @pufit, this PR hasn't been updated for a while. You will be unassigned. Will you continue working on it? If so, please feel free to reassign yourself.

@al13n321
Copy link
Member Author

AST fuzzer - known problem: #56640

@al13n321
Copy link
Member Author

al13n321 commented Oct 2, 2024

Flaky tests: #62741 , #70282

@al13n321 al13n321 enabled auto-merge October 2, 2024 16:56
@al13n321
Copy link
Member Author

al13n321 commented Oct 3, 2024

Flaky tests: #70337 #70169 #70319

@al13n321 al13n321 added this pull request to the merge queue Oct 5, 2024
Merged via the queue into master with commit 9792306 Oct 5, 2024
@al13n321 al13n321 deleted the mvrere branch October 5, 2024 19:28
@robot-ch-test-poll robot-ch-test-poll added the pr-synced-to-cloud The PR is synced to the cloud repo label Oct 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements pr-synced-to-cloud The PR is synced to the cloud repo

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants