Skip to content

Control nesting level for shards skipping and disallow non-deterministic functions#11715

Merged
alexey-milovidov merged 8 commits intoClickHouse:masterfrom
azat:dist-optimize_skip_unused_shards-fixes
Jun 24, 2020
Merged

Control nesting level for shards skipping and disallow non-deterministic functions#11715
alexey-milovidov merged 8 commits intoClickHouse:masterfrom
azat:dist-optimize_skip_unused_shards-fixes

Conversation

@azat
Copy link
Member

@azat azat commented Jun 16, 2020

Yet another small improvements around distributed querying.

Changelog category (leave one):

  • Improvement

Changelog entry (a user-readable short description of the changes that goes to CHANGELOG.md):

  • Add optimize_skip_unused_shards_nesting (allows control nesting level for shards skipping optimization)
  • Add force_skip_optimize_shards_nesting (allows control nesting level for checking was shards skipped or not)
  • Deprecate force_optimize_skip_unused_shards_no_nested (force_skip_optimize_shards_nesting should be used instead)
  • Disable optimize_skip_unused_shards if sharding_key has non-deterministic func (i.e. rand(), note that this does not changes anything for INSERT side)
Details

HEAD:

  • 080e309050ee844db8cf70951825da9e41f7e987
  • 0e218b0

@blinkov blinkov added the pr-improvement Pull request with some product improvements label Jun 16, 2020
@azat azat marked this pull request as draft June 18, 2020 09:47
azat added 5 commits June 18, 2020 21:49
…stic func

Example of such functions is rand()

And this patch disables only optimize_skip_unused_shards, i.e. INSERT
code path does not changed, so it will work as before.
…queries

P.S. Looks like settings can be converted between SettingUInt64 and
SettingBool without breaking binary protocol.

FWIW maybe it is a good idea to change the semantics of the settings as
follow (but I guess that changing semantic is not a good idea, better to
add new settings and deprecate old ones):
- optimize_skip_unused_shards -- accept nesting level on which the
  optimization will work
- force_skip_optimize_shards_nesting -- accept nesting level on which
  the optimization will work
Before there is no check that optimize_skip_unused_shards was working
for the first level, use cluster with unavalable shard to guarantee
this.
@azat azat force-pushed the dist-optimize_skip_unused_shards-fixes branch from 080e309 to 0e218b0 Compare June 18, 2020 18:50
@azat azat marked this pull request as ready for review June 18, 2020 18:50
@alexey-milovidov
Copy link
Member

P.S. Looks like settings can be converted between SettingUInt64 and
SettingBool without breaking binary protocol?

Yes.

@alexey-milovidov
Copy link
Member

optimize_skip_unused_shards_nesting -- accept nesting level on which the
optimization will work
force_skip_optimize_shards_nesting -- accept nesting level on which
the optimization will work

Much better than optimize_skip_unused_shards=2

Copy link
Member

@alexey-milovidov alexey-milovidov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code LGTM,
let's apply changes to make settings more convenient.

@alexey-milovidov alexey-milovidov self-assigned this Jun 21, 2020
@alexey-milovidov
Copy link
Member

alexey-milovidov commented Jun 21, 2020

BTW, what is the setup when nested Distributed tables are required?
(the question from ClickHouse Meetup)

@azat azat changed the title optimize_skip_unused_shards no nested and non-deterministic functions Control nesting level for shards skipping and disallow non-deterministic functions Jun 21, 2020
- optimize_skip_unused_shards_nesting (allows control nesting level for
  shards skipping optimization)
- force_skip_optimize_shards_nesting (allows control nesting level for
  checking was shards skipped or not)
- deprecates force_skip_optimize_shards_no_nested
@azat
Copy link
Member Author

azat commented Jun 21, 2020

BTW, what is the setup when nested Distributed tables are required?

I was going to add some block about use case into the documentation (since there can be some interesting use cases and also caveats), but did not manage to find time for this, egh.

Anyway let me try to explain it briefly here:

Suppose you have lots of nodes (say 1000) in the cluster.
So what can you do in this case is divide your cluster of 1000 nodes, into smaller, for simplicity let's use 100 nodes per cluster, so 10 smaller cluster of 100 nodes. And the distribute data using some greatly-distributed-column (let's say user_id) between smaller clusters, and for this you will have two distributed tables, one that will distribute between smaller clusters (with sharding_key = consistent hash(user_id)), and distribute table in each cluster (let's use random distribution there for simplicity).
Then when you need to query your data you can use optimize_skip_unused_shards to avoid sending queries to all 1000 nodes, only to 100.

And one more advantage rise up when you need to expand the cluster (basically all above can be solved without nesting, but will require some trickery), since in this case you can just add new nodes into smaller cluster without any data re-sharding.

P.S. note that this a brief descriptions, that does not accounts some possible issues/aspects.

@alexey-milovidov
Copy link
Member

Yandex synchronization check (only for Yandex employees)

This is Ok.

@alexey-milovidov
Copy link
Member

/build/build_docker/../src/Interpreters/ClusterProxy/executeQuery.cpp:55:56: error: converting integer literal to bool, use bool literal instead [modernize-use-bool-literals,-warnings-as-errors]
            new_settings.optimize_skip_unused_shards = 0;
                                                       ^
                                                       false

@alexey-milovidov alexey-milovidov merged commit 18eb141 into ClickHouse:master Jun 24, 2020
@azat azat deleted the dist-optimize_skip_unused_shards-fixes branch June 24, 2020 18:23
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

pr-improvement Pull request with some product improvements

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants