@fmccawley could you please review again? I addressed your comments and tier-3 pipeline is green now.
@ahegyi could you please review this MR as clickhouse and database maintainer?
done. I've added a test for guest user.
done. I've added a test for guest user.
@rob.hunt yes analytics type will be available on both levels. Project and Group (and maybe Organizations and Instance one day
There isn't any check that enforces this
that's correct.
Could we add a guard to the mount_aggregation_engine method to enforce it?
Unfortunately, that's not really possible due to the various flexible ways authorization can be defined. There are cases where the parent object handles authorization, so the child doesn't need to. There might also be cases where the aggregation engine should be accessible without authorization. Given these considerations, I think updating the documentation is the best approach we can take at this moment. If you'd like, we can open a follow-up issue to discuss this further and avoid blocking this MR.
No, there isn't. The field was marked as "Not for production use" and "Experimental" because it's a work-in-progress endpoint. There is a more stable alternative with the older architecture available under the aiUsageData endpoint, which is also experimental but known to be in use by clients. So I believe we are safe here.
@fmccawley it's just to emphasize for other developers that both approaches are possible. authorize: :read_pro_ai_analytics is simpler while class call is more flexible. Both approaches are standard to our GraphQL structures.
I noticed the rspec:undercoverage test is showing a warning https://gitlab.com/gitlab-org/gitlab/-/jobs/13532575386, this will hard fail when the tier 3 pipeline runs.
It should not, because tier-1 doesn't run full spec suite so coverage from request specs is not included. Anyway Adam already approved the MR so I just triggered new pipeline to see if there's any undercoverage issues.
That will require standardized way of deduplication.
We definitely should. And I believe it should be fully hidden inside AEs if possible.
Pavel Shutsin (fe29b9fc) at 18 Mar 14:23
remove disclamer
@rob.hunt then we can update the label and keep it Experimental but not "Not for production use yet".
This MR adds Siphon configured tables and adds the Siphon-specific configuration. We also do testing so the Siphon configuration matches the PostgreSQL and ClickHouse schemas.
The goal of these configurations is to act as a SSoT for tables which are going to be Siphon-replicated on our GitLab environments (GDK, .com, Dedicated, Self-Managed, etc.)
| Config | Description |
|---|---|
table |
The PostgreSQL source table name |
database |
The database where the table resides (e.g., main), this tells us which Siphon producer to use for this table |
ignored_columns |
List of columns to exclude from replication (sensitive/encrypted data) |
replication_targets |
Array of replication target configurations (for now we only register ClickHouse as the target) |
replication_targets[].name |
Name of the replication target (e.g., clickhouse_main) |
replication_targets[].target |
Target ClickHouse table name where the data will be replicated |
replication_targets[].priority |
Replication priority (numeric value), optional. During initial setup we want to replicate specific tables first (organizations, projects, namespaces) |
replication_targets[].dedup_by |
Columns used for deduplication when upserting data. This is an important config for traversal_path de-normalized tables. (https://docs.gitlab.com/development/database/clickhouse/clickhouse_table_design_with_siphon/#siphon-configuration-and-the-chicken-and-egg-problem) |
Config rules have been implemented in the RSpec matcher:
dedup_by), ensure that the PostgreSQL primary keys are suffix of the ClickHouse primary keys.Evaluate this MR against the MR acceptance checklist. It helps you analyze changes to reduce risks in quality, performance, reliability, security, and maintainability.
@tkuah lets open a follow-up for this if you don't mind. I'm not 100% sure we should. For this specific case I believe it will look fine, but for generalized case dimensions can have arguments, so not everything will fit into Parameterized tests. I'd love to have common pattern established between all aggregation_engines/*_spec.rb I hope it will emerge later on when we have more engines => more usecases.