PromQL: rework converter to SQL#95673
Conversation
0fe69a4 to
2aec388
Compare
There was a problem hiding this comment.
Pull request overview
This PR refactors the PromQL to SQL converter implementation by replacing the monolithic PrometheusQueryToSQLConverter class with a modular architecture. The changes introduce a new PrometheusQueryToSQL namespace containing separate components for different conversion tasks, making the code more maintainable and testable.
Changes:
- Replaced
PrometheusQueryToSQLConverterwith modularPrometheusQueryToSQL::Converter - Split converter logic into focused components (selectors, functions, evaluation time, etc.)
- Introduced
PrometheusQueryEvaluationSettingsto replace scattered evaluation parameters - Added helper functions for converting time series types to AST
Reviewed changes
Copilot reviewed 57 out of 57 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
| src/Storages/TimeSeries/PrometheusQueryToSQL/Converter.{h,cpp} | New main converter class with cleaner interface |
| src/Storages/TimeSeries/PrometheusQueryToSQL/*.{h,cpp} | Modular components for specific conversion tasks |
| src/Storages/TimeSeries/PrometheusQueryEvaluationSettings.h | Centralized evaluation settings structure |
| src/Storages/TimeSeries/timeSeriesTypesToAST.{h,cpp} | Helper functions for AST conversion |
| src/Storages/StoragePrometheusQuery.{h,cpp} | Updated to use new converter and configuration |
| src/Storages/StorageTimeSeriesSelector.{h,cpp} | Updated to use new configuration structure |
| src/TableFunctions/TableFunctionPrometheusQuery.{h,cpp} | Simplified using new converter |
| src/Parsers/Prometheus/parseTimeSeriesTypes.{h,cpp} | New parsing utilities for timestamps and durations |
| src/DataTypes/DataTypesDecimal.h | Added tryGetDecimalScale helper function |
| src/Core/DecimalFunctions.{h,cpp} | Added getCurrentDateTime64 utility |
src/Storages/TimeSeries/PrometheusQueryToSQL/getResultColumns.cpp
Outdated
Show resolved
Hide resolved
06efb6a to
97a2869
Compare
| namespace DB::PrometheusQueryToSQL | ||
| { | ||
|
|
||
| /// Converts a prometheus query to SQL. |
There was a problem hiding this comment.
The converter has been significantly reworked in this PR. Instead of one big file PrometheusQueryToSQL.cpp now there is a folder named PrometheusQueryToSQL which contains components of this conversion. The conversion itself has been also changed a lot to generate internal steps in a more suitable manner for implementing operators.
9268b6c to
f78996a
Compare
f78996a to
f601fec
Compare
f601fec to
00fe6fe
Compare
| /// If `store_method` is VECTOR_GRID then the SELECT query outputs two columns `group` (UInt64), `values` (Array(Nullable(scalar_data_type))). | ||
| /// If `store_method` is RAW_DATA then the SELECT query outputs three columns `group` (UInt64), `timestamp` (timestamp_data_type), `value` (scalar_data_type). | ||
| /// If `store_method` is CONST_SCALAR or CONST_STRING then the SELECT query is not used. | ||
| ASTPtr select_query; |
There was a problem hiding this comment.
The other main change of this PR is that now instant vectors are represented as this select_query returning two columns: group (UInt64 meaning tags), and values (Array of nullable floats), and the timestamps aren't in the output of the select_query because they are fixed (in the context of evaluation) and stored instead in the fields SQLQueryPiece::start_time, SQLQueryPiece::end_time and step.
The values column is now convenient for applying operators and functions - we can just apply them to the corresponding values of such arrays of nullable floats, keeping nulls as is, and we get the result.
Before this PR the result of an instant vector was represented as two columns: group (UInt64 meaning tags), and time_series (Array of tuples (timestamp, value)), which was quite inconvenient for operator because it was too tricky to implement binary operators.
| struct ConverterContext; | ||
|
|
||
| /// Represents how data is stored in a SQLQueryPiece. | ||
| enum class StoreMethod |
There was a problem hiding this comment.
Also this StoreMethod was introduced to keep the information how data is represented in SQL at each step while converting a prometheus query to SQL. Before this PR the code had to check often if the result contains specific columns which was quite error-prone and unclear. Now we're checking this StoreMethod so it's easier to see if we miss anything.
| } | ||
|
|
||
| chassert(argument_index == args.size()); | ||
| config = StoragePrometheusQuery::getConfiguration(args, context, over_range); |
There was a problem hiding this comment.
Here I moved parsing arguments of the table function to the storage class - as it's handled in some other our table functions and made the structure StoragePrometheusQuery::Configuration to help passing these arguments between the table function and the storage.
|
Ready for review |
nikitamikhaylov
left a comment
There was a problem hiding this comment.
Almost everything is clear, but I really got confused with all the different naming regarding time: start, end, step, evaluation_time, evaluation_range, default_resolution, window, staleness. Can you please help me find out what is what?
| auto id_data_type = data_table_metadata->columns.get(TimeSeriesColumnNames::ID).type; | ||
| auto timestamp_data_type = data_table_metadata->columns.get(TimeSeriesColumnNames::Timestamp).type; | ||
| auto scalar_data_type = data_table_metadata->columns.get(TimeSeriesColumnNames::Value).type; |
There was a problem hiding this comment.
For future: we need to think if this variability is worth it. I think we can force the structure for all the tables.
There was a problem hiding this comment.
I think it's useful to allow a limited set of data types here. scalar_data_type can be either Float64 or Float32, and timestamp_data_type can be DateTime64 with any scale, or just number of seconds.
src/Storages/TimeSeries/PrometheusQueryToSQL/buildSelectQuery.h
Outdated
Show resolved
Hide resolved
6db069e to
1c28451
Compare
1c28451 to
d6fa815
Compare
|
cfb29a3
Changelog category (leave one):
Changelog entry (a user-readable short description of the changes that goes into CHANGELOG.md):
PromQL: rework converter to SQL
Part of #89356