Order-Equivalent Over Clauses
10 Mar 2026
Full Oracle DB support since at least 11gR1. Full Db2 (LUW) support since at least 9.7. Full PostgreSQL support since at least 9.0. Full MySQL support since at least 8.0.11. Full SQLite support since at least 3.25.0. Full MariaDB support since 10.4. Full H2 support since 1.4.200. Full BigQuery support since at least 2020-03-28. Full DuckDB support since at least 1.0.0. Faulty SQL Server support since at least 2012.
  1. ⚡Order-equivalent over clauses can produce different row orders
  2. Apparently, the framing must also be identical for over clauses to behave as order-equivalent

At the heart of this article, there is a simple question: Will the following query always give the same numbering in both columns?

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , ROW_NUMBER() OVER (ORDER BY x)
  FROM ...

I did not hide any relevant information. The two column definitions are absolutely identical. The problem becomes more obvious if I add example data that highlights the crucial point.

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , ROW_NUMBER() OVER (ORDER BY x)
     , t.*
  FROM (VALUES (1, 10)
             , (1, 20)
       ) t(x, y)

Two rows have the same value in the column x, which is used in the order by clause. And that brings us to the core of the question: Is the order of peer rows (equally ranked rows) the same across different over clauses? For example, if you sort chronologically in two over clauses, will the rows that happened at the same time have the same order in both over clauses? The answer is, luckily: yes, the SQL standard requires that. There are nonetheless systems that do not provide that guarantee, as you can see in the chart above.

The SQL standard defines a term for that purpose: order-equivalent. Two over clauses are order-equivalent if the partition by clauses (if present) and the order by clauses refer to the same columns and the effective modifiers in the order by clause ([asc|desc], nulls [first|last], collate) are the same. In that case, the standard requires the row order produced by the two over clauses to be the same—even among peer rows.


If you like this page, you might also like …

… to subscribe to my mailing lists, get free stickers, buy my book or get training.


Besides the requirement of the standard, there are two more reasons to treat peer rows equally among over clauses whenever possible: maintaining the principle of least astonishment and reducing the response time. Even if not required by the standard in any way, system vendors should generally seek to minimize the number of required sort operations. After all, it would be wasted time to sort the rows twice in the example above.

Order-equivalent over clauses are very common in practice. Thus, I’m happy the standard takes care of them. It is less fortunate that there are similar cases that do not fall under order-equivalence as defined by the standard, yet where equal treatment of peers would be very desirable. Take the next query as an example.

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , ROW_NUMBER() OVER (ORDER BY x DESC)
  FROM (VALUES (1, 10)
             , (1, 20)
       ) t(x, y)

The query adds two columns: one with ascending numbering, the other with descending numbering. This query might return surprising results as the peers might be treated differently by each over clause. The following is a valid result for this query.

… OVER (ORDER BY x)    … OVER (ORDER BY x DESC)
          1                        1
          2                        2

The reasonable expectation that the numbering always runs in opposite directions is not fulfilled. More formally, one might expect that the total of these columns is the same for all rows of the result: when one column goes up, the other needs to go down by the same amount. As these over clauses are not order-equivalent—they differ in the order by modifiers—the standard does not require SQL engines to treat peers the same. Yet one of the tested systems has always produced the desirable result of exactly opposite numbering.

Frankly speaking, I think this desirable behavior is motivated more by performance optimization than by the aim to avoid surprising results. Why should the DBMS re-sort the result if all that needs to be done is to process it in the opposite direction—which coincidentally takes care of equal treatment of peers? This performance aspect is my biggest hope that the behavior becomes more common in the not-too-distant future.


Think About It

Do you see other cases where equal treatment of peers would be desirable even though the over clauses are not order-equivalent?


Until then, the question arises of how to get a correct numbering in opposite directions—even in the presence of peers. The simplest answer is, of course, to avoid peers by extending the order by clause so that no two rows can have the same values in the order keys. In practice, this means including the required primary key columns in the order by clause.
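
For example, assuming a hypothetical primary key column id, extending both order by clauses makes the numbering deterministic (a sketch; the table and column names are made up):

SELECT ROW_NUMBER() OVER (ORDER BY x, id)
     , ROW_NUMBER() OVER (ORDER BY x DESC, id DESC)
  FROM t

As no two rows can be peers anymore, the two columns are guaranteed to number in exactly opposite directions.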


Think About It

As opposed to primary key columns, it is not generally sufficient to include unique key columns. Why not?


When extending the order by clause is not possible or desired, we must use a query that relies on order-equivalent over clauses only. The next query does that. Even though the over clauses are not identical, they are still order-equivalent, as the different framing (rows between) is irrelevant for order-equivalence as defined by the SQL standard. The example uses a forward-looking frame to count the remaining rows—which gives the desired, opposite descending numbering.

SELECT ROW_NUMBER() OVER (ORDER BY x)
     , COUNT(*)     OVER (ORDER BY x
                          ROWS BETWEEN
                               CURRENT ROW
                           AND UNBOUNDED FOLLOWING
                         )
  FROM 

This solution requires a proper implementation of the order-equivalence requirements of the standard—which is not always the case. A solution that does not require conforming treatment of peers in order-equivalent over clauses is also possible: just use mathematics.

SELECT rn, cnt - rn + 1
  FROM (SELECT ROW_NUMBER() OVER (ORDER BY x) AS rn
             , COUNT(*)     OVER (          ) AS cnt
          FROM 
) t

The key to success is to use only one window function (here row_number) that treats peers differently. The count function is used to get the total number of rows, which does not require any ordering at all. The outer query can then build the opposite numbering by subtracting the ascending numbering from the total row count—taking care of the off-by-one error ;)


Think About It

Why is it wrong to put the same order by clause into the over clause of the count function?


This article discusses just one example of the problems that can arise out of non-determinism. It is generally a good habit to avoid non-determinism whenever possible—often by including primary key columns in order by clauses, by avoiding row frames, and so on; the standard has a list for that. Yet I believe system vendors could take even more care to improve usability by avoiding surprising results, not just to the extent mandated by the SQL standard. In particular, when doing so would give even faster responses.

Of course, these topics are also covered in my analysis and aggregation training.

“Order-Equivalent Over Clauses” by Markus Winand was originally published at Modern SQL.

SQL Comments, Please!
10 Feb 2026
SQL /* comments */ can be nested

In a program, a comment is a part of the source code that is ignored by the system. Comments are generally used for two purposes: (1) provide background information for future readers of that code, including ourselves; (2) deactivate some code without deleting it yet—a.k.a. commenting out. SQL supports comments too, of course.

While SQL comments are generally ignored by the engine, there are two side effects worth mentioning. The first is that standard SQL does not really ignore comments; they act as token separators, like whitespace. The second side effect is not standard SQL: some systems look for special instructions inside comments—the so-called hints. Hints are, however, out of scope here, but I wrote about them on Use The Index, Luke! before.

Standard SQL offers two flavors of comments: one starts with two dashes (--) and goes to the end of the line. The other starts with slash-asterisk (/*) and ends with asterisk-slash (*/). That might remind you of comments in other programming languages. Unlike many other programming languages, however, SQL allows the second form to be nested. This is particularly useful when commenting out code that contains /*…*/ comments.
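
To illustrate with a small sketch: with standard-conforming nesting, the whole block below up to the last */ is one comment. On systems that do not support nesting (see the chart), the outer comment would already terminate at the first */, leaving the rest as a syntax error.

/* commented out:
SELECT a  /* pick column a */
  FROM t
the outer comment only ends here: */
SELECT 1 AS still_active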

  1. Needs a white space after -- to avoid ambiguity with -(-<integer literal>)
  2. Without nesting
  3. Except immediately after select: select/**/*
  4. The # sign is the bitwise exclusive or (xor) operator

Of course, some vendors could not resist adding other forms of comments known from other programming languages. Obviously, these might be treated totally differently by other systems.

Standard Features
Implementation-defined elements
  • IV075, “The end-of-line indicator (newline)”

Other Fresh Articles

As I don’t want to spam you too often, I’m not sending every new article on modern-sql.com individually. These are the other fresh articles since the last post.

Further, there are some notable updates:

  • Oracle 23.26.1 supports the filter clause

  • MySQL 9.6 removes the md5, sha, and sha1 functions for security reasons. You can get them back, though:

    INSTALL COMPONENT 'file://component_classic_hashing'

Finally, I’m on mastodon.

“SQL Comments, Please!” by Markus Winand was originally published at Modern SQL.

Without Overlaps Constraints
13 Jan 2026
Full Db2 (LUW) support since at least 10.5. Full MariaDB support since at least 10.5. Full PostgreSQL support since at least 18.
  1. Functionality available with exclusion constraints
  2. Without keyword for: PERIOD FOR BUSINESS_TIME (…)
  3. Using the btree_gist extension and a range: BUSINESS_TIME tsrange GENERATED ALWAYS AS (tsrange(start_ts, end_ts)) STORED

SQL supports temporal constraints that prevent rows with overlapping time ranges. These constraints are typically used to prevent double bookings in a reservation system.

Such a primary key or unique constraint uses the keywords without overlaps on the period that combines the begin and end of the time frame.

CREATE TABLE t (
  id INTEGER NOT NULL,
  
  start_ts TIMESTAMP(6) NOT NULL,
  end_ts   TIMESTAMP(6) NOT NULL,
  PERIOD FOR BUSINESS_TIME (start_ts, end_ts),

  PRIMARY KEY (id, BUSINESS_TIME WITHOUT OVERLAPS)
)

Now consider these insert statements.

INSERT INTO t (id, start_ts, end_ts)
VALUES (1, TIMESTAMP'2018-05-16 19:12:50', TIMESTAMP'9999-12-31 23:59:59')
INSERT INTO t (id, start_ts, end_ts)
VALUES (1,  TIMESTAMP'2025-12-01 20:00:00', TIMESTAMP'2026-01-13 20:00:00')

Given the without overlaps constraint, the second insert is rejected, as its period falls within the range of the first row.

  • Standard Features

    • T180, “System-versioned tables”

    • T181, “Application-time period tables”

    • T200, “Trigger DDL”

    • F521, “Assertions”

    • F671, “Subqueries in CHECK constraints”

Normative References

Without overlaps constraints are defined in ISO/IEC 9075-2:2023 as part of the optional feature T181, “Application-time period tables”.

“Without Overlaps Constraints” by Markus Winand was originally published at Modern SQL.

Adieu Apache Derby, Welcome DuckDB
16 Dec 2025

Apache Derby was once a popular system to mock an SQL engine during unit tests—a use case that has vanished given how easily “the real SQL systems” are available during development nowadays. Thus, the following note might be sad, but not surprising: “On 2025-10-10, the Derby developers voted to retire the project into a read-only state. Derby development and bug-fixing have ended. No further releases will be published”. Consequently, I’ve removed Apache Derby from the charts on modern-sql.com.

As every end is a new beginning, modern-sql.com now covers DuckDB instead. DuckDB is an in-process system that aims at analytic workloads on commodity hardware such as your laptop computer. DuckDB is often praised for its ease of use and its capability to use all CPU cores of the system it is running on. That is quite often scalability enough—in particular when different users have local copies of the database on their computers. If that sounds reasonable to you, you might want to give DuckDB a try.

Just because I can, I’m also putting the chart about the merge statement here (introduced with DuckDB 1.4, as you can see below).

Full PostgreSQL support since 15. Full BigQuery support since 2023-04-21. Partial Oracle DB support since at least 11gR1. Partial Db2 (LUW) support since at least 9.7. Partial SQL Server support since at least 2008R2. Partial H2 support since 1.4.198. Partial DuckDB support since 1.4.0.
  1. Not all operations
  2. Ambiguities not always reported
  3. Not all syntax variants
  4. ⚡Applies multiple rules

In other news: modern-sql.com has got a dark mode, and the first training dates for 2026 are announced (including US shift).

“Adieu Apache Derby, Welcome DuckDB” by Markus Winand was originally published at Modern SQL.

Group by All: Generate group by from select
04 Nov 2025
Non-standard BigQuery support since at least 2020-03-28. Non-standard Oracle DB support since at least 23.9. Non-standard DuckDB support since at least 1.0.0.
  1. Cannot mix aggregates with non-aggregated columns: a + count(*)
  2. Removes the group by all clause if it leads to an error that does not happen without group by clause

Some systems allow the special group by all syntax as a shorthand for grouping on all select list items that do not contain an aggregate function.

SELECT customers.id, customers.name, count(orders.id)
  FROM customers
  LEFT JOIN orders ON customers.id = orders.customer_id
 GROUP BY ALL

This is equivalent to listing customers.id and customers.name in the group by clause explicitly.

The two shorthands provided by the standard (detection of functional dependencies and any_value) require a little more typing but provide a safety net against accidental changes of the grouping key: if the example above is extended to also select the orders.placed column, group by all silently changes the grouping key—even if that was not intended.
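
For comparison, the standard’s any_value shorthand applied to the example above might look like this (a sketch using the same hypothetical tables):

SELECT customers.id
     , ANY_VALUE(customers.name)
     , count(orders.id)
  FROM customers
  LEFT JOIN orders ON customers.id = orders.customer_id
 GROUP BY customers.id

Here, accidentally selecting orders.placed without aggregation raises an error rather than silently changing the grouping key—that is the safety net.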

The keyword all in this syntax must not be mistaken for the <set quantifier> all, which is also allowed in group by.



Standard Features

Normative References

The group by all syntax as shown above is not covered in ISO/IEC 9075-2:2023. The keyword all must not be mistaken for the <set quantifier>, which is also allowed in group by (T434, “GROUP BY DISTINCT”).


Probably in the Next Standard

This feature will probably be in the next standard. However, as the standard doesn’t allow expressions in the group by clause, group by all is non-conforming if the items added to group by are not just column names.

As most systems do support expressions in group by, there is still a standardization gap that might lead to different results in different systems in a few corner cases, for example here: select a + COUNT(*) OVER().


“Group by All” by Markus Winand was originally published at Modern SQL.

… Is Json [Array|Object|Scalar]: Test for Valid JSON, Distinguish between Arrays, Objects and Scalars
27 Mar 2024
IS JSON
  1. ⚡Does not recognize JSON scalars as valid JSON
  2. No type constraints: … is json [array|object|scalar]
  3. Accepts unquoted object keys: {a: 1}

Is json is a predicate, similar to is null, that tests something for valid JSON content. The test can also distinguish between arrays ([1,42]), objects ({"a":42}) and scalar values (strings, numbers, true, false, null).

WHERE c IS JSON OBJECT

The example picks rows for which the column c contains a valid JSON structure of which the topmost element is a JSON object.

As is json never fails, it can be used to test the contents of string values for well-formed JSON. If the input is the SQL null value, the result is also the SQL null (unknown) value.

The type constraints array, object and scalar make it useful even for values of the type JSON. If no type constraint is specified, value is implied—which returns true for all valid JSON data.

IS JSON type constraints
  1. Accepts unquoted object keys: {a: 1}
  2. Not for character strings

Is json does not allow checking for the specific scalar types string, numeric, boolean or the null value (the SQL/JSON path item method .type() supports that).

Negation can be done with is not json. Note that is not json object returns true for non-JSON data as well as for valid JSON data that does not have an object at the top level.
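
A small sketch (with made-up literals) shows both cases at once:

SELECT v
     , v IS NOT JSON OBJECT
  FROM (VALUES ('[1,2]')      -- valid JSON, but an array
             , ('not json')   -- not valid JSON at all
       ) t(v)

Both rows yield true in the second column: the first because the top-level element is an array, the second because the value is not JSON in the first place.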

Formats and Dialects

SQL uses the JSON format defined by IETF RFC 8259. This JSON format allows a single JSON object to contain the very same key multiple times:

{"a": 42,
 "a": 1}

Therefore, the is json predicate accepts such objects as valid JSON. For environments that require a more rigid validation, the standard provides the with unique [key] option that causes is json to treat objects with duplicate keys as invalid JSON.

IS JSON type constraints

Further, the SQL standard defines a syntax to specify another data format, such as BSON or AVRO, by postfixing the tested expression with the format clause. In the absence of a format clause, format json is implied, which refers to the RFC 8259 format.

IS JSON Format options
  1. Accepts unquoted object keys: {a: 1}
  2. Fails on non-json input

The SQL standard defines only one format: json. Other formats and their names are implementation-defined (IV180).

Alternatives

IS JSON Alternatives
Optional Features
Implementation-Defined Items
  • IV180, “The data format specified by <implementation-defined JSON representation option>”

Tutorials

Normative References

The <JSON predicate> is [not] json [value|array|object|scalar] is defined in ISO/IEC 9075-2:2023 as part of the optional feature T821, “Basic SQL/JSON query operators”.

“… Is Json [Array|Object|Scalar]” by Markus Winand was originally published at Modern SQL.

The ANY_VALUE(…) Aggregate Function: New in SQL:2023 — and a better feature from 1999
27 Feb 2024
Full Oracle DB support since 19c. Full H2 support since 2.2.220. Full PostgreSQL support since 16. Full DuckDB support since at least 1.0.0. Faulty MySQL support since 5.7. Faulty BigQuery support since 2025-06-01.
  1. ⚡Might return null even if there are non-null values
  2. Not an aggregate function
  3. Without filter clause
  4. Not as window function (over clause)

The aggregate function any_value takes a non-null value from a group of rows.

SELECT product_id
     , ANY_VALUE(order_id) AS example_order_id
  FROM order_lines
 GROUP BY product_id

This query lists ordered product_ids, along with an example order_id for each product.

As with every aggregate function, it accepts the filter and over clauses, if generally supported by the system.

Support for T626: ANY_VALUE
  1. ⚡Might return null even if there are non-null values
  2. ⚡Might return null even if there are non-null values • ⚡Not an aggregate function

Note that any_value is non-deterministic. It might pick another value every time—i.e. the query results are not reproducible. This is also true, and most surprising, in the context of window functions:

SELECT ANY_VALUE(x) OVER()
  FROM …

The example uses any_value as a window function “over everything”. In this case, every other aggregate function delivers the same result for every row, deterministically. But the SQL standard does not require this for any_value: it might take a different value for every row, even though the over clause always covers the same rows. That said, none of my tests has ever observed such behavior.

Min and max happen to fulfill all requirements the standard puts upon any_value: wherever you could use any_value, you can also use min or max. The main difference is that any_value could be implemented more efficiently, as it doesn’t need to search for a specific value—it may just take the first non-null value that comes along.

Finally, I’d like to address another proclaimed use-case of any_value: Coping with the missing implementation of feature T301, “Functional dependencies”. Consider this example:

CREATE TABLE customers (
 id   BIGINT PRIMARY KEY,
 name VARCHAR,
 […]
)
CREATE TABLE orders (
 customer BIGINT REFERENCES customers(id),
 placed   TIMESTAMP,
 […]
)
SELECT customers.*, MAX(placed)
  FROM customers
  LEFT JOIN orders
         ON customers.id = orders.customer
 GROUP BY customers.id

The query lists all customers and when they placed their last order. Note that it selects all columns from the customers table—thus also the name column. However, the group by has the customers.id column only. Is that even valid?



It is. Logically, as well as according to the SQL standard. The crucial point is that customer names are functionally dependent on the customers table’s primary key (id). For every group of rows for which customers.id is the same, customers.name must be the same as well. The SQL standard takes great care to document known functional dependencies—but leaves it up to the implementation to consider them or not. T301, “Functional dependencies” is an optional feature.

Full PostgreSQL support since 9.1. Full MySQL support since 5.7. Faulty SQLite support since at least 3.5.7. Faulty MariaDB support since at least 5.1. Faulty H2 support since at least 1.4.192.
  1. ⚡Doesn’t fail if there is no functional dependency (per default)

As you can see above, this aspect of SQL is widely neglected by the vendors. Some systems raise an error, which means that they don’t implement this optional feature. Other systems (⚡) just ignore the problem and allow selecting any column—even if there is no functional dependency. Their users cannot notice that there might be something wrong with the query. These systems effectively imply any_value.

While the query above is totally fine, it might fail (lack of T301) or accidentally work correctly (implied any_value). Only two out of nine tested systems give the desired answer for the right reason. These are not exactly the odds I’d like to take.

SQL-92 offered two options to rewrite this query so that it always works correctly: (1) use min or max on every functionally dependent column that is not mentioned in group by; (2) extend the group by clause by these columns. The new any_value function of SQL:2023 is just a third variant of option (1). Frankly speaking, I find none of them very user friendly. Even worse: they do not protect us from mistakes such as accidentally putting a column from the orders table into the group by clause.
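
For the example above, option (2) might look as follows (a sketch; any further customers columns hidden behind […] would have to be listed as well):

SELECT customers.*, MAX(placed)
  FROM customers
  LEFT JOIN orders
         ON customers.id = orders.customer
 GROUP BY customers.id, customers.name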

In 1999, the real solution was added to the SQL standard: the (optional) feature T301, “Functional dependencies”. Thus, I’d like to close this post with an appeal to the vendors: please go the extra mile and implement T301 rather than T626 (any_value). This is what users expect: no effort, maximum security. Even the performance impact should be low, considering it mostly affects the prepare phase. Big kudos to the MySQL team for implementing it in 5.7!

The only downer is that functional dependencies require proper constraints. Systems that lack them, such as data warehouses, cannot benefit. Here it might be desirable to have a function that validates that there is only one distinct non-null value. Maybe unique_value(… [null|error] on error)? Or constraints that are not enforced but trusted, like in Db2?

“The ANY_VALUE(…) Aggregate Function” by Markus Winand was originally published at Modern SQL.

The Curious Case of the Aggregation Query
05 Dec 2023

An interesting query made the rounds recently. The related blog post by Justin Jaffray compares four queries, one of them catching everybody’s attention. I invite you to have a look for yourself. First, the table definitions:

CREATE TABLE aa (a INT);
INSERT INTO aa VALUES (1), (2), (3);
CREATE TABLE xx (x INT);
INSERT INTO xx VALUES (10), (20), (30);

There is no trap in there: two tables containing three rows each. Just note that column a belongs to the table aa and x to xx. Now the mysterious query:

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa

What does it return?

Even if you look carefully, chances are that you’ll get it wrong—just like I did.



Later in the article we will see that SQL implementations don’t even agree on the correct result. Some return the values 3, 6 and 9 in a table of three rows. While this makes sense, it is not what the SQL standard says.

The standard-conforming result of the query is a single row with the value 6, and this is what most tested systems return. Surprised? You are not alone. To clarify how it works I will provide some quotes from the current SQL standard (SQL:2023) as well as older releases to help us track down the origin of this behavior.

The bottom line is that aggregate functions (actually set function specifications) do not necessarily belong to the select in which they are written. In the example above, the expression sum(a) does not belong to the inner query, but to the outer one. The query actually means the following:

SELECT (SELECT sum_a
          FROM xx
         LIMIT 1
       )
  FROM (SELECT sum(a) AS sum_a
          FROM aa
       ) nested

The subquery in the from clause does the aggregation first. The select clause subquery just refers to the other query’s result. Now it is also clear why there is a limit clause: Without this clause, the subquery returns all three rows from xx so that the query fails—conforming systems do exactly that.

The striking question is of course: What the heck is going on here? This question is actually meant to be two questions in one: (1) What is the wording in the standard that leads to this behavior? (2) Why was it ever specified in this way?

The first question can be answered by looking into the current standard. Specifically, into ISO/IEC 9075-2:2023 §6.9 SR 6:

The aggregation query of a <set function specification> SFS is determined as follows.

Case:

  a) If SFS has no aggregated column reference […]

  b) Otherwise, the innermost qualifying query of the aggregated column references of SFS is the aggregation query of SFS.

The first sentence introduces the term aggregation query. This term is then used to specify which rows an aggregate function processes—and also where to add group by () if needed. Case “a” does not apply to our example, as there is an aggregated column reference in the sum function, namely a. Case “b” is what we’re here for: it defines the innermost qualifying query as the aggregation query. Qualifying queries are, in turn, those in which the from/join clause introduces a table that is referred to by an argument of the aggregate function. Let’s have a look at the original query again.

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa

The only column reference in the sum function refers to a column of the table aa introduced by the outer from clause. Thus, the sum(a) expression belongs to the outer select. Now that the outer query is an aggregation query that has no explicit group by clause, group by () is effectively added so that the final result has a single row.

On the other hand, there is no aggregate function referring to the inner query. This means that the inner query is not an aggregation query and there is no group by () implied! The inner query, which is supposed to return no more than one row, basically returns all three rows of the xx table. It is just the limit clause preventing it.

Now that we know how the standard specifies this behavior, let’s look at which products follow the specification.

Full PostgreSQL support since at least 8.3. Full MySQL support since 5.1. Full Db2 (LUW) support since at least 9.7. Full MariaDB support since at least 5.1. Full SQLite support since 3.8.3. Full DuckDB support since at least 1.0.0. Partial SQL Server support since at least 2008R2. Faulty Oracle DB support since at least 11gR1. Faulty H2 support since at least 1.4.191. Faulty BigQuery support since at least 2020-03-28.
  1. Aggregation query is always the one that contains the set function
  2. Outer reference must be the only argument (doesn’t support F441)

Only half of the tested products implement the aggregation query in a standard-conforming way. That makes me wonder if the standard should introduce an implementation-defined item here. However, it is also notable in the chart that some systems changed their behavior in the past. I checked older releases of the standard and found that the current wording was apparently introduced in 2003—at least that is what MySQL bug 27333 suggests. The bug report complains about MySQL’s behavior and refers to “the SQL2003 draft: (section 6.9)”. The report also paraphrases the above-quoted specification of the aggregation query.

SQL:1999 was worded entirely differently in this regard. The term aggregation query did not exist. I couldn’t even find any other sections that would result in the behavior of the current standard. Going even further back in time, to the well-known draft of SQL-92 that can easily be found online, there was yet another wording. The absence of an explicit group by clause did not imply group by () if needed. Instead, there was a general rule—not a syntax rule—that took care of the implicit grouping if necessary. ISO/IEC 9075:1992 §7.9 GR 1ai says:

If the <select list> contains a <set function specification> that contains a reference to a column of T [or …], then [… and] the result of the <query specification> is a table consisting of 1 row.

Emphasis and omissions are mine. Note that the outer select still contains the sum function because the term contains is defined recursively. For your convenience, the query again.

SELECT (SELECT sum(a) FROM xx LIMIT 1) FROM aa

As the outer select list contains a <set function specification>, namely sum, the result is a single row. So it seems that this behavior has been there since 1992 at least. As I don’t have access to any even older editions, my archaeological excursion ends here.

Oh, just one more thing… why is it that SQL Server is so picky about outer references in aggregates? Changing the query to sum(a + x) yields error message 8124: “Multiple columns are specified in an aggregated expression containing an outer reference. If an expression being aggregated contains an outer reference, then that outer reference must be the only column referenced in the expression.”

After further review of the current standard, it turns out that it does have this limitation. However, implementing the optional feature F441 lifts it. Quoting ISO/IEC 9075-2:2023 §10.9 CR 8:

Without Feature F441, “Extended set function support”, conforming SQL language shall not contain a <value expression> simply contained in a <general set function> that contains a column reference that is an outer reference where the <value expression> is not a column reference.

The standard’s wording is even more restrictive than SQL Server’s error message: in the presence of an outer reference, it rules out everything else—not just other column references. Out of curiosity I checked SQL:1992 again and found something interesting in ISO/IEC 9075:1992 §6.5 SR 4:

If the <value expression> contains a <column reference> that is an outer reference, then that outer reference shall be the only <column reference> contained in the <value expression>.

In SQL:1992, there was precisely the same limitation as expressed by SQL Server’s current error message.

Now that the relation of SQL Server’s restriction to the standard is clarified as well, only one ultimate question remains: Why the heck was this behavior ever introduced?

“The Curious Case of the Aggregation Query” by Markus Winand was originally published at Modern SQL.

SEARCH DEPTH/BREADTH FIRST
https://modern-sql.com/caniuse/search_(recursion)
18 Sep 2023 00:00:00 GMT
Search depth/breadth first: Compute an Extra Column to Order the Result of Recursive Queries
Full Oracle DB support since 11gR2. Full PostgreSQL support since 14.
  1. Minor deviation: contents of <sequence column>

The search clause of with recursive creates a column that allows sorting the result in depth-first or breadth-first order.

The search clause follows immediately after a recursive with element—even before a cycle clause. The primary order is specified by the depth first or breadth first keywords. The mandatory by clause that follows takes a list of expressions to define the per-level order. Finally, the set clause expects the name of the newly created column, the so-called sequence column.

WITH RECURSIVE
     categories (category_id, parent, name)
  AS ( SELECT *
         FROM product_categories
        WHERE parent IS NULL
    UNION ALL
       SELECT pc.*
         FROM categories AS parent
         JOIN product_categories pc
           ON pc.parent = parent.category_id
     )
SEARCH DEPTH FIRST BY name SET seq_col
SELECT category_id, name
  FROM categories
 ORDER BY seq_col

Note that the search clause does not define the order in which rows are returned by the recursive query.0 Instead, it provides a column that can be used in an order by clause.

The example above traverses a hierarchy, such as the one shown below, from the root node to the leaf nodes. The order by clause in the last line uses seq_col generated by the search clause to actually establish the specified order. Note that seq_col is really just a column name. You can also put it into the select clause to see what it contains—or just select *. Consequently, the order by clause can also use other sort criteria and asc/desc and nulls first/last modifiers.

In either case, depth-first or breadth-first, the rows of the initial leg of the recursive query get the lowest values in the sequence column. That means ascending ordering—as used in the example above—returns them first. Specifically, that is the root node: All Goodies.

The next item is, also in either case, the first child of the second level. Note that this is Brands, not Types, due to the by name clause that defines the per-level order.


If you like this page, you might also like …

… to subscribe my mailing lists, get free stickers, buy my book or get training.


The order of the remaining nodes differs between depth-first and breadth-first. Depth-first provides the deeper nodes first. These are Modern SQL and Use The Index, Luke, in this order. Only then does the sibling Types follow, along with its children.

Breadth-first, on the other hand, first returns the Types node before any node of the third level. Then all nodes of the third level follow in the order specified in the by clause. Specifically: Books, Modern SQL, Mugs & co, Stickers, Use The Index, Luke. Note that siblings are not kept together. The children of Brands and Types are just intermixed due to the ordering by name.
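Where the search clause is unavailable, both orderings can be emulated by hand. The following sketch uses SQLite, which has no search clause; the sample data is my own, modeled on the hierarchy discussed above. It accumulates a path string for the depth-first order and a level counter for the breadth-first order. Note that string paths only approximate the standard’s array-based sort keys—names that are prefixes of one another can sort differently.

```python
# Emulating SEARCH DEPTH FIRST / BREADTH FIRST ... BY name in SQLite,
# which does not support the search clause. Sample data is hypothetical,
# modeled on the hierarchy discussed in the article.
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE product_categories (category_id INTEGER, parent INTEGER, name TEXT);
    INSERT INTO product_categories VALUES
        (1, NULL, 'All Goodies'),
        (2, 1, 'Brands'),     (3, 1, 'Types'),
        (4, 2, 'Modern SQL'), (5, 2, 'Use The Index, Luke'),
        (6, 3, 'Books'),      (7, 3, 'Mugs & co'), (8, 3, 'Stickers');
""")

sql = """
    WITH RECURSIVE
         categories (category_id, name, lvl, path)
      AS ( SELECT category_id, name, 0, name
             FROM product_categories
            WHERE parent IS NULL
        UNION ALL
           SELECT pc.category_id, pc.name, parent.lvl + 1,
                  parent.path || '/' || pc.name
             FROM categories AS parent
             JOIN product_categories pc
               ON pc.parent = parent.category_id
         )
    SELECT name FROM categories ORDER BY {order}
"""

# path approximates the depth-first key; (lvl, name) is the breadth-first key
depth_first   = [name for (name,) in con.execute(sql.format(order="path"))]
breadth_first = [name for (name,) in con.execute(sql.format(order="lvl, name"))]

print(depth_first)
print(breadth_first)
```

The breadth-first result reproduces the intermixing described above: the children of Brands and Types are sorted together on the third level because the key is (level, name), not the parent.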

Notable Extensions

Standard SQL does not support the asc/desc or nulls first/last modifiers in the by clause of search.1

SEARCH DEPTH FIRST BY name DESC SET seq_col

That means it is not possible to just reverse the per-level order in standard SQL. As this is a pretty reasonable requirement, some systems do support it nonetheless.

Contents of the Sequence Column

The SQL standard defines the search clause as a syntactic transformation that puts a row-value (breadth-first) or an array of row-values (depth-first) into the sequence column. Sorting based on these row-values yields the specified order.2 A consequence of this is that the just-mentioned modifiers are not supported, as they cannot be put into a row-value.3 From that perspective it seems to be no coincidence that systems that support these modifiers do not populate the sequence column in the way the standard describes, but with a simple ordinal number.

Normative References

The search clause is defined in ISO/IEC 9075-2:2023 §7.18 as part of optional feature T131, “Recursive query”.

“Search depth/breadth first” by Markus Winand was originally published at Modern SQL.

GREATEST(), LEAST()
https://modern-sql.com/caniuse/greatest-least
29 Jun 2023 00:00:00 GMT
Greatest(), least(): Largest or Smallest Value of Arguments
Full MySQL support since at least 5.0. Full Oracle DB support since at least 11gR1. Full Db2 (LUW) support since at least 9.7. Full MariaDB support since at least 5.1. Full BigQuery support since at least 2023-06-22. Full H2 support since 2.2.222. Partial PostgreSQL support since at least 8.3. Partial SQL Server support since 2022. Partial DuckDB support since at least 1.0.0.
  1. ⚡Non-conforming null handling
  2. Alternative: scalar min and max functions

The SQL functions greatest and least take one or more arguments and return the largest or smallest value if none of the arguments is null.

SELECT GREATEST(1, 2, 3)
  FROM …

Greatest and least are scalar functions, not aggregate functions. Unlike min and max, they operate on one row at a time.

Null Treatment

The SQL standard requires greatest and least to return null if one argument is null. This is different from aggregate functions, in particular min and max. While min and max ignore null values and return the smallest or largest value from the remaining set, greatest and least respect null values insofar as the presence of a single null value means that the highest or lowest value cannot be determined without doubt.

As you can see in the chart, the standard SQL null treatment is not always applied. This is no surprise because greatest and least were often supported decades before they were added to the SQL standard in 2023.

Personally, I think that standardizing greatest and least with a strict respect nulls semantic was not the best decision. The SQL standard is not followed very strictly anyway. Demanding new behavior that contradicts decades-old implementations0 makes it even worse, because existing implementations will probably not adapt to the new standard. As both ways to treat null have been in the wild for a very long time, neither way should have become standard. It should just have been another implementation-defined behavior.1




Going forward I hope that users will have the choice. For example, respect nulls and ignore nulls specifications could be added similar to those for some window functions.

LEAST(…) [ RESPECT NULLS | IGNORE NULLS ]

The filter clause, known from aggregates, would be more generic but it doesn’t solve the problem of long-standing implementations that ignore null values.

Row Values

In standard SQL, a value that compares to any other value as unknown cannot be the result of greatest or least.

Applying this to the following examples means that ⓵ and ⓶ return the first value because there is no other value that could be even smaller.

⓵ LEAST( ROW(1   ,    1),  ROW(1,    2) )
⓶ LEAST( ROW(1   , null),  ROW(2, null) )
⓷ LEAST( ROW(1   , null),  ROW(1, null) )
⓸ LEAST( ROW(null,    1),  ROW(9,    2) )

If you’re wondering about ⓶, keep in mind that range comparisons on row values are fully determined as soon as a prefix gives a definite answer. This is not the case for example ⓷; even though both values are not distinct from each other, they are still not equal as per the normal comparison rules of SQL. A less-than-or-equals (<=) comparison of these two values yields unknown. The existence of an even smaller value cannot be ruled out for either of the values, so least returns the null value. The same applies to example ⓸. As the first element of one value is null, all comparisons to that value return unknown. Consequently, it is impossible that there is any value that is undoubtedly smaller-or-equal to all other values. Least returns null. These are just consequences of the general behavior of null in SQL: null represents uncertainty, and uncertainty propagates.
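These rules can be made concrete in a few lines of code. The following is my own sketch of the standard’s least semantics on row values, not any product’s implementation: a three-valued less-than-or-equals on row values, from which least picks the value that is definitely less-than-or-equal to every other argument and otherwise yields null.

```python
# Sketch of standard LEAST semantics on row values with three-valued logic.
# None models SQL's null; le() returns True, False, or None (unknown).

def le(r1, r2):
    """Row-value <= comparison with SQL-style unknown propagation."""
    for a, b in zip(r1, r2):
        if a is None or b is None:
            return None        # any comparison against null is unknown
        if a != b:
            return a < b       # the first differing element decides
    return True                # all elements are equal

def least(*rows):
    # The result is a value that is definitely <= every other argument;
    # if no value qualifies, the result is null (None).
    for i, v in enumerate(rows):
        if all(le(v, w) is True for j, w in enumerate(rows) if j != i):
            return v
    return None

print(least((1, 1),    (1, 2)))     # ⓵ → (1, 1)
print(least((1, None), (2, None)))  # ⓶ → (1, None): the prefix 1 < 2 decides
print(least((1, None), (1, None)))  # ⓷ → None: <= yields unknown
print(least((None, 1), (9, 2)))     # ⓸ → None: null first element
```

The sketch reproduces all four examples: ⓶ succeeds because the comparison is decided before the null elements are reached, while ⓷ and ⓸ fall through to null because no candidate is definitely smallest.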

Again, it is no surprise that old systems don’t conform to the new standard. Those few tested systems that do support row values in greatest and least implement a sort-based logic—i.e., they treat null elements as the largest or smallest possible values. Whether it is a very large or a very small value varies among systems, very much like the default null ordering in the order by clause. Luckily, they do it the same way for greatest/least as they do for order by.

  1. Configurable: set default_null_order = [NULLS_FIRST | NULLS_LAST | NULLS_FIRST_ON_ASC_LAST_ON_DESC | NULLS_LAST_ON_ASC_FIRST_ON_DESC]

Limits

The SQL standard requires greatest and least to have one or more arguments. In practice, some systems do not support the (nearly useless) one-argument case. The upper bound is typically beyond what my tests cover.

Alternatives

Some SQL dialects have scalar min and max functions that take multiple arguments and return the lowest or highest value.

  1. Respects null values: one null value makes the result null
  2. Non-conforming handling of collations
  3. Respects null values: one null value makes the result null • Non-conforming handling of collations
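SQLite is one example of this alternative: according to the SQLite documentation, its scalar max and min accept two or more arguments and return null as soon as any argument is null, matching the respect nulls semantics of the standard’s greatest and least. A quick check:

```python
# SQLite's multi-argument scalar max()/min() as a stand-in for GREATEST/LEAST.
# Per the SQLite docs, one null argument makes the result null.
import sqlite3

con = sqlite3.connect(":memory:")

greatest  = con.execute("SELECT max(1, 2, 3)").fetchone()[0]
with_null = con.execute("SELECT min(1, NULL, 3)").fetchone()[0]

print(greatest)   # 3
print(with_null)  # None: a single null argument makes the result null
```

Note that with a single argument, max and min are SQLite’s aggregate functions, not the scalar variants, so the one-argument case behaves differently there.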

Normative References

The greatest and least functions form the optional feature T054, “GREATEST and LEAST”.

“Greatest(), least()” by Markus Winand was originally published at Modern SQL.
