Skip to content

fix: preserve subquery structure when unparsing SubqueryAlias over Ag…#21099

Open
yonatan-sevenai wants to merge 1 commit intoapache:mainfrom
yonatan-sevenai:main
Open

fix: preserve subquery structure when unparsing SubqueryAlias over Ag…#21099
yonatan-sevenai wants to merge 1 commit intoapache:mainfrom
yonatan-sevenai:main

Conversation

@yonatan-sevenai
Copy link

@yonatan-sevenai yonatan-sevenai commented Mar 22, 2026

When the SQL unparser encountered a SubqueryAlias node whose direct child was an Aggregate (or other clause-building plan like Window, Sort, Limit, Union), it would flatten the subquery into a simple table alias, losing the aggregate entirely.

For example, a plan representing:
SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m

would unparse to:
SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would call subquery_alias_inner_query_and_columns (which only unwraps Projection children) and unparse_table_scan_pushdown (which only handles TableScan/SubqueryAlias/Projection). When both returned nothing useful for an Aggregate child, the code recursed directly into the Aggregate, merging its GROUP BY into the outer SELECT instead of wrapping it in a derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union), emit it as a derived subquery via self.derive() with the alias always attached, rather than falling through to the recursive path that would flatten it.

Which issue does this PR close?

Rationale for this change

The SQL unparser silently drops subquery structure when a SubqueryAlias node directly wraps an Aggregate (or Window, Sort, Limit, Union). For example, a plan representing

SELECT ... FROM j1 JOIN (SELECT max(id) FROM j2) AS b ...

unparses to

SELECT ... FROM j1 JOIN j2 AS b ...

losing the aggregate entirely. This produces semantically incorrect SQL.

What changes are included in this PR?

In the SubqueryAlias handler within select_to_sql_recursively (datafusion/sql/src/unparser/plan.rs):

  • Added an early check: if the SubqueryAlias's direct child is a plan type that builds its own SELECT clauses (Aggregate, Window, Sort, Limit, Union) and cannot be reduced to a table scan, emit it as a derived subquery (SELECT ...) AS alias via self.derive() instead of recursing into the child and flattening it.
  • Added a helper requires_derived_subquery() that identifies plan types requiring this treatment.

Are these changes tested?

Yes. A new test test_unparse_manual_join_with_subquery_aggregate is added that constructs a SubqueryAlias > Aggregate plan (without an intermediate Projection) and asserts the unparsed SQL preserves the MAX() aggregate function call. This test fails without the fix. All current unparser tests succeed without modification

Are there any user-facing changes?

No.

…gregate

When the SQL unparser encountered a SubqueryAlias node whose direct
child was an Aggregate (or other clause-building plan like Window, Sort,
Limit, Union), it would flatten the subquery into a simple table alias,
losing the aggregate entirely.

For example, a plan representing:
  SELECT j1.col FROM j1 JOIN (SELECT max(id) AS m FROM j2) AS b ON j1.id = b.m

would unparse to:
  SELECT j1.col FROM j1 INNER JOIN j2 AS b ON j1.id = b.m

dropping the MAX aggregate and the subquery.

Root cause: the SubqueryAlias handler in select_to_sql_recursively would
call subquery_alias_inner_query_and_columns (which only unwraps
Projection children) and unparse_table_scan_pushdown (which only handles
TableScan/SubqueryAlias/Projection). When both returned nothing useful
for an Aggregate child, the code recursed directly into the Aggregate,
merging its GROUP BY into the outer SELECT instead of wrapping it in a
derived subquery.

The fix adds an early check: if the SubqueryAlias's direct child is a
plan type that builds its own SELECT clauses (Aggregate, Window, Sort,
Limit, Union), emit it as a derived subquery via self.derive() with the
alias always attached, rather than falling through to the recursive
path that would flatten it.
@github-actions github-actions bot added the sql SQL Planner label Mar 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

sql SQL Planner

Projects

None yet

Development

Successfully merging this pull request may close these issues.

SQL Unparser loses subquery structure when SubqueryAlias directly wraps an Aggregate (PR to follow)

1 participant