Skip to content

Commit b1b8299

Browse files
KyleAMathewsclaude
andauthored
fix: validate against duplicate collection aliases in subqueries (#719)
* docs: investigation of same collection alias bug in subqueries Investigated Discord bug report where using the same collection with the same alias in both a subquery and main query causes: 1. Empty results when subquery has joins 2. Aggregation cross-leaking (wrong aggregated values) ROOT CAUSE: - Parent and subquery share the same input streams from allInputs when they use the same alias - IVM stream sharing causes interference between query contexts - Caching compounds the issue by reusing compilations with wrong bindings ATTEMPTED FIXES (all unsuccessful): - Input filtering: Doesn't solve the core issue of shared streams - Fresh caching: Breaks existing caching tests and doesn't fix the bug - Both approaches together: Still fails CONCLUSION: This requires a fundamental architectural change to support independent input streams per query context. Current workaround: wrap query in an extra from() layer to avoid alias conflicts. See INVESTIGATION_SAME_COLLECTION_BUG.md for full details. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix: validate against duplicate collection aliases in subqueries Implements validation to prevent the Discord bug where using the same alias for a collection in both parent query and subquery causes empty results or incorrect aggregation values. The issue occurs when both parent and subquery use the same alias to reference a collection directly (e.g., both use `vote: votesCollection`). This causes them to share the same input stream, leading to interference. Solution: Add validation that throws a clear DuplicateAliasInSubqueryError when this pattern is detected. Implementation: - New error: DuplicateAliasInSubqueryError with clear message - collectDirectCollectionAliases(): Collects only direct CollectionRef aliases - validateSubqueryAliases(): Checks for conflicts before compiling subqueries - Only validates DIRECT collection references, not QueryRef (subquery outputs) This allows legitimate patterns like: const sub = q.from({ issue: collection }) q.from({ issue: sub }) // OK - different scopes But prevents problematic patterns like: const sub = q.from({ x: coll }).join({ vote: voteColl }, ...) q.from({ vote: voteColl }).join({ x: sub }, ...) // Error! Users should rename the alias in either parent or subquery to fix. Fixes Discord bug reported by JustTheSyme. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * docs: clean up investigation doc and add changeset - Cleaned up INVESTIGATION_SAME_COLLECTION_BUG.md to focus on solution - Added PR_UPDATE.md with suggested title and body for PR #719 - Added changeset for the alias validation fix 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * chore: clean up temporary documentation files Removed investigation and PR update docs that were used during development. 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * fix: run duplicate alias validation before query optimization Moved the validation for duplicate collection aliases in subqueries to run before query optimization rather than during compilation. This prevents false positives from optimizer-generated subqueries while still catching user-written duplicate aliases. The optimizer can legitimately create internal subqueries that reuse aliases (e.g., for predicate pushdown), but users should not be allowed to write queries with conflicting aliases between parent and subquery. Changes: - Added validateQueryStructure() in compiler/index.ts that recursively validates the entire query tree before optimization - Removed validation logic from processFrom() and processJoinSource() in joins.ts - Fixed TypeScript type errors in discord-alias-bug.test.ts All tests pass (1403 tests, 0 type errors). 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> * refactor: improve alias validation test naming and descriptions Renamed test file from discord-alias-bug.test.ts to validate-aliases.test.ts and updated test descriptions to focus on expected behavior rather than referencing the Discord bug report. Changes: - Renamed: discord-alias-bug.test.ts → validate-aliases.test.ts - Updated describe block: "Alias validation in subqueries" - Updated test names to describe expected behavior: - "should throw DuplicateAliasInSubqueryError when subquery reuses a parent query collection alias" - "should allow subqueries when all collection aliases are unique" - Improved inline comments to be more descriptive - Simplified assertion comments Addresses feedback from @samwillis 🤖 Generated with [Claude Code](https://claude.com/claude-code) Co-Authored-By: Claude <[email protected]> --------- Co-authored-by: Claude <[email protected]>
1 parent f416231 commit b1b8299

File tree

4 files changed

+212
-0
lines changed

4 files changed

+212
-0
lines changed
Lines changed: 5 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,5 @@
1+
---
2+
"@tanstack/db": patch
3+
---
4+
5+
Validate against duplicate collection aliases in subqueries. Prevents a bug where using the same alias for a collection in both parent and subquery causes empty results or incorrect aggregation values. Now throws a clear `DuplicateAliasInSubqueryError` when this pattern is detected, guiding users to rename the conflicting alias.

packages/db/src/errors.ts

Lines changed: 16 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -404,6 +404,22 @@ export class CollectionInputNotFoundError extends QueryCompilationError {
404404
}
405405
}
406406

407+
/**
408+
* Error thrown when a subquery uses the same alias as its parent query.
409+
* This causes issues because parent and subquery would share the same input streams,
410+
* leading to empty results or incorrect data (aggregation cross-leaking).
411+
*/
412+
export class DuplicateAliasInSubqueryError extends QueryCompilationError {
413+
constructor(alias: string, parentAliases: Array<string>) {
414+
super(
415+
`Subquery uses alias "${alias}" which is already used in the parent query. ` +
416+
`Each alias must be unique across parent and subquery contexts. ` +
417+
`Parent query aliases: ${parentAliases.join(`, `)}. ` +
418+
`Please rename "${alias}" in either the parent query or subquery to avoid conflicts.`
419+
)
420+
}
421+
}
422+
407423
export class UnsupportedFromTypeError extends QueryCompilationError {
408424
constructor(type: string) {
409425
super(`Unsupported FROM type: ${type}`)

packages/db/src/query/compiler/index.ts

Lines changed: 74 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -3,6 +3,7 @@ import { optimizeQuery } from "../optimizer.js"
33
import {
44
CollectionInputNotFoundError,
55
DistinctRequiresSelectError,
6+
DuplicateAliasInSubqueryError,
67
HavingRequiresGroupByError,
78
LimitOffsetRequireOrderByError,
89
UnsupportedFromTypeError,
@@ -99,6 +100,11 @@ export function compileQuery(
99100
return cachedResult
100101
}
101102

103+
// Validate the raw query BEFORE optimization to check user's original structure.
104+
// This must happen before optimization because the optimizer may create internal
105+
// subqueries (e.g., for predicate pushdown) that reuse aliases, which is fine.
106+
validateQueryStructure(rawQuery)
107+
102108
// Optimize the query before compilation
103109
const { optimizedQuery: query, sourceWhereClauses } = optimizeQuery(rawQuery)
104110

@@ -375,6 +381,74 @@ export function compileQuery(
375381
return compilationResult
376382
}
377383

384+
/**
385+
* Collects aliases used for DIRECT collection references (not subqueries).
386+
* Used to validate that subqueries don't reuse parent query collection aliases.
387+
* Only direct CollectionRef aliases matter - QueryRef aliases don't cause conflicts.
388+
*/
389+
function collectDirectCollectionAliases(query: QueryIR): Set<string> {
390+
const aliases = new Set<string>()
391+
392+
// Collect FROM alias only if it's a direct collection reference
393+
if (query.from.type === `collectionRef`) {
394+
aliases.add(query.from.alias)
395+
}
396+
397+
// Collect JOIN aliases only for direct collection references
398+
if (query.join) {
399+
for (const joinClause of query.join) {
400+
if (joinClause.from.type === `collectionRef`) {
401+
aliases.add(joinClause.from.alias)
402+
}
403+
}
404+
}
405+
406+
return aliases
407+
}
408+
409+
/**
410+
* Validates the structure of a query and its subqueries.
411+
* Checks that subqueries don't reuse collection aliases from parent queries.
412+
* This must be called on the RAW query before optimization.
413+
*/
414+
function validateQueryStructure(
415+
query: QueryIR,
416+
parentCollectionAliases: Set<string> = new Set()
417+
): void {
418+
// Collect direct collection aliases from this query level
419+
const currentLevelAliases = collectDirectCollectionAliases(query)
420+
421+
// Check if any current alias conflicts with parent aliases
422+
for (const alias of currentLevelAliases) {
423+
if (parentCollectionAliases.has(alias)) {
424+
throw new DuplicateAliasInSubqueryError(
425+
alias,
426+
Array.from(parentCollectionAliases)
427+
)
428+
}
429+
}
430+
431+
// Combine parent and current aliases for checking nested subqueries
432+
const combinedAliases = new Set([
433+
...parentCollectionAliases,
434+
...currentLevelAliases,
435+
])
436+
437+
// Recursively validate FROM subquery
438+
if (query.from.type === `queryRef`) {
439+
validateQueryStructure(query.from.query, combinedAliases)
440+
}
441+
442+
// Recursively validate JOIN subqueries
443+
if (query.join) {
444+
for (const joinClause of query.join) {
445+
if (joinClause.from.type === `queryRef`) {
446+
validateQueryStructure(joinClause.from.query, combinedAliases)
447+
}
448+
}
449+
}
450+
}
451+
378452
/**
379453
* Processes the FROM clause, handling direct collection references and subqueries.
380454
* Populates `aliasToCollectionId` and `aliasRemapping` for per-alias subscription tracking.
Lines changed: 117 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,117 @@
1+
import { beforeEach, describe, expect, test } from "vitest"
2+
import { createLiveQueryCollection, eq } from "../../src/query/index.js"
3+
import { createCollection } from "../../src/collection/index.js"
4+
import { mockSyncCollectionOptions } from "../utils.js"
5+
6+
type Lock = { _id: number; name: string }
7+
type Vote = { _id: number; lockId: number; percent: number }
8+
9+
const locks: Array<Lock> = [
10+
{ _id: 1, name: `Lock A` },
11+
{ _id: 2, name: `Lock B` },
12+
]
13+
14+
const votes: Array<Vote> = [
15+
{ _id: 1, lockId: 1, percent: 10 },
16+
{ _id: 2, lockId: 1, percent: 20 },
17+
{ _id: 3, lockId: 2, percent: 30 },
18+
]
19+
20+
function createTestCollections() {
21+
return {
22+
locksCollection: createCollection(
23+
mockSyncCollectionOptions<Lock>({
24+
id: `locks`,
25+
getKey: (lock) => lock._id,
26+
initialData: locks,
27+
autoIndex: `eager`,
28+
})
29+
),
30+
votesCollection: createCollection(
31+
mockSyncCollectionOptions<Vote>({
32+
id: `votes`,
33+
getKey: (vote) => vote._id,
34+
initialData: votes,
35+
autoIndex: `eager`,
36+
})
37+
),
38+
}
39+
}
40+
41+
describe(`Alias validation in subqueries`, () => {
42+
let locksCollection: ReturnType<
43+
typeof createTestCollections
44+
>[`locksCollection`]
45+
let votesCollection: ReturnType<
46+
typeof createTestCollections
47+
>[`votesCollection`]
48+
49+
beforeEach(() => {
50+
const collections = createTestCollections()
51+
locksCollection = collections.locksCollection
52+
votesCollection = collections.votesCollection
53+
})
54+
55+
test(`should throw DuplicateAliasInSubqueryError when subquery reuses a parent query collection alias`, () => {
56+
expect(() => {
57+
createLiveQueryCollection({
58+
startSync: true,
59+
query: (q) => {
60+
const locksAgg = q
61+
.from({ lock: locksCollection })
62+
.join({ vote: votesCollection }, ({ lock, vote }) =>
63+
eq(lock._id, vote.lockId)
64+
)
65+
.select(({ lock }) => ({
66+
_id: lock._id,
67+
lockName: lock.name,
68+
}))
69+
70+
return q
71+
.from({ vote: votesCollection }) // Reuses "vote" alias from subquery
72+
.join({ lock: locksAgg }, ({ vote, lock }) =>
73+
eq(lock._id, vote.lockId)
74+
)
75+
.select(({ vote, lock }) => ({
76+
voteId: vote._id,
77+
lockName: lock!.lockName,
78+
}))
79+
},
80+
})
81+
}).toThrow(/Subquery uses alias "vote"/)
82+
})
83+
84+
test(`should allow subqueries when all collection aliases are unique`, () => {
85+
const query = createLiveQueryCollection({
86+
startSync: true,
87+
query: (q) => {
88+
const locksAgg = q
89+
.from({ lock: locksCollection })
90+
.join(
91+
{ v: votesCollection }, // Uses unique alias "v" instead of "vote"
92+
({ lock, v }) => eq(lock._id, v.lockId)
93+
)
94+
.select(({ lock }) => ({
95+
_id: lock._id,
96+
lockName: lock.name,
97+
}))
98+
99+
return q
100+
.from({ vote: votesCollection })
101+
.join({ lock: locksAgg }, ({ vote, lock }) =>
102+
eq(lock._id, vote.lockId)
103+
)
104+
.select(({ vote, lock }) => ({
105+
voteId: vote._id,
106+
lockName: lock!.lockName,
107+
}))
108+
},
109+
})
110+
111+
const results = query.toArray
112+
113+
// Should successfully execute and return results
114+
expect(results.length).toBeGreaterThan(0)
115+
expect(results.every((r) => r.lockName)).toBe(true)
116+
})
117+
})

0 commit comments

Comments
 (0)