Fix recursive flattening failure with combined primary and foreign key by Copilot · Pull Request #2312 · cynkra/dm

Copilot · 2025-08-01T05:13:31Z

Rewrite recursive flatten loop to track column equivalences instead of using regex pattern matching
Remove unused missing_right variable and flawed regex-based column remapping
Expand test to skip_if_src_not(c("df", "duckdb"))
Fix test comment about expected columns (join keys from RHS are dropped by dplyr)
All 1849 tests pass

💡 You can make Copilot smarter by setting up custom instructions, customizing its development environment and configuring Model Context Protocol (MCP) servers. Learn more Copilot coding agent tips in the docs.

…ign key Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>

Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>

Copilot

Pull request overview

Fixes a failure in dm_flatten_to_tbl(.recursive = TRUE) when an intermediate table column is simultaneously a primary key and a foreign key, by changing how recursive joins are executed and adding a regression test for the reported model shape.

Changes:

Reworked recursive flatten join execution from a reduce2() pipeline to an explicit loop with pre-join column checks/mapping.
Added a test reproducing the combined PK/FK recursive flatten failure (issue #2234).

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 5 comments.

File	Description
`R/flatten.R`	Alters recursive flatten join strategy and adds logic intended to map renamed/missing join columns.
`tests/testthat/test-flatten.R`	Adds regression coverage for recursive flattening when a column is both PK and FK.

Copilot · 2026-02-20T10:52:29Z

+  if (squash && length(order_df$name) > 0) {
+    # For recursive flattening, use an iterative approach to handle 
+    # cases where columns serve as both PK and FK
+    result <- tbl_impl(prep_dm, start)
+
+    for (i in seq_along(order_df$name)) {
+      table_to_join <- ordered_table_list[[i]]
+      join_by <- by[[i]]
+
+      # Additional safety check: verify join columns exist
+      result_cols <- colnames(result)
+      left_cols <- names(join_by)
+      right_cols <- colnames(table_to_join)
+      join_right_cols <- unname(join_by)
+
+      # Check if all required columns exist
+      missing_left <- setdiff(left_cols, result_cols)
+      missing_right <- setdiff(join_right_cols, right_cols)
+
+      if (length(missing_left) > 0) {
+        # This is where the error likely occurs
+        # For recursive joins, we might need to handle column name mapping
+        # Look for disambiguated versions of missing columns
+        mapped_join_by <- join_by
+        for (missing_col in missing_left) {
+          # Look for pattern like "original_name.table_suffix"
+          pattern_matches <- result_cols[grepl(paste0("^", missing_col, "\\."), result_cols)]
+          if (length(pattern_matches) > 0) {
+            # Replace the missing column name with the disambiguated version
+            # Use the first match if multiple exist
+            names(mapped_join_by)[names(mapped_join_by) == missing_col] <- pattern_matches[1]
+          }
+          # If no pattern matches found, keep the original name and let the join fail with a clear error
+        }
+        join_by <- mapped_join_by
+      }
+
+      result <- join(result, table_to_join, by = join_by)
+    }


The recursive-join fix only tries to remap missing LHS join columns to disambiguated variants like c.<suffix>. In the reported scenario (and in the new test), the column is missing because the previous join drops the RHS join key when by maps different names (e.g. b = c), so there is no c (or c.*) to remap. As a result, the subsequent join can still fail with the same "Join columns in x must be present" error. Consider instead (a) preserving RHS join keys (keep = TRUE) when they are needed for later joins and then dropping them once no longer needed, or (b) tracking equivalence from earlier by specs (e.g. c is equivalent to b after the b = c join) and rewriting later by clauses accordingly.

Replaced the regex-based approach with proper column equivalence tracking. After each join where lhs_col != rhs_col, we record rhs_col → lhs_col. Before each subsequent join, any missing LHS keys are remapped through this equivalence map. This correctly handles the reported case without needing keep = TRUE. (commit 8efe212)

Copilot · 2026-02-20T10:52:29Z

+          # Look for pattern like "original_name.table_suffix"
+          pattern_matches <- result_cols[grepl(paste0("^", missing_col, "\\."), result_cols)]


grepl(paste0("^", missing_col, "\\."), result_cols) treats missing_col as a regex. Column names can contain regex metacharacters (e.g. +, (, [), which can make the match incorrect and cause wrong join key mapping. Prefer a literal prefix check such as startsWith(result_cols, paste0(missing_col, ".")) (or otherwise escape missing_col for regex).

Suggested change

# Look for pattern like "original_name.table_suffix"

pattern_matches <- result_cols[grepl(paste0("^", missing_col, "\\."), result_cols)]

# Look for pattern like "original_name.table_suffix" using a literal prefix match

pattern_matches <- result_cols[startsWith(result_cols, paste0(missing_col, "."))]

The regex-based approach has been removed entirely. The fix now uses column equivalence tracking, which avoids any string pattern matching altogether. (commit 8efe212)

Copilot · 2026-02-20T10:52:30Z

+      # Check if all required columns exist
+      missing_left <- setdiff(left_cols, result_cols)
+      missing_right <- setdiff(join_right_cols, right_cols)
+


missing_right is computed but never used, which makes the new join loop harder to follow. Either remove it or use it to produce a clearer error when the RHS join columns are missing.

Removed missing_right along with the rest of the old approach. (commit 8efe212)

Copilot · 2026-02-20T10:52:30Z


+test_that("`dm_flatten_to_tbl(.recursive = TRUE)` works with combined primary and foreign key", {
+  # Test for issue where a column serves as both PK and FK
+  skip_if_src_not("df")  # Only test on data frame source for now


This test is currently restricted to the df backend, but the flattening code path is shared across sources and this file already runs flatten tests for duckdb too. Consider expanding this to skip_if_src_not(c("df", "duckdb")) (or whatever set is supported) so the regression is caught for other backends.

Suggested change

skip_if_src_not("df") # Only test on data frame source for now

skip_if_src_not(c("df", "duckdb"))

Updated to skip_if_src_not(c("df", "duckdb")). (commit 8efe212)

Copilot · 2026-02-20T10:52:30Z

+  # Expected result should have all columns from x, y, and z
+  expected <- tibble(
+    a = 1L,
+    b = 2L, 
+    d = 3L,
+    f = 4L
+  )


The comment says the expected result should have all columns from x, y, and z, but expected omits the key columns (c and e). This is a bit misleading—either update the comment to reflect that join keys are dropped in the flattened output, or include the missing columns in expected if they are supposed to be present.

Updated the comment to: "Join keys from the RHS (c and e) are dropped by dplyr; only non-key columns remain." (commit 8efe212)

Run: https://github.com/cynkra/dm/actions/runs/22221025863

krlmlr · 2026-02-20T11:17:30Z

@copilot apply changes based on the comments in this thread . Also fix now failing tests.

Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>

Copilot · 2026-02-20T11:26:56Z

Just as a heads up, I was blocked by some firewall rules while working on your feedback. Expand below for details.

Warning

Firewall rules blocked me from connecting to one or more addresses (expand for details)

I tried to connect to the following addresses, but was blocked by firewall rules:

captive.apple.com
- Triggering command: /opt/R/4.5.2/lib/R/bin/exec/R /opt/R/4.5.2/lib/R/bin/exec/R -q -e testthat::test_local(reporter~+~=~+~"check") (dns block)

If you need me to access, download, or install something from one of these locations, you can either:

Configure Actions setup steps to set up my environment, which run before the firewall is enabled
Add the appropriate URLs or hosts to the custom allowlist in this repository's Copilot coding agent settings (admins only)

krlmlr · 2026-02-21T11:04:31Z

Too big and too unclear.

Initial plan

03c4707

Copilot AI assigned Copilot and krlmlr Aug 1, 2025

Copilot started work on behalf of krlmlr August 1, 2025 05:13 View session

Copilot AI and others added 2 commits August 1, 2025 05:32

Implement fix for recursive flattening with combined primary and fore…

28117fb

…ign key Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>

Refine fix for recursive flattening and add comprehensive testing

3127a64

Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>

Copilot AI changed the title ~~[WIP] Recursive flattening failure with combined primary and foreign key~~ Fix recursive flattening failure with combined primary and foreign key Aug 1, 2025

Copilot AI requested a review from krlmlr August 1, 2025 05:36

Copilot finished work on behalf of krlmlr August 1, 2025 05:36

Merge branch 'main' into copilot/fix-2234

c9b2670

krlmlr requested a review from Copilot February 20, 2026 10:45

Copilot started reviewing on behalf of krlmlr February 20, 2026 10:45 View session

Copilot AI reviewed Feb 20, 2026

View reviewed changes

chore: Auto-update from GitHub Actions

9d6cf4a

Run: https://github.com/cynkra/dm/actions/runs/22221025863

Copilot started work on behalf of krlmlr February 20, 2026 11:17 View session

fix: use column equivalence tracking for recursive flatten joins

8efe212

Co-authored-by: krlmlr <1741643+krlmlr@users.noreply.github.com>

Copilot finished work on behalf of krlmlr February 20, 2026 11:27

Merge branch 'main' into copilot/fix-2234

ab949fa

krlmlr closed this Feb 21, 2026

		# Look for pattern like "original_name.table_suffix"
		pattern_matches <- result_cols[grepl(paste0("^", missing_col, "\\."), result_cols)]

	skip_if_src_not("df") # Only test on data frame source for now
	skip_if_src_not(c("df", "duckdb"))

Conversation

Copilot AI commented Aug 1, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

Copilot AI Feb 20, 2026

Choose a reason for hiding this comment

Uh oh!

krlmlr commented Feb 20, 2026

Uh oh!

Copilot AI commented Feb 20, 2026

I tried to connect to the following addresses, but was blocked by firewall rules:

Uh oh!

krlmlr commented Feb 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Milestone

Development

Uh oh!

3 participants

Copilot AI commented Aug 1, 2025 •

edited

Loading