[Epic]: Google Summer of Code 2025 Correlated Subquery Support #16059

Open
5 tasks
irenjj opened this issue May 15, 2025 · 2 comments
Labels
enhancement New feature or request

Comments

@irenjj
Contributor

irenjj commented May 15, 2025

This ticket tracks progress on a 2025 Google Summer of Code (GSoC)-sponsored project on correlated subquery support.

Project Documentation

Is your feature request related to a problem or challenge?

DataFusion currently has limited support for correlated subqueries (for example, nested correlated subqueries with a depth greater than 1 fail; see #15558). This project aims to implement comprehensive support for correlated subqueries in Apache DataFusion by applying the framework from Hyper's 'Unnesting Arbitrary Queries'.
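For readers new to the topic, here is a minimal sketch of what "unnesting" (decorrelating) a subquery means, using toy in-memory tables. The table names, columns, and functions are hypothetical. This is not DataFusion's API and not the general algorithm from the paper. It shows only the simplest single-level rewrite: a correlated scalar subquery becomes an aggregate grouped by the correlation column, joined back to the outer query with a LEFT JOIN.

```python
# Hypothetical toy tables (illustrative only; not DataFusion data structures).
customers = [{"id": 1}, {"id": 2}, {"id": 3}]
orders = [
    {"cust_id": 1, "amount": 10},
    {"cust_id": 1, "amount": 25},
    {"cust_id": 3, "amount": 7},
]


def correlated(customers, orders):
    """SELECT c.id, (SELECT MAX(o.amount) FROM orders o WHERE o.cust_id = c.id)
    FROM customers c

    Nested-loop evaluation: the subquery is re-run for every outer row,
    because its predicate references the outer column c.id.
    """
    result = []
    for c in customers:
        amounts = [o["amount"] for o in orders if o["cust_id"] == c["id"]]
        # MAX over an empty set is NULL, modeled here as None.
        result.append((c["id"], max(amounts) if amounts else None))
    return result


def decorrelated(customers, orders):
    """The same query after unnesting: aggregate the inner side once,
    grouped by the correlation column, then LEFT JOIN the result back
    to the outer side.
    """
    max_by_cust = {}
    for o in orders:
        key = o["cust_id"]
        max_by_cust[key] = max(max_by_cust.get(key, o["amount"]), o["amount"])
    # dict.get returns None for customers with no orders, mirroring LEFT JOIN NULLs.
    return [(c["id"], max_by_cust.get(c["id"])) for c in customers]


print(correlated(customers, orders))    # [(1, 25), (2, None), (3, 7)]
print(decorrelated(customers, orders))  # [(1, 25), (2, None), (3, 7)]
```

MAX is used deliberately. With COUNT, the correlated form returns 0 for a customer with no orders, but a naive LEFT JOIN returns NULL (the classic "COUNT bug"). Handling cases like this, and subqueries correlated across more than one level, is what a general decorrelation framework has to address.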

Timeline:

Excerpt from the official GSoC timeline:

  • May 8 - June 1: Community Bonding Period: GSoC contributors get to know their mentors, read documentation, and get up to speed to begin working on their projects
  • June 2: Coding officially begins!
  • July 18: Midterm evaluation
  • August 25: Final week
  • September 8: Final evaluations due / wrap-up

Work

Epics tracking technical work:

Other potential future work

Related work:

Related documentation

Newer research that might be interesting

irenjj added the enhancement label on May 15, 2025
alamb changed the title from "[Epic]: Enhance Correlated Subquery Support" to "[Epic]: Google Summer of Code 2025 Correlated Subquery Support" on May 15, 2025
@alamb
Contributor

alamb commented May 15, 2025

@irenjj, could you attach your GSoC proposal to this issue so it is public? I think you did a very good job on it, and it would be great to make it more widely available.

@alamb
Contributor

alamb commented May 18, 2025

Let's use this ticket for high-level project planning and coordination, and use other tickets for specific technical work.

I updated the description on this ticket with more detail and the GSoC timeline, along with some other related work I know about. If anyone else has suggested reading, it would be great to add it to the list.

@irenjj and I had a discussion today and covered the following items:

  1. As described above, @irenjj plans to focus first on supporting multi-level subqueries (Nested correlated subquery error with a depth exceeding 1 #15558), hopefully in collaboration with @suibianwanwank and @duongcongtoai
  2. Decorrelating multiple levels of subqueries (for example, arbitrary depths of correlation) will likely require a more general subquery decorrelation framework (which we will discuss in more detail on related tickets)
  3. One of the major lessons of working with DataFusion and open source is learning how to work with the broader community (among other things, you never know who is reading an issue or who might be interested in an update)
  4. We also hope to write a blog post about correlated subquery support as the final report (Blog about DataFusion correlated subquery support #16084)
