Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

auto-join #64

Open
maxigit opened this issue Apr 27, 2014 · 2 comments
Open

auto-join #64

maxigit opened this issue Apr 27, 2014 · 2 comments

Comments

@maxigit
Copy link

maxigit commented Apr 27, 2014

This is not a issue, but a question about an eventual feature.

I've seen a system a couple of years ago which was doing "auto-joins" in a really elegant way. I've been thinking of implementing it for a while and I wonder if Esqueleto is the right tool to do so.

The main idea is, except for a few exception, joins between tables are always the same. So for a given "context", you can deduce the joins which are needed only from the fields which are involved in the request.

The concept of "context" is the key thing here. Instead of relying on foreign keys to do auto-join, a "context" defines a DAG of tables and for each query corresponds to a subset of this graph.

Example, I have some Users, Comments (made by a User) and Items (purchased by a User).

Classical queries would be :

  • All users having purchased this items (join User and Items)
  • All User having made a such comment (join User and Comments)
  • All Items purchased by a user having made a such comment (Join User And Items And Comments)

The 2 first joins are a subset of the last one.

In theory, I could solve my problem by writing a giant view like :

SELECT * 
FROM user
JOIN item ON (user.id  = item.user_id)
JOIN comment ON (user.id = comment_id)

And then write all of my queries using this giant view like :

SELECT user
FROM giant_view 
WHERE comment = 'hello'

SELECT user
FROM giant_view
WHERE item = 'map'

SELECT item
FROM giant_view
WHERE comment = 'hello'

This almost works but there is a lot of problems, the main one is results get spoiled by table which haven't been used. If a user has bought 3 items, then he will appears 3 times in the first query. That can be solved by using DISTINCT but is probably inefficient. Then if the user hasn't bought any thing. It won't appear at all even if he made an 'hello' comment. That also can be solved by doing a outer join etc ....
It seems that one static view doesn't work, but a dynamically tweaked on could work.

Also, we need different "contexts" (or different giant views) because we need an acyclic graph and the database schema can have cycle. A "context" is then maximum acyclic graph of the overall one.

Anyway, generating the sub view covering a set of field seems relatively easy, but to get this list of field, there are two options 👍

  • parsing the raw SQL itself
  • or having an EDSL.

Parsing the raw SQL can be messy and I'm not sure it can be done easily in a type-safe way.
If I'm using an EDSL, I'll probably better use an existing one : there is HaskellDB and Esqueleto.

HaskellDB seems dead (am I wrong ?) and not flexible enough (you have to specify the table)
Esqueleto is clever enough to figure out tables itself from the field and even cross join, so it seems to be the perfect candidate for this project.

So my question is, does anyone think this can be done relatively easily as an Esqueleto extension, or I am mistaken and the current structure of esqueleto won't allow me to do this.

@meteficha
Copy link
Member

It's hard for me to tell you if you're going to be able to create what you want on top of esqueleto because it's not clear to me exactly what needs to be done and how it could be done.

AFAIK, HaskellDB is not dead, it simply doesn't have as much users as persistent does. These two libraries have very different philosophies and it may be the case that you may be able to implement your idea on both of them. In the end, however, whether you prefer writing the rest of your query on HaskellDB or esqueleto is a different matter, so I think you should focus your effort based on what library you pretend to use.

I don't think parsing the SQL is going to be a good solution, especially because there are legitimate reasons to do a join and discard one of the tables from the results. You'd definetely need to distinguish both cases.

@meteficha
Copy link
Member

This is related to #67.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants