Type level parsing of raw sql #17
I was looking for a PS postgres library and this seems like it could be a really good fit! Is the library still maintained? If so, would you be open to using VTAs instead of proxies in some places?

Comments
Hey there, yes, I still use this for https://mero.chat so it is maintained. The library is older than visible type applications so yeah I am open to anything that could speed up compilation/simplify the API. |
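For readers who haven't used them, here is a minimal sketch of the proxy style versus the VTA style under discussion (PureScript 0.15.10+). The function names are invented for illustration; they are not the library's actual API.

```purescript
module VtaSketch where

import Type.Proxy (Proxy(..))

-- Proxy style: the caller passes a term-level Proxy carrying the Symbol.
columnP :: forall (name :: Symbol). Proxy name -> Proxy name
columnP p = p

-- VTA style: the Symbol is applied directly as a type argument,
-- enabled by the visible `@name` binder in the forall.
columnV :: forall @name. Proxy name
columnV = Proxy

idViaProxy :: Proxy "id"
idViaProxy = columnP (Proxy :: Proxy "id")

idViaVta :: Proxy "id"
idViaVta = columnV @"id"
```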
Cool, I'll make a PR this week 👍 |
Nice! |
Started taking a look this morning 🌞 It's tricky because the proxies are used to build up lots of stuff. For example, there's no clean VTA way to express what the proxies currently build up. The VTAs would need to apply to even larger types, meaning you'd need increasingly unwieldy type annotations.
The endgame is that you wind up basically reconstructing an SQL parser at the typelevel, which is a stone's throw away from parsing a typelevel string containing SQL. The latter could be pretty fun - that's how, for example, I made pursx. Anyway, I digress. While VTAs are cool, they create a yak that just gets hairier and hairier the more you try to shave it. What do you think? |
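For reference, the primitive that makes "parsing a typelevel string" possible in PureScript is `Prim.Symbol.Cons`, which splits a `Symbol` into its first character and the rest entirely at compile time. A minimal, self-contained sketch (not taken from any library):

```purescript
module SymbolPeelSketch where

import Prim.Symbol as Symbol
import Type.Proxy (Proxy(..))

-- The `symbol -> head tail` functional dependency of Prim.Symbol.Cons lets the
-- compiler compute the head and tail of a known Symbol.
firstChar
  :: forall (sym :: Symbol) (head :: Symbol) (tail :: Symbol)
   . Symbol.Cons head tail sym
  => Proxy sym
  -> Proxy head
firstChar _ = Proxy

-- Typechecks because the compiler solves Cons "S" "ELECT * FROM t" "SELECT * FROM t".
demo :: Proxy "S"
demo = firstChar (Proxy :: Proxy "SELECT * FROM t")
```

A full typelevel SQL parser is this step iterated by recursive instances until the whole Symbol is consumed.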
So your plan was to use type applications instead of proxies for the selected columns? While currently every column must have a name, the only place that actually requires aliases is when joining. This is a gigantic hack, so maybe it could work if you moved all the logic to the last-minute checks in `ToQuery`.
I remember trying this at first and it being way too slow to compile (and harder to compose). Maybe now it is better? I had a re-evaluation of the current approach on my todo list for the semi-near future. Breaking changes are fine as long as it is still type safe and (somewhat) composable. |
The compiler has improved a lot; pursx compiles reasonably fast. The nice thing about raw SQL these days is that models like GPT-4 and Claude are trained on bucketloads of it, so you go much faster if you can work with that. |
I dunno about LLMs, but I'd take SQL strings just cos they're more familiar to write. I sketched an approach with type-level string queries and VTAs for both table definitions and parameters. Whenever I have the time I will try and implement it, but it might not be soon |
Nice! I thought a bit about it as well. Here's my initial musings... You can define a db's tables like so:

```purescript
type User = { id :: String, active :: Boolean, username :: String }
type Friendship = { id :: String, user1 :: From User "id", user2 :: From User "id" }
```

And then construct a world that contains all of the tables as proof:

```purescript
db = world @(User : Friendship : Nil) -- typelevel list
```

Then, at some point, you ask GPT to construct your query: https://chatgpt.com/share/02e8d2b9-cf73-4ab6-9435-f09adfb1ebe3. Paste it in:

```purescript
res = query db @"""SELECT
  u1.id AS user_id,
  u1.username AS username,
  u2.id AS friend_id,
  u2.username AS friend_username
FROM
  Friendship f
JOIN
  User u1 ON f.user1 = u1.id
JOIN
  User u2 ON f.user2 = u2.id
WHERE
  u2.active = true;
"""
```
And `res` would automatically be of type `Array { user_id :: String, username :: String, friend_id :: String, friend_username :: String }` from introspecting the query.
GPT basically never gets these queries wrong, so it may be enough just to focus on the columns to construct the return type without needing to validate any of the other stuff. |
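To make the shape of that idea concrete, here is a hypothetical skeleton. None of these names (`World`, `world`, `query`, `QueryResult`, `From`) exist in any library; the typelevel parsing itself is left as a class with no instances, since that is where all the real work would go.

```purescript
module WorldSketch where

type User = { id :: String, active :: Boolean, username :: String }

-- Foreign-key marker, as in the sketch above.
foreign import data From :: Type -> Symbol -> Type

type Friendship = { id :: String, user1 :: From User "id", user2 :: From User "id" }

-- A typelevel list of tables (an infix alias like `:` could be added for the
-- `User : Friendship : Nil` spelling used above).
data TableList
foreign import data Nil :: TableList
foreign import data Cons :: Type -> TableList -> TableList

-- `World` is a phantom carrying the whole schema.
data World (tables :: TableList) = World

world :: forall @tables. World tables
world = World

db :: World (Cons User (Cons Friendship Nil))
db = world @(Cons User (Cons Friendship Nil))

-- The heavy lifting: turn the SQL Symbol plus the schema into the result row.
class QueryResult (tables :: TableList) (sql :: Symbol) (result :: Row Type) | tables sql -> result

query :: forall @sql tables result. QueryResult tables sql result => World tables -> Array (Record result)
query _ = [] -- a real implementation would run the SQL against a database
```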
Actually, even what I'm proposing above may be overkill. If you do correct RAG, you don't even need that. Check out https://chatgpt.com/share/4289e5f6-72e0-450a-812e-e0fa8aa086e7. It correctly generates:

```purescript
query db @(Array { id1 :: String, id2 :: String }) @"""
SELECT f.user1 as id1, f.user2 as id2
FROM Friendship f
JOIN User u ON f.user2 = u.id
WHERE u.active = true
"""
```

At which point the PS framework can be super small. |
If you just need something to automatically parse query results into objects, that already exists. I was thinking of how to keep the interface type safe but switch out to type-level strings. The table and column definitions and whatnot could stay as they are. I think the only new heavy lifting would be parsing the SQL query string. |
Yeah, exactly. The table stuff wouldn't change much. The typelevel parser would be annoying for the same reason writing any parser by hand is annoying. If you get it into a typelevel ADT, validating it against a schema and generating the return type should be less painful. |
Do you mean querying the database schema? |
Yeah, for example if you parse the typelevel String into a typelevel ADT that has a SELECT statement of columns foo and bar from table baz, the next step (still at the typelevel) would be to look at the schema and verify there is in fact a table baz with columns foo and bar (and also check their types if there are where statements or joins). |
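A toy version of that validation step, with invented names (`SelectOne`, `Validated`) and a hard-coded single-table schema: after parsing, a node like `SelectOne "foo" "baz"` is checked against the schema, and the `Prim.Row.Cons` constraint both verifies that the column exists and reads off its type.

```purescript
module SchemaCheckSketch where

import Prim.Row as Row
import Type.Proxy (Proxy(..))

-- Hypothetical typelevel AST node: SELECT <col> FROM <table>.
data SelectOne (col :: Symbol) (table :: Symbol)

class Validated (q :: Type) (ty :: Type) | q -> ty

-- The only table in this toy schema is baz (foo :: String, bar :: Int).
-- The query is accepted only if `col` really is one of its columns; `ty` is
-- then the column's type, pulled out of the row by Row.Cons.
instance (Row.Cons col ty rest (foo :: String, bar :: Int)) => Validated (SelectOne col "baz") ty

columnType :: forall q ty. Validated q ty => Proxy q -> Proxy ty
columnType _ = Proxy

-- Typechecks: baz.foo is a String. `SelectOne "nope" "baz"` would be rejected.
checksOut :: Proxy String
checksOut = columnType (Proxy :: Proxy (SelectOne "foo" "baz"))
```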
Ah okay, so how it already works. I thought you meant asking the actual database, which is a thing I saw some Haskell library do. |
Today I started putting together a little project to rewrite the HTML parser in Deku. It's going to be a full-fledged typelevel parser. When I finish it (maybe over the weekend), take it for a spin and see if it works with a rudimentary SQL ADT. If so, it'd be great if a single typelevel parser could be used for both projects. Here's the repo: https://github.com/mikesol/purescript-tldr. There are a few tests already of the combinators and matchers. |
Nice! That looks great. I am a bit busy with easafe/purescript-flame#87 but after that I will try it for sure |
Just finished the initial work on the repo. It's completely undocumented, so whenever you have time to dip into it, let me know and we can sync on how it works. That'll be a good time for me to add docs and format error messages a bit better. There are tests for everything, though. A few things to know:
|
That was fast; incredible work. It should make it easier to write a more proper type checker too. How are you finding compilation times so far? |
The compilation times are manageable so far; the tests have lots of examples and compile reasonably fast in my IDE. |
Had a bit of time to stress test the lib with a bigger schema definition. If you hover over the parsed result, this is what you get:

```purescript
Proxy (Success
  (Cons (TableDef (Proxy "foo") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "lastname")) SText) (Cons (ColumnDef (Column (Proxy "firstname")) SText) Nil))))
  (Cons (TableDef (Proxy "bar") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "time_created")) SInt) (Cons (ColumnDef (Column (Proxy "nickname")) SText) Nil))))
  (Cons (TableDef (Proxy "baz") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "username")) SText) Nil)))
  (Cons (TableDef (Proxy "qux") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "user1")) SText) (Cons (ColumnDef (Column (Proxy "user2")) SText) (Cons (ColumnDef (Column (Proxy "user3")) SText) (Cons (ColumnDef (Column (Proxy "user4")) SText) (Cons (ColumnDef (Column (Proxy "user5")) SText) Nil)))))))
  (Cons (TableDef (Proxy "goo") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "last_updated")) SInt) Nil)))
  (Cons (TableDef (Proxy "fooA") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "lastname")) SText) (Cons (ColumnDef (Column (Proxy "firstname")) SText) Nil))))
  (Cons (TableDef (Proxy "barA") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "time_created")) SInt) (Cons (ColumnDef (Column (Proxy "nickname")) SText) Nil))))
  (Cons (TableDef (Proxy "bazA") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "username")) SText) Nil)))
  (Cons (TableDef (Proxy "quxA") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "user1")) SText) (Cons (ColumnDef (Column (Proxy "user2")) SText) Nil))))
  (Cons (TableDef (Proxy "gooA") (Cons (ColumnDef (Column (Proxy "id")) SInt) (Cons (ColumnDef (Column (Proxy "last_updated")) SInt) Nil)))
  Nil)))))))))) "")
```

|
That's pretty good. I will take a look sometime this week |
What solution are you currently using @mikesol? I didn't forget about this, just haven't had the time yet |
Do you mean for SQL? I'm using https://github.com/mikesol/oooooooooorrrrrrrmm. |