Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chris martin's seven parser types #202

Open
jamesdbrock opened this issue Jul 7, 2022 · 4 comments
Open

Chris martin's seven parser types #202

jamesdbrock opened this issue Jul 7, 2022 · 4 comments

Comments

@jamesdbrock
Copy link
Member

Something to aspire to:

https://twitter.com/chris__martin/status/1545102338360107008?t=MnxHlQ4zc6fzL28jX_IOzg&s=09

@chris-martin
Copy link

chris-martin commented Jul 8, 2022

I've went ahead and made my in-progress Haskell project public so you can have a look.

A few things that might be worth reading:

There are actually only two different type definitions, Any and Sure. The other distinctions aren't guaranteed by construction, they're just promises you have to make when you apply the constructors or coerce. This is surprising, because you'd expect e.g. Query to be only a Read operation on the cursor, not a State operation. But this is a streaming library, and so everything (even parsers that are not supposed to move the cursor) still have to be able to modify the state because they may need to buffer more input to produce their result. It might be possible to get really fancy with a more complex model to improve on this situation, but I'm not sure it's worth it. We still provide sufficient safety for the end user of the parsing library, who never uses the constructors directly anyway.

One noteworthy observation that I hadn't realized upfront when I started this is that traditional applicative/monadic style goes right out the window if you introduce this many types, for a few reasons:

  • I suspect you'd be constantly casting up to get two parsers to be the same type to be able to <*> or >>= them (like the experience of lifting monad transformers, but worse)
  • Move and AtomicMove are not Applicative, because they don't support pure.
  • To make Atom be Applicative, the sequencing operations would have to implicitly add backtracking, and I'm not sure that's a great idea.

To get back the convenience of monadic parsing, I defined my own >>=, <*>, etc. operators that make use of the KindJoin type family.

type family KindJoin (k1 :: ActionKind) (k2 :: ActionKind) :: ActionKind

type a :> b = KindJoin a b -- its infix alias is convenient sometimes

My parser API then has a handful of functions like this that replace the standard Applicative and Monad ones:

(>>=) ::
    Monad m => IsAction k1 => IsAction k2 => IsAction k3 => ActionJoin k1 k2 => k1 :> k2 ~ k3 =>
    Parser text k1 m a -> (a -> Parser text k2 m b) -> Parser text k3 m b

This isn't so bad if we make use of the QualifiedDo extension introduced in GHC 9.0, which allows the built-in do syntax to use our custom (>>=) et al. operators instead of the standard functions. (Does purescript have something like this?) You can see it in use in the test spec

Anyway, caveats aside, I've been obsessed with this for a week now, a good bit of it is working, and I think it shows promise 😄

Oof, I wrote a lot. Going to copy some of this text into discussions on my own repo.

@jamesdbrock
Copy link
Member Author

Thanks for the writeup @chris-martin !

(Purescript does have qualified do).

@chris-martin
Copy link

Update - I've still been noodling on this library, and seven is down to five.

I've removed the parser types that include a movement guarantee. I had no way to codify this property in the type definitions, and so it relied entirely on unsafe casts to indicate that I know that a parser will always advance if it succeeds. I had been hoping that just a few primitive parsers would need unsafe casts, and that the information would end up getting propagated upward through the KindJoin family, but this turned out not to be the case very often, and so it has just required inserting a bunch of unsafe casts everywhere to assert that parsers have guaranteed movement. And all for dubious benefit; it seems like the "many" combinator is essentially the only place where movement guarantee is needed.

We are left with...

Four core parser types:

  • Sure - read/write
  • SureQuery - read-only
  • Any - read/write, returns Either
  • Query - read-only, returns Either

And the composition:

  • Atom - a Query that returns a Sure

@jamesdbrock
Copy link
Member Author

Nice.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants