general regex/grammar matching on TaggedSentences #9

Open
creswick opened this issue Jan 23, 2014 · 0 comments

Comments

@creswick
Owner

I want to support matching on patterns of strings / POS tags with expressions similar to NLTK's, for example:

grammar = r"""
  NP: {<DT|PP\$>?<JJ>*<NN>}   # chunk determiner/possessive, adjectives and nouns
      {<NNP>+}                # chunk sequences of proper nouns
"""

But with a few changes:

  • I don't want patterns to be written as strings (this seems like a /great/ place for an eDSL).
  • I want support for literal strings, literal tokens, and tags.
  • I'd like to build this on an API that essentially does regex-style matching, but over a sequence of predicates on tokens rather than strict token equality. So, instead of expressing the pattern in terms of specific tokens, you can express it in terms of predicates, and literal strings, tokens, and tags just become particular predicates.
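
To make the last point concrete, here is a rough sketch of what "regex over predicates" could look like. It is written in Python only because the NLTK example above is Python; it is not this project's API, and every name in it (tag, word, one, seq, opt, star, plus, chunks) is hypothetical. The idea is that a pattern is built from combinators rather than parsed from a string, and each leaf is an arbitrary predicate on a (word, tag) token.

```python
from typing import Callable, Iterator, List, Sequence, Tuple

Token = Tuple[str, str]         # (word, POS tag), e.g. ("dog", "NN")
Pred = Callable[[Token], bool]  # a predicate over a single token
# A pattern maps (tokens, start index) to every end index it can reach,
# most greedy alternative first, so callers can backtrack.
Pattern = Callable[[Sequence[Token], int], Iterator[int]]

def tag(*tags: str) -> Pred:
    """Predicate: the token's POS tag is one of `tags`."""
    return lambda tok: tok[1] in tags

def word(text: str) -> Pred:
    """Predicate: the token's surface string equals `text`."""
    return lambda tok: tok[0] == text

def one(pred: Pred) -> Pattern:
    """Match exactly one token satisfying `pred`."""
    def run(toks, i):
        if i < len(toks) and pred(toks[i]):
            yield i + 1
    return run

def seq(*parts: Pattern) -> Pattern:
    """Match each part in order."""
    def run(toks, i):
        if not parts:
            yield i
            return
        for j in parts[0](toks, i):
            yield from seq(*parts[1:])(toks, j)
    return run

def opt(p: Pattern) -> Pattern:
    """Match `p` zero or one time (the regex `?`)."""
    def run(toks, i):
        yield from p(toks, i)
        yield i
    return run

def star(p: Pattern) -> Pattern:
    """Match `p` zero or more times (the regex `*`)."""
    def run(toks, i):
        for j in p(toks, i):
            if j > i:  # skip zero-width matches to avoid looping forever
                yield from run(toks, j)
        yield i
    return run

def plus(p: Pattern) -> Pattern:
    """Match `p` one or more times (the regex `+`)."""
    return seq(p, star(p))

def chunks(pattern: Pattern, toks: Sequence[Token]) -> List[List[Token]]:
    """Scan left to right, collecting non-overlapping, non-empty matches."""
    out, i = [], 0
    while i < len(toks):
        end = next((j for j in pattern(toks, i) if j > i), None)
        if end is None:
            i += 1
        else:
            out.append(list(toks[i:end]))
            i = end
    return out

# Roughly the first NP rule above: {<DT|PP\$>?<JJ>*<NN>}
np_rule = seq(opt(one(tag("DT", "PP$"))), star(one(tag("JJ"))), one(tag("NN")))

sentence = [("the", "DT"), ("little", "JJ"), ("yellow", "JJ"), ("dog", "NN"),
            ("barked", "VBD"), ("at", "IN"), ("the", "DT"), ("cat", "NN")]

for chunk in chunks(np_rule, sentence):
    print(" ".join(w for w, _ in chunk))
# prints "the little yellow dog" and then "the cat"
```

Because the leaves are plain functions, a literal string (word("dog")), a tag (tag("NN")), and any ad hoc test (for example, lambda tok: tok[0].istitle()) all compose the same way, which is the main reason to prefer predicates over strict equality here.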