This is an experimental C++ library for rapidly building parsers. It is inspired by parser combinators in haskell such as attoparsec and, like those libraries, allows for the construction of fully fledged parsers in a few lines of code. There are however the following problems:
- It gets a little ugly when you want recursive grammars (but no more ugly than c++ normally is)
- Performance hasn't been tested yet. Not much effort has been put into performance yet
- It requires C++ 17
The library uses the LGPL-3.0 licence. The library can be used as a header only library which makes the functionality of LGPL the same as a BSD licence. For more info, see here
make && sudo make install
When building, don't forget to add -lparser
to your compiler involcation!
make install-header
. If this is how you intend on using the library,
you must include the library with:
#define HEADER_ONLY
#include <cpp_parser/parser.h>
// Optionally:
#undef HEADER_ONLY
You can build the demos with make demo
We are going to create a simple commandline parser. First we must include the header:
#include <cpp_parser/parser.h>
Next we want to define our parser. We have 4 different types of command we want to parse:
- Short flags, e.g.
-f
- Short arguments, e.g.
-a arg
or-aarg
- Long flags, e.g.
--flag
- Long arguments, e.g.
--argument arg
or--argument=arg
Translating this to the library is quite easy. We will start with the short options:
ParserT<char> ShortFlag = Char('-') >> AnyChar;
ParserT<std::tuple<char,std::string>> ShortArg = ShortFlag & AnyLit
| (ShortFlag >> Char(' ')) & AnyLit
;
The first line says that to parse a short flag, we parse a '-', discard it
and then parse any character. To parse a short argument, we parse a short flag
and then any string or we do the same but with a space between the short flag and the string.
The &
combinator saves the results of the parsers of either side in a tuple.
I could have written the last line in a number of different ways, my favorite being:
auto ShortArg = ShortFlag & ((Char(' ') | True) >> AnyLit)
But it essentially means the same thing.
We now will do the same for long arguments:
ParserT<std::string> LongFlag = Lit("--") >> AnyLit
ParserT<std::tuple<std::string,std::string>> LongArg =
LongFlag & ((Char(' ') | Char('=')) >> AnyLit)
It works in the same way as the one before but with a double dash and
AnyLit
instead of AnyChar
.
To simplify the next task, we are going to make a datatype that any flag or arg can be placed in:
struct CLOption {
std::string name;
bool hasArg;
std::string arg;
};
We shall now make some functions that convert the output of the parsers to this type:
auto mkSFlag = [](char c){
return (CLOption){std::string(1,c) ,false,""};
};
auto mkSArg = [](std::tuple<char,std::string>t){
return (CLOption){std::string(1,std::get<0>(t)), true, std::get<1>(t)};
};
auto mkLFlag = [](std::string s) {
return (CLOption){s, false, ""};
};
auto mkLArg = [](std::tuple<std::string,std::string>t) {
return (CLOption){std::get<0>(t), true, std::get<1>(t)};
};
Next we will use fmap
to convert all the parsers to parser that output
something of type CLOption
and join them together.
auto Option = fmap<std::tuple<std::string,std::string>,CLOption>(mkLArg, LongArg)
| fmap<std::string,CLOption>(mkLFlag)(mkLFlag, LongFlag)
| fmap<std::tuple<char,std::string>,CLOption>(mkSArg, ShortArg)
| fmap<char,CLOption>(mkSFlag, ShortFlag)
;
Now Option
is our finished parser! But what does that mean? Option
is
a function that takes a string_view
as an input and outputs something of type
ParserRet<T>
. Luckily, instead of digging into this type, you can use the parser
like this:
std::optional<CLOption> result = Run(Option(some_string_view));
Below is a short reference. If it doesn't provide enough details, you can consult the source. It's only one file and it isn't long!
This is the return type for a parser unit.
parserRet<T> = std::optional<std::tuple<T, std::string_view>>
Optional means that a parser can fail. The first element of the tuple is what was parsed. The second tuple is the part of the string that hasn't been consumed.
The parser type:
ParserT<T> = std::function<ParserRet<T>(std::string_view)>;
A parser is a function from a string to a ParserRet
Just an alias for a list
Many<T> = std::list<T>;
This is a helper for when using monadic bind. You need to cast lambdas in order for the code to compile. Lambda types are long and so this helper provides some relief!
M<A,B> = std::function<ParserT<B>(A)>
Returns the result of a parse or a default value:
T Get(ParserRet<T> result, T def)
Like Get
but returns optional values instead of a default value when handling failure:
std::optional<T> Run(ParserRet<T> result)
A << B
created a parser out of the parsers A
and B
. The behaviour is the same as
running parser A
followed by B
. If either fail, the new parsers failes. Else,
return the result of parser A.
ParserT<A> operator<< (const ParserT<A>& l, const ParserT<B>& r)
Consider the example that parsers an integer followed by a space:
Parser<Int> intSpace = Integer << Char(' ');
A >> B
creates a parser out of the parsers A
and B
. The behaviour of the new parser is equivalent
to running A
. If it fails, we fail. Else we discard the result and run parser B.
ParserT<B> operator>> (const ParserT<A>& l, const ParserT<B>& r)
This is the alternative combinator. A | B
creates a parser that out of the parsers
A
and B
. The behavior is the same as running A
. If it succeeds, return the output,
else we return B
.
ParserT<T> operator| (const ParserT<T>& l, const ParserT<T>& r)
A & B
creates a parser out of the parsers A
and B
. It means run A
then B
storing the output of
both in a tuple.
ParserT<std::tuple<A,B>> operator& (const ParserT<A>& l, const ParserT<B>& r)
Useful way of constructing complex parsers without the need to extract the raw tuples as you go along!
ParserT<B> operator>>= (ParserT<A> xm, M<A,B> f)
See network_proto.cpp
for an example.
fmap(f, A)
means create a new parser from function f
and parser A
that is equivilent to applying
the function f
on the result of running A
.
ParserT<B> fmap(std::function<B(A)> f, const ParserT<A>& r) {
Tag(P, t)
creates a parser out of parser P
and value t
that when run,
returns the result of P
ina tuple with value t
ParserT<std::tuple<A,T>> Tag(const ParserT<A>& p, T tag)
Replace(P,t)
creates a parser put of parser P
and value t
that whe n run,
will replace the output of parser P
with t
if successful.
Runs a parser over and over again until it fails.
ParserT<Many<T>> many(const ParserT<T> & p)
Example: We want to parse a comma seporated list.
auto SpaceChars = Char(' ') | Char('\t') | Char('\n') | Char('\r');
auto WhiteSpace = many(SpaceChars);
auto ListItem = // Your parser for a list item
auto InnerList = Char(',') >> WhiteSpace >> ListItem;
auto cons = [](std::tuple<Item,Many<Item>> i) {
auto [head,tail] = i;
tail.push_front(head);
return tail;
};
auto List = fmap<std::tuple<Item,Many<Item>>>(cons, ListItem & many(InnerList));
same as many, but fails instead of returning the empty list.
Parser that always fails
ParserT<bool> False
Parser that always succeeds
ParserT<bool> True
Not(P)
creates a parser that fails if P
succeeds and succeeds if P
fails.
ParserT<bool> Not(const ParserT<T>& p)
Const(t)
creates a parser that consumes no input, always succeeds and retuns t
.
ParserT<T> Const(T t)
Parses the provided character
ParserT<char> Char(char c)
Parses the provided string literal
ParserT<std::string> Lit(std::string lit)
Parses any character
ParserT<char> AnyChar
Parses any Character that is a letter
ParserT<char> Alpha
A set of characters that aren't parses by AnyLit
ParserT<char> Special
Any String that doesn't contain special
ParserT<std::string> AnyLit
Takes a function, pred
, as input that takes in a char
and returns a bool
. It parses the char c
iff pred(c) == true
ParserT<char> Satisfy(std::function<bool(char)> pred)
Takes a function, pred
, as input that takes in a char
and returns a bool
. It consumes the input string while pred(s[i]) == true
ParserT<std::string> TakeWhile(std::function<bool(char)> pred)
Same as TakeWhile
but fails if the result is of length 0
Takes an integer length as input. Consumes that length of string.
ParserT<std::string> Take(int len)
Parses a single digit into a character
ParserT<char> DigitC
Parses a single digit to an int
.
ParserT<int> Digit;
Parses a natural number
ParserT<int> Natural
parses an integer
ParserT<int> Integer