Simple Typescript Parser Combinator library
The goal of this project is to make simple parsers more readable and
declarative. No more indexing into a RegExpExecArray
and hope you grabbed the
right group.
There are many techniques for writing parsers, but what make parser combinators so powerful are
- Readability
- Reusability
- the ability to transform while matchings
Finally, this library is capable of parsing languages which cannot normally be parsed by regular expressions.
Consider this simple example with date strings. We want to match dates in the
YYYY-MM-DD
format. Using regular expressions we might write something like the
following.
const re = /^(\d{4})-(\d{2})-(\d{2})/
function parseDate(source: string) {
const match = re.exec(source);
if (match === null) {
return null;
}
const [, year, month, day] = match;
return { year, month, day }
}
This is fairly readable and maintainable.
But now suppose we want to match not just dates like "2021-03-26"
, but also
dates like "2021/03/26"
.
Ok no problem, just a small update.
const re = /^(\d{4})(-|\/)(\d{2})(-|\/)(\d{2})/
function parseDate(source: string) {
const match = re.exec(source);
if (match === null) {
return null;
}
const [, year, , month, , day] = match;
return { year, month, day }
}
A little bit harder to read, but still ok.
But we have introduces a small problem. Now we match strings like "2021-03/26"
and "2021/03-26"
! Maybe this is ok and maybe it isn't. How would we update our
regular expression to make sure these are not accepted?
const re = /^(\d{4})((-(\d{2})-(\d{2}))|(\/(\d{2})\/(\d{2})))/;
function parseDate(source) {
const match = re.exec(source);
if (match === null) {
return null;
}
const [, year, , , dashMonth, dashDay, , slashMonth, slashDay] = match;
return { year, month: dashMonth || slashMonth, day: dashDay || slashDay };
}
Now we really have a maintainance problem.
The situation can be improved somewhat with an explicit RegExp
constructor and
template strings. Something like this.
const year = /\d{4}/.source;
const month = /\d{2}/.source;
const day = /\d{2}/.source;
const re = new RegExp(`^(${year})((-(${month})-(${day}))|(\\/(${month})\\/(${day})))`)
But what happens if we want to support more separators? How much time would we spend trying to figure out how this works in a year from now?
Now compare this to using ts-combinator
.
Here is the simpler date parser that matched "2021-03-26"
and "2021/03/26"
, but
also "2021-03/26"
.
type Year = { year: string };
type Month = { month: string };
type Day = { day: string };
type DateObject = Year & Month & Day;
const yearParser = map(
(yearDigits): Year => ({ year: yearDigits.join("") }),
sequence(singleDigit(), singleDigit(), singleDigit(), singleDigit()),
);
const monthParser = map(
(monthDigits): Month => ({ month: monthDigits.join("") }),
sequence(singleDigit(), singleDigit()),
);
const dayParser = map(
(dayDigits): Day => ({ day: dayDigits.join("") }),
sequence(singleDigit(), singleDigit()),
);
const simpleDateParser = map(
([year, , month, , day]): DateObject => ({ ...year, ...month, ...day }),
sequence(
yearParser,
oneOf(exact("-"), exact("/")),
monthParser,
oneOf(exact("-"), exact("/")),
dayParser,
),
);
With the map
function, we can transform as we match and we get type safety
practically for free!
And the more complex date parser that ensures separator consistency?
const dateParser = map(
([year, [, month, , day]]): DateObject => ({ ...year, ...month, ...day }),
sequence(
yearParser,
oneOf(
sequence(exact("/"), monthParser, exact("/"), dayParser),
sequence(exact("-"), monthParser, exact("-"), dayParser),
),
),
);
And adding new separators?
We can write our own combinator!
const monthDayWithSeparator = (
sep: Parser<string>,
month: Parser<Month>,
day: Parser<Day>,
) => sequence(sep, month, sep, day);
const dateParser = map(
([year, [, month, , day]]): DateObject => ({ ...year, ...month, ...day }),
sequence(
yearParser,
oneOf(
monthDayWithSeparator(exact(":"), monthParser, dayParser),
monthDayWithSeparator(exact("/"), monthParser, dayParser),
monthDayWithSeparator(exact("-"), monthParser, dayParser),
),
),
);
Finally, the best part is that when a match fails, instead of just null
, we
can get nicer error messages.
dateParser.parse("2021-03/27");
// Error at (line: 1, column: 8)
// Expected "-" but got "/" instead
//
// 2021-03/26
// ^
We can also get finer control over parsing and error messages with the
conditional
combinator. See the Conditionals documentat
for more details.
A combinator is a (usually higher-order) function which only refers to its arguments.
For example
const eval = (f, x) => f(x);
is a combinator. But something like
const eval = (f, x) => f(x, y);
is not, since y
is not an argument of eval
.
A famous example is the "Y-combinator".
const y = f => (x => f(x(x)))(x => f(x(x)));
However, I use the term "combinator" quite loosely, since the point of using TypeScript is to get both the benefits of parser combinators, while still having the flexibility of writing functions which are not technically combinators.
3.0.0 : 2021-05-21
- Conditional parser
- Updates to index-based system
- Adds better error messages
- Parse result format
2.2.0 : 2021-04-02
- documentation
2.1.0 : 2021-04-01
lazy
combinator- Casing converter example
2.0.0 : 2021-04-01
Maybe
type- proper
maybe
combinator
- Renamed
maybe
tooptional
1.0.0 : 2021-03-27
- Number parsers
maybe
combinator
- Updated changelog section to match format from "keep a changelog"
0.1.0 : 2021-03-26
- Changelog
- Initial release!
- Some atomic parsers
- Basic Combinators to parse regular languages