PHP implementation of a lexical analyzer.
Warning: this is not a generator like the classical lex. It does not produce any PHP code. It is a plain scanner that tokenizes a given input string into a given set of tokens by matching regular expressions. You can therefore change the token definitions at runtime and reuse the same code for any token set.
Tokens are defined with the TokenDefinition class, which holds a token name and a regular expression. The token name can be empty, in which case the lexer skips any input matched by that definition.
The lexer configuration holds a list of all token definitions. With LexerArrayConfig it can be created easily from an array whose keys are regular expressions and whose values are token names.
The Lexer's static method scan($config, $input) scans the given input string and returns an array of tokens.
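To make the idea of scanning by regular-expression matching concrete, here is a minimal standalone sketch. It does not use this library's classes: the function name sketchScan, the [name, value] token shape, and the '~' pattern delimiter are all assumptions made for illustration, and the real Lexer may anchor and delimit patterns differently.

```php
<?php
// Hypothetical illustration only; not this library's implementation.
// $definitions maps regular expression => token name ('' means skip).
function sketchScan(array $definitions, string $input): array
{
    $tokens = [];
    $offset = 0;
    $length = strlen($input);

    while ($offset < $length) {
        $matched = false;
        foreach ($definitions as $regex => $name) {
            // \G anchors the match at $offset. The '~' delimiter is an
            // assumption; a regex containing '~' would need escaping.
            $pattern = '~\G(?:' . $regex . ')~';
            if (preg_match($pattern, $input, $m, 0, $offset) && $m[0] !== '') {
                if ($name !== '') {
                    // Only named tokens are kept; unnamed ones are skipped.
                    $tokens[] = [$name, $m[0]];
                }
                $offset += strlen($m[0]);
                $matched = true;
                break;
            }
        }
        if (!$matched) {
            throw new RuntimeException("Unexpected input at offset $offset");
        }
    }

    return $tokens;
}

$tokens = sketchScan(
    ['\\s' => '', '\\d+' => 'number', '\\+' => 'plus'],
    '2 + 3'
);
print implode(' ', array_column($tokens, 0)); // number plus number
```

Because the definitions are plain data, the same function can be reused with a different array to tokenize a different language, which is the point made above about changing the token set at runtime.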
An instance of the Lexer class can be used to walk through the scanned tokens with a single look-ahead token. Its API is similar to doctrine/lexer, but tokens are defined and scanned differently: there is no need to work out the token type/name from the tokenized value, because the type/name is given by the same TokenDefn that supplied the regex that recognized the token.
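The single look-ahead walk can be pictured with the following standalone sketch. It is not this library's Lexer class: SketchCursor and its internals are hypothetical, tokens are reduced to plain name strings, and only the moveNext()/getLookahead() method names are borrowed from the API described here. It assumes PHP 7.4 or later for typed properties.

```php
<?php
// Hypothetical illustration only; not this library's Lexer class.
final class SketchCursor
{
    private array $tokens;
    private int $position = 0;
    private ?string $token = null;      // last consumed token
    private ?string $lookahead = null;  // next token, not yet consumed

    public function __construct(array $tokens)
    {
        $this->tokens = array_values($tokens);
    }

    // Consume the look-ahead token and load the following one.
    public function moveNext(): bool
    {
        $this->token = $this->lookahead;
        $this->lookahead = $this->tokens[$this->position] ?? null;
        if ($this->lookahead !== null) {
            $this->position++;
        }
        return $this->lookahead !== null;
    }

    public function getToken(): ?string
    {
        return $this->token;
    }

    public function getLookahead(): ?string
    {
        return $this->lookahead;
    }
}

$cursor = new SketchCursor(['number', 'plus', 'number']);
$cursor->moveNext(); // load the first token into the look-ahead slot
$seen = [];
while ($cursor->getLookahead() !== null) {
    $seen[] = $cursor->getLookahead();
    $cursor->moveNext();
}
print implode(' ', $seen); // number plus number
```

Keeping one token of look-ahead lets a hand-written parser decide which rule to apply before it consumes anything.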
<?php
$config = new LexerArrayConfig([
    '\\s' => '',
    '\\d+' => 'number',
    '\\+' => 'plus',
    '-' => 'minus',
    '\\*' => 'mul',
    '/' => 'div',
]);
// static scan method that returns an array of tokens
$tokens = Lexer::scan($config, '2 + 3');
$names = array_map(function ($t) { return $t->getName(); }, $tokens); // ['number', 'plus', 'number']
// lexer instance
$lexer = new Lexer($config);
$lexer->setInput('2 + 3');
$lexer->moveNext();
while ($lexer->getLookahead()) {
    print $lexer->getLookahead()->getName();
    $lexer->moveNext();
}