Lexical Structure
soup accepts only ASCII-encoded source, although not every ASCII character can appear in a valid token.
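As a rough illustration of the ASCII-only rule, the sketch below rejects any input containing a non-ASCII byte. The function name `read_source` and the choice to raise `ValueError` are assumptions made for this example; the source does not describe how the soup compiler actually reports the error.

```python
def read_source(raw: bytes) -> str:
    """Decode raw file contents, rejecting anything outside ASCII.

    Hypothetical helper: the error type and message are illustrative only.
    """
    try:
        return raw.decode("ascii")
    except UnicodeDecodeError as err:
        # err.start is the offset of the first byte that is not valid ASCII.
        raise ValueError(f"non-ASCII byte at offset {err.start}") from err
```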
"Whitespace" consists of any combination of one or more of the following four ASCII characters:
- ' ': ASCII character 0x20, also known as "Space"
- '\t': ASCII character 0x9, also known as "Horizontal Tab"
- '\n': ASCII character 0xA, also known as "Line Feed" or "New Line"
- '\r': ASCII character 0xD, also known as "Carriage Return"
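A scanner would typically skip runs of these four characters between tokens. The following is a minimal sketch of that step in Python; `WHITESPACE` and `skip_whitespace` are names invented for this example and are not taken from the soup compiler.

```python
# The four whitespace characters listed above: space, tab, line feed, carriage return.
WHITESPACE = frozenset(" \t\n\r")


def skip_whitespace(source: str, pos: int) -> int:
    """Return the index of the first non-whitespace character at or after pos."""
    while pos < len(source) and source[pos] in WHITESPACE:
        pos += 1
    return pos
```

Other ASCII control characters, such as form feed, are not in the set, so they are not skipped.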
"Comments" begin with two consecutive forward slashes //
and continue until the end of the line (i.e. the next "New Line" \n). A comment can begin at any point in a file and has higher precedence than any other construct in the language. For example, two consecutive forward slashes cannot appear inside a string, because they start a comment and the rest of the line is ignored.
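Because comments take precedence over every other construct, removing one does not require tracking whether the scanner is inside a string. The sketch below illustrates this for a single line; `strip_comment` is a hypothetical helper, not part of the soup compiler.

```python
def strip_comment(line: str) -> str:
    """Drop everything from the first // to the end of the line."""
    start = line.find("//")
    return line if start == -1 else line[:start]
```

Under this rule, the line `s = "http://x";` is cut to `s = "http:`, which leaves the string unclosed.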
An "Identifier" is an unlimited-length sequence of alphabetic characters, digits, and underscores that does not start with a digit. soup also has 11 reserved words (see Reserved Words) plus the boolean literals true and false, none of which can be used as identifiers. The following are examples of valid identifiers:
identifier
iden_tifier
_ID
iD
_
id_1
I2d
And the following are examples of invalid identifiers:
3id
return
false
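The identifier rules can be sketched as a regular expression plus a lookup against the reserved words and boolean literals. This is an illustration only: it assumes "alphabetic characters" means the ASCII letters A-Z and a-z (since soup accepts only ASCII), and the names `IDENTIFIER_RE` and `is_valid_identifier` are invented for the example.

```python
import re

# The 11 reserved words and the two boolean literals listed in this document.
RESERVED_WORDS = {
    "int", "bool", "void", "if", "else", "while",
    "break", "return", "func", "returns", "main",
}
BOOLEAN_LITERALS = {"true", "false"}

# Assumption: letters are ASCII only; the first character may not be a digit.
IDENTIFIER_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_valid_identifier(text: str) -> bool:
    """Check the shape of text, then exclude reserved words and boolean literals."""
    if IDENTIFIER_RE.fullmatch(text) is None:
        return False
    return text not in RESERVED_WORDS and text not in BOOLEAN_LITERALS
```

Applied to the examples above, the function accepts every entry in the valid list and rejects 3id, return, and false.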
An "Integer Literal" consists of one or more digits and can only be written in base 10. For example, an integer literal of length 2 or more beginning with 0 is not an octal number; it is simply the decimal number without the leading zero(s). The following are examples of integer literals:
0
02
1309242463024963
Note that the definition of an integer literal does not include an optional hyphen - to indicate negative numbers. Negative numbers are represented in the compiler as a unary minus operator applied to an integer literal.
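The sketch below shows one way to read the integer literal rules: the text must be all decimal digits, leading zeros carry no meaning, and a leading hyphen is not part of the literal. `parse_integer_literal` is a hypothetical name, and raising `ValueError` on a non-literal is an assumption of this example.

```python
import re

# One or more ASCII decimal digits; no sign, no base prefix.
INTEGER_RE = re.compile(r"[0-9]+")


def parse_integer_literal(text: str) -> int:
    """Convert a soup integer literal to its value, always in base 10."""
    if INTEGER_RE.fullmatch(text) is None:
        raise ValueError(f"not an integer literal: {text!r}")
    # Base 10 is explicit, so "02" is the decimal number 2, never octal.
    return int(text, 10)
```

With this reading, `-5` is not a single literal; it would be scanned as the operator - followed by the integer literal 5.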
A "String Literal" consists of zero or more characters enclosed in a pair of quotation marks ". The following escape sequences are supported:
\b
\f
\t
\r
\n
\'
\"
\\
As mentioned above, two consecutive forward slashes // cannot appear in a string, because the compiler will interpret them as the beginning of a comment.
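The following sketch decodes the eight supported escape sequences in the text between the quotation marks. It is an illustration, not the compiler's code: `ESCAPES` and `decode_string_body` are invented names, and rejecting any other backslash sequence with `ValueError` is an assumption, since the source does not say how unsupported escapes are handled.

```python
# Maps the character after a backslash to the character it stands for.
ESCAPES = {
    "b": "\b", "f": "\f", "t": "\t", "r": "\r", "n": "\n",
    "'": "'", '"': '"', "\\": "\\",
}


def decode_string_body(body: str) -> str:
    """Replace each supported escape sequence in body with its character."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        # Assumption: a trailing backslash or an unknown escape is an error.
        if i + 1 >= len(body) or body[i + 1] not in ESCAPES:
            raise ValueError(f"unsupported escape sequence at index {i}")
        out.append(ESCAPES[body[i + 1]])
        i += 2
    return "".join(out)
```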
A "Boolean Literal" is one of the following, formed from ASCII characters:
true
false
A "Reserved Word" is one of the following, formed from ASCII characters:
int
bool
void
if
else
while
break
return
func
returns
main
An "Operator" is one of the following, formed from ASCII characters:
+
+=
-
-=
*
*=
/
/=
%
%=
<
<=
>
>=
=
==
!
!=
&&
||
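Several operators are prefixes of others (for example + and +=, or = and ==), so a scanner has to decide which one to emit. A common approach is to try the longest candidate first. The sketch below does that; the longest-match rule and the name `match_operator` are assumptions of this example, as the source does not state how the soup scanner resolves the ambiguity.

```python
# The 20 operators listed above.
OPERATORS = [
    "+", "+=", "-", "-=", "*", "*=", "/", "/=", "%", "%=",
    "<", "<=", ">", ">=", "=", "==", "!", "!=", "&&", "||",
]

# Try two-character operators before one-character ones.
_BY_LENGTH = sorted(OPERATORS, key=len, reverse=True)


def match_operator(source: str, pos: int):
    """Return the longest operator starting at pos, or None if there is none."""
    for op in _BY_LENGTH:
        if source.startswith(op, pos):
            return op
    return None
```

A lone & or | matches nothing here, because only the doubled forms && and || are operators.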
A "Separator" is one of the following ASCII characters:
(
)
{
}
;
,
An "End Of File" token does not correspond to an ASCII character. It is simply appended to the list of tokens at the very end of the scanning portion of compilation, to check for errors such as a string not being closed.