This repository houses the Nestable Readable Document Language. It is a JSON superset that was specifically written to be:
- Simple to implement.
- Easy to read.
- Good enough to replace YAML.
- Generic enough to be useful from any programming language.
- Featureful enough to support the functional languages, particularly those in the Lisp family (it supports keywords/symbols/atoms).
Join the Matrix channel (nrdl:matrix.org
)!
# What now brown cow
{
the-wind "bullseye"
the-trees false
the-sparrows his-eye
poem
# I don't know if you can hear me
|His eye
# or if
# you're even there
|is on
# I don't know if you can listen
|The sparrow
^
# to a gypsy's prayer
this-should-still-work 15.0
other
|And I know
|He's watching
|Over me
^
`force push` >I sing
>because
>I'm happy
^
"I am sysadmin" true
"I am webadmin" false
"you are so wrong" null
wendover [
{
so 1
much -10
gambling 100
but 1000
also -1000
apparently 10000
paramedics -10000
and 1.01
}
{
die in
a fire
}
15
|this
|that
^
>Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do
>eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim
>ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
>aliquip ex ea commodo consequat. Duis aute irure dolor in
>reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla
>pariatur. Excepteur sint occaecat cupidatat non proident, sunt in
>culpa qui officia deserunt mollit anim id est laborum."
^
]
}
NRDL is currently only implemented in Common Lisp. I would love to change this; see below under the "Contributing" header.
NRDL is available as a package in the OCICL package manager.
The system name is com.djhaskin.nrdl
, as is the package name. Using
package-local nicknames is recommended.
There are three functions of note: parse-from
, generate-to
,
nested-to-alist
.
parse-from
takes one argument, which is a stream, t
or nil
, just like
format
. It deserializes one NRDL value (which may be a compound value, such as
an object) from the stream and stops parsing at the point after the object ends
in the stream, much like read
does. It deserializes
objects into hash tables, arrays into lists, true
into t
, false
into
nil
, and null
into cl:null
, and straight up read
s numbers in (as long as
they validate as valid JSON numbers).
The function generate-to
serializes a compound data structure to a stream. It
has this signature:
(generate-to strm val &key (pretty-indent 0) json-mode)
strm
is a stream specifier, as aboveval
is a hash table, sequence, boolean, number, string, keyword, or any arbitrary combination of those value types.pretty-indent
is the number of spaces that should be used for each indentation level in a pretty-printed document.0
means the document will not be pretty printed, but will be minified-ish.json-mode
is a convenience parameter. It requests that the document printed out be a valid JSON document. This allows the NRDL library to be useful as both a JSON and NRDL (de)serializer.
The function nested-to-alist
is a convenience function. It takes an
arbitrarily nested structure of hash tables and sequences, and converts them to
a nested structure of alists and lists instead.
This is a language I made up because I am writing a command line tool framework and I need a language for it to both read configuration from and use also to write its output to standard output.
Normally, YAML would be perfect for this; however, in Common Lisp, YAML means using libyaml and the CFFI, which I wish to avoid.
I needed a language that has nestable documents and also have the language be pleasing to the eye, like is often done with YAML. However, I didn't want to spend too much time writing a document parser and printer. YAML is very complicated. I didn't need a lot of its features.
So I wrote NRDL. Nestable, because you can have nestable documents in it. Readable, because it's easy both to parse mechanically and is (or can be) pleasing to the eye.
NRDL is a superset of JSON.
There are specific additions to JSON that NRDL makes that are worth pointing out.
To please the eye, NRDL uses whitespace as a separator. However, it does not
use it as a signal of structure. The presence of whitespace is used to separate
keys and values, but indentation doesn't mean anything. For backwards
compatibility with JSON, the colon :
and comma ,
are considered whitespace
characters outside of strings. Braces and brackets are still used to delimit
arrays and objects. Objects must have an even-numbered number of values in it,
with every other value being keys and their complements being values. Keys can
be any type.
Comments exist. We need these. They start with a hash symbol #
and end with
the end of the line.
There are two types of multi-line strings in NRDL.
The first is a verbatim multi-line string. This string is created by first
taking a normal multiline string, and prefixing each line with an arbitrary
amount of whitespace and the pipe character |
. The string ends with a line
that only has arbitrary whitespace followed by the caret ^
.
For example, this multi-line string:
Once upon a midnight dreary
While I stumbled, weak and weary
Would become this:
|Once upon a midnight dreary
|While I stumbled, weak and weary
^
Note that each newline in the string needs its own pipe in order to be counted. So, this string:
"a\nb\nc\n"
Would need to be encoded like this, if pipe encoding were desired:
|a
|b
|c
|
^
The normal JSON notation "a\nb\nc\n"
can still be used if desired.
Comments can be interleaved in between the lines. So, this:
|a
# x
|b
# y
|c
# z
|
^
Is the same as the JSON string "a\nb\nc\n"
.
Note, you CANNOT put comments on the same line as multi-line comments. This
string is equivalent to the JSON string "a # x\nb # y\nc # z\n"
:
|a # x
|b # y
|c # z
|
^
This feature is to enable nested documents within NRDL. Most editors have
block-select and block-edit. You can select multiple lines within a file and
insert |
in the front of all of them pretty easily these days. This means
you don't actually have to edit the contents of the document; you can simply
insert a prefix to each line. This operation is easy and easily reversible, and
keeps things readable. These three goals could not be achieved using
backslashes.
Prose multi-line strings are exactly like verbatim strings, except that their line feed characters are replaced by a single space character.
So,
>a
>b
>c
^
Is equivalent to the following quoted string:
"a b c"
This is to enable a clean way to print long strings within a document. Editors have the ability to select a block of text and auto-wrap it into a paragraph, and can be prefix-aware as well, so that they can make paragraphs in comments.
This same capability of editors can easily be adapted to make it easy to
auto-wrap a long string with a >
prefix.
Symbols are another type of string introduced into NRDL in hopes of easing parsing and loading serialization into data structures without adding too much overhead or complexity.
They are "program strings", strings intended as bookkeeping help to the loading program, but not really intended to store user data.
Consider the following JSON structure:
{
"name": "Daniel Haskin",
"address": "<REDACTED>",
"eye_color": "BROWN"
}
The strings "name"
and "address"
don't actually have anything to do with the
data they are hauling. They are really just structure markers. In languages like
C and Golang, they are usually elided entirely, and are simply used to ensure
that "Daniel Haskin"
finds its way into the Name
member of a struct. Even
the string "BROWN"
is only one of a few possible values, and might be encoded
as the name of an enum member in a compiled language.
NRDL has its own data type for this use case. It is a symbol. This is a string
which can be encoded in one of two ways. The first is using backticks,
`
like this:
`name`
`address`
Being a string that names something, it is an error to have a symbol which is
empty. The expression ``
is not allowed.
The other is as barewords. The rules around bare words is that they must start with characters that ensure they don't get confused with numbers. You can't start with a dash or a dot for example. But you can start with pretty much anything else:
name
address
__dunder_address__
<tag>
+constant+
*very-important-concept*
/materialized/path
These are symbols.
Symbols should be deserialized using the following guidelines:
- If the symbol is
true
,false
, andnull
, these should be decoded into the appropriate equivalent concept in the language, as with JSON. This is for JSON backwards compatibility and to obey the principle of least surprise. - Otherwise, they should be decoded into a "special string":
- Compiled typed languages should attempt to resolve them into actual program names, such as struct member names, enumeration values, or names of constants. This category includes such languages as C, C++, Java, Golang, Rust, and Zig.
- Languages with a concept of a keyword (a.k.a. atom) datatype should decode NRDL symbols as keywords. This includes such languages as Common Lisp, Clojure, Janet, Elixir, and Ruby.
- Languages with the concept of a symbol as a datatype but not keywords should decode NRDL symbols as symbols. This includes members of the Scheme family of Lisps.
- Otherwise, if these concepts are unavailable or un-idiomatic, they can simply be decoded into normal strings without consequence. This category includes Python, Perl, PHP, etc.
Symbols can be used as keys or values, but they are especially useful as object keys.
- Easy to learn for users, with familiar syntax that matches what is used elsewhere.
- Easy enough to read that it is usable as a configuration language.
- Very easy to parse for implementers, requiring only single-character lookahead.
- Documents can be nested in other documents without changing their content or adding a bunch of backslashes everywhere, just like YAML.
- Comments work as good or better than they do in YAML, since they can be interleaved with multi-line strings and object in a way that is often impossible in YAML.
- Unlike YAML, it is syntactically specified. This allows editors to help with formatting and navigation. This is especially helpful when documents become large. It also solves odd corner cases where YAML comment's can't appear in specific places in the file without confusing the parser.
- Symbols as a data type is introduced, allowing for easier parsing in typed languages and keyword/symbol representation in Lisps and others.
- Many of the same disadvantages carry over from JSON:
- If a NRDL document is deserialized, it cannot be serialized again and still have it be the same byte-for-byte as it was before. This is because order of the keys is lost in the parsing process, as are comments.
- Care must be taken when desigining APIs because order is not guaranteed in objects.
- Infinity and NaN are not defined.
- Prose doesn't compress multiple whitespace characters at the end of the line into one character, as YAML does. This is to ease parsing and keep the language simple.
- The literals
`true`
andtrue
are equivalent, and should be treated the same as each other. The same goes for`false`
andfalse
, and`null`
andnull
. This is, again, in service of ensuring that implementing a parser is simple. - Adding a caret (
^
) at the end of multi-line strings is important in order to ensure that NRDL is an LL(1) language, ensuring implementation simplicity. It can also be mildly annoying though.
This format was the result of a lot of research into how other serialization languages solved the same problems which I wished to solve with NRDL. These earlier languages had serious advantages and represented rock-solid design decisions. They also have shortcomings, which is why NRDL was written.
From hindsight, JSON seems obvious -- or horrifying -- depending on who is asked. Having delved the depths of serialization, and beholding its horrors, I am in awe of the JSON standard. It strikes all the right compromises, particularly when it comes to number serialization. Its syntax is super familiar too. To a first approximation, all developers know what a curly brace and a square bracket means.
Its only shortcoming is that it is too simple to be used as a configuration file language. It has no comments and no verbatim multi-line strings. These are the "killer features" of YAML, and why YAML has taken over the configuration file space to date, despite its quirks.
Finally, it doesn't support keywords. Using a serialization language that doesn't support them from a programming language that does causes friction, while using a serialization language that does support them from a programming language that doesn't feels just fine.
NRDL addresses these shortcomings with its verbatim multi-line strings, hash line comments, and support for keywords via its symbols, while largely preserving its simple, ubiquitous syntax.
The YAML creators made something really special. Theirs was the first language I used which allowed me to just indent a JSON blob and have that document able to be passed as a verbatim string to other programs using its pipe multi-line strings. Kubernetes makes heavy use of this, and I can't imagine solving these kinds of ops problems without a language that made document embedding that easy. They also made a language that, in the wild, looks really nice, almost like I was reading a simple plain-text email.
The only real shortcoming of YAML is complexity. The language is too complex to implement easily. Many languages either simply import PyYAML or libyaml, both of which were implemented by the designers of YAML themselves, or they implement a subset of YAML. They do this because it is very hard to implement the full parser.
They can get away with this because almost no one uses all of YAML's features. Type tags are both useless and confusing. Typed languages don't need them; the type of the input values are generally known ahead of time upon deserialization. Untyped language parsers don't need them either. They can generally deduce the type based on JSON field kind.
I have seen anchors and refs in the wild, but they generally just cause confusion and headache. They are too complicated to reason about and don't really solve the problem they were intended to solve.
Many other minefields and problems exist because of YAML's complexity. The Norway problem, the Billion Laughs attack, the 63 different kinds of multi-line strings. Indeed, NRDL largely agrees with StrictYAML's critiques of YAML, with the exception that NRDL recognizes the value of providing some implicit typing, at least insofar as that provided by JSON's types and literals.
Whitespace as a structure marker is also a huge problem. Tabs are forbidden, but are indistinguishable from spaces. Indentation level works to a point, but if one key contains an embedded multi-line string and the next key at the same indentation level is off the page, it's very difficult without additional tooling to tell which indentation level to which the next key belongs. Some have jokingly resorted to using a carpenter's square for this problem. This problem crops up all the time in Kubernetes manifests, for example.
NRDL seeks to preserve the good parts of YAML -- verbatim and prose multi-line strings, comments, implicit typing to support dynamic languages, pleasing syntax with emphasis on whitespace, that "text email" feel -- without the bad parts. It does this by adding just enough syntax to provide YAML's killer features and nothing else.
This approach also helps NRDL achieve simplicity in parsing. Realizing that the
viability of a technology depends on its
virality, a major goal of
NRDL is to maintain its status as an LL(1)
language, requiring only
single-character lookahead in the parser implementation. The reference
implementation of NRDL is a simple recursive-descent parser that uses
peek-char
and
read-char
.
Most languages have these or similar. This ensures that NRDL can be read without
any buffering, but can deserialized directly. The decision of maintaining LL(1)
status is a design decision that, at one stroke, drastically simplifies the
workload of parser implementers. This aims to solve the hampered implementation
proliferation problem that we see with YAML.
If NRDL has good looks, it's because it took them in part from EDN. Its treatment of commas and colons as whitespace was, in particular, stolen outright. So also was the use of whitespace as delimiters, but not as structure markers. I have a lot of respect for Rich Hickey's ability to make code and data beautiful.
However, EDN is very Clojure-centric. Both symbols and keywords are supported,
greatly complicating the syntax without providing much usefulness. People rarely
use the symbol feature, opting instead to serialize with keywords. Within
keywords, only one forward slash is allowed. This is in accordance with
Clojure's syntax, but not with other languages in the Lisp family. The M
and
N
suffixes for numbers are rather JVM-specific, and pres-suppose both integer
and floating-point types, when many languages only provide support for 64-bit
floats, such as JavaScript and Janet. The struct tags
that EDN provides share many of the disadvantages within a configuration file
context which YAML tags have. Finally, the syntax is very unfamiliar to most
developers' eyes. The Lisp community is comfortable with semicolons as comment
prefixes, but this can be jarring from other perspectives. Configuration files
are meant to be used by a wide audience.
NRDL seeks to support the Lisp family in a practical way while also ensuring that users of NRDL configuration files don't need to write Lisp to understand the file. It provides a single symbol type, corresponding to atomic strings commonly found in data within the Lisp family. It also respects the design trade-off that JSON made to just specify "numbers" and have the target language figure out what that means. In doing these things, NRDL aims to be useful within the context of as many programming languages as possible, with special attention paid to the functional and Lisp languages.
Thanks for reading this far! If you are interested in joining the effort, I would be more than happy for the help. Thanks for your consideration.
Here are some things I'd like to see for NRDL some day:
- A logo. I am no graphic designer. If you make a logo and it looks pretty good, I would be happy to add it to the repo and the matrix channel.
- Multiple implementations in multiple languages. I have implemented the Common Lisp implementation, but I would like to see one in C, Scheme, Janet, Elixir, etc. I'd be happy to add your implementation to this repository as the standard if you wish to implement such a parser.
- I don't use Emacs, but would welcome and love a contribution for a NRDL Emacs plugin. The same goes for VS Code.
Check out the vim NRDL plugin.
NRDL has its own Matrix channel (nrdl:matrix.org
).
NRDL also has an IRC channel on Libera.Chat, #nrdl
. The matrix channel is
preferred.
NRDL-text = *( sep )
value
; String section
value = string
/ symbol
/ number
/ array
/ object
object = begin-object
*sep
*(value 1*sep value)
*(1*sep value 1*sep value)
*sep
end-object
begin-object = %x7B ws ; {
end-object = %x7D ; }
array = begin-array
*sep
*value
*(1*sep value)
*sep
end-array
begin-array = %x5B ; [
end-array = %x5D ; ]
; Number
number = [ minus ]
int
[ frac ]
[ exp ]
decimal-point = %x2E ; .
digit1-9 = %x31-39 ; 1-9
e = %x65 ; e
/ %x45 ; E
exp = e
[ minus
/ plus ]
1*digit
frac = decimal-point
1*digit
int = zero
/ ( digit1-9
*digit )
zero = %x30 ; 0
one = %x31 ;
string = quoted-string
/ prose-string
/ verbatim-string
quoted-string = double-quote
*char
double-quote
char = unescaped
/ escape
/ ( %x27 ; ":U+0022
/ %x5C ; \:U+005C
/ %x2F ; /:U+002F
/ %x62 ; b:U+0008
/ %x66 ; f:U+000C
/ %x6E ; n:U+000A
/ %x72 ; r:U+000D
/ %x74 ; t:U+0009
/ %x75 4hex-digits ) ; uXXXX:
; U+XXXX
unescaped = %x20-21 ; all
/ %x23-5B ; except
/ %x5D-10FFFF ; " and \
; Prose section
prose-string = prose-line *( sep prose-line ) *sep caret
prose-line = prose-mark
line-content
line-delimiter
prose-mark = greater-than
; Verbatim section
verbatim-string = verbatim-line *( sep verbatim-line ) *sep caret
verbatim-line = verbatim-mark
line-content
line-delimiter
verbatim-mark = pipe
symbol = bareword
/ quoted-symbol
quoted-symbol = single-quote
1*symchar
single-quote
symchar = backtick-unescaped
/ escape ( %x60 ; `
/ %x5C ; \:U+005C
/ %x2F ; /:U+002F
/ %x62 ; b:U+0008
/ %x66 ; f:U+000C
/ %x6E ; n:U+000A
/ %x72 ; r:U+000D
/ %x74 ; t:U+0009
/ %x75 4hex-digits) ; uXXXX: U+XXXX
escape = %x5C ; \
backtick-unescaped = %x20-5B ; all
/ %x5D-5F ; except `
/ %x61-10FFFF ; and \
bareword = bareword-start
*( bareword-middle )
bareword-start = %x21 ; !
/ %x24-26 ; %$&
/ %x2B ; +
/ %x2F ; /
/ %x3C ; <
/ %x3D ; =
/ %x3F ; ?
/ %x40-5A ; @A-Z
/ %x5F ; _
/ %x61-7A ; a-z
/ %x80-10FF ; hand wave
bareword-middle = bareword-start
/ digit
/ plus
/ minus
/ dot
bareword-end = bareword-start
/ digit
/ plus
/ minus
/ dot
comment = comment-start
line-content
line-delimiter
comment-start = hash
sep = ws
*( comment
ws )
ws = blank
/ line-delimiter
/ colon
/ comma
hex-digit = digit ; 0-9
/ %x41-46 ; A-F
/ %x61-66 ; a-f
line-content = %x09 ; all
/ %x20-10FFF ; but \r and \n
line-delimiter = %x0D %x0A ; \r\n
/ %x0A ; \n
/ %0D ; \r
blank = %x20 ; Space
/ %x09 ; \t
comma = %x2C ; ,
colon = %x3A ; :
hash = %x43 ; #
dot = %x2E ; .
plus = %x2B ; +
minus = %x2D ; -
double-quote = %x27 ; "
single-quote = %x22 ; '
backtick = %x60 ; `
fslash = %x2F ; /
pipe = %x7C ; |
digit = %x30-39 ; 0-9
greater-than = %x3E ; >
caret = %x5E ; ^