🚨 If you have not already, please consider filling out this short 2-minute survey to make usfm-grammar even better: 👉 link
An elegant USFM parser (or validator) that uses a parsing expression grammar to model USFM. The grammar is written using ohm. Supports USFM 3.x.
The parsed USFM is an intuitive and easy to manipulate JSON structure that allows for painless extraction of scripture and other content from the markup. USFM Grammar is also capable of reconverting the generated JSON back to USFM.
Currently, the parser is implemented in JavaScript. But it is possible to re-use the grammar and port this library into other programming languages too. Contributions are welcome!
Note: Refer to the docs for more information, such as the disclaimer and release notes.
- USFM validation
- USFM to JSON converter with 2 different levels of strictness
- JSON to USFM converter
- CSV/TSV converter for both USFM and JSON
- Command Line Interface (CLI)
Try out the usfm-grammar based online converter: https://usfm-grammar-revant.netlify.app/
[Screenshots: Input USFM | Parsed JSON Output | Parsed JSON with only filtered Scripture Content]
The converted JSON structure adheres to the JSON Schema defined here.
The converted JSON uses USFM marker names as its property names along with the following additional names:
book, bookCode, description, meta, chapters, contents, verseNumber, verseText, attributes, defaultAttribute, closing, footnote, endnote, extended-footnote, cross-ref, extended-cross-ref, caller (used within notes), list, table, header (used within table), milestone and namespace.
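To make the property names listed above more concrete, the following sketch walks a hand-written object shaped roughly like the converted JSON and collects its verses. The object literal is an illustrative assumption and not actual parser output: only book, bookCode, chapters, contents, verseNumber and verseText come from the list above, while chapterNumber and the exact nesting are guesses that should be checked against the JSON Schema.

```javascript
// Hand-written sample, NOT real parser output. `chapterNumber` and the
// nesting shown here are assumptions made for illustration only.
const sample = {
  book: { bookCode: 'PSA' },
  chapters: [
    {
      chapterNumber: '1',
      contents: [
        { p: null },
        {
          verseNumber: '1',
          verseText: 'Blessed is the one who does not walk in step with the wicked',
        },
      ],
    },
  ],
};

// Collect one record per entry in `contents` that carries a verseNumber.
// `collectVerses` is a hypothetical helper, not part of usfm-grammar.
function collectVerses(doc) {
  const verses = [];
  for (const chapter of doc.chapters || []) {
    for (const item of chapter.contents || []) {
      if (item && item.verseNumber !== undefined) {
        verses.push({
          book: doc.book.bookCode,
          chapter: chapter.chapterNumber,
          verse: item.verseNumber,
          text: item.verseText,
        });
      }
    }
  }
  return verses;
}

const verses = collectVerses(sample);
console.log(verses);
```

Entries without a verseNumber (such as the paragraph marker in the sample) are skipped, so the helper returns scripture text only.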
The parser is available on NPM and can be installed by:
npm install usfm-grammar
To use this tool from the command line, install it globally:
npm install -g usfm-grammar
Then, to convert a valid USFM file into JSON (on stdout), run the following from the command line (terminal):
usfm-grammar /path/to/file.usfm
$ usfm-grammar -h
usfm-grammar <file>

Parse/validate USFM 3.x to/from JSON.

Positionals:
  file  The path of the USFM or JSON file to be parsed and/or converted. By
        default, auto-detects input USFM and converts it into JSON and
        vice-versa.

Options:
  -l, --level    Level of strictness in parsing. This defaults to `strict`.
                 [choices: "relaxed"]
  --filter       Filter out content from input USFM. Not applicable for input
                 JSON or for CSV/TSV output. [choices: "scripture"]
  -o, --output   The output format to convert input into.
                 [choices: "csv", "tsv", "usfm", "json"]
  -h, --help     Show help [boolean]
  -v, --version  Show version number [boolean]
The options -l (--level) and --filter do not have any effect if used for JSON to USFM conversion.
USFMParser.toJSON()
USFMParser.toJSON(grammar.FILTER.SCRIPTURE)
const grammar = require('usfm-grammar');
var input = '\\id PSA\n\\c 1\n\\p\n\\v 1 Blessed is the one who does not walk in step with the wicked or stand in the way that sinners take or sit in the company of mockers,';
const myUsfmParser = new grammar.USFMParser(input);
// Returns JSON representation of a valid input USFM string
var jsonOutput = myUsfmParser.toJSON();
// Returns a simplified (scripture-only) JSON representation while excluding other USFM content
var scriptureJsonOutput = myUsfmParser.toJSON(grammar.FILTER.SCRIPTURE);
Note
If you intend to re-convert a USFM from the generated JSON, we recommend using .toJSON() without the grammar.FILTER.SCRIPTURE option in order to retain all information of the original USFM file.
relaxed Mode
There is a high chance that a USFM file you encounter in the wild is not fully valid according to the specification. To accommodate such cases and provide parseable output to work with, we created a relaxed mode. It may be used as shown:
const myRelaxedUsfmParser = new grammar.USFMParser(input, grammar.LEVEL.RELAXED);
var jsonOutput = myRelaxedUsfmParser.toJSON();
The relaxed mode skips checking several rules in the USFM specification. It tries hard to accommodate non-standard USFM markup and attempts to generate a JSON output for it. Only the most important markers are checked for, such as the \id at the start and the presence of \c and \v markers. Though all the markers in the input USFM file are preserved in the generated JSON output, their syntax and their positions in the file are not verified for correctness. Even misspelled markers would be accepted!
Caution: Errors may go unnoticed in relaxed mode, which might lead to loss of information. For example, if the input USFM erroneously lacks a space between the verse marker and the verse number (e.g. \v3), the parser in relaxed mode would treat it as a separate marker (v3 as opposed to v) and fail to recognise it as a verse. The right (or the hard) thing to do is to fix the markup according to the specification. We generally recommend using the grammar in the default mode.
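The \v3 pitfall described in the caution can be illustrated with a deliberately naive marker scan. This is only a sketch of why a missing space changes the marker name. The regular expression and the scanMarkers helper are assumptions made for illustration and are not the grammar used by usfm-grammar.

```javascript
// Naive marker scan: a backslash followed by letters/digits, with an optional
// trailing '*' for closing markers. A simplification for illustration only;
// this is NOT how usfm-grammar itself recognises markers.
function scanMarkers(usfm) {
  return Array.from(usfm.matchAll(/\\([a-z0-9]+\*?)/gi), (match) => match[1]);
}

const wellFormed = '\\c 1\n\\p\n\\v 3 Some verse text';
const malformed = '\\c 1\n\\p\n\\v3 Some verse text';

console.log(scanMarkers(wellFormed)); // [ 'c', 'p', 'v' ]
console.log(scanMarkers(malformed)); // [ 'c', 'p', 'v3' ]
```

Because nothing separates the marker from the number in the second input, the scan reads v3 as the marker name, so a consumer looking for v markers would miss that verse.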
USFMParser.validate()
// Returns a Boolean indicating whether the input USFM text satisfies the grammar or not.
// This method is available in both default and relaxed modes.
var isUsfmValid = myUsfmParser.validate();
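Since validate() is available in both modes, one possible pattern is to try the default mode first and fall back to relaxed mode only when strict validation fails. The sketch below uses only the calls shown in this document (USFMParser, LEVEL.RELAXED, validate(), toJSON()). The parseWithFallback helper and the level labels it returns are hypothetical, and the module is passed in as a parameter so the snippet has no hard dependency.

```javascript
// `grammar` is expected to be the module returned by require('usfm-grammar').
// `parseWithFallback` and the 'strict' / 'relaxed' labels in its result are
// hypothetical names introduced for this sketch.
function parseWithFallback(grammar, usfmText) {
  // Prefer the default (strict) mode, as recommended above.
  const strictParser = new grammar.USFMParser(usfmText);
  if (strictParser.validate()) {
    return { level: 'strict', json: strictParser.toJSON() };
  }

  // Fall back to relaxed mode for non-standard markup.
  const relaxedParser = new grammar.USFMParser(usfmText, grammar.LEVEL.RELAXED);
  if (relaxedParser.validate()) {
    return { level: 'relaxed', json: relaxedParser.toJSON() };
  }

  // Neither mode accepted the input.
  return { level: null, json: null };
}
```

A call would look like parseWithFallback(require('usfm-grammar'), input). Recording which level succeeded makes it easier to flag files that still need their markup fixed.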
Note
- The input JSON should have been generated by usfm-grammar (or in the same format).
- If a USFM file is converted to JSON and then back to USFM, the re-created USFM will have the same contents but spacing and new-lines will be normalized.
JSONParser.toUSFM()
const myJsonParser = new grammar.JSONParser(jsonOutput);
// Returns the original USFM that was previously converted to JSON
let reCreatedUsfm = myJsonParser.toUSFM();
This method works with JSON output created with or without the grammar.FILTER.SCRIPTURE option.
JSONParser.validate()
// Returns a Boolean indicating whether the input JSON conforms to grammar.JSONSchemaDefinition.
var isJsonValid = myJsonParser.validate();
- USFMParser.toCSV()
- JSONParser.toCSV()
- USFMParser.toTSV()
- JSONParser.toTSV()
// Example usage:
// Returns CSV and TSV from a USFM, respectively
var csvString = myUsfmParser.toCSV();
var tsvString = myUsfmParser.toTSV();
The toCSV() and toTSV() methods return a tabular representation of the verses in the format:
<BOOK, CHAPTER, VERSE-NUMBER, VERSE-TEXT>
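As a rough illustration of the row layout above, the sketch below builds one CSV line from the four fields. It is not the library's implementation: csvField and toCsvRow are hypothetical helpers, and the RFC 4180 style quoting is an assumption, so the exact quoting produced by toCSV() may differ.

```javascript
// Quote a field when it contains a comma, a double quote or a line break
// (RFC 4180 style). Assumed behaviour; the library's quoting may differ.
function csvField(value) {
  const text = String(value);
  if (/[",\r\n]/.test(text)) {
    return '"' + text.replace(/"/g, '""') + '"';
  }
  return text;
}

// Build one row in the order BOOK, CHAPTER, VERSE-NUMBER, VERSE-TEXT.
function toCsvRow(book, chapter, verseNumber, verseText) {
  return [book, chapter, verseNumber, verseText].map(csvField).join(',');
}

const row = toCsvRow('PSA', 1, 1, 'Blessed is the one, who does not walk in step with the wicked');
console.log(row);
// PSA,1,1,"Blessed is the one, who does not walk in step with the wicked"
```

Quoting matters here because verse text routinely contains commas, which would otherwise split a single verse across several columns.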