
Commit 78146c0

Merge pull request #19 from aradi/formatting
Formatting
2 parents 4f7e74d + 3bb47ad commit 78146c0

16 files changed, 916 additions and 370 deletions

docs/api.rst

Lines changed: 6 additions & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -19,7 +19,6 @@ High level routines
1919
.. autofunction:: hsd.dump
2020

2121

22-
2322
Lower level building blocks
2423
===========================
2524

@@ -31,3 +30,9 @@ Lower level building blocks
3130

3231
.. autoclass:: hsd.HsdDictBuilder
3332
   :members:
33+
34+
.. autoclass:: hsd.HsdDictWalker
35+
   :members:
36+
37+
.. autoclass:: hsd.HsdFormatter
38+
   :members:

docs/conf.py

Lines changed: 4 additions & 0 deletions
@@ -12,6 +12,8 @@
1212
#
1313
import os
1414
import sys
15+
import doctest
16+
1517
sys.path.insert(0, os.path.abspath('../src'))
1618

1719
# -- Project information -----------------------------------------------------
@@ -37,6 +39,8 @@
3739

3840
autodoc_member_order = 'bysource'
3941

42+
doctest_default_flags = doctest.NORMALIZE_WHITESPACE
43+
4044
# Add any paths that contain templates here, relative to this directory.
4145
templates_path = ['_templates']
4246
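A note on the flag added to `docs/conf.py` above: `doctest.NORMALIZE_WHITESPACE` makes doctest treat any run of whitespace, including line breaks, as equivalent when comparing expected and actual output. This presumably lets long dictionary outputs be wrapped in the documentation. The following is a minimal standard-library sketch; the helper name `passes` and the sample snippet are illustrative assumptions, not part of the commit.

```python
import doctest

# Hypothetical doc snippet: the expected output is wrapped over two lines,
# while print() emits the dictionary on a single line.
EXAMPLE = """
>>> print({'scc': True, 'temperature': 77})
{'scc': True,
 'temperature': 77}
"""


def passes(docstring, flags=0):
    """Return True if every example in `docstring` passes under `flags`."""
    test = doctest.DocTestParser().get_doctest(docstring, {}, "sketch", None, 0)
    runner = doctest.DocTestRunner(verbose=False, optionflags=flags)
    # Swallow the failure report; only the counts are of interest here.
    results = runner.run(test, out=lambda text: None)
    return results.failed == 0


strict = passes(EXAMPLE)
relaxed = passes(EXAMPLE, doctest.NORMALIZE_WHITESPACE)
print(strict, relaxed)
```

With the default flags the wrapped output does not match; with `NORMALIZE_WHITESPACE` it does.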

docs/hsd.rst

Lines changed: 266 additions & 0 deletions
@@ -0,0 +1,266 @@
1+
**************
2+
The HSD format
3+
**************
4+
5+
General description
6+
===================
7+
8+
You can think of the Human-readable Structured Data format as a pleasant
9+
representation of a tree structure. It can represent a subset of what you
10+
can do for example with XML. The following constraints compared
11+
to XML apply:
12+
13+
* Every non-empty node of the tree contains either further nodes
14+
or data, but never both.
15+
16+
* Every node may have a single (string) attribute only.
17+
18+
These constraints allow a very natural looking formatting of the data.
19+
20+
As an example, let's have a look at a data tree, which represents input
21+
for scientific software. In the XML representation, it could be written as ::
22+
23+
    <Hamiltonian>
24+
      <Dftb>
25+
        <Scc>Yes</Scc>
26+
        <Filling>
27+
          <Fermi>
28+
            <Temperature attrib="Kelvin">77</Temperature>
29+
          </Fermi>
30+
        </Filling>
31+
      </Dftb>
32+
    </Hamiltonian>
33+
34+
The same information can be encoded in a much more natural and compact form in HSD
35+
format as ::
36+
37+
    Hamiltonian {
38+
      Dftb {
39+
        Scc = Yes
40+
        Filling {
41+
          Fermi {
42+
            Temperature [Kelvin] = 77
43+
          }
44+
        }
45+
      }
46+
    }
47+
48+
The content of a node is passed either between an opening and a closing
49+
curly brace or after an equals sign. In the latter case the end of the line will
50+
be the closing delimiter. The attribute (typically the unit of the data
51+
which the node contains) is specified between square brackets after
52+
the node name.
53+
54+
The equals sign can be used to assign data as a node content (provided
55+
the data fits into one line), or to assign a single child node as content
56+
for a given node. This leads to a compact and expressive notation for those
57+
cases, where (by the semantics of the input) a given node is only allowed to
58+
have a single child node as content. The tree above is a piece of a typical
59+
DFTB+ input, where only one child node is allowed for the nodes ``Hamiltonian``
60+
and ``Filling``, respectively (they specify the type of the Hamiltonian
61+
and the filling function). By making use of equals signs, the
62+
simplified HSD representation can be as compact as ::
63+
64+
    Hamiltonian = Dftb {
65+
      Scc = Yes
66+
      Filling = Fermi {
67+
        Temperature [Kelvin] = 77
68+
      }
69+
    }
70+
71+
and still represent the same tree.
72+
73+
74+
Mapping to dictionaries
75+
=======================
76+
77+
Being basically a subset of XML, HSD data is best represented as an XML
78+
DOM-tree. However, very often a dictionary representation is more desirable,
79+
especially when the language used to query and manipulate the tree offers
80+
dictionaries as primary data type (e.g. Python). The data in an HSD input
81+
can be easily represented with the help of nested dictionaries and lists. The
82+
input from the previous section would have the following representation as a
83+
Python dictionary (or as a JSON formatted input file)::
84+
85+
    {
86+
      "Hamiltonian": {
87+
        "Dftb": {
88+
          "Scc": True,
89+
          "Filling": {
90+
            "Fermi": {
91+
              "Temperature": 77,
92+
              "Temperature.attrib": "Kelvin"
93+
            }
94+
          }
95+
        }
96+
      }
97+
    }
98+
99+
The attribute of a node is stored under a special key containing the name of
100+
the node and the ``.attrib`` suffix.
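The mapping described above can be sketched as a small renderer that walks a nested dictionary and emits HSD-like text, attaching the value stored under the ``.attrib`` key to its node. This is an illustrative sketch only, not the implementation in this package: the function name ``to_hsd_lines`` is hypothetical, and rendering ``True``/``False`` as ``Yes``/``No`` is an assumption based on the examples in this document.

```python
ATTRIB_SUFFIX = ".attrib"  # suffix described in the text above


def to_hsd_lines(tree, depth=0):
    """Render a nested dict as a list of HSD-like lines (sketch only)."""
    pad = "  " * depth
    lines = []
    for key, value in tree.items():
        if key.endswith(ATTRIB_SUFFIX):
            continue  # attributes are written together with their node
        attrib = tree.get(key + ATTRIB_SUFFIX)
        name = f"{key} [{attrib}]" if attrib is not None else key
        if isinstance(value, dict):
            # A node containing further nodes becomes a braced block.
            lines.append(f"{pad}{name} {{")
            lines.extend(to_hsd_lines(value, depth + 1))
            lines.append(f"{pad}}}")
        else:
            # Assumption: logical values are written as Yes / No.
            if isinstance(value, bool):
                value = "Yes" if value else "No"
            lines.append(f"{pad}{name} = {value}")
    return lines


tree = {
    "Hamiltonian": {
        "Dftb": {
            "Scc": True,
            "Filling": {
                "Fermi": {"Temperature": 77, "Temperature.attrib": "Kelvin"}
            },
        }
    }
}
lines = to_hsd_lines(tree)
print("\n".join(lines))
```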
101+
102+
One slight complication of the dictionary representation arises in the case
103+
of a node which has multiple child nodes with the same name ::
104+
105+
    <ExternalField>
106+
      <PointCharges>
107+
        <GaussianBlurWidth>3</GaussianBlurWidth>
108+
        <CoordsAndCharges>
109+
          3.3 -1.2 0.9 9.2
110+
          1.2 -3.4 5.6 -3.3
111+
        </CoordsAndCharges>
112+
      </PointCharges>
113+
      <PointCharges>
114+
        <GaussianBlurWidth>10</GaussianBlurWidth>
115+
        <CoordsAndCharges>
116+
          1.0 2.0 3.0 4.0
117+
          -1.0 -2.0 -3.0 -4.0
118+
        </CoordsAndCharges>
119+
      </PointCharges>
120+
    </ExternalField>
121+
122+
While the HSD representation has no problem coping with this situation ::
123+
124+
    ExternalField {
125+
      PointCharges {
126+
        GaussianBlurWidth = 3
127+
        CoordsAndCharges {
128+
          3.3 -1.2 0.9 9.2
129+
          1.2 -3.4 5.6 -3.3
130+
        }
131+
      }
132+
      PointCharges {
133+
        GaussianBlurWidth = 10
134+
        CoordsAndCharges {
135+
          1.0 2.0 3.0 4.0
136+
          -1.0 -2.0 -3.0 -4.0
137+
        }
138+
      }
139+
    }
140+
141+
a trick is needed for the dictionary / JSON representation, as multiple keys
142+
with the same name are not allowed in a dictionary. Therefore, the repetitive
143+
nodes will be mapped to one key, which will contain a list of dictionaries
144+
(instead of a single dictionary as in the usual case)::
145+
146+
    {
147+
      "ExternalField": {
148+
        // Note the list of dictionaries here!
149+
        "PointCharges": [
150+
          {
151+
            "GaussianBlurWidth": 3,
152+
            "CoordsAndCharges": [
153+
              [3.3, -1.2, 0.9, 9.2],
154+
              [1.2, -3.4, 5.6, -3.3]
155+
            ]
156+
          },
157+
          {
158+
            "GaussianBlurWidth": 10,
159+
            "CoordsAndCharges": [
160+
              [1.0, 2.0, 3.0, 4.0 ],
161+
              [-1.0, -2.0, -3.0, -4.0 ]
162+
            ]
163+
          },
164+
        ]
165+
      }
166+
    }
167+
168+
The mapping works in both directions, so that this dictionary (or the JSON file
169+
created from it) can be easily converted back to the HSD form again.
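Because a node name maps to a single dictionary in the usual case and to a list of dictionaries when the node is repeated, code that reads the tree has to handle both shapes. One possible way is a small accessor that always returns a list. The helper name ``child_nodes`` is hypothetical and not part of the package.

```python
def child_nodes(node, name):
    """Return every child called `name` as a list (possibly empty)."""
    value = node.get(name)
    if value is None:
        return []
    # Repeated nodes are stored as a list of dictionaries ...
    if isinstance(value, list) and all(isinstance(item, dict) for item in value):
        return value
    # ... while a single node (or plain data) is stored directly.
    return [value]


field = {
    "ExternalField": {
        "PointCharges": [
            {"GaussianBlurWidth": 3},
            {"GaussianBlurWidth": 10},
        ]
    }
}
single = {"ExternalField": {"PointCharges": {"GaussianBlurWidth": 3}}}

widths = [pc["GaussianBlurWidth"]
          for pc in child_nodes(field["ExternalField"], "PointCharges")]
single_widths = [pc["GaussianBlurWidth"]
                 for pc in child_nodes(single["ExternalField"], "PointCharges")]
print(widths, single_widths)
```

The calling code can then loop over the result without checking which of the two shapes it received.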
170+
171+
172+
Processing related information
173+
==============================
174+
175+
In addition to the data stored in an HSD-file, further processing related
176+
information can be recorded on demand. The current Python implementation is able
177+
to record the following additional data for each HSD node:
178+
179+
* the line, where the node was defined in the input (helpful for printing out
180+
informative error messages),
181+
182+
* the name of the HSD node, as found in the input (useful if the tag names are
183+
converted to lower case to ease case-insensitive handling of the input) and
184+
185+
* whether an equals sign was used to open the block.
186+
187+
If this information is being recorded, a special key with the
188+
``.hsdattrib`` suffix will be generated for each node in the dictionary/JSON
189+
representation. The corresponding value will be a dictionary with this
190+
information.
191+
192+
As an example, let's store the input from the previous section ::
193+
194+
    Hamiltonian = Dftb {
195+
      Scc = Yes
196+
      Filling = Fermi {
197+
        Temperature [Kelvin] = 77
198+
      }
199+
    }
200+
201+
in the file `test.hsd`, parse it and convert the node names to lower case
202+
(to make the input processing case-insensitive). Using the Python command ::
203+
204+
    inpdict = hsd.load("test.hsd", lower_tag_names=True, include_hsd_attribs=True)
205+
206+
will yield the following dictionary representation of the input::
207+
208+
    {
209+
      'hamiltonian.hsdattrib': {'equal': True, 'line': 0, 'tag': 'Hamiltonian'},
210+
      'hamiltonian': {
211+
        'dftb.hsdattrib': {'line': 0, 'tag': 'Dftb'},
212+
        'dftb': {
213+
          'scc.hsdattrib': {'equal': True, 'line': 1, 'tag': 'Scc'},
214+
          'scc': True,
215+
          'filling.hsdattrib': {'equal': True, 'line': 2, 'tag': 'Filling'},
216+
          'filling': {
217+
            'fermi.hsdattrib': {'line': 2, 'tag': 'Fermi'},
218+
            'fermi': {
219+
              'temperature.attrib': 'Kelvin',
220+
              'temperature.hsdattrib': {'equal': True, 'line': 3,
221+
                                        'tag': 'Temperature'},
222+
              'temperature': 77
223+
            }
224+
          }
225+
        }
226+
      }
227+
    }
228+
229+
The recorded line numbers can be used to issue helpful error messages with
230+
information about where the user should search for the problem.
231+
The node names and the formatting information about the equals sign ensure
232+
that the formatting is similar to the original HSD, if the data is dumped
233+
into the HSD format again. Dumping the dictionary with ::
234+
235+
    hsd.dump(inpdict, "test2-formatted.hsd", use_hsd_attribs=True)
236+
237+
would indeed yield ::
238+
239+
    Hamiltonian = Dftb {
240+
      Scc = Yes
241+
      Filling = Fermi {
242+
        Temperature [Kelvin] = 77
243+
      }
244+
    }
245+
246+
which is basically identical to the original input. If the additional
247+
processing information is not recorded when the data is loaded, or
248+
it is not considered when the data is dumped as HSD again ::
249+
250+
    inpdict = hsd.load("test.hsd", lower_tag_names=True)
251+
    hsd.dump(inpdict, "test2-unformatted.hsd")
252+
253+
the resulting formatting will differ more strongly from the original HSD ::
254+
255+
    hamiltonian {
256+
      dftb {
257+
        scc = Yes
258+
        filling {
259+
          fermi {
260+
            temperature [Kelvin] = 77
261+
          }
262+
        }
263+
      }
264+
    }
265+
266+
Still nice and readable, but less compact and with different casing.
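The use of the recorded line numbers for error messages, mentioned earlier in this section, could look roughly like the sketch below. It reads the ``.hsdattrib`` entry of a node and builds a label pointing the user to the right place in the input. The helper name ``describe`` is hypothetical. The example dictionary in this document stores the original node name under ``'tag'``, while ``src/hsd/common.py`` in the same commit defines ``HSD_ATTRIB_NAME = "name"``, so the sketch accepts either key. Treating the recorded line as zero-based is an assumption drawn from the ``'line': 0`` entry for the first input line.

```python
HSD_ATTRIB_SUFFIX = ".hsdattrib"  # suffix described in the text above


def describe(parent, key):
    """Build a 'where to look' label for the node stored under `key`."""
    info = parent.get(key + HSD_ATTRIB_SUFFIX, {})
    # Original spelling of the node name, if it was recorded.
    name = info.get("name", info.get("tag", key))
    line = info.get("line")
    if line is None:
        return f"'{name}'"
    # Assumption: recorded lines are zero-based, so add one for display.
    return f"'{name}' (line {line + 1})"


fermi = {
    "temperature.attrib": "Kelvin",
    "temperature.hsdattrib": {"equal": True, "line": 3, "tag": "Temperature"},
    "temperature": 77,
}

label = describe(fermi, "temperature")
if fermi["temperature"] < 100:
    print(f"Suspiciously low value for {label}")
```

If the processing information was not recorded, the helper falls back to the lower-cased key without a line number.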

docs/index.rst

Lines changed: 1 addition & 0 deletions
@@ -11,4 +11,5 @@ HSD-python documentation
1111
:maxdepth: 2
1212

1313
introduction
14+
hsd
1415
api

docs/introduction.rst

Lines changed: 6 additions & 6 deletions
@@ -5,13 +5,13 @@ Introduction
55
This package contains utilities to read and write files in the Human-friendly
66
Structured Data (HSD) format.
77

8-
The HSD-format is very similar to both JSON and YAML, but tries to minimize the
8+
The HSD-format is very similar to XML, JSON and YAML, but tries to minimize the
99
effort for **humans** to read and write it. It omits special characters as much
10-
as possible (in contrast to JSON) and is not indentation dependent (in contrast
11-
to YAML). It was developed originally as the input format for the scientific
12-
simulation tool (`DFTB+ <https://github.com/dftbplus/dftbplus>`_), but is
13-
of general purpose. Data stored in HSD can be easily mapped to a subset of JSON
14-
or XML and vica versa.
10+
as possible (in contrast to XML and JSON) and is not indentation dependent (in
11+
contrast to YAML). It was developed originally as the input format for the
12+
scientific simulation tool (`DFTB+ <https://github.com/dftbplus/dftbplus>`_),
13+
but is of general purpose. Data stored in HSD can be easily mapped to a subset
14+
of JSON, YAML or XML and *vice versa*.
1515

1616

1717
Installation

src/hsd/__init__.py

Lines changed: 7 additions & 4 deletions
@@ -7,7 +7,10 @@
77
"""
88
Toolbox for reading, writing and manipulating HSD-data.
99
"""
10-
from .dictbuilder import HsdDictBuilder
11-
from .eventhandler import HsdEventHandler
12-
from .io import load, load_string, dump, dump_string
13-
from .parser import HsdParser
10+
from hsd.common import HSD_ATTRIB_LINE, HSD_ATTRIB_EQUAL, HSD_ATTRIB_SUFFIX,\
11+
    HSD_ATTRIB_NAME, HsdError
12+
from hsd.dict import HsdDictBuilder, HsdDictWalker
13+
from hsd.eventhandler import HsdEventHandler, HsdEventPrinter
14+
from hsd.formatter import HsdFormatter
15+
from hsd.io import load, load_string, dump, dump_string
16+
from hsd.parser import HsdParser

src/hsd/common.py

Lines changed: 15 additions & 7 deletions
@@ -7,6 +7,11 @@
77
"""
88
Implements common functionalities for the HSD package
99
"""
10+
try:
11+
    import numpy as np
12+
except ModuleNotFoundError:
13+
    np = None
14+
1015

1116

1217
class HsdError(Exception):
@@ -26,18 +31,21 @@ def unquote(txt):
2631
# Suffix to mark attribute
2732
ATTRIB_SUFFIX = ".attrib"
2833

29-
# Length of the attribute suffix
30-
LEN_ATTRIB_SUFFIX = len(ATTRIB_SUFFIX)
31-
3234
# Suffix to mark hsd processing attributes
3335
HSD_ATTRIB_SUFFIX = ".hsdattrib"
3436

35-
# Lengths of hsd processing attribute suffix
36-
LEN_HSD_ATTRIB_SUFFIX = len(HSD_ATTRIB_SUFFIX)
37-
37+
# HSD attribute containing the original tag name
38+
HSD_ATTRIB_NAME = "name"
3839

40+
# HSD attribute containing the line number
3941
HSD_ATTRIB_LINE = "line"
4042

43+
# HSD attribute marking that a node is equal to its only child (instead of
44+
# containing it)
4145
HSD_ATTRIB_EQUAL = "equal"
4246

43-
HSD_ATTRIB_TAG = "tag"
47+
# String quoting delimiters (must be at least two)
48+
QUOTING_CHARS = "\"'"
49+
50+
# Special characters
51+
SPECIAL_CHARS = "{}[]= "
