Physical structure and display #270
The summary is spot on. But why ask for a hierarchy? These are pointlike empty elements that represent transitions in the text. Sure, there is an implicit hierarchy that actually manifests in a typical text: pagelike is higher than linelike. A gridlike hierarchy is always independent of the page-line hierarchy: if you have something like the three examples under EGD §3.6.6, it does not really make sense to ask whether the word "hole" is in "line 1 of column/block/fragment A" or in "column/block/fragment A of line 1" - it's a grid: column/block/fragment A does not contain the whole of line 1, and line 1 does not contain the whole of column/block/fragment A. The word is in the intersection of column/block/fragment A and line 1, but the two are just as independent of each other as the rows and columns of a spreadsheet. Because the text proceeds linearly from line to line, and is interrupted on its way by columns/blocks/fragments (so that the "contents" of a line are always a contiguous chunk of text, whereas the "contents" of a column/block/fragment are never), we conceive of the former as hierarchically superior to the latter, and include milestones for the beginning of a particular column/block/fragment in every line, rather than encoding the beginning of a particular line in every column/block/fragment. Headings and labels are to be displayed as you say in your summary. (And in logical and full view, pagelike and linelike elements are also displayed inline, i.e. without a new paragraph and with inline labels. But this is already implemented.) So as far as I can see, there is no need to worry any more about the hierarchy. If you need something more absolute for a technical reason, please explain a bit more, but even then it may not be possible to establish something universally. 
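The spreadsheet analogy can be illustrated with a minimal Python sketch. It assumes a hypothetical flat event stream: the tuple format, the sample words and the function names are invented here for illustration and are not taken from the EGD or from the DHARMA code. In it, lb events start a line, grid events start a column/block/fragment and are repeated in every line, and each word is located by the pair (line, cell). The final checks show that the words of a line form a contiguous run, whereas the words of a grid cell do not.

```python
from collections import defaultdict

# Hypothetical event stream (format invented for this sketch):
# ("lb", n) starts a line, ("grid", n) starts a gridlike cell,
# ("w", text) is a word of the running text.
events = [
    ("lb", "1"), ("grid", "A"), ("w", "there"), ("w", "is"),
    ("grid", "B"), ("w", "a"), ("w", "hole"),
    ("lb", "2"), ("grid", "A"), ("w", "in"), ("w", "the"),
    ("grid", "B"), ("w", "stone"),
]


def locate(events):
    """Return (line, cell, position, word) for every word in the stream."""
    line = cell = None
    located = []
    position = 0
    for kind, value in events:
        if kind == "lb":
            # A new line resets the cell: grid milestones are repeated per line.
            line, cell = value, None
        elif kind == "grid":
            cell = value
        else:
            located.append((line, cell, position, value))
            position += 1
    return located


def positions_by(located, index):
    """Group word positions by line (index 0) or by grid cell (index 1)."""
    groups = defaultdict(list)
    for record in located:
        groups[record[index]].append(record[2])
    return dict(groups)


def contiguous(positions):
    """True if the positions form one uninterrupted run."""
    return positions == list(range(positions[0], positions[0] + len(positions)))


located = locate(events)
by_line = positions_by(located, 0)
by_cell = positions_by(located, 1)
print(all(contiguous(p) for p in by_line.values()))  # lines are contiguous
print(any(contiguous(p) for p in by_cell.values()))  # grid cells are not
```

In this model the word "hole" sits at the intersection of line 1 and cell B; neither coordinate contains the other.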
In a typical case, there will be at most one pagelike segmentation, exactly one linelike segmentation and at most one gridlike segmentation in a document, and even if all three of these levels are present, they will be clearly hierarchical. But there may, very rarely (like in 0.01% of inscriptions, at a guess), be special cases, such as your example with A slightly more likely case of mixed hierarchies (say, 0.1% of inscriptions?) would be a document with more than one kind of gridlike segmentation, for example an inscription whose original layout included a gridlike setup (e.g. written across the faces of a polygonal column), and the object is also broken, so we have two or more fragments whose boundaries are not the same as the boundaries of the layout grid. See my attempt at ASCII art below, where .-s stand for the writing, which is laid out in two virtual columns (so column 1 and 2 of line 1 must be read before proceeding to line 2, otherwise the columns would be a pagelike partition, not a gridlike one), and the x-s represent a line of fracture running diagonally across the stone.
As all gridlike segmentations are independent of other hierarchies, there is again no way to subordinate the grids to the hierarchy of lines (and pages, if relevant), except by arbitrarily saying that both grids are below the lowest level of the (page-)line hierarchy. As for the hierarchisation of the two grids relative to each other, I see no other way than to arbitrarily choose which to put first in the code:
The last point in EGD §3.6.3 anticipates this scenario, but only says that different numeration schemes must be used for the two grids; it does not say anything about the order in which their milestones should be encoded. Is there any technical reason why one should be enforced in preference over the other? Or why we should altogether avoid both and try to come up with yet another extremely complex custom encoding solution for an extremely rare special case? |
I am trying to address two things: the physical display, and the referencing of elements with For the physical display, I think we are good as long as I can assume that gridlike elements must always be represented in the same way, whatever their I have spotted a single valid example of the use of several gridlike elements: DHARMA_INSEIAD00039. For the use of several pagelike elements, we have DHARMA_INSCIK00090, DHARMA_INSCIK00523 and DHARMA_INSCIK00601. Right now, I guess pagelike milestones are not supposed to appear in the TOC, to avoid confusion with |
Yes, I would be perfectly happy with that. For pagelike elements, my suggestion is to get rid of the ⎘ icon and instead show "page" for
I'll have to take a look at these when I'm back to work after the 8th of April.
I certainly think that all pagelike elements should be handled in the same way: same display formatting and same behaviour for TOC. If we're talking only about the physical display, then I think it's all right to make these look and behave more like headings, i.e. set off from text font (size, bold, whatever) and to include them in the TOC. I see that textpart headings are There's a reason why these are conceived of as "pagelike". It really helps to think of these in terms of page breaks in a book; textpart divs would then be analogous to chapters in a book. The page numbers are non-intrusive and try not to interrupt the text, since they are at points where the text as an abstract thing does not break. They are only physical, and would not necessarily be identical in a reprint of the book. Conversely, chapter headings mark the beginnings of major sections of the text and would have to be at the same points in a reprint. But when we're looking at a diplomatic edition where the physicality of the text is foregrounded, it's OK to treat these in a way similar to (but still subordinate to) chapter headings. |
OK, thank you. |
Indeed, DHARMA_INSEIAD00039 looks like a correct example of multiple (superimposed, non-hierarchical) gridlike structures. DHARMA_INSCIK00523 is in all probability incorrectly encoded: it has
The remaining INSCIK examples are more difficult because of their complexity, because I do not know what some of the objects actually look like, and because I don't understand the language. Here are my thoughts on them. These are partly for myself so that I can discuss the encoding later on with Kunthea, but I'd like you to skim these so that you and I can decide together about a policy toward hierarchical pagelike milestones. As far as display is concerned, the solutions we have discussed above should work for these too. In DHARMA_INSCIK00090, there should definitely not be two instances of
In DHARMA_INSCIK00601, we have:
And here we come back to the theoretical discussion. What I'd like you to help me decide is whether we should allow multilevel hierarchies of pagelike milestones, or forbid them outright. Given that we have only a very few cases where such a hierarchy may be desirable (I think my guesstimate of 0.01% was not far off), we may on the whole be better off if we eliminate this kind of complication from our encoding. Here are the alternatives as far as I can see them:
At the moment, I vacillate between 1 and 2; I'm not in favour of 3, since in the early days we intentionally downplayed the use of textpart divs, recommending their use in the EGD only when there is really no way to read the parts linearly as a coherent whole. What do you think?
Another theoretical question concerns the use of gridlike milestones when a grid does not affect all the lines of an inscription. I've recently had to encode a copper plate with a corner broken off, sort of like this:
(contrast my earlier ASCII art above, where the fracture affected each line).
I have provisionally gone with the third solution, but I'm really uncertain about this and would appreciate your opinion. |
For processing texts mechanically, we need to make sure our encoding remains
Pagelike milestones
I am in favor of not allowing multilevel hierarchies, at least until it becomes clear we
On the technical side, having multiple levels
Not having hierarchies would also simplify numbering rules: instead of "pagelike
With this model, we would have two basic types of computer-generated references:
Gridlike milestones
Among your three solutions, the first one and the third can be mechanically
Consider the following, for instance:
If someone encoded it with the third method, viz. by only encoding the three Overall, the third method is the most economical and probably the safest bet. If |
Hmm, thanks for the detail and clarity. This will take me some time to process and I write as I think. I'm not familiar with BNF grammar, but I've managed to work out what the notation means. I'm happy that you have formulated this and agree that it covers the "normal" cases.
Supposing that we keep allowing more than one
Yes, but as I keep saying, this is what they are meant to be. Why do we need there to be any real structure? Am I missing something?
I don't think so. The
They have to resist the temptation. The EGD says and has always said that the That said, I'm still inclined to forbid multiple pagelike hierarchies. It's not that I'm against it, it's just that I'd like to know what we might gain by doing so, and so far the slight simplification of the rules (and the resulting reduced chance of human error) is the only real advantage that I see; I do not see a significant gain in processing. Comments on that are welcome. I'll come back to the question of gridlike milestones later. |
It is not that we need a rigorous hierarchy, but that there is already one, albeit implicit. For instance, the rule:
does not apply to a structure like this:
If you do not know that Likewise, milestones order is significant. For instance, the above is not the same as:
Now, I can treat milestones as black boxes that have no relationship whatsoever, but I see from a mile away people asking me things like "can you put my 'item' milestones headings in a larger font?", "can you add more space between my 'item' milestones?", etc. Even for such seemingly simple tasks, I need to parse the text into a tree-shaped structure. This complicates the above grammar a lot. We end up with something like:
... and so on for all possible combinations of pagelike milestones. |
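To illustrate why a tree cannot be built without knowing how the units rank, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the unit name "face", the (unit, n) tuple format, the rank table and the function names are hypothetical and do not reflect the actual DHARMA processing code. The same flat sequence of pagelike milestones yields two different trees depending on which unit is declared to be the higher level.

```python
def build_tree(milestones, rank):
    """Nest a flat list of (unit, n) pagelike milestones.

    `rank` maps each unit to a level; a smaller number is a higher level.
    This table is exactly the information the encoding leaves implicit.
    """
    root = {"unit": None, "n": None, "children": []}
    stack = [root]
    for unit, n in milestones:
        # Close every open node that is at the same level or lower.
        while len(stack) > 1 and rank[stack[-1]["unit"]] >= rank[unit]:
            stack.pop()
        node = {"unit": unit, "n": n, "children": []}
        stack[-1]["children"].append(node)
        stack.append(node)
    return root


def shape(node):
    """Compact (unit, n, children) view of a tree, handy for comparison."""
    return (node["unit"], node["n"], [shape(c) for c in node["children"]])


# Hypothetical flat sequence, in document order.
flat = [("face", "A"), ("item", "1"), ("item", "2"), ("face", "B"), ("item", "1")]

faces_first = build_tree(flat, {"face": 0, "item": 1})
items_first = build_tree(flat, {"item": 0, "face": 1})
print(shape(faces_first) == shape(items_first))  # the two readings differ
```

With faces ranked above items, each face contains its items; with the ranks swapped, face B ends up nested inside item 2. A flat model with a single kind of pagelike milestone avoids the need for such a rank table altogether.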
THIS is what I've missed, thanks for pointing it out.
Please confirm this would be OK. I'll then schedule a talk with Kunthea to understand how the complex Cambodian inscriptions are laid out and tell her how best to encode them. I'm going to recommend a combination of my alternatives 2 and 3 above, i.e. to prefer a flat hierarchy with just one kind of pagelike milestone, and to fall back to textpart divs if a multilevel hierarchy is essential. Does that sound all right to you? Once we are done with that, in due time I'll write up the new rules for the next release of the EGD, and may have more questions for you about this later on. |
Re gridlike milestones: thanks for your thoughts on this. I shall then stick to the third method and prescribe it explicitly in the next EGD. |
Thanks @danbalogh and @michaelnmmeyer for working this out so patiently. As for the discussion with Kunthea, I think it might be best if I participated too, but a practical dilemma is that I will be on fieldwork in Vietnam from 13 through 28 April, so it would have to be either tomorrow or after the 28th. |
@danbalogh Yes, this works. |
Yes, indeed, thanks to both of you, Daniel and Michaël!!! |
I propose a few assumptions and clarifications to improve the mechanical processing of inscriptions' physical structure. This concerns the elements <milestone>, <pb> and <lb>.
I assume there are three types of physical elements: pagelike, linelike and gridlike. In the physical display:
pagelike elements introduce a new section or subsection, and are represented as headings.
linelike elements represent divisions within pagelike elements and are displayed as paragraphs with hanging indent
gridlike elements represent divisions within linelike and are displayed inline
So we have something like:
Now:
<pb> is always a pagelike element
<lb> is always a linelike element and is the only linelike element
<milestone> can be either a pagelike element or a gridlike element, depending on the value of @type; if type is absent, it is assumed to be gridlike
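These three rules can be sketched as a small Python function. This is only an illustration under stated assumptions: it supposes that @type carries the literal value "pagelike" for pagelike milestones, and it treats any other or missing value as gridlike; the function name and the sample XML (including the @unit values) are hypothetical.

```python
import xml.etree.ElementTree as ET


def physical_kind(elem):
    """Classify an element as 'pagelike', 'linelike' or 'gridlike'.

    Returns None for elements that are not physical markers.
    """
    tag = elem.tag.rsplit("}", 1)[-1]  # drop a namespace prefix, if any
    if tag == "pb":
        return "pagelike"
    if tag == "lb":
        return "linelike"
    if tag == "milestone":
        # Assumption: @type="pagelike" marks pagelike milestones;
        # an absent (or any other) @type is treated as gridlike.
        return "pagelike" if elem.get("type") == "pagelike" else "gridlike"
    return None


# Hypothetical sample, not taken from an actual DHARMA file.
sample = (
    '<ab>'
    '<pb n="1"/><lb n="1"/>'
    '<milestone type="gridlike" unit="column" n="A"/>'
    '<milestone unit="fragment" n="a"/>'
    '<milestone type="pagelike" unit="face" n="B"/>'
    '</ab>'
)
kinds = [physical_kind(child) for child in ET.fromstring(sample)]
print(kinds)
```

Note that this only settles how each element is displayed; it says nothing about how the elements nest.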
This clarifies the display, but still does not allow me to figure out the elements' hierarchy. For instance, if I have:
I cannot tell, in the general case, how these elements nest, so we could have, among other solutions:
If we want to have a "real" hierarchy between physical elements, the simplest solution I can think of is to use a numbering scheme that makes it possible to tell unambiguously how the physical elements fit into each other. For instance, if we have:
I can tell, just by looking at @n, that the structure is:
I am not sure yet how to make numbering unambiguous