implement images in POD #128

dk · 2020-10-24T09:58:54Z

The patch allows pod formatters to recognize new image targets,
and include these images in the output. The new image target is
described as YAML, such as:

  =begin image

  src:image.png
  title: I<Fig.1>

  =end image

or simply

  =for image src:image.png

Both older parsers and the formatters that are not aware of the new
functionality will retain full backward compatibility. There are added
extra provisions for them, to render POD instead of images:

  =for image-text src:image.png

  I<text fallback>

  =for image-cut

See more:

lib/Pod/Simple/Images.pod - syntax extensions
lib/Pod/Simple/HTML.pm - image-aware HTML formatters
lib/Pod/Simple/XHTML.pm
lib/Pod/Simple/YAML.pm - YAML 1.2 parser

The patch allows pod formatters to recognize new image targets, and include these images in the output. The new image target is described as YAML, such as: =begin image src:image.png title: I<Fig.1> =end image or simply =for image src:image.png Both older parsers and the formatters that are not aware of the new functionality will retain full backward compatibility. There are added extra provisions for them, to render POD instead of images: =for image-text src:image.png I<text fallback> =for image-cut See more: - lib/Pod/Simple/Images.pod - syntax extensions - lib/Pod/Simple/HTML.pm - image-aware HTML formatters - lib/Pod/Simple/XHTML.pm - lib/Pod/Simple/YAML.pm - YAML 1.2 parser

marekro · 2020-10-26T11:56:55Z

As much as I welcome adding more features to POD, I can't help but being sceptical about images (side note: IMHO tables would be more important). A couple of questions/comments:

image paths
The examples refer to e.g. "=for image src:onion.png", and let's assume we're rendering to HTML. If the path is relative (like in the example), then we end up with <img src="onion.png">? How does this image file get to the folder where the *.html is generated in? To make it even more complicated, if we were to render Pod::Simple::XHTML, then the HTML file should be written to Pod/Simple/XTML.html, right? And even if the onion.png file is right next to XHTML.pm or XHTML.pod, how does it get to the output target? Or is this meant to be done by the caller separately?
This concern applies to other target formats, e.g. LaTeX. Browsers like Tk::Pod would just need to find the image file - is that meant to be along with the .pm/.pod source?
embedding images
To solve the above, an option would be to embed the image source in the POD itself, such that the rendered HTML result would be e.g.: <img src="data:image/gif;base64,...... /> This could even open up the door for rendering images in ASCII on the fly. Or perhaps allow to specify an alternative ASCII-style version of the image (instead of just an "alt" text).

Thank you for the work you are investing into this!

dk · 2020-10-26T12:12:55Z

@marekro Hi Marek,

#1 I've addressed in Images.pod:

This field should point to an image file, with the same path considerations as
in perlpod's C<< F<> >> command. I.e., C<src:img.png> will tell the formatter
to generate path to the image C<img.png> that is to be found in the current
directory.

If FILEPATH starts with /, that means the root is the distribution root, so that
f ex online services like github or metacpan can correctly trace and point image
requests to their internal storage.

Generally it is a good idea to either store image files together with .pod files, 
or, if there are also .pm files, create a special directory for images.

which I mean should be a guideline what to do when one want images in the same directory. F ex the XHTML formatter is not responsible for installing images together with html files, but Makefile.PL is; it should it its job. LaTeX though very much can be made responsible, because it can only understand EPS files, and it's usually not a part of Makefile.PL. I would prefer that image installation should only be Makefile.PL's business, however if pod2html and pod2latex would introduce extra options for this, that should be okay I believe.

#2: I've considered this but left it unimplemented (see YAML.pm):

Types marked as C<!!type> are not recognized, but tolerated. Specifically,
binary format in form C<!!binary BASE64_TEXT> does not decode base64.  If the
image tag will evolve to have image bits embedded in a document, this will
be implemented.

So, entirely possible with either =for image src: !!binary BYTES or =for image raw: !!binary BYTES, not sure which one is better. The reason for not implementing it that I wanted to cut some unnecessary but clean-to-implement pieces to focus on the central idea. So if/when the patch will merge, I'll work on that next.

lindleyw · 2020-11-03T14:20:08Z

If the goal for these images to be diagrams, would a text-based diagram description language like Pic be more suited to readability and version-control? See for example Pikchr, which is actively being developed and "is designed to be embedded in fenced code blocks of Markdown or similar mechanisms of other documentation markup languages."

Most of the brute-force image substitutes in Pod, such as in the Mojolicious Growing Guide, seem to be diagrams that Pic / Pikchr would be well suited to creating, in a human-readable and editable fashion.

dk · 2020-11-03T14:30:54Z

@lindleyw No the goal is not diagrams, the goal is image files, preferrably in jpeg/png/gif/webp format.

and what to do if image is not found

khwilliamson · 2020-11-05T19:38:35Z

Do you think this is now ready to merge; I see you made some recent trivial changes.

Grinnz · 2020-11-05T20:44:03Z

In my opinion using YAML for this is significantly overcomplicating things. Pod renderers should be able to be trivially implemented. A simple text based format would be perfectly sufficient.

dk · 2020-11-05T21:42:16Z

@khwilliamson Yes it is ready. These changes were rather microscopic.

@Grinnz I'm not sure I understand your point. If you're talking about a pod renderer, not a pod parser, then it doesn't even know that there is yaml under the hood. The renderer receives a simple perl hash such as { src => 'path' } and that's it.

Grinnz · 2020-11-05T21:44:34Z

The parser component having to understand YAML places even more burden in the wrong place, IMO.

dk · 2020-11-05T21:47:06Z

Indeed it does, but if we're going to define a simple text based format instead of yaml, then we're going to define a specification, and a bnf syntax, and what not. Yaml might be an overkill for the task but it saves us equally, if not greater burden to define and standartize such simple text format. Which might not necessarily become simple in the end.

Grinnz · 2020-11-05T21:48:34Z

Defining a text format means we can define it to be only as complex as the task at hand requires. YAML is incredibly complex to support and recommend even when we only need a tiny subset of it.

dk · 2020-11-05T21:55:29Z

I'm not convinced that work and definition of the format for the task at hand if easier and better suited than a well-understood format, albeit more complex. There is no guarantee that this task at hand will require a simple syntax, too, - the img tag is open for extension, if someone wants later multi-resolution support, for example, syntax for widths: [200,300,500] will be already there. The yaml is complex, yes, but this is a finite complexity. The new syntax will always be a content point, and will require lots of effort later when extending the syntax. A technical debt of sorts.

Finally, if a pod parser writer consider it hard to parse yaml, which is a fair point, given that cpan's YAML is obsolete, then it's okay simply ignore these sections. Or simply matching /src:\s+(\S+) would cover 99% cases anyway.

karenetheridge · 2020-11-05T21:58:59Z

FWIW the only in-core YAML parser is CPAN::Meta::YAML, which is a very stripped down implementation and only intended to support the very limited syntax of META.yml files.

dk · 2020-11-05T22:02:22Z

@karenetheridge the pull request includes Pod::Parser::YAML which is a rather (but not completely) full implementation, a single file of 19K/857 loc. If it comes in the core then it can be used for anything YAML, and the argument that 'yaml is complex' will be mostly moot

Grinnz · 2020-11-05T22:02:27Z

I'll also point out that this is currently implemented in the PR by bundling yet another pure-perl YAML parser implementation that has to be tested and maintained separately from the ones already on CPAN. I'm not sure how this is less burden than a specific syntax parser.

dk · 2020-11-05T22:04:47Z

Well I see it that way: a yaml parser will require burden of test/maintenance. While a custom other syntax parser will require burden of specification, documentation, and again, test/maintenance.

Grinnz · 2020-11-05T22:13:18Z

I'll put it this way. The entire rest of the pod specification is a simple paragraph based text format which requires minimal understanding and complexity even to manually parse - the most complicated aspect is getting the character encoding right. The use of YAML is very out of step with that design. And yes, it can be ignored by parsers - many specific comment types are - but Pod::Simple is practically speaking the reference parser, and if parsers ignore it then renderers won't be able to use it.

karenetheridge · 2020-11-05T22:18:26Z

What configuration needs to be parsed right now? It appears to be whitespace-separated key-value pairs. Why is YAML needed to parse that?

I'm against including a full-fledged YAML parser here. "If it comes in the core then it can be used for anything YAML" is not a selling point.

dk · 2020-11-05T22:26:38Z

I don't agree that pod is that simple. C<< embedded<>brackets >> syntax, E<&html_codepoints;> , Z<> breaks, these things are not trivial to implement in a pod parser. So are you saying that E<&html_codepoint;> that has literally piece of HTML embedded, is not a step out of that design? Why them YAML is?

@karenetheridge if core doesn't want use YAML it's fine, it's not a selling point for the core. The module is needed to parse image tags. The reason behing YAML is that we need multiline entries, and if we're going to implement something 'simple', as Dan suggests, that 'simple' needs to be discussed, documented, standardized etc which will be a huge pain by itself, and will not be simple after all. I see YAML's selling point is that it is well-understood, well-known, and gives great support for different styles of multiline scalars.

perlpunk · 2020-11-05T22:35:04Z

I haven't looked at the code of Pod::Simple::YAML yet. Try running it against the YAML Test Suite.
So I don't know what it allows and what not.

I just wanted to comment in general that providing a "simple" subset of YAML always has the same problems that YAML::Tiny and others have:

A real subset should only parse valid YAML and error otherwise.
If it accepts invalid YAML, then it is also a superset

The goal of the most "simple" YAML parsers is to be, well, simpler to implement than a real parser.
But to implement a subset, one has to be very careful of what's allowed and what not.
You can look at YAML::Tiny issues for example.

So, if people write YAML that is accepted by Pod::Simple::YAML, then it might be impossible to read that YAML with a different parser (in perl or other languages).

I would advise to call the format something like SimpleYAML, ... and be really clear that it is not a subset of YAML.

dk · 2020-11-05T22:42:41Z

@perlpunk fair point, but I believe this implementation should be (meant to be, at least) a subset, except for that one reservation where space in "key: value" is not required - that was for my own sense of beauty as in =for image src:foo.png. I didn't think in terms of subset/superset, so and I'm not 100% sure that the space is critical, ie other YAML parsers cannot read yaml without it. But if it does more harm than good, the space requirement can be reinstated.

dk · 2020-11-05T22:50:13Z

Also it's an interesting idea to run it against the YAML test suite, but it's not quite it because the module by definition will fail on valid yaml (anchors, f ex), but what's more interesting if it consumes invalid yaml - can it do that? If yes, running through the suite will be the next step.

Grinnz · 2020-11-05T22:54:52Z

except for that one reservation where space in "key: value" is not required - that was for my own sense of beauty as in =for image src:foo.png.

To me this seems contradictory with "yaml is well-understood/well-known" (which I disagree with, for the record) - this would imply that this modification would be against well-understood syntax.

dk · 2020-11-05T23:00:51Z

As said, I'm open to remove it. The argument of parsing pod in other languages escaped me, but I absolutely don't want these other languages to write a SimpleYAML that is not yaml. I'd rather sacrifice the space if other languages can plug their standard YAML modules.

perlpunk · 2020-11-05T23:04:52Z

Also it's an interesting idea to run it against the YAML test suite, but it's not quite it because the module by definition will fail on valid yaml (anchors, f ex)

The test suite has a list of tags for each test, like tag and alias/anchor. So you can require that these tests should fail.

but what's more interesting if it consumes invalid yaml - can it do that?

Not sure what you mean here. It should fail on invalid YAML.

dk · 2020-11-05T23:06:47Z

I didn't look closely, but what I want is to use only the part of the suite that tests that invalid yaml fails, as intended. Can I extract these invalid yamls easily? (That's for step #1, the #2 will be adapting the suite to fail for things I know should fail)

perlpunk · 2020-11-05T23:14:52Z

I didn't look closely, but what I want is to use only the part of the suite that tests that invalid yaml fails, as intended. Can I extract these invalid yamls easily?

Yes, invalid YAML tests have an error tag and data point (file in the data branch).

But it should also fail on things which are allowed in YAML 1.2, but not supported in your subset.
The current result of

---
a : b
c: &x d
e: *x

is

$VAR1 = {
          'c' => 'd',
          'e' => '',
          'a' => 'b'
        };

which is wrong.
So loading valid YAML with a different result is as bad as loading invalid YAML, IMHO.
Another thing is, you seem to ignore document-end markers ....

dk · 2020-11-05T23:19:10Z

Thank you this very valuable. Aliases I can fix so these will fail. As for ... I need to check for use-cases; but I guess this is going to be the part of the suite run.

perlpunk · 2020-11-05T23:32:22Z

Also for special types like booleans check this out: https://perlpunk.github.io/yaml-test-schema/schemas.html
In YAML 1.2 Core (the default) more than just true and false are recognized as booleans.
Also you mention binary in the pod, but that type is not in the default 1.2 Schema anymore anyway.

dk · 2020-11-05T23:36:27Z

btw what would you say about nulls? I represent these as empty strings, not undefs, to not burden pod formatters with if defined check. Would that stand or you see a problem with this?

perlpunk · 2020-11-08T12:14:07Z

btw what would you say about nulls? I represent these as empty strings, not undefs [...]

It's a question of what you call your parser and what it is documented to do.
In the last years we've been trying to get more YAML parsers behave the same, and there is still a lot to do: https://matrix.yaml.io/
That matrix (and the test suite) is mostly about the syntax, and there is https://github.com/perlpunk/yaml-test-schema
If you deviate from the default, then it should be documented.

The YAML Test Suite from https://github.com/yaml/yaml-test-suite.git was used as a testbed for the following: 1) the parser converts valid yaml into expected perl structures 2) the parser fails on invalid yaml 3) the parser fails when input contains unimplemented features All three pass now fine, so with the exception of aliases,anchors,tags,and explicit keys, the module finally can call itself a YAML subset parser.

dk · 2020-11-12T21:52:14Z

@perlpunk I think I managed to run the parser through the suite, and while I had to skip on lots of tests, the parser got significantly more robust now. I'd very much like to ask you if you could possibly take a look at the current state of things, to see if the module can, actually, be called a yaml parser.

Upd: there's xt/yaml-test-suite.t that contains most info about what's left unimplemented

dk · 2020-11-17T19:49:59Z

@khwilliamson Karl, what do you think of the status now? I'd love to have Tina to bless the patch, as I think I addressed every concern, and some more. But there's not much happening..

karenetheridge · 2020-11-18T01:08:46Z

I'm strongly against bringing a YAML decoder into core for this purpose. All the necessary information should be possible to convey in a simpler format.

Grinnz · 2020-11-18T01:44:24Z

To put a finer point on it, I disagree with requiring a YAML parser for any part of the POD spec.

dk · 2020-11-18T08:25:31Z

@karenetheridge What I'm trying to bring here is not a full yaml parser but a rather reduced yaml parser, decidedly not implementing lots of yaml stuff. By cutting those corners yaml in full spec becomes a much more simpler format, not much more complex than any other format that should support image features that will be needed for C<=for image>.

So let me repeat the pro points what do we gain by having a reduced YAML:

The syntax is well known
Specification is already defined
Test suite is already there
Pod/Image syntax doesn't need extra work in case it needs to be having new features
Other languages get POD/Image parsing automatically

I really would like to understand why is it you guys against it. Consider this: okay we try to define another sub-language for Pod/Image. What's immediately needs to happen:

We need to define the syntax
Then formalize the specs
And also writing the tests -- with cornercases
Again, writing another module for parsing it
If we forgot something, then start from begining, rinse, repeat.
Finally, adoption time will be longer because it's a new syntax with a single implementation

Who is going to do that? Are you going to go through all of it? Not me, because when I invested lots of time asking around for opinions, what is best and what is not, your opinion was not in the discussion. I'm not going to throw away all that code and effort and start from scratch because of this. And as a result, we won't have images in the pod. Everyone is happy.

What's your conterpoints, or is it against for the sake of against? The reduced YAML parser module is a single file with 1100 lines including comments. How that could be complex if Pod::Simple::BlackBox.pm alone is 2500 lines, and Pod::Simple.pm is 1600 more?

(Finally, there actually is already even more simple yaml parser for doing META.yml, in the core)

perlpunk · 2020-11-18T10:46:52Z

Just a note, I'm extremely busy with personal stuff right now, and checking if a module implements a (subset of a) YAML parser is quite some work. I'm not sure when I'll have time to look at it.

dk · 2020-11-18T11:04:45Z

@perlpunk thanks, no problem!

dk · 2020-11-25T07:57:11Z

@khwilliamson Karl, should we, in your opinion, wait for Tina to take a look at the patch? I believe I addressed all her strategical concerns, passing the test suite and failing on improper/unknown yaml. If she or anyone has comments about finer point about syntax supported or unsupported that could be fixed easily.

xenu · 2020-12-07T16:29:14Z

Seriously, we don't need a YAML parser to parse a key-value list, it's extreme overkill. Also, I don't think that 1143 line parser can be considered "simple".

xenu · 2020-12-07T16:34:32Z

Consider this: okay we try to define another sub-language for Pod/Image. What's immediately needs to happen:

You already defined a brand new language and it's much more complicated than it should be.

briandfoy · 2020-12-17T07:54:36Z

I do quite a bit with Pod and most of my books are written in Pod. I'd handle this all in basic, existing Pod with the translators I've created myself (because you're supposed to create your own translators).

=begin figure images/lnp6_1301.png

The chicken before the egg

=end figure

dk · 2020-12-17T08:39:30Z

@briandfoy that would work for a book as a single target media that is capable of graphics. However when you want to display something on a man page or plain text render or old pod parser that is not enough, unfortunately; there needs to be some sort of a fallback.

dk · 2020-12-17T09:11:42Z

@xenu "it's much more complicated than it should be." -- care to define how complicated it should be?

briandfoy · 2020-12-17T10:54:36Z

@briandfoy that would work for a book as a single target media that is capable of graphics.

I use it to output to several different formats at once, including some that are not capable of graphics. There is no single target. Since I can already add what targets I accept and how I handle them, there's nothing more I need from Pod::Simple to do what you are trying to do. I handle it all one level up.

dk · 2020-12-17T11:05:20Z

@briandfoy I don't really understand how would you generate a fallback text on a non-graphic media. F.ex. if I want image.png on html/latex and "this is some text" instead on a man page/text, how would it look in your syntax?

dk mentioned this pull request Oct 24, 2020

[feature] Add =image to perl pod Perl/perl5#18169

Open

minor text and comment fixes

9bb92ac

dk and others added 2 commits November 3, 2020 15:46

document supported image formats

bfed085

and what to do if image is not found

Typo fixes

354c8ab

dk added 2 commits November 13, 2020 11:01

exclude the suite test from the automated testing

1a6848f

fixed for 5.6.1 -- shouldn't it be dropped?

de43e94

implement images in POD #128

Are you sure you want to change the base?

implement images in POD #128

Conversation

dk commented Oct 24, 2020

marekro commented Oct 26, 2020 • edited Loading

dk commented Oct 26, 2020

lindleyw commented Nov 3, 2020 • edited Loading

dk commented Nov 3, 2020

khwilliamson commented Nov 5, 2020

Grinnz commented Nov 5, 2020

dk commented Nov 5, 2020

Grinnz commented Nov 5, 2020

dk commented Nov 5, 2020

Grinnz commented Nov 5, 2020

dk commented Nov 5, 2020

karenetheridge commented Nov 5, 2020

dk commented Nov 5, 2020

Grinnz commented Nov 5, 2020

dk commented Nov 5, 2020

Grinnz commented Nov 5, 2020

karenetheridge commented Nov 5, 2020

dk commented Nov 5, 2020

perlpunk commented Nov 5, 2020

dk commented Nov 5, 2020

dk commented Nov 5, 2020

Grinnz commented Nov 5, 2020

dk commented Nov 5, 2020

perlpunk commented Nov 5, 2020

dk commented Nov 5, 2020 • edited Loading

perlpunk commented Nov 5, 2020

dk commented Nov 5, 2020 • edited Loading

perlpunk commented Nov 5, 2020 • edited Loading

dk commented Nov 5, 2020

perlpunk commented Nov 8, 2020

dk commented Nov 12, 2020 • edited Loading

dk commented Nov 17, 2020

karenetheridge commented Nov 18, 2020

Grinnz commented Nov 18, 2020

dk commented Nov 18, 2020 • edited Loading

perlpunk commented Nov 18, 2020

dk commented Nov 18, 2020

dk commented Nov 25, 2020

xenu commented Dec 7, 2020

xenu commented Dec 7, 2020

briandfoy commented Dec 17, 2020

dk commented Dec 17, 2020 • edited Loading

dk commented Dec 17, 2020

briandfoy commented Dec 17, 2020

dk commented Dec 17, 2020

marekro commented Oct 26, 2020 •

edited

Loading

lindleyw commented Nov 3, 2020 •

edited

Loading

dk commented Nov 5, 2020 •

edited

Loading

dk commented Nov 5, 2020 •

edited

Loading

perlpunk commented Nov 5, 2020 •

edited

Loading

dk commented Nov 12, 2020 •

edited

Loading

dk commented Nov 18, 2020 •

edited

Loading

dk commented Dec 17, 2020 •

edited

Loading