[Markdown] Prepare Web/API docs for Markdowning #7898

Closed
13 tasks done
wbamberg opened this issue Aug 14, 2021 · 16 comments
Labels
Content:WebAPI Web API docs

Comments

@wbamberg
Collaborator

wbamberg commented Aug 14, 2021

This is a tracking bug for the work we have to do to prepare the Web/API documentation for Markdown conversion.

@wbamberg
Collaborator Author

wbamberg commented Sep 1, 2021

Resolving <sup> and <sub>

There's an issue listed above to decide what to do about <sup> and <sub> in Web/API. Per https://developer.mozilla.org/en-US/docs/MDN/Contribute/Markdown_in_MDN#superscript_and_subscript, we can use these HTML elements in our Markdown if we want to. My sense is that authors will make different choices here - some people really want to minimise the amount of HTML we have, and others place more value on the better-looking results of using the HTML elements.

For Web/API there are no uses of <sup> and 134 uses of <sub>. However, all the uses of sub are in just three pages:

https://developer.mozilla.org/en-US/docs/Web/API/RTCIceCandidateStats/priority (just two here)
https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/blendFunc
https://developer.mozilla.org/en-US/docs/Web/API/WebGLRenderingContext/blendFuncSeparate

Since these usages are in just a few pages, I'm going to propose we keep them.

@hamishwillee
Collaborator

Since these usages are in just a few pages, I'm going to propose we keep them.

I'd go further and say that unless the <sub> isn't actually needed and can be removed we should always keep it. There isn't an alternative in markdown.

More generally, markdown is there to help editors and reviewers. In the cases where it doesn't, we should have no qualms about using HTML. Main case in point being tables.

@wbamberg
Collaborator Author

wbamberg commented Sep 3, 2021

Tables in Web/API

There are 980 tables in the Web/API documentation. Of these we are not converting 584.

Properties tables

Of the unconverted tables, 362 are .properties tables. These get a different style from normal tables (see for example https://developer.mozilla.org/en-US/docs/Web/API/Document/visibilitychange_event). Of the .properties tables, 340 are for event pages and 22 are for audio nodes (e.g. https://developer.mozilla.org/en-US/docs/Web/API/BiquadFilterNode).

I think these are not getting converted because they have a header column. Even if they didn't, we might decide we like this different styling enough to want to leave these tables as HTML, so they can still get the .properties class. There was some previous discussion of this in #7890, and the outcome was to keep them in HTTP: #7815.

It would be good to have a consistent policy here - if we want to keep them in HTTP then we probably also want to keep them in Web/API, and we should document table.properties as a specific tool authors can use in these situations.

The rest

So if we don't consider those .properties tables, that leaves us with 618 tables, of which 396 (64%) are being converted to Markdown, and 222 (36%) are staying as HTML.
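As a quick sanity check on the arithmetic above, here is a minimal Python sketch. The figures are the ones quoted in this comment; the variable names and the `share` helper are illustrative only.

```python
# Figures quoted in the comment above.
total_tables = 980
properties_tables = 362
converted = 396
staying_html = 222

# Tables left once the .properties tables are set aside.
remaining = total_tables - properties_tables


def share(part: int, whole: int) -> int:
    """Return part/whole as a percentage rounded to the nearest integer."""
    return round(100 * part / whole)


print(remaining, share(converted, remaining), share(staying_html, remaining))
```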

Are we converting too few tables?

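This proportion is more or less in line with other areas we've migrated so far. I've looked through the tables that aren't being converted, and there are a few themes:

  • tables that are legitimately complex (fancy colspans, or even just column and row headers, like https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto#supported_algorithms). I don't think we should try to change these
  • tables that have cells containing block elements, especially lists. Again I don't think changing these is in scope for this project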

Then there are clusters that use non-standard page structures that involve lots of tables. This includes the IndexedDB docs, lots of the file-related docs, and some of the SVG docs. For example:
https://developer.mozilla.org/en-US/docs/web/api/directoryentrysync
https://developer.mozilla.org/en-US/docs/web/api/fileentrysync
https://developer.mozilla.org/en-US/docs/web/api/idbcursor/continue
https://developer.mozilla.org/en-US/docs/web/api/idbcursorsync
https://developer.mozilla.org/en-US/docs/web/api/svganimatedlength

It would be good to fix these up to use standard structures, but that's a lot of work that I don't want to include now.

Finally there are tables that are "accidentally" unconvertible, that could be made convertible with small changes that improve the pages anyway. I've tried to address these in #8588.

Are we converting too many tables?

GFM tables are nice for simple tables but become unusable when the table rows get too long. We should pick a max length and refuse to convert tables longer than that. I propose 150 or 200 characters as the max.

Of the 396 converted tables:

  • 155 are > 150 characters wide, when converted to Markdown
  • 88 are > 200 characters wide, when converted to Markdown

It would be nice if the conversion tool helped us here, but I think we can do this in content, if we can agree on max length.

@teoli2003
Contributor

Properties tables

Let's keep them for the moment (for events). When we have a better idea, they will be easy to find & replace. Note that there are a few events that have a <dl> instead of the table. I find this ugly and we should convert them to a .properties table.

tables that are legitimately complex (fancy colspans, or even just column and row headers, like https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto#supported_algorithms). I don't think we should try to change these

Agreed.

tables that have cells containing block elements, especially lists. Again I don't think changing these is in scope for this project

Do we have many of these? It may be interesting to have a look at these and to decide on a case by case basis.

Then there are clusters that use non-standard page structures that involve lots of tables. This includes the IndexedDB docs, lots of the file-related docs, and some of the SVG docs. For example:
https://developer.mozilla.org/en-US/docs/web/api/directoryentrysync
https://developer.mozilla.org/en-US/docs/web/api/fileentrysync
https://developer.mozilla.org/en-US/docs/web/api/idbcursor/continue
https://developer.mozilla.org/en-US/docs/web/api/idbcursorsync
https://developer.mozilla.org/en-US/docs/web/api/svganimatedlength

I think we should convert these to the modern structure. It is annoying to do, but it has to be done anyway and that way we won't drag these old structures into the new shiny Markdown world.

If you have a list, happy to give you a hand.

Are we converting too many tables?

GFM tables are nice for simple tables but become unusable when the table rows get too long. We should pick a max length and refuse to convert tables longer than that. I propose 150 or 200 characters as the max.

I am not sure this is so easy to define. Automatic line breaks are making quite a few "long" tables shorter on display, aren't they?

@hamishwillee
Collaborator

@teoli2003 @wbamberg All sounds very pragmatic.

I'm leaning more and more towards the view that the "default" should be to not convert tables to Markdown. They make things harder to edit and add no benefit to review, so they add no value in Markdown.

Anyway, IMO

  • 150 chars width
  • Let's agree on a bare minimum set of classes that we're happy to see applied to tables, document them, and move on. The default should be none.
  • ideally prettify all HTML tables automatically

@wbamberg
Collaborator Author

wbamberg commented Sep 6, 2021

To help answer these questions I've written up an analysis for the pages that contain (non-.properties) tables that aren't currently getting converted: https://docs.google.com/spreadsheets/d/11v4TQStmtjXh5GSlFPggq5hqqXZ4QikqcdVUnz27UQU/edit#gid=493027305

Column B lists the reason the table was unconverted. There are a few patterns here as I said:

  • "col/row headers", "col headers": tables that use header columns
  • "rowspan"/"colspan": tables that use rowspan/colspan
  • "cell containing ...": tables that contain various types of block elements - usually paragraphs and lists
  • "table-related page style": pages that use tables to implement some obsolete nonstandard way of documenting APIs
  • a special subset of tables that use column headers is SVG pages that look like they want to be using .properties tables, but didn't know about them
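To make the first three categories concrete, here is a rough Python sketch of how one HTML table could be checked against them. It is not the conversion tool's logic. The `BLOCK_TAGS` set, the class and function names, and the use of `th[scope="row"]` as the signal for a header column are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumed set of block elements that a GFM table cell cannot hold.
BLOCK_TAGS = {"p", "ul", "ol", "dl", "pre"}


class TableReasons(HTMLParser):
    """Collect reasons why one HTML table cannot become a GFM table."""

    def __init__(self):
        super().__init__()
        self.reasons = set()
        self.cell_depth = 0  # greater than 0 while inside a <td> or <th>

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        attrs = dict(attrs)
        if tag in ("td", "th"):
            self.cell_depth += 1
            if "rowspan" in attrs or "colspan" in attrs:
                self.reasons.add("rowspan/colspan")
            if tag == "th" and attrs.get("scope") == "row":
                self.reasons.add("row headers")
        elif tag in BLOCK_TAGS and self.cell_depth:
            self.reasons.add(f"cell containing <{tag}>")

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self.cell_depth:
            self.cell_depth -= 1


def unconvertible_reasons(table_html: str) -> set[str]:
    """Return the set of reasons found; an empty set means none were detected."""
    parser = TableReasons()
    parser.feed(table_html)
    parser.close()
    return parser.reasons


example = '<table><tr><th scope="row">Bubbles</th><td><p>Yes</p></td></tr></table>'
print(sorted(unconvertible_reasons(example)))
```

The "table-related page style" category is a judgement about the whole page, so a per-table check like this would not catch it.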

Properties tables

Let's keep them for the moment (for events). When we have a better idea, they will be easy to find & replace. Note that there are a few events that have a <dl> instead of the table. I find this ugly and we should convert them to a .properties table.

tables that are legitimately complex (fancy colspans, or even just column and row headers, like https://developer.mozilla.org/en-US/docs/Web/API/SubtleCrypto#supported_algorithms). I don't think we should try to change these

Agreed.

tables that have cells containing block elements, especially lists. Again I don't think changing these is in scope for this project

Do we have many of these? It may be interesting to have a look at these and to decide on a case by case basis.

So one thing that came out of this analysis is that there are quite a few tables that use <p> where they don't have to, like https://developer.mozilla.org/en-US/docs/web/api/idbkeyrange/only#exceptions for example. I'm going to file a PR to fix these.

In general though I don't think we should try to change pages that have cells containing multiple paragraphs or lists (for example, https://developer.mozilla.org/en-US/docs/web/api/idbcursor/continue#exceptions). It's quite reasonable for authors to want to do things like this, and I think undoing changes like this would make the docs worse. We do have to acknowledge and work around the limitations of GFM table markup.

Then there are clusters that use non-standard page structures that involve lots of tables. This includes the IndexedDB docs, lots of the file-related docs, and some of the SVG docs. For example:
https://developer.mozilla.org/en-US/docs/web/api/directoryentrysync
https://developer.mozilla.org/en-US/docs/web/api/fileentrysync
https://developer.mozilla.org/en-US/docs/web/api/idbcursor/continue
https://developer.mozilla.org/en-US/docs/web/api/idbcursorsync
https://developer.mozilla.org/en-US/docs/web/api/svganimatedlength

I think we should convert these to the modern structure. It is annoying to do, but it has to be done anyway and that way we won't drag these old structures into the new shiny Markdown world.

If you have a list, happy to give you a hand.

Well there is a list in the spreadsheet, but I don't really want to do this for the Markdown work. It's a substantial amount of work including adding a fair few new pages. And it's not really related to Markdown. And we are dragging all kinds of terrible things into the Markdown world already :).

Are we converting too many tables?

GFM tables are nice for simple tables but become unusable when the table rows get too long. We should pick a max length and refuse to convert tables longer than that. I propose 150 or 200 characters as the max.

I am not sure this is so easy to define. Automatic line breaks are making quite a few "long" tables shorter on display, aren't they?

I'm not sure what "automatic line breaks" means here. If you mean soft wrapping added by editors, the way I'm calculating this is with a Python script that reads a converted MD file line by line hunting for GFM tables, and reporting the line length. I don't think this is going to be affected by that. Or did you have something else in mind?
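For illustration, a check along these lines could look like the sketch below. This is not the actual script, only an assumed reconstruction of the idea: the `DELIMITER_ROW` pattern and the function names are hypothetical, and the 150-character default is simply one of the limits proposed in this thread.

```python
import re

# A GFM delimiter row such as "| --- | :---: |", used to spot a table.
DELIMITER_ROW = re.compile(r"^\s*\|?\s*:?-+:?\s*(\|\s*:?-+:?\s*)*\|?\s*$")


def table_widths(markdown: str) -> list[int]:
    """Return the length of the longest line of each GFM table found."""
    lines = markdown.splitlines()
    widths = []
    i = 1
    while i < len(lines):
        # Treat a line with pipes followed by a delimiter row as a table start.
        if "|" in lines[i - 1] and "|" in lines[i] and DELIMITER_ROW.match(lines[i]):
            start = i - 1
            end = i + 1
            # The table runs until the first line without a pipe.
            while end < len(lines) and "|" in lines[end]:
                end += 1
            widths.append(max(len(line) for line in lines[start:end]))
            i = end + 1
        else:
            i += 1
    return widths


def too_wide(markdown: str, max_width: int = 150) -> list[int]:
    """Return the widths of the tables that exceed max_width characters."""
    return [w for w in table_widths(markdown) if w > max_width]
```

Because it measures the raw characters on each line of the file, soft wrapping in an editor makes no difference to the result.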

@wbamberg
Collaborator Author

wbamberg commented Sep 6, 2021

I'm leaning more and more towards the view that the "default" should be to not convert tables to Markdown. They make things harder to edit and add no benefit to review, so they add no value in Markdown.

I'm not sure what work "default" is doing here. There are really only 3 options:

  1. force only GFM tables, and ban any table structures which make them unconvertible
  2. use only HTML tables
  3. sometimes use GFM tables and sometimes use HTML tables

I don't think anyone's arguing for (1) so we're choosing (2) or (3).

(2) has the great benefit of simplicity and clarity in the meta-documentation. However I think there are a substantial number of tables (probably somewhere between a third and a half) in MDN which work well as GFM tables. If we're prepared to accept that, and that the value they add is worth the extra meta-documentation complexity, then we are really just discussing the conditions in which we want GFM tables. I'm trying to document these conditions in https://developer.mozilla.org/en-US/docs/MDN/Contribute/Markdown_in_MDN#tables. If we want to adjust this in specific ways then I'm happy to do so - and I think defining a max line length is worth doing, for example.

150 chars width

+1

Let's agree a bare minimum set of classes

I think this is .properties only, and as a long-term thing we should aim at pushing these into metadata and generating the tables, as we already do for CSSInfo.

@hamishwillee
Collaborator

I'm not sure what work "default" is doing here.

Ah, I mean our "intent" is to convert as many tables as possible to GFM - we try to find ways to move to GFM if we can and the table still works. I was suggesting a change in intent, to really only look at the very small subset that really suits GFM.

I am leaning towards "2" because as an editor I am not sure how many tables I have seen where using GFM will actually make things easier to edit. You're suggesting "(probably somewhere between a third and a half)". I'd say that is around the number you can convert and which look OK. That doesn't make them easier to edit and review changes in. Try to edit any table with more than 2 columns or over a page width of chars and you'll see what I mean - spotting the change is hard in review, and fixing the lines to have the nice layout is irritating.

The number of easy to edit GFM tables would be much smaller - IMO order of 5 to 10%. Those with 2 columns, a small number of rows, no need for multiple paragraphs or bullets, width less than a page for all lines.

That said, I'll go with the consensus. What we are doing now is completely workable.

@wbamberg
Collaborator Author

wbamberg commented Sep 7, 2021

I'm not sure what work "default" is doing here.

Ah, I mean our "intent" is to convert as many tables as possible to GFM - we try to find ways to move to GFM if we can and the table still works. I was suggesting a change in intent, to really only look at the very small subset that really suits GFM.

I am leaning towards "2" because as an editor I am not sure how many tables I have seen where using GFM will actually make things easier to edit. You're suggesting "(probably somewhere between a third and a half)". I'd say that is around the number you can convert and which look OK. That doesn't make them easier to edit and review changes in. Try to edit any table with more than 2 columns or over a page width of chars and you'll see what I mean - spotting the change is hard in review, and fixing the lines to have the nice layout is irritating.

The number of easy to edit GFM tables would be much smaller - IMO order of 5 to 10%. Those with 2 columns, a small number of rows, no need for multiple paragraphs or bullets, width less than a page for all lines.

I'm not sure about those numbers and wouldn't be without doing some real analysis. In general I am sympathetic to your worries here - I agree that there is a lot of space for tables that are technically Markdownable but are not actually usable in Markdown format, and I worry that we are not getting this right.

That said: to me this isn't about "intent" so much as defining clear guidelines for what makes a table suitable for Markdown, beyond the hard requirements (i.e. no block elements in cells, fancy rowspan...). Line length is definitely a good one here. I'm not sure about number of cols or rows. We don't currently consider this, I think 2 cols is pretty limiting, and I'm not sure how limiting row count is useful.

I find that it's difficult to get a sense about what is usable - which restrictions we should define - without spending some time working with these tables. So I hope it's good enough to make the best call we can for now, and be open to revisiting this later on if we want to adjust things.

One thing I'm definitely not doing is making changes that regress the content, and most of the changes clearly improve it. For example, many of the "make tables Markdownable" changes are to add proper headings to tables that should have them but don't. That's an improvement to the content whether it stays in HTML or gets converted to GFM.

@hamishwillee
Collaborator

I'm not sure about number of cols or rows. We don't currently consider this, I think 2 cols is pretty limiting, and I'm not sure how limiting row count is useful.

Yes, I made up the numbers :-).
Here is a GFM table. If I just increase one cell beyond the max then I have to edit every single row.

| Heading 1 | Heading 2       |
| ----------| ----------------|
| Row 1 Cell 1 | Row 1 Cell 2 |
| Row 2 Cell 1 | Row 2 - 2    |

| Heading 1 | Heading 2       |
| ----------| ----------------|
| Row 1 Cell 1 | Row 1 Cell 2 |
| I hate my life | Row 2 - 2    |

We don't auto-fix the layout (do we?) so by default that is manual work to align on change. It is fine for small tables, but quickly becomes painful. With more columns the alignment can become more annoying.

One thing I'm definitely not doing is making changes that regress the content, and most of the changes clearly improve it. For example, many of the "make tables Markdownable" changes are to add proper headings to tables that should have them but don't. That's an improvement to the content whether it stays in HTML or gets converted to GFM.

Could not agree more. All this work is useful and the end result is better.

@wbamberg
Collaborator Author

wbamberg commented Sep 7, 2021

We don't auto-fix the layout (do we?) so by default that is manual work to align on change.

Prettier will take care of this for you. I just filed #8729 to figure out how we can run Prettier on commit, but you can configure your editor to format on save.
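To show the kind of realignment a formatter can automate, here is a minimal Python sketch. It is not Prettier's algorithm, and Prettier's actual output may differ. The `realign` function is hypothetical; it assumes a well-formed table with the same number of cells in every row, no escaped pipes inside cells, and no alignment colons in the delimiter row.

```python
def realign(table: str) -> str:
    """Pad every cell so that the pipes of a GFM table line up."""
    rows = [
        [cell.strip() for cell in line.strip().strip("|").split("|")]
        for line in table.strip().splitlines()
    ]
    columns = len(rows[0])
    # Column width is the widest cell, ignoring the delimiter row (index 1),
    # with a minimum of 3 so the delimiter is at least "---".
    widths = [
        max(3, *(len(row[c]) for i, row in enumerate(rows) if i != 1))
        for c in range(columns)
    ]
    out = []
    for i, row in enumerate(rows):
        if i == 1:
            cells = ["-" * w for w in widths]  # rebuild the delimiter row
        else:
            cells = [cell.ljust(w) for cell, w in zip(row, widths)]
        out.append("| " + " | ".join(cells) + " |")
    return "\n".join(out)


edited = """
| Heading 1 | Heading 2       |
| ----------| ----------------|
| Row 1 Cell 1 | Row 1 Cell 2 |
| I hate my life | Row 2 - 2    |
"""
print(realign(edited))
```

Run on the edited example table from the previous comment, every row comes out padded to the same width, so the author only has to change the one cell.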

@wbamberg
Collaborator Author

wbamberg commented Sep 7, 2021

Re: sub and sup: I've got 3 👍 on that comment so I'm going to mark it resolved.

@wbamberg
Collaborator Author

wbamberg commented Sep 7, 2021

GFM tables are nice for simple tables but become unusable when the table rows get too long. We should pick a max length and refuse to convert tables longer than that. I propose 150 or 200 characters as the max.

Of the 396 converted tables:

* 155 are > 150 characters wide, when converted to Markdown

* 88 are > 200 characters wide, when converted to Markdown

It would be nice if the conversion tool helped us here, but I think we can do this in content, if we can agree on max length.

I've filed #8736 for this, using 150 characters as the max length.

@wbamberg
Collaborator Author

wbamberg commented Sep 9, 2021

I'd like to summarise the state of the conversation about converting tables:

Other tables (honestly not that many after all those exclusions) are being converted.

@wbamberg
Collaborator Author

wbamberg commented Sep 9, 2021

I just ran a conversion of Web/API from mdn/content at 59e9fd4 . Here's the summary conversion report:

- tr (919)
- th[scope="row"] (784)
- table.properties (363)
- kbd (259)
- table.no-markdown (226)
- td (170)
- table.standard-table (73)
- th (52)
- sub (26)
- table (9)
- th[colSpan="4"] (7)
- th[scope="col"] (4)
- th[colSpan="3"] (4)
- table.fullwidth-table (3)
- td.header (3)
- td[colSpan="2"] (3)
- th[colSpan="2"] (3)
- td[rowSpan="2"] (2)
- th[colSpan="2"][scope="col"] (2)
- th[rowSpan="2"][scope="col"] (2)
- td[colSpan="3"] (1)
- td[colSpan="5"] (1)
- dl (1)
- th[rowSpan="13"][scope="row"] (1)
- th[rowSpan="2"][scope="row"] (1)
- td[colSpan="8"] (1)
- td[colSpan="12"] (1)

(full report is at https://gist.github.com/wbamberg/06ebf797687f466a7cca503aeee13ece)

Note the unconverted dl - this is due to a PR that landed yesterday that included broken dl markup, for which I've filed a fix here: #8776.

Apart from that the only unconverted elements are:

  • kbd, which we had agreed to leave as HTML (although apparently I didn't document that yet)
  • sub which as above we will leave as HTML in these pages
  • table and related elements.

For table we can see the .properties and .no-markdown elements, which we are deliberately leaving as HTML, and a fair few .standard-table elements that are unconvertible. So this is as expected.

So I think this work is done and we can close this issue - once #8776 lands we can create a PR for the conversion.

@hamishwillee
Collaborator

So I think this work is done and we can close this issue - once #8776 lands we can create a PR for the conversion.

Agree. FWIW you said "Prettier will take care of this for you". Unless it is part of the standard yari toolchain, it is "dead to me". So no, that is not a valid assumption.

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 15, 2022

4 participants