Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Linter/Formatter for Org documentation #307

Open
jvdydev opened this issue Jul 22, 2023 · 8 comments
Open

Linter/Formatter for Org documentation #307

jvdydev opened this issue Jul 22, 2023 · 8 comments
Assignees
Labels
discussion Discuss things, make decisions documentation Improvements or additions to documentation enhancement New feature or request

Comments

@jvdydev
Copy link
Contributor

jvdydev commented Jul 22, 2023

More of a "tracking/WIP" issue, @jeffbowman mentioned in #166 the idea of building a linter/formatter for the documentation to enforce a style guide.

Had some time today, so I hacked together a small line-based regex-parser with a small linter (a bit under 300 lines of elisp).
Example output (popup buffer):

ce-linter-test-out

The core needs a rewrite, but the concept is there.

As for formatting, I'm still debating whether to make it a separate function or to replace the compile-buffer-like output with it acting on the buffer.

Just putting this up as an issue for now, ideas and suggestions welcome.

@jvdydev jvdydev added documentation Improvements or additions to documentation enhancement New feature or request labels Jul 22, 2023
@jvdydev jvdydev self-assigned this Jul 22, 2023
@jvdydev
Copy link
Contributor Author

jvdydev commented Jul 28, 2023

Small update on this.
I experimented with the initial proof-of-concept version and retro-fitted buffer-based movement onto it (instead of operating on buffer-substring).

However, while thinking about it more, I found there are a few more granular lint rules that really should be in there.
This could in the future also allow for fixing common lint issues instead of just warning about them.

These are the rules I came up with, some of these are opinionated (suggestions welcome).

Section-based

  • no-blank-line-after-section: Exactly one blank line after section
  • multiple-blank-lines-after-section: Exactly one blank line after section
  • avoid-deeply-nested-lists: May show up weird in info
  • avoid-deeply-nested-headings: May show up weird in info

Edit/For discussion:
The org-exporter seems to handle headings up to 3 levels.
Similar testing needs to be done for nesting lists for a maximum.
There probably should be a threshold of some capacity (to be tested).

Line-based

General

  • trailing-whitespace
  • line-too-long: 80 characters maximum per line
    • does not apply for attribute lines (#+)

For discussion:
Either enforce some form of indentation rules (e.g. every paragraph has to be indented by 2 spaces ... /tabs?) or make every leading whitespace (that's not list indentation) an error and use org-indent-mode.

Blocks (#+begin_ ... #+end_)

Mostly skipped, begin and end are checked for whitespace.

Attributes (#+, that are not blocks)

  • streamline metadata naming, one of:
    • avoid-capslock-metadata: (#+TITLE -> #+title)
    • use-caps-for-metadata: (#+title -> #+TITLE)

Maybe not needed, but here for discussion:

  • duplicate-attribute: multiple attribute lines of the same type

Word- or Structure-based

Comments (# )

  • fixme: TODO and FIX ME comments (# TODO, # FIX ME, # FIXME)

Links

  • broken-link: A link with broken syntax
  • prefer-info-link-over-org-link: link to info file instead of other org documents

Branding

  • crafted-emacs-branding: prefer Crafted Emacs (or crafted-emacs for repo reference)

Changelog

I update this post with suggestions instead of re-posting it over and over.

  • give options for metadata naming convention, leading-whitespace
  • expand description for duplicate-attribute and leading-whitespace, nested headings/lists
  • rename avoid-nested-lists -> avoid-deeply-nested-lists
  • fixed arrow for avoid-capslock-metadata (totally wasn't using org-mode with inline LaTeX to write the original message ... I'd never)

@jeffbowman
Copy link
Contributor

Good ideas. I'll have a think on them, most I agree with, the only one I hesitate on is org-indent-mode, as it is not a mode I use.

@jvdydev
Copy link
Contributor Author

jvdydev commented Jul 29, 2023

well, nobody has to use org-indent-mode.

I think rephrasing it like "If you want your paragraphs/subheadings to be indented, use org-indent-mode instead of manually managing whitespace."
I (personally) like this approach since it keeps the file structured in similar to most markup format, but allows for a more "pretty" representation in Emacs as an option, not a default.

However I'm obviously open to instead have a default indentation for sections (or something like that).

@jeffbowman
Copy link
Contributor

I'm not against using it, but I don't know anything about it, since I don't use it. I'll look into it. I do like the idea of a standard indent and if the module makes that easy then we should definitely investigate it.

@jeffbowman
Copy link
Contributor

Okay, I've put more thought into this, read some docs, tried a few
things, etc. Pretty much aligned with you on all points, my feedback
is below:

... This could in the future also allow for fixing common lint
issues instead of just warning about them.

I like the idea of fixing things when we can detect them and do
something intelligent. I think that feature should be on the list of
things to do. Not necessarily needed for an initial release, but a
"must-have" for a release at some point I would think.

These are the rules I came up with, some of these are opinionated
(suggestions welcome).

Opinions welcome ☺

Section-based

  • no-blank-line-after-section: Exactly one blank line after section
  • multiple-blank-lines-after-section: Exactly one blank line after
    section
  • avoid-nested-lists: May show up weird in info
  • avoid-deeply-nested-headings: May show up weird in info

Yes to all.

Questions

- What is the depth at which we warn about nested lists?  Or are
  we saying _no nested lists_?
- What is the depth at which we warn about nested headings?
  Current info files seem to allow (at least) 3 levels (usually
  noted as x.y.z numbering)

Line-based

General

  • leading-whitespace: Prefer org-indent-mode
    • does not apply to list indentation
  • trailing-whitespace
  • line-too-long: 80 characters maximum per line
    • does not apply for attribute lines (#+)
  • org-indent-mode... Jury still out on that one for me. I've read
    through the docs and tinkered with it a little. The UI for it
    throws me off a bit, as I'm not used to it. I'm happy to defer if
    that is more commonly used than not. On these lines, it could be
    added to the #+STARTUP: header as indent, so I wonder if maybe
    we generate an auto-insert template for doc files. This would allow
    us to specify it without hijacking the users configuration. Thoughts?
  • Trailing whitespace should be eliminated - so agree with this one,
    somewhat strongly even.
  • 80 character maximum is a good standard. Weird that it still exists
    in the age of HiDPI monitors and larger resolutions, but in terminal
    modes, definitely makes sense. Lowest common denominator and stuff.
    Agree with this also

Blocks (#+begin_ ... #+end_)

Mostly skipped, begin and end are checked for whitespace.

Yep. I'm all for whitespace cleanup. There is even a built-in
library for that kind of thing.

Attributes (#+, that are not blocks)

  • duplicate-attribute: multiple attribute lines of the same type
  • avoid-capslock-metadata: (#+TITLE \rightarrow #+title)
  • Yes, duplicates are "bad". I wonder if there is a use case though?
  • avoid capslock metadata? why? Consistency is a thing, and I'm 110%
    in for consistency, but sometimes Emacs inserts things like that in
    ALL CAPS and sometimes it doesn't. I typically use the all caps
    versions, but that's probably just an old habit more than anything.
    We should decide on a standard, and I'm curious on your thoughts
    about not using the all caps versions. As I think about it, for me,
    the all caps versions make them stand out a bit more than lower case
    versions. Kinda makes the point this is metadata not part of the
    content of the file.

Word- or Structure-based

Comments (# )

  • fixme: TODO and FIX ME comments (# TODO, # FIX ME, # FIXME)

Agree.

Links

  • broken-link: A link with broken syntax
  • prefer-info-link-over-org-link: link to info file instead of other
    org documents

Yes!! Absolutely!

Branding

  • crafted-emacs-branding: prefer Crafted Emacs (or crafted-emacs
    for repo reference)

Agree.

@jvdydev
Copy link
Contributor Author

jvdydev commented Jul 29, 2023

Thanks for the lengthy feedback.

Section-based

What is the depth at which we warn about nested lists? Or are we saying _no nested lists_?

I originally was going to say "no nested lists" (when I started implementing the prototype),
because it simplified the rules and worked well in the module docs.

Lists get interpreted similarly to the headings when exporting, so I figured these should be limited in some capacity (although same applies for headings).
I don't think "no nested lists" is the solution, instead looking at the overall nesting structure is probably a better point (cause really, we want an overall document with somewhat-flat-but-useful hierarchy).

What is the depth at which we warn about nested headings?
Current info files seem to allow (at least) 3 levels (usually noted as x.y.z numbering)

See above, but 3 seems good, the modules should be 2 as they are already under a heading in the main crafted-emacs.org file. That we can maybe use file-local variables or similar for. There is a caveat where 3 headings should still allow a flat list (e.g. crafted-org.org has 2 headings + 1 external + 1 list).

So I'd say total 4 levels of nesting, maximum 3 headings deep (I'll do more testing on that as I (re)implement rules etc.).

leading-whitespace

The reasoning for replacing leading whitespace with org-indent-mode for me was:

  • I was already using org-indent-mode
  • Managing the whitespace reminded me a little too much of python
    and overall felt like more work to maintain

Obviously I am not locked into using org-indent-mode personally, however I think we should either enforce no leading whitespace + org-indent-mode or some sort of indentation rules, for consistency across files and sections.

I think adding org-indent-mode automatically for file-local editing might be a compromise, however I'm open for abandoning the org-indent-mode idea as well (again, we probably would need rules for indentation instead).

line-too-long

80 characters/columns is quite nice I find, especially helps when reading (apparently many things are laid out even tighter with 60 columns for readability, but 80 seems like a reasonable middle-ground).

Attribute duplicates

The duplicate seems ... very unlikely.

I was mainly inspired by code linting tools (which often have some code duplication detection). I don't think there is necessarily a need for it and org shouldn't care about it either when exporting (although I haven't tested that).

Attribute caps

As for no-caps on attribute, that was more a rule I put in for any form of consistency that already matched up with what crafted-emacs.org had. I'm open to instead enforce capslock metadata instead (or other forms).

I agree that using caps makes them stand out from the text (and attribute vals) more, so I think enforcing caps might be a good way to go.

I updated some of the post above to show more alternative options for discussion.

@jvdydev
Copy link
Contributor Author

jvdydev commented Sep 11, 2023

So ... small update as I've made new discoveries.

Turns out I won't need to parse the org file myself because there is org-element-parse-buffer which does about what it sounds like (parse the org-mode buffer, return AST).

Additionally, turns out it's part of org-lint (because of course it has its own linter, how could I have not expected that).
It's mainly there to warn of outdated org stuff (at least from what I can grasp), but custom linting functions can be added to it.

I see two potential paths from here:

  • Ignore org-lint and build the UI (as shown above) using org-element-parse-buffer
  • Use org-lint, meaning "our" linter reduces to adding a few linting functions (those still have to go through the AST, it's basically a multi-pass linter)

The obvious choice would be option 2 ("more" built-in, probably less code).
I do think there may be a few points for rolling our own wrapper with org-element-parse-buffer:

  • potential performance improvements because we don't traverse the tree multiple times (every linting rule traverses the tree separately in org-lint)
  • integration with other mechanisms (like flymake/flycheck) or auto-fixing minor issues would be easier as we don't have to gut org-lint's internal state (there is no public way of viewing the results from my understanding right now)
  • we have more control over how the data is rendered (I personally prefer the compile-buffer way of displaying linting things, org-lint use tabulated-list-mode)

As one can probably read from my points, I would prefer to roll a more custom solution, but I think this is a good point for discussion.

@jvdydev jvdydev added the discussion Discuss things, make decisions label Sep 11, 2023
@jeffbowman
Copy link
Contributor

I think you should do what makes you happiest. I would be interested in the flymake/flycheck integration, that would be super cool. I'm not as concerned about multi-pass as a solution, that does raise some concerns about conflicting rules, but as long as we can avoid those that should be low risk. In theory, we are running the linter "once in a while", like as part of the exporter or just before it, maybe in a make file or something. Obviously, a flymake/flycheck integration would run it more frequently, but could be prioritized during idle time (in theory).

All that said, faster and more efficient is always better, even when it doesn't seem to matter. (Another example of this would be Emacs startup time, we'd prefer to minimize that as much as possible, even in the face of common usage patterns where either Emacs is started and runs for days, or using Emacs in server mode to accomplish the same result).

@jvdydev jvdydev changed the title [craftedv2beta] Linter/Formatter for Org documentation [v2] Linter/Formatter for Org documentation Sep 22, 2023
@jvdydev jvdydev changed the title [v2] Linter/Formatter for Org documentation Linter/Formatter for Org documentation Oct 20, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
discussion Discuss things, make decisions documentation Improvements or additions to documentation enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants