`httpspell`

This is a spellchecker that recursively fetches HTML pages, converts them to plain text (using pandoc), and spellchecks them with hunspell. Unknown words will be printed to stdout, which makes the tool a good candidate for CI pipelines where you might want to take action when a spelling error is found on a web page.

Words that are not in the dictionary for the given language (inferred from the lang attribute of the HTML document's root element) can be added to a personal dictionary, which will mark the word as correctly spelled.

Usage

The following command will retrieve the HTML document at https://example.com, spellcheck it, and not print anything because there are no errors:
```
$ httpspell https://example.com
```
The exit code is 0.

The following command will spellcheck the README of this project as rendered by GitHub, and print a list of unknown words. Note that we set the language to en_US because GitHub declares 'en' as the document language, but the installed dictionaries usually refer to a specific language variant like en_US:
```
$ httpspell https://github.com/suhlig/httpspell/blob/master/README.markdown --language en_US
suhlig
Permalink
httpspell
sloc
pandoc
hunspell
...
```
The exit code is 1.
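
Since the two examples above differ only in their exit code and output, a CI step can branch on the exit status. The following is a minimal sketch, not part of httpspell itself: the function name `check_page` and the `SPELLCHECK` override variable are assumptions made for illustration, and the sketch treats any non-zero exit code as a failure.

```shell
#!/bin/sh
# Sketch of a CI step built on the exit codes shown above.
# SPELLCHECK is a hypothetical override so the step can be dry-run
# without httpspell installed; by default the real command is used.
SPELLCHECK="${SPELLCHECK:-httpspell}"

check_page() {
  # All arguments are passed through unchanged, e.g. a URL and --language.
  if unknown_words=$("$SPELLCHECK" "$@"); then
    echo "OK: no unknown words"
    return 0
  fi
  # Non-zero exit: report each unknown word once and fail the step.
  echo "Unknown words found:"
  printf '%s\n' "$unknown_words" | sort -u
  return 1
}

# Example call in a pipeline:
#   check_page https://example.com --language en_US || exit 1
```

Deduplicating with `sort -u` is a design choice of this sketch; it keeps the CI log short when the same unknown word appears on many pages.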

What is not checked

When spidering a site, httpspell will skip all responses with a content-type header other than text/html (unless it is pointed at a file, in which case it accepts anything).
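
To illustrate the content-type rule, here is a small sketch of such a check. It is not httpspell's actual implementation; the function name `is_html` is hypothetical, and ignoring parameters such as `charset` is an assumption of this sketch.

```shell
#!/bin/sh
# Return 0 if a Content-Type header value denotes an HTML document.
# Illustration only; not httpspell's actual implementation.
is_html() {
  # Drop parameters such as "; charset=utf-8", then lowercase and
  # remove whitespace before comparing.
  media_type=$(printf '%s' "${1%%;*}" | tr '[:upper:]' '[:lower:]' | tr -d '[:space:]')
  [ "$media_type" = "text/html" ]
}

# Example:
#   is_html "text/html; charset=utf-8" && echo "would be spellchecked"
#   is_html "application/pdf"          || echo "would be skipped"
```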
Before converting, httpspell removes the following nodes from the HTML DOM as they are not a good target for spellchecking:
- code
- pre
- Elements with spellcheck='false' (this is how HTML5 allows tagging elements as being a target for spellchecking or not)

Misc

If you produce content with kramdown (e.g. using Jekyll), an Inline Attribute List can be used to set spellcheck='false' for an element by adding this line after the element (e.g. heading):

{: spellcheck="false"}

Dictionaries

Hunspell uses the system dictionary paths; on the Mac this is ~/Library/Spelling/. Get some dictionaries as explained in the hunspell project:

$ wget -O ~/Library/Spelling/en_US.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.aff
$ wget -O ~/Library/Spelling/en_US.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/en/en_US.dic

German:

$ wget -O ~/Library/Spelling/de_DE.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.dic
$ wget -O ~/Library/Spelling/de_DE.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/de/de_DE_frami.aff

Italian (for integration tests):

$ wget -O ~/Library/Spelling/it_IT.dic https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.dic
$ wget -O ~/Library/Spelling/it_IT.aff https://cgit.freedesktop.org/libreoffice/dictionaries/plain/it_IT/it_IT.aff
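
The downloads above all follow the same pattern, so they can be wrapped in a small helper. This is only a sketch: the function names `dictionary_url` and `fetch_dictionary` and the `DICT_DIR` variable are made up for illustration, and the default target directory assumes the Mac path mentioned above.

```shell
#!/bin/sh
# Base URL taken from the wget commands above.
BASE_URL="https://cgit.freedesktop.org/libreoffice/dictionaries/plain"
# Assumed default: the Mac dictionary path; override DICT_DIR elsewhere.
DICT_DIR="${DICT_DIR:-$HOME/Library/Spelling}"

# Print the remote URL of one dictionary file.
# $1 = remote directory, $2 = remote base name, $3 = extension (aff or dic)
dictionary_url() {
  printf '%s/%s/%s.%s\n' "$BASE_URL" "$1" "$2" "$3"
}

# Download the .aff and .dic files for one language.
# $1 = local name (e.g. de_DE), $2 = remote directory, $3 = remote base name
fetch_dictionary() {
  for ext in aff dic; do
    wget -O "$DICT_DIR/$1.$ext" "$(dictionary_url "$2" "$3" "$ext")" || return 1
  done
}

# Equivalent to the commands above:
#   fetch_dictionary en_US en    en_US
#   fetch_dictionary de_DE de    de_DE_frami
#   fetch_dictionary it_IT it_IT it_IT
```

The local name is kept separate from the remote base name because the German files are published as `de_DE_frami` but stored locally as `de_DE`.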
