codes.clj.docs/extractor

A tool that extracts namespace and function documentation from Clojure projects into an indexed datalevin file.

Config `resources/config.edn`

This file has two main keys:

:db defines where the datalevin files will be placed.
:deps defines which libraries will be downloaded, parsed and indexed in the database.
- Each entry uses the same git coordinates as clojure.tools.deps, with some extra keys:
  - :project/group overrides the project group, which is otherwise extracted from the lib's org.
  - :deps/manifest :deps is needed on pure lein projects: since we use tools.deps, setting it forces the download of the lib.
  - :project/source-paths is needed when a lein lib uses :source-paths (like reitit), to tell clj-kondo where the files to analyze are.
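As a rough illustration of the keys described above, a config entry might look like the sketch below. The library name, URL, tag, sha, group and paths are all placeholders, and the shape of the :db value (a map with a :dir key) is an assumption rather than something taken from the project; check the real `resources/config.edn` for the exact structure.

```clojure
;; Hypothetical sketch of resources/config.edn.
;; Every value below is a placeholder, not the project's real config.
{:db   {:dir "target/docs-db"}   ; assumed key name for the output location
 :deps {example-org/example-lib  ; hypothetical lein-only library
        {;; standard tools.deps git coordinates
         :git/url "https://github.com/example-org/example-lib"
         :git/tag "v1.0.0"
         :git/sha "0000000"      ; placeholder sha
         ;; extra keys described above
         :project/group        "example-group"
         :deps/manifest        :deps
         :project/source-paths ["modules/core/src"]}}}
```

The three extra keys are only needed when the defaults do not fit; a library that already ships a deps.edn and lives under the expected org would presumably need just the git coordinates.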

CLI

Extract and generate datalevin file

clojure -X:extract

Flow

sequenceDiagram;
autonumber
participant C as Config
participant A as Analysis
participant AD as Adapters
participant DB as Datalevin
participant CI
participant RE as Release 

C->>A: Read file "resources/config.edn"
A->>A: Parse config projects to download
Note left of A: tools.deps
A->>A: Download projects
Note left of A: clj-kondo
A->>AD: Parse projects function/docs analysis
AD->>DB: Adapts & Indexes data into datoms
DB->>CI: Bulk-transact all datoms
CI->>RE: Zip and Publish

Using

Go to the project's GitHub releases page, then download and unzip the docs-db.zip file.

Connecting

This database requires datalevin/datalevin {:mvn/version "0.9.3"} or later.

Since v0.3.0, because of the new full-text index analyzers, the database must be opened with at least these connection options:

(require '[datalevin.core :as d]
         '[datalevin.search-utils :as su]
         '[datalevin.interpret :refer [inter-fn]])

(defn merge-tokenizers
  "Merges the results of tokenizer a and b into one sequence."
  [tokenizer-a tokenizer-b]
  (inter-fn [^String s]
    (into (sequence (tokenizer-a s))
      (sequence (tokenizer-b s)))))

(def conn-opts
  (let [query-analyzer (su/create-analyzer
                        {:tokenizer (merge-tokenizers
                                     (inter-fn [s] [[s 0 0]])
                                     (su/create-regexp-tokenizer #"[\s:/\.;,!=?\"'()\[\]{}|<>&@#^*\\~`\-]+"))
                         :token-filters [su/lower-case-token-filter]})]
    {:search-domains {"project-name" {:query-analyzer query-analyzer}
                      "namespace-name" {:query-analyzer query-analyzer}
                      "definition-name" {:query-analyzer query-analyzer}}}))

(d/get-conn "path/to/db" nil conn-opts)
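Once connected, the database can be explored from a REPL. The sketch below assumes `conn-opts` from the snippet above is already defined, that "path/to/db" points at the unzipped directory, and that the `:domains` option of datalevin's `fulltext` query function behaves as in the 0.9.x releases. The search term "assoc" is only an example, and no attribute names are assumed: the schema call is there so you can discover them.

```clojure
(require '[datalevin.core :as d])

;; Assumes `conn-opts` is defined as shown above.
(def conn (d/get-conn "path/to/db" nil conn-opts))

;; List the attribute names the extractor wrote into the schema.
(keys (d/schema conn))

;; Full-text search restricted to one of the search domains configured above.
;; Returns the matching [entity attribute value] triples.
(d/q '[:find ?e ?a ?v
       :in $ ?q
       :where [(fulltext $ ?q {:domains ["definition-name"]}) [[?e ?a ?v]]]]
     (d/db conn)
     "assoc")

;; Release the underlying files when done.
(d/close conn)
```

Swapping the domain for "project-name" or "namespace-name" should search the other two indexes, since all three share the same query analyzer in `conn-opts`.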

Tools used

- tools.deps: downloads projects/libraries
- clj-kondo: parses/analyses project data
- datalevin: indexing and storage of the data

Developing

Repl

clojure -M:dev:nrepl

Tests

clojure -M:dev:test

Build

clojure -T:build uberjar

Lint

clojure -M:clojure-lsp diagnostics
clojure -M:clojure-lsp clean-ns
clojure -M:clojure-lsp format

Other iterations

https://github.com/rafaeldelboni/clojure-document-extractor

License

This is free and unencumbered software released into the public domain.
For more information, please refer to http://unlicense.org
