docs: document new integrations (#532)
Signed-off-by: Panos Vagenas <[email protected]>
Showing 12 changed files with 52 additions and 26 deletions.
@@ -0,0 +1,6 @@
Docling is available in [Cloudera](https://www.cloudera.com/) through the *RAG Studio*
Accelerator for Machine Learning Projects (AMP).

- 💻 [RAG Studio AMP GitHub][github]

[github]: https://github.com/cloudera/CML_AMP_RAG_Studio
@@ -1,13 +1,10 @@
## Get started

Docling is used by the [Data Prep Kit](https://ibm.github.io/data-prep-kit/) open-source toolkit for preparing unstructured data for LLM application development ranging from laptop scale to datacenter scale.

Below you find the Data Prep Kit modules powered by Docling.

## PDF ingestion to Parquet
## Components
### PDF ingestion to Parquet
- 💻 [PDF-to-Parquet GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/pdf2parquet)
- 📖 [PDF-to-Parquet Docs](https://ibm.github.io/data-prep-kit/transforms/language/pdf2parquet/python/)
- 📖 [PDF-to-Parquet docs](https://ibm.github.io/data-prep-kit/transforms/language/pdf2parquet/python/)

## Document chunking
### Document chunking
- 💻 [Doc Chunking GitHub](https://github.com/IBM/data-prep-kit/tree/dev/transforms/language/doc_chunk)
- 📖 [Doc Chunking Docs](https://ibm.github.io/data-prep-kit/transforms/language/doc_chunk/python/)
- 📖 [Doc Chunking docs](https://ibm.github.io/data-prep-kit/transforms/language/doc_chunk/python/)
@@ -1,9 +1,12 @@
Docling is available in [Prodigy][home] as a [Prodigy-PDF plugin][plugin] recipe.

- 🌐 [Prodigy Home][home]
- 🔌 [Prodigy-PDF Plugin][plugin]
- 🧑🏽‍🍳 [pdf-spans.manual Recipe][recipe]
More details can be found in this [blog post][blog].

- 🌐 [Prodigy home][home]
- 🔌 [Prodigy-PDF plugin][plugin]
- 🧑🏽‍🍳 [pdf-spans.manual recipe][recipe]

[home]: https://prodi.gy/
[plugin]: https://prodi.gy/docs/plugins#pdf
[recipe]: https://prodi.gy/docs/plugins#pdf-spans.manual
[blog]: https://explosion.ai/blog/pdfs-nlp-structured-data
@@ -0,0 +1,10 @@
Docling is powering document processing in [Red Hat Enterprise Linux AI][home] (RHEL AI),
enabling users to unlock the knowledge hidden in documents and present it to
InstructLab's fine-tuning for aligning AI models to the user's specific data.

More details can be found in this [blog post][blog].

- 🏠 [RHEL AI home][home]

[home]: https://www.redhat.com/en/technologies/linux-platforms/enterprise-linux/ai
[blog]: https://www.redhat.com/en/blog/docling-missing-document-processing-companion-generative-ai
@@ -1,11 +1,12 @@
# spaCy
Docling is available in [spaCy](https://spacy.io/) as the *spaCy Layout* plugin.

Docling is available in [spaCy](https://spacy.io/) as the "SpaCy Layout" plugin:
More details can be found in this [blog post][blog].

- 💻 [SpacyLayout GitHub][github]
- 📖 [SpacyLayout Docs][docs]
- 📖 [SpacyLayout docs][docs]
- 📦 [SpacyLayout PyPI][pypi]

[github]: https://github.com/explosion/spacy-layout
[docs]: https://github.com/explosion/spacy-layout?tab=readme-ov-file#readme
[pypi]: https://pypi.org/project/spacy-layout/
[blog]: https://explosion.ai/blog/pdfs-nlp-structured-data
@@ -0,0 +1,9 @@
Docling is available as a text extraction backend for [txtai](https://neuml.github.io/txtai/).

- 💻 [txtai GitHub][github]
- 📖 [txtai docs][docs]
- 📖 [txtai Docling backend][integration_docs]

[github]: https://github.com/neuml/txtai
[docs]: https://neuml.github.io/txtai
[integration_docs]: https://neuml.github.io/txtai/pipeline/data/filetohtml/#docling