Fix typo, acknowledge more contributors

lfoppiano · Nov 29, 2023 · 4e6f989
1 parent 048eb6f
Showing 1 changed file with 16 additions and 11 deletions.
diff --git a/README.md b/README.md
@@ -19,13 +19,13 @@ license: apache-2.0
 ## Introduction
 
 Question/Answering on scientific documents using LLMs: ChatGPT-3.5-turbo, Mistral-7b-instruct and Zephyr-7b-beta.
-The streamlit application demonstrate the implementaiton of a RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS (National Institute for Materials Science), in Tsukuba, Japan.
-Differently to most of the projects, we focus on scientific articles. 
-We target only the full-text using [Grobid](https://github.com/kermitt2/grobid) that provide and cleaner results than the raw PDF2Text converter (which is comparable with most of other solutions).
+The streamlit application demonstrates the implementation of a RAG (Retrieval Augmented Generation) on scientific documents, that we are developing at NIMS (National Institute for Materials Science), in Tsukuba, Japan.
+Unlike most projects, we focus on scientific articles. 
+We target only the full text, using [Grobid](https://github.com/kermitt2/grobid), which provides cleaner results than the raw PDF2Text converter (which is comparable with most other solutions).
 
 Additionally, this frontend provides the visualisation of named entities on LLM responses to extract <span stype="color:yellow">physical quantities, measurements</span> (with [grobid-quantities](https://github.com/kermitt2/grobid-quantities)) and <span stype="color:blue">materials</span> mentions (with [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors)).
 
-The conversation is kept in memory up by a buffered sliding window memory (top 4 more recent messages) and the messages are injected in the context as "previous messages".   
+The conversation is kept in a buffered sliding-window memory (the 4 most recent messages), and the messages are injected into the context as "previous messages".   
 
 (The image on the right was generated with https://huggingface.co/spaces/stabilityai/stable-diffusion)
 
@@ -35,9 +35,9 @@ The conversation is kept in memory up by a buffered sliding window memory (top 4
 
 ## Getting started
 
-- Select the model+embedding combination you want ot use 
+- Select the model+embedding combination you want to use 
 - Enter your API Key ([Open AI](https://platform.openai.com/account/api-keys) or [Huggingface](https://huggingface.co/docs/hub/security-tokens)). 
-- Upload a scientific article as PDF document. You will see a spinner or loading indicator while the processing is in progress. 
+- Upload a scientific article as a PDF document. You will see a spinner or loading indicator while the processing is in progress. 
 - Once the spinner stops, you can proceed to ask your questions
 
  ![screenshot2.png](docs%2Fimages%2Fscreenshot2.png)
@@ -53,9 +53,9 @@ With default settings, each question uses around 1000 tokens.
 
 ### Chunks size
 When uploaded, each document is split into blocks of a determined size (250 tokens by default). 
-This setting allow users to modify the size of such blocks. 
-Smaller blocks will result in smaller context, yielding more precise sections of the document. 
-Larger blocks will result in larger context less constrained around the question.
+This setting allows users to modify the size of such blocks. 
+Smaller blocks will result in a smaller context, yielding more precise sections of the document. 
+Larger blocks will result in a larger context less constrained around the question.
 
 ### Query mode
 Indicates whether sending a question to the LLM (Language Model) or to the vector storage. 
@@ -65,7 +65,7 @@ Indicates whether sending a question to the LLM (Language Model) or to the vecto
 ### NER (Named Entities Recognition)
 
 This feature is specifically crafted for people working with scientific documents in materials science. 
-It enables to run NER on the response from the LLM, to identify materials mentions and properties (quantities, masurements).
+It runs NER on the response from the LLM to identify material mentions and properties (quantities, measurements).
 This feature leverages both [grobid-quantities](https://github.com/kermitt2/grobid-quanities) and [grobid-superconductors](https://github.com/lfoppiano/grobid-superconductors) external services. 
 
 
@@ -78,7 +78,9 @@ To release a new version:
 
 To use docker: 
 
-- docker run `lfoppiano/document-insights-qa:latest`
+- docker run `lfoppiano/document-insights-qa:{latest_version}`
+
+- docker run `lfoppiano/document-insights-qa:latest-develop` for the latest development version 
 
 To install the library with Pypi: 
 
@@ -88,6 +90,9 @@ To install the library with Pypi:
 ## Acknowledgement
 
 This project is developed at the [National Institute for Materials Science](https://www.nims.go.jp) (NIMS) in Japan in collaboration with the [Lambard-ML-Team](https://github.com/Lambard-ML-Team).
+Contributed by Pedro Ortiz Suarez (@pjox), Tomoya Mato (@t29mato). 
+Thanks also to [Patrice Lopez](https://www.science-miner.com), the author of [Grobid](https://github.com/kermitt2/grobid).
+
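
Note on the README text touched by this diff: the "Chunks size" and conversation-memory paragraphs describe two small mechanisms that can be sketched as below. This is a minimal illustration, not the application's actual code. The names `split_into_chunks` and `SlidingWindowMemory` are hypothetical, and the sketch assumes tokens are already available as a plain Python list (the real application presumably relies on a proper tokenizer and on its own memory component).

```python
from collections import deque


def split_into_chunks(tokens, chunk_size=250):
    """Split a list of tokens into consecutive blocks of at most chunk_size tokens.

    Hypothetical helper: 250 mirrors the default block size stated in the README.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be a positive integer")
    return [tokens[i:i + chunk_size] for i in range(0, len(tokens), chunk_size)]


class SlidingWindowMemory:
    """Keep only the most recent `window` messages (hypothetical sketch)."""

    def __init__(self, window=4):
        # A deque with maxlen silently drops the oldest item when it is full.
        self._messages = deque(maxlen=window)

    def add(self, message):
        self._messages.append(message)

    def messages(self):
        # Oldest to newest among the retained messages.
        return list(self._messages)

    def as_context(self):
        # Render the retained messages as a "previous messages" section.
        return "Previous messages:\n" + "\n".join(self._messages)


if __name__ == "__main__":
    chunks = split_into_chunks(list(range(600)), chunk_size=250)
    print([len(chunk) for chunk in chunks])

    memory = SlidingWindowMemory(window=4)
    for i in range(6):
        memory.add(f"message {i}")
    print(memory.as_context())
```

With smaller `chunk_size` values the document yields more, shorter blocks, which matches the README's remark that smaller blocks give a smaller, more precise context.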