Commit

Merge branch 'main' into main

mpskex authored Oct 20, 2023
2 parents a1f33db + 9b798f8 commit eb78e9c
Showing 256 changed files with 4,063 additions and 1,075 deletions.
27 changes: 27 additions & 0 deletions .github/workflows/publish_release.yml
@@ -19,3 +19,30 @@ jobs:
       with:
         pypi_token: ${{ secrets.LLAMA_INDEX_PYPI_TOKEN }}
         ignore_dev_requirements: "yes"
+
+      - name: Create GitHub Release
+        id: create_release
+        uses: actions/create-release@v1
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }} # This token is provided by Actions, you do not need to create your own token
+        with:
+          tag_name: ${{ github.ref }}
+          release_name: ${{ github.ref }}
+          draft: false
+          prerelease: false
+
+      - name: Get Asset name
+        run: |
+          export PKG=$(ls dist/ | grep tar)
+          set -- $PKG
+          echo "name=$1" >> $GITHUB_ENV
+      - name: Upload Release Asset (sdist) to GitHub
+        id: upload-release-asset
+        uses: actions/upload-release-asset@v1
+        env:
+          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
+        with:
+          upload_url: ${{ steps.create_release.outputs.upload_url }}
+          asset_path: dist/${{ env.name }}
+          asset_name: ${{ env.name }}
+          asset_content_type: application/zip
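As context for the new steps: "Get Asset name" locates the sdist tarball that the build leaves in `dist/` and exports its filename through `$GITHUB_ENV` so the upload step can reference it as `env.name`. A minimal Python sketch of the same lookup logic (a hypothetical helper for illustration, not part of the workflow; assumes `dist/` contains a single tarball):

```python
# Hypothetical sketch of the "Get Asset name" step's logic:
# find the sdist tarball in dist/ and return its filename.
from pathlib import Path


def sdist_asset_name(dist_dir: str = "dist") -> str:
    # Mirrors `ls dist/ | grep tar`: keep entries whose name contains "tar".
    tarballs = sorted(p.name for p in Path(dist_dir).iterdir() if "tar" in p.name)
    if not tarballs:
        raise FileNotFoundError(f"no sdist tarball found in {dist_dir}/")
    return tarballs[0]
```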
13 changes: 11 additions & 2 deletions .pre-commit-config.yaml
@@ -16,15 +16,24 @@ repos:
       - id: mixed-line-ending
       - id: trailing-whitespace
   - repo: https://github.com/charliermarsh/ruff-pre-commit
-    rev: v0.0.292
+    rev: v0.1.0
     hooks:
       - id: ruff
         args: [--fix, --exit-non-zero-on-fix]
   - repo: https://github.com/psf/black-pre-commit-mirror
-    rev: 23.9.1
+    rev: 23.10.0
     hooks:
       - id: black-jupyter
+        name: black-src
+        exclude: docs/
+  - repo: https://github.com/psf/black-pre-commit-mirror
+    rev: 23.10.0
+    hooks:
+      - id: black-jupyter
+        name: black-docs
+        files: docs/
+        args: [--line-length=79]
   - repo: https://github.com/pappasam/toml-sort
     rev: v0.23.1
     hooks:
24 changes: 24 additions & 0 deletions CHANGELOG.md
@@ -8,8 +8,32 @@

### Bug Fixes / Nits

- Fixed missing spaces in prompt templates (#8190)

## [0.8.47] - 2023-10-19

### New Features

- Add response synthesis to text-to-SQL (#8196)
- Added support for `LLMRailsEmbeddings` (#8169)
- Inferring MPS device with PyTorch (#8195)
- Consolidated query/text prepending (#8189)

## [0.8.46] - 2023-10-18

### New Features

- Add fine-tuning router support + embedding selector (#8174)
- Add more document converters (#8156)

### Bug Fixes / Nits

- Add normalization to huggingface embeddings (#8145)
- Improve MyScale Hybrid Search (#8159)
- Fixed duplicate `FORMAT_STR` appearing inside the prompt (#8171)
- Added support for `output_kwargs={'max_colwidth': xx}` in `PandasQueryEngine` (#8110); a usage sketch follows this list
- Minor fix to the description of an argument in the Cohere LLM (#8163)
- Fix Firestore client info (#8166)
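
Below is a hedged sketch of the `PandasQueryEngine` option referenced above (illustrative only: the DataFrame and query are made up, and the usage is assumed from changelog entry #8110 rather than quoted from the docs):

```python
# Illustrative sketch of the max_colwidth output option (assumed usage).
import pandas as pd
from llama_index.query_engine import PandasQueryEngine

df = pd.DataFrame(
    {"city": ["Toronto", "Tokyo"], "population": [2_930_000, 13_960_000]}
)
# output_kwargs is forwarded to the engine's pandas output formatting,
# so wide columns are no longer truncated at the default width.
query_engine = PandasQueryEngine(df=df, output_kwargs={"max_colwidth": 100})
response = query_engine.query("What is the population of Tokyo?")
print(response)
```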

## [0.8.45] - 2023-10-13

3 changes: 2 additions & 1 deletion Makefile
@@ -4,7 +4,8 @@ help: ## Show all Makefile targets.
 	@grep -E '^[a-zA-Z_-]+:.*?## .*$$' $(MAKEFILE_LIST) | awk 'BEGIN {FS = ":.*?## "}; {printf "\033[33m%-30s\033[0m %s\n", $$1, $$2}'
 
 format: ## Run code autoformatters (black).
-	black .
+	pre-commit install
+	pre-commit run black-jupyter --all-files
 
 lint: ## Run linters: pre-commit (black, ruff, codespell) and mypy
 	pre-commit install && pre-commit run --all-files
26 changes: 20 additions & 6 deletions docs/core_modules/data_modules/index/index_progress_bars.ipynb
@@ -69,7 +69,9 @@
    "outputs": [],
    "source": [
     "# Load documents\n",
-    "documents = SimpleDirectoryReader(\"../../../examples/data/paul_graham\").load_data()"
+    "documents = SimpleDirectoryReader(\n",
+    "    \"../../../examples/data/paul_graham\"\n",
+    ").load_data()"
    ]
   },
   {
@@ -226,11 +228,15 @@
    "source": [
     "llm_chatgpt = OpenAI(temperature=0, model=\"gpt-3.5-turbo\")\n",
     "\n",
-    "service_context = ServiceContext.from_defaults(llm=llm_chatgpt, chunk_size=1024)\n",
+    "service_context = ServiceContext.from_defaults(\n",
+    "    llm=llm_chatgpt, chunk_size=1024\n",
+    ")\n",
     "\n",
     "print(\"\\nDocumentSummaryIndex with show_progress=True\\n\")\n",
     "response_synthesizer = get_response_synthesizer(\n",
-    "    response_mode=\"tree_summarize\", use_async=True, service_context=service_context\n",
+    "    response_mode=\"tree_summarize\",\n",
+    "    use_async=True,\n",
+    "    service_context=service_context,\n",
     ")\n",
     "DocumentSummaryIndex.from_documents(\n",
     "    documents,\n",
@@ -556,16 +562,24 @@
     "llm = MockLLM(max_tokens=256)\n",
     "service_context = ServiceContext.from_defaults(llm=llm)\n",
     "TreeIndex.from_documents(\n",
-    "    documents, service_context=service_context, show_progress=True, use_async=True\n",
+    "    documents,\n",
+    "    service_context=service_context,\n",
+    "    show_progress=True,\n",
+    "    use_async=True,\n",
     ")\n",
     "\n",
     "print(\"\\nTreeIndex with show_progress=True, use_async=False\\n\")\n",
     "TreeIndex.from_documents(\n",
-    "    documents, service_context=service_context, show_progress=True, use_async=False\n",
+    "    documents,\n",
+    "    service_context=service_context,\n",
+    "    show_progress=True,\n",
+    "    use_async=False,\n",
     ")\n",
     "\n",
     "print(\"\\nTreeIndex with show_progress=False, use_async=True\\n\")\n",
-    "TreeIndex.from_documents(documents, service_context=service_context, use_async=True)\n",
+    "TreeIndex.from_documents(\n",
+    "    documents, service_context=service_context, use_async=True\n",
+    ")\n",
     "\n",
     "print(\"\\nTreeIndex with show_progress=False, use_async=False\\n\")\n",
     "TreeIndex.from_documents(documents, service_context=service_context)"
36 changes: 27 additions & 9 deletions docs/core_modules/data_modules/index/vector_store_guide.ipynb
@@ -36,7 +36,9 @@
    "source": [
     "from llama_index import VectorStoreIndex, SimpleDirectoryReader\n",
     "\n",
     "# Load documents and build index\n",
-    "documents = SimpleDirectoryReader(\"../../examples/data/paul_graham\").load_data()\n",
+    "documents = SimpleDirectoryReader(\n",
+    "    \"../../examples/data/paul_graham\"\n",
+    ").load_data()\n",
     "index = VectorStoreIndex.from_documents(documents)"
    ]
   },
@@ -63,16 +65,22 @@
     "\n",
     "# init pinecone\n",
     "pinecone.init(api_key=\"<api_key>\", environment=\"<environment>\")\n",
-    "pinecone.create_index(\"quickstart\", dimension=1536, metric=\"euclidean\", pod_type=\"p1\")\n",
+    "pinecone.create_index(\n",
+    "    \"quickstart\", dimension=1536, metric=\"euclidean\", pod_type=\"p1\"\n",
+    ")\n",
     "\n",
     "# construct vector store and customize storage context\n",
     "storage_context = StorageContext.from_defaults(\n",
     "    vector_store=PineconeVectorStore(pinecone.Index(\"quickstart\"))\n",
     ")\n",
     "\n",
     "# Load documents and build index\n",
-    "documents = SimpleDirectoryReader(\"../../examples/data/paul_graham\").load_data()\n",
-    "index = VectorStoreIndex.from_documents(documents, storage_context=storage_context)"
+    "documents = SimpleDirectoryReader(\n",
+    "    \"../../examples/data/paul_graham\"\n",
+    ").load_data()\n",
+    "index = VectorStoreIndex.from_documents(\n",
+    "    documents, storage_context=storage_context\n",
+    ")"
    ]
   },
   {
@@ -191,7 +199,9 @@
    "source": [
     "from llama_index import get_response_synthesizer\n",
     "from llama_index.indices.vector_store.retrievers import VectorIndexRetriever\n",
-    "from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine\n",
+    "from llama_index.query_engine.retriever_query_engine import (\n",
+    "    RetrieverQueryEngine,\n",
+    ")\n",
     "\n",
     "# build retriever\n",
     "retriever = VectorIndexRetriever(\n",
@@ -256,8 +266,12 @@
    "outputs": [],
    "source": [
     "from llama_index import get_response_synthesizer\n",
-    "from llama_index.indices.vector_store.retrievers import VectorIndexAutoRetriever\n",
-    "from llama_index.query_engine.retriever_query_engine import RetrieverQueryEngine\n",
+    "from llama_index.indices.vector_store.retrievers import (\n",
+    "    VectorIndexAutoRetriever,\n",
+    ")\n",
+    "from llama_index.query_engine.retriever_query_engine import (\n",
+    "    RetrieverQueryEngine,\n",
+    ")\n",
     "from llama_index.vector_stores.types import MetadataInfo, VectorStoreInfo\n",
     "\n",
     "\n",
@@ -278,15 +292,19 @@
     ")\n",
     "\n",
     "# build retriever\n",
-    "retriever = VectorIndexAutoRetriever(index, vector_store_info=vector_store_info)\n",
+    "retriever = VectorIndexAutoRetriever(\n",
+    "    index, vector_store_info=vector_store_info\n",
+    ")\n",
     "\n",
     "# build query engine\n",
     "query_engine = RetrieverQueryEngine(\n",
     "    retriever=retriever, response_synthesizer=get_response_synthesizer()\n",
     ")\n",
     "\n",
     "# query\n",
-    "response = query_engine.query(\"Tell me about two celebrities from United States\")"
+    "response = query_engine.query(\n",
+    "    \"Tell me about two celebrities from United States\"\n",
+    ")"
    ]
   }
  ],
1 change: 1 addition & 0 deletions docs/core_modules/model_modules/embeddings/modules.md
@@ -14,5 +14,6 @@ maxdepth: 1
 /examples/embeddings/huggingface.ipynb
 /examples/embeddings/elasticsearch.ipynb
 /examples/embeddings/clarifai.ipynb
+/examples/embeddings/llm_rails.ipynb
 /examples/embeddings/text_embedding_inference.ipynb
 ```
39 changes: 27 additions & 12 deletions docs/core_modules/model_modules/embeddings/usage_pattern.md
@@ -109,32 +109,47 @@ If you wanted to use embeddings not offered by LlamaIndex or Langchain, you can
 The example below uses Instructor Embeddings ([install/setup details here](https://huggingface.co/hkunlp/instructor-large)), and implements a custom embeddings class. Instructor embeddings work by providing text, as well as "instructions" on the domain of the text to embed. This is helpful when embedding text from a very specific and specialized topic.
 
 ```python
+from pydantic import PrivateAttr
 from typing import Any, List
 from InstructorEmbedding import INSTRUCTOR
 from llama_index.embeddings.base import BaseEmbedding
 
 class InstructorEmbeddings(BaseEmbedding):
+    _model: Any = PrivateAttr()
+    _instruction: str = PrivateAttr()
+
     def __init__(
         self,
-        instructor_model_name: str = "hkunlp/instructor-large",
+        model_name: str = "hkunlp/instructor-large",
         instruction: str = "Represent the Computer Science documentation or question:",
+        embed_batch_size: int = 2,
         **kwargs: Any,
     ) -> None:
-        self._model = INSTRUCTOR(instructor_model_name)
+        self._model = INSTRUCTOR(model_name)
         self._instruction = instruction
-        super().__init__(**kwargs)
+        super().__init__(model_name=model_name, embed_batch_size=embed_batch_size, **kwargs)
+
+    @classmethod
+    def class_name(cls) -> str:
+        return "InstructorEmbedding"
 
     def _get_query_embedding(self, query: str) -> List[float]:
         embeddings = self._model.encode([[self._instruction, query]])
         return embeddings[0]
 
     def _get_text_embedding(self, text: str) -> List[float]:
         embeddings = self._model.encode([[self._instruction, text]])
         return embeddings[0]
 
     def _get_text_embeddings(self, texts: List[str]) -> List[List[float]]:
         embeddings = self._model.encode([[self._instruction, text] for text in texts])
         return embeddings
+
+    async def _aget_query_embedding(self, query: str) -> List[float]:
+        return self._get_query_embedding(query)
+
+    async def _aget_text_embedding(self, text: str) -> List[float]:
+        return self._get_text_embedding(text)
 ```
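
For context, a minimal usage sketch of the class above (hypothetical setup for illustration; assumes the `InstructorEmbeddings` class is defined as in the new version of the snippet and that a local `./data` directory exists):

```python
# Hypothetical usage sketch: plug the custom embeddings into a ServiceContext.
from llama_index import ServiceContext, SimpleDirectoryReader, VectorStoreIndex

embed_model = InstructorEmbeddings(embed_batch_size=2)
service_context = ServiceContext.from_defaults(
    embed_model=embed_model, chunk_size=512
)

# Build an index that embeds documents with the Instructor model.
documents = SimpleDirectoryReader("./data").load_data()
index = VectorStoreIndex.from_documents(
    documents, service_context=service_context
)
```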

## Standalone Usage
1 change: 1 addition & 0 deletions docs/end_to_end_tutorials/finetuning.md
@@ -43,6 +43,7 @@ maxdepth: 1
 ---
 Fine-tuning an Adapter </examples/finetuning/embeddings/finetune_embedding_adapter.ipynb>
 Embedding Fine-tuning Guide </examples/finetuning/embeddings/finetune_embedding.ipynb>
+Router Fine-tuning </examples/finetuning/router/router_finetune.ipynb>
 ```
 
 **Old**
18 changes: 12 additions & 6 deletions docs/end_to_end_tutorials/structured_data/Airbyte_demo.ipynb
@@ -311,7 +311,9 @@
     "\n",
     "    import snowflake.sqlalchemy.snowdialect\n",
     "\n",
-    "    snowflake.sqlalchemy.snowdialect.SnowflakeDialect.returns_unicode_strings = True\n",
+    "    snowflake.sqlalchemy.snowdialect.SnowflakeDialect.returns_unicode_strings = (\n",
+    "        True\n",
+    "    )\n",
     "\n",
     "    # make has_table() support the `info_cache` kwarg\n",
     "    import snowflake.sqlalchemy.snowdialect\n",
@@ -474,9 +476,7 @@
     "    sql_database=sql_database,\n",
     "    tables=[\"github_issues\", \"github_comments\", \"github_users\"],\n",
     ")\n",
-    "query_str = (\n",
-    "    \"Which issues have the most comments? Give the top 10 and use a join on url.\"\n",
-    ")\n",
+    "query_str = \"Which issues have the most comments? Give the top 10 and use a join on url.\"\n",
     "response = query_engine.query(query_str)\n",
     "display(Markdown(f\"<b>{response}</b>\"))"
@@ -606,8 +606,14 @@
    }
   ],
   "source": [
-    "from llama_index.indices.struct_store.sql_query import SQLTableRetrieverQueryEngine\n",
-    "from llama_index.objects import SQLTableNodeMapping, ObjectIndex, SQLTableSchema\n",
+    "from llama_index.indices.struct_store.sql_query import (\n",
+    "    SQLTableRetrieverQueryEngine,\n",
+    ")\n",
+    "from llama_index.objects import (\n",
+    "    SQLTableNodeMapping,\n",
+    "    ObjectIndex,\n",
+    "    SQLTableSchema,\n",
+    ")\n",
     "from llama_index import VectorStoreIndex\n",
     "\n",
     "table_node_mapping = SQLTableNodeMapping(sql_database)\n",