diff --git a/.github/.is_A b/.github/.is_A
deleted file mode 100644
index 8b137891791fe..0000000000000
--- a/.github/.is_A
+++ /dev/null
@@ -1 +0,0 @@
-
diff --git a/.github/1500x667new.gif b/.github/1500x667new.gif
deleted file mode 100644
index 7560375eb4424..0000000000000
Binary files a/.github/1500x667new.gif and /dev/null differ
diff --git "a/.github/1500\321\205667.gif" "b/.github/1500\321\205667.gif"
deleted file mode 100644
index c022c0d59a7c2..0000000000000
Binary files "a/.github/1500\321\205667.gif" and /dev/null differ
diff --git a/.github/2.0/1.xvs2.0BaseExecutor.svg b/.github/2.0/1.xvs2.0BaseExecutor.svg
new file mode 100644
index 0000000000000..c59f580311fb5
--- /dev/null
+++ b/.github/2.0/1.xvs2.0BaseExecutor.svg
@@ -0,0 +1,40 @@
+
\ No newline at end of file
diff --git a/.github/2.0/cookbooks/CleanCode.md b/.github/2.0/cookbooks/CleanCode.md
new file mode 100644
index 0000000000000..0cf5fadc31457
--- /dev/null
+++ b/.github/2.0/cookbooks/CleanCode.md
@@ -0,0 +1,84 @@
+# Temporary Cookbook on Clean Code
+
+Jina is designed as a lean and efficient framework. Solutions built on top of Jina are meant to be so, too. Here are
+some tips to help you write clean and beautiful code.
+
+1. `from jina import Document, DocumentArray, Executor, Flow, requests` is all you need. Copy-paste it as the first line of your code.
+
+1. No need to implement `__init__` if your `Executor` does not carry initial state.
+
+ ✅ Do:
+ ```python
+ from jina import Executor
+
+ class MyExecutor(Executor):
+ def foo(self, **kwargs):
+ ...
+ ```
+ 😔 Don't:
+ ```python
+ from jina import Executor
+
+ class MyExecutor(Executor):
+        def __init__(self, **kwargs):
+ super().__init__(**kwargs)
+
+ def foo(self, **kwargs):
+ ...
+ ```
+
+1. Use `@requests` without specifying `on=` if your function is meant to work on all requests. It also catches all requests that have no dedicated handler in this Executor.
+
+ ✅ Do:
+ ```python
+ from jina import Executor, requests
+
+ class MyExecutor(Executor):
+
+ @requests
+ def _skip_all(self, **kwargs):
+ pass
+ ```
+ 😔 Don't:
+ ```python
+ from jina import Executor
+
+ class MyExecutor(Executor):
+ @requests(on='/index')
+ def _skip_index(self, **kwargs):
+ pass
+
+ @requests(on='/search')
+ def _skip_search(self, **kwargs):
+ pass
+ ```
+
+1. Fold unnecessary arguments into `**kwargs` and take only what you need.
+
+ ✅ Do:
+ ```python
+ from jina import Executor, requests
+
+ class MyExecutor(Executor):
+
+ @requests
+ def foo_need_pars_only(self, parameters, **kwargs):
+ print(parameters)
+ ```
+ 😔 Don't:
+ ```python
+ from jina import Executor, requests
+
+ class MyExecutor(Executor):
+
+ @requests
+ def foo_need_pars_only(self, docs, parameters, docs_matrix, groundtruths_matrix, **kwargs):
+ print(parameters)
+ ```
\ No newline at end of file
diff --git a/.github/2.0/cookbooks/Document.md b/.github/2.0/cookbooks/Document.md
new file mode 100644
index 0000000000000..662cb5d3d7c98
--- /dev/null
+++ b/.github/2.0/cookbooks/Document.md
@@ -0,0 +1,612 @@
+Document, Executor, Flow are three fundamental concepts in Jina.
+
+- [**Document**](Document.md) is the basic data type in Jina;
+- [**Executor**](Executor.md) is how Jina processes Documents;
+- [**Flow**](Flow.md) is how Jina streamlines and scales Executors.
+
+*Learn them all, nothing more, you are good to go.*
+
+---
+
+# Cookbook on `Document`/`DocumentArray` 2.0 API
+
+`Document` is the basic data type that Jina operates with. Text, picture, video, audio, image, 3D mesh, they are
+all `Document` in Jina.
+
+`DocumentArray` is a sequence container of `Document`. It is the first-class citizen of `Executor`, serving as the input
+& output.
+
+One can say `Document` is to Jina what `np.float` is to NumPy, and `DocumentArray` is the counterpart of `np.ndarray`.
+
+
+
+## Table of Contents
+
+- [Minimum working example](#minimum-working-example)
+- [`Document` API](#document-api)
+ - [`Document` Attributes](#document-attributes)
+ - [Construct `Document`](#construct-document)
+ - [Exclusivity of `doc.content`](#exclusivity-of-doccontent)
+ - [Conversion between `doc.content`](#conversion-between-doccontent)
+ - [Construct with Multiple Attributes](#construct-with-multiple-attributes)
+ - [Construct from Dict or JSON String](#construct-from-dict-or-json-string)
+ - [Construct from Another `Document`](#construct-from-another-document)
+ - [Construct from JSON, CSV, `ndarray` and Files](#construct-from-json-csv-ndarray-and-files)
+ - [Serialize `Document`](#serialize-document)
+ - [Add Recursion to `Document`](#add-recursion-to-document)
+ - [Recursive Attributes](#recursive-attributes)
+ - [Visualize `Document`](#visualize-document)
+ - [Add Relevancy to `Document`](#add-relevancy-to-document)
+ - [Relevance Attributes](#relevance-attributes)
+- [`DocumentArray` API](#documentarray-api)
+ - [Construct `DocumentArray`](#construct-documentarray)
+ - [Persistence via `save()`/`load()`](#persistence-via-saveload)
+ - [Access Element](#access-element)
+ - [Sort Elements](#sort-elements)
+ - [Filter Elements](#filter-elements)
+ - [Use `itertools` on `DocumentArray`](#use-itertools-on-documentarray)
+ - [Get Attributes in Bulk](#get-attributes-in-bulk)
+
+
+
+## Minimum working example
+
+```python
+from jina import Document
+
+d = Document()
+```
+
+## `Document` API
+
+### `Document` Attributes
+
+A `Document` object has the following attributes, which fall into these categories:
+
+| | |
+|---|---|
+| Content attributes | `.buffer`, `.blob`, `.text`, `.uri`, `.content`, `.embedding` |
+| Meta attributes | `.id`, `.weight`, `.mime_type`, `.location`, `.tags`, `.offset`, `.modality` |
+| Recursive attributes | `.chunks`, `.matches`, `.granularity`, `.adjacency` |
+| Relevance attributes | `.score`, `.evaluations` |
+
+### Construct `Document`
+
+##### Content Attributes
+
+| | |
+| --- | --- |
+| `doc.buffer` | The raw binary content of this document |
+| `doc.blob` | The `ndarray` of the image/audio/video document |
+| `doc.text` | The text info of the document |
+| `doc.uri` | A URI of the document: a local file path, a remote URL starting with http or https, or a data URI scheme |
+| `doc.content` | One of the above non-empty fields |
+| `doc.embedding` | The embedding `ndarray` of this Document |
+
+You can assign `str`, `ndarray`, `buffer`, `uri` to a `Document`.
+
+```python
+from jina import Document
+import numpy as np
+
+d1 = Document(content='hello')
+d2 = Document(content=b'\f1')
+d3 = Document(content=np.array([1, 2, 3]))
+d4 = Document(content='https://static.jina.ai/logo/core/notext/light/logo.png')
+```
+
+The content will be automatically assigned to one of the `text`, `buffer`, `blob`, `uri` fields; `id` and `mime_type`
+are auto-generated when not given.
+
+In a Jupyter notebook, or by calling `.plot()`, you can visualize a `Document` object.
+
+
+
+#### Exclusivity of `doc.content`
+
+![](../doc.content.svg?raw=true)
+
+Note that one `Document` can only contain one type of `content`: it is one of `text`, `buffer`, `blob`, `uri`.
+Setting `text` first and then setting `uri` will clear the `text` field.
+
+```python
+from jina import Document
+
+d = Document(text='hello world')
+d.uri = 'https://jina.ai/'
+assert not d.text # True
+
+d = Document(content='https://jina.ai')
+assert d.uri == 'https://jina.ai' # True
+assert not d.text # True
+d.text = 'hello world'
+
+assert d.content == 'hello world' # True
+assert not d.uri # True
+```
+
+#### Conversion between `doc.content`
+
+You can use the following methods to convert between `.uri`, `.text`, `.buffer`, `.blob`:
+
+```python
+doc.convert_buffer_to_blob()
+doc.convert_blob_to_buffer()
+doc.convert_uri_to_buffer()
+doc.convert_buffer_to_uri()
+doc.convert_text_to_uri()
+doc.convert_uri_to_text()
+```
+
+You can convert a URI to a data URI (an in-line data URI scheme) using `doc.convert_uri_to_datauri()`. This will fetch
+the resource and make it inline.
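+
+For instance, a minimal sketch (reusing the logo URL from the earlier example; this requires network access):
+
+```python
+from jina import Document
+
+d = Document(uri='https://static.jina.ai/logo/core/notext/light/logo.png')
+d.convert_uri_to_datauri()  # d.uri now holds the fetched content inlined as a data URI
+```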
+
+In particular, when you work with image `Document`s, there are some extra helpers that enable more conversions.
+
+```python
+doc.convert_image_buffer_to_blob()
+doc.convert_image_blob_to_uri()
+doc.convert_image_uri_to_blob()
+doc.convert_image_datauri_to_blob()
+```
+
+##### Set Embedding
+
+Embedding is the high-dimensional representation of a `Document`. You can assign any Numpy `ndarray` as its embedding.
+
+```python
+import numpy as np
+from jina import Document
+
+d1 = Document(embedding=np.array([1, 2, 3]))
+d2 = Document(embedding=np.array([[1, 2, 3], [4, 5, 6]]))
+```
+
+#### Construct with Multiple Attributes
+
+##### Meta Attributes
+
+| | |
+| --- | --- |
+| `doc.tags` | A structured data value, consisting of fields which map to dynamically typed values |
+| `doc.id` | A hexdigest that represents a unique document ID |
+| `doc.weight` | The weight of this document |
+| `doc.mime_type` | The mime type of this document |
+| `doc.location` | The position of the doc: the start and end index of a string, the (x, y) (top, left) coordinates of an image crop, or the timestamp of an audio clip |
+| `doc.offset` | The offset of this doc in the previous granularity document |
+| `doc.modality` | An identifier of the modality this document belongs to |
+
+You can assign multiple attributes in the constructor via:
+
+```python
+from jina import Document
+
+d = Document(content='hello',
+ uri='https://jina.ai',
+ mime_type='text/plain',
+ granularity=1,
+ adjacency=3,
+ tags={'foo': 'bar'})
+```
+
+#### Construct from Dict or JSON String
+
+You can build a `Document` from a `dict` or a JSON string.
+
+```python
+from jina import Document
+import json
+
+d = {'id': 'hello123', 'content': 'world'}
+d1 = Document(d)
+
+d = json.dumps({'id': 'hello123', 'content': 'world'})
+d2 = Document(d)
+```
+
+##### Parsing Unrecognized Fields
+
+Unrecognized fields in a dict/JSON string are automatically put into the `.tags` field.
+
+```python
+from jina import Document
+
+d1 = Document({'id': 'hello123', 'foo': 'bar'})
+```
+
+You can use `field_resolver` to map external field names to `Document` attributes, e.g.
+
+```python
+from jina import Document
+
+d1 = Document({'id': 'hello123', 'foo': 'bar'}, field_resolver={'foo': 'content'})
+```
+
+#### Construct from Another `Document`
+
+Assigning a `Document` object to another `Document` object will make a shallow copy.
+
+```python
+from jina import Document
+
+d = Document(content='hello, world!')
+d1 = d
+
+assert id(d) == id(d1) # True
+```
+
+To make a deep copy, use `copy=True`:
+
+```python
+d1 = Document(d, copy=True)
+
+assert id(d) != id(d1)  # True
+```
+
+You can update a `Document` partially according to another source `Document`:
+
+```python
+from jina import Document
+
+s = Document(
+ id='🐲',
+ content='hello-world',
+ tags={'a': 'b'},
+ chunks=[Document(id='🐢')],
+)
+d = Document(
+ id='🐦',
+ content='goodbye-world',
+ tags={'c': 'd'},
+ chunks=[Document(id='🐯')],
+)
+
+# only update `id` field
+d.update(s, include_fields=('id',))
+
+# only preserve `id` field
+d.update(s, exclude_fields=('id',))
+```
+
+#### Construct from JSON, CSV, `ndarray` and Files
+
+You can also construct `Document`s from common file types such as JSON, CSV, `ndarray` and text files. The following
+functions return a generator of `Document`s, where each `Document` object corresponds to a line/row in the original
+format:
+
+| | |
+| --- | --- |
+| `Document.from_ndjson()` | Yield `Document` from a line-based JSON file, each line is a `Document` object |
+| `Document.from_csv()` | Yield `Document` from a CSV file, each line is a `Document` object |
+| `Document.from_files()` | Yield `Document` from glob files, each file is a `Document` object |
+| `Document.from_ndarray()` | Yield `Document` from a `ndarray`, each row (depending on `axis`) is a `Document` object |
+
+Using a generator sometimes demands less memory, as it does not build all `Document` objects in one shot.
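+
+For instance, a quick sketch with `Document.from_ndarray`, where each row becomes one `Document`:
+
+```python
+import numpy as np
+
+from jina import Document
+
+for d in Document.from_ndarray(np.random.random([4, 2])):
+    print(d)  # 4 Documents in total, one per row
+```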
+
+### Serialize `Document`
+
+You can serialize a `Document` into a JSON string, a Python dict, or a binary string via
+
+```python
+from jina import Document
+
+d = Document(content='hello world')
+d.json()
+```
+
+```
+{
+ "id": "6a1c7f34-aef7-11eb-b075-1e008a366d48",
+ "mimeType": "text/plain",
+ "text": "hello world"
+}
+```
+
+```python
+d.dict()
+```
+
+```
+{'id': '6a1c7f34-aef7-11eb-b075-1e008a366d48', 'mimeType': 'text/plain', 'text': 'hello world'}
+```
+
+```python
+d.binary_str()
+```
+
+```
+b'\n$6a1c7f34-aef7-11eb-b075-1e008a366d48R\ntext/plainj\x0bhello world'
+```
+
+### Add Recursion to `Document`
+
+#### Recursive Attributes
+
+A `Document` can be nested both horizontally and vertically.
+
+| | |
+| --- | --- |
+| `doc.chunks` | The list of sub-documents of this document. They have `granularity + 1` but the same `adjacency` |
+| `doc.matches` | The list of matched documents of this document. They have `adjacency + 1` but the same `granularity` |
+| `doc.granularity` | The recursion "depth" of the recursive chunks structure |
+| `doc.adjacency` | The recursion "width" of the recursive match structure |
+
+You can add **chunks** (sub-documents) and **matches** (neighbour documents) to a `Document` in the following ways:
+
+- Add in constructor:
+
+ ```python
+ d = Document(chunks=[Document(), Document()], matches=[Document(), Document()])
+ ```
+
+- Add to existing `Document`:
+
+ ```python
+ d = Document()
+ d.chunks = [Document(), Document()]
+ d.matches = [Document(), Document()]
+ ```
+
+- Add to existing `doc.chunks` or `doc.matches`:
+
+ ```python
+ d = Document()
+ d.chunks.append(Document())
+ d.matches.append(Document())
+ ```
+
+Note that both `doc.chunks` and `doc.matches` return `DocumentArray`, which we will introduce later.
+
+### Visualize `Document`
+
+To better see a Document's recursive structure, you can use the `.plot()` function. If you are using
+JupyterLab/Notebook, all `Document` objects are auto-rendered.
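+
+For example, a minimal sketch (the nested structure here is arbitrary):
+
+```python
+from jina import Document
+
+d = Document(id='🐲', chunks=[Document(id='🐢'), Document(id='🐍')])
+d.plot()  # in JupyterLab/Notebook, `d` alone renders automatically
+```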
+
+
+
+### Add Relevancy to `Document`
+
+#### Relevance Attributes
+
+| | |
+| --- | --- |
+| `doc.score` | The relevance information of this document |
+| `doc.evaluations` | The evaluation information of this document |
+
+You can add a relevance score to a `Document` object via:
+
+```python
+from jina import Document
+d = Document()
+d.score.value = 0.96
+d.score.description = 'cosine similarity'
+d.score.op_name = 'cosine()'
+```
+
+Score information is often used jointly with `matches`. For example, you often see the indexer adding `matches` as
+follows:
+
+```python
+from jina import Document
+
+# some query document
+q = Document()
+# get match document `m`
+m = Document()
+m.score.value = 0.96
+q.matches.append(m)
+```
+
+## `DocumentArray` API
+
+`DocumentArray` is a list of `Document` objects. You can construct, delete, insert, sort and traverse
+a `DocumentArray` like a Python `list`.
+
+Methods supported by `DocumentArray`:
+
+| | |
+|--- |--- |
+| Python `list`-like interface | `__getitem__`, `__setitem__`, `__delitem__`, `__len__`, `insert`, `append`, `reverse`, `extend`, `pop`, `remove`, `__iadd__`, `__add__`, `__iter__`, `clear`, `sort` |
+| Persistence | `save`, `load` |
+| Advanced getters | `get_attributes`, `get_attributes_with_docs` |
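+
+For instance, a small sketch of the `list`-like interface:
+
+```python
+from jina import DocumentArray, Document
+
+da = DocumentArray([Document(text='hello')])
+da.append(Document(text='world'))
+da.extend([Document(text='goodbye')])
+
+del da[0]       # list-like deletion by index
+print(len(da))  # 2
+```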
+
+### Construct `DocumentArray`
+
+One can construct a `DocumentArray` from an iterable of `Document` objects via:
+
+```python
+from jina import DocumentArray, Document
+
+# from list
+da1 = DocumentArray([Document(), Document()])
+
+# from generator
+da2 = DocumentArray((Document() for _ in range(10)))
+
+# from another `DocumentArray`
+da3 = DocumentArray(da2)
+```
+
+### Persistence via `save()`/`load()`
+
+To save all elements in a `DocumentArray` in a JSON lines format:
+
+```python
+from jina import DocumentArray, Document
+
+da = DocumentArray([Document(), Document()])
+
+da.save('data.json')
+da1 = DocumentArray.load('data.json')
+```
+
+### Access Element
+
+You can access a `Document` in the `DocumentArray` via an integer index, a string `id` or a slice.
+
+```python
+from jina import DocumentArray, Document
+
+da = DocumentArray([Document(id='hello'), Document(id='world'), Document(id='goodbye')])
+
+da[0]  # the first Document, i.e. the one with id 'hello'
+
+da['world']  # the Document with id 'world'
+
+da[1:2]  # a DocumentArray containing the Document with id 'world'
+```
+
+### Sort Elements
+
+`DocumentArray` is a subclass of `MutableSequence`; therefore you can use Python's built-in `sort` to sort elements in a `DocumentArray` object, e.g.
+
+```python
+from jina import DocumentArray, Document
+
+da = DocumentArray(
+ [
+ Document(tags={'id': 1}),
+ Document(tags={'id': 2}),
+ Document(tags={'id': 3})
+ ]
+)
+
+da.sort(key=lambda d: d.tags['id'], reverse=True)
+print(da)
+```
+
+This sorts the elements of `da` in place, by their `tags['id']` value in descending order:
+
+```text
+DocumentArray has 3 items:
+{'id': '6a79982a-b6b0-11eb-8a66-1e008a366d49', 'tags': {'id': 3.0}},
+{'id': '6a799744-b6b0-11eb-8a66-1e008a366d49', 'tags': {'id': 2.0}},
+{'id': '6a799190-b6b0-11eb-8a66-1e008a366d49', 'tags': {'id': 1.0}}
+```
+
+### Filter Elements
+
+You can use [built-in Python `filter()`](https://docs.python.org/3/library/functions.html#filter) to filter elements in a `DocumentArray` object, e.g.
+
+```python
+from jina import DocumentArray, Document
+
+da = DocumentArray([Document() for _ in range(6)])
+
+for j in range(6):
+ da[j].score.value = j
+
+for d in filter(lambda d: d.score.value > 2, da):
+ print(d)
+```
+
+You can build a `DocumentArray` object from the filtered result:
+
+```python
+from jina import DocumentArray, Document
+
+da = DocumentArray([Document(weight=j) for j in range(6)])
+da2 = DocumentArray(list(filter(lambda d: d.weight > 2, da)))
+
+print(da2)
+```
+
+```text
+DocumentArray has 3 items:
+{'id': '3bd0d298-b6da-11eb-b431-1e008a366d49', 'weight': 3.0},
+{'id': '3bd0d324-b6da-11eb-b431-1e008a366d49', 'weight': 4.0},
+{'id': '3bd0d392-b6da-11eb-b431-1e008a366d49', 'weight': 5.0}
+```
+
+### Use `itertools` on `DocumentArray`
+
+As `DocumentArray` is an `Iterable`, you can also use the [Python built-in `itertools` module](https://docs.python.org/3/library/itertools.html) on it. This enables advanced "iterator algebra" on the `DocumentArray`.
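+
+For example, a sketch using `itertools.islice` to lazily take the first few elements:
+
+```python
+from itertools import islice
+
+from jina import DocumentArray, Document
+
+da = DocumentArray([Document(text=f'doc{j}') for j in range(10)])
+
+for d in islice(da, 3):  # only the first 3 Documents are consumed
+    print(d.text)
+```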
+
+### Get Attributes in Bulk
+
+`DocumentArray` implements powerful getters that allow one to fetch multiple attributes from the documents it contains
+in one shot.
+
+```python
+import numpy as np
+
+from jina import DocumentArray, Document
+
+da = DocumentArray([Document(id=1, text='hello', embedding=np.array([1, 2, 3])),
+ Document(id=2, text='goodbye', embedding=np.array([4, 5, 6])),
+ Document(id=3, text='world', embedding=np.array([7, 8, 9]))])
+
+da.get_attributes('id', 'text', 'embedding')
+```
+
+```text
+[('1', '2', '3'), ('hello', 'goodbye', 'world'), (array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9]))]
+```
+
+This can be very useful when extracting a batch of embeddings:
+
+```python
+import numpy as np
+
+np.stack(da.get_attributes('embedding'))
+```
+
+```text
+[[1 2 3]
+ [4 5 6]
+ [7 8 9]]
+```
diff --git a/.github/2.0/cookbooks/Executor.md b/.github/2.0/cookbooks/Executor.md
new file mode 100644
index 0000000000000..5a49e7c2ecc69
--- /dev/null
+++ b/.github/2.0/cookbooks/Executor.md
@@ -0,0 +1,475 @@
+Document, Executor, Flow are three fundamental concepts in Jina.
+
+- [**Document**](Document.md) is the basic data type in Jina;
+- [**Executor**](Executor.md) is how Jina processes Documents;
+- [**Flow**](Flow.md) is how Jina streamlines and scales Executors.
+
+*Learn them all, nothing more, you are good to go.*
+
+---
+
+# Cookbook on `Executor` 2.0 API
+
+
+
+## Table of Contents
+
+- [Minimum working example](#minimum-working-example)
+ - [Pure Python](#pure-python)
+ - [With YAML](#with-yaml)
+- [Executor API](#executor-api)
+ - [Inheritance](#inheritance)
+ - [`__init__` Constructor](#__init__-constructor)
+ - [Method naming](#method-naming)
+ - [`@requests` decorator](#requests-decorator)
+ - [Default binding: `@requests` without `on=`](#default-binding-requests-without-on)
+ - [Multiple binding: `@requests(on=[...])`](#multiple-binding-requestson)
+ - [No binding](#no-binding)
+ - [Method Signature](#method-signature)
+ - [Method Arguments](#method-arguments)
+ - [Method Returns](#method-returns)
+ - [YAML Interface](#yaml-interface)
+ - [Load and Save Executor's YAML config](#load-and-save-executors-yaml-config)
+- [Executor Built-in Features](#executor-built-in-features)
+ - [1.x vs 2.0](#1x-vs-20)
+ - [Workspace](#workspace)
+ - [Metas](#metas)
+ - [`.metas` & `.runtime_args`](#metas--runtime_args)
+- [Migration in Practice](#migration-in-practice)
+ - [`jina hello fashion`](#jina-hello-fashion)
+ - [Encoder](#encoder)
+- [Remarks](#remarks)
+ - [Joining/Merging](#joiningmerging)
+
+
+
+## Minimum working example
+
+### Pure Python
+
+```python
+from jina import Executor, Flow, Document, requests
+
+
+class MyExecutor(Executor):
+
+ @requests
+ def foo(self, **kwargs):
+ print(kwargs)
+
+
+f = Flow().add(uses=MyExecutor)
+
+with f:
+ f.post(on='/random_work', inputs=Document(), on_done=print)
+```
+
+### With YAML
+
+`my.yml`:
+
+```yaml
+jtype: MyExecutor
+with:
+ bar: 123
+metas:
+ name: awesomeness
+ description: my first awesome executor
+requests:
+ /random_work: foo
+```
+
+```python
+from jina import Executor, Flow, Document
+
+
+class MyExecutor(Executor):
+
+ def __init__(self, bar: int, **kwargs):
+ super().__init__(**kwargs)
+ self.bar = bar
+
+ def foo(self, **kwargs):
+ print(f'foo says: {self.bar} {self.metas} {kwargs}')
+
+
+f = Flow().add(uses='my.yml')
+
+with f:
+ f.post(on='/random_work', inputs=Document(), on_done=print)
+```
+
+## Executor API
+
+- All executors derive directly from the `Executor` class.
+- An executor class can contain an arbitrary number of functions with arbitrary names. It is a bag of functions with
+  shared state (via `self`).
+- Functions decorated by `@requests` will be invoked according to their `on=` endpoint.
+
+### Inheritance
+
+Every new executor should inherit directly from `jina.Executor`.
+
+The 1.x inheritance tree has been removed; `Executor` no longer has polymorphism.
+
+You can name your executor class freely.
+
+### `__init__` Constructor
+
+If your executor defines `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)`
+in the body, e.g.
+
+```python
+from jina import Executor
+
+
+class MyExecutor(Executor):
+
+ def __init__(self, foo: str, bar: int, **kwargs):
+ super().__init__(**kwargs)
+ self.bar = bar
+ self.foo = foo
+```
+
+Here, `kwargs` contains `metas` and `requests` (representing the request-to-function mapping) values from the YAML
+config, and `runtime_args` injected on startup. Note that you can access their values in the `__init__` body
+via `self.metas`/`self.requests`/`self.runtime_args`, or modify them before passing to `super().__init__()`.
+
+### Method naming
+
+An `Executor`'s methods can be named freely. Methods that are not decorated with `@requests` are irrelevant to Jina.
+
+### `@requests` decorator
+
+`@requests` defines when a function will be invoked. It has a keyword `on=` to define the endpoint.
+
+To call an Executor's function, use `Flow.post(on=..., ...)`. For example, given
+
+```python
+from jina import Executor, Flow, requests
+
+
+class MyExecutor(Executor):
+
+ @requests(on='/index')
+ def foo(self, **kwargs):
+ print(kwargs)
+
+ @requests(on='/random_work')
+ def bar(self, **kwargs):
+ print(kwargs)
+
+
+f = Flow().add(uses=MyExecutor)
+
+with f:
+ pass
+```
+
+Then:
+
+- `f.post(on='/index', ...)` will trigger `MyExecutor.foo`;
+- `f.post(on='/random_work', ...)` will trigger `MyExecutor.bar`;
+- `f.post(on='/blah', ...)` will throw an error, as no function is bound to `/blah`.
+
+#### Default binding: `@requests` without `on=`
+
+A class method decorated with plain `@requests` (without `on=`) is the default handler for all endpoints. That means it
+is the fallback handler for endpoints that are not found. In the example below, `f.post(on='/blah', ...)` will
+invoke `MyExecutor.foo`:
+
+```python
+from jina import Executor, requests
+
+
+class MyExecutor(Executor):
+
+ @requests
+ def foo(self, **kwargs):
+ print(kwargs)
+
+ @requests(on='/index')
+ def bar(self, **kwargs):
+ print(kwargs)
+```
+
+#### Multiple binding: `@requests(on=[...])`
+
+To bind a method to multiple endpoints, use `@requests(on=['/foo', '/bar'])`. This allows
+either `f.post(on='/foo', ...)` or `f.post(on='/bar', ...)` to invoke that function.
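+
+For example, a minimal sketch:
+
+```python
+from jina import Executor, requests
+
+
+class MyExecutor(Executor):
+
+    @requests(on=['/index', '/search'])
+    def foo(self, **kwargs):
+        # invoked by both f.post(on='/index', ...) and f.post(on='/search', ...)
+        print(kwargs)
+```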
+
+#### No binding
+
+A class with no `@requests` binding plays no part in the Flow. The request will simply pass through without any processing.
+
+
+### Method Signature
+
+A class method decorated with `@requests` follows the signature below:
+
+```python
+from typing import Dict, List, Optional
+
+from jina import DocumentArray
+
+
+def foo(docs: Optional[DocumentArray],
+ parameters: Dict,
+ docs_matrix: List[DocumentArray],
+ groundtruths: Optional[DocumentArray],
+ groundtruths_matrix: List[DocumentArray]) -> Optional[DocumentArray]:
+ pass
+```
+
+### Method Arguments
+
+The Executor's methods receive the following arguments, in order:
+
+| Name | Type | Description |
+| --- | --- | --- |
+| `docs` | `Optional[DocumentArray]` | `Request.docs`. When multiple requests are available, it is a concatenation of all `Request.docs` as one `DocumentArray`. When `DocumentArray` has zero element, then it is `None`. |
+| `parameters` | `Dict` | `Request.parameters`, given by `Flow.post(..., parameters=)` |
+| `docs_matrix` | `List[DocumentArray]` | When multiple requests are available, it is a list of all `Request.docs`. On a single request, it is `None` |
+| `groundtruths` | `Optional[DocumentArray]` | `Request.groundtruths`. Same behavior as `docs` |
+| `groundtruths_matrix` | `List[DocumentArray]` | Same behavior as `docs_matrix` but on `Request.groundtruths` |
+
+Note that Executor methods not decorated with `@requests` do not receive these arguments.
+
+The argument order is designed as common-usage-first, not by alphabetical order or semantic closeness.
+
+If you don't need some arguments, you can suppress them into `**kwargs`. For example:
+
+```python
+@requests
+def foo(docs, **kwargs):
+ bar(docs)
+
+
+@requests
+def foo(docs, parameters, **kwargs):
+ bar(docs)
+ bar(parameters)
+
+
+@requests
+def foo(**kwargs):
+ bar(kwargs['docs_matrix'])
+```
+
+### Method Returns
+
+A method decorated with `@requests` can return `Optional[DocumentArray]`. If the return value is not `None`, the
+current `Request.docs` will be overridden by it.
+
+If the return value is just a shallow copy of `Request.docs`, then nothing happens.
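+
+A minimal sketch of overriding `Request.docs` via the return value (the class name is illustrative):
+
+```python
+from jina import Executor, DocumentArray, Document, requests
+
+
+class ReplaceExecutor(Executor):
+
+    @requests
+    def foo(self, docs, **kwargs):
+        # the returned DocumentArray replaces the current Request.docs
+        return DocumentArray([Document(text='replaced')])
+```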
+
+### YAML Interface
+
+An Executor can be loaded from and stored to a YAML file. The YAML file has the following format:
+
+```yaml
+jtype: MyExecutor
+with:
+ ...
+metas:
+ ...
+requests:
+ ...
+```
+
+- `jtype` is a string. Defines the class name, interchangeable with the bang mark `!`;
+- `with` is a map. Defines kwargs of the class `__init__` method;
+- `metas` is a map. Defines the meta information of that class. Compared to `1.x`, it is reduced to the following
+  fields:
+  - `name` is a string. Defines the name of the executor;
+  - `description` is a string. Defines the description of this executor. It will be used in the automatic docs UI;
+  - `workspace` is a string. Defines the workspace of the executor;
+  - `py_modules` is a list of strings. Defines the Python dependencies of the executor;
+- `requests` is a map. Defines the mapping from endpoint to class method name.
+
+### Load and Save Executor's YAML config
+
+You can use the class method `Executor.load_config` and the object method `exec.save_config` to load and save the YAML
+config as follows:
+
+```python
+from jina import Executor
+
+
+class MyExecutor(Executor):
+
+ def __init__(self, bar: int, **kwargs):
+ super().__init__(**kwargs)
+ self.bar = bar
+
+ def foo(self, **kwargs):
+ pass
+
+
+y_literal = """
+jtype: MyExecutor
+with:
+ bar: 123
+metas:
+ name: awesomeness
+ description: my first awesome executor
+requests:
+ /random_work: foo
+"""
+
+exec = Executor.load_config(y_literal)
+exec.save_config('y.yml')
+Executor.load_config('y.yml')
+```
+
+## Executor Built-in Features
+
+In 2.0, the Executor class has fewer built-in features than in 1.x. The design principles are (`user` here means
+"Executor developer"):
+
+- **Do not surprise the user**: keep the `Executor` class as Pythonic as possible; it should be as light and
+  unintrusive as a mixin class:
+ - do not customize the class constructor logic;
+ - do not change its builtin interface `__getstate__`, `__setstate__`;
+ - do not add new members to the `Executor` object unless we must.
+- **Do not overpromise to the user**: do not promise features that we can hardly deliver. Trying to control the
+  interface while delivering just loosely implemented features is bad for scaling the core framework. For
+  example: `save`, `load`, `on_gpu`, etc.
+
+We want to give programming freedom back to the user. A good Python programmer should pick up `Executor` in no time,
+without spending extra time on learning the implicit boilerplate as in 1.x. Plus, subclassing `Executor` should be
+easy.
+
+### 1.x vs 2.0
+
+- ❌: Completely removed. Users have to implement it on their own.
+- ✅: Preserved.
+
+| 1.x | 2.0 |
+| --- | --- |
+| `.save_config()` | ✅ |
+| `.load_config()` | ✅ |
+| `.close()` | ✅ |
+| `workspace` interface | ✅ [Refactored](#workspace). |
+| `metas` config | Moved to `self.metas.xxx`. [The number of fields is greatly reduced](#yaml-interface). |
+| `._drivers` | Refactored and moved to `self.requests.xxx`. |
+| `.save()` | ❌ |
+| `.load()` | ❌ |
+| `.logger` | ❌ |
+| Pickle interface | ❌ |
+| init boilerplates (`pre_init`, `post_init`) | ❌ |
+| Context manager interface | ❌ |
+| Inline `import` coding style | ❌ |
+
+![](../1.xvs2.0BaseExecutor.svg?raw=true)
+
+### Workspace
+
+An Executor's workspace is inherited according to the following rule (`OR` is a Python `or`, i.e. take the first if
+available, otherwise fall back to the second):
+
+![](../workspace-inherit.svg?raw=true)
+
+### Metas
+
+The meta attributes of an `Executor` object are now gathered in `self.metas`, instead of being exposed directly
+on `self`; e.g. to access `name`, use `self.metas.name`.
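+
+A small sketch, assuming `name: awesomeness` and the description are set in the YAML `metas` section as in the earlier
+example:
+
+```python
+from jina import Executor
+
+
+class MyExecutor(Executor):
+
+    def __init__(self, **kwargs):
+        super().__init__(**kwargs)
+        print(self.metas.name)         # -> 'awesomeness'
+        print(self.metas.description)  # -> 'my first awesome executor'
+```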
+
+### `.metas` & `.runtime_args`
+
+An `Executor` object by default contains two collections of attributes: `.metas` and `.runtime_args`. They are both
+of `SimpleNamespace` type and contain key-value information. However, they are defined differently and serve different
+purposes.
+
+- **`.metas` are statically defined**, e.g. hardcoded in the code or loaded from a YAML file.
+- **`.runtime_args` are dynamically determined during runtime.** You don't know their values before running
+  the `Executor`, e.g. `pea_id`, `replicas`, `replica_id`. These values are often related to the system/network
+  environment around the `Executor`, and less to the `Executor` itself.
+
+In 2.0rc1, the following fields are valid for `metas` and `runtime_args`:
+
+|||
+| --- | --- |
+| `.metas` (static values from hardcode, YAML config) | `name`, `description`, `py_modules`, `workspace` |
+| `.runtime_args` (runtime values from its containers, e.g. `Runtime`, `Pea`, `Pod`) | `name`, `description`, `workspace`, `log_config`, `quiet`, `quiet_error`, `identity`, `port_ctrl`, `ctrl_with_ipc`, `timeout_ctrl`, `ssh_server`, `ssh_keyfile`, `ssh_password`, `uses`, `py_modules`, `port_in`, `port_out`, `host_in`, `host_out`, `socket_in`, `socket_out`, `read_only`, `memory_hwm`, `on_error_strategy`, `num_part`, `uses_internal`, `entrypoint`, `docker_kwargs`, `pull_latest`, `volumes`, `host`, `port_expose`, `quiet_remote_logs`, `upload_files`, `workspace_id`, `daemon`, `runtime_backend`, `runtime_cls`, `timeout_ready`, `env`, `expose_public`, `pea_id`, `pea_role`, `noblock_on_start`, `uses_before`, `uses_after`, `parallel`, `replicas`, `polling`, `scheduling`, `pod_role`, `peas_hosts` |
+
+Note that the YAML API ignores `.runtime_args` during save & load, as they are not statically stored.
+
+Also note that for any other parametrization of the Executor, you can still access its constructor arguments (defined
+in the class `__init__`) and the request `parameters`.
+
+---
+
+## Migration in Practice
+
+### `jina hello fashion`
+
+#### Encoder
+
+Left is 1.x, right is 2.0.
+
+![img.png](../migration-fashion.png?raw=true)
+
+Line numbers correspond to the 1.x code:
+
+- `L5`: change imports to top-level namespace `jina`;
+- `L8`: all executors now subclass from `Executor` class;
+- `L13-14`: there is no need to override `__init__`; no signature is enforced;
+- `L20`: `.touch()` is removed; for this particular encoder, as long as the seed is fixed, there is no need to store;
+- `L22`: adding `@requests` to decorate the core method, changing signature to `docs, **kwargs`;
+- `L32`:
+ - the content extraction and embedding assignment are now done manually;
+  - replacing the previous `Blob2PngURI` and `ExcludeQL` driver logic with the `Document` built-in
+    methods `convert_blob_to_uri` and `pop`;
+ - there is nothing to return, as the change is done in-place.
+
+## Remarks
+
+### Joining/Merging
+
+Combining `docs` from multiple requests is already done by the `ZEDRuntime` before feeding them to the Executor's
+function. Hence, simple joining is just returning this `docs`. Complicated joining should be implemented at
+the `Document`/`DocumentArray` level:
+
+```python
+from jina import Executor, requests, Flow, Document
+
+
+class C(Executor):
+
+ @requests
+ def foo(self, docs, **kwargs):
+ # 6 docs
+ return docs
+
+
+class B(Executor):
+
+ @requests
+ def foo(self, docs, **kwargs):
+ # 3 docs
+ for idx, d in enumerate(docs):
+ d.text = f'hello {idx}'
+
+
+class A(Executor):
+
+ @requests
+ def A(self, docs, **kwargs):
+ # 3 docs
+ for idx, d in enumerate(docs):
+ d.text = f'world {idx}'
+
+
+f = Flow().add(uses=A).add(uses=B, needs='gateway').add(uses=C, needs=['pod0', 'pod1'])
+
+with f:
+ f.post(on='/some_endpoint',
+ inputs=[Document() for _ in range(3)],
+ on_done=print)
+```
+
+You can also modify the Documents while merging, which was not feasible in 1.x, e.g.
+
+```python
+class C(Executor):
+
+ @requests
+ def foo(self, docs, **kwargs):
+ # 6 docs
+ for d in docs:
+ d.text += '!!!'
+ return docs
+```
diff --git a/.github/2.0/cookbooks/Flow.md b/.github/2.0/cookbooks/Flow.md
new file mode 100644
index 0000000000000..9632f724b564c
--- /dev/null
+++ b/.github/2.0/cookbooks/Flow.md
@@ -0,0 +1,321 @@
+Document, Executor, Flow are three fundamental concepts in Jina.
+
+- [**Document**](Document.md) is the basic data type in Jina;
+- [**Executor**](Executor.md) is how Jina processes Documents;
+- [**Flow**](Flow.md) is how Jina streamlines and scales Executors.
+
+*Learn them all, nothing more, you are good to go.*
+
+---
+
+# Cookbook on `Flow` 2.0 API
+
+
+
+## Table of Contents
+
+- [Minimum working example](#minimum-working-example)
+ - [Pure Python](#pure-python)
+ - [With YAML](#with-yaml)
+- [Flow API](#flow-api)
+ - [Create a Flow](#create-a-flow)
+ - [Add Executor to a Flow](#add-executor-to-a-flow)
+ - [Create Inter & Intra Parallelism via `needs`](#create-inter--intra-parallelism-via-needs)
+ - [Decentralized Flow](#decentralized-flow)
+- [Send Data to Flow](#send-data-to-flow)
+ - [`post` method](#post-method)
+ - [Fetch Result from Flow](#fetch-result-from-flow)
+ - [Asynchronous Flow](#asynchronous-flow)
+ - [REST Interface](#rest-interface)
+
+
+
+## Minimum working example
+
+### Pure Python
+
+```python
+from jina import Flow, Document
+
+f = Flow().add(name='foo')
+
+with f:
+ f.post(on='/bar', inputs=Document(), on_done=print)
+```
+
+### With YAML
+
+`my.yml`:
+
+```yaml
+jtype: Flow
+executors:
+ - name: foo
+```
+
+```python
+from jina import Flow, Document
+
+f = Flow.load_config('my.yml')
+
+with f:
+ f.post(on='/bar', inputs=Document(), on_done=print)
+```
+
+## Flow API
+
+A `Flow` is how Jina streamlines and scales Executors. A `Flow` object has the following methods:
+
+| | |
+|---|---|
+|Construct| `.add()`, `.needs()`, `.needs_all()`, `.inspect()`, `.gather_inspect()`, `.use_grpc_gateway`, `.use_rest_gateway` |
+|Request| `.post()`, `.index()`, `.search()`, `.update()`, `.delete()`|
+
+### Create a Flow
+
+An empty Flow can be created via:
+
+```python
+from jina import Flow
+
+f = Flow()
+```
+
+To use `f`, always open it as a context manager:
+
+```python
+with f:
+ ...
+```
+
+### Add Executor to a Flow
+
+`Flow.add()` is the method to add an Executor to the `Flow` object. It is often used with the `uses` parameter to
+specify the [Executor](Executor.md).
+
+`uses` accepts multiple value types, including a class name, a Docker image and (inline) YAML.
+
+```python
+from jina import Flow
+
+f = (Flow()
+ .add(uses=MyExecutor) # the class of a Jina Executor
+ .add(uses='myexecutor.yml') # YAML serialization of a Jina Executor
+ .add(uses='''
+jtype: MyExecutor
+with:
+ bar: 123
+metas:
+ name: awesomeness
+ description: my first awesome executor
+requests:
+ /random_work: foo
+     ''')  # inline YAML
+     .add(uses={'jtype': 'MyBertEncoder', 'with': {'param': 1.23}}))  # dict config object with the jtype keyword
+```
+
+The power of Jina lies in its decentralized architecture: Each `add` creates a new Executor, and these Executors can be
+run as a local thread/process, a remote process, inside a Docker container, or even inside a remote Docker container.
+
+### Create Inter & Intra Parallelism via `needs`
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/basic-inter-intra-parallelism.ipynb)
+
+Chaining `.add()`s creates a sequential Flow. For parallelism, use the `needs` parameter:
+
+```python
+from jina import Flow
+
+f = (Flow()
+ .add(name='p1', needs='gateway')
+ .add(name='p2', needs='gateway')
+ .add(name='p3', needs='gateway')
+ .needs(['p1', 'p2', 'p3'], name='r1').plot())
+```
+
+
+
+`p1`, `p2`, `p3` now subscribe to `Gateway` and conduct their work in parallel. The last `.needs()` blocks all Executors
+until they finish their work. Note: parallelism can also be performed inside an Executor using `parallel`:
+
+```python
+
+from jina import Flow
+
+f = (Flow()
+ .add(name='p1', needs='gateway')
+ .add(name='p2', needs='gateway')
+ .add(name='p3', parallel=3)
+ .needs(['p1', 'p3'], name='r1').plot())
+```
+
+
+
+### Decentralized Flow
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/decentralized-flow.ipynb)
+
+A Flow does not have to be local-only: you can put any Executor on a remote machine. In the example below, the
+Executor `gpu-exec` is put on a remote machine via the `host` keyword for parallelization, whereas the other Executors
+stay local. Extra file dependencies that need to be uploaded are specified via the `upload_files` keyword.
+
+
+On the remote machine (e.g. `123.456.78.9`), start a `jinad` daemon via Docker:
+
+```bash
+# have docker installed
+docker run --name=jinad --network=host -v /var/run/docker.sock:/var/run/docker.sock jinaai/jina:latest-daemon --port-expose 8000
+# to stop it
+docker rm -f jinad
+```
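+
+Back on the local machine, a hedged sketch of such a Flow (the YAML/file names are hypothetical; only `gpu-exec`,
+`host` and `upload_files` come from the description above):
+
+```python
+from jina import Flow
+
+f = (Flow()
+     .add()
+     .add(name='gpu-exec',
+          uses='mwu_encoder.yml',           # hypothetical Executor config
+          host='123.456.78.9:8000',         # the remote jinad started above
+          upload_files=['mwu_encoder.py'])  # extra file dependencies to upload
+     .add())
+```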
+
+
+
+## Send Data to Flow
+
+### `post` method
+
+`post` is the core method. All 1.x methods, e.g. `index`, `search`, `update`, `delete`, are just sugary syntax
+for `post` with `on='/index'`, `on='/search'`, etc.
+
+```python
+def post(
+ self,
+ on: str,
+ inputs: InputType,
+ on_done: CallbackFnType = None,
+ on_error: CallbackFnType = None,
+ on_always: CallbackFnType = None,
+ parameters: Optional[dict] = None,
+ target_peapod: Optional[str] = None,
+ **kwargs,
+) -> None:
+ """Post a general data request to the Flow.
+
+ :param on: the endpoint is used for identifying the user-defined ``request_type``, labeled by ``@requests(on='/abc')``
+ :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id.
+ :param on_done: the function to be called when the :class:`Request` object is resolved.
+ :param on_error: the function to be called when the :class:`Request` object is rejected.
+    :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected.
+    :param target_peapod: a regex string representing the peas/pods targeted by the request
+ :param parameters: the kwargs that will be sent to the executor
+ :param kwargs: additional parameters
+ :return: None
+ """
+```
+
+Compared to the 1.x Client/Flow API, the three new arguments are:
+
+- `on`: endpoint, as explained above
+- `parameters`: the kwargs that will be sent to the executor, as explained above
+- `target_peapod`: a regex string representing the peas/pods targeted by the request
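+
+Putting it together, a minimal sketch in which the sugary call and the general form should be equivalent:
+
+```python
+from jina import Flow, Document
+
+f = Flow().add()
+
+with f:
+    f.index(inputs=Document(text='hello'))              # sugary 1.x-style syntax
+    f.post(on='/index', inputs=Document(text='hello'))  # the general form
+```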
+
+### Fetch Result from Flow
+
+Once a request is done, callback functions are fired. Jina Flow implements a Promise-like interface: You can add
+callback functions `on_done`, `on_error`, `on_always` to hook different events. In the example below, our Flow passes
+the message then prints the result when successful. If something goes wrong, it beeps. Finally, the result is written
+to `output.txt`.
+
+```python
+import numpy
+
+from jina import Flow
+
+
+def beep(*args):
+ # make a beep sound
+ import os
+ os.system('echo -n "\a";')
+
+
+with Flow().add() as f, open('output.txt', 'w') as fp:
+ f.index(numpy.random.random([4, 5, 2]),
+ on_done=print, on_error=beep, on_always=lambda x: fp.write(x.json()))
+```
+
+### Asynchronous Flow
+
+[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/basic-inter-intra-parallelism.ipynb)
+
+While synchronous from outside, Jina runs asynchronously under the hood: it manages the eventloop(s) for scheduling the
+jobs. If the user wants more control over the eventloop, then `AsyncFlow` can be used.
+
+Unlike `Flow`, the CRUD of `AsyncFlow` accepts input and output functions
+as [async generators](https://www.python.org/dev/peps/pep-0525/). This is useful when your data sources involve other
+asynchronous libraries (e.g. motor for MongoDB):
+
+```python
+import asyncio
+
+from jina import AsyncFlow, Document
+
+
+async def input_function():
+ for _ in range(10):
+ yield Document()
+ await asyncio.sleep(0.1)
+
+
+with AsyncFlow().add() as f:
+ async for resp in f.index(input_function):
+ print(resp)
+```
+
+`AsyncFlow` is particularly useful when Jina and another heavy-lifting job are running concurrently:
+
+```python
+import asyncio
+
+import numpy
+from jina import AsyncFlow
+
+
+async def run_async_flow_5s():  # WaitDriver pauses 5s, making the total roundtrip ~5s
+ with AsyncFlow().add(uses='- !WaitDriver {}') as f:
+ async for resp in f.index_ndarray(numpy.random.random([5, 4])):
+ print(resp)
+
+
+async def heavylifting(): # total roundtrip takes ~5s
+ print('heavylifting other io-bound jobs, e.g. download, upload, file io')
+ await asyncio.sleep(5)
+ print('heavylifting done after 5s')
+
+
+async def concurrent_main():  # ~5s in total, plus some dispatch cost; usually <7s
+ await asyncio.gather(run_async_flow_5s(), heavylifting())
+
+
+if __name__ == '__main__':
+ asyncio.run(concurrent_main())
+```
+
+`AsyncFlow` is very useful when using Jina inside a Jupyter Notebook, where it can run out of the box.
+
+### REST Interface
+
+In practice, the query Flow and the client (i.e. data sender) are often physically separated. Moreover, the client may
+prefer to use a REST API rather than gRPC when querying. You can set `port_expose` to a public port and turn
+on [REST support](https://api.jina.ai/rest/) with `restful=True`:
+
+```python
+from jina import Flow
+
+f = Flow(port_expose=45678, restful=True)
+
+with f:
+ f.block()
+```
+
+
diff --git a/.github/2.0/doc.content.svg b/.github/2.0/doc.content.svg
new file mode 100644
index 0000000000000..721c6926b8528
--- /dev/null
+++ b/.github/2.0/doc.content.svg
@@ -0,0 +1,140 @@
+
\ No newline at end of file
diff --git a/.github/2.0/migration-fashion.png b/.github/2.0/migration-fashion.png
new file mode 100644
index 0000000000000..370bd9a8650f0
Binary files /dev/null and b/.github/2.0/migration-fashion.png differ
diff --git a/.github/2.0/workspace-inherit.svg b/.github/2.0/workspace-inherit.svg
new file mode 100644
index 0000000000000..094f3cd838917
--- /dev/null
+++ b/.github/2.0/workspace-inherit.svg
@@ -0,0 +1,162 @@
+
\ No newline at end of file
diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS
index 1c3eac3f87ee0..a90da486b9506 100644
--- a/.github/CODEOWNERS
+++ b/.github/CODEOWNERS
@@ -4,7 +4,10 @@
# review when someone opens a pull request.
* @jina-ai/engineering
-# Han Xiao owns CICD and README.md
-.github @hanxiao
+.github/workflows @hanxiao
setup.py @hanxiao
-extra-requirements.txt @hanxiao
\ No newline at end of file
+extra-requirements.txt @hanxiao
+jina/__init__.py @hanxiao
+requirements.txt @hanxiao
+MANIFEST.in @hanxiao
+README.md @hanxiao
\ No newline at end of file
diff --git a/.github/banner.gif b/.github/banner.gif
deleted file mode 100644
index c022c0d59a7c2..0000000000000
Binary files a/.github/banner.gif and /dev/null differ
diff --git a/.github/i18n/README.de.md b/.github/i18n/README.de.md
deleted file mode 100644
index 58b81a3b2aa8b..0000000000000
--- a/.github/i18n/README.de.md
+++ /dev/null
@@ -1,397 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai)
-[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai)
-
-
-
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]()
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22)
-
-
-Jina ist ein durch Deep Learning gestütztes Framework um Cross- und/Multi-Modale Suchsysteme (e.g. text, images, video, audio) in der Cloud zu erstellen.
-
-⏱️ **Zeitersparnis** – Erstellen Sie ein KI-System innerhalb weniger Minuten.
-
-🧠 **Erstklassige KI Modelle** – *Das* Designmuster für neuronale Systeme, mit erstklassiger Unterstützung durch [state-of-the-art KI Modelle](https://docs.jina.ai/chapters/all_exec.html).
-
-🌌 **Universelle Suchlösung** – Skalierbares Indizieren und Suchen von beliebigen Daten. Z. B.: Videos, Bilder, lange und kurze Texte, Musik, Quellcode, usw.
-
-☁️ **Cloud Ready** - Dezentralisierte Architektur mit integrierten Cloud Native-Funktionen. Z.B.: Containervirtualisierung, Microservices, Skalierung, Sharding, Async IO, REST, gRPC.
-
-🧩 **Plug-and-play** – Einfach mit Python erweiterbar.
-
-❤️ **Mit Liebe gemacht** – Qualität steht an erster Stelle, und wird von unseren [Teams](https://jina.ai) kompromissfrei gewährleistet.
-
----
-
-
-
-
-## Inhaltsverzeichnis
-
-
-
-
-
-
-- [Installieren](#installieren)
-- [Jina "Hallo, Welt!" 👋🌍](#jina-hallo-welt-)
-- [Erste Schritte](#erste-schritte)
-- [Dokumentation](#dokumentation)
-- [Beitragend](#beitragend)
-- [Gemeinschaft](#gemeinschaft)
-- [Fahrplan](#fahrplan)
-- [Lizenz](#lizenz)
-
-
-
-## Installieren
-
-#### Aus PyPi installieren
-
-Unter Linux/MacOS mit installiertem Python >= 3.7 führen Sie einfach diesen Befehl in Ihrem Terminal aus:
-
-```bash
-pip install jina
-```
-
-So installieren Sie Jina mit zusätzlichen Abhängigkeiten, oder installieren Sie es auf Raspberry Pi[bitte beachten Sie die Dokumentationen](https://docs.jina.ai).
-
-#### ...oder Ausführen mit Docker-Container
-
-Wir bieten ein universelles Docker-Image (nur 80MB!) an, das mehrere Architekturen unterstützt (einschließlich x64, x86, arm-64/v7/v6), einfach tun
-
-```bash
-docker run jinaai/jina
-```
-
-## Jina "Hallo, Welt!" 👋🌍
-
-Als Einsteiger sind Sie eingeladen, Jinas "Hello, World" auszuprobieren - eine einfache Demo der neuronalen Bildsuche für[Mode-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). Keine zusätzlichen Abhängigkeiten nötig, einfach tun:
-
-```bash
-jina hello-world
-```
-
-...oder noch einfacher für Docker-Benutzer,**keine Installation erforderlich,** einfach:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-
-Click here to see the console output
-
-
-
-
-
-
-
-Sie lädt die Trainings- und Testdaten von Fashion-MNIST herunter; Jina wird angewiesen, 60.000 Bilder aus dem Trainingsset zu indexieren. Dann entnimmt sie nach dem Zufallsprinzip Bilder aus dem Testset als Abfragen und bittet Jina, relevante Ergebnisse abzurufen. Nach etwa 1 Minute öffnet sie eine Webseite und zeigt Ergebnisse wie dieses:
-
-
-
-
-
-Und die Umsetzung dahinter? So einfach wie es sein sollte:
-
-
-
-![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true)
-
-
-
-
-
-Alle großen Wörter, die Sie nennen können: Computer Vision, neuronale IR, Mikroservice, Nachrichtenwarteschlange, elastisch, Repliken & Scherben geschahen in nur einer Minute!
-
-Interessiert? Spielen Sie und probieren Sie verschiedene Optionen aus:
-
-```bash
-jina hello-world --help
-```
-
-[Vergewissern Sie sich, dass Sie mit unserem Jina 101 Leitfaden fortfahren](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - alle Schlüsselkonzepte von Jina in 3 Minuten verstehen!
-
-## Erste Schritte
-
-### Starten Sie ein Projekt von der Vorlage aus
-
-```bash
-pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina
-```
-
-### Tutorials
-
-
-Learn to how to use SOTA visual representation for searching Pokémon!
-
-
-🚀
-
-
-
-
-
-## Dokumentation
-
-
-
-
-
-Der beste Weg, Jina gründlich kennenzulernen, ist, unsere Dokumentation zu lesen. Die Dokumentation wird bei jedem Push, Merge und Release-Ereignis des Master-Zweiges erstellt. Weitere Einzelheiten zu den folgenden Themen finden Sie in unserer Dokumentation.
-
-- [Jina Befehlszeilenschnittstelle Argumente erklärt](https://docs.jina.ai/chapters/cli/index.html)
-- [Jina Python API-Schnittstelle](https://docs.jina.ai/api/jina.html)
-- [Jina YAML-Syntax für Ausführer, Treiber und Ablauf](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Jina Protobuf-Schema](https://docs.jina.ai/chapters/proto/index.html)
-- [In Jina verwendete Umgebungsvariablen](https://docs.jina.ai/chapters/envs.html)
-- ..[und mehr](https://docs.jina.ai/index.html)
-
-Sind Sie ein "Doc"-Star? Bejaht? Kommen Sie zu uns! Wir begrüßen alle Arten von Verbesserungen an der Dokumentation
-
-[Dokumentationen für die älteren Versionen werden hier archiviert](https://github.com/jina-ai/docs/releases).
-
-## Beitragend
-
-Wir begrüßen alle Arten von Beiträgen aus der Open-Source-Gemeinschaft, von Einzelpersonen und Partnern. Ohne Ihre aktive Beteiligung wird Jina nicht erfolgreich sein.
-
-Die folgenden Ressourcen werden Ihnen helfen, einen guten ersten Beitrag zu leisten:
-
-- [Richtlinien zur Beitragsleistung](CONTRIBUTING.md)
-- [Release-Zyklen und Entwicklungsstufen](RELEASE.md)
-
-## Gemeinschaft
-
-- [Schlupfkanal](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - eine Kommunikationsplattform für Entwickler, um über Jina zu diskutieren
-- [Rundbrief der Gemeinschaft](mailto:newsletter+subscribe@jina.ai) - abonnieren Sie die neuesten Aktualisierungs-, Veröffentlichungs- und Veranstaltungsnachrichten von Jina
-- [VerlinktIn](https://www.linkedin.com/company/jinaai/) - jina AI als Unternehmen kennenlernen und Stellenangebote finden
-- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - folgen Sie uns und interagieren Sie mit uns mittels Hashtag`#JinaSearch`
-- [Unternehmen](https://jina.ai) - erfahren Sie mehr über unser Unternehmen, wir setzen uns voll und ganz für Open-Source ein!
-
-## Fahrplan
-
-[Meilensteine von GitHub](https://github.com/jina-ai/jina/milestones) den Weg zu den künftigen Verbesserungen aufzuzeigen.
-
-Wir suchen nach Partnerschaften zum Aufbau eines Open-Governance-Modells (z.B. Technischer Lenkungsausschuss) um Jina herum, das ein gesundes Open-Source-Ökosystem und eine entwicklerfreundliche Kultur ermöglicht. Wenn Sie an einer Teilnahme interessiert sind, zögern Sie nicht, uns zu kontaktieren[hello@jina.ai](mailto:hello@jina.ai).
-
-## Lizenz
-
-Urheberrecht (c) 2020 Jina AI Limited. Alle Rechte vorbehalten.
-
-Jina ist unter der Apache-Lizenz, Version 2.0, lizenziert[Siehe LIZENZ für den vollständigen Lizenztext.](LICENSE)
diff --git a/.github/i18n/README.es.md b/.github/i18n/README.es.md
deleted file mode 100644
index bd319b577a5b5..0000000000000
--- a/.github/i18n/README.es.md
+++ /dev/null
@@ -1,422 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/)
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/)
-[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina)
-
-
-
-Jina es un framework de búsqueda basado en IA que permite a los desarrolladores crear sistemas de búsqueda **cross/multi-modals** (como texto, imágenes, video, audio) en la nube.
-
-⏱️ **Ahorro de tiempo** - Inicie un sistema AI-powered en sólo unos minutos..
-
-🧠 **Modelos IA de primera clase** - *El* patrón de diseño de los sistemas de búsqueda neuronal, con soporte de primera clase para [modelos IA de última generación](https://docs.jina.ai/chapters/all_exec.html).
-
-🌌 **Búsqueda universal** - indexación y consulta a gran escala de cualquier tipo de datos en múltiples plataformas: vídeo, imagen, texto largo/corto, música, código fuente, etc.
-
-☁️ **Cloud Ready** - Arquitectura descentralizada con características propias cloud-natives: contenedorización, microservicio, escalado, sharding, async IO, REST, gRPC.
-
-🧩 **Plug & Play** - Fácilmente ampliable con la interfaz Pythonic.
-
-❤️ **Hecho con amor** - La calidad es lo primero, nunca se compromete, mantenido por un [equipo a tiempo completo, respaldado por la empresa](https://jina.ai).
-
-
-## Resumen
-
-
-
-
-
-
-- [Installation](#instalaci%C3%B3n)
-- [Jina "Hello, World!" 👋🌍](#jina-hola-mundo-)
-- [Tutorials](#tutoriales)
-- [Documentation](#documentaci%C3%B3n)
-- [Contributing](#contribuyendo)
-- [Community](#comunidad)
-- [Open governance](#gobernanza-abierta)
-- [Join us](#%C3%BAnase)
-- [License](#licencia)
-
-
-
-## Installation
-
-### Via PyPI
-
-On Linux/macOS with Python >= 3.7:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please check the documentation](https://docs.jina.ai).
-
-### In a Docker container
-
-We provide a universal Docker image that supports multiple architecture types (including x64, x86, arm-64/v7/v6). It just works:
-
-```bash
-docker run jinaai/jina --help
-```
-
-## Jina "Hola, mundo!" 👋🌍
-
-Para empezar, puede probar nuestro "Hola, Mundo" - una simple demostración de búsqueda de imágenes mediante redes neuronales [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No se necesitan dependencias adicionales, simplemente ejecute:
-
-```bash
-jina hello-world
-```
-
-...or, even easier for Docker users, **no installation required**:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-
-Click here to see the console output
-
-
-
-
-
-
-
-The Docker image downloads the Fashion-MNIST training and test datasets and tells Jina to index 60,000 images from the training data. It then randomly samples test images as queries and asks Jina to retrieve the relevant results. The whole process takes about 1 minute, and eventually opens a web page with results that look like this:
-
-
-
-
-
-The implementation behind this is simple (see the sketch below):
-
-
-Improve performance with prefetching and sharding
-
-
-
-
-
-
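For readers who want a feel for what such a search workflow looks like in code, here is a minimal, hypothetical sketch using the 2.0-style Python API. The executor names and wiring are made up for illustration; the real `hello-world` demo ships its own executors and dataset loaders.

```python
from jina import Document, Flow

# A rough sketch of the hello-world idea: an encoder stage followed by an
# indexer stage. 'my_encoder' and 'my_indexer' are hypothetical names, not
# the Executors actually shipped with the demo.
f = Flow().add(name='my_encoder').add(name='my_indexer')

with f:
    # index a few Documents...
    f.post(on='/index', inputs=[Document(text='hello'), Document(text='world')])
    # ...then query them back
    f.post(on='/search', inputs=Document(text='hello'))
```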
-## Documentation
-
-
-
-
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every update and release of the master branch.
-
-#### The basics
-
-- [Use the Flow API to compose your search workflow](https://docs.jina.ai/chapters/flow/index.html)
-- [Input and output functions in Jina](https://docs.jina.ai/chapters/io/index.html)
-- [Log and monitor with Jina's graphical Dashboard](https://github.com/jina-ai/dashboard)
-- [Distribute your workflow remotely](https://docs.jina.ai/chapters/remote/index.html)
-- [Build your Pod into a Docker image: how and why](https://docs.jina.ai/chapters/hub/index.html)
-
-#### Reference
-
-- [Command-line interface (CLI) arguments](https://docs.jina.ai/chapters/cli/index.html)
-- [Python API interface](https://docs.jina.ai/api/jina.html)
-- [YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Protobuf schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Join us! All kinds of help with the documentation are welcome.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active participation.
-
-- [Contribution guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-### Contributors ✨
-
-
-[![All Contributors](https://img.shields.io/badge/all_contributors-66-orange.svg?style=flat-square)](#contributors-)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-## Community
-
-- [Slack workspace](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - join #general on our Slack to meet the team and ask questions
-- [YouTube channel](https://youtube.com/c/jina-ai) - subscribe for our latest video tutorials, release demos, webinars and presentations
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- [![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - follow and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company and how we are fully committed to open source.
-
-## Open governance
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to Jina's future improvements.
-
-As part of our open-governance model, we host Jina's [Engineering All Hands](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/) in public. This Zoom meeting takes place monthly on the second Tuesday of each month, 14:00-15:30 (CET). Anyone can join via the following calendar invite.
-
-- [Add to Google Calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)
-- [Download .ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)
-
-The meeting will be live-streamed and later published on our [YouTube channel](https://youtube.com/c/jina-ai).
-
-## Join us
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) full-stack developers, evangelists, and PMs to build the next open-source neural search ecosystem.
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.fr.md b/.github/i18n/README.fr.md
deleted file mode 100644
index 4e2a9b0205941..0000000000000
--- a/.github/i18n/README.fr.md
+++ /dev/null
@@ -1,390 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai)
-[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai)
-
-
-
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]()
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22)
-
-
-
-Want to build a search system backed by deep learning? You've come to the right place!
-
-Jina is the cloud-native neural search framework powered by state-of-the-art AI and deep learning. It is long-term supported by a full-time, venture-backed team.
-
-🌌 **The universal search solution** - Jina enables large-scale indexing and querying of any kind of data on multiple platforms and architectures. Whether you are searching for images, video clips, audio snippets, long legal documents or short tweets, Jina can handle them all.
-
-🚀 **High performance & state of the art** - Jina aims at AI-in-production. You can easily scale out your VideoBERT, Xception, your word tokenizer, image segmenter and database to handle billion-level data. Features such as replicas and shards come off the shelf.
-
-🐣 **System engineering made easy** - Jina offers a one-stop solution that frees you from hand-crafting and gluing together packages, libraries and databases. With the most intuitive API and [dashboard](https://github.com/jina-ai/dashboard), building a cloud-native search system is just a matter of minutes.
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) AI engineers, developers, evangelists and PMs to build the next open-source neural search ecosystem.
-
-## Table of contents
-
-
-
-
-
-
-- [Install](#installez)
-- [Jina "Hello, World!" 👋🌍](#jina-bonjour-le-monde--)
-- [Getting started](#pour-commencer)
-- [Documentation](#documentation)
-- [Contributing](#contribuer-%C3%A0)
-- [Community](#communaut%C3%A9)
-- [Roadmap](#feuille-de-route)
-- [License](#licence)
-
-
-
-## Install
-
-#### Install from PyPI
-
-On Linux/macOS with Python >= 3.7 installed, simply run this command in your terminal:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai).
-
-#### ...or run with a Docker container
-
-We provide a universal Docker image (only 80MB!) that supports multiple architectures (including x64, x86, arm-64/v7/v6), simply run:
-
-```bash
-docker run jinaai/jina
-```
-
-## Jina "Bonjour, le monde ! 👋🌍
-
-Pour commencer, vous êtes invités à essayer "Hello, World" de Jina - une simple démo de recherche neuronale d'images[Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). Aucune dépendance supplémentaire n'est nécessaire, il suffit de le faire :
-
-```bash
-jina hello-world
-```
-
-...or even easier for Docker users, **no installation required**, simply:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-
-Click here to see the console output
-
-
-
-
-
-
-
-It downloads the Fashion-MNIST training and test datasets and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries, and asks Jina to retrieve the relevant results. After about a minute, it opens a web page and shows results like this:
-
-
-
-
-
-And the implementation behind it? As simple as it should be:
-
-
-
-![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true)
-
-
-
-
-
-All the big words you can name: computer vision, neural IR, microservices, message queues, elasticity, replicas and shards - delivered in just one minute!
-
-Intrigued? Play around and try different options:
-
-```bash
-jina hello-world --help
-```
-
-[Make sure to continue with our Jina 101 guide](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - understand all of Jina's key concepts in 3 minutes!
-
-## Getting started
-
-### Start a project from a template
-
-```bash
-pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina
-```
-
-### Tutorials
-
-
-Learn how to use SOTA visual representations for searching Pokémon!
-
-
-
-
-
-
-
-## Documentation
-
-
-
-
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every push, merge, and release event of the master branch. You can find more details on the following topics in our documentation.
-
-- [Jina command-line interface arguments explained](https://docs.jina.ai/chapters/cli/index.html)
-- [Jina Python API interface](https://docs.jina.ai/api/jina.html)
-- [Jina YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Jina Protobuf schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables used in Jina](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Join us! We welcome all kinds of improvements to the documentation.
-
-[Docs for older versions are archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. Without your active involvement, Jina can't succeed.
-
-The following resources will help you make a good first contribution:
-
-- [Contribution guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-## Community
-
-- [Slack channel](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - a communication platform for developers to discuss Jina
-- [Community newsletter](mailto:newsletter+subscribe@jina.ai) - subscribe to the latest updates, releases and event news of Jina
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - follow us and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company; we are fully committed to open source!
-
-## Roadmap
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to future improvements.
-
-We are looking for partnerships to build an open-governance model (e.g. a Technical Steering Committee) around Jina that enables a healthy open-source ecosystem and a developer-friendly culture. If you are interested in participating, feel free to contact us at [hello@jina.ai](mailto:hello@jina.ai).
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.ja.md b/.github/i18n/README.ja.md
deleted file mode 100644
index 1e546d3bc44b7..0000000000000
--- a/.github/i18n/README.ja.md
+++ /dev/null
@@ -1,388 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai)
-[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai)
-
-
-
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]()
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22)
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/)
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/)
-[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina)
-
-
-
-Jina is an AI-powered search framework that empowers developers to build **cross-/multi-modal search systems** (e.g. text, images, video, audio) on the cloud. Jina is backed by a [full-time, venture-backed team](https://jina.ai).
-
-⏱️ **Save time** - Bootstrap an AI-powered system in just a few minutes.
-
-🧠 **First-class AI models** - Jina is a new design pattern for neural search systems, with first-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html).
-
-🌌 **Universal search** - Large-scale indexing and querying of any kind of data on multiple platforms: video, image, long/short text, music, source code, etc.
-
-🚀 **Cloud-ready** - Decentralized architecture with cloud-native features such as containerization, microservices, distribution, scaling, sharding, async IO, REST, gRPC.
-
-🧩 **Plug & play** - Easily extensible with a Pythonic interface.
-
-## Contents
-
-
-
-
-
-
-- [Installation](#%EC%B0%A9%EC%88%98%ED%95%98%EB%8B%A4)
-- [Jina "Hello, World!" 👋🌍](#jina-%EC%95%88%EB%85%95-%EC%84%B8%EA%B3%84-)
-- [Tutorials](#%EC%9E%90%EC%8A%B5%EC%84%9C)
-- [Documentation](#%EB%AC%B8%EC%84%9C%ED%99%94)
-- [Contributing](#%EA%B8%B0%EC%97%AC%ED%95%98%EB%8A%94)
-- [Community](#community)
-- [Open governance](#%EC%98%A4%ED%94%88-%EA%B1%B0%EB%B2%84%EB%84%8C%EC%8A%A4)
-- [Join us](#%EC%B0%B8%EC%97%AC%ED%95%98%EA%B8%B0)
-- [License](#%EB%A9%B4%ED%97%88%EC%A6%9D)
-
-
-
-## Installation
-
-On Linux/macOS with Python 3.7/3.8:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai).
-
-⚠️ Windows users can use Jina via the [Windows Subsystem for Linux](https://docs.jina.ai/chapters/install/via-pip.html?highlight=windows#on-windows-and-other-oses). We welcome community help with [native Windows support](https://github.com/jina-ai/jina/issues/1252).
-
-
-### Docker container
-
-We provide a universal Docker image that supports multiple architectures (including x64, x86, arm-64/v7/v6). No installation needed, just run:
-
-```bash
-docker run jinaai/jina --help
-```
-
-## Jina "Hello, World!" 👋🌍
-
-As a starter, try our "Hello, World!" - a simple demo of image neural search on [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies are needed, simply run:
-
-```bash
-jina hello-world
-```
-
-...or for Docker users, **no installation required**:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-
-Click here to see the console output
-
-
-
-
-
-
-
-This downloads the Fashion-MNIST training and test datasets and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries and asks Jina to retrieve the relevant results. The whole process takes about 1 minute, and eventually opens a web page showing results like this:
-
-
-
-## Documentation
-
-
-
-
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every push, merge, and release of the master branch.
-
-#### The basics
-
-- [Use the Flow API to compose your search workflow](https://docs.jina.ai/chapters/flow/index.html)
-- [Input and output functions in Jina](https://docs.jina.ai/chapters/io/index.html)
-- [Use Dashboard to get insights into Jina workflows](https://github.com/jina-ai/dashboard)
-- [Distribute your workflow remotely](https://docs.jina.ai/chapters/remote/index.html)
-- [Run Jina Pods via Docker containers](https://docs.jina.ai/chapters/hub/index.html)
-
-#### Reference
-
-- [Command-line interface arguments](https://docs.jina.ai/chapters/cli/index.html)
-- [Python API interface](https://docs.jina.ai/api/jina.html)
-- [YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Protobuf schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Join us! We welcome all kinds of improvements to the documentation.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active participation.
-
-- [Contribution guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-### Contributors ✨
-
-
-[![All Contributors](https://img.shields.io/badge/all_contributors-74-orange.svg?style=flat-square)](#기부자-)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-## Community
-
-- [Slack workspace](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - join #general on our Slack to meet the team and ask questions
-- [YouTube channel](https://youtube.com/c/jina-ai) - subscribe for our latest video tutorials, release demos, webinars and presentations
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- [![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - follow and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company and how we are fully committed to open source.
-
-## Open governance
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to Jina's future improvements.
-
-As part of our open-governance model, we host Jina's [Engineering All Hands](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/) in public.
-This Zoom meeting takes place monthly on the second Tuesday of each month, 14:00-15:30 (CET). Anyone can join via the following calendar invite.
-
-- [Add to Google Calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)
-- [Download .ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)
-
-The meeting will also be live-streamed and later published on our [YouTube channel](https://youtube.com/c/jina-ai).
-
-## Join us
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) full-stack developers, evangelists, and PMs to build the next open-source neural search ecosystem.
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.pt_br.md b/.github/i18n/README.pt_br.md
deleted file mode 100644
index 60b95a2eefdf1..0000000000000
--- a/.github/i18n/README.pt_br.md
+++ /dev/null
@@ -1,421 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/)
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/)
-[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina)
-
-
-
-Jina is an AI-powered search framework that empowers developers to build **cross-/multi-modal search systems** (e.g. text, images, video, audio) on the cloud. Jina is long-term supported by [a full-time, venture-backed team](https://jina.ai).
-
-⏱️ **Save time** - Bootstrap an AI-powered system in just a few minutes.
-
-🧠 **First-class AI models** - Jina is a new design pattern for neural search systems, with first-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html).
-
-🌌 **Universal search** - Large-scale indexing and querying of any kind of data on multiple platforms. Video, image, long/short text, source code, and more.
-
-🚀 **Production-ready** - Cloud-native features that work out of the box, e.g. containerization, microservices, distribution, scaling, sharding, async IO, REST, gRPC.
-
-🧩 **Plug & play** - With [Jina Hub](https://github.com/jina-ai/jina-hub), it's easy to extend Jina with simple Python scripts or Docker images optimized for your search domain.
-
-## Contents
-
-
-
-
-
-
-- [Installation](#instala%C3%A7%C3%A3o)
-- [Jina "Hello, World!" 👋🌍](#jina-ol%C3%A1-mundo-)
-- [Tutorials](#tutoriais)
-- [Documentation](#documenta%C3%A7%C3%A3o)
-- [Contributing](#contribuindo)
-- [Community](#comunidade)
-- [Open governance](#governan%C3%A7a-aberta)
-- [Join us](#junte-se-a-n%C3%B3s)
-- [License](#licen%C3%A7a)
-
-
-
-## Installation
-
-### Via PyPI
-
-On Linux/macOS with Python >= 3.7:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please check the documentation](https://docs.jina.ai).
-
-### In a Docker container
-
-We provide a universal Docker image that supports multiple architecture types (including x64, x86, arm-64/v7/v6). Just run:
-
-```bash
-docker run jinaai/jina --help
-```
-
-## Jina "Olá, mundo!" 👋🌍
-
-Paara começar, você pode tentar nosso "Hello, World" (que significa "Olá, mundo") - uma simples demonstração de busca neural de imagem para [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). Nenhuma outra dependência é necessária. É só rodar:
-
-```bash
-jina hello-world
-```
-
-...or, even easier for Docker users, **no installation required**:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-
-Click here to see the console output
-
-
-
-
-
-
-
-The Docker image downloads the Fashion-MNIST training and test datasets and tells Jina to index 60,000 images from the training data. It then randomly samples test images as queries and asks Jina to retrieve the relevant results. The whole process takes about 1 minute, and eventually opens a web page with results that look like this:
-
-
-Improve performance with prefetching and sharding
-
-
-
-
-
-
-## Documentation
-
-
-
-
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every push, merge, and release of the master branch.
-
-#### The basics
-
-- [Use the Flow API to compose your search workflow](https://docs.jina.ai/chapters/flow/index.html)
-- [Input and output functions in Jina](https://docs.jina.ai/chapters/io/index.html)
-- [Use Dashboard to get insights into Jina workflows](https://github.com/jina-ai/dashboard)
-- [Distribute your workflow remotely](https://docs.jina.ai/chapters/remote/index.html)
-- [Run Jina Pods via Docker containers](https://docs.jina.ai/chapters/hub/index.html)
-
-#### Reference
-
-- [Command-line interface arguments](https://docs.jina.ai/chapters/cli/index.html)
-- [Python API interface](https://docs.jina.ai/api/jina.html)
-- [YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Protobuf schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Join us! All kinds of help with the documentation are welcome.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active participation.
-
-- [Contribution guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-### Contributors ✨
-
-
-[![All Contributors](https://img.shields.io/badge/all_contributors-66-orange.svg?style=flat-square)](#contributors-)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-## Community
-
-- [Slack workspace](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - join #general on our Slack to meet the team and ask questions
-- [YouTube channel](https://youtube.com/c/jina-ai) - subscribe for our latest video tutorials, release demos, webinars and presentations
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- [![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - follow and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company and how we are fully committed to open source.
-
-## Open governance
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to Jina's future improvements.
-
-As part of our open-governance model, we host Jina's [Engineering All Hands](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/) in public. This Zoom meeting takes place monthly on the second Tuesday of each month, 14:00-15:30 (CET). Anyone can join via the following calendar invite.
-
-- [Add to Google Calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)
-- [Download .ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)
-
-The meeting will be live-streamed and later published on our [YouTube channel](https://youtube.com/c/jina-ai).
-
-## Join us
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) full-stack developers, evangelists, and PMs to build the next open-source neural search ecosystem.
-
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.ru.md b/.github/i18n/README.ru.md
deleted file mode 100644
index 1b58c806298aa..0000000000000
--- a/.github/i18n/README.ru.md
+++ /dev/null
@@ -1,389 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai)
-[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai)
-
-
-
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]()
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22)
-
-
-Want to build a search system backed by deep learning? You've come to the right place!
-
-Jina is the cloud-native neural search framework powered by state-of-the-art AI and deep learning. It is long-term supported by a full-time team.
-
-🌌 **The universal search solution** - Jina enables large-scale indexing and querying of any kind of data on multiple platforms and architectures. Whether you are searching for images, video clips, audio snippets, long legal documents or short tweets, Jina can handle them all.
-
-🚀 **High performance & state of the art** - Jina aims at AI-in-production. You can easily scale out your VideoBERT, Xception, your word tokenizer, image segmenter and database to handle billion-level data. Features such as replication and sharding work out of the box.
-
-🐣 **System engineering made easy** - Jina offers a one-stop solution that frees you from hand-crafting and gluing together packages, libraries and databases. With the most intuitive API and [dashboard](https://github.com/jina-ai/dashboard), building a cloud-native search system takes just a minute.
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) AI engineers, full-stack developers, evangelists and project managers to build the next open-source neural search ecosystem.
-
-## Contents
-
-
-
-
-
-
-- [Install](#%D0%A3%D1%81%D1%82%D0%B0%D0%BD%D0%BE%D0%B2%D0%B8%D1%82%D1%8C)
-- [Jina "Hello world!" 👋🌍](#jina-hello-world-)
-- [Getting started](#%D0%9D%D0%B0%D1%87%D0%B0%D0%BB%D0%BE-%D1%80%D0%B0%D0%B1%D0%BE%D1%82%D1%8B)
-- [Documentation](#%D0%94%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B0%D1%86%D0%B8%D1%8F)
-- [Contributing](#%D0%92%D0%BA%D0%BB%D0%B0%D0%B4)
-- [Community](#%D0%A1%D0%BE%D0%BE%D0%B1%D1%89%D0%B5%D1%81%D1%82%D0%B2%D0%BE)
-- [Roadmap](#%D0%94%D0%BE%D1%80%D0%BE%D0%B6%D0%BD%D0%B0%D1%8F-%D0%BA%D0%B0%D1%80%D1%82%D0%B0)
-- [License](#%D0%9B%D0%B8%D1%86%D0%B5%D0%BD%D0%B7%D0%B8%D1%8F)
-
-
-
-## Install
-
-#### Install from PyPI
-
-On Linux/macOS with Python >= 3.7 installed, simply run this command in your terminal:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai).
-
-#### ...or run with a Docker container
-
-We provide a universal Docker image (only 80MB!) that supports multiple architectures (including x64, x86, arm-64/v7/v6), just run this command:
-
-```bash
-docker run jinaai/jina
-```
-
-## Jina "Hello world!" 👋🌍
-
-To get started, we recommend trying Jina's "Hello, World" - a simple demo of image neural search on [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies are needed, just run the following command:
-
-```bash
-jina hello-world
-```
-
-... or even easier for Docker users, **no installation required**:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-
-Click here to see the console output
-
-
-
-
-
-
-
-The Docker image downloads the Fashion-MNIST training and test data and runs Jina to index 60,000 images from the training set. Jina then randomly samples images from the test set as queries and retrieves the relevant results. After about 1 minute, a web page opens with the results:
-
-
-
-
-Behind all of this is a fairly simple implementation:
-
-
-
-![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true)
-
-
-
-
-
-All the buzzwords you can name: computer vision, neural information retrieval, microservices, message queues, elasticity, replication and sharding - up and running in just one minute!
-
-Intrigued? Try different options:
-
-```bash
-jina hello-world --help
-```
-
-[Make sure to continue with our Jina 101 guide](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - understand all of Jina's key concepts in 3 minutes!
-
-## Getting started
-
-### Start a project from a template
-
-```bash
-pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina
-```
-
-### Tutorials
-
-
-Learn how to use SOTA visual representations for searching Pokémon!
-
-
-
-
-
-
-
-## Documentation
-
-
-
-
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every push, merge, and release of the master branch. You can find more details on the following topics in our documentation.
-
-- [Jina command-line interface arguments explained](https://docs.jina.ai/chapters/cli/index.html)
-- [Jina Python API interface](https://docs.jina.ai/api/jina.html)
-- [Jina YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Jina Protobuf schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables used in Jina](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Join us! We welcome all kinds of improvements to the documentation.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. Without your active involvement, Jina won't succeed.
-
-The following resources will help you make a good first contribution:
-
-- [Contribution guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-## Community
-
-- [Slack channel](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - a communication platform for developers to discuss Jina
-- [Community newsletter](mailto:newsletter+subscribe@jina.ai) - subscribe to the latest updates, releases and event news of Jina
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - follow us and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company; we are fully committed to open source!
-
-## Roadmap
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to future improvements.
-
-We are looking for partnerships to build an open-governance model (e.g. a Technical Steering Committee) around Jina that enables a healthy open-source ecosystem and a developer-friendly culture. If you are interested in participating, feel free to contact us at [hello@jina.ai](mailto:hello@jina.ai).
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.uk.md b/.github/i18n/README.uk.md
deleted file mode 100644
index b6e5acb42f463..0000000000000
--- a/.github/i18n/README.uk.md
+++ /dev/null
@@ -1,424 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/)
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/)
-[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina)
-
-
-
-Jina is an AI-powered search framework that empowers developers to build **cross-/multi-modal search systems** (e.g. text, images, video, audio) on the cloud. Jina is long-term supported by [a full-time, venture-backed team](https://jina.ai).
-
-⏱️ **Save time** - Bootstrap an AI-powered system in just a few minutes.
-
-🧠 **First-class AI models** - Jina is a new design pattern for neural search systems, with first-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html).
-
-🌌 **Universal search** - Large-scale indexing and querying of any kind of data on multiple platforms. Video, image, long/short text, music, source code, and more.
-
-🚀 **Production-ready** - Cloud-native features work out of the box, e.g. containerization, microservices, distribution, scaling, sharding, async IO, REST, gRPC.
-
-🧩 **Plug & play** - With [Jina Hub](https://github.com/jina-ai/jina-hub), easily extend Jina with Python scripts or Docker images optimized for your search domains.
-
-## Contents
-
-
-
-
-
-
-- [Get started](#%D0%A0%D0%BE%D0%B7%D0%BF%D0%BE%D1%87%D0%BD%D1%96%D0%BC%D0%BE)
-- [Jina "Hello, World!" 👋🌍](#jina-%D0%9F%D1%80%D0%B8%D0%B2%D1%96%D1%82-%D1%81%D0%B2%D1%96%D1%82%D0%B5-)
-- [Tutorials](#%D0%A2%D1%83%D1%82%D0%BE%D1%80%D1%96%D0%B0%D0%BB%D0%B8)
-- [Documentation](#%D0%94%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B0%D1%86%D1%96%D1%8F)
-- [Contributing](#%D0%94%D0%BE%D0%BF%D0%BE%D0%BC%D0%BE%D0%B3%D0%B0-%D0%BF%D1%80%D0%BE%D1%94%D0%BA%D1%82%D1%83)
-- [Community](#%D0%A1%D0%BF%D1%96%D0%BB%D1%8C%D0%BD%D0%BE%D1%82%D0%B0)
-- [Open governance](#%D0%92%D1%96%D0%B4%D0%BA%D1%80%D0%B8%D1%82%D0%B5-%D1%83%D0%BF%D1%80%D0%B0%D0%B2%D0%BB%D1%96%D0%BD%D0%BD%D1%8F)
-- [Join us](#%D0%9F%D1%80%D0%B8%D1%94%D0%B4%D0%BD%D1%83%D0%B9%D1%82%D0%B5%D1%81%D1%8C)
-- [License](#%D0%9B%D1%96%D1%86%D0%B5%D0%BD%D0%B7%D1%96%D1%8F)
-
-
-
-## Get started
-
-### Via PyPI
-
-On Linux/macOS with Python >= 3.7:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai).
-
-### In a Docker container
-
-We provide a universal Docker image that supports multiple architectures (including x64, x86, arm-64/v7/v6). Just run:
-
-```bash
-docker run jinaai/jina --help
-```
-
-## Jina "Привіт, світе!" 👋🌍
-
-Як новачок, ви можете спробувати наш "Привіт, світе" - просте демо нейропошуку по зображеннях для [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). Жодних додаткових залежностей, просто запустіть:
-
-```bash
-jina hello-world
-```
-
-...or even easier for Docker users, **no installation required**:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-
-Click here to see the console output
-
-
-
-
-
-
-
-The Docker image downloads the Fashion-MNIST training and test dataset and tells Jina to index 60,000 images from the training set. It then randomly picks images from the test set as queries and asks Jina to retrieve the relevant results. The whole process takes about 1 minute, and eventually a web page opens displaying results like these:
-
-
-Improve performance with prefetching and sharding
-
-
-
-
-
-
-## Documentation
-
-
-
-
-
-The best way to learn Jina in depth is to read our documentation. It is built on every push, merge, and release of the master branch.
-
-#### The basics
-
-- [Use the Flow API to compose your search workflow](https://docs.jina.ai/chapters/flow/index.html)
-- [Input and output functions in Jina](https://docs.jina.ai/chapters/io/index.html)
-- [Use Dashboard to get insights into Jina workflows](https://github.com/jina-ai/dashboard)
-- [Distribute your workflow remotely](https://docs.jina.ai/chapters/remote/index.html)
-- [Run Jina Pods via Docker containers](https://docs.jina.ai/chapters/hub/index.html)
-
-#### Reference
-
-- [Command-line interface arguments](https://docs.jina.ai/chapters/cli/index.html)
-- [Python API interface](https://docs.jina.ai/api/jina.html)
-- [YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Protobuf schemas](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Join us! We welcome all kinds of documentation improvements.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active participation.
-
-- [Contribution guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-### Contributors ✨
-
-
-[![All Contributors](https://img.shields.io/badge/all_contributors-71-orange.svg?style=flat-square)](#contributors-)
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-
-## Community
-
-- [Slack workspace](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - join #general on our Slack to meet the team and ask questions
-- [YouTube channel](https://youtube.com/c/jina-ai) - subscribe for our latest video tutorials, release demos, webinars and presentations
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- [![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - follow and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company and how we are fully committed to open source.
-
-## Open governance
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to Jina's future improvements.
-
-As part of our open-governance model, we host Jina's [Engineering All Hands](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/) in public. This Zoom meeting takes place monthly on the second Tuesday of each month, 14:00-15:30 (CET). Anyone can join via the following calendar invite.
-
-- [Add to Google Calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)
-- [Download .ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)
-
-The meeting will also be live-streamed and later published on our [YouTube channel](https://youtube.com/c/jina-ai).
-
-## Join us
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) full-stack developers, evangelists and PMs to build the next open-source neural search ecosystem.
-
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.zh.md b/.github/i18n/README.zh.md
deleted file mode 100644
index b315ebabe3177..0000000000000
--- a/.github/i18n/README.zh.md
+++ /dev/null
@@ -1,392 +0,0 @@
-
-
-
-
-
-
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-)
-[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license)
-[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai)
-[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai)
-
-
-
-[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/)
-[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]()
-[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags)
-[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags)
-[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI)
-[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD)
-[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22)
-[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22)
-[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22)
-
-
-
-
-Jina is geared towards building search systems for any kind of data, including [text](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences), [images](https://github.com/jina-ai/examples/tree/master/pokedex-with-bit), [audio](https://github.com/jina-ai/examples/tree/master/audio-search), [video](https://github.com/jina-ai/examples/tree/master/tumblr-gif-search) and [many more](https://github.com/jina-ai/examples). With the modular design & multi-layer abstraction, you can leverage the efficient patterns to build the system by parts, or chaining them into a [Flow](https://101.jina.ai/#Flow) for an end-to-end experience.
-
+Jina is geared towards building search systems for any kind of data, including text, image, audio, video, PDF etc.
+Powered by deep learning and cloud-native techniques, you can leverage Jina to build a multimedia search system in
+minutes.
-🌌 **Search anything** - Large-scale indexing and querying of unstructured data: video, image, long/short text, music, source code, etc.
+🌌 **Search anything** - Large-scale indexing and querying of unstructured data: video, image, long/short text, music,
+source code, etc.
⏱️ **Save time** - *The* design pattern of neural search systems, from zero to a production-ready system in minutes.
-🍱 **Own your stack** - Keep an end-to-end stack ownership of your solution, avoid the integration pitfalls with fragmented, multi-vendor, generic legacy tools.
-
-🧠 **First-class AI models** - First-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html), easily usable and extendable with a Pythonic interface.
-
-🌩️ **Fast & cloud-ready** - Decentralized architecture from day one. Scalable & cloud-native by design: enjoy containerizing, distributing, sharding, async, REST/gRPC/WebSocket.
+🍱 **Own your stack** - Keep end-to-end ownership of your solution's stack and avoid the integration pitfalls of
+fragmented, multi-vendor, generic legacy tools.
+🌩️ **Fast & cloud-ready** - Decentralized architecture from day one. Scalable & cloud-native by design: enjoy
+containerizing, distributing, sharding, async, REST/gRPC/WebSocket.
## Installation
-```sh
-pip install -U jina
+```console
+$ pip install --pre jina
+$ jina -v
+2.0.0rcN
```
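+
+The `--pre` flag installs the latest pre-release of Jina; running `jina -v` afterwards confirms which version you got.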
#### via Docker
-```sh
-docker run jinaai/jina:latest
+```console
+$ docker run jinaai/jina:master -v
+2.0.0rcN
```
@@ -83,145 +50,113 @@ docker run jinaai/jina:latest
| x86/64,arm/v6,v7,[v8 (Apple M1)](https://github.com/jina-ai/jina/issues/1781) | On Linux/macOS & Python 3.7/3.8/3.9 | Docker Users|
| --- | --- | --- |
-| Standard | `pip install -U jina` | `docker run jinaai/jina:latest` |
-| Daemon | `pip install -U "jina[daemon]"` | `docker run --network=host jinaai/jina:latest-daemon` |
-| With Extras | `pip install -U "jina[devel]"` | `docker run jinaai/jina:latest-devel` |
+| Standard | `pip install --pre jina` | `docker run jinaai/jina:2.0.0rc` |
+| Daemon | `pip install --pre "jina[daemon]"` | `docker run --network=host jinaai/jina:latest-daemon` |
+| With Extras | `pip install --pre "jina[devel]"` | `docker run jinaai/jina:latest-devel` |
| Dev/Pre-Release | `pip install --pre jina` | `docker run jinaai/jina:master` |
-Version identifiers [are explained here](https://github.com/jina-ai/jina/blob/master/RELEASE.md). To install Jina with extra dependencies [please refer to the docs](https://docs.jina.ai/chapters/install/os/via-pip/#cherry-pick-extra-dependencies). Jina can run on [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). We welcome the community to help us with [native Windows support](https://github.com/jina-ai/jina/issues/1252).
-
-
-
-
-💡 YAML Completion in PyCharm & VSCode
-
-Developing Jina app often means writing YAML configs. We provide a [JSON Schema](https://json-schema.org/) for your IDE to enable code completion, syntax validation, members listing and displaying help text. Here is a [video tutorial](https://youtu.be/qOD-6mihUzQ) to walk you through the setup.
-
-
-
-
-
-
-
-
-**PyCharm**
-
-1. Click menu `Preferences` -> `JSON Schema mappings`;
-2. Add a new schema, in the `Schema File or URL` write `https://api.jina.ai/schemas/latest.json`; select `JSON Schema Version 7`;
-3. Add a file path pattern and link it to `*.jaml` and `*.jina.yml`.
-
-
-
-
-
-
-
-
-
-**VSCode**
-
-1. Install the extension: `YAML Language Support by Red Hat`;
-2. In IDE-level `settings.json` add:
-
-```json
-"yaml.schemas": {
- "https://api.jina.ai/schemas/latest.json": ["/*.jina.yml", "/*.jaml"],
-}
-```
+Version identifiers [are explained here](https://github.com/jina-ai/jina/blob/master/RELEASE.md). Jina can run
+on [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). We welcome the community
+to help us with [native Windows support](https://github.com/jina-ai/jina/issues/1252).
-
-
-
## Get Started
+Document, Executor, Flow are three fundamental concepts in Jina.
-### Cookbook
+- 📄 **Document** is the basic data type in Jina;
+- ⚙️ **Executor** is how Jina processes Documents;
+- 🔀 **Flow** is how Jina streamlines and distributes Executors.
-[Bits, pieces and examples of Jina code](./.github/pages/snippets.md)
-
-### Run Quick Demo
-
-- [👗 Fashion image search](./.github/pages/hello-world.md#-fashion-image-search): `jina hello fashion`
-- [🤖 QA chatbot](./.github/pages/hello-world.md#-covid-19-chatbot): `pip install "jina[chatbot]" && jina hello chatbot`
-- [📰 Multimedia search](./.github/pages/hello-world.md#-multimodal-document-search): `pip install "jina[multimodal]" && jina hello multimodal`
-
-### The Basics
-
-- [What is neural search, and how is it different to symbolic search?](https://jina.ai/2020/07/06/What-is-Neural-Search-and-Why-Should-I-Care.html)
-- [Jina 101: Learn Jina's key components](https://docs.jina.ai/chapters/101/)
-- [Jina 102: Learn how Jina's components fit together](https://docs.jina.ai/chapters/102/)
-- [My First Jina App: Build your first simple app](https://docs.jina.ai/chapters/my_first_jina_app/)
+Copy-paste the minimum example below and run it:
+```python
+from jina import Document, Executor, Flow, requests
-### Video Tutorials
-
+class MyExecutor(Executor):
+ @requests
+ def foo(self, docs, parameters, **kwargs):
+ print(f'{parameters["p1"]} - {docs[0]}')
-### Examples ([View all](https://github.com/jina-ai/examples))
-
-#### [📄 NLP Semantic Wikipedia Search with Transformers and DistilBERT](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences)
- Brand new to neural search? See a simple text-search example to understand how Jina works
-#### [📄 Add Incremental Indexing to Wikipedia Search](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences-incremental)
- Index more effectively by adding incremental indexing to your Wikipedia search
+f = Flow().add(uses=MyExecutor)
-#### [📄 Search Lyrics with Transformers and PyTorch](https://github.com/jina-ai/examples/tree/master/multires-lyrics-search)
- Get a better understanding of chunks by searching a lyrics database. Now with shiny front-end!
-
-#### [🖼️ Google's Big Transfer Model in (Poké-)Production](https://github.com/jina-ai/examples/tree/master/pokedex-with-bit)
- Use SOTA visual representation for searching Pokémon!
-
-#### [🎧 Search YouTube audio data with Vggish](https://github.com/jina-ai/examples/tree/master/audio-search)
- A demo of neural search for audio data based Vggish model.
-
-#### [🎞️ Search Tumblr GIFs with KerasEncoder](https://github.com/jina-ai/examples/tree/master/tumblr-gif-search)
- Use prefetching and sharding to improve the performance of your index and query Flow when searching animated GIFs.
+with f:
+ f.post(on='/bar', inputs=Document(), parameters={'p1': 'hello'}, on_done=print)
+```
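+
+Because `foo` uses a bare `@requests` decorator, it is bound to every endpoint, so posting to `/bar` still triggers
+it; `on_done=print` prints each response as it arrives.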
-Check our [examples repo](https://github.com/jina-ai/examples) for advanced and community-submitted examples.
+### Run Quick Demo
-## Documentation & Support
+- [👗 Fashion image search](./.github/pages/hello-world.md#-fashion-image-search)
+ ```console
+ $ jina hello fashion
+ ```
+- [🤖 QA chatbot](./.github/pages/hello-world.md#-covid-19-chatbot)
+ ```console
+ $ pip install "jina[chatbot]"
+ $ jina hello chatbot
+ ```
+- [📰 Multimodal search](./.github/pages/hello-world.md#-multimodal-document-search)
+ ```console
+ $ pip install "jina[multimodal]"
+ $ jina hello multimodal
+ ```
+
+#### Fork Demo & Build Your Own
+
+Copy the source code of a hello-world example into your own directory and start from there:
+
+```console
+$ jina hello fork fashion ../my-proj/
+```
-- Docs: https://docs.jina.ai
-- Join our [Slack community](https://slack.jina.ai) to chat to our engineers about your use cases, questions, and support queries.
+### Read Tutorials
+
+- 📄 `Document` & `DocumentArray`: the basic data type in Jina (see the short sketch after this list).
+ - [Minimum working example](.github/2.0/cookbooks/Document.md#minimum-working-example)
+ - [`Document` API](.github/2.0/cookbooks/Document.md#document-api)
+ - [`DocumentArray` API](.github/2.0/cookbooks/Document.md#documentarray-api)
+- ⚙️ `Executor`: how Jina processes Documents.
+ - [Minimum working example](.github/2.0/cookbooks/Executor.md#minimum-working-example)
+ - [Executor API](.github/2.0/cookbooks/Executor.md#executor-api)
+ - [Executor Built-in Features](.github/2.0/cookbooks/Executor.md#executor-built-in-features)
+ - [Migration from 1.x to 2.0 in Practice](.github/2.0/cookbooks/Executor.md#migration-in-practice)
+- 🔀 `Flow`: how Jina streamlines and distributes Executors.
+ - [Minimum working example](.github/2.0/cookbooks/Flow.md#minimum-working-example)
+ - [Flow API](.github/2.0/cookbooks/Flow.md#flow-api)
+- 🧼 [Write clean code in Jina](./.github/2.0/cookbooks/CleanCode.md)
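+
+As a quick taste of the `Document`/`DocumentArray` API covered in the cookbooks above, here is a minimal sketch (the
+exact method set is documented in the linked cookbooks):
+
+```python
+from jina import Document, DocumentArray
+
+# a Document carries one piece of data, e.g. text
+d = Document(text='hello world')
+
+# a DocumentArray is a sequence container of Documents
+da = DocumentArray([Document(text='hello'), Document(text='jina')])
+da.append(d)
+
+for doc in da:
+    print(doc.text)
+```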
+
+## Support
+
+- Join our [Slack community](https://slack.jina.ai) to chat with our engineers about your use cases, questions, and
+  support queries.
- Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features.
- **When?** The second Tuesday of every month
- - **Where?** Zoom ([calendar link](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)/[.ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)) and [live stream on YouTube](https://youtube.com/c/jina-ai)
+ - **Where?**
+ Zoom ([calendar link](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)/[.ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics))
+    and [live stream on YouTube](https://youtube.com/c/jina-ai)
- Subscribe to the latest video tutorials on our [YouTube channel](https://youtube.com/c/jina-ai).
+## Join Us
+
+Jina is backed by [Jina AI](https://jina.ai). [We are actively hiring](https://jobs.jina.ai) full-stack developers
+and solution engineers to build the next neural search ecosystem in open source.
## Contributing
-We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement.
+We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to
+your active involvement.
- [Contributing guidelines](CONTRIBUTING.md)
-- [Code of conduct](https://github.com/jina-ai/jina/blob/master/.github/CODE_OF_CONDUCT.md) - play nicely with the Jina community
+- [Code of conduct](https://github.com/jina-ai/jina/blob/master/.github/CODE_OF_CONDUCT.md) - play nicely with the Jina
+ community
- [Good first issues](https://github.com/jina-ai/jina/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22)
- [Release cycles and development stages](RELEASE.md)
- [Upcoming features](https://portal.productboard.com/jinaai/) - what's being planned, what we're thinking about.
-
-
[![All Contributors](https://img.shields.io/badge/all_contributors-147-orange.svg?style=flat-square)](#contributors-)
@@ -250,9 +185,4 @@ We welcome all kinds of contributions from the open-source community, individual
-
-
-
-## Join Us
-
-Jina is backed by [Jina AI](https://jina.ai). [We are hiring](https://jobs.jina.ai) full-stack developers, evangelists, and PMs to build the next neural search ecosystem in open source.
+
\ No newline at end of file
diff --git a/cli/__init__.py b/cli/__init__.py
index f679e114df56c..f8cae4187fe28 100644
--- a/cli/__init__.py
+++ b/cli/__init__.py
@@ -1,6 +1,3 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
import sys
diff --git a/cli/api.py b/cli/api.py
index 2dbf5f8fd27c0..95c971a73840e 100644
--- a/cli/api.py
+++ b/cli/api.py
@@ -1,6 +1,3 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
if False:
from argparse import Namespace
@@ -20,6 +17,10 @@ def pod(args: 'Namespace'):
pass
+# alias: `jina executor` is dispatched to the same handler as `jina pod`
+executor = pod
+
+
def pea(args: 'Namespace'):
"""
Start a Pea
@@ -115,17 +116,6 @@ def export_api(args: 'Namespace'):
default_logger.info(f'API is exported to {f_name}')
-def hello_world(args: 'Namespace'):
- """
- Run the fashion hello world example
-
- :param args: arguments coming from the CLI.
- """
- from jina.helloworld.fashion import hello_world
-
- hello_world(args)
-
-
def hello(args: 'Namespace'):
"""
Run any of the hello world examples
@@ -133,15 +123,23 @@ def hello(args: 'Namespace'):
:param args: arguments coming from the CLI.
"""
if args.hello == 'fashion':
- from jina.helloworld.fashion import hello_world
+ from jina.helloworld.fashion.app import hello_world
+
+ hello_world(args)
elif args.hello == 'chatbot':
- from jina.helloworld.chatbot import hello_world
+ from jina.helloworld.chatbot.app import hello_world
+
+ hello_world(args)
elif args.hello == 'multimodal':
- from jina.helloworld.multimodal import hello_world
- else:
- raise ValueError(f'must be one of [`fashion`, `chatbot`, `multimodal`]')
+ from jina.helloworld.multimodal.app import hello_world
+
+ hello_world(args)
+ elif args.hello == 'fork':
+ from jina.helloworld.fork import fork_hello
- hello_world(args)
+ fork_hello(args)
+ else:
+        raise ValueError('must be one of [`fashion`, `chatbot`, `multimodal`, `fork`]')
def flow(args: 'Namespace'):
diff --git a/cli/autocomplete.py b/cli/autocomplete.py
index aa5a7b49b2755..9347a1e97d1e1 100644
--- a/cli/autocomplete.py
+++ b/cli/autocomplete.py
@@ -1,7 +1,3 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-
def _update_autocomplete():
from jina.parsers import get_main_parser
@@ -14,21 +10,21 @@ def _gaa(key, parser):
elif v.choices:
_compl.extend(v.choices)
for kk, vv in v.choices.items():
- _result.update(_gaa(" ".join([key, kk]).strip(), vv))
+ _result.update(_gaa(' '.join([key, kk]).strip(), vv))
         # filter out single dashes, as they serve as abbreviations
- _compl = [k for k in _compl if (not k.startswith("-") or k.startswith("--"))]
+ _compl = [k for k in _compl if (not k.startswith('-') or k.startswith('--'))]
_result.update({key: _compl})
return _result
- compl = _gaa("", get_main_parser())
- cmd = compl.pop("")
- compl = {"commands": cmd, "completions": compl}
+ compl = _gaa('', get_main_parser())
+ cmd = compl.pop('')
+ compl = {'commands': cmd, 'completions': compl}
- with open(__file__, "a") as fp:
- fp.write(f"\nac_table = {compl}\n")
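+    # append the freshly generated table to this very file (self-modifying module)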
+ with open(__file__, 'a') as fp:
+ fp.write(f'\nac_table = {compl}\n')
-if __name__ == "__main__":
+if __name__ == '__main__':
_update_autocomplete()
ac_table = {
@@ -37,33 +33,26 @@ def _gaa(key, parser):
'--version',
'--version-full',
'hello',
+ 'executor',
'pod',
'flow',
- 'optimizer',
- 'gateway',
'ping',
- 'check',
- 'hub',
+ 'gateway',
'pea',
'client',
'export-api',
- 'hello-world',
+ 'check',
],
'completions': {
'hello fashion': [
'--help',
'--workdir',
'--download-proxy',
- '--shards',
- '--parallel',
- '--uses-index',
'--index-data-url',
'--index-labels-url',
- '--index-request-size',
- '--uses-query',
'--query-data-url',
'--query-labels-url',
- '--query-request-size',
+ '--request-size',
'--num-query',
'--top-k',
],
@@ -72,7 +61,6 @@ def _gaa(key, parser):
'--workdir',
'--download-proxy',
'--index-data-url',
- '--demo-url',
'--port-expose',
'--parallel',
'--unblock-query-flow',
@@ -83,15 +71,15 @@ def _gaa(key, parser):
'--download-proxy',
'--uses',
'--index-data-url',
- '--demo-url',
'--port-expose',
'--unblock-query-flow',
],
'hello': ['--help', 'fashion', 'chatbot', 'multimodal'],
- 'pod': [
+ 'executor': [
'--help',
'--name',
'--description',
+ '--workspace',
'--log-config',
'--quiet',
'--quiet-error',
@@ -110,8 +98,6 @@ def _gaa(key, parser):
'--host-out',
'--socket-in',
'--socket-out',
- '--load-interval',
- '--dump-interval',
'--read-only',
'--memory-hwm',
'--on-error-strategy',
@@ -140,37 +126,87 @@ def _gaa(key, parser):
'--uses-after',
'--parallel',
'--shards',
+ '--replicas',
'--polling',
'--scheduling',
'--pod-role',
'--peas-hosts',
],
- 'flow': [
+ 'pod': [
'--help',
'--name',
'--description',
+ '--workspace',
'--log-config',
'--quiet',
'--quiet-error',
'--identity',
+ '--port-ctrl',
+ '--ctrl-with-ipc',
+ '--timeout-ctrl',
+ '--ssh-server',
+ '--ssh-keyfile',
+ '--ssh-password',
'--uses',
- '--inspect',
+ '--py-modules',
+ '--port-in',
+ '--port-out',
+ '--host-in',
+ '--host-out',
+ '--socket-in',
+ '--socket-out',
+ '--read-only',
+ '--memory-hwm',
+ '--on-error-strategy',
+ '--num-part',
+ '--uses-internal',
+ '--entrypoint',
+ '--docker-kwargs',
+ '--pull-latest',
+ '--volumes',
+ '--host',
+ '--port-expose',
+ '--quiet-remote-logs',
+ '--upload-files',
+ '--workspace-id',
+ '--daemon',
+ '--runtime-backend',
+ '--runtime',
+ '--runtime-cls',
+ '--timeout-ready',
+ '--env',
+ '--expose-public',
+ '--pea-id',
+ '--pea-role',
+ '--noblock-on-start',
+ '--uses-before',
+ '--uses-after',
+ '--parallel',
+ '--shards',
+ '--replicas',
+ '--polling',
+ '--scheduling',
+ '--pod-role',
+ '--peas-hosts',
],
- 'optimizer': [
+ 'flow': [
'--help',
'--name',
'--description',
+ '--workspace',
'--log-config',
'--quiet',
'--quiet-error',
'--identity',
'--uses',
- '--output-dir',
+ '--inspect',
],
+ 'ping': ['--help', '--timeout', '--retries', '--print-response'],
'gateway': [
'--help',
'--name',
'--description',
+ '--workspace',
'--log-config',
'--quiet',
'--quiet-error',
@@ -189,8 +225,6 @@ def _gaa(key, parser):
'--host-out',
'--socket-in',
'--socket-out',
- '--load-interval',
- '--dump-interval',
'--read-only',
'--memory-hwm',
'--on-error-strategy',
@@ -217,70 +251,11 @@ def _gaa(key, parser):
'--pea-role',
'--noblock-on-start',
],
- 'ping': ['--help', '--timeout', '--retries', '--print-response'],
- 'check': ['--help', '--summary-exec', '--summary-driver'],
- 'hub login': ['--help'],
- 'hub new': ['--help', '--output-dir', '--template', '--type', '--overwrite'],
- 'hub init': ['--help', '--output-dir', '--template', '--type', '--overwrite'],
- 'hub create': ['--help', '--output-dir', '--template', '--type', '--overwrite'],
- 'hub build': [
- '--help',
- '--username',
- '--password',
- '--registry',
- '--repository',
- '--file',
- '--pull',
- '--push',
- '--dry-run',
- '--prune-images',
- '--raise-error',
- '--test-uses',
- '--test-level',
- '--timeout-ready',
- '--host-info',
- '--daemon',
- '--no-overwrite',
- ],
- 'hub push': [
- '--help',
- '--username',
- '--password',
- '--registry',
- '--repository',
- '--no-overwrite',
- ],
- 'hub pull': [
- '--help',
- '--username',
- '--password',
- '--registry',
- '--repository',
- '--no-overwrite',
- ],
- 'hub list': [
- '--help',
- '--name',
- '--kind',
- '--keywords',
- '--type',
- '--local-only',
- ],
- 'hub': [
- '--help',
- 'login',
- 'new',
- 'init',
- 'create',
- 'build',
- 'push',
- 'pull',
- 'list',
- ],
'pea': [
'--help',
'--name',
'--description',
+ '--workspace',
'--log-config',
'--quiet',
'--quiet-error',
@@ -299,8 +274,6 @@ def _gaa(key, parser):
'--host-out',
'--socket-in',
'--socket-out',
- '--load-interval',
- '--dump-interval',
'--read-only',
'--memory-hwm',
'--on-error-strategy',
@@ -329,8 +302,6 @@ def _gaa(key, parser):
'client': [
'--help',
'--request-size',
- '--mode',
- '--top-k',
'--mime-type',
'--continue-on-error',
'--return-results',
@@ -347,22 +318,6 @@ def _gaa(key, parser):
'--port-expose',
],
'export-api': ['--help', '--yaml-path', '--json-path', '--schema-path'],
- 'hello-world': [
- '--help',
- '--workdir',
- '--download-proxy',
- '--shards',
- '--parallel',
- '--uses-index',
- '--index-data-url',
- '--index-labels-url',
- '--index-request-size',
- '--uses-query',
- '--query-data-url',
- '--query-labels-url',
- '--query-request-size',
- '--num-query',
- '--top-k',
- ],
+ 'check': ['--help', '--summary-exec'],
},
}
diff --git a/cli/export.py b/cli/export.py
index cf4e43c57e48d..9881febd372fe 100644
--- a/cli/export.py
+++ b/cli/export.py
@@ -1,12 +1,12 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
import argparse
import os
from typing import List
def api_to_dict():
+ """Convert Jina API to a dict
+ :return: dict
+ """
from jina import __version__
from jina.parsers import get_main_parser
@@ -41,7 +41,7 @@ def api_to_dict():
def _export_parser_args(parser_fn, type_as_str: bool = False):
from jina.enums import BetterEnum
- from argparse import _StoreAction, _StoreTrueAction, _HelpAction, _SubParsersAction
+ from argparse import _StoreAction, _StoreTrueAction
from jina.parsers.helper import KVAppendAction
port_attr = ('help', 'choices', 'default', 'required', 'option_strings', 'dest')
diff --git a/daemon/parser.py b/daemon/parser.py
index 53b81532dbc04..787e195c7432e 100644
--- a/daemon/parser.py
+++ b/daemon/parser.py
@@ -10,6 +10,11 @@
def mixin_daemon_parser(parser):
+ """
+ # noqa: DAR101
+ # noqa: DAR102
+ # noqa: DAR103
+ """
gp = add_arg_group(parser, title='Daemon')
gp.add_argument(
@@ -19,15 +24,13 @@ def mixin_daemon_parser(parser):
help='do not start fluentd, no log streaming',
)
- gp.add_argument(
- '--workspace',
- type=str,
- default='/tmp/jinad',
- help='the directory for storing all uploaded dependencies',
- )
-
def get_main_parser():
+ """
+ Return main parser
+ :return: main parser
+ """
+
parser = set_base_parser()
mixin_remote_parser(parser)
@@ -36,6 +39,7 @@ def get_main_parser():
parser.set_defaults(
port_expose=8000,
+ workspace='/tmp/jinad',
log_config=os.getenv(
'JINAD_LOG_CONFIG',
resource_filename('jina', '/'.join(('resources', 'logging.daemon.yml'))),
diff --git a/daemon/stores/flow.py b/daemon/stores/flow.py
index 7cb8adaaf2f87..aea5216dc579f 100644
--- a/daemon/stores/flow.py
+++ b/daemon/stores/flow.py
@@ -1,5 +1,4 @@
import uuid
-from fastapi.exceptions import HTTPException
from typing import Optional, BinaryIO
from jina.flow import Flow
@@ -69,4 +68,6 @@ def update(
if kind == UpdateOperationEnum.rolling_update:
flow_obj.rolling_update(pod_name=pod_name, dump_path=dump_path)
elif kind == UpdateOperationEnum.dump:
- flow_obj.dump(pod_name=pod_name, dump_path=dump_path, shards=shards)
+            raise NotImplementedError(
+                'sending a post request does not work because the asyncio loop is occupied'
+            )
diff --git a/extra-requirements.txt b/extra-requirements.txt
index fcc59b018d751..1188c360e854a 100644
--- a/extra-requirements.txt
+++ b/extra-requirements.txt
@@ -34,7 +34,7 @@ onnx: framework, py37
onnxruntime: framework, py37
Pillow: cv, cicd, multimodal
annoy>=1.9.5: index
-sklearn: numeric, cicd
+sklearn: numeric
plyvel: index
jieba: nlp
lz4<3.1.2: devel, cicd, perf, network
diff --git a/jina/__init__.py b/jina/__init__.py
index c16eb3e61c57d..8f008caf87e45 100644
--- a/jina/__init__.py
+++ b/jina/__init__.py
@@ -7,8 +7,6 @@
"""
-# DO SOME OS-WISE PATCHES
-
import datetime as _datetime
import os as _os
import platform as _platform
@@ -16,28 +14,10 @@
import sys as _sys
import types as _types
-from google.protobuf.internal import api_implementation as _api_implementation
-
-if _api_implementation._default_implementation_type != 'cpp':
- import warnings as _warnings
-
- _warnings.warn(
- '''
- You are using Python protobuf backend, not the C++ version, which is much faster.
-
- This is often due to C++ implementation failed to compile while installing Protobuf
- - You are using in Python 3.9 (https://github.com/jina-ai/jina/issues/1801)
- - You are using on architecture other than x86_64/armv6/armv7
- - You installation is broken, try `pip install --force protobuf`
- - You have C++ backend but you shut it down, try `export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp`
-
- ''',
- RuntimeWarning,
- )
-
if _sys.version_info < (3, 7, 0) or _sys.version_info >= (3, 10, 0):
raise OSError(f'Jina requires Python 3.7/3.8/3.9, but yours is {_sys.version_info}')
+# DO SOME OS-WISE PATCHES
if _sys.version_info >= (3, 8, 0) and _platform.system() == 'Darwin':
# temporary fix for python 3.8 on macos where the default start is set to "spawn"
# https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods
@@ -48,15 +28,12 @@
# fix fork error on MacOS but seems no effect? must do EXPORT manually before jina start
_os.environ['OBJC_DISABLE_INITIALIZE_FORK_SAFETY'] = 'YES'
-# Underscore variables shared globally
-
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
# do not change this line manually
# this is managed by git tag and updated on every release
# NOTE: this represents the NEXT release version
-__version__ = '1.2.5'
+
+# TODO: remove 'rcN' on final release
+__version__ = '2.0.0rc1'
# do not change this line manually
# this is managed by proto/build-proto.sh and updated on every execution
@@ -70,9 +47,6 @@
# 3. copy all lines EXCEPT the first (which is the grep command in the last line)
__jina_env__ = (
'JINA_ARRAY_QUANT',
- 'JINA_BINARY_DELIMITER',
- 'JINA_CONTRIB_MODULE',
- 'JINA_CONTRIB_MODULE_IS_LOADING',
'JINA_CONTROL_PORT',
'JINA_DEFAULT_HOST',
'JINA_DISABLE_UVLOOP',
@@ -84,48 +58,38 @@
'JINA_LOG_LEVEL',
'JINA_LOG_NO_COLOR',
'JINA_LOG_WORKSPACE',
+ 'JINA_OPTIMIZER_TRIAL_WORKSPACE',
'JINA_POD_NAME',
- 'JINA_RAISE_ERROR_EARLY',
'JINA_RANDOM_PORTS',
'JINA_RANDOM_PORT_MAX',
'JINA_RANDOM_PORT_MIN',
'JINA_SOCKET_HWM',
'JINA_VCS_VERSION',
'JINA_WARN_UNNAMED',
- 'JINA_WORKSPACE',
)
__default_host__ = _os.environ.get('JINA_DEFAULT_HOST', '0.0.0.0')
+__default_executor__ = 'BaseExecutor'
+__default_endpoint__ = '/default'
__ready_msg__ = 'ready and listening'
__stop_msg__ = 'terminated'
-__binary_delimiter__ = _os.environ.get(
- 'JINA_BINARY_DELIMITER', '460841a0a8a430ae25d9ad7c1f048c57'
-).encode()
+__num_args_executor_func__ = 5
__root_dir__ = _os.path.dirname(_os.path.abspath(__file__))
_names_with_underscore = [
'__version__',
- '__copyright__',
- '__license__',
'__proto_version__',
'__default_host__',
'__ready_msg__',
'__stop_msg__',
- '__binary_delimiter__',
'__jina_env__',
'__uptime__',
'__root_dir__',
+ '__default_endpoint__',
+ '__default_executor__',
+ '__num_args_executor_func__',
]
-# Primitive data type,
-# note, they must be loaded BEFORE all executors/drivers/... to avoid cyclic imports
-from jina.types.ndarray.generic import NdArray
-from jina.types.request import Request, Response
-from jina.types.message import Message
-from jina.types.querylang import QueryLang
-from jina.types.document import Document
-from jina.types.document.multimodal import MultimodalDocument
-from jina.types.arrays import DocumentArray, QueryLangArray
# ADD GLOBAL NAMESPACE VARIABLES
JINA_GLOBAL = _types.SimpleNamespace()
@@ -133,13 +97,10 @@
JINA_GLOBAL.tensorflow_installed = None
JINA_GLOBAL.torch_installed = None
-import jina.importer as _ji
-
-# driver first, as executor may contain driver
-_ji.import_classes('jina.drivers', show_import_table=False, import_once=True)
-_ji.import_classes('jina.executors', show_import_table=False, import_once=True)
-_ji.import_classes('jina.hub', show_import_table=False, import_once=True)
-
+# import jina.importer as _ji
+#
+# _ji.import_classes('jina.executors', show_import_table=False, import_once=True)
+#
_signal.signal(_signal.SIGINT, _signal.default_int_handler)
@@ -191,24 +152,20 @@ def _set_nofile(nofile_atleast=4096):
_set_nofile()
-# Flow
-from jina.flow import Flow
-from jina.flow.asyncio import AsyncFlow
+# ONLY FIRST-CLASS CITIZENS ARE ALLOWED HERE, namely Document, Executor, Flow
-# Client
-from jina.clients import Client
-from jina.clients.asyncio import AsyncClient
+# Document
+from jina.types.document import Document
+from jina.types.arrays.document import DocumentArray
# Executor
-from jina.executors import GenericExecutor as Executor
-from jina.executors.classifiers import BaseClassifier as Classifier
-from jina.executors.crafters import BaseCrafter as Crafter
-from jina.executors.encoders import BaseEncoder as Encoder
-from jina.executors.evaluators import BaseEvaluator as Evaluator
-from jina.executors.indexers import BaseIndexer as Indexer
-from jina.executors.rankers import BaseRanker as Ranker
-from jina.executors.segmenters import BaseSegmenter as Segmenter
+from jina.executors import BaseExecutor as Executor
from jina.executors.decorators import requests
+# Flow
+from jina.flow import Flow
+from jina.flow.asyncio import AsyncFlow
+
+
__all__ = [_s for _s in dir() if not _s.startswith('_')]
-__all__.extend([_s for _s in _names_with_underscore])
+__all__.extend(_names_with_underscore)
diff --git a/jina/checker.py b/jina/checker.py
index 4c29ce2a741c6..7ee3f7ad33e8b 100644
--- a/jina/checker.py
+++ b/jina/checker.py
@@ -1,6 +1,3 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
import os
from . import __jina_env__
@@ -38,15 +35,6 @@ def __init__(self, args: 'argparse.Namespace'):
with open(args.summary_exec, 'w') as fp:
_print_dep_tree_rst(fp, _r, 'Executor')
- default_logger.info('\navailable drivers\n'.upper())
- _r = import_classes('jina.drivers', show_import_table=True, import_once=False)
-
- if args.summary_driver:
- with open(args.summary_driver, 'w') as fp:
- _print_dep_tree_rst(fp, _r, 'Driver')
-
- # check available driver group
-
default_logger.info('\nenvironment variables\n'.upper())
default_logger.info(
'\n'.join(
diff --git a/jina/clients/__init__.py b/jina/clients/__init__.py
index 9a3f58c6fbbba..dd9a54948edc5 100644
--- a/jina/clients/__init__.py
+++ b/jina/clients/__init__.py
@@ -1,246 +1,23 @@
"""Module wrapping the Client of Jina."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Union, List
-
-from . import request
from .base import BaseClient, CallbackFnType, InputType, InputDeleteType
from .helper import callback_exec
+from .mixin import PostMixin
from .request import GeneratorSourceType
from .websocket import WebSocketClientMixin
-from ..enums import RequestType
-from ..helper import run_async, deprecated_alias
-class Client(BaseClient):
+class Client(PostMixin, BaseClient):
"""A simple Python client for connecting to the gRPC gateway.
It manages the asyncio event loop internally, so all interfaces are synchronous from the outside.
"""
- async def _get_results(self, *args, **kwargs):
- result = []
- async for resp in super()._get_results(*args, **kwargs):
- if self.args.return_results:
- result.append(resp)
-
- if self.args.return_results:
- return result
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def train(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ) -> None:
- """Issue 'train' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :return: None
- """
- self.mode = RequestType.TRAIN
- return run_async(
- self._get_results, inputs, on_done, on_error, on_always, **kwargs
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def search(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ) -> None:
- """Issue 'search' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :return: None
- """
- self.mode = RequestType.SEARCH
- self.add_default_kwargs(kwargs)
- return run_async(
- self._get_results, inputs, on_done, on_error, on_always, **kwargs
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def index(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ) -> None:
- """Issue 'index' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :return: None
- """
- self.mode = RequestType.INDEX
- return run_async(
- self._get_results, inputs, on_done, on_error, on_always, **kwargs
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def update(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ) -> None:
- """Issue 'update' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :return: None
- """
- self.mode = RequestType.UPDATE
- return run_async(
- self._get_results, inputs, on_done, on_error, on_always, **kwargs
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def delete(
- self,
- inputs: InputDeleteType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ) -> None:
- """Issue 'update' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :return: None
- """
- self.mode = RequestType.DELETE
- return run_async(
- self._get_results, inputs, on_done, on_error, on_always, **kwargs
- )
-
- def reload(
- self,
- targets: Union[str, List[str]],
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Send 'reload' request to the Flow.
-
- :param targets: the regex string or list of regex strings to match the pea/pod names.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :return: None
- """
-
- if isinstance(targets, str):
- targets = [targets]
- kwargs['targets'] = targets
-
- self.mode = RequestType.CONTROL
- return run_async(
- self._get_results,
- [],
- on_done,
- on_error,
- on_always,
- command='RELOAD',
- **kwargs,
- )
-
- def dump(
- self,
- targets: Union[str, List[str]],
- dump_path: str,
- shards: int,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Send 'reload' request to the Flow.
-
- :param shards: nr of shards to dump for
- :param dump_path: the path to which to dump
- :param targets: the regex string or list of regex strings to match the pea/pod names.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :return: None
- """
- if isinstance(targets, str):
- targets = [targets]
- kwargs['targets'] = targets
- # required in order for jina.clients.request.helper._add_control_propagate
- kwargs['args'] = {}
- kwargs['args']['dump_path'] = dump_path
- kwargs['args']['shards'] = shards
+ @property
+ def client(self) -> 'Client':
+ """Return the client object itself
- self.mode = RequestType.CONTROL
- return run_async(
- self._get_results,
- [],
- on_done,
- on_error,
- on_always,
- command='DUMP',
- **kwargs,
- )
+ .. # noqa: DAR201"""
+ return self
class WebSocketClient(Client, WebSocketClientMixin):
diff --git a/jina/clients/asyncio.py b/jina/clients/asyncio.py
index 79cd114edb952..7c3b5c9b904dd 100644
--- a/jina/clients/asyncio.py
+++ b/jina/clients/asyncio.py
@@ -1,14 +1,10 @@
"""Module wrapping AsyncIO ops for clients."""
-from typing import Union, List, AsyncGenerator
-from jina.types.request import Response
-
-from .base import InputType, InputDeleteType, BaseClient, CallbackFnType
+from .base import BaseClient
+from .mixin import AsyncPostMixin
from .websocket import WebSocketClientMixin
-from ..enums import RequestType
-from ..helper import deprecated_alias
-class AsyncClient(BaseClient):
+class AsyncClient(AsyncPostMixin, BaseClient):
"""
:class:`AsyncClient` is the asynchronous version of the :class:`Client`.
@@ -52,178 +48,6 @@ async def concurrent_main():
One can think of :class:`Client` as Jina-managed eventloop, whereas :class:`AsyncClient` is self-managed eventloop.
"""
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def train(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs
- ) -> AsyncGenerator[Response, None]:
- """Issue 'train' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :yield: result
- """
- self.mode = RequestType.TRAIN
- async for r in self._get_results(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def search(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs
- ) -> AsyncGenerator[Response, None]:
- """Issue 'search' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :yield: result
- """
- self.mode = RequestType.SEARCH
- self.add_default_kwargs(kwargs)
- async for r in self._get_results(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def index(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs
- ) -> AsyncGenerator[Response, None]:
- """Issue 'index' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :yield: result
- """
- self.mode = RequestType.INDEX
- async for r in self._get_results(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def delete(
- self,
- inputs: InputDeleteType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs
- ) -> AsyncGenerator[Response, None]:
- """Issue 'delete' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :yield: result
- """
- self.mode = RequestType.DELETE
- async for r in self._get_results(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def update(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs
- ) -> AsyncGenerator[Response, None]:
- """Issue 'update' request to the Flow.
-
- :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :yield: result
- """
- self.mode = RequestType.UPDATE
- async for r in self._get_results(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- async def reload(
- self,
- targets: Union[str, List[str]],
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs
- ):
- """Send 'reload' request to the Flow.
-
- :param targets: the regex string or list of regex strings to match the pea/pod names.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: additional parameters
- :yield: result
- """
-
- if isinstance(targets, str):
- targets = [targets]
- kwargs['targets'] = targets
-
- self.mode = RequestType.CONTROL
- async for r in self._get_results([], on_done, on_error, on_always, **kwargs):
- yield r
-
class AsyncWebSocketClient(AsyncClient, WebSocketClientMixin):
"""
diff --git a/jina/clients/base.py b/jina/clients/base.py
index 4b6e921b4a271..b761109369423 100644
--- a/jina/clients/base.py
+++ b/jina/clients/base.py
@@ -1,17 +1,15 @@
"""Module containing the Base Client for Jina."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
import argparse
-import os
-from typing import Callable, Union, Optional, Iterator, Iterable, Dict, AsyncIterator
import asyncio
+import inspect
+import os
+from typing import Callable, Union, Optional, Iterator, Iterable, AsyncIterator
import grpc
-import inspect
+
from .helper import callback_exec
from .request import GeneratorSourceType
-from ..enums import RequestType
from ..excepts import BadClient, BadClientInput, ValidationError
from ..helper import typename
from ..logging import default_logger, JinaLogger
@@ -45,31 +43,8 @@ def __init__(self, args: 'argparse.Namespace'):
# affect users os-level envs.
os.unsetenv('http_proxy')
os.unsetenv('https_proxy')
- self._mode = args.mode
self._inputs = None
- @property
- def mode(self) -> str:
- """
- Get the mode for this client (index, query etc.).
-
- :return: Mode of the client.
- """
- return self._mode
-
- @mode.setter
- def mode(self, value: RequestType) -> None:
- """
- Set the mode.
-
- :param value: Request type. (e.g. INDEX, SEARCH, DELETE, UPDATE, CONTROL, TRAIN)
- """
- if isinstance(value, RequestType):
- self._mode = value
- self.args.mode = value
- else:
- raise ValueError(f'{value} must be one of {RequestType}')
-
@staticmethod
def check_input(inputs: Optional[InputType] = None, **kwargs) -> None:
"""Validate the inputs and print the first request if success.
@@ -77,11 +52,17 @@ def check_input(inputs: Optional[InputType] = None, **kwargs) -> None:
:param inputs: the inputs
:param kwargs: keyword arguments
"""
+
+ if inputs is None:
+            # empty inputs are considered valid
+ return
+
if hasattr(inputs, '__call__'):
# it is a function
inputs = inputs()
kwargs['data'] = inputs
+ kwargs['exec_endpoint'] = '/'
if inspect.isasyncgenfunction(inputs) or inspect.isasyncgen(inputs):
raise ValidationError(
@@ -123,12 +104,6 @@ def _get_requests(
return request_generator(**_kwargs)
- def _get_task_name(self, kwargs: Dict) -> str:
- tname = str(self.mode).lower()
- if 'mode' in kwargs:
- tname = str(kwargs['mode']).lower()
- return tname
-
@property
def inputs(self) -> InputType:
"""
@@ -138,10 +113,7 @@ def inputs(self) -> InputType:
:return: inputs
"""
- if self._inputs is not None:
- return self._inputs
- else:
- raise BadClient('inputs are not defined')
+ return self._inputs
@inputs.setter
def inputs(self, bytes_gen: InputType) -> None:
@@ -165,7 +137,6 @@ async def _get_results(
):
try:
self.inputs = inputs
- tname = self._get_task_name(kwargs)
req_iter = self._get_requests(**kwargs)
async with grpc.aio.insecure_channel(
f'{self.args.host}:{self.args.port_expose}',
@@ -178,7 +149,7 @@ async def _get_results(
self.logger.success(
f'connected to the gateway at {self.args.host}:{self.args.port_expose}!'
)
- with ProgressBar(task_name=tname) as p_bar, TimeContext(tname):
+ with ProgressBar() as p_bar, TimeContext(''):
async for resp in stub.Call(req_iter):
resp.as_typed_request(resp.request_type)
resp.as_response()
@@ -195,7 +166,7 @@ async def _get_results(
except KeyboardInterrupt:
self.logger.warning('user cancel the process')
except asyncio.CancelledError as ex:
- self.logger.warning(f'process error: {ex!r}, terminate signal send?')
+ self.logger.warning(f'process error: {ex!r}')
except grpc.aio._call.AioRpcError as rpc_ex:
# Since this object is guaranteed to be a grpc.Call, might as well include that in its name.
my_code = rpc_ex.code()
@@ -220,48 +191,3 @@ async def _get_results(
) from rpc_ex
else:
raise BadClient(msg) from rpc_ex
-
- def index(self):
- """Issue 'index' request to the Flow."""
- raise NotImplementedError
-
- def search(self):
- """Issue 'search' request to the Flow."""
- raise NotImplementedError
-
- def train(self):
- """Issue 'train' request to the Flow."""
- raise NotImplementedError
-
- @staticmethod
- def add_default_kwargs(kwargs: Dict):
- """
- Add the default kwargs to the instance.
-
- :param kwargs: the kwargs to add
- """
- # TODO: refactor it into load from config file
- if ('top_k' in kwargs) and (kwargs['top_k'] is not None):
- # associate all VectorSearchDriver and SliceQL driver to use top_k
- from jina import QueryLang
-
- topk_ql = [
- QueryLang(
- {
- 'name': 'SliceQL',
- 'priority': 1,
- 'parameters': {'end': kwargs['top_k']},
- }
- ),
- QueryLang(
- {
- 'name': 'VectorSearchDriver',
- 'priority': 1,
- 'parameters': {'top_k': kwargs['top_k']},
- }
- ),
- ]
- if 'queryset' not in kwargs:
- kwargs['queryset'] = topk_ql
- else:
- kwargs['queryset'].extend(topk_ql)
diff --git a/jina/clients/helper.py b/jina/clients/helper.py
index 60d956da7b310..866e372cbcb97 100644
--- a/jina/clients/helper.py
+++ b/jina/clients/helper.py
@@ -1,16 +1,14 @@
"""Helper functions for clients in Jina."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
from functools import wraps
from typing import Callable
-from .. import Response
from ..excepts import BadClientCallback
from ..helper import colored
from ..importer import ImportExtensions
from ..logging import JinaLogger
from ..proto import jina_pb2
+from ..types.request import Response
def pprint_routes(resp: 'Response', stack_limit: int = 3):
@@ -18,7 +16,6 @@ def pprint_routes(resp: 'Response', stack_limit: int = 3):
:param resp: the :class:`Response` object
:param stack_limit: traceback limit
- :return:
"""
from textwrap import fill
diff --git a/jina/clients/mixin.py b/jina/clients/mixin.py
new file mode 100644
index 0000000000000..171f21f13502e
--- /dev/null
+++ b/jina/clients/mixin.py
@@ -0,0 +1,107 @@
+from functools import partialmethod
+from typing import Optional, Dict, List, AsyncGenerator
+
+from .base import CallbackFnType, InputType
+from ..helper import run_async
+from ..types.request import Response
+
+
+class PostMixin:
+    """The Post Mixin class for Client and Flow."""
+
+ def post(
+ self,
+ on: str,
+ inputs: Optional[InputType] = None,
+ on_done: CallbackFnType = None,
+ on_error: CallbackFnType = None,
+ on_always: CallbackFnType = None,
+ parameters: Optional[Dict] = None,
+ target_peapod: Optional[str] = None,
+ **kwargs,
+ ) -> Optional[List[Response]]:
+ """Post a general data request to the Flow.
+
+ :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id.
+        :param on: the endpoint used for identifying the user-defined ``request_type``, labeled by ``@requests(on='/abc')``
+        :param on_done: the function to be called when the :class:`Request` object is resolved.
+        :param on_error: the function to be called when the :class:`Request` object is rejected.
+        :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected.
+        :param parameters: the kwargs that will be sent to the executor
+        :param target_peapod: a regex string to match the pea/pod names that the request targets
+        :param kwargs: additional parameters
+        :return: a list of :class:`Response` objects when ``return_results`` is enabled, otherwise ``None``
+ """
+
+ async def _get_results(*args, **kwargs):
+ result = []
+ c = self.client
+ async for resp in c._get_results(*args, **kwargs):
+ if c.args.return_results:
+ result.append(resp)
+
+ if c.args.return_results:
+ return result
+
+ return run_async(
+ _get_results,
+ inputs=inputs,
+ on_done=on_done,
+ on_error=on_error,
+ on_always=on_always,
+ exec_endpoint=on,
+ target_peapod=target_peapod,
+ parameters=parameters,
+ **kwargs,
+ )
+
+    # ONLY CRUD, for other requests please use `.post`
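+    # `partialmethod` pre-binds the `on` endpoint, so e.g. `client.index(inputs)`
+    # is equivalent to `client.post('/index', inputs)`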
+ index = partialmethod(post, '/index')
+ search = partialmethod(post, '/search')
+ update = partialmethod(post, '/update')
+ delete = partialmethod(post, '/delete')
+
+
+class AsyncPostMixin:
+    """The Async Post Mixin class for AsyncClient and AsyncFlow."""
+
+ async def post(
+ self,
+ on: str,
+ inputs: Optional[InputType] = None,
+ on_done: CallbackFnType = None,
+ on_error: CallbackFnType = None,
+ on_always: CallbackFnType = None,
+ parameters: Optional[Dict] = None,
+ target_peapod: Optional[str] = None,
+ **kwargs,
+    ) -> AsyncGenerator[Response, None]:
+ """Post a general data request to the Flow.
+
+ :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id.
+        :param on: the endpoint used for identifying the user-defined ``request_type``, labeled by ``@requests(on='/abc')``
+        :param on_done: the function to be called when the :class:`Request` object is resolved.
+        :param on_error: the function to be called when the :class:`Request` object is rejected.
+        :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected.
+        :param parameters: the kwargs that will be sent to the executor
+        :param target_peapod: a regex string to match the pea/pod names that the request targets
+ :param kwargs: additional parameters
+ :yield: Response object
+ """
+ async for r in self.client._get_results(
+ inputs=inputs,
+ on_done=on_done,
+ on_error=on_error,
+ on_always=on_always,
+ exec_endpoint=on,
+ target_peapod=target_peapod,
+ parameters=parameters,
+ **kwargs,
+ ):
+ yield r
+
+    # ONLY CRUD, for other requests please use `.post`
+ index = partialmethod(post, '/index')
+ search = partialmethod(post, '/search')
+ update = partialmethod(post, '/update')
+ delete = partialmethod(post, '/delete')
diff --git a/jina/clients/request/__init__.py b/jina/clients/request/__init__.py
index 0aaf2fbb00ba2..91c68500e7932 100644
--- a/jina/clients/request/__init__.py
+++ b/jina/clients/request/__init__.py
@@ -1,16 +1,13 @@
"""Module for Jina Requests."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-from typing import Iterator, Union, Tuple, AsyncIterable, Iterable, Optional
+from typing import Iterator, Union, Tuple, AsyncIterable, Iterable, Optional, Dict
-from .helper import _new_request_from_batch
-from ... import Request
-from ...enums import RequestType, DataInputType
+from .helper import _new_data_request_from_batch, _new_data_request
+from ...enums import DataInputType
from ...helper import batch_iterator
from ...logging import default_logger
from ...types.document import DocumentSourceType, DocumentContentType, Document
-from ...types.arrays.querylang import AcceptQueryLangType
+from ...types.request import Request
SingletonDataType = Union[
DocumentContentType,
@@ -26,25 +23,25 @@
def request_generator(
+ exec_endpoint: str,
data: GeneratorSourceType,
request_size: int = 0,
- mode: RequestType = RequestType.INDEX,
mime_type: Optional[str] = None,
- queryset: Optional[
- Union[AcceptQueryLangType, Iterator[AcceptQueryLangType]]
- ] = None,
data_type: DataInputType = DataInputType.AUTO,
+ target_peapod: Optional[str] = None,
+ parameters: Optional[Dict] = None,
**kwargs, # do not remove this, add on purpose to suppress unknown kwargs
) -> Iterator['Request']:
"""Generate a request iterator.
+ :param exec_endpoint: the endpoint string, by convention starts with `/`
:param data: the data to use in the request
:param request_size: the request size for the client
- :param mode: the request mode (index, search etc.)
:param mime_type: mime type
- :param queryset: querylang set of queries
:param data_type: if ``data`` is an iterator over self-contained document, i.e. :class:`DocumentSourceType`;
or an iterator over possible Document content (set to text, blob and buffer).
+    :param parameters: the kwargs that will be sent to the executor
+    :param target_peapod: a regex string to match the pea/pod names that the request targets
:param kwargs: additional arguments
:yield: request
"""
@@ -52,12 +49,23 @@ def request_generator(
_kwargs = dict(mime_type=mime_type, weight=1.0, extra_kwargs=kwargs)
try:
- if not isinstance(data, Iterable):
- data = [data]
- for batch in batch_iterator(data, request_size):
- yield _new_request_from_batch(
- _kwargs, batch, data_type, mode, queryset, **kwargs
+ if data is None:
+ # this allows empty inputs, i.e. a data request with only parameters
+ yield _new_data_request(
+ endpoint=exec_endpoint, target=target_peapod, parameters=parameters
)
+ else:
+ if not isinstance(data, Iterable):
+ data = [data]
+ for batch in batch_iterator(data, request_size):
+ yield _new_data_request_from_batch(
+ _kwargs=kwargs,
+ batch=batch,
+ data_type=data_type,
+ endpoint=exec_endpoint,
+ target=target_peapod,
+ parameters=parameters,
+ )
except Exception as ex:
# must be handled here, as grpc channel wont handle Python exception
diff --git a/jina/clients/request/asyncio.py b/jina/clients/request/asyncio.py
index 50643c3dbd595..5ea9914e80275 100644
--- a/jina/clients/request/asyncio.py
+++ b/jina/clients/request/asyncio.py
@@ -1,49 +1,60 @@
"""Module for async requests generator."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-from typing import Iterator, Union, AsyncIterator, Optional
+from typing import AsyncIterator, Optional, Dict
-from .helper import _new_request_from_batch
+from .helper import _new_data_request_from_batch, _new_data_request
from .. import GeneratorSourceType
-from ... import Request
-from ...enums import RequestType, DataInputType
+from ...enums import DataInputType
from ...importer import ImportExtensions
from ...logging import default_logger
-from ...types.arrays.querylang import AcceptQueryLangType
+from ...types.request import Request
async def request_generator(
+ exec_endpoint: str,
data: GeneratorSourceType,
request_size: int = 0,
- mode: RequestType = RequestType.INDEX,
mime_type: Optional[str] = None,
- queryset: Optional[
- Union[AcceptQueryLangType, Iterator[AcceptQueryLangType]]
- ] = None,
data_type: DataInputType = DataInputType.AUTO,
+ target_peapod: Optional[str] = None,
+ parameters: Optional[Dict] = None,
**kwargs, # do not remove this, add on purpose to suppress unknown kwargs
) -> AsyncIterator['Request']:
"""An async :function:`request_generator`.
+ :param exec_endpoint: the endpoint string; by convention it starts with `/`
:param data: the data to use in the request
:param request_size: the request size for the client
- :param mode: the request mode (index, search etc.)
:param mime_type: mime type
- :param queryset: querylang set of queries
:param data_type: if ``data`` is an iterator over self-contained document, i.e. :class:`DocumentSourceType`;
or an iterator over possible Document content (set to text, blob and buffer).
- :param kwargs: additional key word arguments
+ :param parameters: the kwargs that will be sent to the executor
+ :param target_peapod: a regex string matching the peas/pods that the request targets
+ :param kwargs: additional arguments
:yield: request
"""
- _kwargs = dict(mime_type=mime_type, weight=1.0)
+
+ _kwargs = dict(mime_type=mime_type, weight=1.0, extra_kwargs=kwargs)
try:
- with ImportExtensions(required=True):
- import aiostream
+ if data is None:
+ # this allows empty inputs, i.e. a data request with only parameters
+ yield _new_data_request(
+ endpoint=exec_endpoint, target=target_peapod, parameters=parameters
+ )
+ else:
+ with ImportExtensions(required=True):
+ import aiostream
- async for batch in aiostream.stream.chunks(data, request_size):
- yield _new_request_from_batch(_kwargs, batch, data_type, mode, queryset)
+ async for batch in aiostream.stream.chunks(data, request_size):
+ yield _new_data_request_from_batch(
+ _kwargs=kwargs,
+ batch=batch,
+ data_type=data_type,
+ endpoint=exec_endpoint,
+ target=target_peapod,
+ parameters=parameters,
+ )
except Exception as ex:
# must be handled here, as the gRPC channel won't handle Python exceptions
default_logger.critical(f'inputs is not valid! {ex!r}', exc_info=True)
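
The async variant mirrors the sync generator but batches via `aiostream` (installed as an extra). A hedged usage sketch with an async document source:

```python
import asyncio

from jina import Document
from jina.clients.request.asyncio import request_generator


async def docs():
    for i in range(4):
        yield Document(text=f'doc {i}')


async def main():
    # 4 docs at request_size=2 -> two requests of two docs each
    async for req in request_generator(exec_endpoint='/search', data=docs(), request_size=2):
        print(req.header.exec_endpoint, len(req.docs))


asyncio.run(main())
```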
diff --git a/jina/clients/request/helper.py b/jina/clients/request/helper.py
index afc30118d876c..149c4a17a66d3 100644
--- a/jina/clients/request/helper.py
+++ b/jina/clients/request/helper.py
@@ -1,10 +1,41 @@
"""Module for helper functions for clients."""
-from typing import Tuple, Sequence
+from typing import Tuple
-from ... import Document, Request
-from ...enums import DataInputType, RequestType
+from ... import Document
+from ...enums import DataInputType
from ...excepts import BadDocType, BadRequestType
-from ...excepts import RequestTypeError
+from ...types.request import Request
+
+
+def _new_data_request_from_batch(
+ _kwargs, batch, data_type, endpoint, target, parameters
+):
+ req = _new_data_request(endpoint, target, parameters)
+
+ # add docs, groundtruths fields
+ try:
+ _add_docs_groundtruths(req, batch, data_type, _kwargs)
+ except Exception as ex:
+ raise BadRequestType(
+ f'error when building {req.request_type} from {batch}'
+ ) from ex
+
+ return req
+
+
+def _new_data_request(endpoint, target, parameters):
+ req = Request()
+ req.request_type = 'data'
+
+ # set up header
+ if endpoint:
+ req.header.exec_endpoint = endpoint
+ if target:
+ req.header.target_peapod = target
+ # add parameters field
+ if parameters:
+ req.parameters.update(parameters)
+ return req
def _new_doc_from_data(
@@ -32,42 +63,6 @@ def _build_doc_from_content():
return _build_doc_from_content()
-def _new_request_from_batch(_kwargs, batch, data_type, mode, queryset, **kwargs):
- req = Request()
- req.request_type = str(mode)
-
- try:
- # add type-specific fields
- if (
- mode == RequestType.INDEX
- or mode == RequestType.SEARCH
- or mode == RequestType.TRAIN
- or mode == RequestType.UPDATE
- ):
- if 'extra_kwargs' in _kwargs:
- _kwargs.pop('extra_kwargs') #: data request do not need extra kwargs
- _add_docs_groundtruths(req, batch, data_type, _kwargs)
- elif mode == RequestType.DELETE:
- _add_ids(req, batch)
- elif mode == RequestType.CONTROL:
- _add_control_propagate(req, _kwargs)
- else:
- raise RequestTypeError(
- f'generating request from {mode} is not yet supported'
- )
- except Exception as ex:
- raise BadRequestType(
- f'error when building {req.request_type} from {batch}'
- ) from ex
-
- # add common fields
- if isinstance(queryset, Sequence):
- req.queryset.extend(queryset)
- elif queryset is not None:
- req.queryset.append(queryset)
- return req
-
-
def _add_docs_groundtruths(req, batch, data_type, _kwargs):
for content in batch:
if isinstance(content, tuple) and len(content) == 2:
@@ -83,11 +78,6 @@ def _add_docs_groundtruths(req, batch, data_type, _kwargs):
req.docs.append(d)
-def _add_ids(req, batch):
- string_ids = (str(doc_id) for doc_id in batch)
- req.ids.extend(string_ids)
-
-
def _add_control_propagate(req, kwargs):
from ...proto import jina_pb2
@@ -113,5 +103,3 @@ def _add_control_propagate(req, kwargs):
raise ValueError(
f'command "{command}" is not supported, must be one of {_available_commands}'
)
- req.targets.extend(extra_kwargs.get('targets', []))
- req.control.propagate = True
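
`_new_data_request` makes the anatomy of a 2.0 data request explicit: a `data` request type plus an optional endpoint, target and parameters in the header. A small sketch of what it produces (note this is a private helper, shown here only for illustration):

```python
from jina.clients.request.helper import _new_data_request

req = _new_data_request(endpoint='/index', target='pod0', parameters={'top_k': 5})
# the endpoint and target land in the request header
assert req.header.exec_endpoint == '/index'
assert req.header.target_peapod == 'pod0'
```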
diff --git a/jina/clients/sugary_io.py b/jina/clients/sugary_io.py
deleted file mode 100644
index 2ffbb4b93ba07..0000000000000
--- a/jina/clients/sugary_io.py
+++ /dev/null
@@ -1,189 +0,0 @@
-"""A module for sugary API wrapper around the clients."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import csv
-import glob
-import itertools as it
-import json
-import os
-import random
-from typing import List, Union, Iterator, Iterable, Dict, Generator, Optional
-
-import numpy as np
-
-if False:
- from jina import Document
-
-# https://github.com/ndjson/ndjson.github.io/issues/1#issuecomment-109935996
-_jsonl_ext = {'.jsonlines', '.ndjson', '.jsonl', '.jl', '.ldjson'}
-_csv_ext = {'.csv', '.tcsv'}
-
-
-def _sample(iterable, sampling_rate: Optional[float] = None):
- for i in iterable:
- if sampling_rate is None or random.random() < sampling_rate:
- yield i
-
-
-def _subsample(
- iterable, size: Optional[int] = None, sampling_rate: Optional[float] = None
-):
- yield from it.islice(_sample(iterable, sampling_rate), size)
-
-
-def _input_lines(
- lines: Optional[Iterable[str]] = None,
- filepath: Optional[str] = None,
- read_mode: str = 'r',
- line_format: str = 'json',
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
-) -> Generator[Union[str, 'Document'], None, None]:
- """Generator function for lines, json and sc. Yields documents or strings.
-
- :param lines: a list of strings, each is considered as a document
- :param filepath: a text file that each line contains a document
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in binary
- :param line_format: the format of each line ``json`` or ``csv``
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :yields: documents
-
- .. note::
- This function should not be directly used, use :meth:`Flow.index_files`, :meth:`Flow.search_files` instead
- """
- if filepath:
- file_type = os.path.splitext(filepath)[1]
- with open(filepath, read_mode) as f:
- if file_type in _jsonl_ext:
- yield from _input_ndjson(f)
- elif file_type in _csv_ext:
- yield from _input_csv(f, field_resolver, size, sampling_rate)
- else:
- yield from _subsample(f, size, sampling_rate)
- elif lines:
- if line_format == 'json':
- yield from _input_ndjson(lines)
- elif line_format == 'csv':
- yield from _input_csv(lines, field_resolver, size, sampling_rate)
- else:
- yield from _subsample(lines, size, sampling_rate)
- else:
- raise ValueError('"filepath" and "lines" can not be both empty')
-
-
-def _input_ndjson(
- fp: Iterable[str],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
-):
- from jina import Document
-
- for line in _subsample(fp, size, sampling_rate):
- value = json.loads(line)
- if 'groundtruth' in value and 'document' in value:
- yield Document(value['document'], field_resolver), Document(
- value['groundtruth'], field_resolver
- )
- else:
- yield Document(value, field_resolver)
-
-
-def _input_csv(
- fp: Iterable[str],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
-):
- from jina import Document
-
- lines = csv.DictReader(fp)
- for value in _subsample(lines, size, sampling_rate):
- if 'groundtruth' in value and 'document' in value:
- yield Document(value['document'], field_resolver), Document(
- value['groundtruth'], field_resolver
- )
- else:
- yield Document(value, field_resolver)
-
-
-def _input_files(
- patterns: Union[str, List[str]],
- recursive: bool = True,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: Optional[str] = None,
-) -> Iterator[Union[str, bytes]]:
- """Creates an iterator over a list of file path or the content of the files.
-
- :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]'
- :param recursive: If recursive is true, the pattern '**' will match any files
- and zero or more directories and subdirectories
- :param size: the maximum number of the files
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file is opened.
- 'r' for reading in text mode, 'rb' for reading in binary mode.
- If `read_mode` is None, will iterate over filenames.
- :yield: file paths or binary content
-
- .. note::
- This function should not be directly used, use :meth:`Flow.index_files`, :meth:`Flow.search_files` instead
- """
- if read_mode not in {'r', 'rb', None}:
- raise RuntimeError(f'read_mode should be "r", "rb" or None, got {read_mode}')
-
- def _iter_file_exts(ps):
- return it.chain.from_iterable(glob.iglob(p, recursive=recursive) for p in ps)
-
- d = 0
- if isinstance(patterns, str):
- patterns = [patterns]
- for g in _iter_file_exts(patterns):
- if sampling_rate is None or random.random() < sampling_rate:
- if read_mode is None:
- yield g
- elif read_mode in {'r', 'rb'}:
- with open(g, read_mode) as fp:
- yield fp.read()
- d += 1
- if size is not None and d > size:
- break
-
-
-def _input_ndarray(
- array: 'np.ndarray',
- axis: int = 0,
- size: Optional[int] = None,
- shuffle: bool = False,
-) -> Generator['np.ndarray', None, None]:
- """Create a generator for a given dimension of a numpy array.
-
- :param array: the numpy ndarray data source
- :param axis: iterate over that axis
- :param size: the maximum number of the sub arrays
- :param shuffle: shuffle the numpy data source beforehand
- :yield: ndarray
-
- .. note::
- This function should not be directly used, use :meth:`Flow.index_ndarray`, :meth:`Flow.search_ndarray` instead
- """
- if shuffle:
- # shuffle for random query
- array = np.take(array, np.random.permutation(array.shape[0]), axis=axis)
- d = 0
- for r in array:
- yield r
- d += 1
- if size is not None and d >= size:
- break
-
-
-# for back-compatibility
-_input_numpy = _input_ndarray
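
The deleted module built all of its input generators on the same sample-then-slice pattern (`_sample` feeding `_subsample`). For reference, a self-contained sketch of that pattern outside Jina:

```python
import itertools as it
import random
from typing import Iterable, Iterator, Optional


def subsample(iterable: Iterable, size: Optional[int] = None,
              sampling_rate: Optional[float] = None) -> Iterator:
    """Yield at most `size` items, each kept with probability `sampling_rate`."""
    kept = (x for x in iterable if sampling_rate is None or random.random() < sampling_rate)
    yield from it.islice(kept, size)


# roughly half of 100 items survive sampling; at most 5 are yielded
print(list(subsample(range(100), size=5, sampling_rate=0.5)))
```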
diff --git a/jina/clients/websocket.py b/jina/clients/websocket.py
index 6a66b02829063..835cd227dee00 100644
--- a/jina/clients/websocket.py
+++ b/jina/clients/websocket.py
@@ -47,7 +47,6 @@ async def _get_results(
self.inputs = inputs
- tname = self._get_task_name(kwargs)
req_iter = self._get_requests(**kwargs)
try:
client_info = f'{self.args.host}:{self.args.port_expose}'
@@ -78,7 +77,7 @@ async def _send_requests(request_iterator):
# There is nothing to send, disconnect gracefully
await websocket.close(reason='No data to send')
- with ProgressBar(task_name=tname) as p_bar, TimeContext(tname):
+ with ProgressBar() as p_bar, TimeContext(''):
# Unlike gRPC, arbitrary functions (generators) cannot be passed via websockets.
# Simply iterating through the `req_iter` makes the request-response sequential.
# To keep the client non-blocking, :func:`send_requests` and :func:`recv_responses` are separate tasks
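
The comments above describe the pattern: sending and receiving run as separate asyncio tasks so that iterating the request generator never blocks response handling. A generic sketch of that decoupling (not Jina's actual implementation; the queue stands in for the websocket channel):

```python
import asyncio


async def send_requests(ws_send, requests):
    for req in requests:
        await ws_send(req)  # hand each request to the socket as it is produced


async def recv_responses(ws_recv, n):
    for _ in range(n):
        print('response:', await ws_recv())


async def main():
    queue = asyncio.Queue()  # stand-in for the websocket channel

    async def ws_send(msg):
        await queue.put(f'ack:{msg}')

    reqs = [f'req-{i}' for i in range(3)]
    # running both coroutines concurrently keeps the client unblocked
    await asyncio.gather(send_requests(ws_send, reqs),
                         recv_responses(queue.get, len(reqs)))


asyncio.run(main())
```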
diff --git a/jina/docker/checker.py b/jina/docker/checker.py
index fc446339864eb..28495452c5def 100644
--- a/jina/docker/checker.py
+++ b/jina/docker/checker.py
@@ -1,6 +1,4 @@
"""Module for validation functions."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
import os
import re
diff --git a/jina/docker/helper.py b/jina/docker/helper.py
index bd37641388726..483b9fdca9a78 100644
--- a/jina/docker/helper.py
+++ b/jina/docker/helper.py
@@ -1,6 +1,4 @@
"""Module for helper functions for Docker."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
from pathlib import Path
diff --git a/jina/docker/hubio.py b/jina/docker/hubio.py
index 735566689c7b6..dccc3c846a647 100644
--- a/jina/docker/hubio.py
+++ b/jina/docker/hubio.py
@@ -1,6 +1,4 @@
"""Module for wrapping Jina Hub API calls."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
import argparse
import glob
diff --git a/jina/drivers/__init__.py b/jina/drivers/__init__.py
deleted file mode 100644
index 391fcfe01dfb6..0000000000000
--- a/jina/drivers/__init__.py
+++ /dev/null
@@ -1,732 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import inspect
-import typing
-from functools import wraps
-from typing import (
- Any,
- Dict,
- Callable,
- Tuple,
- Optional,
- Sequence,
- Iterable,
- List,
- Union,
-)
-
-import numpy as np
-from google.protobuf.struct_pb2 import Struct
-
-from ..enums import OnErrorStrategy
-from ..excepts import LengthMismatchException
-from ..executors.compound import CompoundExecutor
-from ..executors.decorators import wrap_func
-from ..helper import (
- convert_tuple_to_list,
- cached_property,
- find_request_binding,
- _canonical_request_name,
-)
-from ..jaml import JAMLCompatible
-from ..types.querylang import QueryLang
-from ..types.arrays import DocumentArray
-
-# noinspection PyUnreachableCode
-if False:
- # fix type-hint complain for sphinx and flake
- from ..peapods.runtimes.zmq.zed import ZEDRuntime
- from ..executors import AnyExecutor
- from ..logging.logger import JinaLogger
- from ..types.message import Message
- from ..types.request import Request
- from ..types.arrays import QueryLangArray
- from ..types.document import Document
-
-
-def store_init_kwargs(func: Callable) -> Callable:
- """Mark the args and kwargs of :func:`__init__` later to be stored via :func:`save_config` in YAML
-
- :param func: the Callable to wrap
- :return: the wrapped Callable
- """
-
- @wraps(func)
- def _arg_wrapper(self, *args, **kwargs):
- if func.__name__ != '__init__':
- raise TypeError(
- 'this decorator should only be used on __init__ method of a driver'
- )
- taboo = {'self', 'args', 'kwargs'}
- all_pars = inspect.signature(func).parameters
- tmp = {k: v.default for k, v in all_pars.items() if k not in taboo}
- tmp_list = [k for k in all_pars.keys() if k not in taboo]
- # set args by aligning tmp_list with arg values
- for k, v in zip(tmp_list, args):
- tmp[k] = v
- # set kwargs
- for k, v in kwargs.items():
- if k in tmp:
- tmp[k] = v
-
- if self.store_args_kwargs:
- if args:
- tmp['args'] = args
- if kwargs:
- tmp['kwargs'] = {k: v for k, v in kwargs.items() if k not in taboo}
-
- if hasattr(self, '_init_kwargs_dict'):
- self._init_kwargs_dict.update(tmp)
- else:
- self._init_kwargs_dict = tmp
- convert_tuple_to_list(self._init_kwargs_dict)
- f = func(self, *args, **kwargs)
- return f
-
- return _arg_wrapper
-
-
-class QuerySetReader:
- """
- :class:`QuerySetReader` allows a driver to read arguments from the protobuf message. This allows a
- driver to override its behavior based on the message it receives. This is extremely useful in production, for example,
- for getting ``top_k`` results, doing pagination, or filtering.
-
- To register the field you want to read from the message, simply register them in :meth:`__init__`.
- For example, ``__init__(self, arg1, arg2, **kwargs)`` will allow the driver to read fields ``arg1`` and ``arg2`` from
- the message. When they are not found in the message, the values ``_arg1`` and ``_arg2`` will be used. Note the underscore
- prefix.
-
- .. note::
- - To set default value of ``arg1``, use ``self._arg1 =``, note the underscore in the front.
- - To access ``arg1``, simply use ``self.arg1``. It automatically switches between the default ``_arg1`` and ``arg1`` from the request.
-
- For successful value reading, the following condition must be met:
-
- - the ``name`` in the proto must match with the current class name
- - the ``disabled`` field in the proto must be ``False``
- - the ``priority`` in the proto must be strictly greater than the driver's priority (0 by default)
- - the field name must exist in proto's ``parameters``
-
- .. warning::
- For the sake of cooperative multiple inheritance, do NOT implement :meth:`__init__` for this class
- """
-
- @property
- def as_querylang(self):
- """Render as QueryLang parameters.
-
-
- .. # noqa: DAR201"""
- parameters = {
- name: getattr(self, name) for name in self._init_kwargs_dict.keys()
- }
- return QueryLang(
- {
- 'name': self.__class__.__name__,
- 'priority': self._priority,
- 'parameters': parameters,
- }
- )
-
- def _get_parameter(self, key: str, default: Any):
- if getattr(self, 'queryset', None):
- for q in self.queryset:
- if (
- not q.disabled
- and self.__class__.__name__ == q.name
- and q.priority > self._priority
- and key in q.parameters
- ):
- ret = q.parameters[key]
- return dict(ret) if isinstance(ret, Struct) else ret
- return getattr(self, f'_{key}', default)
-
- def __getattr__(self, name: str):
- # https://docs.python.org/3/reference/datamodel.html#object.__getattr__
- if name == '_init_kwargs_dict':
- # raise attribute error to avoid recursive call
- raise AttributeError
- if name in self._init_kwargs_dict:
- return self._get_parameter(name, default=self._init_kwargs_dict[name])
- raise AttributeError
-
-
-class DriverType(type(JAMLCompatible), type):
- """A meta class representing a Driver
-
- When a new Driver is created, it gets registered
- """
-
- def __new__(cls, *args, **kwargs):
- """Create and register a new class with this meta class.
-
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- :return: the newly registered class
- """
- _cls = super().__new__(cls, *args, **kwargs)
- return cls.register_class(_cls)
-
- @staticmethod
- def register_class(cls):
- """Register a class
-
- :param cls: the class
- :return: the class, after being registered
- """
- reg_cls_set = getattr(cls, '_registered_class', set())
- if cls.__name__ not in reg_cls_set or getattr(cls, 'force_register', False):
- wrap_func(cls, ['__init__'], store_init_kwargs)
- # wrap_func(cls, ['__call__'], as_reduce_method)
-
- reg_cls_set.add(cls.__name__)
- setattr(cls, '_registered_class', reg_cls_set)
- return cls
-
-
-class BaseDriver(JAMLCompatible, metaclass=DriverType):
- """A :class:`BaseDriver` is a logic unit above the :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`.
- It reads the protobuf message, extracts/modifies the required information and then returns
- the message back to :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`.
-
- A :class:`BaseDriver` needs to be :attr:`attached` to a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` before
- using. This is done by :func:`attach`. Note that a deserialized :class:`BaseDriver` from file is always unattached.
-
- :param priority: the priority of its default arg values (hardcoded in Python). If the
- received ``QueryLang`` has a higher priority, it will override the hardcoded value
- :param args: not used (kept to maintain interface)
- :param kwargs: not used (kept to maintain interface)
- """
-
- store_args_kwargs = False #: set this to ``True`` to save ``args`` (in a list) and ``kwargs`` (in a map) in YAML config
-
- def __init__(self, priority: int = 0, *args, **kwargs):
- self.attached = False # : represent if this driver is attached to a
- # :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` (& :class:`jina.executors.BaseExecutor`)
- self.runtime = None # type: Optional['ZEDRuntime']
- self._priority = priority
-
- def attach(self, runtime: 'ZEDRuntime', *args, **kwargs) -> None:
- """Attach this driver to a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`
-
- :param runtime: the runtime to which this driver will be attached
- :param args: not used (kept to maintain interface)
- :param kwargs: not used (kept to maintain interface)
- """
- self.runtime = runtime
- self.attached = True
-
- @property
- def req(self) -> 'Request':
- """Get the current (typed) request, shortcut to ``self.runtime.request``
-
-
- .. # noqa: DAR201
- """
- return self.runtime.request
-
- @property
- def partial_reqs(self) -> Sequence['Request']:
- """The collected partial requests under the current ``request_id``
-
-
- .. # noqa: DAR401
-
-
- .. # noqa: DAR201
- """
- if self.expect_parts > 1:
- return self.runtime.partial_requests
- else:
- raise ValueError(
- f'trying to access all partial requests, '
- f'but {self.runtime} has only one message'
- )
-
- @property
- def expect_parts(self) -> int:
- """The expected number of partial messages
-
-
- .. # noqa: DAR201
- """
- return self.runtime.expect_parts
-
- @property
- def docs(self) -> 'DocumentArray':
- """The DocumentArray after applying the traversal
-
-
- .. # noqa: DAR201"""
- from ..types.arrays import DocumentArray
-
- if self.expect_parts > 1:
- return DocumentArray(
- [d for r in reversed(self.partial_reqs) for d in r.docs]
- )
- else:
- return self.req.docs
-
- @property
- def msg(self) -> 'Message':
- """Get the current request, shortcut to ``self.runtime.message``
-
-
- .. # noqa: DAR201
- """
- return self.runtime.message
-
- @property
- def queryset(self) -> 'QueryLangArray':
- """
-
-
- .. # noqa: DAR101
-
-
- .. # noqa: DAR102
-
-
- .. # noqa: DAR201
- """
- if self.msg:
- return self.msg.request.queryset
- else:
- return []
-
- @property
- def logger(self) -> 'JinaLogger':
- """Shortcut to ``self.runtime.logger``
-
-
- .. # noqa: DAR201
- """
- return self.runtime.logger
-
- def __call__(self, *args, **kwargs) -> None:
- """
-
-
- .. # noqa: DAR102
-
-
- .. # noqa: DAR101
- """
- raise NotImplementedError
-
- def __eq__(self, other):
- return self.__class__ == other.__class__
-
- def __getstate__(self) -> Dict[str, Any]:
- """
- Unlike `Executor`, a driver is stateless.
-
- Therefore, on every save, it creates a new & empty driver object and saves it.
- :return: the state in dict form
- """
-
- d = dict(self.__class__(**self._init_kwargs_dict).__dict__)
- return d
-
-
-class ContextAwareRecursiveMixin:
- """
- The full data structure version of :class:`FlatRecursiveMixin`, to be mixed in with :class:`BaseRecursiveDriver`.
- It uses :meth:`traverse` in :class:`DocumentArray` and allows direct manipulation of Chunk-/Match-/DocumentArrays.
-
- .. seealso::
- https://github.com/jina-ai/jina/issues/1932
-
- """
-
- def __call__(self, *args, **kwargs):
- """Traverse with _apply_all
-
- :param args: args forwarded to ``_apply_all``
- :param kwargs: kwargs forwarded to ``_apply_all``
- """
- document_sets = self.docs.traverse(self._traversal_paths)
- self._apply_all(document_sets, *args, **kwargs)
-
- def _apply_all(
- self,
- doc_sequences: Iterable['DocumentArray'],
- *args,
- **kwargs,
- ) -> None:
- """Apply function works on an Iterable of DocumentArray, modify the docs in-place.
-
- Each DocumentArray refers to a leaf (e.g. roots, matches or chunks wrapped
- in a :class:`jina.DocumentArray`) in the traversal_paths. Modifications on the
- DocumentArrays (e.g. adding or deleting Documents) are directly applied on the underlying objects.
- Adding a chunk to a ChunkArray results in adding a chunk to the parent Document.
-
- :param doc_sequences: the Documents that should be handled
- :param args: driver specific arguments, which might be forwarded to the Executor
- :param kwargs: driver specific arguments, which might be forwarded to the Executor
- """
-
-
-class FlatRecursiveMixin:
- """
- The batch optimized version of :class:`ContextAwareRecursiveMixin`, to be mixed in with :class:`BaseRecursiveDriver`.
- It uses :meth:`traverse_flattened_per_path` in :class:`DocumentArray` and yields much better performance
- when no context is needed and batching is possible.
-
- .. seealso::
- https://github.com/jina-ai/jina/issues/1932
-
- """
-
- def __call__(self, *args, **kwargs):
- """Traverse with _apply_all
-
- :param args: args forwarded to ``_apply_all``
- :param kwargs: kwargs forwarded to ``_apply_all``
- """
- path_documents = self.docs.traverse_flattened_per_path(self._traversal_paths)
- for documents in path_documents:
- if documents:
- self._apply_all(documents, *args, **kwargs)
-
- def _apply_all(
- self,
- docs: 'DocumentArray',
- *args,
- **kwargs,
- ) -> None:
- """Apply function works on a list of docs, modify the docs in-place.
-
- The list refers to all reachable leaves of a single ``traversal_path``.
-
- :param docs: the Documents that should be handled
- :param args: driver specific arguments, which might be forwarded to the Executor
- :param kwargs: driver specific arguments, which might be forwarded to the Executor
-
- """
-
-
-class DocsExtractUpdateMixin:
- """
- A Driver pattern for extracting attributes from Documents, feeding them to an executor and updating the Documents with
- the results.
-
- Drivers equipped with this mixin will have :meth:`_apply_all` inherited.
-
- The :meth:`_apply_all` implements the following logic:
- - From ``docs``, it extracts the attributes defined in :meth:`exec_fn`'s arguments.
- - It feeds the attributes to the bound executor's :meth:`exec_fn`.
- - It updates ``docs`` with the results returned from :meth:`exec_fn`.
-
- The following shortcuts are implemented:
- - while extracting: attributes defined in :meth:`exec_fn`'s arguments are extracted from ``docs``;
- - while extracting: attributes annotated with ``ndarray`` are stacked into Numpy ndarray objects;
- - while updating: if ``exec_fn`` returns a List of Dict, then ``doc.set_attrs(**exec_result)`` is called;
- - while updating: if ``exec_fn`` returns a Document, then ``doc.update(exec_result)`` is called;
- - while updating: if none of the above applies, :meth:`update_single_doc` is called.
-
- To override the update behavior, you can choose to override:
- - :meth:`update_docs` if you want to modify the behavior of updating docs in bulk
- - :meth:`update_single_doc` if you want to modify the behavior of updating a single doc
- """
-
- @property
- def _stack_document_content(self):
- return self._exec_fn_required_keys_is_ndarray
-
- def _apply_all(self, docs: 'DocumentArray') -> None:
- """Apply function works on a list of docs, modify the docs in-place.
-
- The list refers to all reachable leaves of a single ``traversal_path``.
-
- :param docs: the Documents that should be handled
- """
-
- contents, docs_pts = docs.extract_docs(
- *self._exec_fn_required_keys,
- stack_contents=self._stack_document_content,
- )
-
- if docs_pts:
- if len(self._exec_fn_required_keys) > 1:
- exec_results = self.exec_fn(*contents)
- else:
- exec_results = self.exec_fn(contents)
-
- if exec_results is not None:
- # if exec_fn returns None then exec_fn is assumed to be immutable wrt. doc, hence skipped
-
- try:
- len_results = len(exec_results)
- except:
- try:
- len_results = exec_results.shape[0]
- except:
- len_results = None
-
- if len(docs_pts) != len_results:
- msg = (
- f'mismatched {len(docs_pts)} docs from level {docs_pts[0].granularity} '
- f'and length of returned: {len_results}, their length must be the same'
- )
- raise LengthMismatchException(msg)
-
- self.update_docs(docs_pts, exec_results)
-
- def update_docs(
- self,
- docs_pts: 'DocumentArray',
- exec_results: Union[List[Dict], List['Document'], Any],
- ) -> None:
- """
- Update Documents with the Executor returned results.
-
- :param docs_pts: the set of documents to be updated
- :param exec_results: the results from :meth:`exec_fn`
- """
- from ..types.document import Document
-
- if self._exec_fn_return_is_ndarray and not isinstance(exec_results, np.ndarray):
- r_type = type(exec_results).__name__
- if r_type in {'EagerTensor', 'Tensor', 'list'}:
- exec_results = np.array(exec_results, dtype=np.float32)
- else:
- raise TypeError(f'unrecognized type {exec_results!r}')
-
- for doc, exec_result in zip(docs_pts, exec_results):
- if isinstance(exec_result, dict):
- doc.set_attrs(**exec_result)
- elif isinstance(exec_result, Document):
- # the doc id should not be overridden by this method
- doc.update(exec_result, exclude_fields=('id',))
- else:
- self.update_single_doc(doc, exec_result)
-
- def update_single_doc(self, doc: 'Document', exec_result: Any) -> None:
- """Update a single Document with the Executor returned result.
-
- :param doc: the Document object
- :param exec_result: the single result from :meth:`exec_fn`
- """
- raise NotImplementedError
-
- @cached_property
- def _exec_fn_required_keys(self) -> List[str]:
- """Get the arguments of :attr:`exec_fn`.
-
- If ``strict_method_args`` is set, then all arguments of :attr:`exec_fn` must be valid :class:`Document` attributes.
-
- :return: a list of supported arguments
- """
-
- if not self.exec_fn:
- raise ValueError(
- f'`exec_fn` is None, maybe {self} is not attached? call `self.attach`.'
- )
-
- required_keys = [
- k
- for k in inspect.getfullargspec(inspect.unwrap(self.exec_fn)).args
- if k != 'self'
- ]
- if not required_keys:
- raise AttributeError(f'{self.exec_fn} takes no argument.')
-
- if not self._strict_method_args:
- return required_keys
-
- from .. import Document
-
- support_keys = Document.get_all_attributes()
- unrecognized_keys = set(required_keys).difference(support_keys)
-
- if not unrecognized_keys:
- return required_keys
-
- from ..proto import jina_pb2
-
- camel_keys = set(jina_pb2.DocumentProto().DESCRIPTOR.fields_by_camelcase_name)
- legacy_keys = {'data'}
- unrecognized_camel_keys = unrecognized_keys.intersection(camel_keys)
- if unrecognized_camel_keys:
- raise AttributeError(
- f'{unrecognized_camel_keys} are supported but you gave them in CamelCase, '
- f'please rewrite them in canonical form.'
- )
- elif unrecognized_keys.intersection(legacy_keys):
- raise AttributeError(
- f'{unrecognized_keys.intersection(legacy_keys)} is now deprecated and not a valid argument of '
- 'the executor function, '
- 'please change `data` to `content: \'np.ndarray\'` in your executor function. '
- 'details: https://github.com/jina-ai/jina/pull/2313/'
- )
- else:
- raise AttributeError(
- f'{unrecognized_keys} are invalid Document attributes, must come from {support_keys}'
- )
-
- return required_keys
-
- @cached_property
- def _exec_fn_required_keys_is_ndarray(self) -> List[bool]:
- """Return a list of boolean indicators for showing if a key is annotated as ndarray
-
- :return: a list of boolean indicators, True if the corresponding key is annotated as ndarray
- """
-
- try:
- anno = typing.get_type_hints((inspect.unwrap(self.exec_fn)))
- return [
- anno.get(k, None) == np.ndarray for k in self._exec_fn_required_keys
- ]
- except NameError:
- return [False] * len(self._exec_fn_required_keys)
-
- @cached_property
- def _exec_fn_return_is_ndarray(self) -> bool:
- """Return a boolean value for showing if the return of :meth:`exec_fn` is annotated as `ndarray`
-
- :return: a bool indicator
- """
- try:
- return (
- typing.get_type_hints((inspect.unwrap(self.exec_fn))).get(
- 'return', None
- )
- == np.ndarray
- )
- except NameError:
- return False
-
-
-class BaseRecursiveDriver(BaseDriver):
- """A :class:`BaseRecursiveDriver` is an abstract Driver class containing information about the `traversal_paths`
- to which a `Driver` must apply its logic.
- It is intended to be mixed in with either :class:`FlatRecursiveMixin` or :class:`ContextAwareRecursiveMixin`
- """
-
- def __init__(self, traversal_paths: Tuple[str] = ('c', 'r'), *args, **kwargs):
- """Initialize a :class:`BaseRecursiveDriver`
-
- :param traversal_paths: Describes the leaves of the document tree on which _apply_all are called
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
- super().__init__(*args, **kwargs)
- self._traversal_paths = [path.lower() for path in traversal_paths]
-
-
-class BaseExecutableDriver(BaseRecursiveDriver):
- """A :class:`BaseExecutableDriver` is an intermediate logic unit between the :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` and :class:`jina.executors.BaseExecutor`
- It reads the protobuf message, extracts/modifies the required information and then sends to the :class:`jina.executors.BaseExecutor`,
- finally it returns the message back to :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`.
-
- A :class:`BaseExecutableDriver` needs to be :attr:`attached` to a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` and :class:`jina.executors.BaseExecutor` before using.
- This is done by :func:`attach`. Note that a deserialized :class:`BaseDriver` from file is always unattached.
- """
-
- def __init__(
- self,
- executor: Optional[str] = None,
- method: Optional[str] = None,
- strict_method_args: bool = True,
- *args,
- **kwargs,
- ):
- """Initialize a :class:`BaseExecutableDriver`
-
- :param executor: the name of the sub-executor, only necessary when :class:`jina.executors.compound.CompoundExecutor` is used
- :param method: the function name of the executor that the driver feeds to
- :param strict_method_args: if set, then the input args of ``executor.method`` must be valid :class:`Document` attributes
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
- super().__init__(*args, **kwargs)
- self._executor_name = executor
- self._method_name = method
- self._strict_method_args = strict_method_args
- self._exec = None
- self._exec_fn = None
-
- @property
- def exec(self) -> 'AnyExecutor':
- """the executor that to which the instance is attached
-
-
- .. # noqa: DAR201
- """
- return self._exec
-
- @property
- def exec_fn(self) -> Callable:
- """the function of :func:`jina.executors.BaseExecutor` to call
-
- :return: the Callable to execute in the driver
- """
- if not self.runtime:
- return self._exec_fn
- elif (
- not self.msg.is_error
- or self.runtime.args.on_error_strategy < OnErrorStrategy.SKIP_EXECUTOR
- ):
- return self._exec_fn
- else:
- return lambda *args, **kwargs: None
-
- def attach(
- self, executor: 'AnyExecutor', req_type: Optional[str] = None, *args, **kwargs
- ) -> None:
- """Attach the driver to a :class:`jina.executors.BaseExecutor`
-
- :param executor: the executor to which we attach
- :param req_type: the request type to attach to
- :param args: additional positional arguments for the call of super().attach()
- :param kwargs: additional key value arguments for the call of super().attach()
- """
- super().attach(*args, **kwargs)
- if self._executor_name and isinstance(executor, CompoundExecutor):
- if self._executor_name in executor:
- self._exec = executor[self._executor_name]
- else:
- for c in executor.components:
- if any(
- t.__name__ == self._executor_name for t in type.mro(c.__class__)
- ):
- self._exec = c
- break
- if self._exec is None:
- self.logger.critical(
- f'fail to attach the driver to {executor}, '
- f'no executor is named or typed as {self._executor_name}'
- )
- else:
- self._exec = executor
-
- if not self._method_name:
- decor_bindings = find_request_binding(self.exec.__class__)
- if req_type:
- canonic_name = _canonical_request_name(req_type)
- if canonic_name in decor_bindings:
- self._method_name = decor_bindings[canonic_name]
- elif 'default' in decor_bindings:
- self._method_name = decor_bindings['default']
- elif 'default' in decor_bindings:
- self._method_name = decor_bindings['default']
-
- if self._method_name:
- self._exec_fn = getattr(self.exec, self._method_name)
-
- def __getstate__(self) -> Dict[str, Any]:
- """Do not save the executor and executor function, as it would be cross-referencing and unserializable.
- In other words, a deserialized :class:`BaseExecutableDriver` from file is always unattached.
-
- :return: dictionary of state
- """
- d = super().__getstate__()
- if '_exec' in d:
- del d['_exec']
- if '_exec_fn' in d:
- del d['_exec_fn']
- return d
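
A notable technique in the deleted `DocsExtractUpdateMixin` is deriving which Document attributes to extract from the executor function's own signature and type hints. That introspection works independently of the driver machinery; a standalone sketch (the `encode` function below is hypothetical):

```python
import inspect
import typing

import numpy as np


def required_keys(fn):
    # the non-self positional args name the Document attributes to extract
    return [k for k in inspect.getfullargspec(inspect.unwrap(fn)).args if k != 'self']


def ndarray_flags(fn):
    # True where an argument is annotated as np.ndarray and should be stacked
    hints = typing.get_type_hints(inspect.unwrap(fn))
    return [hints.get(k) == np.ndarray for k in required_keys(fn)]


def encode(content: np.ndarray, mime_type: str) -> np.ndarray:
    ...


print(required_keys(encode))  # ['content', 'mime_type']
print(ndarray_flags(encode))  # [True, False]
```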
diff --git a/jina/drivers/cache.py b/jina/drivers/cache.py
deleted file mode 100644
index c257dada7396a..0000000000000
--- a/jina/drivers/cache.py
+++ /dev/null
@@ -1,94 +0,0 @@
-"""Module for the Drivers for the Cache."""
-import hashlib
-from typing import Any, Dict, List
-
-from .index import BaseIndexDriver
-
-# noinspection PyUnreachableCode
-if False:
- from .. import Document
- from ..types.arrays import DocumentArray
-
-
-class BaseCacheDriver(BaseIndexDriver):
- """A driver related to :class:`BaseCache`.
-
- :param with_serialization: feed serialized Document to the CacheIndexer
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, with_serialization: bool = False, *args, **kwargs):
- self.with_serialization = with_serialization
- super().__init__(*args, **kwargs)
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- if self._method_name == 'update':
- values = [BaseCacheDriver.hash_doc(d, self.exec.fields) for d in docs]
- self.exec_fn([d.id for d in docs], values)
- else:
- for d in docs:
- value = BaseCacheDriver.hash_doc(d, self.exec.fields)
- result = self.exec[value]
- if result:
- self.on_hit(d, result)
- else:
- self.on_miss(d, value)
-
- def on_miss(self, req_doc: 'Document', value: bytes) -> None:
- """Call when document is missing.
-
- The default behavior is to add the document to the cache on a miss.
-
- :param req_doc: the document in the request but missed in the cache
- :param value: the data besides the `req_doc.id` to be passed through to the executors
- """
- if self.with_serialization:
- self.exec_fn([req_doc.id], req_doc.SerializeToString(), [value])
- else:
- self.exec_fn([req_doc.id], [value])
-
- def on_hit(self, req_doc: 'Document', hit_result: Any) -> None:
- """Call when cache is hit for a document.
-
- :param req_doc: the document in the request and hit in the cache
- :param hit_result: the hit result returned by the cache
- """
- pass
-
- @staticmethod
- def hash_doc(doc: 'Document', fields: List[str]) -> bytes:
- """Calculate hash by which we cache.
-
- :param doc: the Document
- :param fields: the list of fields
- :return: the hash value of the fields
- """
- values = doc.get_attrs(*fields).values()
- data = ''
- for field, value in zip(fields, values):
- data += f'{field}:{value};'
- digest = hashlib.sha256(bytes(data.encode('utf8'))).digest()
- return digest
-
-
-class TaggingCacheDriver(BaseCacheDriver):
- """A driver for labelling the hit-cache docs with certain tags."""
-
- def __init__(self, tags: Dict, *args, **kwargs):
- """Create a new TaggingCacheDriver.
-
- :param tags: the tags to be updated on hit docs
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
- super().__init__(*args, **kwargs)
- self._tags = tags
-
- def on_hit(self, req_doc: 'Document', hit_result: Any) -> None:
- """Call when cache is hit for a document.
-
- :param req_doc: the document requested
- :param hit_result: the result of the hit
- """
- req_doc.tags.update(self._tags)
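
`hash_doc` fingerprints a Document by concatenating `field:value;` pairs and hashing them with SHA-256, so two Documents with equal cached fields collide on purpose. The same fingerprint can be computed without the driver; a hedged sketch using plain dicts in place of Documents:

```python
import hashlib


def hash_fields(attrs: dict, fields: list) -> bytes:
    # mirrors BaseCacheDriver.hash_doc: 'field:value;' pairs, SHA-256 digest
    data = ''.join(f'{f}:{attrs[f]};' for f in fields)
    return hashlib.sha256(data.encode('utf8')).digest()


d1 = {'text': 'hello', 'mime_type': 'text/plain'}
d2 = {'text': 'hello', 'mime_type': 'text/html'}
assert hash_fields(d1, ['text']) == hash_fields(d2, ['text'])       # same cached field
assert hash_fields(d1, ['mime_type']) != hash_fields(d2, ['mime_type'])
```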
diff --git a/jina/drivers/control.py b/jina/drivers/control.py
deleted file mode 100644
index 93e552581f231..0000000000000
--- a/jina/drivers/control.py
+++ /dev/null
@@ -1,238 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import re
-import time
-
-from google.protobuf.json_format import MessageToJson
-
-from . import BaseDriver
-from ..excepts import UnknownControlCommand, RuntimeTerminated
-from ..proto import jina_pb2
-from ..types.querylang.queryset.dunderkey import dunder_get
-
-
-class BaseControlDriver(BaseDriver):
- """Control driver does not have access to the executor and it
- often works directly with protobuf layer instead Jina primitive types"""
-
- @property
- def envelope(self) -> 'jina_pb2.EnvelopeProto':
- """Get the current request, shortcut to ``self.runtime.message``
-
-
- .. # noqa: DAR201
- """
- return self.msg.envelope
-
-
-class LogInfoDriver(BaseControlDriver):
- """
- Log output the request info
-
- :param key: (str) a first-level or nested key in the dict
- :param json: (bool) whether the log output should be formatted as JSON
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, key: str = 'request', json: bool = True, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.key = key
- self.json = json
-
- def __call__(self, *args, **kwargs):
- """Log the information.
-
- :param args: unused
- :param kwargs: unused
- """
- data = dunder_get(self.msg.proto, self.key)
- if self.json:
- self.logger.info(MessageToJson(data))
- else:
- self.logger.info(data)
-
-
-class WaitDriver(BaseControlDriver):
- """Wait for some seconds, mainly for demo purpose"""
-
- def __call__(self, *args, **kwargs):
- """Wait for some seconds, mainly for demo purpose
-
-
- .. # noqa: DAR101
- """
- time.sleep(5)
-
-
-class ControlReqDriver(BaseControlDriver):
- """Handling the control request, by default it is installed for all :class:`jina.peapods.peas.BasePea`"""
-
- def __call__(self, *args, **kwargs):
- """Handle the request controlling.
-
- :param args: unused
- :param kwargs: unused
- """
- if self.req.command == 'TERMINATE':
- self.envelope.status.code = jina_pb2.StatusProto.SUCCESS
- raise RuntimeTerminated
- elif self.req.command == 'STATUS':
- self.envelope.status.code = jina_pb2.StatusProto.READY
- self.req.args = vars(self.runtime.args)
- elif self.req.command == 'IDLE':
- pass
- elif self.req.command == 'CANCEL':
- pass
- elif self.req.command == 'DUMP':
- self._dump()
- elif self.req.command == 'RELOAD':
- self._reload()
- elif self.req.command == 'ACTIVATE':
- # TODO (Joan): This is a hack, but I checked in devel-2.0 branch, this _handle_control_req will be moved into the `ZedRuntime` so this code
- # aligns very well with that view
- self.runtime._zmqlet._send_idle_to_router()
- elif self.req.command == 'DEACTIVATE':
- # TODO (Joan): This is a hack, but I checked in devel-2.0 branch, this _handle_control_req will be moved into the `ZedRuntime` so this code
- # aligns very well with that view
- self.runtime._zmqlet._send_cancel_to_router()
- else:
- raise UnknownControlCommand(f'don\'t know how to handle {self.req.command}')
-
- def _reload(self):
- # TODO should this be removed, since we now have proper rolling update?
- if self.req.targets and self.runtime.__class__.__name__ == 'ZEDRuntime':
- patterns = self.req.targets
- if isinstance(patterns, str):
- patterns = [patterns]
- for p in patterns:
- if re.match(p, self.runtime.name):
- self.logger.info(
- f'reloading the Executor `{self.runtime._executor.name}` in `{self.runtime.name}`'
- )
- self.runtime._load_executor()
- break
-
- def _dump(self):
- # TODO(Cristian): this is a smell, since we are accessing the private _executor
- # to be reconsidered after the Executor API refactoring
- if self.req.targets and self.runtime.__class__.__name__ == 'ZEDRuntime':
- patterns = self.req.targets
- if isinstance(patterns, str):
- patterns = [patterns]
- for p in patterns:
- if re.match(p, self.runtime.name):
- self.logger.info(
- f'Dumping from Executor `{self.runtime._executor.name}` in `{self.runtime.name}`'
- )
- req_dict = dict(self.req.args)
- self.runtime._executor.dump(
- req_dict.get('dump_path'), int(req_dict.get('shards'))
- )
- break
-
-
-class RouteDriver(ControlReqDriver):
- """Ensures that data requests are forwarded to the downstream `:class:`BasePea` ensuring
- that the load is balanced between parallel `:class:`BasePea` if the scheduling `:class:`SchedulerType` is LOAD_BALANCE.
-
- .. note::
- - The dealer never receives a control request from the router,
- every time it finishes a job and sends via out_sock, it returns the envelope with control
- request idle back to the router. The dealer also sends control request idle to the router
- when it first starts.
-
- - The router receives requests from both the dealer and the upstream pusher.
- If it is an upstream request, it uses load balancing to schedule the receiver,
- marking it in the envelope; if it is a control request, it is handled by this driver.
-
- :param raise_no_dealer: raise a RuntimeError when no dealer is available
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, raise_no_dealer: bool = False, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.idle_dealer_ids = set()
- self.is_polling_paused = False
- self.raise_no_dealer = raise_no_dealer
-
- def __call__(self, *args, **kwargs):
- """Perform the routing.
-
- :param args: additional positional arguments which are just used for calling the parent
- :param kwargs: additional key value arguments which are just used for calling the parent
-
-
- .. # noqa: DAR401
- """
- if self.msg.is_data_request:
- self.logger.debug(self.idle_dealer_ids)
- if self.idle_dealer_ids:
- dealer_id = self.idle_dealer_ids.pop()
- self.envelope.receiver_id = dealer_id
- if not self.idle_dealer_ids:
- self.runtime._zmqlet.pause_pollin()
- self.is_polling_paused = True
- elif self.raise_no_dealer:
- raise RuntimeError(
- 'if this router connects to more than one dealer, '
- 'then this error should never be raised. Often when it '
- 'is raised, some Pods have failed to start, so please go '
- 'up and check the first error message in the log'
- )
- # else:
- # this FALLBACK to trivial message pass
- #
- # Explanation on the logic here:
- # there are two cases that when `idle_dealer_ids` is empty
- # (1) this driver is used in a PUSH-PULL fan-out setting,
- # where no dealer is registered in the first place, so `idle_dealer_ids` is empty
- # all the time
- # (2) this driver is used in a ROUTER-DEALER fan-out setting,
- # where some dealer is broken/fails to start, so `idle_dealer_ids` is empty
- # IDLE requests add the dealer id to the router. Therefore, it knows which dealer would be available for
- # new data requests.
- # CANCEL requests remove the dealer id from the router. Therefore, it can not send any more data requests
- # to the dealer.
- elif self.req.command == 'IDLE':
- self.idle_dealer_ids.add(self.envelope.receiver_id)
- self.logger.debug(
- f'{self.envelope.receiver_id} is idle, now I know these idle peas {self.idle_dealer_ids}'
- )
- if self.is_polling_paused:
- self.runtime._zmqlet.resume_pollin()
- self.is_polling_paused = False
- elif self.req.command == 'CANCEL':
- if self.envelope.receiver_id in self.idle_dealer_ids:
- self.idle_dealer_ids.remove(self.envelope.receiver_id)
- self.logger.debug(
- f'{self.envelope.receiver_id} is cancelled, now I know these idle peas {self.idle_dealer_ids}'
- )
- else:
- super().__call__(*args, **kwargs)
-
-
-class ForwardDriver(RouteDriver):
- """Alias to :class:`RouteDriver`"""
-
-
-class WhooshDriver(BaseControlDriver):
- """Play a whoosh! sound"""
-
- def __call__(self, *args, **kwargs):
- """Play a whoosh sound, used in 2021 April fools day
-
- .. # noqa: DAR101
- """
- import subprocess
- from pkg_resources import resource_filename
-
- whoosh_mp3 = resource_filename(
- 'jina', '/'.join(('resources', 'soundfx', 'whoosh.mp3'))
- )
-
- subprocess.Popen(
- f'ffplay -nodisp -autoexit {whoosh_mp3} >/dev/null 2>&1', shell=True
- )
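
The load balancing in the deleted `RouteDriver` reduces to bookkeeping on a set of idle dealer ids: IDLE adds an id, CANCEL removes it, and each data request pops one id as the receiver, falling back to plain message passing when the set is empty. A minimal standalone sketch of that logic (class and method names are illustrative):

```python
from typing import Optional


class IdleDealerRouter:
    """A minimal sketch of RouteDriver's idle-dealer bookkeeping."""

    def __init__(self):
        self.idle_dealer_ids = set()

    def on_idle(self, dealer_id: str) -> None:
        # an IDLE control request announces a free dealer
        self.idle_dealer_ids.add(dealer_id)

    def on_cancel(self, dealer_id: str) -> None:
        # a CANCEL control request withdraws a dealer
        self.idle_dealer_ids.discard(dealer_id)

    def route(self) -> Optional[str]:
        # a data request consumes one idle dealer;
        # None means: fall back to trivial message passing
        return self.idle_dealer_ids.pop() if self.idle_dealer_ids else None


router = IdleDealerRouter()
router.on_idle('dealer-1')
assert router.route() == 'dealer-1'
assert router.route() is None
```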
diff --git a/jina/drivers/convert.py b/jina/drivers/convert.py
deleted file mode 100644
index 07d84f07d2a88..0000000000000
--- a/jina/drivers/convert.py
+++ /dev/null
@@ -1,96 +0,0 @@
-from ..drivers import FlatRecursiveMixin, BaseRecursiveDriver
-
-if False:
- from ..types.arrays import DocumentArray
-
-
-class ConvertDriver(FlatRecursiveMixin, BaseRecursiveDriver):
- """Drivers that make sure that specific conversions are applied to the documents.
-
- .. note::
- The list of functions that can be applied can be found in :class:`Document`
- """
-
- def __init__(self, convert_fn: str, *args, **kwargs):
- """
- :param convert_fn: the method name from :class:`Document` to be applied
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: the set of named arguments to be passed to `convert_fn`
- """
- super().__init__(*args, **kwargs)
- self._convert_fn = convert_fn
- self._convert_fn_kwargs = kwargs
-
- def _apply_all(
- self,
- docs: 'DocumentArray',
- *args,
- **kwargs,
- ) -> None:
- for d in docs:
- getattr(d, self._convert_fn)(**self._convert_fn_kwargs)
-
-
-class URI2Buffer(ConvertDriver):
- """Driver to convert URI to buffer"""
-
- def __init__(self, convert_fn: str = 'convert_uri_to_buffer', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class URI2DataURI(ConvertDriver):
- """Driver to convert URI to data URI"""
-
- def __init__(self, convert_fn: str = 'convert_uri_to_data_uri', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class Buffer2URI(ConvertDriver):
- """Driver to convert buffer to URI"""
-
- def __init__(self, convert_fn: str = 'convert_buffer_to_uri', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class BufferImage2Blob(ConvertDriver):
- """Driver to convert image buffer to blob"""
-
- def __init__(
- self, convert_fn: str = 'convert_buffer_image_to_blob', *args, **kwargs
- ):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class URI2Blob(ConvertDriver):
- """Driver to convert URI to blob"""
-
- def __init__(self, convert_fn: str = 'convert_uri_to_blob', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class DataURI2Blob(ConvertDriver):
- """Driver to convert Data URI to image blob"""
-
- def __init__(self, convert_fn: str = 'convert_data_uri_to_blob', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class Text2URI(ConvertDriver):
- """Driver to convert text to URI"""
-
- def __init__(self, convert_fn: str = 'convert_text_to_uri', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class URI2Text(ConvertDriver):
- """Driver to convert URI to text"""
-
- def __init__(self, convert_fn: str = 'convert_uri_to_text', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
-
-
-class Blob2PngURI(ConvertDriver):
- """Driver to convert blob to URI"""
-
- def __init__(self, convert_fn: str = 'convert_blob_to_uri', *args, **kwargs):
- super().__init__(convert_fn, *args, **kwargs)
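
All of these converter drivers reduce to the same dispatch trick: look the conversion method up on the Document by name and call it with the stored kwargs. A self-contained sketch of the pattern (the `Doc` class below is a stand-in, not Jina's `Document`):

```python
class Doc:
    """A stand-in for jina.Document, only for this sketch."""

    def __init__(self, uri: str = ''):
        self.uri = uri
        self.buffer = None

    def convert_uri_to_buffer(self):
        self.buffer = self.uri.encode()


def apply_conversion(docs, convert_fn: str, **kwargs):
    # mirrors ConvertDriver._apply_all: getattr-based dispatch per Document
    for d in docs:
        getattr(d, convert_fn)(**kwargs)


docs = [Doc('https://jina.ai')]
apply_conversion(docs, 'convert_uri_to_buffer')
assert docs[0].buffer == b'https://jina.ai'
```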
diff --git a/jina/drivers/craft.py b/jina/drivers/craft.py
deleted file mode 100644
index 1892cc18e4d65..0000000000000
--- a/jina/drivers/craft.py
+++ /dev/null
@@ -1,19 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Optional
-
-from . import FlatRecursiveMixin, BaseExecutableDriver, DocsExtractUpdateMixin
-
-
-class CraftDriver(DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver):
- """Drivers inherited from this Driver will bind :meth:`craft` by default """
-
- def __init__(
- self, executor: Optional[str] = None, method: str = 'craft', *args, **kwargs
- ):
- super().__init__(executor, method, *args, **kwargs)
-
- @property
- def _stack_document_content(self):
- return False
diff --git a/jina/drivers/debug.py b/jina/drivers/debug.py
deleted file mode 100644
index 48595acbef209..0000000000000
--- a/jina/drivers/debug.py
+++ /dev/null
@@ -1,64 +0,0 @@
-import os
-
-import numpy as np
-
-from jina.drivers import FlatRecursiveMixin, BaseRecursiveDriver
-from jina.importer import ImportExtensions
-
-if False:
- # noinspection PyUnreachableCode
- from jina import DocumentArray
-
-
-class PngToDiskDriver(FlatRecursiveMixin, BaseRecursiveDriver):
- """A driver that can store an intermediate representation of a png in the workspace, under a given folder.
-
- Useful for debugging Crafters in the Flow
-
- :param workspace: the folder where we store the pngs
- :param prefix: the subfolder to add to workspace
- :param top: limit the pngs to first N
- """
-
- def __init__(self, workspace, prefix='', top=10, *args, **kwargs):
- self.prefix = prefix
- self.top = top
- self.done = 0
- self.workspace = workspace
- self.folder = os.path.join(self.workspace, self.prefix)
- if not os.path.exists(self.folder):
- os.makedirs(self.folder)
- super().__init__(*args, **kwargs)
-
- def _apply_all(
- self,
- docs: 'DocumentArray',
- *args,
- **kwargs,
- ) -> None:
- def _move_channel_axis(
- img: 'np.ndarray', channel_axis_to_move: int, target_channel_axis: int = -1
- ) -> 'np.ndarray':
- if channel_axis_to_move == target_channel_axis:
- return img
- return np.moveaxis(img, channel_axis_to_move, target_channel_axis)
-
- def _load_image(blob: 'np.ndarray', channel_axis: int):
- with ImportExtensions(
- required=True,
- pkg_name='Pillow',
- verbose=True,
- logger=self.logger,
- help_text='PIL is missing. Install it with `pip install Pillow`',
- ):
- from PIL import Image
-
- img = _move_channel_axis(blob, channel_axis)
- return Image.fromarray(img.astype('uint8'))
-
- for d in docs:
- if self.done < self.top:
- img = _load_image(d.blob, -1)
- path = os.path.join(self.folder, f'{self.done}.png')
- img.save(path)
- self.done += 1
diff --git a/jina/drivers/delete.py b/jina/drivers/delete.py
deleted file mode 100644
index 8ca350da7c545..0000000000000
--- a/jina/drivers/delete.py
+++ /dev/null
@@ -1,24 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Optional
-
-from . import BaseExecutableDriver
-
-
-class DeleteDriver(BaseExecutableDriver):
- """Drivers inherited from this Driver will bind :meth:`delete` by default """
-
- def __init__(
- self, executor: Optional[str] = None, method: str = 'delete', *args, **kwargs
- ):
- super().__init__(executor, method, *args, **kwargs)
-
- def __call__(self, *args, **kwargs):
- """
- Call base executable driver on document ids for deletion.
-
- :param args: unused
- :param kwargs: unused
- """
- self.exec_fn(self.req.ids)
diff --git a/jina/drivers/dump.py b/jina/drivers/dump.py
deleted file mode 100644
index b56707d33f32f..0000000000000
--- a/jina/drivers/dump.py
+++ /dev/null
@@ -1,28 +0,0 @@
-from typing import Optional
-
-from jina.drivers import BaseExecutableDriver
-
-
-class DumpDriver(BaseExecutableDriver):
- """A Driver that calls the dump method of the Executor
-
- :param executor: the executor to which we attach the driver
- :param args: passed to super().__init__
- :param kwargs: passed to super().__init__
- """
-
- def __init__(
- self,
- executor: Optional[str] = None,
- *args,
- **kwargs,
- ):
- super().__init__(executor, 'dump', *args, **kwargs)
-
- def __call__(self, *args, **kwargs):
- """Call the Dump method of the Indexer to which the Driver is attached
-
- :param args: passed to the exec_fn
- :param kwargs: passed to the exec_fn
- """
- self.exec_fn(self.req.path, self.req.shards, *args, **kwargs)
diff --git a/jina/drivers/encode.py b/jina/drivers/encode.py
deleted file mode 100644
index 25e259fe2bba5..0000000000000
--- a/jina/drivers/encode.py
+++ /dev/null
@@ -1,52 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Optional, Any, Union
-
-from . import BaseExecutableDriver, FlatRecursiveMixin, DocsExtractUpdateMixin
-
-# noinspection PyUnreachableCode
-if False:
- from .. import Document, DocumentArray, NdArray
- import numpy as np
- from ..proto import jina_pb2
-
-
-class BaseEncodeDriver(BaseExecutableDriver):
- """Drivers inherited from this Driver will bind :meth:`encode` by default """
-
- def __init__(
- self, executor: Optional[str] = None, method: str = 'encode', *args, **kwargs
- ):
- super().__init__(executor, method, *args, **kwargs)
-
-
-class EncodeDriver(DocsExtractUpdateMixin, FlatRecursiveMixin, BaseEncodeDriver):
- """Extract the content from documents and call executor and do encoding"""
-
- def update_single_doc(
- self,
- doc: 'Document',
- exec_result: Union['np.ndarray', 'jina_pb2.NdArrayProto', 'NdArray'],
- ) -> None:
- """Update the document embedding with returned ndarray result
-
- :param doc: the Document object
- :param exec_result: the single result from :meth:`exec_fn`
- """
- doc.embedding = exec_result
-
-
-class ScipySparseEncodeDriver(
- DocsExtractUpdateMixin, FlatRecursiveMixin, BaseEncodeDriver
-):
- """Extract the content from documents and call executor and do encoding"""
-
- def update_docs(self, docs_pts: 'DocumentArray', exec_results: Any) -> None:
- """Update the document embedding with returned sparse matrix
-
- :param: docs_pts: the set of document to be updated
- :param: exec_results: the results from :meth:`exec_fn`
- """
- for idx, doc in enumerate(docs_pts):
- doc.embedding = exec_results.getrow(idx)
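The difference between the two encode drivers is only in how results map back to documents: dense results are split row-wise by the mixin, while the sparse driver assigns each document one row of the returned matrix via `getrow`. A sketch of the sparse case with a stand-in document class (illustrative, not the real `Document`):

```python
import numpy as np
from scipy.sparse import csr_matrix


class Doc:
    """Stand-in for a document; only the `embedding` attribute matters here."""

    embedding = None


docs = [Doc(), Doc()]
exec_results = csr_matrix(np.array([[0.0, 1.0], [2.0, 0.0]]))  # one row per doc

for idx, doc in enumerate(docs):
    doc.embedding = exec_results.getrow(idx)  # each doc gets a 1xN sparse row

assert docs[1].embedding.toarray().tolist() == [[2.0, 0.0]]
```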
diff --git a/jina/drivers/evaluate.py b/jina/drivers/evaluate.py
deleted file mode 100644
index c3ec7a32527c9..0000000000000
--- a/jina/drivers/evaluate.py
+++ /dev/null
@@ -1,229 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Any, Iterator, Optional, Tuple, Union
-
-import numpy as np
-
-from . import BaseExecutableDriver
-from ..types.querylang.queryset.dunderkey import dunder_get
-from .search import KVSearchDriver
-from ..types.document import Document
-from ..types.document.helper import DocGroundtruthPair
-from ..helper import deprecated_alias
-from ..types.arrays.doc_groundtruth import DocumentGroundtruthSequence
-
-
-class BaseEvaluateDriver(BaseExecutableDriver):
- """The Base Driver for evaluation operations.
-
- .. warning::
-
- When ``running_avg=True``, the running mean is returned. As of Jina 0.8.10,
- there is no way to reset the running statistics. If you have a query Flow running multiple queries,
- you may want to make sure the running statistics are meaningful across multiple runs.
-
- :param executor: the name of the sub-executor, only necessary when :class:`jina.executors.compound.CompoundExecutor` is used
- :param method: the function name of the executor that the driver feeds to
- :param running_avg: always return running average instead of value of the current run
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(
- self,
- executor: Optional[str] = None,
- method: str = 'evaluate',
- running_avg: bool = False,
- *args,
- **kwargs,
- ):
- super().__init__(executor, method, *args, **kwargs)
- self._running_avg = running_avg
-
- def __call__(self, *args, **kwargs):
- """Load the ground truth pairs.
-
- :param args: args for _traverse_apply
- :param kwargs: kwargs for _traverse_apply
- """
- docs_groundtruths = DocumentGroundtruthSequence(
- [
- DocGroundtruthPair(doc, groundtruth)
- for doc, groundtruth in zip(self.req.docs, self.req.groundtruths)
- ]
- )
- traversal_result = docs_groundtruths.traverse_flatten(self._traversal_paths)
- self._apply_all(traversal_result, *args, **kwargs)
-
- def _apply_all(self, docs: Iterator['DocGroundtruthPair'], *args, **kwargs) -> None:
- for doc_groundtruth in docs:
- doc = doc_groundtruth.doc
- groundtruth = doc_groundtruth.groundtruth
- evaluation = doc.evaluations.add()
- evaluation.value = self.exec_fn(
- self.extract(doc), self.extract(groundtruth)
- )
- if self._running_avg:
- evaluation.value = self.exec.mean
-
- if getattr(self.exec, 'eval_at', None):
- evaluation.op_name = (
- f'{self.exec.__class__.__name__}@{self.exec.eval_at}'
- )
- else:
- evaluation.op_name = self.exec.__class__.__name__
- evaluation.ref_id = groundtruth.id
-
- def extract(self, doc: 'Document') -> Any:
- """Extract the to-be-evaluated field from the document.
-
- Drivers that inherit from :class:`BaseEvaluateDriver` must implement this method.
- This function is invoked twice in :meth:`_apply_all`:
- once with the actual doc, once with the groundtruth doc.
-
- .. # noqa: DAR401
- :param doc: the Document
- """
- raise NotImplementedError
-
-
-class FieldEvaluateDriver(BaseEvaluateDriver):
- """
- Evaluate on the values from a certain field; the extraction is implemented with :meth:`dunder_get`.
-
- :param field: the field name to be extracted from the Protobuf.
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, field: str, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.field = field
-
- def extract(self, doc: 'Document') -> Any:
- """Extract the field from the Document.
-
- :param doc: the Document
- :return: the data in the field
- """
- return dunder_get(doc, self.field)
-
-
-class RankEvaluateDriver(BaseEvaluateDriver):
- """Drivers used to pass `matches` from documents and groundtruths to an executor and add the evaluation value.
-
- - Example fields:
- ['tags__id', 'score__value']
-
- :param fields: the fields names to be extracted from the Protobuf.
- The differences with :class:`FieldEvaluateDriver` are:
- - More than one field is allowed. For instance, for NDCGComputation you may need to have both `ID` and `Relevance` information.
- - The fields are extracted from the `matches` of the `Documents` and the `Groundtruth` so it returns a sequence of values.
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- @deprecated_alias(field=('fields', 0))
- def __init__(
- self,
- fields: Union[str, Tuple[str]] = (
- 'id',
- ), # str maintained for backwards compatibility
- *args,
- **kwargs,
- ):
- super().__init__(*args, **kwargs)
- self.fields = fields
-
- @property
- def single_field(self):
- """
- Get single field.
-
- Property to guarantee compatibility when only one field is provided, either as a string or as a unit-length tuple.
-
- :return: the single field, or ``None`` if more than one field is set
- """
- if isinstance(self.fields, str):
- return self.fields
- elif len(self.fields) == 1:
- return self.fields[0]
-
- def extract(self, doc: 'Document'):
- """
- Extract values of the matches from documents with fields as keys.
-
- :param doc: the Document whose matches are to be extracted.
- :return: a list of tuples consisting of the values from the fields.
- """
- single_field = self.single_field
- if single_field:
- r = [dunder_get(x, single_field) for x in doc.matches]
- # TODO: Clean this, optimization for `hello-world` because it passes a list of 6k elements in a single
- # match. See `pseudo_match` in helloworld/helper.py _get_groundtruths
- ret = list(np.array(r).flat)
- else:
- ret = [
- tuple(dunder_get(x, field) for field in self.fields)
- for x in doc.matches
- ]
-
- return ret
-
-
-class NDArrayEvaluateDriver(FieldEvaluateDriver):
- """Drivers used to pass `embedding` from documents and groundtruths to an executor and add the evaluation value.
-
- .. note::
- - Valid fields:
- ['blob', 'embedding']
-
- """
-
- def __init__(self, field: str = 'embedding', *args, **kwargs):
- super().__init__(field, *args, **kwargs)
-
-
-class TextEvaluateDriver(FieldEvaluateDriver):
- """Drivers used to pass a content field from documents and groundtruths to an executor and add the evaluation value.
-
- .. note::
- - Valid fields:
- ['id', 'level_name', 'parent_id', 'text', 'mime_type', 'uri', 'modality']
- """
-
- def __init__(self, field: str = 'text', *args, **kwargs):
- super().__init__(field, *args, **kwargs)
-
-
-class LoadGroundTruthDriver(KVSearchDriver):
- """Driver used to search for the `document key` in a KVIndex to find the corresponding groundtruth.
- (This driver does not use the `recursive structure` of jina Documents, and will not consider the `traversal_path` argument.
- It only retrieves `groundtruth` taking documents at root as key)
- This driver's job is to fill the `request` groundtruth with the corresponding groundtruth for each document if found in the corresponding KVIndexer.
-
- .. warning::
- The documents that are not found to have an indexed groundtruth are removed from the `request` so that the `Evaluator` only
- works with documents which have groundtruth.
- """
-
- def __call__(self, *args, **kwargs):
- """Load the ground truth.
-
- :param args: unused
- :param kwargs: unused
- """
- miss_idx = (
- []
- ) #: missed hit results, some documents may not have groundtruth and thus will be removed
- serialized_groundtruths = self.exec_fn([d.id for d in self.docs])
- for idx, serialized_groundtruth in enumerate(serialized_groundtruths):
- if serialized_groundtruth:
- self.req.groundtruths.append(Document(serialized_groundtruth))
- else:
- miss_idx.append(idx)
-
- # delete docs without groundtruth, in reverse order so remaining indices stay valid
- for j in reversed(miss_idx):
- del self.docs[j]
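`FieldEvaluateDriver` and `RankEvaluateDriver` both lean on `dunder_get`, which resolves a double-underscore path like `tags__id` step by step. A minimal re-implementation, assuming only dict/attribute lookups, to show the addressing scheme:

```python
def dunder_get(obj, key):
    """Resolve 'a__b__c' by looking up 'a', then 'b' on the result, and so on."""
    for part in key.split('__'):
        obj = obj[part] if isinstance(obj, dict) else getattr(obj, part)
    return obj


doc = {'tags': {'id': 42}, 'score': {'value': 0.9}}
assert dunder_get(doc, 'tags__id') == 42
assert dunder_get(doc, 'score__value') == 0.9
```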
diff --git a/jina/drivers/generic.py b/jina/drivers/generic.py
deleted file mode 100644
index 56113d5b31805..0000000000000
--- a/jina/drivers/generic.py
+++ /dev/null
@@ -1,15 +0,0 @@
-from typing import Optional
-
-from . import DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver
-
-
-class GenericExecutorDriver(
- DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver
-):
- """Generic driver that uses extract-apply-update pattern. It automatically binds to the method
- decorated with `@request`."""
-
- def __init__(
- self, executor: Optional[str] = None, method: str = '', *args, **kwargs
- ):
- super().__init__(executor, method, *args, **kwargs)
diff --git a/jina/drivers/index.py b/jina/drivers/index.py
deleted file mode 100644
index 115d09eb7610c..0000000000000
--- a/jina/drivers/index.py
+++ /dev/null
@@ -1,99 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Iterable, Optional
-
-from . import BaseExecutableDriver, FlatRecursiveMixin
-from .. import Document
-from ..enums import EmbeddingClsType
-
-if False:
- from ..types.arrays import DocumentArray
-
-
-class BaseIndexDriver(FlatRecursiveMixin, BaseExecutableDriver):
- """Drivers inherited from this Driver will bind :meth:`add` by default """
-
- def __init__(
- self, executor: Optional[str] = None, method: str = 'add', *args, **kwargs
- ):
- super().__init__(executor, method, *args, **kwargs)
-
- def check_key_length(self, val: Iterable[str]):
- """
- Check if the max length of val (e.g. doc id) is larger than the executor's key_length.
-
- :param val: The values to be checked
- """
- m_val = max(len(v) for v in val)
- if m_val > self.exec.key_length:
- raise ValueError(
- f'{self.exec} allows only keys of length {self.exec.key_length}, '
- f'but yours is {m_val}.'
- )
-
-
-class VectorIndexDriver(BaseIndexDriver):
- """Extracts embeddings and ids from the documents and forwards them to the executor.
- In case `method` is 'delete', the embeddings are ignored.
- If `method` is not 'delete', documents without content are filtered out.
- """
-
- @property
- def exec_embedding_cls_type(self) -> EmbeddingClsType:
- """Get the sparse class type of the attached executor.
-
- :return: Embedding class type of the attached executor, default value is `dense`
- """
- return EmbeddingClsType.from_string(self.exec.embedding_cls_type)
-
- def _get_documents_embeddings(self, docs: 'DocumentArray'):
- embedding_cls_type = self.exec_embedding_cls_type
- if embedding_cls_type.is_dense:
- return docs.all_embeddings
- else:
- return docs.get_all_sparse_embeddings(embedding_cls_type=embedding_cls_type)
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- embed_vecs, docs_pts = self._get_documents_embeddings(docs)
- if docs_pts:
- keys = [doc.id for doc in docs_pts]
- self.check_key_length(keys)
- self.exec_fn(keys, embed_vecs)
-
-
-class KVIndexDriver(BaseIndexDriver):
- """Forwards pairs of serialized documents and ids to the executor."""
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- info = [(doc.id, doc.SerializeToString()) for doc in docs]
- if info:
- keys, values = zip(*info)
- self.check_key_length(keys)
- self.exec_fn(keys, values)
-
-
-class DBMSIndexDriver(BaseIndexDriver):
- """Forwards ids, vectors, serialized Document to a BaseDBMSIndexer"""
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- info = [
- (
- doc.id,
- doc.embedding,
- DBMSIndexDriver._doc_without_embedding(doc).SerializeToString(),
- )
- for doc in docs
- ]
- if info:
- ids, vecs, metas = zip(*info)
- self.check_key_length(ids)
- self.exec_fn(ids, vecs, metas)
-
- @staticmethod
- def _doc_without_embedding(d):
- from .. import Document
-
- new_doc = Document(d, copy=True)
- new_doc.ClearField('embedding')
- return new_doc
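All three index drivers share the same shape: build a list of per-doc tuples, unzip it column-wise with `zip(*info)`, validate key lengths, then hand the columns to the executor. The pattern in isolation (the `key_length` default below is illustrative):

```python
def check_key_length(keys, key_length=16):
    """Raise if any key exceeds the indexer's fixed-width key column."""
    m_val = max(len(k) for k in keys)
    if m_val > key_length:
        raise ValueError(f'keys may be at most {key_length} chars, but got {m_val}')


info = [('doc1', b'serialized-1'), ('doc2', b'serialized-2')]
if info:
    keys, values = zip(*info)  # unzip (id, payload) pairs into two columns
    check_key_length(keys)
    # a KV indexer would now be invoked as exec_fn(keys, values)
```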
diff --git a/jina/drivers/multimodal.py b/jina/drivers/multimodal.py
deleted file mode 100644
index 6733f0da02594..0000000000000
--- a/jina/drivers/multimodal.py
+++ /dev/null
@@ -1,106 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from collections import defaultdict
-from typing import Tuple, Dict, List
-
-import numpy as np
-
-from . import FlatRecursiveMixin
-from .encode import BaseEncodeDriver
-from ..types.document.multimodal import MultimodalDocument
-
-if False:
- from ..types.arrays import DocumentArray
-
-
-class MultiModalDriver(FlatRecursiveMixin, BaseEncodeDriver):
- """Extract multimodal embeddings from different modalities.
-
- Input-Output ::
-
- Input:
- document:
- |- chunk: {modality: mode1}
- |
- |- chunk: {modality: mode2}
- Output:
- document: (embedding: multimodal encoding)
- |- chunk: {modality: mode1}
- |
- |- chunk: {modality: mode2}
-
- .. note::
-
- - It traverses the ``documents`` for which we want to compute the ``multimodal`` embedding. This way
- we can use the `batching` capabilities of the `executor`.
-
- .. warning::
- - It assumes that every ``chunk`` of a ``document`` belongs to a different modality.
- """
-
- def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
-
- @property
- def positional_modality(self) -> List[str]:
- """Get position per modality.
- :return: the list of strings representing the name and order of the modality.
- """
- if not self._exec.positional_modality:
- raise RuntimeError(
- 'Could not know which position of the ndarray to load to each modality'
- )
- return self._exec.positional_modality
-
- def _get_executor_input_arguments(
- self, content_by_modality: Dict[str, 'np.ndarray']
- ) -> List['np.ndarray']:
- """From a dictionary ``content_by_modality`` it returns the arguments in the proper order so that they can be
- passed to the executor.
-
- :param content_by_modality: a dictionary of `Document content` by modality name
- :return: list of input arguments as np arrays
- """
- return [content_by_modality[modality] for modality in self.positional_modality]
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- """Apply the driver to each of the Documents in docs.
-
- :param docs: the docs for which a ``multimodal embedding`` will be computed, whose chunks are of different modalities
- :param args: unused
- :param kwargs: unused
- """
- content_by_modality = defaultdict(
- list
- ) # one list of contents per modality; rows equal to num_docs, columns equal to the number of modalities
-
- valid_docs = []
- for doc in docs:
- # convert to MultimodalDocument
- doc = MultimodalDocument(doc)
- if doc.modality_content_map:
- valid_docs.append(doc)
- for modality in self.positional_modality:
- content_by_modality[modality].append(doc[modality])
- else:
- self.logger.warning(
- f'Invalid doc {doc.id}. Only one chunk per modality is accepted'
- )
-
- if len(valid_docs) > 0:
- # Pass a variable length argument (one argument per array)
- for modality in self.positional_modality:
- content_by_modality[modality] = np.stack(content_by_modality[modality])
-
- # Guarantee that the arguments are provided to the executor in its desired order
- input_args = self._get_executor_input_arguments(content_by_modality)
- embeds = self.exec_fn(*input_args)
- if len(valid_docs) != embeds.shape[0]:
- self.logger.error(
- f'mismatched {len(valid_docs)} docs from level {valid_docs[0].granularity} '
- f'and a {embeds.shape} shape embedding, the first dimension must be the same'
- )
- for doc, embedding in zip(valid_docs, embeds):
- doc.embedding = embedding
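The core of `MultiModalDriver` is batching per modality: collect each document's content into one list per modality, stack each list into a single ndarray, and pass the arrays positionally to the encoder. A self-contained sketch with plain dicts and a dummy encoder (both illustrative stand-ins):

```python
from collections import defaultdict

import numpy as np

positional_modality = ['image', 'text']
docs = [
    {'image': np.ones(4), 'text': np.zeros(3)},
    {'image': np.full(4, 2.0), 'text': np.full(3, 3.0)},
]

content_by_modality = defaultdict(list)
for doc in docs:
    for modality in positional_modality:
        content_by_modality[modality].append(doc[modality])

# one stacked array per modality, in the encoder's expected order
input_args = [np.stack(content_by_modality[m]) for m in positional_modality]


def dummy_encoder(images, texts):
    return np.concatenate([images, texts], axis=1)  # (num_docs, 7)


embeds = dummy_encoder(*input_args)
assert embeds.shape == (2, 7)  # one embedding row per valid doc
```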
diff --git a/jina/drivers/predict.py b/jina/drivers/predict.py
deleted file mode 100644
index 6e46caffdd735..0000000000000
--- a/jina/drivers/predict.py
+++ /dev/null
@@ -1,186 +0,0 @@
-from typing import List, Any, Union, Optional
-
-import numpy as np
-
-from . import BaseExecutableDriver, FlatRecursiveMixin, DocsExtractUpdateMixin
-from ..helper import typename
-
-if False:
- from .. import DocumentArray, Document, NdArray
- from ..proto import jina_pb2
-
-
-class BasePredictDriver(
- DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver
-):
- """Drivers inherited from :class:`BasePredictDriver` will bind :meth:`predict` by default
-
- :param executor: the name of the sub-executor
- :param method: the function name of the executor that the driver feeds to, default ``predict``
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(
- self,
- executor: Optional[str] = None,
- method: str = 'predict',
- *args,
- **kwargs,
- ):
- super().__init__(executor, method, *args, **kwargs)
-
-
-class BaseLabelPredictDriver(BasePredictDriver):
- """Base class of a Driver for label prediction.
-
- :param output_tag: output label will be written to ``doc.tags``
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, output_tag: str = 'prediction', *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.output_tag = output_tag
-
- def update_docs(self, docs_pts: 'DocumentArray', exec_results: Any):
- """Update doc tags attribute with executor's return
-
- :param: docs_pts: the set of document to be updated
- :param: exec_results: the results from :meth:`exec_fn`
- """
- labels = self.prediction2label(
- exec_results
- ) # type: List[Union[str, List[str]]]
- for doc, label in zip(docs_pts, labels):
- doc.tags[self.output_tag] = label
-
- def prediction2label(self, prediction: 'np.ndarray') -> List[Any]:
- """Converting ndarray prediction into list of readable labels
-
- .. note::
- ``len(output)`` should be the same as ``prediction.shape[0]``
-
- :param prediction: the float/int numpy ndarray given by :class:`BaseClassifier`
- :return: the readable label to be stored.
-
-
-
- .. # noqa: DAR401
-
-
- .. # noqa: DAR202
- """
- raise NotImplementedError
-
-
-class BinaryPredictDriver(BaseLabelPredictDriver):
- """Converts binary prediction into string label. This is often used with binary classifier.
-
- :param one_label: label when prediction is one
- :param zero_label: label when prediction is zero
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, one_label: str = 'yes', zero_label: str = 'no', *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.one_label = one_label
- self.zero_label = zero_label
-
- def prediction2label(self, prediction: 'np.ndarray') -> List[str]:
- """
-
- :param prediction: a (B,) or (B, 1) zero one array
- :return: the labels as either ``self.one_label`` or ``self.zero_label``
-
-
- .. # noqa: DAR401
- """
- p = np.squeeze(prediction)
- if p.ndim > 1:
- raise ValueError(
- f'{typename(self)} expects prediction to have ndim=1, but received ndim={p.ndim}'
- )
-
- return [self.one_label if v else self.zero_label for v in p.astype(bool)]
-
-
-class OneHotPredictDriver(BaseLabelPredictDriver):
- """Mapping prediction to one of the given labels
-
- Expects prediction to be a 2-dim, zero-one valued array. Each row corresponds to
- a sample, each column corresponds to a label. Each row can have only one 1.
-
- This is often used with a multi-class classifier.
- """
-
- def __init__(self, labels: List[str], *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.labels = labels
-
- def validate_labels(self, prediction: 'np.ndarray'):
- """Validate the labels.
-
- :param prediction: the predictions
-
-
- .. # noqa: DAR401
- """
- if prediction.ndim != 2:
- raise ValueError(
- f'{typename(self)} expects prediction to have ndim=2, but received {prediction.ndim}'
- )
- if prediction.shape[1] != len(self.labels):
- raise ValueError(
- f'{typename(self)} expects prediction.shape[1]==len(self.labels), but received {prediction.shape}'
- )
-
- def prediction2label(self, prediction: 'np.ndarray') -> List[str]:
- """
-
- :param prediction: a (B, C) array where C is the number of classes, only one element can be one
- :return: the list of labels
- """
- self.validate_labels(prediction)
- p = np.argmax(prediction, axis=1)
- return [self.labels[v] for v in p]
-
-
-class MultiLabelPredictDriver(OneHotPredictDriver):
- """Mapping prediction to a list of labels
-
- Expects prediction to be a 2-dim, zero-one valued array. Each row corresponds to
- a sample, each column corresponds to a label. Each row can have multiple 1s.
-
- This is often used with a multi-label classifier, where each instance can have multiple labels
- """
-
- def prediction2label(self, prediction: 'np.ndarray') -> List[List[str]]:
- """Transform the prediction into labels.
-
- :param prediction: the array of predictions
- :return: nested list of labels
- """
- self.validate_labels(prediction)
- return [[self.labels[int(pp)] for pp in p.nonzero()[0]] for p in prediction]
-
-
-class Prediction2DocBlobDriver(BasePredictDriver):
- """Write the prediction result directly into ``document.blob``.
-
- .. warning::
-
- This will erase the content in ``document.text`` and ``document.buffer``.
- """
-
- def update_single_doc(
- self,
- doc: 'Document',
- exec_result: Union['np.ndarray', 'jina_pb2.NdArrayProto', 'NdArray'],
- ) -> None:
- """Update doc blob with executor's return.
-
- :param doc: the Document object
- :param exec_result: the single result from :meth:`exec_fn`
- """
- doc.blob = exec_result
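The label drivers differ only in their `prediction2label`. The two 2-dim cases reduce to `argmax` for one-hot rows and `nonzero` for multi-label rows; a sketch of both conversions:

```python
import numpy as np

labels = ['cat', 'dog', 'bird']


def onehot_to_labels(prediction):
    """One 1 per row -> a single label per sample (the OneHotPredictDriver case)."""
    return [labels[v] for v in np.argmax(prediction, axis=1)]


def multilabel_to_labels(prediction):
    """Any number of 1s per row -> a list of labels per sample (the MultiLabelPredictDriver case)."""
    return [[labels[int(i)] for i in row.nonzero()[0]] for row in prediction]


p = np.array([[1, 0, 0], [0, 0, 1]])
assert onehot_to_labels(p) == ['cat', 'bird']
assert multilabel_to_labels(np.array([[1, 1, 0]])) == [['cat', 'dog']]
```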
diff --git a/jina/drivers/querylang/filter.py b/jina/drivers/querylang/filter.py
deleted file mode 100644
index acdb47515ff9e..0000000000000
--- a/jina/drivers/querylang/filter.py
+++ /dev/null
@@ -1,53 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Dict, Any, Iterable
-
-from ...types.querylang.queryset.lookup import Q
-from .. import QuerySetReader, BaseRecursiveDriver, ContextAwareRecursiveMixin
-
-if False:
- from ...types.arrays import DocumentArray
-
-
-class FilterQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver):
- """Filters incoming `docs` by evaluating a series of `lookup rules`.
-
- This is often useful when the succeeding Pods require only a signal, not the full message.
-
- Example ::
- - !FilterQL
- with:
- lookups: {modality: mode2}
- - !EncodeDriver
- with:
- method: encode
-
- ensures that the EncodeDriver will only get documents which modality field value is `mode2` by filtering
- those documents at the specific levels that do not comply with this condition
-
- :param lookups: (dict) a dictionary whose keys are interpreted by :class:`LookupLeaf` to form
- an evaluation function. For instance, the dictionary ``{ modality__in: [mode1, mode2] }`` would create
- an evaluation function that checks if the field `modality` is found in `[mode1, mode2]`
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, lookups: Dict[str, Any], *args, **kwargs):
- super().__init__(*args, **kwargs)
- self._lookups = lookups
-
- def _apply_all(
- self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs
- ) -> None:
- for docs in doc_sequences:
- if self.lookups:
- _lookups = Q(**self.lookups)
- miss_idx = []
- for idx, doc in enumerate(docs):
- if not _lookups.evaluate(doc):
- miss_idx.append(idx)
-
- # delete non-matching docs in reverse order so remaining indices stay valid
- for j in reversed(miss_idx):
- del docs[j]
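`FilterQL` evaluates each doc against the lookup rules and deletes the misses in reverse index order, so earlier deletions cannot shift the indices still to be removed. A toy version over dicts, with a simplified `evaluate` standing in for the real `Q` queryset (both hypothetical):

```python
def evaluate(doc, lookups):
    """Support plain equality plus a `field__in: [...]` rule, as in the docstring."""
    for key, expected in lookups.items():
        if key.endswith('__in'):
            if doc.get(key[:-4]) not in expected:
                return False
        elif doc.get(key) != expected:
            return False
    return True


docs = [{'id': 1, 'modality': 'mode1'}, {'id': 2, 'modality': 'mode2'}]
miss_idx = [i for i, d in enumerate(docs) if not evaluate(d, {'modality': 'mode2'})]
for j in reversed(miss_idx):  # reverse order keeps remaining indices valid
    del docs[j]
assert [d['id'] for d in docs] == [2]
```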
diff --git a/jina/drivers/querylang/reverse.py b/jina/drivers/querylang/reverse.py
deleted file mode 100644
index 671b4f409cd05..0000000000000
--- a/jina/drivers/querylang/reverse.py
+++ /dev/null
@@ -1,31 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Iterable, Tuple
-
-from .. import QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver
-
-if False:
- from ...types.arrays import DocumentArray
-
-
-class ReverseQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver):
- """Reverses the order of the provided ``docs``.
-
- This is often useful when the succeeding Pods require only a signal, not the full message.
-
- Example ::
- - !Chunk2DocRankerDriver {}
- - !ReverseQL {}
-
- will reverse the order of the documents returned by the `Chunk2DocRankerDriver` before sending them to the next `Pod`
- """
-
- def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
-
- def _apply_all(
- self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs
- ) -> None:
- for docs in doc_sequences:
- docs.reverse()
diff --git a/jina/drivers/querylang/select.py b/jina/drivers/querylang/select.py
deleted file mode 100644
index 9e0f5446e6e18..0000000000000
--- a/jina/drivers/querylang/select.py
+++ /dev/null
@@ -1,107 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Union, Tuple
-
-from .. import QuerySetReader, FlatRecursiveMixin, BaseRecursiveDriver
-
-# noinspection PyUnreachableCode
-if False:
- from ...types.arrays import DocumentArray
-
-
-class ExcludeQL(QuerySetReader, FlatRecursiveMixin, BaseRecursiveDriver):
- """Clean some fields from the document-level protobuf to reduce the total size of the request
- Example::
- - !ExcludeQL
- with:
- fields:
- - chunks
- - buffer
-
- ExcludeQL prevents the `buffer` and `chunks` fields from being sent to the next `Pod`
-
- :param fields: the pruned field names in tuple
- :param traversal_paths: the traversal paths
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(
- self,
- fields: Union[Tuple, str],
- traversal_paths: Tuple[str] = ('r',),
- *args,
- **kwargs,
- ):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
- if isinstance(fields, str):
- self._fields = [fields]
- else:
- self._fields = [field for field in fields]
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs):
- for doc in docs:
- for k in self.fields:
- doc.ClearField(k)
-
-
-class SelectQL(ExcludeQL):
- """Selects some fields from the chunk-level protobuf to reduce the total size of the request, it works with the opposite
- logic as `:class:`ExcludeQL`
-
- Example::
- - !SelectQL
- with:
- fields:
- - matches
-
- SelectQL will ensure that the `outgoing` documents only contain the field `matches`
- """
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs):
- for doc in docs:
- for k in doc.DESCRIPTOR.fields_by_name.keys():
- if k not in self.fields:
- doc.ClearField(k)
-
-
-class ExcludeReqQL(ExcludeQL):
- """Clean up request from the request-level protobuf message to reduce the total size of the message
-
- This is often useful when the proceeding Pods require only a signal, not the full message.
- """
-
- def __call__(self, *args, **kwargs):
- """
-
-
- .. # noqa: DAR102
-
-
- .. # noqa: DAR101
- """
- for k in self.fields:
- self.req.ClearField(k)
-
-
-class SelectReqQL(ExcludeReqQL):
- """Clean up request from the request-level protobuf message to reduce the total size of the message, it works with the opposite
- logic as `:class:`ExcludeReqQL`
-
-
- .. # noqa: DAR101
- """
-
- def __call__(self, *args, **kwargs):
- """
-
-
- .. # noqa: DAR102
-
-
- .. # noqa: DAR101
- """
- for k in self.req.DESCRIPTOR.fields_by_name.keys():
- if k not in self.fields:
- self.req.ClearField(k)
diff --git a/jina/drivers/querylang/slice.py b/jina/drivers/querylang/slice.py
deleted file mode 100644
index 180df1a23643e..0000000000000
--- a/jina/drivers/querylang/slice.py
+++ /dev/null
@@ -1,58 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import sys
-
-from typing import Iterable, Optional
-
-from .. import QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver
-
-if False:
- from ...types.arrays.document import DocumentArray
-
-
-class SliceQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver):
- """Restrict the size of the ``docs`` to ``k`` (given by the request)
-
- Example::
- - !ReduceAllDriver
- with:
- traversal_paths: ['m']
- - !SortQL
- with:
- reverse: true
- field: 'score__value'
- traversal_paths: ['m']
- - !SliceQL
- with:
- start: 0
- end: 50
- traversal_paths: ['m']
-
- `SliceQL` will ensure that only the first 50 documents are returned from this `Pod`
-
- :param start: Zero-based index at which to start extraction.
- :param end: Zero-based index before which to end extraction.
- The slice extracts up to but not including ``end``. For example, ``SliceQL(start=1, end=4)`` extracts
- the second element through the fourth element (elements indexed 1, 2, and 3).
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, start: int, end: Optional[int] = None, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self._start = int(start)
- if end is None:
- self._end = sys.maxsize
- else:
- self._end = int(end)
-
- def _apply_all(
- self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs
- ) -> None:
- for docs in doc_sequences:
- if self.start <= 0 and (self.end is None or self.end >= len(docs)):
- pass
- else:
- del docs[int(self.end) :]
- del docs[: int(self.start)]
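Note the deletion order in `_apply_all`: trimming the tail first leaves the head indices intact, so `start` still points at the right elements. The same two `del` statements on a plain list:

```python
docs = list(range(10))
start, end = 2, 5
del docs[end:]    # drop the tail first...
del docs[:start]  # ...then the head, so `start` is still valid
assert docs == [2, 3, 4]  # elements indexed 2, 3 and 4: up to but not including `end`
```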
diff --git a/jina/drivers/querylang/sort.py b/jina/drivers/querylang/sort.py
deleted file mode 100644
index 17a30294feca3..0000000000000
--- a/jina/drivers/querylang/sort.py
+++ /dev/null
@@ -1,56 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Iterable, Tuple
-
-from ...types.querylang.queryset.dunderkey import dunder_get
-from .. import QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver
-
-if False:
- from ...types.arrays import DocumentArray
-
-
-class SortQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver):
- """Sorts the incoming of the documents by the value of a given field.
- It can also work in reverse mode
-
- Example::
- - !ReduceAllDriver
- with:
- traversal_paths: ['m']
- - !SortQL
- with:
- reverse: true
- field: 'score__value'
- traversal_paths: ['m']
- - !SliceQL
- with:
- start: 0
- end: 50
- traversal_paths: ['m']
-
- `SortQL` ensures that the documents are sorted by the score value before `SliceQL` keeps the top 50 documents
-
- :param field: the value of this field drives the sort of the iterable docs
- :param reverse: when true, sort the values from big to small
- :param traversal_paths: the traversal paths
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(
- self,
- field: str,
- reverse: bool = False,
- traversal_paths: Tuple[str] = ('r',),
- *args,
- **kwargs,
- ):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
- self._reverse = reverse
- self._field = field
-
- def _apply_all(
- self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs
- ) -> None:
- for docs in doc_sequences:
- docs.sort(key=lambda x: dunder_get(x, self.field), reverse=self.reverse)
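Chained with `SliceQL` as in the YAML example above, this yields a top-k: sort by the dunder field, then cut. The same two steps on plain dicts (illustrative stand-ins for Documents):

```python
matches = [{'score': {'value': v}} for v in (0.2, 0.9, 0.5)]
matches.sort(key=lambda m: m['score']['value'], reverse=True)  # SortQL, reverse mode
del matches[2:]  # SliceQL: keep the top 2
assert [m['score']['value'] for m in matches] == [0.9, 0.5]
```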
diff --git a/jina/drivers/rank/__init__.py b/jina/drivers/rank/__init__.py
deleted file mode 100644
index 27b52f95a06da..0000000000000
--- a/jina/drivers/rank/__init__.py
+++ /dev/null
@@ -1,121 +0,0 @@
-from typing import Tuple, Optional, Iterable
-
-from .. import BaseExecutableDriver, FlatRecursiveMixin
-from ...types.arrays import MatchArray
-from ...types.score import NamedScore
-
-if False:
- from ...types.arrays import DocumentArray
-
-
-class BaseRankDriver(FlatRecursiveMixin, BaseExecutableDriver):
- """Drivers inherited from this Driver will bind :meth:`rank` by default """
-
- def __init__(
- self, executor: Optional[str] = None, method: str = 'score', *args, **kwargs
- ):
- super().__init__(executor, method, *args, **kwargs)
-
- @property
- def _exec_match_keys(self):
- """Property to provide backward compatibility to executors relying in `required_keys`
- :return: keys for attribute lookup in matches
- """
- return getattr(
- self.exec, 'match_required_keys', getattr(self.exec, 'required_keys', None)
- )
-
- @property
- def _exec_query_keys(self):
- """Property to provide backward compatibility to executors relying in `required_keys`
-
- :return: keys for attribute lookup in matches
- """
- return getattr(
- self.exec, 'query_required_keys', getattr(self.exec, 'required_keys', None)
- )
-
-
-class Matches2DocRankDriver(BaseRankDriver):
- """This driver is intended to only resort the given matches on the 0 level granularity for a document.
- It gets the scores from a Ranking Executor, which does only change the scores of matches.
- Afterwards, the Matches2DocRankDriver resorts all matches for a document.
- Input-Output ::
- Input:
- document: {granularity: 0, adjacency: k}
- |- matches: {granularity: 0, adjacency: k+1}
- Output:
- document: {granularity: 0, adjacency: k}
- |- matches: {granularity: 0, adjacency: k+1} (Sorted according to scores from Ranker Executor)
- """
-
- def __init__(
- self,
- reverse: bool = True,
- traversal_paths: Tuple[str] = ('r',),
- *args,
- **kwargs,
- ):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
- self.reverse = reverse
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- """
-
- :param docs: the matches of the ``context_doc``, they are at granularity ``k``
- :param args: not used (kept to maintain interface)
- :param kwargs: not used (kept to maintain interface)
-
- .. note::
- - This driver will change in place the ordering of the ``matches`` of the ``context_doc``.
- - Set the ``traversal_paths`` of this driver such that it traverses along the ``matches`` of the ``chunks`` at the level desired.
- """
- old_scores = []
- queries_metas = []
- matches_metas = []
- for doc in docs:
- query_meta = (
- doc.get_attrs(*self._exec_query_keys) if self._exec_query_keys else None
- )
-
- matches = doc.matches
- old_match_scores = []
- needs_match_meta = self._exec_match_keys is not None
- match_meta = [] if needs_match_meta else None
- for match in matches:
- old_match_scores.append(match.score.value)
- if needs_match_meta:
- match_meta.append(match.get_attrs(*self._exec_match_keys))
-
- # if there are no matches, no need to sort them
- old_scores.append(old_match_scores)
- queries_metas.append(query_meta)
- matches_metas.append(match_meta)
-
- new_scores = self.exec_fn(old_scores, queries_metas, matches_metas)
- if len(new_scores) != len(docs):
- msg = f'The number of scores {len(new_scores)} does not match the number of queries {len(docs)}'
- self.logger.error(msg)
- raise ValueError(msg)
-
- for doc, scores in zip(docs, new_scores):
- matches = doc.matches
- if len(doc.matches) != len(scores):
- msg = (
- f'The number of matches to be scored {len(doc.matches)} does not match the number of scores returned '
- f'by the ranker {self.exec.__class__.__name__} for doc: {doc.id} '
- )
- self.logger.error(msg)
- raise ValueError(msg)
- self._sort_matches_in_place(matches, scores)
-
- def _sort_matches_in_place(
- self, matches: 'MatchArray', match_scores: Iterable[float]
- ) -> None:
- op_name = self.exec.__class__.__name__
- ref_doc_id = matches._ref_doc.id
-
- for match, score in zip(matches, match_scores):
- match.score = NamedScore(value=score, op_name=op_name, ref_id=ref_doc_id)
-
- matches.sort(key=lambda x: x.score.value, reverse=self.reverse)
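So the driver's contract is: the executor returns one list of scores per query, each list as long as that query's matches; the driver then writes the scores back and re-sorts in place. Stripped of the metadata plumbing, that is:

```python
matches = [{'id': 'a', 'score': 0.0}, {'id': 'b', 'score': 0.0}]
new_scores = [0.3, 0.8]  # one score per match, as returned by the ranker

for match, score in zip(matches, new_scores):
    match['score'] = score
matches.sort(key=lambda m: m['score'], reverse=True)  # best match first

assert [m['id'] for m in matches] == ['b', 'a']
```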
diff --git a/jina/drivers/rank/aggregate/__init__.py b/jina/drivers/rank/aggregate/__init__.py
deleted file mode 100644
index b4e406fa6d958..0000000000000
--- a/jina/drivers/rank/aggregate/__init__.py
+++ /dev/null
@@ -1,307 +0,0 @@
-from typing import Dict, List, Tuple
-from collections import defaultdict, namedtuple
-
-import numpy as np
-
-from ....executors.rankers import Chunk2DocRanker
-from ....types.document import Document
-from ....types.score import NamedScore
-
-from .. import BaseRankDriver
-
-if False:
- from ....types.arrays import DocumentArray
-
-COL_STR_TYPE = 'U64' #: the ID column data type for score matrix
-
-
-class BaseAggregateMatchesRankerDriver(BaseRankDriver):
- """Drivers inherited from this Driver focus on aggregating scores from `chunks` to its `parents`.
-
- :param keep_source_matches_as_chunks: A flag to indicate if the driver must return the old matches of the query or its chunks
- (at a greater granularity level (k + 1)) as the chunks of the newly computed `matches` (at granularity level k).
- Set it to `True` to keep track of the chunks that led to a retrieved result.
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
-
- .. note::
- When set `keep_source_matches_as_chunks=True`, the chunks of the match contains **ONLY** the chunks leading
- to the match rather than **ALL** the chunks of the match."""
-
- def __init__(self, keep_source_matches_as_chunks: bool = False, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.keep_source_matches_as_chunks = keep_source_matches_as_chunks
-
- QueryMatchInfo = namedtuple(
- 'QueryMatchInfo', 'match_parent_id match_id query_id score'
- )
-
- def _extract_query_match_info(self, match: Document, query: Document):
- return self.QueryMatchInfo(
- match_parent_id=match.parent_id,
- match_id=match.id,
- query_id=query.id,
- score=match.score.value,
- )
-
- def _insert_query_matches(
- self,
- query: Document,
- parent_id_chunk_id_map: dict,
- chunk_matches_by_id: dict,
- docs_scores: 'np.ndarray',
- ):
- """
- :param query: the query Document where the resulting matches will be inserted
- :param parent_id_chunk_id_map: a map with parent_id as key and list of previous matches ids as values
- :param chunk_matches_by_id: the previous matches of the query (at a higher granularity) grouped by the new map (by its parent)
- :param docs_scores: An `np.ndarray` resulting from the ranker executor with the `scores` of the new matches
- """
-
- op_name = self.exec.__class__.__name__
- for doc_id, score in docs_scores:
- m = Document(id=doc_id)
- m.score = NamedScore(op_name=op_name, value=score)
- if self.keep_source_matches_as_chunks:
- for match_chunk_id in parent_id_chunk_id_map[doc_id]:
- m.chunks.append(chunk_matches_by_id[match_chunk_id])
- query.matches.append(m)
-
- @staticmethod
- def _group_by(match_idx, col_name):
- """
- Create a list of numpy arrays, grouped so that all rows in one array share the same value of ``col_name``
-
- :param match_idx: Numpy array of Tuples with document id and score
- :param col_name: Column name in the structured numpy array of Tuples
-
- :return: list of numpy arrays, each holding the rows for one value of ``col_name``
- :rtype: List[np.ndarray]
- """
- _sorted_m = np.sort(match_idx, order=col_name)
- list_numpy_arrays = []
- prev_val = _sorted_m[col_name][0]
- prev_index = 0
- for i, current_val in enumerate(_sorted_m[col_name]):
- if current_val != prev_val:
- list_numpy_arrays.append(_sorted_m[prev_index:i])
- prev_index = i
- prev_val = current_val
- list_numpy_arrays.append(_sorted_m[prev_index:])
- return list_numpy_arrays
-
- @staticmethod
- def _sort_doc_by_score(r):
- """
- Sort a numpy array of dtype (``doc_id``, ``score``) by the ``score``.
-
- :param r: Numpy array of Tuples with document id and score
- :type r: np.ndarray[Tuple[np.str_, np.float64]]
- """
- r[::-1].sort(order=Chunk2DocRanker.COL_SCORE)
-
- def _score(
- self, match_idx: 'np.ndarray', query_chunk_meta: Dict, match_chunk_meta: Dict
- ) -> 'np.ndarray':
- """
- Translate the chunk-level top-k results into doc-level top-k results. Some score functions may leverage the
- meta information of the query, hence the meta info of the query chunks and matched chunks are given
- as arguments.
-
- :param match_idx: A [N x 4] numpy ``ndarray``, column-wise:
- - ``match_idx[:, 0]``: ``doc_id`` of the matched chunks, integer
- - ``match_idx[:, 1]``: ``chunk_id`` of the matched chunks, integer
- - ``match_idx[:, 2]``: ``chunk_id`` of the query chunks, integer
- - ``match_idx[:, 3]``: distance/metric/score between the query and matched chunks, float
- :type match_idx: np.ndarray.
- :param query_chunk_meta: The meta information of the query chunks, where the key is query chunks' ``chunk_id``,
- the value is extracted by the ``query_required_keys``.
- :param match_chunk_meta: The meta information of the matched chunks, where the key is matched chunks'
- ``chunk_id``, the value is extracted by the ``match_required_keys``.
- :return: A [N x 2] numpy ``ndarray``, where the first column is the matched documents' ``doc_id`` (integer)
- the second column is the score/distance/metric between the matched doc and the query doc (float).
- :rtype: np.ndarray.
- """
- _groups = self._group_by(match_idx, Chunk2DocRanker.COL_PARENT_ID)
- n_groups = len(_groups)
- res = np.empty(
- (n_groups,),
- dtype=[
- (Chunk2DocRanker.COL_PARENT_ID, COL_STR_TYPE),
- (Chunk2DocRanker.COL_SCORE, np.float64),
- ],
- )
-
- for i, _g in enumerate(_groups):
- res[i] = (
- _g[Chunk2DocRanker.COL_PARENT_ID][0],
- self.exec_fn(_g, query_chunk_meta, match_chunk_meta),
- )
-
- self._sort_doc_by_score(res)
- return res
-
-
-class Chunk2DocRankDriver(BaseAggregateMatchesRankerDriver):
- """Extract matches score from chunks and use the executor to compute the rank and assign the resulting matches to the
- level above.
-
- Input-Output ::
- Input:
- document: {granularity: k-1}
- |- chunks: {granularity: k}
- | |- matches: {granularity: k}
- |
- |- chunks: {granularity: k}
- |- matches: {granularity: k}
- Output:
- document: {granularity: k-1}
- |- chunks: {granularity: k}
- | |- matches: {granularity: k}
- |
- |- chunks: {granularity: k}
- | |- matches: {granularity: k}
- |
- |-matches: {granularity: k-1} (Ranked according to Ranker Executor)
- """
-
- def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- """
- :param docs: the docs whose matches get bubbled up
- :param args: not used (kept to maintain interface)
- :param kwargs: not used (kept to maintain interface)
- """
- for doc in docs:
- chunks = doc.chunks
- match_idx = [] # type: List[Tuple[str, str, str, float]]
- query_meta = {} # type: Dict[str, Dict]
- match_meta = {} # type: Dict[str, Dict]
- parent_id_chunk_id_map = defaultdict(list)
- matches_by_id = defaultdict(Document)
- for chunk in chunks:
- query_meta[chunk.id] = (
- chunk.get_attrs(*self._exec_query_keys)
- if self._exec_query_keys
- else None
- )
- for match in chunk.matches:
- match_info = self._extract_query_match_info(
- match=match, query=chunk
- )
- match_idx.append(match_info)
- match_meta[match.id] = (
- match.get_attrs(*self._exec_match_keys)
- if self._exec_match_keys
- else None
- )
- parent_id_chunk_id_map[match.parent_id].append(match.id)
- matches_by_id[match.id] = match
-
- if match_idx:
- match_idx = np.array(
- match_idx,
- dtype=[
- (Chunk2DocRanker.COL_PARENT_ID, COL_STR_TYPE),
- (Chunk2DocRanker.COL_DOC_CHUNK_ID, COL_STR_TYPE),
- (Chunk2DocRanker.COL_QUERY_CHUNK_ID, COL_STR_TYPE),
- (Chunk2DocRanker.COL_SCORE, np.float64),
- ],
- )
-
- docs_scores = self._score(match_idx, query_meta, match_meta)
-
- self._insert_query_matches(
- query=doc,
- parent_id_chunk_id_map=parent_id_chunk_id_map,
- chunk_matches_by_id=matches_by_id,
- docs_scores=docs_scores,
- )
-
-
-class AggregateMatches2DocRankDriver(BaseAggregateMatchesRankerDriver):
- """This Driver is intended to take a `document` with matches at a `given granularity > 0`, clear those matches and substitute
- them with the documents at a lower granularity level.
- Input-Output ::
- Input:
- document: {granularity: k}
- |- matches: {granularity: k}
-
- Output:
- document: {granularity: k}
- |- matches: {granularity: k-1} (Sorted according to Ranker Executor)
-
- Imagine a case where we are querying a system with text documents chunked by sentences. When we query the system,
- we use sentences (chunks) to query it. So at some point we will have:
- `query sentence (documents of granularity 1):
- matches: indexed sentences (documents of granularity 1)`
- But in the output we want to have the full documents that best match the `sentence`:
- `query sentence (documents of granularity 1):
- matches: indexed full documents (documents of granularity 0)`
- Using this Driver before querying a Binary Index with full binary document data can be very useful to implement a search system.
- """
-
- def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- """
-
- :param docs: the document at granularity ``k``
- :param args: not used (kept to maintain interface)
- :param kwargs: not used (kept to maintain interface)
-
- .. note::
- - This driver will substitute the ``matches`` of `docs` with the corresponding ``parent documents`` of its current ``matches``, according
- to the executor.
- - Set the ``traversal_paths`` of this driver to identify the documents whose matches need to be bubbled up.
- """
-
- for doc in docs:
- matches = doc.matches
-
- match_idx = []
- query_meta = {}
- match_meta = {}
- parent_id_chunk_id_map = defaultdict(list)
- matches_by_id = defaultdict(Document)
-
- query_meta[doc.id] = (
- doc.get_attrs(*self._exec_query_keys) if self._exec_query_keys else None
- )
-
- for match in matches:
- match_info = self._extract_query_match_info(match=match, query=doc)
- match_idx.append(match_info)
- match_meta[match.id] = (
- match.get_attrs(*self._exec_match_keys)
- if self._exec_match_keys
- else None
- )
- parent_id_chunk_id_map[match.parent_id].append(match.id)
- matches_by_id[match.id] = match
-
- if match_idx:
- match_idx = np.array(
- match_idx,
- dtype=[
- (Chunk2DocRanker.COL_PARENT_ID, COL_STR_TYPE),
- (Chunk2DocRanker.COL_DOC_CHUNK_ID, COL_STR_TYPE),
- (Chunk2DocRanker.COL_QUERY_CHUNK_ID, COL_STR_TYPE),
- (Chunk2DocRanker.COL_SCORE, np.float64),
- ],
- )
-
- docs_scores = self._score(match_idx, query_meta, match_meta)
- # This ranker will change the current matches
- doc.ClearField('matches')
- self._insert_query_matches(
- query=doc,
- parent_id_chunk_id_map=parent_id_chunk_id_map,
- chunk_matches_by_id=matches_by_id,
- docs_scores=docs_scores,
- )
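The heavy lifting in both aggregate rankers is `_group_by`: sort the structured array by the grouping column, then split wherever that column's value changes. A compact numpy sketch of group-then-aggregate, using max-score aggregation as a stand-in for the real ranker function:

```python
import numpy as np

match_idx = np.array(
    [('doc1', 0.8), ('doc2', 0.4), ('doc1', 0.6)],
    dtype=[('parent_id', 'U64'), ('score', np.float64)],
)


def group_by(m, col):
    """Sort by `col`, then split the array at every change of value in `col`."""
    _sorted = np.sort(m, order=col)
    cuts = np.flatnonzero(_sorted[col][1:] != _sorted[col][:-1]) + 1
    return np.split(_sorted, cuts)


groups = group_by(match_idx, 'parent_id')
# aggregate each chunk-level group into one doc-level score
res = [(g['parent_id'][0], g['score'].max()) for g in groups]
assert res == [('doc1', 0.8), ('doc2', 0.4)]
```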
diff --git a/jina/drivers/reduce.py b/jina/drivers/reduce.py
deleted file mode 100644
index 47a5315edd401..0000000000000
--- a/jina/drivers/reduce.py
+++ /dev/null
@@ -1,89 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Tuple, Iterable
-
-from collections import defaultdict
-
-import numpy as np
-
-from . import ContextAwareRecursiveMixin, BaseRecursiveDriver, FlatRecursiveMixin
-from ..types.arrays import ChunkArray, MatchArray, DocumentArray
-
-
-class ReduceAllDriver(ContextAwareRecursiveMixin, BaseRecursiveDriver):
- """:class:`ReduceAllDriver` merges chunks/matches from all requests, recursively.
-
- .. note::
-
- It uses the last request as a reference.
- """
-
- def __init__(self, traversal_paths: Tuple[str] = ('c',), *args, **kwargs):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
-
- def _apply_root(self, docs):
- request = self.msg.request
- request.body.ClearField('docs')
- request.docs.extend(docs)
-
- def _apply_all(
- self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs
- ) -> None:
- doc_pointers = {}
- for docs in doc_sequences:
- if isinstance(docs, (ChunkArray, MatchArray)):
- context_id = docs.reference_doc.id
- if context_id not in doc_pointers:
- doc_pointers[context_id] = docs.reference_doc
- else:
- if isinstance(docs, ChunkArray):
- doc_pointers[context_id].chunks.extend(docs)
- else:
- doc_pointers[context_id].matches.extend(docs)
- else:
- self._apply_root(docs)
-
-
-class CollectEvaluationDriver(FlatRecursiveMixin, BaseRecursiveDriver):
- """Merge all evaluations into one, grouped by ``doc.id`` """
-
- def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- doc_pointers = {}
- for doc in docs:
- if doc.id not in doc_pointers:
- doc_pointers[doc.id] = doc.evaluations
- else:
- doc_pointers[doc.id].extend(doc.evaluations)
-
-
-class ConcatEmbedDriver(BaseRecursiveDriver):
- """Concat all embeddings into one, grouped by ``doc.id`` """
-
- def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
-
- def __call__(self, *args, **kwargs):
- """Performs the concatenation of all embeddings in `self.docs`.
-
- :param args: args not used. Only for complying with parent class interface.
- :param kwargs: kwargs not used. Only for complying with parent class interface.
- """
- all_documents = self.docs.traverse_flatten(self._traversal_paths)
- doc_pointers = self._collect_embeddings(all_documents)
-
- last_request_documents = self.req.docs.traverse_flatten(self._traversal_paths)
- self._concat_apply(last_request_documents, doc_pointers)
-
- def _collect_embeddings(self, docs: 'DocumentArray'):
- doc_pointers = defaultdict(list)
- for doc in docs:
- doc_pointers[doc.id].append(doc.embedding)
- return doc_pointers
-
- def _concat_apply(self, docs, doc_pointers):
- for doc in docs:
- doc.embedding = np.concatenate(doc_pointers[doc.id], axis=0)
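`ConcatEmbedDriver` exists for the sharded case: several partial results arrive for the same `doc.id`, and their embeddings must be stitched back into one vector. The gather-and-concat core:

```python
from collections import defaultdict

import numpy as np

# two shards each returned a partial embedding for the same doc id
shard_results = [('doc1', np.array([1.0, 2.0])), ('doc1', np.array([3.0, 4.0]))]

doc_pointers = defaultdict(list)
for doc_id, emb in shard_results:
    doc_pointers[doc_id].append(emb)

merged = {i: np.concatenate(e, axis=0) for i, e in doc_pointers.items()}
assert merged['doc1'].tolist() == [1.0, 2.0, 3.0, 4.0]
```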
diff --git a/jina/drivers/search.py b/jina/drivers/search.py
deleted file mode 100644
index ac10a39245837..0000000000000
--- a/jina/drivers/search.py
+++ /dev/null
@@ -1,185 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Iterable, Tuple, Optional
-
-from . import (
- BaseExecutableDriver,
- QuerySetReader,
- FlatRecursiveMixin,
- ContextAwareRecursiveMixin,
-)
-from ..enums import EmbeddingClsType
-from ..types.document import Document
-from ..types.score import NamedScore
-
-if False:
- from ..types.arrays import DocumentArray
-
-
-class BaseSearchDriver(BaseExecutableDriver):
- """Drivers inherited from this Driver will bind :meth:`query` by default """
-
- def __init__(
- self,
- executor: Optional[str] = None,
- method: str = 'query',
- traversal_paths: Tuple[str] = ('r', 'c'),
- *args,
- **kwargs,
- ):
- super().__init__(
- executor, method, traversal_paths=traversal_paths, *args, **kwargs
- )
-
-
-class KVSearchDriver(ContextAwareRecursiveMixin, BaseSearchDriver):
- """Fill in the results using the :class:`jina.executors.indexers.meta.BinaryPbIndexer`
-
- .. warning::
- This driver runs a query for each document.
- This may not be very efficient, as the total number of queries grows cubically with the number of documents, chunks
- per document and top-k.
-
- - traversal_paths = ['m'] => D x K
- - traversal_paths = ['r'] => D
- - traversal_paths = ['cm'] => D x C x K
- - traversal_paths = ['m', 'cm'] => D x K + D x C x K
-
- where:
- - D is the number of queries
- - C is the number of chunks per document
- - K is the top-k
-
- :param is_update: when set to true, the retrieved docs are merged into the current message;
- otherwise, the retrieved Document overrides the existing Document
- :param traversal_paths: traversal paths for the driver
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(
- self,
- is_update: bool = True,
- traversal_paths: Tuple[str] = ('m',),
- *args,
- **kwargs,
- ):
- super().__init__(traversal_paths=traversal_paths, *args, **kwargs)
- self._is_update = is_update
-
- def _apply_all(
- self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs
- ) -> None:
-
- for docs in doc_sequences:
- miss_idx = (
- []
- ) #: missed hit results; some searches may not return results, especially with shards
- serialized_docs = self.exec_fn([d.id for d in docs])
-
- for idx, (retrieved_doc, serialized_doc) in enumerate(
- zip(docs, serialized_docs)
- ):
- if serialized_doc:
- r = Document(serialized_doc)
- if self._is_update:
- retrieved_doc.update(r)
- else:
- retrieved_doc.CopyFrom(r)
- else:
- miss_idx.append(idx)
-
- # delete non-existed matches in reverse
- for j in reversed(miss_idx):
- del docs[j]
-
-
-class VectorFillDriver(FlatRecursiveMixin, QuerySetReader, BaseSearchDriver):
- """Fill in the embedding by their document id."""
-
- def __init__(
- self,
- executor: Optional[str] = None,
- method: str = 'query_by_key',
- *args,
- **kwargs,
- ):
- super().__init__(executor, method, *args, **kwargs)
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- embeds = self.exec_fn([d.id for d in docs])
- for doc, embedding in zip(docs, embeds):
- doc.embedding = embedding
-
-
-class VectorSearchDriver(FlatRecursiveMixin, QuerySetReader, BaseSearchDriver):
- """Extract dense embeddings from the request for the executor to query.
-
- :param top_k: top-k document ids to retrieve
- :param fill_embedding: fill in the embedding of the corresponding doc,
- this requires the executor to implement :meth:`query_by_key`
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization"""
-
- def __init__(self, top_k: int = 50, fill_embedding: bool = False, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self._top_k = top_k
- self._fill_embedding = fill_embedding
-
- @property
- def exec_embedding_cls_type(self) -> EmbeddingClsType:
- """Get the sparse class type of the attached executor.
-
- :return: Embedding class type of the attached executor, default value is `dense`
- """
- return EmbeddingClsType.from_string(self.exec.embedding_cls_type)
-
- def _get_documents_embeddings(self, docs: 'DocumentArray'):
- embedding_cls_type = self.exec_embedding_cls_type
- if embedding_cls_type.is_dense:
- return docs.all_embeddings
- else:
- return docs.get_all_sparse_embeddings(embedding_cls_type=embedding_cls_type)
-
- def _fill_matches(self, doc, op_name, topks, scores, topk_embed):
- embedding_cls_type = self.exec_embedding_cls_type
- if embedding_cls_type.is_dense:
- for numpy_match_id, score, vector in zip(topks, scores, topk_embed):
- m = Document(id=numpy_match_id)
- m.score = NamedScore(op_name=op_name, value=score)
- r = doc.matches.append(m)
- if vector is not None:
- r.embedding = vector
- else:
- for idx, (numpy_match_id, score) in enumerate(zip(topks, scores)):
- vector = None
- if topk_embed[idx] is not None:
- vector = topk_embed.getrow(idx)
- m = Document(id=numpy_match_id)
- m.score = NamedScore(op_name=op_name, value=score)
- match = doc.matches.append(m)
- if vector is not None:
- match.embedding = vector
-
- def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None:
- embed_vecs, doc_pts = self._get_documents_embeddings(docs)
-
- if not doc_pts:
- return
-
- fill_fn = getattr(self.exec, 'query_by_key', None)
- if self._fill_embedding and not fill_fn:
- self.logger.warning(
- f'"fill_embedding=True" but {self.exec} does not have "query_by_key" method'
- )
-
- idx, dist = self.exec_fn(embed_vecs, top_k=int(self.top_k))
- op_name = self.exec.__class__.__name__
- for doc, topks, scores in zip(doc_pts, idx, dist):
- topk_embed = (
- fill_fn(topks)
- if (self._fill_embedding and fill_fn)
- else [None] * len(topks)
- )
- self._fill_matches(doc, op_name, topks, scores, topk_embed)
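The `_apply_all` above relies on a shape contract: `exec_fn` returns ids and distances of shape `(n_queries, top_k)`, which are then zipped row by row against the query Documents. A plain-numpy sketch with a hypothetical `fake_query`:

```python
import numpy as np

def fake_query(embeddings: np.ndarray, top_k: int):
    # hypothetical indexer query: ids and distances, one row per query
    n = embeddings.shape[0]
    idx = np.tile(np.arange(top_k), (n, 1))
    dist = np.random.rand(n, top_k)
    return idx, dist

embed_vecs = np.random.rand(4, 8)  # 4 query docs, 8-dim embeddings
idx, dist = fake_query(embed_vecs, top_k=50)
assert idx.shape == dist.shape == (4, 50)
# the driver then zips row by row: one (topks, scores) pair per query Document
```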
diff --git a/jina/drivers/segment.py b/jina/drivers/segment.py
deleted file mode 100644
index 48fb5a0d9158b..0000000000000
--- a/jina/drivers/segment.py
+++ /dev/null
@@ -1,41 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Optional, Tuple, Dict, List
-
-from . import BaseExecutableDriver, FlatRecursiveMixin, DocsExtractUpdateMixin
-from ..types.document import Document
-
-
-class SegmentDriver(DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver):
- """Drivers inherited from this Driver will bind :meth:`segment` by default """
-
- def __init__(
- self,
- executor: Optional[str] = None,
- method: str = 'segment',
- traversal_paths: Tuple[str] = ('r',),
- *args,
- **kwargs,
- ):
- super().__init__(
- executor, method, traversal_paths=traversal_paths, *args, **kwargs
- )
-
- @property
- def _stack_document_content(self):
- return False
-
- def update_single_doc(self, doc: 'Document', exec_result: List[Dict]) -> None:
- """Update the document's chunks field with executor's returns.
-
- :param doc: the Document object
- :param exec_result: the single result from :meth:`exec_fn`
- """
- new_chunks = []
- for chunk in exec_result:
- with Document(**chunk) as c:
- if not c.mime_type:
- c.mime_type = doc.mime_type
- new_chunks.append(c)
- doc.chunks.extend(new_chunks)
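For reference, `update_single_doc` above expects one dict per chunk from the bound `segment` method. A sketch of that contract with illustrative values (plain construction is used here instead of the 1.x `with Document(...)` context manager):

```python
from jina import Document

exec_result = [  # hypothetical return of a `segment` method
    {'text': 'first sentence.'},
    {'text': 'second sentence.'},
]

doc = Document(text='first sentence. second sentence.', mime_type='text/plain')
for chunk in exec_result:
    c = Document(**chunk)
    if not c.mime_type:
        c.mime_type = doc.mime_type  # chunks inherit the parent's mime_type
    doc.chunks.append(c)
```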
diff --git a/jina/drivers/train/rank/__init__.py b/jina/drivers/train/rank/__init__.py
deleted file mode 100644
index 337f3e7cb20b7..0000000000000
--- a/jina/drivers/train/rank/__init__.py
+++ /dev/null
@@ -1,40 +0,0 @@
-from ...rank import Matches2DocRankDriver
-from ....types.sets import DocumentSet
-
-
-class RankerTrainerDriver(Matches2DocRankDriver):
- """Ranker trainer driver."""
-
- def __init__(self, method: str = 'train', *args, **kwargs):
- super().__init__(method=method, *args, **kwargs)
-
- def _apply_all(self, docs: 'DocumentSet', *args, **kwargs) -> None:
- """
-
- :param docs: the matches of the ``context_doc``; they are at granularity ``k``
- :param args: not used (kept to maintain interface)
- :param kwargs: not used (kept to maintain interface)
-
- .. note::
- - This driver will change, in place, the ordering of ``matches`` of the ``context_doc``.
- - Set the ``traversal_paths`` of this driver such that it traverses along the ``matches`` of the ``chunks`` at the level desired.
- """
- queries_metas = []
- matches_metas = []
- for doc in docs:
- query_meta = (
- doc.get_attrs(*self._exec_query_keys) if self._exec_query_keys else None
- )
-
- matches = doc.matches
- needs_match_meta = self._exec_match_keys is not None
- match_meta = [] if needs_match_meta else None
- for match in matches:
- if needs_match_meta:
- match_meta.append(match.get_attrs(*self._exec_match_keys))
-
- # if there are no matches, no need to sort them
- queries_metas.append(query_meta)
- matches_metas.append(match_meta)
-
- self.exec_fn(queries_metas, matches_metas)
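The metas collected above reach `exec_fn` as two parallel lists: one dict per query, and one list of dicts per query's matches. An illustrative sketch (the keys are hypothetical):

```python
# what exec_fn(queries_metas, matches_metas) receives for two query docs
queries_metas = [
    {'tags__brand': 'a'},   # doc 1, keys taken from _exec_query_keys
    {'tags__brand': 'b'},   # doc 2
]
matches_metas = [
    [{'score__value': 0.9}, {'score__value': 0.4}],  # matches of doc 1
    [{'score__value': 0.7}],                         # matches of doc 2
]
```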
diff --git a/jina/enums.py b/jina/enums.py
index b449af810d95f..23029a6bbbeec 100644
--- a/jina/enums.py
+++ b/jina/enums.py
@@ -16,9 +16,6 @@
parallel_type: any
"""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
from enum import IntEnum, EnumMeta
@@ -211,15 +208,6 @@ def paired(self) -> 'SocketType':
}[self]
-class FlowOutputType(BetterEnum):
- """The enum for representing flow output config."""
-
- SHELL_PROC = 0 #: a shell-script, run each microservice as a process
- SHELL_DOCKER = 1 #: a shell-script, run each microservice as a container
- DOCKER_SWARM = 2 #: a docker-swarm YAML config
- K8S = 3 #: a Kubernetes YAML config
-
-
class FlowBuildLevel(BetterEnum):
"""
The enum for representing a flow's build level.
@@ -263,12 +251,8 @@ def is_inspect(self) -> bool:
class RequestType(BetterEnum):
"""The enum of Client mode."""
- INDEX = 0
- SEARCH = 1
- DELETE = 2
- UPDATE = 3
- CONTROL = 4
- TRAIN = 5
+ DATA = 0
+ CONTROL = 1
class CompressAlgo(BetterEnum):
@@ -305,9 +289,8 @@ class OnErrorStrategy(BetterEnum):
IGNORE = (
0 #: Ignore it, keep running all Drivers & Executors logic in the sequel flow
)
- SKIP_EXECUTOR = 1 #: Skip all Executors in the sequel, but drivers are still called
- SKIP_HANDLE = 2 #: Skip all Drivers & Executors in the sequel, only `pre_hook` and `post_hook` are called
- THROW_EARLY = 3 #: Immediately throw the exception, the sequel flow will not be running at all
+ SKIP_HANDLE = 1 #: Skip all Executors in the sequel, only `pre_hook` and `post_hook` are called
+ THROW_EARLY = 2 #: Immediately throw the exception, the sequel flow will not be running at all
class FlowInspectType(BetterEnum):
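The `RequestType` change above is central to the 2.0 routing model: the per-verb request types are gone, and a `DATA` request carries an endpoint string instead. Roughly, on the Executor side (consistent with the 2.0 `@requests` decorator introduced later in this diff):

```python
from jina import Executor, requests

class MyExec(Executor):
    # 1.x routed by RequestType (INDEX, SEARCH, ...);
    # 2.0 routes every DATA request by its endpoint string
    @requests(on='/index')
    def index(self, docs, **kwargs):
        ...

    @requests(on='/search')
    def search(self, docs, **kwargs):
        ...
```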
diff --git a/jina/excepts.py b/jina/excepts.py
index b600e66de7b1a..cff8dfd541fa6 100644
--- a/jina/excepts.py
+++ b/jina/excepts.py
@@ -1,8 +1,5 @@
"""This modules defines all kinds of exceptions raised in Jina."""
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
class NoExplicitMessage(Exception):
"""Waiting until all partial messages are received."""
@@ -32,10 +29,6 @@ class NoAvailablePortError(Exception):
"""When no available random port could be found"""
-class DriverError(Exception):
- """Driver related exceptions."""
-
-
class RuntimeTerminated(KeyboardInterrupt):
"""The event loop of BasePea ends."""
@@ -44,18 +37,6 @@ class PodRunTimeError(Exception):
"""The error propagated by Pods when Executor throws an exception."""
-class DriverNotInstalled(DriverError):
- """Driver is not installed in the BasePea."""
-
-
-class NoDriverForRequest(DriverError):
- """No matched driver for this request."""
-
-
-class UnattachedDriver(DriverError):
- """Driver is not attached to any BasePea or executor."""
-
-
class UnknownControlCommand(RuntimeError):
"""The control command received can not be recognized."""
@@ -163,10 +144,6 @@ class BadDocType(TypeError):
"""Exception when can not construct a document from the given data."""
-class BadQueryLangType(TypeError):
- """Exception when can not construct a query language from the given data."""
-
-
class BadRequestType(TypeError):
"""Exception when can not construct a request object from given data."""
diff --git a/jina/executors/__init__.py b/jina/executors/__init__.py
index 311d5bea3f632..1d254aed1b527 100644
--- a/jina/executors/__init__.py
+++ b/jina/executors/__init__.py
@@ -1,87 +1,31 @@
-__copyright__ = 'Copyright (c) 2020 Jina AI Limited. All rights reserved.'
-__license__ = 'Apache-2.0'
-
import os
-import pickle
-import tempfile
-from datetime import datetime
-from pathlib import Path
from types import SimpleNamespace
-from typing import Dict, TypeVar, Type, List, Optional
-
-from .decorators import (
- as_update_method,
- store_init_kwargs,
- as_aggregate_method,
- wrap_func,
-)
-from .metas import get_default_metas, fill_metas_with_defaults
-from ..excepts import BadPersistantFile, NoDriverForRequest, UnattachedDriver
-from ..helper import typename, random_identity
-from ..jaml import JAMLCompatible, JAML, subvar_regex, internal_var_regex
-from ..logging import JinaLogger
+from typing import Dict, TypeVar, Optional, Callable
-# noinspection PyUnreachableCode
-if False:
- from ..peapods.runtimes.zmq.zed import ZEDRuntime
- from ..drivers import BaseDriver
+from .decorators import store_init_kwargs, wrap_func
+from .metas import get_default_metas
+from .. import __default_endpoint__
+from ..helper import typename
+from ..jaml import JAMLCompatible, JAML, subvar_regex, internal_var_regex
-__all__ = ['BaseExecutor', 'AnyExecutor', 'ExecutorType', 'GenericExecutor']
+__all__ = ['BaseExecutor', 'AnyExecutor', 'ExecutorType']
AnyExecutor = TypeVar('AnyExecutor', bound='BaseExecutor')
-# some variables may be self-referred and they must be resolved at here
-_ref_desolve_map = SimpleNamespace()
-_ref_desolve_map.__dict__['metas'] = SimpleNamespace()
-_ref_desolve_map.__dict__['metas'].__dict__['pea_id'] = 0
-_ref_desolve_map.__dict__['metas'].__dict__['replica_id'] = -1
-
class ExecutorType(type(JAMLCompatible), type):
"""The class of Executor type, which is the metaclass of :class:`BaseExecutor`."""
def __new__(cls, *args, **kwargs):
"""
-
-
- # noqa: DAR201
-
-
# noqa: DAR101
-
-
# noqa: DAR102
+
+ :return: Executor class
"""
_cls = super().__new__(cls, *args, **kwargs)
return cls.register_class(_cls)
- def __call__(cls, *args, **kwargs):
- """
-
-
- # noqa: DAR201
-
-
- # noqa: DAR101
-
-
- # noqa: DAR102
- """
- # do _preload_package
- getattr(cls, 'pre_init', lambda *x: None)()
-
- m = kwargs.pop('metas') if 'metas' in kwargs else {}
- r = kwargs.pop('requests') if 'requests' in kwargs else {}
-
- obj = type.__call__(cls, *args, **kwargs)
-
- # set attribute with priority
- # metas in YAML > class attribute > default_jina_config
- # jina_config = expand_dict(jina_config)
-
- getattr(obj, '_post_init_wrapper', lambda *x: None)(m, r)
- return obj
-
@staticmethod
def register_class(cls):
"""
@@ -90,16 +34,12 @@ def register_class(cls):
:param cls: The class.
:return: The class, after being registered.
"""
- update_funcs = ['add', 'delete', 'update']
- aggregate_funcs = ['evaluate']
reg_cls_set = getattr(cls, '_registered_class', set())
cls_id = f'{cls.__module__}.{cls.__name__}'
if cls_id not in reg_cls_set or getattr(cls, 'force_register', False):
wrap_func(cls, ['__init__'], store_init_kwargs)
- wrap_func(cls, update_funcs, as_update_method)
- wrap_func(cls, aggregate_funcs, as_aggregate_method)
reg_cls_set.add(cls_id)
setattr(cls, '_registered_class', reg_cls_set)
@@ -126,473 +66,153 @@ def __init__(awesomeness = 5):
.. highlight:: yaml
.. code-block:: yaml
- !MyAwesomeExecutor
+ jtype: MyAwesomeExecutor
with:
awesomeness: 5
- To use an executor in a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`,
- a proper :class:`jina.drivers.Driver` is required. This is because the
- executor is *NOT* protobuf-aware and has no access to the key-values in the protobuf message.
-
- Different executor may require different :class:`Driver` with
- proper :mod:`jina.drivers.handlers`, :mod:`jina.drivers.hooks` installed.
-
- .. seealso::
- Methods of the :class:`BaseExecutor` can be decorated via :mod:`jina.executors.decorators`.
-
- .. seealso::
- Meta fields :mod:`jina.executors.metas.defaults`.
-
"""
- store_args_kwargs = False #: set this to ``True`` to save ``args`` (in a list) and ``kwargs`` (in a map) in YAML config
-
- def __init__(self, *args, **kwargs):
- if isinstance(args, tuple) and len(args) > 0:
- self.args = args[0]
- else:
- self.args = args
- self.logger = JinaLogger(self.__class__.__name__)
- self._snapshot_files = []
- self._post_init_vars = set()
- self._last_snapshot_ts = datetime.now()
-
- def _post_init_wrapper(
+ def __init__(
self,
- _metas: Optional[Dict] = None,
- _requests: Optional[Dict] = None,
- fill_in_metas: bool = True,
- ) -> None:
- if fill_in_metas:
- if not _metas:
- _metas = get_default_metas()
-
- self._fill_metas(_metas)
- self.fill_in_drivers(_requests)
-
- _before = set(list(vars(self).keys()))
- self.post_init()
- self._post_init_vars = {k for k in vars(self) if k not in _before}
-
- def fill_in_drivers(self, _requests: Optional[Dict]):
- """
- Fill in drivers in a BaseExecutor.
-
- :param _requests: Dict containing driver information.
- """
- from ..executors.requests import get_default_reqs
-
- default_requests = get_default_reqs(type.mro(self.__class__))
+ metas: Optional[Dict] = None,
+ requests: Optional[Dict] = None,
+ runtime_args: Optional[Dict] = None,
+ ):
+ """`metas` and `requests` are always auto-filled with values from YAML config.
+
+ :param metas: a dict of metas fields
+ :param requests: a dict of endpoint-function mapping
+ :param runtime_args: a dict of arguments injected from :class:`Runtime` during runtime
+ """
+ self._add_metas(metas)
+ self._add_requests(requests)
+ self._add_runtime_args(runtime_args)
+
+ def _add_runtime_args(self, _runtime_args: Optional[Dict]):
+ if _runtime_args:
+ self.runtime_args = SimpleNamespace(**_runtime_args)
+ else:
+ self.runtime_args = SimpleNamespace()
+
+ def _add_requests(self, _requests: Optional[Dict]):
+ request_mapping = {} # type: Dict[str, Callable]
+
+ if _requests:
+ for endpoint, func in _requests.items():
+ # the following line must be `getattr(self.__class__, func)` NOT `getattr(self, func)`
+ # this ensures we always have `_func` as an unbound method
+ _func = getattr(self.__class__, func)
+ if callable(_func):
+ # the target function is not decorated with `@requests` yet
+ request_mapping[endpoint] = _func
+ elif typename(_func) == 'jina.executors.decorators.FunctionMapper':
+ # the target function is already decorated with `@requests`, needs unwrapping with `.fn`
+ request_mapping[endpoint] = _func.fn
+ else:
+ raise TypeError(
+ f'expected {typename(self)}.{func} to be a function, but received {typename(_func)}'
+ )
- if not _requests:
- self._drivers = self._get_drivers_from_requests(default_requests)
+ if hasattr(self, 'requests'):
+ self.requests.update(request_mapping)
else:
- parsed_drivers = self._get_drivers_from_requests(_requests)
+ self.requests = request_mapping
- if _requests.get('use_default', False):
- default_drivers = self._get_drivers_from_requests(default_requests)
+ def _add_metas(self, _metas: Optional[Dict]):
- for k, v in default_drivers.items():
- if k not in parsed_drivers:
- parsed_drivers[k] = v
+ tmp = get_default_metas()
- self._drivers = parsed_drivers
+ if _metas:
+ tmp.update(_metas)
- @staticmethod
- def _get_drivers_from_requests(_requests):
- _drivers = {} # type: Dict[str, List['BaseDriver']]
-
- if _requests and 'on' in _requests and isinstance(_requests['on'], dict):
- # if the control request is forgotten in YAML, fill it in
- if 'ControlRequest' not in _requests['on']:
- from ..drivers.control import ControlReqDriver
-
- _requests['on']['ControlRequest'] = [ControlReqDriver()]
-
- for req_type, drivers_spec in _requests['on'].items():
- if isinstance(req_type, str):
- req_type = [req_type]
- if isinstance(drivers_spec, list):
- # old syntax
- drivers = drivers_spec
- common_kwargs = {}
- elif isinstance(drivers_spec, dict):
- drivers = drivers_spec.get('drivers', [])
- common_kwargs = drivers_spec.get('with', {})
- else:
- raise TypeError(f'unsupported type of driver spec: {drivers_spec}')
-
- for r in req_type:
- if r not in _drivers:
- _drivers[r] = list()
- if _drivers[r] != drivers:
- _drivers[r].extend(drivers)
-
- # inject common kwargs to drivers
- if common_kwargs:
- new_drivers = []
- for d in _drivers[r]:
- new_init_kwargs_dict = {
- k: v for k, v in d._init_kwargs_dict.items()
- }
- new_init_kwargs_dict.update(common_kwargs)
- new_drivers.append(d.__class__(**new_init_kwargs_dict))
- _drivers[r].clear()
- _drivers[r] = new_drivers
-
- if not _drivers[r]:
- _drivers.pop(r)
- return _drivers
-
- def _fill_metas(self, _metas):
unresolved_attr = False
+ target = SimpleNamespace()
# set self values filtered by those non-exist, and non-expandable
- for k, v in _metas.items():
- if not hasattr(self, k):
+ for k, v in tmp.items():
+ if not hasattr(target, k):
if isinstance(v, str):
if not subvar_regex.findall(v):
- setattr(self, k, v)
+ setattr(target, k, v)
else:
unresolved_attr = True
else:
- setattr(self, k, v)
- elif type(getattr(self, k)) == type(v):
- setattr(self, k, v)
- if not getattr(self, 'name', None):
- _id = random_identity().split('-')[0]
- _name = f'{typename(self)}-{_id}'
- if getattr(self, 'warn_unnamed', False):
- self.logger.warning(
- f'this executor is not named, I will call it "{_name}". '
- 'naming is important as it provides a unique identifier when '
- 'persisting this executor on disk.'
- )
- setattr(self, 'name', _name)
+ setattr(target, k, v)
+ elif type(getattr(target, k)) == type(v):
+ setattr(target, k, v)
+
if unresolved_attr:
_tmp = vars(self)
- _tmp['metas'] = _metas
- new_metas = JAML.expand_dict(_tmp, context=_ref_desolve_map)['metas']
+ _tmp['metas'] = tmp
+ new_metas = JAML.expand_dict(_tmp)['metas']
- # set self values filtered by those non-exist, and non-expandable
for k, v in new_metas.items():
- if not hasattr(self, k):
+ if not hasattr(target, k):
if isinstance(v, str):
if not (
subvar_regex.findall(v) or internal_var_regex.findall(v)
):
- setattr(self, k, v)
+ setattr(target, k, v)
else:
raise ValueError(
f'{k}={v} is not substitutable or badly referred'
)
else:
- setattr(self, k, v)
-
- def post_init(self):
- """
- Initialize class attributes/members that can/should not be (de)serialized in the standard way.
-
- Examples:
+ setattr(target, k, v)
+ # `name` is important as it serves as an identifier of the executor
+ # if not given, then set a name by the rule
+ if not getattr(target, 'name', None):
+ setattr(target, 'name', typename(self))
- - deep learning models
- - index files
- - numpy arrays
+ self.metas = target
- .. warning::
- All class members created here will NOT be serialized when calling :func:`save`. Therefore if you
- want to store them, please override the :func:`__getstate__`.
+ def close(self) -> None:
"""
- pass
+ Always invoked when the executor is destroyed.
- @classmethod
- def pre_init(cls):
- """This function is called before the object initiating (i.e. :func:`__call__`)
-
- Packages and environment variables can be set and load here.
+ You can write destructor & saving logic here.
"""
pass
- @property
- def save_abspath(self) -> str:
- """Get the file path of the binary serialized object
-
- The file name ends with `.bin`.
-
- :return: the name of the file with `.bin`
+ def __call__(self, req_endpoint: str, **kwargs):
"""
- return self.get_file_from_workspace(f'{self.name}.bin')
-
- @property
- def config_abspath(self) -> str:
- """Get the file path of the YAML config
-
- :return: The file name ends with `.yml`.
+ # noqa: DAR101
+ # noqa: DAR102
+ # noqa: DAR201
"""
- return self.get_file_from_workspace(f'{self.name}.yml')
+ if req_endpoint in self.requests:
+ return self.requests[req_endpoint](
+ self, **kwargs
+ ) # unbound method, self is required
+ elif __default_endpoint__ in self.requests:
+ return self.requests[__default_endpoint__](
+ self, **kwargs
+ ) # unbound method, self is required
- @staticmethod
- def get_shard_workspace(
- workspace_folder: str,
- workspace_name: str,
- pea_id: int,
- replica_id: int = -1,
- ) -> str:
+ @property
+ def workspace(self) -> str:
"""
Get the path of the current shard.
- :param workspace_folder: folder of the workspace.
- :param workspace_name: name of the workspace.
- :param pea_id: id of the pea
- :param replica_id: id of the replica
-
:return: returns the workspace of the shard of this Executor.
"""
- if replica_id == -1:
- return os.path.join(workspace_folder, f'{workspace_name}-{pea_id}')
- else:
- return os.path.join(
- workspace_folder, f'{workspace_name}-{replica_id}-{pea_id}'
- )
-
- @property
- def workspace_name(self):
- """Get the name of the workspace.
-
- :return: returns the name of the executor
- """
- return self.name
-
- @property
- def _workspace(self):
- """Property to access `workspace` if existing or default to `./`. Useful to provide good interface when
- using executors directly in python.
-
- .. highlight:: python
- .. code-block:: python
-
- with NumpyIndexer() as indexer:
- indexer.touch()
-
- :return: returns the workspace property of the executor or default to './'
- """
- return self.workspace or './'
-
- @property
- def shard_workspace(self) -> str:
- """Get the path of the current shard.
-
- :return: returns the workspace of the shard of this Executor
- """
- return BaseExecutor.get_shard_workspace(
- self._workspace, self.workspace_name, self.pea_id, self.replica_id
- )
-
- def get_file_from_workspace(self, name: str) -> str:
- """Get a usable file path under the current workspace
-
- :param name: the name of the file
-
- :return: file path
- """
- Path(self.shard_workspace).mkdir(parents=True, exist_ok=True)
- return os.path.join(self.shard_workspace, name)
-
- @property
- def physical_size(self) -> int:
- """Return the size of the current workspace in bytes
-
- :return: byte size of the current workspace
- """
- root_directory = Path(self.shard_workspace)
- return sum(f.stat().st_size for f in root_directory.glob('**/*') if f.is_file())
-
- def __getstate__(self):
- d = dict(self.__dict__)
- del d['logger']
- for k in self._post_init_vars:
- del d[k]
- cached = [k for k in d.keys() if k.startswith('CACHED_')]
- for k in cached:
- del d[k]
-
- d.pop('_drivers', None)
- return d
-
- def __setstate__(self, d):
- self.__dict__.update(d)
- self.logger = JinaLogger(self.__class__.__name__)
- try:
- self._post_init_wrapper(fill_in_metas=False)
- except ModuleNotFoundError as ex:
- self.logger.warning(
- f'{typename(ex)} is often caused by a missing component, '
- f'which often can be solved by "pip install" relevant package: {ex!r}',
- exc_info=True,
- )
-
- def touch(self) -> None:
- """Touch the executor and change ``is_updated`` to ``True`` so that one can call :func:`save`. """
- self.is_updated = True
-
- def save(self, filename: str = None):
- """
- Persist data of this executor to the :attr:`shard_workspace`. The data could be
- a file or collection of files produced/used during an executor run.
-
- These are some of the common data that you might want to persist:
-
- - binary dump/pickle of the executor
- - the indexed files
- - (pre)trained models
-
- .. warning::
-
- Class members created in `post_init` will NOT be serialized when calling :func:`save`. Therefore if you
- want to store them, please override the :func:`__getstate__`.
-
- It uses ``pickle`` for dumping. For members/attributes that are invalid or inefficient for ``pickle``, you
- need to implement their own persistence strategy in the :func:`__getstate__`.
-
- :param filename: file path of the serialized file, if not given then :attr:`save_abspath` is used
- """
- if not self.read_only and self.is_updated:
- f = filename or self.save_abspath
- if not f:
- f = tempfile.NamedTemporaryFile(
- 'w', delete=False, dir=os.environ.get('JINA_EXECUTOR_WORKDIR', None)
- ).name
-
- if self.max_snapshot > 0 and os.path.exists(f):
- bak_f = (
- f
- + f'.snapshot-{self._last_snapshot_ts.strftime("%Y%m%d%H%M%S") or "NA"}'
- )
- os.rename(f, bak_f)
- self._snapshot_files.append(bak_f)
- if len(self._snapshot_files) > self.max_snapshot:
- d_f = self._snapshot_files.pop(0)
- if os.path.exists(d_f):
- os.remove(d_f)
- with open(f, 'wb') as fp:
- pickle.dump(self, fp)
- self._last_snapshot_ts = datetime.now()
- self.is_updated = False
- self.logger.success(
- f'artifacts of this executor ({self.name}) are persisted to {f}'
+ if getattr(self.runtime_args, 'workspace', None):
+ complete_workspace = os.path.join(
+ self.runtime_args.workspace, self.metas.name
)
+ replica_id = getattr(self.runtime_args, 'replica_id', None)
+ pea_id = getattr(self.runtime_args, 'pea_id', None)
+ if replica_id is not None and replica_id != -1:
+ complete_workspace = os.path.join(complete_workspace, str(replica_id))
+ if pea_id is not None and pea_id != -1:
+ complete_workspace = os.path.join(complete_workspace, str(pea_id))
+ return os.path.abspath(complete_workspace)
+ elif self.metas.workspace is not None:
+ return os.path.abspath(self.metas.workspace)
else:
- if not self.is_updated:
- self.logger.info(
- f'no update since {self._last_snapshot_ts:%Y-%m-%d %H:%M:%S%z}, will not save. '
- 'If you really want to save it, call "touch()" before "save()" to force saving'
- )
-
- @classmethod
- def inject_config(
- cls: Type[AnyExecutor],
- raw_config: Dict,
- pea_id: int = 0,
- replica_id: int = -1,
- read_only: bool = False,
- *args,
- **kwargs,
- ) -> Dict:
- """Inject config into the raw_config before loading into an object.
-
- :param raw_config: raw config to work on
- :param pea_id: the id of the storage of this parallel pea
- :param replica_id: the id of the replica the pea is contained in
- :param read_only: if the executor should be readonly
- :param args: Additional arguments.
- :param kwargs: Additional key word arguments.
-
- :return: an executor object
- """
- if 'metas' not in raw_config:
- raw_config['metas'] = {}
- tmp = fill_metas_with_defaults(raw_config)
- tmp['metas']['pea_id'] = pea_id
- tmp['metas']['replica_id'] = replica_id
- tmp['metas']['read_only'] = read_only
- if kwargs.get('metas'):
- tmp['metas'].update(kwargs['metas'])
- del kwargs['metas']
- tmp.update(kwargs)
- return tmp
-
- @staticmethod
- def load(filename: str = None) -> AnyExecutor:
- """Build an executor from a binary file
-
- :param filename: the file path of the binary serialized file
- :return: an executor object
-
- It uses ``pickle`` for loading.
- """
- if not filename:
- raise FileNotFoundError
- try:
- with open(filename, 'rb') as fp:
- return pickle.load(fp)
- except EOFError:
- raise BadPersistantFile(f'broken file {filename} can not be loaded')
-
- def close(self) -> None:
- """
- Release the resources as the executor is destroyed; needs to be overridden
- """
- self.save()
- self.logger.close()
+ raise Exception('cannot find metas.workspace or runtime_args.workspace')
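The lookup order of the new `workspace` property is: `runtime_args.workspace` (joined with `metas.name` and the replica/pea ids when they are set and not `-1`), then `metas.workspace`, else an error. A sketch with assumed values:

```python
import os

# assumed: runtime_args.workspace='/tmp/ws', metas.name='encoder',
# replica_id=0, pea_id=2
workspace = os.path.join('/tmp/ws', 'encoder')
for part in (0, 2):  # replica_id, then pea_id, appended only when != -1
    workspace = os.path.join(workspace, str(part))
print(os.path.abspath(workspace))  # .../tmp/ws/encoder/0/2
```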
def __enter__(self):
return self
def __exit__(self, exc_type, exc_val, exc_tb):
self.close()
-
- def attach(self, runtime: 'ZEDRuntime', *args, **kwargs):
- """Attach this executor to a Basepea
-
- This is called inside the initializing of a :class:`jina.peapods.runtime.BasePea`.
-
- :param runtime: Runtime procedure leveraging ZMQ.
- :param args: Additional arguments.
- :param kwargs: Additional key word arguments.
- """
- for req_type, drivers in self._drivers.items():
- for driver in drivers:
- driver.attach(
- executor=self, runtime=runtime, req_type=req_type, *args, **kwargs
- )
-
- # replacing the logger to runtime's logger
- if runtime and isinstance(getattr(runtime, 'logger', None), JinaLogger):
- self.logger = runtime.logger
-
- def __call__(self, req_type, *args, **kwargs):
- """
-
-
- # noqa: DAR201
-
-
- # noqa: DAR101
-
-
- # noqa: DAR102
- """
- if req_type in self._drivers:
- for d in self._drivers[req_type]:
- if d.attached:
- d()
- else:
- raise UnattachedDriver(d)
- else:
- raise NoDriverForRequest(f'{req_type} for {self}')
-
- def __str__(self):
- return self.__class__.__name__
-
-
-class GenericExecutor(BaseExecutor):
- """Alias to BaseExecutor, but bind with GenericDriver by default. """
diff --git a/jina/executors/classifiers/__init__.py b/jina/executors/classifiers/__init__.py
deleted file mode 100644
index d263819fac568..0000000000000
--- a/jina/executors/classifiers/__init__.py
+++ /dev/null
@@ -1,34 +0,0 @@
-from .. import BaseExecutor
-
-if False:
- import numpy as np
-
-
-class BaseClassifier(BaseExecutor):
- """
- The base class of the Classifier Executor. A Classifier Executor performs
- classification or regression on the given input and outputs the predicted
- hard/soft label.
-
- This class should not be used directly. Subclasses should be used.
- """
-
- def predict(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray':
- """
- Perform hard/soft classification on ``content``; the predicted value for each sample is returned.
-
- The output value can be zero/one for a one-hot label, or float for a soft label or regression label.
- Use the corresponding driver to interpret these labels.
-
- The size and type of the output can be one of the following, where ``B`` is ``content.shape[0]``:
- - (B,) or (B, 1); zero/one or float
- - (B, L): zero/one one-hot or soft label for L-class multi-class classification
-
- :param content: the input data to be classified; can be an ndim array,
- where axis=0 represents the batch size, i.e. content[0] is the first sample and content[n] is the n-th sample
- :type content: np.ndarray
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- :rtype: np.ndarray
- """
- raise NotImplementedError
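The two output conventions in the removed docstring can be pinned down with numpy; the values are illustrative:

```python
import numpy as np

# B=4 samples, L=3 classes
hard_binary = np.array([0, 1, 1, 0])           # shape (B,), zero/one labels
soft_multiclass = np.array([[0.1, 0.7, 0.2],
                            [0.8, 0.1, 0.1],
                            [0.3, 0.3, 0.4],
                            [0.0, 0.9, 0.1]])  # shape (B, L), soft labels
assert hard_binary.shape == (4,)
assert soft_multiclass.shape == (4, 3)
```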
diff --git a/jina/executors/compound.py b/jina/executors/compound.py
deleted file mode 100644
index 336d6fc9509eb..0000000000000
--- a/jina/executors/compound.py
+++ /dev/null
@@ -1,378 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from collections import defaultdict
-from typing import Dict, List, Callable, Union, Optional
-
-from . import BaseExecutor, AnyExecutor
-
-
-class CompoundExecutor(BaseExecutor):
- """A :class:`CompoundExecutor` is a set of multiple executors.
- The most common usage is chaining a pipeline of executors, where the
- input of the current one is the output of the former.
-
- A common use case of :class:`CompoundExecutor` is to glue multiple :class:`BaseExecutor` together, instead of breaking them into different Pods.
-
- :param routes: a map of function routes. The key is the function name, the value is a tuple of two pieces,
- where the first element is the name of the referred component (``metas.name``) and the second element
- is the name of the referred function.
-
- .. seealso::
-
- :func:`add_route`
- :param resolve_all: universally add ``*_all()`` to all functions that have the identical name
-
- **Example 1: a compound Chunk Indexer that does vector indexing and key-value index**
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- !CompoundExecutor
- components:
- - !NumpyIndexer
- with:
- index_filename: vec.gz
- metas:
- name: vecidx_exec # a customized name
- workspace: ${{TEST_WORKDIR}}
- - !BinaryPbIndexer
- with:
- index_filename: chunk.gz
- metas:
- name: chunkidx_exec
- workspace: ${{TEST_WORKDIR}}
- metas:
- name: chunk_compound_indexer
- workspace: ${{TEST_WORKDIR}}
- requests:
- on:
- SearchRequest:
- - !VectorSearchDriver
- with:
- executor: vecidx_exec
- IndexRequest:
- - !VectorIndexDriver
- with:
- executor: vecidx_exec
- ControlRequest:
- - !ControlReqDriver {}
-
- **Example 2: a compound crafter that first crafts the doc and then segments it**
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- !CompoundExecutor
- components:
- - !GifNameRawSplit
- metas:
- name: name_split # a customized name
- workspace: ${{TEST_WORKDIR}}
- - !GifPreprocessor
- with:
- every_k_frame: 2
- from_buffer: true
- metas:
- name: gif2chunk_preprocessor # a customized name
- metas:
- name: compound_crafter
- workspace: ${{TEST_WORKDIR}}
- py_modules: gif2chunk.py
- requests:
- on:
- IndexRequest:
- - !DocCraftDriver
- with:
- executor: name_split
- - !SegmentDriver
- with:
- executor: gif2chunk_preprocessor
- ControlRequest:
- - !ControlReqDriver {}
-
- Create a new :class:`CompoundExecutor` object
-
-
- **Example 3:**
-
- We have two dummy executors as follows:
-
- .. highlight:: python
- .. code-block:: python
-
- class dummyA(BaseExecutor):
- def say(self):
- return 'a'
-
- def sayA(self):
- print('A: im A')
-
-
- class dummyB(BaseExecutor):
- def say(self):
- return 'b'
-
- def sayB(self):
- print('B: im B')
-
- and we create a :class:`CompoundExecutor` consisting of these two via
-
- .. highlight:: python
- .. code-block:: python
-
- da, db = dummyA(), dummyB()
- ce = CompoundExecutor()
- ce.components = lambda: [da, db]
-
- Now the new executor ``ce`` has two new methods, i.e. :func:`ce.sayA` and :func:`ce.sayB`. They point to the original
- :func:`dummyA.sayA` and :func:`dummyB.sayB` respectively. One can say ``ce`` has inherited these two methods.
-
- The interesting part is :func:`say`, as this function name is shared between :class:`dummyA` and :class:`dummyB`.
- It requires some resolution. When `resolve_all=True`, a new function :func:`say_all` is added to ``ce``.
- ``ce.say_all`` works as if you call :func:`dummyA.say` and :func:`dummyB.say` in a row. This
- makes sense in some cases such as training and saving. In other cases, it may require a more sophisticated resolution,
- where one can use :func:`add_route` to achieve that. For example,
-
- .. highlight:: python
- .. code-block:: python
-
- ce.add_route('say', db.name, 'say')
- assert ce.say() == 'b'
-
- Such resolution is what we call **routes** here, and it can be specified in advance with the
- arguments ``routes`` in :func:`__init__`, or using YAML.
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- !CompoundExecutor
- components: ...
- with:
- resolve_all: true
- routes:
- say:
- - dummyB-e3acc910
- - say
-
- .. warning::
-
- When setting inner `executors` in `components` the `workspace` configuration will not be used and will be overridden
- by a workspace extracted considering the name of the `CompoundExecutor`, the name of each internal `Component` and the `pea_id`
-
-
- One can access the component of a :class:`CompoundExecutor` via index, e.g.
-
- .. highlight:: python
- .. code-block:: python
-
- c = BaseExecutor.load_config('compound-example.yaml')
- assertTrue(c[0] == c['dummyA-1ef90ea8'])
- c[0].add(obj)
-
- .. note::
- Component ``workspace`` and ``pea_id`` are overridden by their :class:`CompoundExecutor` counterparts.
-
- .. warning::
-
- When sub-component is external, ``py_modules`` must be given at root level ``metas`` not at the sub-level.
-
- """
-
- class _FnWrapper:
- def __init__(self, fns):
- self.fns = fns
-
- def __call__(self, *args, **kwargs):
- r = []
- for f in self.fns:
- r.append(f())
- return r
-
- class _FnAllWrapper(_FnWrapper):
- def __call__(self, *args, **kwargs):
- return all(super().__call__(*args, **kwargs))
-
- class _FnOrWrapper(_FnWrapper):
- def __call__(self, *args, **kwargs):
- return any(super().__call__(*args, **kwargs))
-
- def __init__(
- self, routes: Dict[str, Dict] = None, resolve_all: bool = True, *args, **kwargs
- ):
- super().__init__(*args, **kwargs)
- self._components = None # type: Optional[List[AnyExecutor]]
- self._routes = routes
- self._is_updated = False #: the internal update state of this compound executor
- self.resolve_all = resolve_all
-
- @property
- def is_updated(self) -> bool:
- """
- Return ``True`` if any component is updated.
-
- :return: ``True`` if any component is updated or if the compound itself is updated
- """
- return (
- self.components and any(c.is_updated for c in self.components)
- ) or self._is_updated
-
- @is_updated.setter
- def is_updated(self, val: bool) -> None:
- """
- Set :attr:`is_updated` for this :class:`CompoundExecutor`. Note, not to all its components
-
- :param val: new value of :attr:`is_updated`
- """
- self._is_updated = val
-
- def save(self, filename: Optional[str] = None):
- """
- Serialize this compound executor along with all components in it to binary files.
- It uses ``pickle`` for dumping.
-
- :param filename: file path of the serialized file, if not given then :attr:`save_abspath` is used
- """
- for c in self.components:
- c.save()
- super().save(
- filename=filename
- ) # do I really need to save the compound executor itself?
-
- @property
- def components(self) -> List[AnyExecutor]:
- """
- Return all component executors as a list. The list follows the order as defined in the YAML config or the
- pre-given order when calling the setter.
-
- :return: components
- """
- return self._components
-
- @components.setter
- def components(self, comps: Callable[[], List]) -> None:
- """Set the components of this executors
-
- :param comps: a function returns a list of executors
- """
- if not callable(comps):
- raise TypeError(
- 'components must be a callable function that returns '
- 'a List[BaseExecutor]'
- )
-
- # Important to handle when loading a CompoundExecutor when `inner` executors have not been loaded from yaml
- if not getattr(self, '_init_from_yaml', False):
- self._components = comps()
- if not isinstance(self._components, list):
- raise TypeError(
- f'components expects a list of executors, received {type(self._components)!r}'
- )
- self._set_comp_workspace()
- self._resolve_routes()
- self._post_components()
- else:
- self.logger.debug(
- 'components is omitted from construction, as it is initialized from yaml config'
- )
-
- @staticmethod
- def get_component_workspace_from_compound_workspace(
- compound_workspace: str, compound_name: str, pea_id: int
- ) -> str:
- """
- Get the name of workspace.
-
- :param compound_workspace: Workspace of the compound executor.
- :param compound_name: Name of the compound executor.
- :param pea_id: Id of the pea.
- :return: The name of workspace.
- """
- import os
-
- return (
- BaseExecutor.get_shard_workspace(compound_workspace, compound_name, pea_id)
- if (isinstance(pea_id, int) and pea_id > 0)
- else os.path.join(compound_workspace, compound_name)
- )
-
- def _set_comp_workspace(self) -> None:
- # overrides the workspace setting for all components
- for c in self.components:
- if not c.workspace and self.workspace:
- c_workspace = (
- CompoundExecutor.get_component_workspace_from_compound_workspace(
- self.workspace, self.name, self.pea_id
- )
- )
- self.logger.warning(f'Setting workspace of {c.name} to {c_workspace}')
- c.workspace = c_workspace
-
- def _resolve_routes(self) -> None:
- if self._routes:
- for f, v in self._routes.items():
- for kk, vv in v.items():
- self.add_route(f, kk, vv)
-
- def add_route(
- self, fn_name: str, comp_name: str, comp_fn_name: str, is_stored: bool = False
- ) -> None:
- """Create a new function for this executor which refers to the component's function
-
- This will create a new function :func:`fn_name` which actually refers to ``components[comp_name].comp_fn_name``.
- It is useful when two components have a function with a duplicated name and one wants to resolve this duplication.
-
- :param fn_name: the name of the new function
- :param comp_name: the name of the referred component, defined in ``metas.name``
- :param comp_fn_name: the name of the referred function of ``comp_name``
- :param is_stored: if ``True`` then this change will be stored in the config and affects future :func:`save` and
- :func:`save_config`
-
- """
- for c in self.components:
- if (
- c.name == comp_name
- and hasattr(c, comp_fn_name)
- and callable(getattr(c, comp_fn_name))
- ):
- setattr(self, fn_name, getattr(c, comp_fn_name))
- if is_stored:
- if not self._routes:
- self._routes = {}
- self._routes[fn_name] = {comp_name: comp_fn_name}
- self.is_updated = True
- return
- else:
- raise AttributeError(f'bad names: {comp_name} and {comp_fn_name}')
-
- def close(self) -> None:
- """Close all components and release the resources"""
- if self.components:
- for c in self.components:
- c.close()
- super().close()
-
- def __contains__(self, item: str):
- if isinstance(item, str):
- for c in self.components:
- if c.name == item:
- return True
- return False
- else:
- raise TypeError('CompoundExecutor only supports string type "in"')
-
- def __getitem__(self, item: Union[int, str]):
- if isinstance(item, int):
- return self.components[item]
- elif isinstance(item, str):
- for c in self.components:
- if c.name == item:
- return c
- else:
- raise TypeError('CompoundExecutor only supports int or string index')
-
- def __iter__(self):
- return self.components.__iter__()
-
- def _post_components(self):
- pass
diff --git a/jina/executors/crafters/__init__.py b/jina/executors/crafters/__init__.py
deleted file mode 100644
index a43a36fcc147b..0000000000000
--- a/jina/executors/crafters/__init__.py
+++ /dev/null
@@ -1,31 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Dict, Union, List
-
-from .. import BaseExecutor
-
-
-class BaseCrafter(BaseExecutor):
- """
- A :class:`BaseCrafter` transforms the content of `Document`.
- It can be used for preprocessing, segmenting etc.
- It is the interface for Crafters, a family of executors that apply
- transformations to single documents.
- The apply function is :func:`craft`, where the names of the arguments are used as keys of the content.
-
- :param args: Additional positional arguments which are just used for the parent initialization
- :param kwargs: Additional keyword arguments which are just used for the parent initialization
- """
-
- def craft(self, *args, **kwargs) -> Union[List[Dict], Dict]:
- """
- Apply function of this executor.
- The names of the arguments are used as keys, which tell the :class:`Driver` what information to extract
- from the protobuf request.
- The argument names should always be valid keys defined in the protobuf.
-
- :param args: Extra variable length arguments
- :param kwargs: Extra variable keyword arguments
- """
- raise NotImplementedError
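A minimal sketch of the removed 1.x crafter contract; `UpperCaseCrafter` is hypothetical, and the argument name `text` is what told the Driver to extract `doc.text` from the request:

```python
from typing import Dict

from jina.executors.crafters import BaseCrafter  # 1.x path, removed in this diff

class UpperCaseCrafter(BaseCrafter):
    def craft(self, text: str, *args, **kwargs) -> Dict:
        # the returned keys map back onto Document fields
        return {'text': text.upper()}
```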
diff --git a/jina/executors/decorators.py b/jina/executors/decorators.py
index 890caa36c10bb..27b5f8210fb7f 100644
--- a/jina/executors/decorators.py
+++ b/jina/executors/decorators.py
@@ -1,52 +1,20 @@
"""Decorators and wrappers designed for wrapping :class:`BaseExecutor` functions. """
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import copy
+import functools
import inspect
from functools import wraps
-from itertools import islice, chain
-from typing import Callable, Any, Union, Iterator, List, Optional, Dict, Iterable
-
-import numpy as np
+from typing import (
+ Callable,
+ Union,
+ List,
+ Optional,
+ Dict,
+ Sequence,
+)
from .metas import get_default_metas
-from ..helper import batch_iterator, convert_tuple_to_list
-from ..logging import default_logger
-
-
-def as_aggregate_method(func: Callable) -> Callable:
- """Mark a function so that it keeps track of the number of documents evaluated and a running sum
- to have always access to average value
- :param func: the function to decorate
- :return: the wrapped function
- """
-
- @wraps(func)
- def arg_wrapper(self, *args, **kwargs):
- f = func(self, *args, **kwargs)
- self._running_stats += f
- return f
-
- return arg_wrapper
-
-
-def as_update_method(func: Callable) -> Callable:
- """Mark the function as the updating function of this executor,
- calling this function will change the executor so later you can save the change via :func:`save`
- Will set the is_updated property after function is called.
- :param func: the function to decorate
- :return: the wrapped function
- """
-
- @wraps(func)
- def arg_wrapper(self, *args, **kwargs):
- f = func(self, *args, **kwargs)
- self.is_updated = True
- return f
-
- return arg_wrapper
+from .. import DocumentArray
+from ..helper import convert_tuple_to_list
def wrap_func(cls, func_lst, wrapper):
@@ -63,27 +31,6 @@ def wrap_func(cls, func_lst, wrapper):
setattr(cls, f_name, wrapper(getattr(cls, f_name)))
-def as_ndarray(func: Callable, dtype=np.float32) -> Callable:
- """Convert an :class:`BaseExecutor` function returns to a ``numpy.ndarray``,
- the following type are supported: `EagerTensor`, `Tensor`, `list`
-
- :param func: the function to decorate
- :param dtype: the converted dtype of the ``numpy.ndarray``
- :return: the wrapped function
- """
-
- @wraps(func)
- def arg_wrapper(self, *args, **kwargs):
- r = func(self, *args, **kwargs)
- r_type = type(r).__name__
- if r_type in {'ndarray', 'EagerTensor', 'Tensor', 'list'}:
- return np.array(r, dtype)
- else:
- raise TypeError(f'unrecognized type {r_type}: {type(r)}')
-
- return arg_wrapper
-
-
def store_init_kwargs(func: Callable) -> Callable:
"""Mark the args and kwargs of :func:`__init__` later to be stored via :func:`save_config` in YAML
:param func: the function to decorate
@@ -96,7 +43,7 @@ def arg_wrapper(self, *args, **kwargs):
raise TypeError(
'this decorator should only be used on __init__ method of an executor'
)
- taboo = {'self', 'args', 'kwargs'}
+ taboo = {'self', 'args', 'kwargs', 'metas', 'requests', 'runtime_args'}
_defaults = get_default_metas()
taboo.update(_defaults.keys())
all_pars = inspect.signature(func).parameters
@@ -127,302 +74,56 @@ def arg_wrapper(self, *args, **kwargs):
return arg_wrapper
-def _get_slice(
- data: Union[Iterator[Any], List[Any], np.ndarray], total_size: int
-) -> Union[Iterator[Any], List[Any], np.ndarray]:
- if isinstance(data, Dict):
- data = islice(data.items(), total_size)
- else:
- data = data[:total_size]
- return data
-
-
-def _get_size(data: Union[Iterator[Any], List[Any], np.ndarray], axis: int = 0) -> int:
- if isinstance(data, np.ndarray):
- total_size = data.shape[axis]
- elif hasattr(data, '__len__'):
- total_size = len(data)
- else:
- total_size = None
- return total_size
-
-
-def _get_total_size(full_data_size, batch_size, num_batch):
- batched_data_size = batch_size * num_batch if num_batch else None
-
- if full_data_size is not None and batched_data_size is not None:
- total_size = min(full_data_size, batched_data_size)
- else:
- total_size = full_data_size or batched_data_size
- return total_size
-
-
-def _merge_results_after_batching(
- final_result, merge_over_axis: int = 0, flatten: bool = True
+def requests(
+ func: Callable[
+ [DocumentArray, DocumentArray, Dict, List[DocumentArray], List[DocumentArray]],
+ Optional[DocumentArray],
+ ] = None,
+ *,
+ on: Optional[Union[str, Sequence[str]]] = None,
):
- if not final_result:
- return
-
- if isinstance(final_result[0], np.ndarray):
- if len(final_result[0].shape) > 1:
- final_result = np.concatenate(final_result, merge_over_axis)
- elif isinstance(final_result[0], list) and flatten:
- final_result = list(chain.from_iterable(final_result))
-
- return final_result
-
-
-def batching(
- func: Optional[Callable[[Any], np.ndarray]] = None,
- batch_size: Optional[Union[int, Callable]] = None,
- num_batch: Optional[int] = None,
- split_over_axis: int = 0,
- merge_over_axis: int = 0,
- slice_on: int = 1,
- slice_nargs: int = 1,
- label_on: Optional[int] = None,
- ordinal_idx_arg: Optional[int] = None,
- flatten_output: bool = True,
-) -> Any:
- """Split the input of a function into small batches and call :func:`func` on each batch
- , collect the merged result and return. This is useful when the input is too big to fit into memory
-
- :param func: function to decorate
- :param batch_size: size of each batch
- :param num_batch: number of batches to take, the rest will be ignored
- :param split_over_axis: split over which axis into batches
- :param merge_over_axis: merge over which axis into a single result
- :param slice_on: the location of the data. When using inside a class,
- ``slice_on`` should take ``self`` into consideration.
- :param slice_nargs: the number of arguments
- :param label_on: the location of the labels. Useful for data with any kind of accompanying labels
- :param ordinal_idx_arg: the location of the ordinal indexes argument. Needed for classes
- where function decorated needs to know the ordinal indexes of the data in the batch
- (Not used when label_on is used)
- :param flatten_output: if set to ``True``, the results from different batches will be chained and the return value is a list of the results. Otherwise, the return value is a list of lists, in which each element is a list containing the result from one single batch. Note that if only one batch is returned, the result is always flattened.
- :return: the merged result as if run :func:`func` once on the input.
-
- Example:
- .. highlight:: python
- .. code-block:: python
-
- class MemoryHungryExecutor:
-
- @batching
- def train(self, batch: 'numpy.ndarray', *args, **kwargs):
- gpu_train(batch) #: this will respect the ``batch_size`` defined as object attribute
-
- @batching(batch_size = 64)
- def train(self, batch: 'numpy.ndarray', *args, **kwargs):
- gpu_train(batch)
"""
+ `@requests` defines when a function will be invoked. It has a keyword `on=` to define the endpoint.
- def _batching(func):
- @wraps(func)
- def arg_wrapper(*args, **kwargs):
- # priority: decorator > class_attribute
- # by default data is in args[1] (self needs to be taken into account)
- data = args[slice_on : slice_on + slice_nargs]
- b_size = (
- batch_size(data) if callable(batch_size) else batch_size
- ) or getattr(args[0], 'batch_size', None)
-
- # no batching if b_size is None
- if b_size is None or data is None:
- return func(*args, **kwargs)
-
- default_logger.debug(
- f'batching enabled for {func.__qualname__} batch_size={b_size} '
- f'num_batch={num_batch} axis={split_over_axis}'
- )
-
- results = []
- data = (data, args[label_on]) if label_on else data
-
- yield_slice = [
- isinstance(args[slice_on + i], np.memmap) for i in range(0, slice_nargs)
- ]
-
- slice_idx = None
-
- # split the data into batches
- data_iterators = [
- batch_iterator(
- data[i],
- b_size,
- split_over_axis,
- yield_slice=yield_slice[i],
- )
- for i in range(0, slice_nargs)
- ]
-
- batch_args = list(copy.copy(args))
-
- # load the batches of data and feed into the function
- for _data_args in zip(*data_iterators):
- _data_args = list(_data_args)
- for i, (_yield_slice, _arg) in enumerate(zip(yield_slice, _data_args)):
- if _yield_slice:
- original_arg = args[slice_on + i]
- _memmap = np.memmap(
- original_arg.filename,
- dtype=original_arg.dtype,
- mode='r',
- shape=original_arg.shape,
- )
- _data_args[i] = _memmap[_arg]
- slice_idx = _arg[split_over_axis]
- if slice_idx.start is None or slice_idx.stop is None:
- slice_idx = None
- del _memmap
-
- # TODO: figure out what is ordinal_idx_arg
- if not isinstance(_data_args[i], tuple):
- if ordinal_idx_arg and slice_idx is not None:
- batch_args[ordinal_idx_arg] = slice_idx
-
- batch_args[slice_on : slice_on + slice_nargs] = _data_args
-
- r = func(*batch_args, **kwargs)
-
- if r is not None:
- results.append(r)
-
- return _merge_results_after_batching(
- results, merge_over_axis, flatten_output
- )
-
- return arg_wrapper
-
- if func:
- return _batching(func)
- else:
- return _batching
-
-
-def single(
- func: Optional[Callable[[Any], np.ndarray]] = None,
- merge_over_axis: int = 0,
- slice_on: int = 1,
- slice_nargs: int = 1,
- flatten_output: bool = False,
-) -> Any:
- """Guarantee that the inputs of a function with more than one argument is provided as single instances and not in batches
-
- :param func: function to decorate
- :param merge_over_axis: merge over which axis into a single result
- :param slice_on: the location of the data. When using inside a class,
- ``slice_on`` should take ``self`` into consideration.
- :param slice_nargs: the number of positional arguments considered as data
- :param flatten_output: if set to ``True``, the results from different batches will be chained and the return value is a list of the results. Otherwise, the return value is a list of lists, in which each element is a list containing the result from one single batch. Note that if only one batch is returned, the result is always flattened.
- :return: the merged result as if run :func:`func` once on the input.
-
- .. warning::
- data arguments will be taken starting from ``slice_on`` to ``slice_on + slice_nargs``
-
- Example:
- .. highlight:: python
- .. code-block:: python
-
- class OneByOneCrafter:
-
- @single
- def craft(self, text: str, id: str) -> Dict:
- ...
-
- .. note::
- The single multi-input decorator lets the user interact with the executor in 3 different ways:
- - Providing batches: (this decorator makes sure that the actual method receives just a single instance)
- - Providing a single instance
- - Providing a single instance through kwargs.
-
- .. highlight:: python
- .. code-block:: python
-
- class OneByOneCrafter:
- @single
- def craft(self, text: str, id: str) -> Dict:
- return {'text': f'{text}-crafted', 'id': f'{id}-crafted'}
-
- crafter = OneByOneCrafter()
-
- results = crafter.craft(['text1', 'text2'], ['id1', 'id2'])
- assert len(results) == 2
- assert results[0] == {'text': 'text1-crafted', 'id': 'id1-crafted'}
- assert results[1] == {'text': 'text2-crafted', 'id': 'id2-crafted'}
+ A class method decorated with plain `@requests` (without `on=`) is the default handler for all endpoints.
+ That means it is the fallback handler for endpoints that are not found.
- result = crafter.craft('text', 'id')
- assert result['text'] == 'text-crafted'
- assert result['id'] == 'id-crafted'
-
- result = crafter.craft(text='text', id='id')
- assert result['text'] == 'text-crafted'
- assert result['id'] == 'id-crafted'
+ :param func: the method to decorate
+ :param on: the endpoint string, by convention starts with `/`
+ :return: decorated function
"""
+ from .. import __default_endpoint__, __num_args_executor_func__
- def _single_multi_input(func):
- @wraps(func)
- def arg_wrapper(*args, **kwargs):
- # by default data is in args[1:] (self needs to be taken into account)
- args = list(args)
- default_logger.debug(f'batching disabled for {func.__qualname__}')
-
- data_iterators = args[slice_on : slice_on + slice_nargs]
+ class FunctionMapper:
+ def __init__(self, fn):
- if len(args) <= slice_on:
- # like this one can use the function with single kwargs
- return func(*args, **kwargs)
- elif len(args) < slice_on + slice_nargs:
- raise IndexError(
- f'can not select positional args at {slice_on}: {slice_nargs}, '
- f'your `args` has {len(args)} arguments.'
+ arg_spec = inspect.getfullargspec(fn)
+ if not arg_spec.varkw and len(arg_spec.args) < __num_args_executor_func__:
+ raise TypeError(
+ f'{fn} accepts only {arg_spec.args} which is fewer than expected, '
+ f'please add `**kwargs` to the function signature.'
)
- elif (
- len(args) <= slice_on
- or isinstance(data_iterators[0], str)
- or isinstance(data_iterators[0], bytes)
- or not isinstance(data_iterators[0], Iterable)
- ):
- # like this one can use the function with single kwargs
- return func(*args, **kwargs)
-
- final_result = []
- for new_args in zip(*data_iterators):
- args[slice_on : slice_on + slice_nargs] = new_args
- r = func(*args, **kwargs)
-
- if r is not None:
- final_result.append(r)
-
- return _merge_results_after_batching(
- final_result, merge_over_axis, flatten=flatten_output
- )
- return arg_wrapper
-
- if func:
- return _single_multi_input(func)
- else:
- return _single_multi_input
+ @functools.wraps(fn)
+ def arg_wrapper(*args, **kwargs):
+ return fn(*args, **kwargs)
+ self.fn = arg_wrapper
-def requests(func: Callable = None, on: str = 'default') -> Callable:
- """Decorator for binding an Executor function to requests
-
- :param func: the Executor function to decorate
- :param on: the request type to bind, e.g. IndexRequest, SearchRequest, UpdateRequest, DeleteRequest, etc.
- you may also use `index`, `search`, `update`, `delete` as shortcut.
- :return: the wrapped function
- """
+ def __set_name__(self, owner, name):
+ self.fn.class_name = owner.__name__
+ if not hasattr(owner, 'requests'):
+ owner.requests = {}
- def _requests(func):
- @wraps(func)
- def arg_wrapper(*args, **kwargs):
- return func(*args, **kwargs)
+ if isinstance(on, (list, tuple)):
+ for o in on:
+ owner.requests[o] = self.fn
+ else:
+ owner.requests[on or __default_endpoint__] = self.fn
- return arg_wrapper
+ setattr(owner, name, self.fn)
if func:
- return _requests(func)
+ return FunctionMapper(func)
else:
- return _requests
+ return FunctionMapper
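Two details of the rewritten decorator worth noting: `on=` also accepts a sequence of endpoints, and `FunctionMapper` rejects handlers whose signature lacks `**kwargs`. A short sketch with illustrative names:

```python
from jina import Executor, requests

class MyIndexer(Executor):
    # one handler bound to several endpoints at once
    @requests(on=['/index', '/update'])
    def upsert(self, docs, parameters, **kwargs):
        # `**kwargs` is required; FunctionMapper raises TypeError without it
        ...
```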
diff --git a/jina/executors/devices.py b/jina/executors/devices.py
deleted file mode 100644
index 78b5eccf68fc6..0000000000000
--- a/jina/executors/devices.py
+++ /dev/null
@@ -1,270 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from abc import abstractmethod
-
-from ..helper import cached_property
-
-
-class BaseDevice:
- """:class:`BaseFrameworkExecutor` is the base class for the executors using other frameworks internally, including `tensorflow`, `pytorch`, `onnx`, `faiss` and `paddlepaddle`."""
-
- @cached_property
- @abstractmethod
- def device(self):
- """
- Set the device on which the executor will be running.
-
- .. note::
- In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use,
- please use the environment variable `CUDA_VISIBLE_DEVICES`.
- """
-
- @abstractmethod
- def to_device(self, *args, **kwargs):
- """Move the computation from GPU to CPU or vice versa."""
-
-
-class TorchDevice(BaseDevice):
- """
- :class:`TorchDevice` implements the base class for the executors using the :mod:`torch` library. The common setups go into this class.
-
- To implement your own executor with the :mod:`torch` library,
-
- .. highlight:: python
- .. code-block:: python
-
- class MyAwesomeTorchEncoder(TorchDevice, BaseEncoder):
- def post_init(self):
- # load your awesome model
- import torchvision.models as models
- self.model = models.mobilenet_v2().features.eval()
- self.to_device(self.model)
-
- def encode(self, data, *args, **kwargs):
- # use your awesome model to encode/craft/score
- import torch
- torch.set_grad_enabled(False)
-
- _input = torch.as_tensor(data, device=self.device)
- _output = self.model(_input).cpu()
-
- return _output.numpy()
-
- """
-
- @cached_property
- def device(self):
- """
- Set the device on which the executors using :mod:`torch` library will be running.
-
- .. note::
- In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use,
- please use the environment variable `CUDA_VISIBLE_DEVICES`.
- """
- import torch
-
- return torch.device('cuda:0') if self.on_gpu else torch.device('cpu')
-
- def to_device(self, model, *args, **kwargs):
- """Load the model to device."""
- model.to(self.device)
-
-
-class PaddleDevice(BaseDevice):
- """
- :class:`PaddleDevice` implements the base class for the executors using the :mod:`paddlepaddle` library. The common setups go into this class.
-
- To implement your own executor with the :mod:`paddlepaddle` library,
-
- .. highlight:: python
- .. code-block:: python
-
- class MyAwesomePaddleEncoder(PaddleDevice, BaseEncoder):
- def post_init(self):
- # load your awesome model
- import paddlehub as hub
- module = hub.Module(name='mobilenet_v2_imagenet')
- inputs, outputs, self.model = module.context(trainable=False)
- self.inputs_name = inputs['image'].name
- self.outputs_name = outputs['feature_map'].name
- self.exe = self.to_device()
-
- def encode(self, data, *args, **kwargs):
- # use your awesome model to encode/craft/score
- _output, *_ = self.exe.run(
- program=self.model,
- fetch_list=[self.outputs_name],
- feed={self.inputs_name: data},
- return_numpy=True
- )
- return _output
- """
-
- @cached_property
- def device(self):
- """
- Set the device on which the executors using :mod:`paddlepaddle` library will be running.
-
- .. note::
- In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use,
- please use the environment variable `CUDA_VISIBLE_DEVICES`.
- """
- import paddle.fluid as fluid
-
- return fluid.CUDAPlace(0) if self.on_gpu else fluid.CPUPlace()
-
- def to_device(self):
- """Load the model to device."""
- import paddle.fluid as fluid
-
- return fluid.Executor(self.device)
-
-
-class TFDevice(BaseDevice):
- """
- :class:`TFDevice` implements the base class for the executors using the :mod:`tensorflow` library. The common setups go into this class.
-
- To implement your own executor with the :mod:`tensorflow` library,
-
- .. highlight:: python
- .. code-block:: python
-
- class MyAwesomeTFEncoder(TFDevice, BaseEncoder):
- def post_init(self):
- # load your awesome model
- self.to_device()
- import tensorflow as tf
- model = tf.keras.applications.MobileNetV2(
- input_shape=(self.img_shape, self.img_shape, 3),
- include_top=False,
- pooling=self.pool_strategy,
- weights='imagenet')
- model.trainable = False
- self.model = model
-
- def encode(self, data, *args, **kwargs):
- # use your awesome model to encode/craft/score
- return self.model(data)
- """
-
- @cached_property
- def device(self):
- """
- Set the device on which the executors using :mod:`tensorflow` library will be running.
-
- .. note::
- In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use,
- please use the environment variable `CUDA_VISIBLE_DEVICES`.
- """
- import tensorflow as tf
-
- cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
- gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
- if self.on_gpu and len(gpus) > 0:
- cpus.append(gpus[0])
- return cpus
-
- def to_device(self):
- """Load the model to device."""
- import tensorflow as tf
-
- tf.config.experimental.set_visible_devices(devices=self.device)
-
-
-class OnnxDevice(BaseDevice):
- """
- :class:`OnnxDevice` implements the base class for the executors using :mod:`onnxruntime` library. The common setups go into this class.
-
- To implement your own executor with the :mod:`onnxruntime` library,
-
- .. highlight:: python
- .. code-block:: python
-
- class MyAwesomeOnnxEncoder(OnnxDevice, BaseEncoder):
- def __init__(self, output_feature, model_path, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.outputs_name = output_feature
- self.model_path = model_path
-
- def post_init(self):
- import onnxruntime
- self.model = onnxruntime.InferenceSession(self.model_path, None)
- self.inputs_name = self.model.get_inputs()[0].name
- self.to_device(self.model)
-
- def encode(self, data, *args, **kwargs):
- # use your awesome model to encode/craft/score
- results = []
- for single_data in data:
- data_encoded, *_ = self.model.run(
- [self.outputs_name, ], {self.inputs_name: single_data})
- results.append(data_encoded)
- return np.concatenate(results, axis=0)
-
- """
-
- @cached_property
- def device(self):
- """
- Set the device on which the executors using :mod:`onnxruntime` library will be running.
-
- .. note::
- In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use,
- please use the environment variable `CUDA_VISIBLE_DEVICES`.
- """
- return ['CUDAExecutionProvider'] if self.on_gpu else ['CPUExecutionProvider']
-
- def to_device(self, model, *args, **kwargs):
- """Load the model to device."""
- model.set_providers(self.device)
-
-
-class FaissDevice(BaseDevice):
- """:class:`FaissDevice` implements the base class for the executors using :mod:`faiss` library. The common setups go into this class."""
-
- @cached_property
- def device(self):
- """
- Set the device on which the executors using :mod:`faiss` library will be running.
-
- .. note::
- In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use,
- please use the environment variable `CUDA_VISIBLE_DEVICES`.
- """
- import faiss
-
- # For now, consider only one GPU, do not distribute the index
- return faiss.StandardGpuResources() if self.on_gpu else None
-
- def to_device(self, index, *args, **kwargs):
- """Load the model to device."""
- import faiss
-
- device = self.device
- return (
- faiss.index_cpu_to_gpu(device, 0, index, None)
- if device is not None
- else index
- )
-
-
-class MindsporeDevice(BaseDevice):
- """:class:`MindsporeDevice` implements the base classes for the executors using :mod:`mindspore` library. The common setups go into this class."""
-
- @cached_property
- def device(self):
- """
- Set the device on which the executors using :mod:`mindspore` library will be running.
-
- .. note::
- In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use,
- please use the environment variable `CUDA_VISIBLE_DEVICES`.
- """
- return 'GPU' if self.on_gpu else 'CPU'
-
- def to_device(self):
- """Load the model to device."""
- import mindspore.context as context
-
- context.set_context(mode=context.GRAPH_MODE, device_target=self.device)
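The deleted `devices.py` centralised device handling behind just two members, `device` and `to_device`, mixed into framework-specific executors. A minimal sketch of the pattern, assuming a torch install; `BaseExecutorStub` and its `on_gpu` flag are simplified stand-ins for the executor configuration, not the real Jina API:

```python
from functools import cached_property

class BaseExecutorStub:
    on_gpu = False  # in Jina this would come from the executor's config

class TorchDeviceMixin:
    @cached_property
    def device(self):
        import torch
        # only the first visible GPU is used; pick GPUs via CUDA_VISIBLE_DEVICES
        return torch.device('cuda:0') if self.on_gpu else torch.device('cpu')

    def to_device(self, model):
        model.to(self.device)

class MyTorchEncoder(TorchDeviceMixin, BaseExecutorStub):  # mixin goes first
    pass

print(MyTorchEncoder().device)  # -> cpu
```

`cached_property` ensures the framework import and device probe happen once, on first access, rather than at construction time.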
diff --git a/jina/executors/encoders/__init__.py b/jina/executors/encoders/__init__.py
deleted file mode 100644
index bbc6a07074bbd..0000000000000
--- a/jina/executors/encoders/__init__.py
+++ /dev/null
@@ -1,88 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-
-from .. import BaseExecutor
-
-if False:
- # fix type-hint complain for sphinx and flake
- from typing import TypeVar
- import numpy as np
- import scipy
- import tensorflow as tf
- import torch
-
- EncodingType = TypeVar(
- 'EncodingType',
- np.ndarray,
- scipy.sparse.csr_matrix,
- scipy.sparse.coo_matrix,
- scipy.sparse.bsr_matrix,
- scipy.sparse.csc_matrix,
- torch.sparse_coo_tensor,
- tf.SparseTensor,
- )
-
-
-class BaseEncoder(BaseExecutor):
- """``BaseEncoder`` encodes chunk into vector representation.
-
- The key function is :func:`encode`.
-
- .. seealso::
- :mod:`jina.drivers.encode`
- """
-
- def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'EncodingType':
- """Encode the data, needs to be implemented in subclass.
- :param content: the data to be encoded
- :param args: additional positional arguments
- :param kwargs: additional key-value arguments
- """
-
- raise NotImplementedError
-
-
-class BaseNumericEncoder(BaseEncoder):
- """BaseNumericEncoder encodes data from a ndarray, potentially B x ([T] x D) into a ndarray of B x D"""
-
- def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'EncodingType':
- """
- :param content: a `B x ([T] x D)` numpy ``ndarray``, `B` is the size of the batch
- :param args: additional positional arguments
- :param kwargs: additional key-value arguments
- """
- raise NotImplementedError
-
-
-class BaseImageEncoder(BaseNumericEncoder):
- """BaseImageEncoder encodes data from a ndarray, potentially B x (Height x Width) into a ndarray of B x D"""
-
- pass
-
-
-class BaseVideoEncoder(BaseNumericEncoder):
- """BaseVideoEncoder encodes data from a ndarray, potentially B x (Time x Height x Width) into a ndarray of B x D"""
-
- pass
-
-
-class BaseAudioEncoder(BaseNumericEncoder):
- """BaseAudioEncoder encodes data from a ndarray, potentially B x (Time x D) into a ndarray of B x D"""
-
- pass
-
-
-class BaseTextEncoder(BaseEncoder):
- """
- BaseTextEncoder encodes data from an array of string type (data.dtype.kind == 'U') of size B into a ndarray of B x D.
- """
-
- def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'EncodingType':
- """
-
- :param content: a 1d array of string type (data.dtype.kind == 'U') of size B
- :param args: additional positional arguments
- :param kwargs: additional key-value arguments
- """
- raise NotImplementedError
diff --git a/jina/executors/encoders/frameworks.py b/jina/executors/encoders/frameworks.py
deleted file mode 100644
index d8e82042bd510..0000000000000
--- a/jina/executors/encoders/frameworks.py
+++ /dev/null
@@ -1,164 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import os
-from typing import Optional
-
-from . import BaseEncoder
-from ..devices import OnnxDevice, PaddleDevice, TorchDevice, TFDevice, MindsporeDevice
-from ...excepts import ModelCheckpointNotExist
-from ...helper import is_url, cached_property
-
-
-# mixin classes go first, base classes are read from right to left.
-class BaseOnnxEncoder(OnnxDevice, BaseEncoder):
- """
- :class:`BaseOnnxEncoder` is the base class for implementing Encoders with models from the :mod:`onnxruntime` library.
-
- :param output_feature: the name of the layer for feature extraction.
- :param model_path: the path of the model in the format of `.onnx`. Check a list of available pretrained
- models at https://github.com/onnx/models#image_classification and download the git LFS to your local path.
- The ``model_path`` is the local path of the ``.onnx`` file, e.g. ``/tmp/onnx/mobilenetv2-1.0.onnx``.
- """
-
- def __init__(
- self,
- output_feature: Optional[str] = None,
- model_path: Optional[str] = None,
- *args,
- **kwargs,
- ):
- super().__init__(*args, **kwargs)
- self.outputs_name = output_feature
- self.raw_model_path = model_path
-
- def post_init(self):
- """
- Load the model from the `.onnx` file and add outputs for the selected layer, i.e. ``outputs_name``. The modified
- model is saved at `tmp_model_path`.
- """
- super().post_init()
- model_name = self.raw_model_path.split('/')[-1] if self.raw_model_path else None
- tmp_model_path = (
- self.get_file_from_workspace(f'{model_name}.tmp') if model_name else None
- )
- raw_model_path = self.raw_model_path
- if self.raw_model_path and is_url(self.raw_model_path):
- import urllib.request
-
- download_path, *_ = urllib.request.urlretrieve(self.raw_model_path)
- raw_model_path = download_path
- self.logger.info(f'download the model at {self.raw_model_path}')
- if tmp_model_path and not os.path.exists(tmp_model_path) and self.outputs_name:
- self._append_outputs(raw_model_path, self.outputs_name, tmp_model_path)
- self.logger.info(
- f'save the model with outputs [{self.outputs_name}] at {tmp_model_path}'
- )
-
- if tmp_model_path and os.path.exists(tmp_model_path):
- import onnxruntime
-
- self.model = onnxruntime.InferenceSession(tmp_model_path, None)
- self.inputs_name = self.model.get_inputs()[0].name
- self._device = None
- self.to_device(self.model)
- else:
- raise ModelCheckpointNotExist(f'model at {tmp_model_path} does not exist')
-
- @staticmethod
- def _append_outputs(inputs, outputs_name_to_append, output_fn):
- import onnx
-
- model = onnx.load(inputs)
- feature_map = onnx.helper.ValueInfoProto()
- feature_map.name = outputs_name_to_append
- model.graph.output.append(feature_map)
- onnx.save(model, output_fn)
-
-
-class BaseTFEncoder(TFDevice, BaseEncoder):
- """:class:`BasePaddleEncoder` is the base class for implementing Encoders with models from :mod:`tensorflow` library."""
-
- pass
-
-
-class BaseTorchEncoder(TorchDevice, BaseEncoder):
- """Base encoder class for :mod:`pytorch` library."""
-
- pass
-
-
-class BasePaddleEncoder(PaddleDevice, BaseEncoder):
- """:class:`BasePaddleEncoder` is the base class for implementing Encoders with models from :mod:`paddlepaddle` library."""
-
- pass
-
-
-class BaseMindsporeEncoder(MindsporeDevice, BaseEncoder):
- """
- :class:`BaseMindsporeEncoder` is the base class for implementing Encoders with models from `mindspore`.
-
- To implement your own executor with the :mod:`mindspore` library,
-
- .. highlight:: python
- .. code-block:: python
- import mindspore.nn as nn
-
- class YourAwesomeModel(nn.Cell):
- def __init__(self):
- ...
-
- def construct(self, x):
- ...
-
- class YourAwesomeEncoder(BaseMindsporeEncoder):
- def encode(self, data, *args, **kwargs):
- from mindspore import Tensor
- return self.model(Tensor(data)).asnumpy()
-
- def get_cell(self):
- return YourAwesomeModel()
-
- :param model_path: the path of the model's checkpoint.
- :param args: additional arguments
- :param kwargs: additional key value arguments
- """
-
- def __init__(self, model_path: Optional[str] = None, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.model_path = model_path
-
- def post_init(self):
- """
- Load the model from the `.ckpt` checkpoint.
- """
- super().post_init()
- if self.model_path and os.path.exists(self.model_path):
- self.to_device()
- from mindspore.train.serialization import (
- load_checkpoint,
- load_param_into_net,
- )
-
- _param_dict = load_checkpoint(ckpt_file_name=self.model_path)
- load_param_into_net(self.model, _param_dict)
- else:
- raise ModelCheckpointNotExist(f'model {self.model_path} does not exist')
-
- @cached_property
- def model(self):
- """
- Get the Mindspore Neural Networks Cells.
- :return: model property
- """
- return self.get_cell()
-
- def get_cell(self):
- """
- Return Mindspore Neural Networks Cells.
-
- Pre-defined building blocks or computing units to construct Neural Networks.
- A ``Cell`` could be a single neural network cell, such as conv2d, relu, batch_norm, etc.
- or a composition of cells to construct a network.
- """
- raise NotImplementedError
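The `# mixin classes go first, base classes are read from right to left` comment in the deleted `frameworks.py` is a consequence of Python's MRO: bases are linearised left to right, so placing the device mixin first lets its methods take precedence and delegate onward with `super()`. A small demonstration with stand-in classes:

```python
class BaseEncoder:
    def post_init(self):
        return 'encoder setup'

class OnnxDeviceMixin:
    def post_init(self):
        # runs first, then cooperatively hands off to the next class in the MRO
        return 'device setup, then ' + super().post_init()

class MyOnnxEncoder(OnnxDeviceMixin, BaseEncoder):
    pass

print([c.__name__ for c in MyOnnxEncoder.__mro__])
# ['MyOnnxEncoder', 'OnnxDeviceMixin', 'BaseEncoder', 'object']
print(MyOnnxEncoder().post_init())  # 'device setup, then encoder setup'
```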
diff --git a/jina/executors/encoders/multimodal/__init__.py b/jina/executors/encoders/multimodal/__init__.py
deleted file mode 100644
index 914bb9c1b0c56..0000000000000
--- a/jina/executors/encoders/multimodal/__init__.py
+++ /dev/null
@@ -1,31 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Sequence
-
-import numpy as np
-
-from ... import BaseExecutor
-
-
-class BaseMultiModalEncoder(BaseExecutor):
- """
- :class:`BaseMultiModalEncoder` encodes data from multiple inputs (``text``, ``buffer``, ``blob`` or other ``embeddings``)
- into a single ``embedding``
- """
-
- def __init__(self, positional_modality: Sequence[str], *args, **kwargs):
- """
- :param positional_modality: the list indicating the order in which the modalities are passed to the encoding method
- """
- super().__init__(*args, **kwargs)
- self.positional_modality = positional_modality
-
- def encode(self, *data: 'np.ndarray', **kwargs) -> 'np.ndarray':
- """
- :param data: M arguments of shape `B x (D)` numpy ``ndarray``, where `B` is the size of the batch and `M` is the number of modalities
- :return: a `B x D` numpy ``ndarray``
- """
- raise NotImplementedError
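`positional_modality` exists because `encode` receives one array per modality positionally, so caller and encoder must agree on a fixed ordering. A hypothetical sketch of how a caller would honour that ordering (the concatenating encoder here is illustrative only):

```python
import numpy as np

class ConcatMultiModalEncoder:
    def __init__(self, positional_modality):
        self.positional_modality = positional_modality

    def encode(self, *data):
        # one B x D array per modality, joined along the feature axis
        return np.concatenate(data, axis=1)

enc = ConcatMultiModalEncoder(positional_modality=['image', 'text'])
batch = {'text': np.ones((2, 3)), 'image': np.zeros((2, 4))}
# unpack in the declared modality order, not in dict order
out = enc.encode(*(batch[m] for m in enc.positional_modality))
assert out.shape == (2, 7)
```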
diff --git a/jina/executors/encoders/numeric/__init__.py b/jina/executors/encoders/numeric/__init__.py
deleted file mode 100644
index c953b94590610..0000000000000
--- a/jina/executors/encoders/numeric/__init__.py
+++ /dev/null
@@ -1,42 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Optional
-
-import numpy as np
-
-from .. import BaseNumericEncoder
-from ...decorators import batching
-
-
-class TransformEncoder(BaseNumericEncoder):
- """
- :class:`TransformEncoder` encodes data from an ndarray in size `B x T` into an ndarray in size `B x D`
-
- :param model_path: path from which to load the pickled sklearn model.
- :param args: Extra positional arguments to be set
- :param kwargs: Extra keyword arguments to be set
- """
-
- def __init__(self, model_path: Optional[str] = None, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.model_path = model_path
-
- def post_init(self) -> None:
- """Load the model from path if :param:`model_path` is set."""
- import pickle
-
- self.model = None
- if self.model_path:
- with open(self.model_path, 'rb') as model_file:
- self.model = pickle.load(model_file)
-
- @batching
- def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray':
- """
- :param content: a `B x T` numpy ``ndarray``, `B` is the size of the batch
- :param args: Extra positional arguments to be set
- :param kwargs: Extra keyword arguments to be set
- :return: a `B x D` numpy ``ndarray``
- """
- return self.model.transform(content)
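`TransformEncoder` expects `model_path` to point at a pickled object exposing `.transform`. A sketch of producing such a pickle; the choice of scikit-learn's `TruncatedSVD` is illustrative, any fitted transformer would do:

```python
import pickle
import tempfile

import numpy as np
from sklearn.decomposition import TruncatedSVD

# fit any object exposing `.transform`
model = TruncatedSVD(n_components=2).fit(np.random.rand(16, 8))

with tempfile.NamedTemporaryFile(suffix='.pkl', delete=False) as f:
    pickle.dump(model, f)
    model_path = f.name

# this mirrors what post_init does with `model_path`
with open(model_path, 'rb') as f:
    restored = pickle.load(f)

# encode() then reduces B x T into B x D
assert restored.transform(np.random.rand(4, 8)).shape == (4, 2)
```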
diff --git a/jina/executors/evaluators/__init__.py b/jina/executors/evaluators/__init__.py
deleted file mode 100644
index 93ba4ac9db4d7..0000000000000
--- a/jina/executors/evaluators/__init__.py
+++ /dev/null
@@ -1,85 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Any
-
-from .running_stats import RunningStats
-from .. import BaseExecutor
-from ..compound import CompoundExecutor
-
-
-class BaseEvaluator(BaseExecutor):
- """A :class:`BaseEvaluator` is used to evaluate different messages coming from any kind of executor"""
-
- metric = '' #: Get the name of the evaluation metric
-
- def post_init(self):
- """Initialize running stats."""
- super().post_init()
- self._running_stats = RunningStats()
-
- def evaluate(self, actual: Any, desired: Any, *args, **kwargs) -> float:
- """Evaluates difference between param:`actual` and `param:desired`, needs to be implemented in subclass."""
- raise NotImplementedError
-
- @property
- def mean(self) -> float:
- """Get the running mean."""
- return self._running_stats.mean
-
- @property
- def std(self) -> float:
- """Get the running standard variance."""
- return self._running_stats.std
-
- @property
- def variance(self) -> float:
- """Get the running variance."""
- return self._running_stats.variance
-
-
-class FileBasedEvaluator(CompoundExecutor):
-
- """A Frequently used pattern for combining A :class:`BinaryPbIndexer` and :class:`BaseEvaluator`.
- It will be equipped with predefined ``requests.on`` behaviors:
-
- - At evaluation time (query or index)
- - 1. Checks the incoming document, gets its value from the `BinaryPbIndexer` and fills the `groundtruth` of the request
- - 2. Filters the documents that do not have a corresponding groundtruth
- - 3. The BaseEvaluator works as if the `groundtruth` had been provided by the client as it comes in the request.
-
- .. warning::
- The documents that are not found to have an indexed groundtruth are removed from the `request` so that the `Evaluator` only
- works with documents which have groundtruth.
-
- One can use the :class:`FileBasedEvaluator` via
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- !FileBasedEvaluator
- components:
- - !BinaryPbIndexer
- with:
- index_filename: ground_truth.gz
- metas:
- name: groundtruth_index # a customized name
- workspace: ${{TEST_WORKDIR}}
- - !BaseEvaluator
-
- Without defining any ``requests.on`` logic. When loaded from this YAML, it will be automatically equipped with
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- on:
- [SearchRequest, IndexRequest]:
- - !LoadGroundTruthDriver
- with:
- executor: BaseKVIndexer
- - !BaseEvaluateDriver
- with:
- executor: BaseEvaluator
- ControlRequest:
- - !ControlReqDriver {}
- """
diff --git a/jina/executors/evaluators/embedding/__init__.py b/jina/executors/evaluators/embedding/__init__.py
deleted file mode 100644
index 97bdd8176da0b..0000000000000
--- a/jina/executors/evaluators/embedding/__init__.py
+++ /dev/null
@@ -1,34 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import numpy as np
-
-from .. import BaseEvaluator
-
-
-class BaseEmbeddingEvaluator(BaseEvaluator):
- """A :class:`BaseEmbeddingEvaluator` evaluates the difference between actual and desired embeddings"""
-
- def evaluate(
- self, actual: 'np.array', desired: 'np.array', *args, **kwargs
- ) -> float:
- """ "
- :param actual: the embedding of the document (resulting from an Encoder)
- :param desired: the expected embedding of the document
- :return the evaluation metric value for the request document
- """
- raise NotImplementedError
-
-
-def expand_vector(vec):
- """
- Expand 1d vector with one dimension axis == 0.
-
- :param vec: Vector to be expanded.
- :return: Expanded vector.
- """
- if not isinstance(vec, np.ndarray):
- vec = np.array(vec)
- if len(vec.shape) == 1:
- vec = np.expand_dims(vec, 0)
- return vec
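`expand_vector` simply promotes 1-d inputs to `1 x D` so the pairwise distance helpers in the sibling modules always see 2-d arrays:

```python
import numpy as np

def expand_vector(vec):
    if not isinstance(vec, np.ndarray):
        vec = np.array(vec)
    if len(vec.shape) == 1:
        vec = np.expand_dims(vec, 0)
    return vec

assert expand_vector([1.0, 2.0, 3.0]).shape == (1, 3)   # 1-d input promoted
assert expand_vector(np.ones((2, 3))).shape == (2, 3)   # 2-d input untouched
```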
diff --git a/jina/executors/evaluators/embedding/cosine.py b/jina/executors/evaluators/embedding/cosine.py
deleted file mode 100644
index 4e6436954b5db..0000000000000
--- a/jina/executors/evaluators/embedding/cosine.py
+++ /dev/null
@@ -1,63 +0,0 @@
-import numpy as np
-
-from ..embedding import BaseEmbeddingEvaluator, expand_vector
-
-
-class CosineEvaluator(BaseEmbeddingEvaluator):
- """A :class:`CosineEvaluator` evaluates the distance between actual and desired embeddings computing
- the cosine distance between them. (The smaller value the closest distance, it is not cosine similarity measure)
-
- .. math::
-
- 1 - \\frac{u \\cdot v}
- {||u||_2 ||v||_2}.
- """
-
- metric = 'CosineDistance'
-
- def evaluate(
- self, actual: 'np.array', desired: 'np.array', *args, **kwargs
- ) -> float:
- """ "
- :param actual: the embedding of the document (resulting from an Encoder)
- :param desired: the expected embedding of the document
- :return the evaluation metric value for the request document
- """
- actual = expand_vector(actual)
- desired = expand_vector(desired)
- return _cosine(_ext_A(_norm(actual)), _ext_B(_norm(desired)))
-
-
-# duplicate on purpose, to be migrated to the Hub
-def _get_ones(x, y):
- return np.ones((x, y))
-
-
-def _ext_A(A):
- nA, dim = A.shape
- A_ext = _get_ones(nA, dim * 3)
- A_ext[:, dim : 2 * dim] = A
- A_ext[:, 2 * dim :] = A ** 2
- return A_ext
-
-
-def _ext_B(B):
- nB, dim = B.shape
- B_ext = _get_ones(dim * 3, nB)
- B_ext[:dim] = (B ** 2).T
- B_ext[dim : 2 * dim] = -2.0 * B.T
- del B
- return B_ext
-
-
-def _euclidean(A_ext, B_ext):
- sqdist = A_ext.dot(B_ext).clip(min=0)
- return np.sqrt(sqdist)
-
-
-def _norm(A):
- return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True)
-
-
-def _cosine(A_norm_ext, B_norm_ext):
- return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2
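The `_ext_A`/`_ext_B` helpers implement a one-matmul trick: row `[1, a, a²]` dotted with column `[b², -2b, 1]` gives `||b||² - 2a·b + ||a||² = ||a - b||²`, and on unit-normalised rows that equals `2(1 - cos θ)`, so halving it yields the cosine distance for every pair at once. A quick numerical check of the removed helpers against the naive formula (numpy only):

```python
import numpy as np

def _ext_A(A):
    nA, dim = A.shape
    A_ext = np.ones((nA, dim * 3))
    A_ext[:, dim:2 * dim] = A
    A_ext[:, 2 * dim:] = A ** 2
    return A_ext

def _ext_B(B):
    nB, dim = B.shape
    B_ext = np.ones((dim * 3, nB))
    B_ext[:dim] = (B ** 2).T
    B_ext[dim:2 * dim] = -2.0 * B.T
    return B_ext

def _norm(A):
    return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True)

rng = np.random.default_rng(0)
A, B = rng.random((5, 8)), rng.random((7, 8))
# one matmul yields all pairwise ||a-b||^2; on unit vectors that is 2(1 - cos)
fast = _ext_A(_norm(A)).dot(_ext_B(_norm(B))).clip(min=0) / 2
naive = 1 - _norm(A) @ _norm(B).T
assert np.allclose(fast, naive)
```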
diff --git a/jina/executors/evaluators/embedding/euclidean.py b/jina/executors/evaluators/embedding/euclidean.py
deleted file mode 100644
index 44cae756342cb..0000000000000
--- a/jina/executors/evaluators/embedding/euclidean.py
+++ /dev/null
@@ -1,58 +0,0 @@
-import numpy as np
-
-from ..embedding import BaseEmbeddingEvaluator, expand_vector
-
-
-class EuclideanEvaluator(BaseEmbeddingEvaluator):
- """A :class:`EuclideanEvaluator` evaluates the distance between actual and desired embeddings computing
- the euclidean distance between them
- """
-
- metric = 'EuclideanDistance'
-
- def evaluate(
- self, actual: 'np.array', desired: 'np.array', *args, **kwargs
- ) -> float:
- """ "
- :param actual: the embedding of the document (resulting from an Encoder)
- :param desired: the expected embedding of the document
- :return the evaluation metric value for the request document
- """
- actual = expand_vector(actual)
- desired = expand_vector(desired)
- return _euclidean(_ext_A(actual), _ext_B(desired))
-
-
-# duplicate on purpose, to be migrated to the Hub
-def _get_ones(x, y):
- return np.ones((x, y))
-
-
-def _ext_A(A):
- nA, dim = A.shape
- A_ext = _get_ones(nA, dim * 3)
- A_ext[:, dim : 2 * dim] = A
- A_ext[:, 2 * dim :] = A ** 2
- return A_ext
-
-
-def _ext_B(B):
- nB, dim = B.shape
- B_ext = _get_ones(dim * 3, nB)
- B_ext[:dim] = (B ** 2).T
- B_ext[dim : 2 * dim] = -2.0 * B.T
- del B
- return B_ext
-
-
-def _euclidean(A_ext, B_ext):
- sqdist = A_ext.dot(B_ext).clip(min=0)
- return np.sqrt(sqdist)
-
-
-def _norm(A):
- return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True)
-
-
-def _cosine(A_norm_ext, B_norm_ext):
- return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2
diff --git a/jina/executors/evaluators/rank/__init__.py b/jina/executors/evaluators/rank/__init__.py
deleted file mode 100644
index afe858bbebf01..0000000000000
--- a/jina/executors/evaluators/rank/__init__.py
+++ /dev/null
@@ -1,22 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Sequence, Any
-
-from .. import BaseEvaluator
-
-
-class BaseRankingEvaluator(BaseEvaluator):
- """A :class:`BaseRankingEvaluator` evaluates the content of matches against the expected GroundTruth.
- It is used to evaluate messages coming out from Indexers and Rankers and compares matches with groundtruths
- """
-
- def evaluate(
- self, actual: Sequence[Any], desired: Sequence[Any], *args, **kwargs
- ) -> float:
- """ "
- :param actual: the matched document identifiers from the request as matched by jina indexers and rankers
- :param desired: the expected documents matches ids sorted as they are expected
- :return the evaluation metric value for the request document
- """
- raise NotImplementedError
diff --git a/jina/executors/evaluators/rank/precision.py b/jina/executors/evaluators/rank/precision.py
deleted file mode 100644
index 85b7f8b7b2b8c..0000000000000
--- a/jina/executors/evaluators/rank/precision.py
+++ /dev/null
@@ -1,31 +0,0 @@
-from typing import Sequence, Any, Optional
-
-from ..rank import BaseRankingEvaluator
-
-
-class PrecisionEvaluator(BaseRankingEvaluator):
- """A :class:`PrecisionEvaluator` evaluates the Precision of the search.
- It computes how many of the first given `eval_at` matches are found in the groundtruth
- """
-
- def __init__(self, eval_at: Optional[int] = None, *args, **kwargs):
- """ "
- :param eval_at: the point at which evaluation is computed, if None give, will consider all the input to evaluate
- """
- super().__init__(*args, **kwargs)
- self.eval_at = eval_at
-
- def evaluate(
- self, actual: Sequence[Any], desired: Sequence[Any], *args, **kwargs
- ) -> float:
- """ "
- :param actual: the matched document identifiers from the request as matched by jina indexers and rankers
- :param desired: the expected documents matches ids sorted as they are expected
- :return the evaluation metric value for the request document
- """
- if self.eval_at == 0:
- return 0.0
- actual_at_k = actual[: self.eval_at] if self.eval_at else actual
- ret = len(set(actual_at_k).intersection(set(desired)))
- sub = len(actual_at_k)
- return ret / sub if sub != 0 else 0.0
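In other words, `evaluate` computes precision@k: hits among the first `eval_at` matches divided by the number of matches considered. A worked example with hypothetical document ids:

```python
def precision_at_k(actual, desired, eval_at=None):
    if eval_at == 0:
        return 0.0
    actual_at_k = actual[:eval_at] if eval_at else actual
    hits = len(set(actual_at_k).intersection(set(desired)))
    return hits / len(actual_at_k) if actual_at_k else 0.0

matches = ['d1', 'd9', 'd3', 'd7']
groundtruth = ['d1', 'd3', 'd5']
assert precision_at_k(matches, groundtruth, eval_at=2) == 0.5  # d1 out of [d1, d9]
assert precision_at_k(matches, groundtruth) == 0.5             # 2 hits out of 4 matches
```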
diff --git a/jina/executors/evaluators/rank/recall.py b/jina/executors/evaluators/rank/recall.py
deleted file mode 100644
index a363d54506677..0000000000000
--- a/jina/executors/evaluators/rank/recall.py
+++ /dev/null
@@ -1,30 +0,0 @@
-from typing import Sequence, Any, Optional
-
-from . import BaseRankingEvaluator
-
-
-class RecallEvaluator(BaseRankingEvaluator):
- """A :class:`RecallEvaluator` evaluates the Precision of the search.
- It computes how many of the first given `eval_at` groundtruth are found in the matches
- """
-
- def __init__(self, eval_at: Optional[int] = None, *args, **kwargs):
- """ "
- :param eval_at: the point at which evaluation is computed, if None give, will consider all the input to evaluate
- """
- super().__init__(*args, **kwargs)
- self.eval_at = eval_at
-
- def evaluate(
- self, actual: Sequence[Any], desired: Sequence[Any], *args, **kwargs
- ) -> float:
- """ "
- :param actual: the matched document identifiers from the request as matched by jina indexers and rankers
- :param desired: the expected documents matches ids sorted as they are expected
- :return the evaluation metric value for the request document
- """
- if self.eval_at == 0:
- return 0.0
- actual_at_k = actual[: self.eval_at] if self.eval_at else actual
- ret = len(set(actual_at_k).intersection(set(desired)))
- return ret / len(desired)
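`RecallEvaluator` differs only in the denominator: hits are divided by the total number of groundtruth ids rather than by the number of matches considered. Continuing the same hypothetical ids:

```python
def recall_at_k(actual, desired, eval_at=None):
    if eval_at == 0:
        return 0.0
    actual_at_k = actual[:eval_at] if eval_at else actual
    return len(set(actual_at_k).intersection(set(desired))) / len(desired)

assert recall_at_k(['d1', 'd9', 'd3', 'd7'], ['d1', 'd3', 'd5'], eval_at=2) == 1 / 3
assert recall_at_k(['d1', 'd9', 'd3', 'd7'], ['d1', 'd3', 'd5']) == 2 / 3
```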
diff --git a/jina/executors/evaluators/running_stats.py b/jina/executors/evaluators/running_stats.py
deleted file mode 100644
index 72979d8a0213e..0000000000000
--- a/jina/executors/evaluators/running_stats.py
+++ /dev/null
@@ -1,46 +0,0 @@
-"""Decorators and wrappers designed for wrapping :class:`BaseExecutor` functions. """
-
-from math import sqrt
-
-
-class RunningStats:
- """Computes running mean and standard deviation"""
-
- def __init__(self):
- """Constructor."""
- self._n = 0
- self._m = None
- self._s = None
-
- def clear(self):
- """Reset the stats."""
- self._n = 0.0
-
- @property
- def mean(self):
- """Get the running mean."""
- return self._m if self._n else 0.0
-
- @property
- def variance(self):
- """Get the running variance."""
- return self._s / self._n if self._n else 0.0
-
- @property
- def std(self):
- """Get the standard variance."""
- return sqrt(self.variance)
-
- def __add__(self, x: float):
- self._n += 1
- if self._n == 1:
- self._m = x
- self._s = 0.0
- else:
- prev_m = self._m
- self._m += (x - self._m) / self._n
- self._s += (x - prev_m) * (x - self._m)
- return self
-
- def __str__(self):
- return f'mean={self.mean:2.4f}, std={self.std:2.4f}'
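`__add__` above is Welford's online algorithm: the mean is updated incrementally, `_s` accumulates the sum of squared deviations, and `variance` divides by `n`, i.e. it is the population variance (`ddof=0`). A condensed copy checked against numpy's batch statistics:

```python
import numpy as np

class RunningStats:
    def __init__(self):
        self._n, self._m, self._s = 0, None, None

    def __add__(self, x):
        self._n += 1
        if self._n == 1:
            self._m, self._s = x, 0.0
        else:
            prev_m = self._m
            self._m += (x - prev_m) / self._n           # incremental mean
            self._s += (x - prev_m) * (x - self._m)     # sum of squared deviations
        return self

    @property
    def mean(self):
        return self._m if self._n else 0.0

    @property
    def variance(self):
        return self._s / self._n if self._n else 0.0

rs = RunningStats()
xs = [0.5, 1.5, 2.0, 4.0]
for x in xs:
    rs += x
assert np.isclose(rs.mean, np.mean(xs))
assert np.isclose(rs.variance, np.var(xs))  # np.var defaults to ddof=0
```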
diff --git a/jina/executors/evaluators/text/__init__.py b/jina/executors/evaluators/text/__init__.py
deleted file mode 100644
index 9d17747869ffd..0000000000000
--- a/jina/executors/evaluators/text/__init__.py
+++ /dev/null
@@ -1,18 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Any
-
-from .. import BaseEvaluator
-
-
-class BaseTextEvaluator(BaseEvaluator):
- """A :class:`BaseTextEvaluator` evaluates the difference between actual and desired text"""
-
- def evaluate(self, actual: Any, desired: Any, *args, **kwargs) -> float:
- """ "
- :param actual: the content of the document
- :param desired: the expected content of the document
- :return the evaluation metric value for the request document
- """
- raise NotImplementedError
diff --git a/jina/executors/evaluators/text/length.py b/jina/executors/evaluators/text/length.py
deleted file mode 100644
index 1207f7c2b0097..0000000000000
--- a/jina/executors/evaluators/text/length.py
+++ /dev/null
@@ -1,18 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from . import BaseTextEvaluator
-
-
-class TextLengthEvaluator(BaseTextEvaluator):
- """A :class:`TextLengthEvaluator` evaluates the different lengths between actual and desired text"""
-
- metric = 'LengthDiff'
-
- def evaluate(self, actual: str, desired: str, *args, **kwargs) -> float:
- """ "
- :param actual: the text of the document
- :param desired: the expected text of the document
- :return the evaluation metric value for the request document
- """
- return abs(len(actual) - len(desired))
diff --git a/jina/executors/indexers/__init__.py b/jina/executors/indexers/__init__.py
deleted file mode 100644
index 99246d7c281f7..0000000000000
--- a/jina/executors/indexers/__init__.py
+++ /dev/null
@@ -1,456 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import os
-from typing import Tuple, Optional, Any, Iterable
-
-import numpy as np
-
-from .. import BaseExecutor
-from ..compound import CompoundExecutor
-from ...helper import call_obj_fn, cached_property, get_readable_size
-
-if False:
- from typing import TypeVar
- import scipy
- import tensorflow as tf
- import torch
-
- EncodingType = TypeVar(
- 'EncodingType',
- np.ndarray,
- scipy.sparse.csr_matrix,
- scipy.sparse.coo_matrix,
- scipy.sparse.bsr_matrix,
- scipy.sparse.csc_matrix,
- torch.sparse_coo_tensor,
- tf.SparseTensor,
- )
-
-
-class BaseIndexer(BaseExecutor):
- """Base class for storing and searching any kind of data structure.
-
- The key functions here are :func:`add` and :func:`query`.
- One can decorate them with :func:`jina.helper.batching` and :func:`jina.logging.profile.profiling`.
-
- One should always inherit from either :class:`BaseVectorIndexer` or :class:`BaseKVIndexer`.
-
- .. seealso::
- :mod:`jina.drivers.handlers.index`
-
- .. note::
- Calling :func:`save` to save a :class:`BaseIndexer` will create
- more than one file. One is the serialized version of the :class:`BaseIndexer` object, which often ends with ``.bin``
-
- .. warning::
- When using :class:`BaseIndexer` out of the Pod, use it with context manager
-
- .. highlight:: python
- .. code-block:: python
-
- with BaseIndexer() as b:
- b.add()
-
- so that it can safely save the data; otherwise you have to manually call `b.close()` to close the indexer safely.
-
- :param index_filename: the name of the file for storing the index, when not given metas.name is used.
- :param args: Additional positional arguments which are just used for the parent initialization
- :param kwargs: Additional keyword arguments which are just used for the parent initialization
- """
-
- def __init__(
- self,
- index_filename: Optional[str] = None,
- key_length: int = 36,
- *args,
- **kwargs,
- ):
- super().__init__(*args, **kwargs)
- self.index_filename = (
- index_filename #: the file name of the stored index, no path is required
- )
- self.key_length = key_length #: the default minimum length of the key, will be expanded one time on the first batch
- self._size = 0
-
- def add(self, *args, **kwargs):
- """
- Add documents to the index.
-
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def update(self, *args, **kwargs):
- """
- Update documents on the index.
-
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def delete(self, *args, **kwargs):
- """
- Delete documents from the index.
-
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def post_init(self):
- """query handler and write handler can not be serialized, thus they must be put into :func:`post_init`. """
- self.index_filename = self.index_filename or self.name
- self.handler_mutex = True #: only one handler at a time by default
- self.is_handler_loaded = False
-
- def query(self, *args, **kwargs):
- """
- Query documents from the index.
-
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- @property
- def index_abspath(self) -> str:
- """
- Get the file path of the index storage
-
- :return: absolute path
- """
- return self.get_file_from_workspace(self.index_filename)
-
- @cached_property
- def query_handler(self):
- """A readable and indexable object, could be dict, map, list, numpy array etc.
-
- :return: read handler
-
- .. note::
- :attr:`query_handler` and :attr:`write_handler` are by default mutex
- """
- r = None
- if not self.handler_mutex or not self.is_handler_loaded:
- r = self.get_query_handler()
- if r is None:
- self.logger.warning(
- f'you can not query from {self} as its "query_handler" is not set. '
- 'If you are indexing data from scratch then it is fine. '
- 'If you are querying data then the index file must be empty or broken.'
- )
- else:
- self.logger.info(f'indexer size: {self.size}')
- self.is_handler_loaded = True
- if r is None:
- r = self.null_query_handler
- return r
-
- @cached_property
- def null_query_handler(self) -> Optional[Any]:
- """The empty query handler when :meth:`get_query_handler` fails
-
- :return: nothing
- """
- return
-
- @property
- def is_exist(self) -> bool:
- """
- Check whether the database exists
-
- :return: true if the absolute index path exists, else false
- """
- return os.path.exists(self.index_abspath)
-
- @cached_property
- def write_handler(self):
- """A writable and indexable object, could be dict, map, list, numpy array etc.
-
- :return: write handler
-
- .. note::
- :attr:`query_handler` and :attr:`write_handler` are by default mutex
- """
-
- # ! a || ( a && b )
- # =
- # ! a || b
- if not self.handler_mutex or not self.is_handler_loaded:
- r = self.get_add_handler() if self.is_exist else self.get_create_handler()
-
- if r is None:
- self.logger.warning(
- '"write_handler" is None, you may not add data to this index, '
- 'unless "write_handler" is later assigned with a meaningful value'
- )
- else:
- self.is_handler_loaded = True
- return r
-
- def get_query_handler(self):
- """Get a *readable* index handler when the ``index_abspath`` already exist, need to be overridden"""
- raise NotImplementedError
-
- def get_add_handler(self):
- """Get a *writable* index handler when the ``index_abspath`` already exist, need to be overridden"""
- raise NotImplementedError
-
- def get_create_handler(self):
- """Get a *writable* index handler when the ``index_abspath`` does not exist, need to be overridden"""
- raise NotImplementedError
-
- @property
- def size(self) -> int:
- """
- The number of vectors or documents indexed.
-
- :return: size
- """
- return self._size
-
- def __getstate__(self):
- d = super().__getstate__()
- self.flush()
- return d
-
- def close(self):
- """Close all file-handlers and release all resources. """
- self.logger.info(
- f'indexer size: {self.size} physical size: {get_readable_size(self.physical_size)}'
- )
- self.flush()
- call_obj_fn(self.write_handler, 'close')
- call_obj_fn(self.query_handler, 'close')
- super().close()
-
- def flush(self):
- """Flush all buffered data to ``index_abspath`` """
- try:
- # It may have already been closed by the Pea using context manager
- call_obj_fn(self.write_handler, 'flush')
- except:
- pass
-
- def _filter_nonexistent_keys_values(
- self, keys: Iterable, values: Iterable, existent_keys: Iterable
- ) -> Tuple[Iterable, Iterable]:
- f = [(key, value) for key, value in zip(keys, values) if key in existent_keys]
- if f:
- return zip(*f)
- else:
- return None, None
-
- def _filter_nonexistent_keys(
- self, keys: Iterable, existent_keys: Iterable
- ) -> Iterable:
- return [key for key in keys if key in set(existent_keys)]
-
- def sample(self):
- """Return a sample from this indexer, useful in sanity check """
- raise NotImplementedError
-
- def __iter__(self):
- """Iterate over all entries in this indexer. """
- raise NotImplementedError
-
-
-class BaseVectorIndexer(BaseIndexer):
- """An abstract class for vector indexer. It is equipped with drivers in ``requests.on``
-
- All vector indexers should inherit from it.
-
- It can be used to tell whether an indexer is vector indexer, via ``isinstance(a, BaseVectorIndexer)``
- """
-
- embedding_cls_type = 'dense'
-
- def query_by_key(self, keys: Iterable[str], *args, **kwargs) -> 'np.ndarray':
- """Get the vectors by id, return a subset of indexed vectors
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def add(
- self, keys: Iterable[str], vectors: 'EncodingType', *args, **kwargs
- ) -> None:
- """Add new chunks and their vector representations
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param vectors: vector representations in B x D
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def query(
- self, vectors: 'EncodingType', top_k: int, *args, **kwargs
- ) -> Tuple['np.ndarray', 'np.ndarray']:
- """Find k-NN using query vectors, return chunk ids and chunk scores
-
- :param vectors: query vectors in ndarray, shape B x D
- :param top_k: int, the number of nearest neighbours to return
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def update(
- self, keys: Iterable[str], vectors: 'EncodingType', *args, **kwargs
- ) -> None:
- """Update vectors on the index.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param vectors: vector representations in B x D
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def delete(self, keys: Iterable[str], *args, **kwargs) -> None:
- """Delete vectors from the index.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
-
-class BaseKVIndexer(BaseIndexer):
- """An abstract class for key-value indexer.
-
- All key-value indexers should inherit from it.
-
- It can be used to tell whether an indexer is key-value indexer, via ``isinstance(a, BaseKVIndexer)``
- """
-
- def add(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Add the serialized documents to the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param values: serialized documents
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def query(self, key: str, *args, **kwargs) -> Optional[bytes]:
- """Find the serialized document to the index via document id.
-
- :param key: document id
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def update(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Update the serialized documents on the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param values: serialized documents
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def delete(self, keys: Iterable[str], *args, **kwargs) -> None:
- """Delete the serialized documents from the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: Additional positional arguments
- :param kwargs: Additional keyword arguments
- """
- raise NotImplementedError
-
- def __getitem__(self, key: Any) -> Optional[bytes]:
- return self.query(key)
-
-
-class UniqueVectorIndexer(CompoundExecutor):
- """A frequently used pattern for combining a :class:`BaseVectorIndexer` and a :class:`DocCache` """
-
-
-class CompoundIndexer(CompoundExecutor):
- """A Frequently used pattern for combining A :class:`BaseVectorIndexer` and :class:`BaseKVIndexer`.
- It will be equipped with predefined ``requests.on`` behaviors:
-
- - At index time
- - 1. stores the vector via :class:`BaseVectorIndexer`
- - 2. removes all vector information (embedding, buffer, blob, text)
- - 3. stores the remaining meta information via :class:`BaseKVIndexer`
- - At query time
- - 1. finds the k-NN using the vector via :class:`BaseVectorIndexer`
- - 2. removes all vector information (embedding, buffer, blob, text)
- - 3. fills in the meta information of the document via :class:`BaseKVIndexer`
-
- One can use the :class:`ChunkIndexer` via
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- !ChunkIndexer
- components:
- - !NumpyIndexer
- with:
- index_filename: vec.gz
- metas:
- name: vecidx # a customized name
- workspace: ${{TEST_WORKDIR}}
- - !BinaryPbIndexer
- with:
- index_filename: chunk.gz
- metas:
- name: chunkidx # a customized name
- workspace: ${{TEST_WORKDIR}}
- metas:
- name: chunk_compound_indexer
- workspace: ${{TEST_WORKDIR}}
-
- Without defining any ``requests.on`` logic. When loaded from this YAML, it will be automatically equipped with
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- on:
- SearchRequest:
- - !VectorSearchDriver
- with:
- executor: BaseVectorIndexer
- - !PruneDriver
- with:
- pruned:
- - embedding
- - buffer
- - blob
- - text
- - !KVSearchDriver
- with:
- executor: BaseKVIndexer
- IndexRequest:
- - !VectorIndexDriver
- with:
- executor: BaseVectorIndexer
- - !PruneDriver
- with:
- pruned:
- - embedding
- - buffer
- - blob
- - text
- - !KVIndexDriver
- with:
- executor: BaseKVIndexer
- ControlRequest:
- - !ControlReqDriver {}
- """
diff --git a/jina/executors/indexers/cache.py b/jina/executors/indexers/cache.py
deleted file mode 100644
index b6d28c78f3593..0000000000000
--- a/jina/executors/indexers/cache.py
+++ /dev/null
@@ -1,175 +0,0 @@
-"""Indexer for caching."""
-
-import pickle
-import tempfile
-from typing import Optional, Iterable, List, Tuple, Union
-
-from jina.executors.indexers import BaseKVIndexer
-from jina.helper import deprecated_alias
-
-DATA_FIELD = 'data'
-ID_KEY = 'id'
-CONTENT_HASH_KEY = 'content_hash'
-
-
-class BaseCache(BaseKVIndexer):
- """Base class of the cache inherited :class:`BaseKVIndexer`.
-
- The difference between a cache and a :class:`BaseKVIndexer` is the ``handler_mutex`` is released in cache,
- this allows one to query-while-indexing.
-
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- def __init__(self, *args, **kwargs):
- super().__init__(*args, **kwargs)
-
- def post_init(self):
- """For Cache we need to release the handler mutex to allow RW at the same time."""
- self.handler_mutex = False
-
-
-class DocCache(BaseCache):
- """A key-value indexer that specializes in caching.
-
- Serializes the cache to two files, one for ids, one for the actually cached field.
- If fields=["id"], then the second file is redundant. The class optimizes the process
- so that there are no duplicates.
-
- Order of fields does NOT affect the caching.
-
- :param index_filename: file name for storing the cache data
- :param fields: fields to cache on (of Document)
- :param args: additional positional arguments which are just used for the parent initialization
- :param kwargs: additional key value arguments which are just used for the parent initialization
- """
-
- class CacheHandler:
- """A handler for loading and serializing the in-memory cache of the DocCache.
-
- :param path: Path to the file from which to build the actual paths.
- :param logger: Instance of logger.
- """
-
- def __init__(self, path, logger):
- self.path = path
- try:
- self.id_to_cache_val = pickle.load(open(path + '.ids', 'rb'))
- self.cache_val_to_id = pickle.load(open(path + '.cache', 'rb'))
- except FileNotFoundError as e:
- logger.warning(
- f'File path did not exist : {path}.ids or {path}.cache: {e!r}. Creating new CacheHandler...'
- )
- self.id_to_cache_val = dict()
- self.cache_val_to_id = dict()
-
- def close(self):
- """Flushes the in-memory cache to pickle files."""
- pickle.dump(self.id_to_cache_val, open(self.path + '.ids', 'wb'))
- pickle.dump(self.cache_val_to_id, open(self.path + '.cache', 'wb'))
-
- default_fields = (ID_KEY,)
-
- @deprecated_alias(field=('fields', 0))
- def __init__(
- self,
- index_filename: Optional[str] = None,
- fields: Optional[
- Union[str, Tuple[str]]
- ] = None, # str for backwards compatibility
- *args,
- **kwargs,
- ):
- if not index_filename:
- # create a new temp file if not exist
- index_filename = tempfile.NamedTemporaryFile(delete=False).name
- super().__init__(index_filename, *args, **kwargs)
- if isinstance(fields, str):
- fields = (fields,)
- # order shouldn't matter
- self.fields = sorted(fields or self.default_fields)
-
- def add(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Add a document to the cache depending.
-
- :param keys: document ids to be added
- :param values: document cache values to be added
- :param args: not used
- :param kwargs: not used
- """
- for key, value in zip(keys, values):
- self.query_handler.id_to_cache_val[key] = value
- self.query_handler.cache_val_to_id[value] = key
- self._size += 1
-
- def query(self, key: str, *args, **kwargs) -> bool:
- """Check whether the data exists in the cache.
-
- :param key: the value that we cached by (combination of the Document fields)
- :param args: not used
- :param kwargs: not used
- :return: status
- """
- return key in self.query_handler.cache_val_to_id
-
- def update(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Update cached documents.
-
- :param keys: list of Document.id
- :param values: list of values (combination of the Document fields)
- :param args: not used
- :param kwargs: not used
- """
-
- if len(self.fields) == 1 and self.fields[0] == ID_KEY:
- # if we don't cache anything else, no need
- return
-
- for key, value in zip(keys, values):
- if key not in self.query_handler.id_to_cache_val:
- continue
- old_value = self.query_handler.id_to_cache_val[key]
- self.query_handler.id_to_cache_val[key] = value
- del self.query_handler.cache_val_to_id[old_value]
- self.query_handler.cache_val_to_id[value] = key
-
- def delete(self, keys: Iterable[str], *args, **kwargs) -> None:
- """Delete documents from the cache.
-
- :param keys: list of Document.id
- :param args: not used
- :param kwargs: not used
- """
- for key in keys:
- if key not in self.query_handler.id_to_cache_val:
- continue
- value = self.query_handler.id_to_cache_val[key]
- del self.query_handler.id_to_cache_val[key]
- del self.query_handler.cache_val_to_id[value]
- self._size -= 1
-
- def get_add_handler(self):
- """Get the CacheHandler.
-
-
- .. # noqa: DAR201"""
- return self.get_query_handler()
-
- def get_query_handler(self) -> CacheHandler:
- """Get the CacheHandler.
-
-
- .. # noqa: DAR201"""
- return self.CacheHandler(self.save_abspath, self.logger)
-
- def get_create_handler(self):
- """Get the CacheHandler.
-
-
- .. # noqa: DAR201"""
- return self.get_query_handler()
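`DocCache` keeps two dicts in sync, one keyed by document id and one by the cached value (for example a content hash), so membership can be tested in either direction. The bookkeeping in miniature, with hypothetical names:

```python
class TwoWayCache:
    def __init__(self):
        self.id_to_val, self.val_to_id = {}, {}

    def add(self, doc_id, value):
        self.id_to_val[doc_id] = value
        self.val_to_id[value] = doc_id

    def is_hit(self, value):
        # mirrors `query` above: membership is tested by value, not by id
        return value in self.val_to_id

    def delete(self, doc_id):
        value = self.id_to_val.pop(doc_id, None)
        self.val_to_id.pop(value, None)

c = TwoWayCache()
c.add('doc1', 'hash-abc')
assert c.is_hit('hash-abc')
c.delete('doc1')
assert not c.is_hit('hash-abc')
```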
diff --git a/jina/executors/indexers/dbms/__init__.py b/jina/executors/indexers/dbms/__init__.py
deleted file mode 100644
index f5599ad452837..0000000000000
--- a/jina/executors/indexers/dbms/__init__.py
+++ /dev/null
@@ -1,60 +0,0 @@
-from typing import Optional, List
-
-import numpy as np
-from jina.executors.indexers import BaseIndexer
-
-
-class BaseDBMSIndexer(BaseIndexer):
- """A class only meant for storing (indexing, update, delete) of data"""
-
- def add(
- self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, **kwargs
- ):
- """Add to the DBMS Indexer, both vectors and metadata
-
- :param ids: the ids of the documents
- :param vecs: the vectors
- :param metas: the metadata, in binary format
- :param args: not used
- :param kwargs: not used
- """
- raise NotImplementedError
-
- def update(
- self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, **kwargs
- ):
- """Update the DBMS Indexer, both vectors and metadata
-
- :param ids: the ids of the documents
- :param vecs: the vectors
- :param metas: the metadata, in binary format
- :param args: not used
- :param kwargs: not used
- """
- raise NotImplementedError
-
- def delete(self, ids: List[str], *args, **kwargs):
- """Delete from the indexer by ids
-
- :param ids: the ids of the Documents to delete
- :param args: not used
- :param kwargs: not used
- """
- raise NotImplementedError
-
- def dump(self, path: str, shards: int):
- """Dump the index
-
- :param path: the path to which to dump
- :param shards: the nr of shards to which to dump
- """
- raise NotImplementedError
-
- def query(self, key: str, *args, **kwargs) -> Optional[bytes]:
- """DBMSIndexers do NOT support querying
-
- :param key: the key by which to query
- :param args: not used
- :param kwargs: not used
- """
- raise NotImplementedError('DBMSIndexers do not support querying')
diff --git a/jina/executors/indexers/dbms/keyvalue.py b/jina/executors/indexers/dbms/keyvalue.py
deleted file mode 100644
index bf0df53d68a79..0000000000000
--- a/jina/executors/indexers/dbms/keyvalue.py
+++ /dev/null
@@ -1,94 +0,0 @@
-import pickle
-from typing import List, Tuple, Generator
-import numpy as np
-
-from jina import Document
-from jina.executors.indexers.dump import export_dump_streaming
-from jina.executors.indexers.dbms import BaseDBMSIndexer
-from jina.executors.indexers.keyvalue import BinaryPbWriterMixin
-
-
-class BinaryPbDBMSIndexer(BinaryPbWriterMixin, BaseDBMSIndexer):
- """A DBMS Indexer (no query method)"""
-
- def _get_generator(
- self, ids: List[str]
- ) -> Generator[Tuple[str, np.array, bytes], None, None]:
- for id_ in ids:
- vecs_metas_list_bytes = super()._query([id_])
- vec, meta = pickle.loads(vecs_metas_list_bytes[0])
- yield id_, vec, meta
-
- def dump(self, path: str, shards: int) -> None:
- """Dump the index
-
- :param path: the path to which to dump
- :param shards: the nr of shards to which to dump
- """
- self.write_handler.close()
- # noinspection PyPropertyAccess
- del self.write_handler
- self.handler_mutex = False
- ids = self.query_handler.header.keys()
- export_dump_streaming(
- path,
- shards=shards,
- size=len(ids),
- data=self._get_generator(ids),
- )
- self.query_handler.close()
- self.handler_mutex = False
- # noinspection PyPropertyAccess
- del self.query_handler
-
- def add(
- self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, **kwargs
- ):
- """Add to the DBMS Indexer, both vectors and metadata
-
- :param ids: the ids of the documents
- :param vecs: the vectors
- :param metas: the metadata, in binary format
- :param args: not used
- :param kwargs: not used
- """
- if not any(ids):
- return
-
- vecs_metas = [pickle.dumps([vec, meta]) for vec, meta in zip(vecs, metas)]
- with self.write_handler as write_handler:
- self._add(ids, vecs_metas, write_handler)
-
- def update(
- self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, **kwargs
- ):
- """Update the DBMS Indexer, both vectors and metadata
-
- :param ids: the ids of the documents
- :param vecs: the vectors
- :param metas: the metadata, in binary format
- :param args: not used
- :param kwargs: not used
- """
- vecs_metas = [pickle.dumps((vec, meta)) for vec, meta in zip(vecs, metas)]
- keys, vecs_metas = self._filter_nonexistent_keys_values(
- ids, vecs_metas, self.query_handler.header.keys()
- )
- del self.query_handler
- self.handler_mutex = False
- if keys:
- self._delete(keys)
- with self.write_handler as write_handler:
- self._add(keys, vecs_metas, write_handler)
-
- def delete(self, ids: List[str], *args, **kwargs):
- """Delete the serialized documents from the index via document ids.
-
- :param ids: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: not used
- :param kwargs: not used"""
- super(BinaryPbDBMSIndexer, self).delete(ids)
-
-
-class KeyValueDBMSIndexer(BinaryPbDBMSIndexer):
- """An alias"""
diff --git a/jina/executors/indexers/dump.py b/jina/executors/indexers/dump.py
deleted file mode 100644
index a7f20f600ab92..0000000000000
--- a/jina/executors/indexers/dump.py
+++ /dev/null
@@ -1,156 +0,0 @@
-import os
-import sys
-from typing import Tuple, Generator, BinaryIO, TextIO
-
-import numpy as np
-
-from jina.logging import JinaLogger
-
-BYTE_PADDING = 4
-DUMP_DTYPE = np.float64
-
-logger = JinaLogger(__name__)
-
-
-def export_dump_streaming(
- path: str,
- shards: int,
- size: int,
- data: Generator[Tuple[str, np.array, bytes], None, None],
-):
- """Export the data to a path, based on sharding,
-
- :param path: path to dump
-    :param shards: the number of shards this pea is part of
- :param size: total amount of entries
- :param data: the generator of the data (ids, vectors, metadata)
- """
- logger.info(f'Dumping {size} docs to {path} for {shards} shards')
- _handle_dump(data, path, shards, size)
-
-
-def _handle_dump(
- data: Generator[Tuple[str, np.array, bytes], None, None],
- path: str,
- shards: int,
- size: int,
-):
- if not os.path.exists(path):
- os.makedirs(path)
-
- # directory must be empty to be safe
- if not os.listdir(path):
- size_per_shard = size // shards
- extra = size % shards
- shard_range = list(range(shards))
- for shard_id in shard_range:
- if shard_id == shard_range[-1]:
- size_this_shard = size_per_shard + extra
- else:
- size_this_shard = size_per_shard
- _write_shard_data(data, path, shard_id, size_this_shard)
- else:
- raise Exception(
-            f'path for dump {path} already contains data. Please empty it. Not dumping...'
- )
-
-
-def _write_shard_data(
- data: Generator[Tuple[str, np.array, bytes], None, None],
- path: str,
- shard_id: int,
- size_this_shard: int,
-):
- shard_path = os.path.join(path, str(shard_id))
- shard_docs_written = 0
- os.makedirs(shard_path)
- vectors_fp, metas_fp, ids_fp = _get_file_paths(shard_path)
- with open(vectors_fp, 'wb') as vectors_fh, open(metas_fp, 'wb') as metas_fh, open(
- ids_fp, 'w'
- ) as ids_fh:
- while shard_docs_written < size_this_shard:
- _write_shard_files(data, ids_fh, metas_fh, vectors_fh)
- shard_docs_written += 1
-
-
-def _write_shard_files(
- data: Generator[Tuple[str, np.array, bytes], None, None],
- ids_fh: TextIO,
- metas_fh: BinaryIO,
- vectors_fh: BinaryIO,
-):
- id_, vec, meta = next(data)
- # need to ensure compatibility to read time
- vec = vec.astype(DUMP_DTYPE)
- vec_bytes = vec.tobytes()
- vectors_fh.write(len(vec_bytes).to_bytes(BYTE_PADDING, sys.byteorder) + vec_bytes)
- metas_fh.write(len(meta).to_bytes(BYTE_PADDING, sys.byteorder) + meta)
- ids_fh.write(id_ + '\n')
-
-
-def import_vectors(path: str, pea_id: str):
- """Import id and vectors
-
- :param path: the path to the dump
- :param pea_id: the id of the pea (as part of the shards)
- :return: the generators for the ids and for the vectors
- """
- logger.info(f'Importing ids and vectors from {path} for pea_id {pea_id}')
- path = os.path.join(path, pea_id)
- ids_gen = _ids_gen(path)
- vecs_gen = _vecs_gen(path)
- return ids_gen, vecs_gen
-
-
-def import_metas(path: str, pea_id: str):
- """Import id and metadata
-
- :param path: the path of the dump
- :param pea_id: the id of the pea (as part of the shards)
- :return: the generators for the ids and for the metadata
- """
- logger.info(f'Importing ids and metadata from {path} for pea_id {pea_id}')
- path = os.path.join(path, pea_id)
- ids_gen = _ids_gen(path)
- metas_gen = _metas_gen(path)
- return ids_gen, metas_gen
-
-
-def _ids_gen(path: str):
- with open(os.path.join(path, 'ids'), 'r') as ids_fh:
- for l in ids_fh:
- yield l.strip()
-
-
-def _vecs_gen(path: str):
- with open(os.path.join(path, 'vectors'), 'rb') as vectors_fh:
- while True:
- next_size = vectors_fh.read(BYTE_PADDING)
- next_size = int.from_bytes(next_size, byteorder=sys.byteorder)
- if next_size:
- vec = np.frombuffer(
- vectors_fh.read(next_size),
- dtype=DUMP_DTYPE,
- )
- yield vec
- else:
- break
-
-
-def _metas_gen(path: str):
- with open(os.path.join(path, 'metas'), 'rb') as metas_fh:
- while True:
- next_size = metas_fh.read(BYTE_PADDING)
- next_size = int.from_bytes(next_size, byteorder=sys.byteorder)
- if next_size:
- meta = metas_fh.read(next_size)
- yield meta
- else:
- break
-
-
-def _get_file_paths(shard_path: str):
- vectors_fp = os.path.join(shard_path, 'vectors')
- metas_fp = os.path.join(shard_path, 'metas')
- ids_fp = os.path.join(shard_path, 'ids')
- return vectors_fp, metas_fp, ids_fp
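The dump layout written by the module removed above is one sub-folder per shard, each holding three files: `ids` (newline-delimited text) plus `vectors` and `metas` (binary records, each prefixed by a 4-byte native-endian length; vectors are cast to `float64` on write). A round-trip sketch using only the functions from this module; note the target folder must be empty or absent, since `_handle_dump` refuses to overwrite:

```python
import numpy as np
from jina.executors.indexers.dump import (
    export_dump_streaming,
    import_metas,
    import_vectors,
)

def data_gen():
    for i in range(4):
        yield f'doc{i}', np.random.random(8), f'meta{i}'.encode()

export_dump_streaming('/tmp/my_dump', shards=2, size=4, data=data_gen())

# each shard reads back only its own sub-folder, addressed by pea_id
ids, vecs = import_vectors('/tmp/my_dump', pea_id='0')
for id_, vec in zip(ids, vecs):
    print(id_, vec.dtype, vec.shape)  # float64, (8,): cast to DUMP_DTYPE on write

ids, metas = import_metas('/tmp/my_dump', pea_id='1')
```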
diff --git a/jina/executors/indexers/keyvalue.py b/jina/executors/indexers/keyvalue.py
deleted file mode 100644
index 1941a7fe2a686..0000000000000
--- a/jina/executors/indexers/keyvalue.py
+++ /dev/null
@@ -1,352 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import mmap
-import os
-import random
-from typing import Iterable, Optional, Union, List
-
-import numpy as np
-
-from . import BaseKVIndexer
-from ..compound import CompoundExecutor
-
-HEADER_NONE_ENTRY = (-1, -1, -1)
-
-
-class _WriteHandler:
- """
- Write file handler.
-
- :param path: Path of the file.
- :param mode: Writing mode. (e.g. 'ab', 'wb')
- """
-
- def __init__(self, path, mode):
- self.path = path
- self.mode = mode
- self.body = open(self.path, self.mode)
- self.header = open(self.path + '.head', self.mode)
-
- def __enter__(self):
- if self.body.closed:
- self.body = open(self.path, self.mode)
- if self.header.closed:
- self.header = open(self.path + '.head', self.mode)
- return self
-
- def __exit__(self, exc_type, exc_val, exc_tb):
- self.flush()
-
- def close(self):
- """Close the file."""
- if not self.body.closed:
- self.body.close()
- if not self.header.closed:
- self.header.close()
-
- def flush(self):
- """Clear the body and header."""
- if not self.body.closed:
- self.body.flush()
- if not self.header.closed:
- self.header.flush()
-
-
-class _ReadHandler:
- """
- Read file handler.
-
- :param path: Path of the file.
- :param key_length: Length of key.
- """
-
- def __init__(self, path, key_length):
- self.path = path
- self.header = {}
- if os.path.exists(self.path + '.head'):
- with open(self.path + '.head', 'rb') as fp:
- tmp = np.frombuffer(
- fp.read(),
- dtype=[
- ('', (np.str_, key_length)),
- ('', np.int64),
- ('', np.int64),
- ('', np.int64),
- ],
- )
- self.header = {
- r[0]: None
- if np.array_equal((r[1], r[2], r[3]), HEADER_NONE_ENTRY)
- else (r[1], r[2], r[3])
- for r in tmp
- }
- if os.path.exists(self.path):
- self._body = open(self.path, 'r+b')
- self.body = self._body.fileno()
- else:
- raise FileNotFoundError(
- f'Path not found {self.path}. Querying will not work'
- )
- else:
- raise FileNotFoundError(
- f'Path not found {self.path + ".head"}. Querying will not work'
- )
-
- def close(self):
- """Close the file."""
- if hasattr(self, '_body'):
- if not self._body.closed:
- self._body.close()
-
-
-class _CloseHandler:
- def __init__(self, handler: Union['_WriteHandler', '_ReadHandler']):
- self.handler = handler
-
- def __enter__(self):
- return self
-
- def __exit__(self, exc_type, exc_val, exc_tb):
- if self.handler is not None:
- self.handler.close()
-
-
-class BinaryPbWriterMixin:
- """Mixing for providing the binarypb writing and reading methods"""
-
- def __init__(self, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self._start = 0
- self._page_size = mmap.ALLOCATIONGRANULARITY
-
- def get_add_handler(self) -> '_WriteHandler':
- """
- Get write file handler.
-
- :return: write handler
- """
- # keep _start position as in pickle serialization
- return _WriteHandler(self.index_abspath, 'ab')
-
- def get_create_handler(self) -> '_WriteHandler':
- """
- Get write file handler.
-
- :return: write handler.
- """
-
- self._start = 0 # override _start position
- return _WriteHandler(self.index_abspath, 'wb')
-
- def get_query_handler(self) -> '_ReadHandler':
- """
- Get read file handler.
-
- :return: read handler.
- """
- return _ReadHandler(self.index_abspath, self.key_length)
-
- def _add(
- self, keys: Iterable[str], values: Iterable[bytes], write_handler: _WriteHandler
- ):
- for key, value in zip(keys, values):
- l = len(value) #: the length
- p = (
- int(self._start / self._page_size) * self._page_size
- ) #: offset of the page
- r = (
- self._start % self._page_size
- ) #: the remainder, i.e. the start position given the offset
- # noinspection PyTypeChecker
- write_handler.header.write(
- np.array(
- (key, p, r, r + l),
- dtype=[
- ('', (np.str_, self.key_length)),
- ('', np.int64),
- ('', np.int64),
- ('', np.int64),
- ],
- ).tobytes()
- )
- self._start += l
- write_handler.body.write(value)
- self._size += 1
-
- def delete(self, keys: Iterable[str], *args, **kwargs) -> None:
- """Delete the serialized documents from the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: not used
- :param kwargs: not used
- """
- keys = self._filter_nonexistent_keys(keys, self.query_handler.header.keys())
- del self.query_handler
- self.handler_mutex = False
- if keys:
- self._delete(keys)
-
- def _delete(self, keys: Iterable[str]) -> None:
- with self.write_handler as write_handler:
- for key in keys:
- write_handler.header.write(
- np.array(
- tuple(np.concatenate([[key], HEADER_NONE_ENTRY])),
- dtype=[
- ('', (np.str_, self.key_length)),
- ('', np.int64),
- ('', np.int64),
- ('', np.int64),
- ],
- ).tobytes()
- )
- self._size -= 1
-
- def _query(self, keys: Iterable[str]) -> List[bytes]:
- query_results = []
- for key in keys:
- pos_info = self.query_handler.header.get(key, None)
- if pos_info is not None:
- p, r, l = pos_info
- with mmap.mmap(self.query_handler.body, offset=p, length=l) as m:
- query_results.append(m[r:])
- else:
- query_results.append(None)
-
- return query_results
-
-
-class BinaryPbIndexer(BinaryPbWriterMixin, BaseKVIndexer):
- """Simple Key-value indexer."""
-
- def __init__(self, delete_on_dump: bool = False, *args, **kwargs):
- super().__init__(*args, **kwargs)
- self.delete_on_dump = delete_on_dump
-
- def __getstate__(self):
- # called on pickle save
- if self.delete_on_dump:
- self._delete_invalid_indices()
- d = super().__getstate__()
- return d
-
- def _delete_invalid_indices(self):
- # make sure the file is closed before querying.
- with _CloseHandler(handler=self.write_handler):
- pass
-
- keys = []
- vals = []
- # we read the valid values and write them to the intermediary file
- with _CloseHandler(
- handler=_ReadHandler(self.index_abspath, self.key_length)
- ) as close_handler:
- for key in close_handler.handler.header.keys():
- pos_info = close_handler.handler.header.get(key, None)
- if pos_info:
- p, r, l = pos_info
- with mmap.mmap(close_handler.handler.body, offset=p, length=l) as m:
- keys.append(key)
- vals.append(m[r:])
- if len(keys) == 0:
- return
-
- # intermediary file
- tmp_file = self.index_abspath + '-tmp'
- self._start = 0
- with _CloseHandler(handler=_WriteHandler(tmp_file, 'ab')) as close_handler:
- # reset size
- self._size = 0
- self._add(keys, vals, write_handler=close_handler.handler)
-
- # replace orig. file
- # and .head file
- head_path = self.index_abspath + '.head'
- os.remove(self.index_abspath)
- os.remove(head_path)
- os.rename(tmp_file, self.index_abspath)
- os.rename(tmp_file + '.head', head_path)
-
- def add(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Add the serialized documents to the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param values: serialized documents
- :param args: extra arguments
- :param kwargs: keyword arguments
- """
- if not any(keys):
- return
-
- need_to_remove_handler = not self.is_exist
- with self.write_handler as writer_handler:
- self._add(keys, values, write_handler=writer_handler)
- if need_to_remove_handler:
-            # very hacky way to ensure write_handler will use add_handler at the next computation; this should be
-            # solved by touching the file at __init__ time
- del self.write_handler
- self.is_handler_loaded = False
-
- def sample(self) -> Optional[bytes]:
- """Return a random entry from the indexer for sanity check.
-
- :return: A random entry from the indexer.
- """
- k = random.sample(self.query_handler.header.keys(), k=1)[0]
- return self.query([k])[0]
-
- def __iter__(self):
- for k in self.query_handler.header.keys():
- yield self[k]
-
- def query(self, keys: Iterable[str], *args, **kwargs) -> Iterable[Optional[bytes]]:
- """Find the serialized document to the index via document id.
-
- :param keys: list of document ids
- :param args: extra arguments
- :param kwargs: keyword arguments
- :return: serialized documents
- """
- return self._query(keys)
-
- def update(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Update the serialized documents on the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param values: serialized documents
- :param args: extra arguments
- :param kwargs: keyword arguments
- """
- keys, values = self._filter_nonexistent_keys_values(
- keys, values, self.query_handler.header.keys()
- )
- del self.query_handler
- self.handler_mutex = False
- if keys:
- self._delete(keys)
- self.add(keys, values)
-
- def delete(self, keys: Iterable[str], *args, **kwargs) -> None:
- """Delete the serialized documents from the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: not used
- :param kwargs: not used"""
- super(BinaryPbIndexer, self).delete(keys)
-
-
-class KeyValueIndexer(BinaryPbIndexer):
- """Alias for :class:`BinaryPbIndexer` """
-
-
-class DataURIPbIndexer(BinaryPbIndexer):
- """Alias for BinaryPbIndexer"""
-
-
-class UniquePbIndexer(CompoundExecutor):
- """A frequently used pattern for combining a :class:`BaseKVIndexer` and a :class:`DocCache` """
diff --git a/jina/executors/indexers/query/__init__.py b/jina/executors/indexers/query/__init__.py
deleted file mode 100644
index d29aecbe91eac..0000000000000
--- a/jina/executors/indexers/query/__init__.py
+++ /dev/null
@@ -1,59 +0,0 @@
-from typing import Iterable, Optional, Dict
-
-from jina.executors.indexers import BaseIndexer
-
-
-class BaseQueryIndexer(BaseIndexer):
- """An indexer only for querying. It only reads once (at creation time, from a dump)"""
-
- def _post_init_wrapper(
- self,
- _metas: Optional[Dict] = None,
- _requests: Optional[Dict] = None,
- fill_in_metas: bool = True,
- ) -> None:
- super()._post_init_wrapper(_metas, _requests, fill_in_metas)
- self.dump_path = _metas.get('dump_path')
- # TODO this shouldn't be required
- # we don't do this for Compounds, as the _components
- # are not yet set at this stage.
- # for Compound we use a `_post_components`
- if self.dump_path and not hasattr(self, 'components'):
- self._load_dump(self.dump_path)
-
- def _load_dump(self, dump_path):
- """Load the dump at the dump_path
-
- :param dump_path: the path of the dump"""
- raise NotImplementedError
-
- def _log_warn(self):
- self.logger.error(f'Index {self.__class__} is write-once')
-
- def add(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Disabled. QueryIndexers are write-once (at instantiation time)
-
-
- .. # noqa: DAR101
- """
- self._log_warn()
-
- def update(
- self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs
- ) -> None:
- """Disabled. QueryIndexers are write-once (at instantiation time)
-
-
- .. # noqa: DAR101
- """
- self._log_warn()
-
- def delete(self, keys: Iterable[str], *args, **kwargs) -> None:
- """Disabled. QueryIndexers are write-once (at instantiation time)
-
-
- .. # noqa: DAR101
- """
- self._log_warn()
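Concretely, a subclass only has to implement `_load_dump`; every mutating call afterwards is a logged no-op. A toy sketch (instantiation details elided, since these executors are wired up through the metas machinery above, and the class shown here is hypothetical):

```python
from jina.executors.indexers.query import BaseQueryIndexer

class InMemoryQueryIndexer(BaseQueryIndexer):
    """Toy write-once indexer: loads its whole content at creation time."""

    def _load_dump(self, dump_path):
        # a real subclass reads the dump files here (see keyvalue.py below)
        self._db = {'doc1': b'payload'}

    def query(self, keys, *args, **kwargs):
        return [self._db.get(k) for k in keys]

# after construction, mutations only log 'Index ... is write-once':
# indexer.add(['doc2'], [b'x'])
```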
diff --git a/jina/executors/indexers/query/compound.py b/jina/executors/indexers/query/compound.py
deleted file mode 100644
index 81963ca1e2624..0000000000000
--- a/jina/executors/indexers/query/compound.py
+++ /dev/null
@@ -1,52 +0,0 @@
-from jina.executors.compound import CompoundExecutor
-from jina.executors.indexers.query import BaseQueryIndexer
-
-
-class CompoundQueryExecutor(CompoundExecutor, BaseQueryIndexer):
- """A Compound Executor that wraps several QueryIndexers
-
- :param dump_path: the path to initialize from
- """
-
- # TODO this shouldn't be required
- # we don't do this for Compounds, as the _components
- # are not yet set at this stage.
- # for Compound we use a `_post_components`
- def _post_components(self):
- if self.dump_path:
- self._load_dump(self.dump_path)
-
- def _load_dump(self, dump_path, *args, **kwargs):
- """Loads the data in the indexer
-
- :param dump_path: the path to the dump
- :param args: passed to the inner Indexer's load_dump
- :param kwargs: passed to the inner Indexer's load_dump
- """
- for c in self.components:
- c._load_dump(dump_path)
-
- def get_add_handler(self):
- """required to silence NotImplementedErrors
-
-
- .. #noqa: DAR201"""
- return None
-
- def get_create_handler(self):
- """required to silence NotImplementedErrors
-
-
- .. #noqa: DAR201"""
- return None
-
- def get_query_handler(self):
- """required to silence NotImplementedErrors
-
-
- .. #noqa: DAR201"""
- return None
-
-
-class CompoundQueryIndexer(CompoundQueryExecutor):
- """Alias"""
diff --git a/jina/executors/indexers/query/keyvalue.py b/jina/executors/indexers/query/keyvalue.py
deleted file mode 100644
index c1be358fb267a..0000000000000
--- a/jina/executors/indexers/query/keyvalue.py
+++ /dev/null
@@ -1,35 +0,0 @@
-from typing import Optional, List
-
-from jina import Document
-from jina.executors.indexers.dump import import_metas
-from jina.executors.indexers.keyvalue import BinaryPbWriterMixin
-from jina.executors.indexers.query import BaseQueryIndexer
-
-
-class BinaryPbQueryIndexer(BinaryPbWriterMixin, BaseQueryIndexer):
- """A write-once Key-value indexer."""
-
- def _load_dump(self, dump_path):
- """Load the dump at the path
-
- :param dump_path: the path of the dump"""
- ids, metas = import_metas(dump_path, str(self.pea_id))
- with self.get_create_handler() as write_handler:
- self._add(list(ids), list(metas), write_handler)
- # warming up
- self.query(['someid'])
-
- def query(self, keys: List[str], *args, **kwargs) -> List[Optional[bytes]]:
- """Get a document by its id
-
- :param keys: the ids
- :param args: not used
- :param kwargs: not used
- :return: List of the bytes of the Documents (or None, if not found)
- """
- res = self._query(keys)
- return res
-
-
-class KeyValueQueryIndexer(BinaryPbQueryIndexer):
- """An alias"""
diff --git a/jina/executors/indexers/query/vector.py b/jina/executors/indexers/query/vector.py
deleted file mode 100644
index 686d751b53a4c..0000000000000
--- a/jina/executors/indexers/query/vector.py
+++ /dev/null
@@ -1,69 +0,0 @@
-from typing import Generator
-
-import numpy as np
-
-from jina.executors.indexers.dump import import_vectors
-from jina.executors.indexers.query import BaseQueryIndexer
-from jina.executors.indexers.vector import NumpyIndexer
-
-
-class NumpyQueryIndexer(NumpyIndexer, BaseQueryIndexer):
- """An exhaustive vector indexers implemented with numpy and scipy.
-
- .. note::
-        Metrics other than `cosine` and `euclidean` require ``scipy`` to be installed.
-
- :param metric: The distance metric to use. `braycurtis`, `canberra`, `chebyshev`, `cityblock`, `correlation`,
- `cosine`, `dice`, `euclidean`, `hamming`, `jaccard`, `jensenshannon`, `kulsinski`,
- `mahalanobis`,
- `matching`, `minkowski`, `rogerstanimoto`, `russellrao`, `seuclidean`, `sokalmichener`,
- `sokalsneath`, `sqeuclidean`, `wminkowski`, `yule`.
- :param backend: `numpy` or `scipy`, `numpy` only supports `euclidean` and `cosine` distance
- :param compress_level: compression level to use
- """
-
- def _load_dump(self, dump_path):
- """Load the dump at the path
-
- :param dump_path: the path of the dump"""
- ids, vecs = import_vectors(dump_path, str(self.pea_id))
- self._add(ids, vecs)
- self.write_handler.flush()
- self.write_handler.close()
- self.handler_mutex = False
- self.is_handler_loaded = False
- test_vecs = np.array([np.random.random(self.num_dim)], dtype=self.dtype)
- assert self.query(test_vecs, 1) is not None
-
- def _add(self, keys: Generator, vectors: Generator, *args, **kwargs) -> None:
- """Add the embeddings and document ids to the index.
-
-        .. note::
-
-            This replaces the parent class's `_add` since we
-            need to adapt it to consume Generators from the dump loading.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param vectors: embeddings
- :param args: not used
- :param kwargs: not used
- """
- keys = np.array(list(keys), (np.str_, self.key_length))
- vectors_nr = 0
- for vector in vectors:
- if not getattr(self, 'num_dim', None):
- self.num_dim = vector.shape[0]
- self.dtype = vector.dtype.name
- self.write_handler.write(vector.tobytes())
- vectors_nr += 1
-
- if vectors_nr != keys.shape[0]:
- raise ValueError(
- f'Different number of vectors and keys. {vectors_nr} vectors and {len(keys)} keys. Validate your dump'
- )
-
- self.valid_indices = np.concatenate(
- (self.valid_indices, np.full(len(keys), True))
- )
- self.key_bytes += keys.tobytes()
- self._size += keys.shape[0]
diff --git a/jina/executors/indexers/vector.py b/jina/executors/indexers/vector.py
deleted file mode 100644
index 67bc3c5580fa7..0000000000000
--- a/jina/executors/indexers/vector.py
+++ /dev/null
@@ -1,542 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-import gzip
-import io
-import os
-import random
-from functools import lru_cache
-from os import path
-from typing import Optional, Iterable, Tuple, Dict, Union
-
-import numpy as np
-
-from . import BaseVectorIndexer
-from ..decorators import batching
-from ...helper import cached_property
-from ...importer import ImportExtensions
-
-
-class BaseNumpyIndexer(BaseVectorIndexer):
- """
-    :class:`BaseNumpyIndexer` stores and loads vectors in a compressed binary file
-
- .. note::
-        :attr:`compress_level` balances time against space. By default, :class:`NumpyIndexer` has
-        :attr:`compress_level` = 0.
-
-    Setting :attr:`compress_level` > 0 gives a smaller file size on disk at index time. However, at query
-    time it loads all data into memory at once, which is not ideal for large-scale applications.
-
-    Setting :attr:`compress_level` = 0 enables :func:`np.memmap`, which loads data on demand and
-    gives a smaller memory footprint at query time. However, it often produces a larger file on disk.
-
-    :param compress_level: An integer from 0 to 9 controlling the
-                    level of compression; 1 is fastest and produces the least compression,
-                    and 9 is slowest and produces the most compression. 0 is no compression
-                    at all. The default is 1.
-    :param ref_indexer: Bootstrap the current indexer from a ``ref_indexer``. This enables the user to switch
-                        the query algorithm at query time.
- :param delete_on_dump: whether to delete the rows marked as delete (see ``valid_indices``)
- """
-
- def __init__(
- self,
- compress_level: int = 1,
- ref_indexer: Optional['BaseNumpyIndexer'] = None,
- delete_on_dump: bool = False,
- *args,
- **kwargs,
- ):
- super().__init__(*args, **kwargs)
- self.num_dim = None
- self.dtype = None
- self.delete_on_dump = delete_on_dump
- self.compress_level = compress_level
- self.key_bytes = b''
- self.valid_indices = np.array([], dtype=bool)
- self.ref_indexer_workspace_name = None
-
- if ref_indexer:
- # copy the header info of the binary file
- self.num_dim = ref_indexer.num_dim
- self.dtype = ref_indexer.dtype
- self.compress_level = ref_indexer.compress_level
- self.key_bytes = ref_indexer.key_bytes
- self.key_length = ref_indexer.key_length
- self._size = ref_indexer._size
- # point to the ref_indexer.index_filename
- # so that later in `post_init()` it will load from the referred index_filename
- self.valid_indices = ref_indexer.valid_indices
- self.index_filename = ref_indexer.index_filename
- self.logger.warning(
- f'\n'
- f'num_dim extracted from `ref_indexer` to {ref_indexer.num_dim} \n'
- f'_size extracted from `ref_indexer` to {ref_indexer._size} \n'
- f'dtype extracted from `ref_indexer` to {ref_indexer.dtype} \n'
- f'compress_level overridden from `ref_indexer` to {ref_indexer.compress_level} \n'
- f'index_filename overridden from `ref_indexer` to {ref_indexer.index_filename}'
- )
- self.ref_indexer_workspace_name = ref_indexer.workspace_name
- self.delete_on_dump = getattr(ref_indexer, 'delete_on_dump', delete_on_dump)
-
- def _delete_invalid_indices(self):
- valid = self.valid_indices[self.valid_indices == True] # noqa
- if len(valid) != len(self.valid_indices):
- self._clean_memmap()
- self._post_clean_memmap(valid)
-
- def _post_clean_memmap(self, valid):
- # here we need to make sure the fields
- # that depend on the valid_indices are cleaned up too
- valid_key_bytes = np.frombuffer(
- self.key_bytes, dtype=(np.str_, self.key_length)
- )[self.valid_indices].tobytes()
- self.key_bytes = valid_key_bytes
- self._size = len(valid)
- self.valid_indices = valid
- del self._int2ext_id
- del self._ext2int_id
-
- def _clean_memmap(self):
- # clean up the underlying matrix of entries marked for deletion
- # first we need to make sure we flush the writing handler
- if self.write_handler and not self.write_handler.closed:
- with self.write_handler as f:
- f.flush()
- self.handler_mutex = False
- # force the raw_ndarray (underlying matrix) to re-read from disk
- # (needed when there were writing ops to be flushed)
- del self._raw_ndarray
- filtered = self._raw_ndarray[self.valid_indices]
- # we need an intermediary file
- tmp_path = self.index_abspath + 'tmp'
-
- # write the bytes in the respective files
- if self.compress_level > 0:
- with gzip.open(
- tmp_path, 'wb', compresslevel=self.compress_level
- ) as new_gzip_fh:
- new_gzip_fh.write(filtered.tobytes())
- else:
- with open(tmp_path, 'wb') as filtered_data_fh:
- filtered_data_fh.write(filtered.tobytes())
-
- os.remove(self.index_abspath)
- os.rename(tmp_path, self.index_abspath)
- # force it to re-read again from the new file
- del self._raw_ndarray
-
- def __getstate__(self):
- # called on pickle save
- if self.delete_on_dump:
- self._delete_invalid_indices()
- d = super().__getstate__()
- return d
-
- @property
- def workspace_name(self):
- """Get the workspace name.
-
-
- .. # noqa: DAR201
- """
- return (
- self.name
- if self.ref_indexer_workspace_name is None
- else self.ref_indexer_workspace_name
- )
-
- @property
- def index_abspath(self) -> str:
- """Get the file path of the index storage
-
- Use index_abspath
-
-
- .. # noqa: DAR201
- """
- return self.get_file_from_workspace(self.index_filename)
-
- def get_add_handler(self) -> 'io.BufferedWriter':
- """Open a binary gzip file for appending new vectors
-
- :return: a gzip file stream
- """
- if self.compress_level > 0:
- return gzip.open(
- self.index_abspath, 'ab', compresslevel=self.compress_level
- )
- else:
- return open(self.index_abspath, 'ab')
-
- def get_create_handler(self) -> 'io.BufferedWriter':
- """Create a new gzip file for adding new vectors. The old vectors are replaced.
-
- :return: a gzip file stream
- """
- if self.compress_level > 0:
- return gzip.open(
- self.index_abspath, 'wb', compresslevel=self.compress_level
- )
- else:
- return open(self.index_abspath, 'wb')
-
- def _validate_key_vector_shapes(self, keys, vectors):
- if len(vectors.shape) != 2:
- raise ValueError(
- f'vectors shape {vectors.shape} is not valid, expecting "vectors" to have rank of 2'
- )
-
- if not getattr(self, 'num_dim', None):
- self.num_dim = vectors.shape[1]
- self.dtype = vectors.dtype.name
- elif self.num_dim != vectors.shape[1]:
- raise ValueError(
-                f'vectors shape {vectors.shape} does not match the indexer\'s dim: {self.num_dim}'
- )
- elif self.dtype != vectors.dtype.name:
- raise TypeError(
-                f'vectors\' dtype {vectors.dtype.name} does not match the indexer\'s dtype: {self.dtype}'
- )
-
- if keys.shape[0] != vectors.shape[0]:
- raise ValueError(
-                f'number of keys {keys.shape[0]} not equal to number of vectors {vectors.shape[0]}'
- )
-
- def add(self, keys: Iterable[str], vectors: 'np.ndarray', *args, **kwargs) -> None:
- """Add the embeddings and document ids to the index.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param vectors: embeddings
- :param args: not used
- :param kwargs: not used
- """
- np_keys = np.array(keys, (np.str_, self.key_length))
- self._add(np_keys, vectors)
-
- def _add(self, keys: 'np.ndarray', vectors: 'np.ndarray'):
- if keys.size and vectors.size:
- self._validate_key_vector_shapes(keys, vectors)
- self.write_handler.write(vectors.tobytes())
- self.valid_indices = np.concatenate(
- (self.valid_indices, np.full(len(keys), True))
- )
- self.key_bytes += keys.tobytes()
- self._size += keys.shape[0]
-
- def update(
- self, keys: Iterable[str], vectors: 'np.ndarray', *args, **kwargs
- ) -> None:
- """Update the embeddings on the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param vectors: embeddings
- :param args: not used
- :param kwargs: not used
- """
- # noinspection PyTypeChecker
- if self.size:
- keys, values = self._filter_nonexistent_keys_values(
- keys, vectors, self._ext2int_id.keys()
- )
- if keys:
- np_keys = np.array(keys, (np.str_, self.key_length))
- self._delete(np_keys)
- self._add(np_keys, np.array(values))
- else:
- self.logger.error(f'{self!r} is empty, update is aborted')
-
- def _delete(self, keys):
- if keys.size:
- for key in keys:
- # mark as `False` in mask
- self.valid_indices[self._ext2int_id[key]] = False
- self._size -= 1
-
- def delete(self, keys: Iterable[str], *args, **kwargs) -> None:
- """Delete the embeddings from the index via document ids.
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: not used
- :param kwargs: not used
- """
- if self.size:
- keys = self._filter_nonexistent_keys(keys, self._ext2int_id.keys())
- if keys:
- np_keys = np.array(keys, (np.str_, self.key_length))
- self._delete(np_keys)
- else:
- self.logger.error(f'{self!r} is empty, deletion is aborted')
-
- def get_query_handler(self) -> Optional['np.ndarray']:
- """Open a gzip file and load it as a numpy ndarray
-
- :return: a numpy ndarray of vectors
- """
- if np.all(self.valid_indices):
- vecs = self._raw_ndarray
- else:
- vecs = self._raw_ndarray[self.valid_indices]
-
- if vecs is not None:
- return self.build_advanced_index(vecs)
-
- def build_advanced_index(self, vecs: 'np.ndarray'):
- """Not implemented here.
-
-
- .. # noqa: DAR201
-
-
- .. # noqa: DAR101
- """
- raise NotImplementedError
-
- def _load_gzip(self, abspath: str, mode='rb') -> Optional['np.ndarray']:
- try:
- self.logger.info(f'loading index from {abspath}...')
- with gzip.open(abspath, mode) as fp:
- return np.frombuffer(fp.read(), dtype=self.dtype).reshape(
- [-1, self.num_dim]
- )
- except EOFError:
- self.logger.error(
-            f'{abspath} is broken/incomplete, perhaps you forgot to call ".close()" after the last usage?'
- )
-
- @cached_property
- def _raw_ndarray(self) -> Union['np.ndarray', 'np.memmap', None]:
- if not (path.exists(self.index_abspath) or self.num_dim or self.dtype):
- return
-
- if self.compress_level > 0:
- return self._load_gzip(self.index_abspath)
- elif self.size is not None and os.stat(self.index_abspath).st_size:
- self.logger.success(f'memmap is enabled for {self.index_abspath}')
- # `==` is required. `is False` does not work in np
- deleted_keys = len(self.valid_indices[self.valid_indices == False]) # noqa
- return np.memmap(
- self.index_abspath,
- dtype=self.dtype,
- mode='r',
- shape=(self.size + deleted_keys, self.num_dim),
- )
-
- def sample(self) -> Optional[bytes]:
- """Return a random entry from the indexer for sanity check.
-
- :return: A random entry from the indexer.
- """
- k = random.sample(list(self._ext2int_id.values()), k=1)[0]
- return self._raw_ndarray[k]
-
- def __iter__(self):
- return self._raw_ndarray.__iter__()
-
- def query_by_key(
- self, keys: Iterable[str], *args, **kwargs
- ) -> Optional['np.ndarray']:
- """
-        Search the index by the external key (passed during ``.add()``).
-
- :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf
- :param args: not used
- :param kwargs: not used
- :return: ndarray of vectors
- """
- keys = self._filter_nonexistent_keys(keys, self._ext2int_id.keys())
- if keys:
- indices = [self._ext2int_id[key] for key in keys]
- return self._raw_ndarray[indices]
- else:
- return None
-
- @cached_property
- def _int2ext_id(self) -> Optional['np.ndarray']:
- """Convert internal ids (0,1,2,3,4,...) to external ids (random strings)
-
-
- .. # noqa: DAR201
- """
- if self.key_bytes:
- r = np.frombuffer(self.key_bytes, dtype=(np.str_, self.key_length))
- # `==` is required. `is False` does not work in np
- deleted_keys = len(self.valid_indices[self.valid_indices == False]) # noqa
- if r.shape[0] == (self.size + deleted_keys) == self._raw_ndarray.shape[0]:
- return r
- else:
-                self.logger.error(
-                    f'the sizes of the keys and vectors are inconsistent '
-                    f'({r.shape[0]}, {self._size}, {self._raw_ndarray.shape[0]}), '
-                    f'did you write to this index twice? or did you forget to save indexer?'
-                )
-
- @cached_property
- def _ext2int_id(self) -> Optional[Dict]:
- """Convert external ids (random strings) to internal ids (0,1,2,3,4,...)
-
-
- .. # noqa: DAR201
- """
- if self._int2ext_id is not None:
- return {k: idx for idx, k in enumerate(self._int2ext_id)}
-
-
-@lru_cache(maxsize=3)
-def _get_ones(x, y):
- return np.ones((x, y))
-
-
-def _ext_A(A):
- nA, dim = A.shape
- A_ext = _get_ones(nA, dim * 3)
- A_ext[:, dim : 2 * dim] = A
- A_ext[:, 2 * dim :] = A ** 2
- return A_ext
-
-
-def _ext_B(B):
- nB, dim = B.shape
- B_ext = _get_ones(dim * 3, nB)
- B_ext[:dim] = (B ** 2).T
- B_ext[dim : 2 * dim] = -2.0 * B.T
- del B
- return B_ext
-
-
-def _euclidean(A_ext, B_ext):
- sqdist = A_ext.dot(B_ext).clip(min=0)
- return np.sqrt(sqdist)
-
-
-def _norm(A):
- return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True)
-
-
-def _cosine(A_norm_ext, B_norm_ext):
- return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2
-
-
-class NumpyIndexer(BaseNumpyIndexer):
- """An exhaustive vector indexers implemented with numpy and scipy.
-
- .. note::
-        Metrics other than `cosine` and `euclidean` require ``scipy`` to be installed.
-
- :param metric: The distance metric to use. `braycurtis`, `canberra`, `chebyshev`, `cityblock`, `correlation`,
- `cosine`, `dice`, `euclidean`, `hamming`, `jaccard`, `jensenshannon`, `kulsinski`,
- `mahalanobis`,
- `matching`, `minkowski`, `rogerstanimoto`, `russellrao`, `seuclidean`, `sokalmichener`,
- `sokalsneath`, `sqeuclidean`, `wminkowski`, `yule`.
- :param backend: `numpy` or `scipy`, `numpy` only supports `euclidean` and `cosine` distance
- :param compress_level: compression level to use
- """
-
- batch_size = 512
-
- def __init__(
- self,
- metric: str = 'cosine',
- backend: str = 'numpy',
- compress_level: int = 0,
- *args,
- **kwargs,
- ):
- super().__init__(*args, compress_level=compress_level, **kwargs)
- self.metric = metric
- self.backend = backend
-
- @staticmethod
- def _get_sorted_top_k(
- dist: 'np.array', top_k: int
- ) -> Tuple['np.ndarray', 'np.ndarray']:
- """Find top-k smallest distances in ascending order.
-
-        The idea is to use a partial sort to retrieve the top-k smallest distances unsorted, then sort these
-        in ascending order. Equivalent to a full sort but faster for n >> k. If k >= n, revert to a full sort.
-
-        :param dist: the distances
-        :param top_k: number of results to keep
- :return: tuple of indices, computed distances
- """
- if top_k >= dist.shape[1]:
- idx = dist.argsort(axis=1)[:, :top_k]
- dist = np.take_along_axis(dist, idx, axis=1)
- else:
- idx_ps = dist.argpartition(kth=top_k, axis=1)[:, :top_k]
- dist = np.take_along_axis(dist, idx_ps, axis=1)
- idx_fs = dist.argsort(axis=1)
- idx = np.take_along_axis(idx_ps, idx_fs, axis=1)
- dist = np.take_along_axis(dist, idx_fs, axis=1)
-
- return idx, dist
-
- def query(
- self, vectors: 'np.ndarray', top_k: int, *args, **kwargs
- ) -> Tuple['np.ndarray', 'np.ndarray']:
- """Find the top-k vectors with smallest ``metric`` and return their ids in ascending order.
-
- :return: a tuple of two ndarray.
- The first is ids in shape B x K (`dtype=int`), the second is metric in shape B x K (`dtype=float`)
-
- .. warning::
- This operation is memory-consuming.
-
- Distance (the smaller the better) is returned, not the score.
-
- :param vectors: the vectors with which to search
- :param args: not used
- :param kwargs: not used
- :param top_k: nr of results to return
- :return: tuple of indices within matrix and distances
- """
- if self.size == 0:
- return np.array([]), np.array([])
- if self.metric not in {'cosine', 'euclidean'} or self.backend == 'scipy':
- dist = self._cdist(vectors, self.query_handler)
- elif self.metric == 'euclidean':
- _query_vectors = _ext_A(vectors)
- dist = self._euclidean(_query_vectors, self.query_handler)
- elif self.metric == 'cosine':
- _query_vectors = _ext_A(_norm(vectors))
- dist = self._cosine(_query_vectors, self.query_handler)
-
- idx, dist = self._get_sorted_top_k(dist, top_k)
- indices = self._int2ext_id[self.valid_indices][idx]
- return indices, dist
-
- def build_advanced_index(self, vecs: 'np.ndarray') -> 'np.ndarray':
- """
- Build advanced index structure based on in-memory numpy ndarray, e.g. graph, tree, etc.
-
- :param vecs: The raw numpy ndarray.
- :return: Advanced index.
- """
- return vecs
-
- @batching(merge_over_axis=1, slice_on=2)
- def _euclidean(self, cached_A, raw_B):
- data = _ext_B(raw_B)
- return _euclidean(cached_A, data)
-
- @batching(merge_over_axis=1, slice_on=2)
- def _cosine(self, cached_A, raw_B):
- data = _ext_B(_norm(raw_B))
- return _cosine(cached_A, data)
-
- @batching(merge_over_axis=1, slice_on=2)
- def _cdist(self, *args, **kwargs):
- with ImportExtensions(required=True):
- from scipy.spatial.distance import cdist
- return cdist(*args, **kwargs, metric=self.metric)
-
-
-class VectorIndexer(NumpyIndexer):
- """Alias to :class:`NumpyIndexer` """
diff --git a/jina/executors/metas.py b/jina/executors/metas.py
index dbf2d43917421..adc65d4e8b43d 100644
--- a/jina/executors/metas.py
+++ b/jina/executors/metas.py
@@ -1,226 +1,19 @@
-"""The default meta config that all executors follow, they can be overridden by the YAML config
-
-.. warning::
-
-    When you define your own Executor class, make sure your attribute/method names do not
-    conflict with the names listed below.
-
-
-.. note::
-    Essentially, the meta config can be set in two places: as part of the YAML file, or as a class attribute
-    via :func:`__init__` or in the class definition. When multiple meta specifications exist, the overwrite priority is:
-
- metas defined in YAML > metas defined as class attribute > metas default values listed below
-
-
-Any executor inherited from :class:`BaseExecutor` always has the following **meta** fields:
-
- .. confval:: is_updated
-
-        indicates if the executor has been updated or changed since the last save; if not, :func:`save` will do nothing.
-        A forced save is possible by calling :func:`touch` before :func:`save`
-
- :type: bool
- :default: ``False``
-
- .. confval:: batch_size
-
-        the size of each batch; methods decorated by :func:`@batching` will respect this. Useful when incoming data is
-        too large to fit into (GPU) memory.
-
- :type: int
- :default: ``None``
-
- .. confval:: workspace
-
- the working directory, for persisting the artifacts of the executor. An artifact is a file or collection of files
- used during a workflow run.
-
-        By default it is not set; if you expect your executor to be persisted or to persist any data, remember to set it
- to the desired value.
-
- When a `BaseExecutor` is a component of a `CompoundExecutor`, its `workspace` value will be overridden by the `workspace`
- coming from the `CompoundExecutor` unless a particular `workspace` value is set for the component `BaseExecutor`.
-
- :type: str
- :default: None
-
- .. confval:: name
-
- the name of the executor.
-
- :type: str
- :default: class name plus a random string
-
- .. confval:: on_gpu
-
- if the executor is set to run on GPU.
-
- :type: bool
- :default: ``False``
-
-
- .. confval:: py_modules
-
-        the external Python module paths. It is useful when you want to load external Python modules
- using :func:`BaseExecutor.load_config` from a YAML file. If a relative path is given then the root path is set to
- the path of the current YAML file.
-
- Example of ``py_module`` usage:
-
- 1. This is a valid structure and it is RECOMMENDED:
- - "my_cust_module" is a python module
- - all core logic of your customized executor goes to ``__init__.py``
- - to import ``foo.py``, you can use relative import, e.g. ``from .foo import bar``
- - ``helper.py`` needs to be put BEFORE `__init__.py` in YAML ``py_modules``
-
- This is also the structure given by ``jina hub new`` CLI.
-
- .. highlight:: text
- .. code-block:: text
-
- my_cust_module
- |- __init__.py
- |- helper.py
- |- config.yml
- |- py_modules
- |- helper.py
- |- __init__.py
-
- 2. This is a valid structure but not recommended:
- - "my_cust_module" is not a python module (lack of __init__.py under the root)
- - to import ``foo.py``, you must to use ``from jinahub.foo import bar``
- - ``jinahub`` is a common namespace for all plugin-modules, not changeable.
- - ``helper.py`` needs to be put BEFORE `my_cust.py` in YAML ``py_modules``
-
- .. highlight:: text
- .. code-block:: text
-
- my_cust_module
- |- my_cust.py
- |- helper.py
- |- config.yml
- |- py_modules
- |- helper.py
- |- my_cust.py
-
- :type: str/List[str]
- :default: ``None``
-
- .. confval:: pea_id
-
-        the integer index used to distinguish each parallel pea of this executor, required in :attr:`shard_workspace`
-
- :type: int
- :default: ``'${{root.metas.pea_id}}'``
-
- .. confval:: root_workspace
-
-        the workspace of the root executor. It will be the same as the executor's own `workspace` except when an `Executor` inside a `CompoundExecutor` is used,
-        or when a `BaseNumpyIndexer` is used with a `ref_indexer`.
-
-        By default, jina will check whether a `dump` of the executor can be found in `workspace`; otherwise it will look under `root_workspace`,
-        assuming the executor may be part of a `CompoundExecutor`.
-
- :type: str
- :default: ``'${{root.metas.workspace}}'``
-
- .. confval:: root_name
-
-        the name of the root executor. It will be the same as the executor's own `name` except when an `Executor` inside a `CompoundExecutor` is used,
-        or when a `BaseNumpyIndexer` is used with a `ref_indexer`
-
- :type: str
- :default: ``'${{root.metas.name}}'``
-
- .. confval:: read_only
-
-        do not allow the pod to modify the model; save calls will be ignored. If set to ``True``, the executor will not be serialized.
-
- :type: bool
- :default: ``False``
-
- .. warning::
- ``name`` and ``workspace`` must be set if you want to serialize/deserialize this executor.
-
- .. note::
-
-        ``pea_id`` is set in a way that when the executor ``A`` is used as
-        a component of a :class:`jina.executors.compound.CompoundExecutor` ``B``, ``A``'s setting will be overridden by ``B``'s counterpart.
-
- These **meta** fields can be accessed via `self.name` or loaded from a YAML config via :func:`load_config`:
-
- .. highlight:: yaml
- .. code-block:: yaml
-
- !MyAwesomeExecutor
- with:
- ...
- metas:
- name: my_transformer # a customized name
- workspace: ./ # path for serialize/deserialize
-
-
-
-"""
-
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-
-from typing import Dict, Union, List
-
-_defaults = None
+from typing import Dict
def get_default_metas() -> Dict:
"""
Get a copy of default meta variables.
- :return: default metas
- """
- import copy
-
- global _defaults
-
- if _defaults is None:
- from ..jaml import JAML
- from pkg_resources import resource_stream
-
- with resource_stream(
- 'jina', '/'.join(('resources', 'executors.metas.default.yml'))
- ) as fp:
- _defaults = JAML.load(
- fp
- ) # do not expand variables at here, i.e. DO NOT USE expand_dict(yaml.load(fp))
+ NOTE: DO NOT ADD MORE ENTRIES HERE!
- return copy.deepcopy(_defaults)
-
-
-def fill_metas_with_defaults(d: Dict) -> Dict:
- """Fill the incomplete ``metas`` field with complete default values
-
- :param d: the loaded YAML map
- :return: dictionary with injected metas
+    :return: a new dict containing the default metas
"""
- def _scan(sub_d: Union[Dict, List]):
- if isinstance(sub_d, Dict):
- for k, v in sub_d.items():
- if k == 'metas':
- _tmp = get_default_metas()
- _tmp.update(v)
- sub_d[k] = _tmp
- elif isinstance(v, dict):
- _scan(v)
- elif isinstance(v, list):
- _scan(v)
- elif isinstance(sub_d, List):
- for idx, v in enumerate(sub_d):
- if isinstance(v, dict):
- _scan(v)
- elif isinstance(v, list):
- _scan(v)
-
- _scan(d)
- return d
+ # NOTE: DO NOT ADD MORE ENTRIES HERE!
+ return {
+ 'name': '', #: a string, the name of the executor
+ 'description': '', #: a string, the description of this executor. It will be used in automatics docs UI
+ 'workspace': '', #: a string, the workspace of the executor
+        'py_modules': '', #: a string or list of strings, the Python module paths of the executor
+ }
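Since the defaults are now a plain literal, every call builds a fresh dict that callers may mutate freely; there is no shared module-level state left to deep-copy:

```python
from jina.executors.metas import get_default_metas

metas = get_default_metas()
metas['name'] = 'my_encoder'
metas['workspace'] = './workspace'

# a later call is unaffected by the mutation above
assert get_default_metas()['name'] == ''
```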
diff --git a/jina/executors/rankers/__init__.py b/jina/executors/rankers/__init__.py
deleted file mode 100644
index ae6d41ab4b522..0000000000000
--- a/jina/executors/rankers/__init__.py
+++ /dev/null
@@ -1,155 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Dict, Optional, List
-
-import numpy as np
-
-from .. import BaseExecutor
-
-
-class BaseRanker(BaseExecutor):
- """
-    The base class for a `Ranker`.
-
- :param query_required_keys: Set of keys or features to be extracted from query `Document` by the `Driver` so that
- they are passed as query features or metainfo.
- :param match_required_keys: Set of keys or features to be extracted from match `Document` by the `Driver` so that
- they are passed as match features or metainfo.
- :param args: Extra positional arguments
- :param kwargs: Extra keyword arguments
-
- .. note::
- See how the attributes are accessed in :class:`Document` in :meth:`get_attrs`.
-
- .. highlight:: python
- .. code-block:: python
-
-        query = Document({'tags': {'color': 'blue'}})
- match = Document({'tags': {'color': 'blue', 'price': 1000}})
-
-        ranker = BaseRanker(query_required_keys=('tags__color',), match_required_keys=('tags__color', 'tags__price'))
- """
-
- def __init__(
- self,
- query_required_keys: Optional[List[str]] = None,
- match_required_keys: Optional[List[str]] = None,
- *args,
- **kwargs
- ):
- """
-
- :param query_required_keys: Set of keys or features to be extracted from query `Document` by the `Driver` so that
- they are passed as query features or metainfo.
- :param match_required_keys: Set of keys or features to be extracted from match `Document` by the `Driver` so that
- they are passed as match features or metainfo.
- :param args: Extra positional arguments
- :param kwargs: Extra keyword arguments
-
- .. note::
- See how the attributes are accessed in :class:`Document` in :meth:`get_attrs`.
-
- .. highlight:: python
- .. code-block:: python
-
-            query = Document({'tags': {'color': 'blue'}})
- match = Document({'tags': {'color': 'blue', 'price': 1000}})
-
-            ranker = BaseRanker(query_required_keys=('tags__color',), match_required_keys=('tags__color', 'tags__price'))
- """
- super().__init__(*args, **kwargs)
- self.query_required_keys = query_required_keys
- self.match_required_keys = match_required_keys
-
- def score(self, *args, **kwargs):
- """Calculate the score. Base class method needs to be implemented in subclass.
- :param args: Extra positional arguments
- :param kwargs: Extra keyword arguments
- """
- raise NotImplementedError
-
-
-class Chunk2DocRanker(BaseRanker):
- """A :class:`Chunk2DocRanker` translates the chunk-wise score (distance) to the doc-wise score.
-
-    At query time, :class:`Chunk2DocRanker` is an almost always required component,
-    because in the end we want to retrieve the top-k documents for a given query document, not the top-k chunks
-    for given query chunks. The purpose of :class:`Chunk2DocRanker` is to aggregate the already retrieved top-k chunks
-    into documents.
-
- The key function here is :func:`score`.
-
- .. seealso::
- :mod:`jina.drivers.handlers.score`
-
- """
-
- COL_PARENT_ID = 'match_parent_id'
- COL_DOC_CHUNK_ID = 'match_doc_chunk_id'
- COL_QUERY_CHUNK_ID = 'match_query_chunk_id'
- COL_SCORE = 'score'
-
- def score(
- self, match_idx: 'np.ndarray', query_chunk_meta: Dict, match_chunk_meta: Dict
- ) -> float:
- """
- Given a set of queries (that may correspond to the chunks of a root level query) and a set of matches
- corresponding to the same parent id, compute the matching score of the common parent of the set of matches.
- Returns a score corresponding to the score of the parent document of the matches in `match_idx`
-
- :param match_idx: A [N x 4] numpy ``ndarray``, column-wise:
- - ``match_idx[:, 0]``: ``parent_id`` of the matched docs, integer
- - ``match_idx[:, 1]``: ``id`` of the matched chunks, integer
- - ``match_idx[:, 2]``: ``id`` of the query chunks, integer
- - ``match_idx[:, 3]``: distance/metric/score between the query and matched chunks, float.
- All the matches belong to the same `parent`
- :param query_chunk_meta: The meta information of the query chunks, where the key is query chunks' ``chunk_id``,
- the value is extracted by the ``query_required_keys``.
- :param match_chunk_meta: The meta information of the matched chunks, where the key is matched chunks'
- ``chunk_id``, the value is extracted by the ``match_required_keys``.
-
-
- TODO:
- - ``match_idx[:, 0]`` is redundant because all the matches have the same ``parent_id``.
-
- """
- raise NotImplementedError
-
-
-class Match2DocRanker(BaseRanker):
- """
- Re-scores the matches for a document. This Ranker is only responsible for
- calculating new scores and not for the actual sorting. The sorting is handled
- in the respective ``Matches2DocRankDriver``.
-
- Possible implementations:
- - ReverseRanker (reverse scores of all matches)
- - BucketShuffleRanker (first buckets matches and then sort each bucket).
- """
-
- COL_MATCH_ID = 'match_doc_chunk_id'
- COL_SCORE = 'score'
-
- def score(
- self,
- old_matches_scores: List[List[float]],
- queries_metas: List[Dict],
- matches_metas: List[List[Dict]],
- ) -> List[List[float]]:
- """
-        Calculate the new scores for the matches and return them as an iterable of scores to be assigned to the matches.
-        The returned scores must be in the same order as the input ``old_matches_scores``.
-
- .. note::
-            The lengths of ``old_matches_scores``, ``queries_metas`` and ``matches_metas`` correspond to the number of queries in the batch
-            whose matches are to be scored.
-
-            Every sequence in the match metas corresponds to the number of retrieved matches per query.
-
-            The result provides a list of scores for every query, each ordered in the same way as the ``matches_metas`` lists.
-
- :param old_matches_scores: Contains old scores in a list for every query
- :param queries_metas: List of dictionaries containing all the query meta information requested by the `query_required_keys` class_variable for each query in a batch.
- :param matches_metas: List of lists containing all the matches meta information requested by the `match_required_keys` class_variable for every query. Sorted in the same way as `old_match_scores`
- """
- raise NotImplementedError
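For reference, a concrete subclass of the removed `Chunk2DocRanker` only had to reduce the per-chunk scores in `match_idx` to one scalar for the shared parent. A hedged sketch following the column layout documented in the base-class docstring (the class name here is illustrative, not part of the removed API):

```python
import numpy as np
from jina.executors.rankers import Chunk2DocRanker

class MaxChunkRanker(Chunk2DocRanker):
    """Scores a parent document by its best-matching chunk."""

    def score(self, match_idx, query_chunk_meta, match_chunk_meta) -> float:
        # per the base-class docstring, column 3 holds the chunk-level
        # distance/score and all rows share the same parent document
        return float(np.max(match_idx[:, 3]))
```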
diff --git a/jina/executors/rankers/trainer.py b/jina/executors/rankers/trainer.py
deleted file mode 100644
index 57048a095e44b..0000000000000
--- a/jina/executors/rankers/trainer.py
+++ /dev/null
@@ -1,26 +0,0 @@
-__copyright__ = "Copyright (c) 2021 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from .. import BaseExecutor
-
-
-class RankerTrainer(BaseExecutor):
- """Class :class:`RankerTrainer` is used to train a ranker for ranker fine-tunning purpose.
- such as offline-learning and online-learning.
- """
-
- def __init__(self, *args, **kwargs):
- super().__init__(*args, **kwargs)
-
- def train(self, *args, **kwargs):
- """Train ranker based on user feedback, updating ranker weights based on
- the `loss` function.
-
- :param args: Additional arguments.
- :param kwargs: Additional key value arguments.
- """
- raise NotImplementedError
-
- def save(self):
- """Save the of the ranker model."""
- raise NotImplementedError
diff --git a/jina/executors/requests.py b/jina/executors/requests.py
deleted file mode 100644
index 74ac0d5abcba0..0000000000000
--- a/jina/executors/requests.py
+++ /dev/null
@@ -1,46 +0,0 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Dict, List
-
-from ..jaml import JAML
-
-_defaults = {}
-
-
-def get_default_reqs(cls_mro: List[type]) -> Dict:
- """Get a copy of default meta variables
-
- :param cls_mro: the MRO inherited order followed.
- """
- import copy
-
- global _defaults
-
- for cls in cls_mro:
- try:
- if cls.__name__ not in _defaults:
- from pkg_resources import resource_stream
-
- with resource_stream(
- 'jina',
- '/'.join(('resources', f'executors.requests.{cls.__name__}.yml')),
- ) as fp:
- _defaults[cls.__name__] = JAML.load(
- fp
- ) # do not expand variables at here, i.e. DO NOT USE expand_dict(yaml.load(fp))
-
- if cls.__name__ != cls_mro[0].__name__:
- from ..logging import default_logger
-
- default_logger.debug(
- f'"requests.on" setting of {cls_mro[0]} fallback to general {cls} setting, '
- f'because you did not specify {cls_mro[0]}'
- )
- return copy.deepcopy(_defaults[cls.__name__])
- except FileNotFoundError:
- pass
-
- raise ValueError(
- f'not able to find any default settings along this chain {cls_mro!r}'
- )
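The fallback order in the removed `get_default_reqs` is just the class MRO: the first `executors.requests.<ClassName>.yml` resource that exists wins, and a debug line is emitted whenever a parent's setting is used. Illustrated with hypothetical class names:

```python
# hypothetical hierarchy to illustrate the lookup order above
class BaseExecutor: ...
class BaseIndexer(BaseExecutor): ...
class MyIndexer(BaseIndexer): ...

# get_default_reqs(MyIndexer.mro()) tries, in order:
#   resources/executors.requests.MyIndexer.yml
#   resources/executors.requests.BaseIndexer.yml
#   resources/executors.requests.BaseExecutor.yml
# and returns a deep copy of the first file it can load.
```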
diff --git a/jina/executors/segmenters/__init__.py b/jina/executors/segmenters/__init__.py
deleted file mode 100644
index e031ce694f79b..0000000000000
--- a/jina/executors/segmenters/__init__.py
+++ /dev/null
@@ -1,21 +0,0 @@
-__copyright__ = "Copyright (c) 2021 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
-from typing import Dict, List, Union
-
-from .. import BaseExecutor
-
-
-class BaseSegmenter(BaseExecutor):
- """:class:`BaseSegmenter` works on doc-level,
- it chunks Documents into set of Chunks
- :param args: Variable length arguments
- :param kwargs: Variable length keyword arguments
- """
-
- def segment(self, *args, **kwargs) -> Union[List[List[Dict]], List[Dict]]:
- """
- :param args: Variable length arguments
- :param kwargs: Variable length keyword arguments
- """
- raise NotImplementedError
diff --git a/jina/flow/__init__.py b/jina/flow/__init__.py
index 12b7997b9c868..bc133c05ed8cd 100644
--- a/jina/flow/__init__.py
+++ b/jina/flow/__init__.py
@@ -1,12 +1,9 @@
from .base import BaseFlow
-from .mixin.control import ControlFlowMixin
-from .mixin.crud import CRUDFlowMixin
+from ..clients.mixin import PostMixin
-class Flow(CRUDFlowMixin, ControlFlowMixin, BaseFlow):
+class Flow(PostMixin, BaseFlow):
"""The synchronous version of :class:`AsyncFlow`.
For proper usage see `this guide`
"""
-
- pass
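With `PostMixin` in place, the per-verb CRUD methods collapse into one generic `post` call; a minimal 2.0-style usage sketch:

```python
from jina import Document, Flow

f = Flow().add()  # a trivial Flow with one default Executor

with f:
    # a single generic endpoint replaces the old index()/search()/... mixins
    f.post(on='/index', inputs=Document(text='hello jina'))
    f.post(on='/search', inputs=Document(text='hello'), on_done=print)
```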
diff --git a/jina/flow/asyncio.py b/jina/flow/asyncio.py
index 95c718902dbb2..067dc76440e2a 100644
--- a/jina/flow/asyncio.py
+++ b/jina/flow/asyncio.py
@@ -1,10 +1,9 @@
from .base import BaseFlow
-from .mixin.async_crud import AsyncCRUDFlowMixin
-from .mixin.async_control import AsyncControlFlowMixin
from ..clients.asyncio import AsyncClient, AsyncWebSocketClient
+from ..clients.mixin import AsyncPostMixin
-class AsyncFlow(AsyncCRUDFlowMixin, AsyncControlFlowMixin, BaseFlow):
+class AsyncFlow(AsyncPostMixin, BaseFlow):
"""
 :class:`AsyncFlow` is the asynchronous version of :class:`Flow`. They share the same interface, except
 that in :class:`AsyncFlow` the :meth:`train`, :meth:`index`, :meth:`search` methods are coroutines
@@ -38,28 +37,7 @@ class AsyncFlow(AsyncCRUDFlowMixin, AsyncControlFlowMixin, BaseFlow):
https://ipython.readthedocs.io/en/stable/interactive/autoawait.html
     Another example is when using Jina as an integration. Say you have another IO-bound job ``heavylifting()``, you
- can use this feature to schedule Jina ``index()`` and ``heavylifting()`` concurrently. For example,
-
- .. highlight:: python
- .. code-block:: python
-
- async def run_async_flow_5s():
- # WaitDriver pause 5s makes total roundtrip ~5s
- with AsyncFlow().add(uses='- !WaitDriver {}') as f:
- await f.index_ndarray(np.random.random([5, 4]), on_done=validate)
-
-
- async def heavylifting():
- # total roundtrip takes ~5s
- print('heavylifting other io-bound jobs, e.g. download, upload, file io')
- await asyncio.sleep(5)
- print('heavylifting done after 5s')
-
-
- async def concurrent_main():
- # about 5s; but some dispatch cost, can't be just 5s, usually at <7s
- await asyncio.gather(run_async_flow_5s(), heavylifting())
-
+ can use this feature to schedule Jina ``index()`` and ``heavylifting()`` concurrently.
One can think of :class:`Flow` as Jina-managed eventloop, whereas :class:`AsyncFlow` is self-managed eventloop.
"""
diff --git a/jina/flow/base.py b/jina/flow/base.py
index 956d4a635dded..d8ce5539862a1 100644
--- a/jina/flow/base.py
+++ b/jina/flow/base.py
@@ -1,6 +1,3 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
import argparse
import base64
import copy
@@ -8,6 +5,7 @@
import re
import threading
import uuid
+import warnings
from collections import OrderedDict, defaultdict
from contextlib import ExitStack
from typing import Optional, Union, Tuple, List, Set, Dict, TextIO
@@ -32,7 +30,7 @@
__all__ = ['BaseFlow']
from ..peapods import Pod
-from ..peapods.pods.compoundpod import CompoundPod
+from ..peapods.pods.compound import CompoundPod
from ..peapods.pods.factory import PodFactory
@@ -106,19 +104,6 @@ def _update_args(self, args, **kwargs):
args, _flow_parser
) #: for yaml dump
- @property
- def yaml_spec(self):
- """
- get the YAML representation of the instance
-
-
- .. # noqa: DAR401
-
-
- .. # noqa: DAR201
- """
- return JAML.dump(self)
-
@staticmethod
def _parse_endpoints(op_flow, pod_name, endpoint, connect_to_last_pod=False) -> Set:
# parsing needs
@@ -307,6 +292,10 @@ def add(
parser = set_gateway_parser()
args = ArgNamespace.kwargs2namespace(kwargs, parser)
+
+ # if the Pod workspace is not set, derive it from the Flow workspace
+ args.workspace = os.path.abspath(args.workspace or self.workspace)
+
op_flow._pod_nodes[pod_name] = PodFactory.build_pod(args, needs)
op_flow.last_pod = pod_name
@@ -364,7 +353,6 @@ def inspect(self, name: str = 'inspect', *args, **kwargs) -> 'BaseFlow':
def gather_inspect(
self,
name: str = 'gather_inspect',
- uses='_merge_eval',
include_last_pod: bool = True,
*args,
**kwargs,
@@ -378,7 +366,6 @@ def gather_inspect(
in general you don't need to manually call :meth:`gather_inspect`.
:param name: the name of the gather Pod
- :param uses: the config of the executor, by default is ``_pass``
:param include_last_pod: if to include the last modified Pod in the Flow
:param args: args for .add()
:param kwargs: kwargs for .add()
@@ -396,7 +383,6 @@ def gather_inspect(
needs.append(self.last_pod)
return self.add(
name=name,
- uses=uses,
needs=needs,
pod_role=PodRoleType.JOIN_INSPECT,
*args,
@@ -609,8 +595,13 @@ def __eq__(self, other: 'BaseFlow') -> bool:
return a._pod_nodes == b._pod_nodes
+ @property
@build_required(FlowBuildLevel.GRAPH)
- def _get_client(self, **kwargs) -> 'Client':
+ def client(self) -> 'Client':
+ """Return a :class:`Client` object attach to this Flow.
+
+ .. # noqa: DAR201"""
+ kwargs = {}
kwargs.update(self._common_kwargs)
if 'port_expose' not in kwargs:
kwargs['port_expose'] = self.port_expose
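The private `_get_client(**kwargs)` becomes a read-only `client` property; the keyword plumbing moves into the property body, seeded from `self._common_kwargs`. Call sites shrink to attribute access (sketch):

```python
from jina import Flow

with Flow().add() as f:
    c = f.client  # a Client preconfigured with this Flow's port_expose
```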
@@ -792,27 +783,6 @@ def _mermaid_to_url(self, mermaid_str: str, img_type: str) -> str:
return f'https://mermaid.ink/{img_type}/{encoded_str}'
- @build_required(FlowBuildLevel.GRAPH)
- def to_swarm_yaml(self, path: TextIO):
- """
- Generate the docker swarm YAML compose file
-
- :param path: the output yaml path
- """
- swarm_yml = {'version': '3.4', 'services': {}}
-
- for k, v in self._pod_nodes.items():
- if v.role == PodRoleType.GATEWAY:
- cmd = 'jina gateway'
- else:
- cmd = 'jina pod'
- swarm_yml['services'][k] = {
- 'command': f'{cmd} {" ".join(ArgNamespace.kwargs2list(vars(v.args)))}',
- 'deploy': {'parallel': 1},
- }
-
- JAML.dump(swarm_yml, path)
-
@property
@build_required(FlowBuildLevel.GRAPH)
def port_expose(self) -> int:
@@ -919,6 +889,13 @@ def _update_client(self):
if self._pod_nodes['gateway'].args.restful:
self._cls_client = WebSocketClient
+ @property
+ def workspace(self) -> str:
+ """Return the workspace path of the flow.
+
+ .. # noqa: DAR201"""
+ return os.path.abspath(self.args.workspace or './')
+
@property
def workspace_id(self) -> Dict[str, str]:
"""Get all Pods' ``workspace_id`` values in a dict
@@ -939,9 +916,7 @@ def workspace_id(self, value: str):
for k, p in self:
if hasattr(p.args, 'workspace_id'):
p.args.workspace_id = value
- args = getattr(p, 'peas_args', None)
- if args is None:
- args = getattr(p, 'replicas_args', None)
+ args = getattr(p, 'peas_args', getattr(p, 'replicas_args', None))
if args is None:
raise ValueError(
f'could not find "peas_args" or "replicas_args" on {p}'
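Two workspace additions land here: the Flow gains a `workspace` property that falls back to the absolute path of `./`, and (in the `add()` hunk above) each Pod inherits it whenever no Pod-level workspace is set. A sketch of the resulting resolution (passing `workspace=` through the Flow constructor is an assumption based on `self.args.workspace`):

```python
import os

from jina import Flow

f = Flow(workspace='/tmp/my_flow').add(name='enc')

# the Flow-level workspace always resolves to an absolute path
assert f.workspace == os.path.abspath('/tmp/my_flow')

# without an explicit workspace, it falls back to the current directory
assert Flow().workspace == os.path.abspath('./')
```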
@@ -990,6 +965,14 @@ def rolling_update(self, pod_name: str, dump_path: Optional[str] = None):
:param dump_path: the path from which to read the dump data
:param pod_name: pod to update
"""
+ # TODO: By design, once the Flow has started, it shouldn't have in-memory access to its sub-objects anymore.
+ # All control should be issued via network requests, not via memory access.
+ # In the current master, Flow.rolling_update() & Flow.dump() violate the above design.
+ # Violating this design makes the whole system NOT cloud-native.
+ warnings.warn(
+ 'This function is experimental and facing potential refactoring',
+ FutureWarning,
+ )
compound_pod = self._pod_nodes[pod_name]
if isinstance(compound_pod, CompoundPod):
@@ -998,13 +981,3 @@ def rolling_update(self, pod_name: str, dump_path: Optional[str] = None):
raise ValueError(
f'The BasePod {pod_name} is not a CompoundPod and does not support updating'
)
-
- def dump(self, pod_name: str, dump_path: str, shards: int, timeout=-1):
- """Emit a Dump request to a specific Pod
- :param shards: the nr of shards in the dump
- :param dump_path: the path to which to dump
- :param pod_name: the name of the pod
- :param timeout: time to wait (seconds)
- """
- pod: BasePod = self._pod_nodes[pod_name]
- pod.dump(pod_name, dump_path, shards, timeout)
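`rolling_update` survives but now carries a `FutureWarning`: per the TODO it reaches into Pod objects in memory instead of issuing a network request, and `Flow.dump()` is dropped outright. The current call shape looks roughly like this (pod name, `replicas`, and dump path are illustrative):

```python
from jina import Flow

f = Flow().add(name='indexer', replicas=2)

with f:
    # only valid when 'indexer' is backed by a CompoundPod (replicas > 1);
    # otherwise a ValueError is raised, as shown in the diff above
    f.rolling_update('indexer', dump_path='/tmp/dump')
```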
diff --git a/jina/flow/builder.py b/jina/flow/builder.py
index c38c8cacfe98f..15afb9af34eca 100644
--- a/jina/flow/builder.py
+++ b/jina/flow/builder.py
@@ -5,12 +5,11 @@
from .. import __default_host__
from ..enums import SocketType, FlowBuildLevel, PodRoleType
from ..excepts import FlowBuildLevelError, SocketTypeError
-from ..peapods.pods import BasePod
+from ..peapods import BasePod
# noinspection PyUnreachableCode
if False:
from . import Flow
- from ..peapods import BasePod
def build_required(required_level: 'FlowBuildLevel'):
diff --git a/jina/flow/mixin/async_control.py b/jina/flow/mixin/async_control.py
deleted file mode 100644
index 2e8cdf4cad01b..0000000000000
--- a/jina/flow/mixin/async_control.py
+++ /dev/null
@@ -1,30 +0,0 @@
-from typing import Union, Sequence
-
-from ...clients.base import CallbackFnType
-
-
-class AsyncControlFlowMixin:
- """The asynchronous version of the Mixin for controlling, scaling the Flow"""
-
- async def reload(
- self,
- targets: Union[str, Sequence[str]],
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Reload the executor of certain peas/pods in the Flow
- It will start a :py:class:`CLIClient` and call :py:func:`reload`.
-
- :param targets: the regex string or list of regex strings to match the pea/pod names.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yield: result
- """
- async for r in self._get_client(**kwargs).reload(
- targets, on_done, on_error, on_always, **kwargs
- ):
- yield r
diff --git a/jina/flow/mixin/async_crud.py b/jina/flow/mixin/async_crud.py
deleted file mode 100644
index f72ad5b39fcae..0000000000000
--- a/jina/flow/mixin/async_crud.py
+++ /dev/null
@@ -1,611 +0,0 @@
-import warnings
-from typing import Union, Iterable, TextIO, Dict, Optional
-
-import numpy as np
-
-from ...clients.base import InputType, CallbackFnType
-from ...enums import DataInputType
-from ...helper import deprecated_alias
-
-
-class AsyncCRUDFlowMixin:
- """The asynchronous version of the Mixin for CRUD in Flow"""
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def train(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do training on the current Flow
-
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- warnings.warn(f'{self.train} is under heavy refactoring', FutureWarning)
- async for r in self._get_client(**kwargs).train(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def index_ndarray(
- self,
- array: 'np.ndarray',
- axis: int = 0,
- size: Optional[int] = None,
- shuffle: bool = False,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Using numpy ndarray as the index source for the current Flow
-
- :param array: the numpy ndarray data source
- :param axis: iterate over that axis
- :param size: the maximum number of the sub arrays
- :param shuffle: shuffle the the numpy data source beforehand
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_ndarray
-
- async for r in self._get_client(**kwargs).index(
- _input_ndarray(array, axis, size, shuffle),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def search_ndarray(
- self,
- array: 'np.ndarray',
- axis: int = 0,
- size: Optional[int] = None,
- shuffle: bool = False,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a numpy ndarray as the query source for searching on the current Flow
-
- :param array: the numpy ndarray data source
- :param axis: iterate over that axis
- :param size: the maximum number of the sub arrays
- :param shuffle: shuffle the the numpy data source beforehand
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_ndarray
-
- async for r in self._get_client(**kwargs).search(
- _input_ndarray(array, axis, size, shuffle),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def index_lines(
- self,
- lines: Optional[Union[Iterable[str], TextIO]] = None,
- filepath: Optional[str] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: str = 'r',
- line_format: str = 'json',
- field_resolver: Optional[Dict[str, str]] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
-
- :param lines: a list of strings, each is considered as d document
- :param filepath: a text file that each line contains a document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in binary
- :param line_format: the format of each line: ``json`` or ``csv``
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_lines
-
- async for r in self._get_client(**kwargs).index(
- _input_lines(
- lines,
- filepath,
- size=size,
- sampling_rate=sampling_rate,
- read_mode=read_mode,
- line_format=line_format,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- ):
- yield r
-
- async def index_csv(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Dict[str, str] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_csv
-
- async for r in self._get_client(**kwargs).index(
- _input_csv(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- ):
- yield r
-
- async def index_ndjson(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_ndjson
-
- async for r in self._get_client(**kwargs).index(
- _input_ndjson(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def index_files(
- self,
- patterns: Union[str, Iterable[str]],
- recursive: bool = True,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: Optional[str] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a set of files as the index source for indexing on the current Flow
-
- :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]'
- :param recursive: If recursive is true, the pattern '**' will match any files and
- zero or more directories and subdirectories.
- :param size: the maximum number of the files
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in binary mode
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_files
-
- async for r in self._get_client(**kwargs).index(
- _input_files(patterns, recursive, size, sampling_rate, read_mode),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def search_files(
- self,
- patterns: Union[str, Iterable[str]],
- recursive: bool = True,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: Optional[str] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a set of files as the query source for searching on the current Flow
-
- :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]'
- :param recursive: If recursive is true, the pattern '**' will match any files and
- zero or more directories and subdirectories.
- :param size: the maximum number of the files
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_files
-
- async for r in self._get_client(**kwargs).search(
- _input_files(patterns, recursive, size, sampling_rate, read_mode),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- ):
- yield r
-
- async def search_ndjson(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of files as the query source for searching on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_ndjson
-
- async for r in self._get_client(**kwargs).search(
- _input_ndjson(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- ):
- yield r
-
- async def search_csv(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_csv
-
- async for r in self._get_client(**kwargs).search(
- _input_csv(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def search_lines(
- self,
- lines: Optional[Union[Iterable[str], TextIO]] = None,
- filepath: Optional[str] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: str = 'r',
- line_format: str = 'json',
- field_resolver: Optional[Dict[str, str]] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of files as the query source for searching on the current Flow
-
- :param filepath: a text file that each line contains a document
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in binary
- :param line_format: the format of each line: ``json`` or ``csv``
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- from ...clients.sugary_io import _input_lines
-
- async for r in self._get_client(**kwargs).search(
- _input_lines(
- lines,
- filepath,
- size=size,
- sampling_rate=sampling_rate,
- read_mode=read_mode,
- line_format=line_format,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def index(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do indexing on the current Flow
-
- It will start a :py:class:`CLIClient` and call :py:func:`index`.
-
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- async for r in self._get_client(**kwargs).index(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def update(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do updates on the current Flow
-
- It will start a :py:class:`CLIClient` and call :py:func:`index`.
-
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- async for r in self._get_client(**kwargs).update(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def delete(
- self,
- ids: Iterable[str],
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do deletion on the current Flow
-
- :param ids: An iterable of ids
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- async for r in self._get_client(**kwargs).delete(
- ids, on_done, on_error, on_always, **kwargs
- ):
- yield r
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- async def search(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do searching on the current Flow
-
- It will start a :py:class:`CLIClient` and call :py:func:`search`.
-
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :yields: results
- """
- async for r in self._get_client(**kwargs).search(
- inputs, on_done, on_error, on_always, **kwargs
- ):
- yield r
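All the deleted `index_*`/`search_*` coroutines collapse into `AsyncPostMixin.post`, with the sugary converters (`_input_ndarray`, `_input_csv`, ...) replaced by Document generators on the client side. A rough 2.0 equivalent of the old `index_ndarray` (a sketch; the `from_ndarray` generator location follows the 2.0 helpers and is an assumption here):

```python
import numpy as np

from jina import AsyncFlow
from jina.types.document.generators import from_ndarray


async def index_array(f: AsyncFlow, array: np.ndarray):
    # each row of the ndarray becomes one Document, as _input_ndarray did
    async for resp in f.post(on='/index', inputs=from_ndarray(array)):
        print(resp)
```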
diff --git a/jina/flow/mixin/control.py b/jina/flow/mixin/control.py
deleted file mode 100644
index 90b3306bc8d77..0000000000000
--- a/jina/flow/mixin/control.py
+++ /dev/null
@@ -1,29 +0,0 @@
-from typing import Union, Sequence
-
-from ...clients.base import CallbackFnType
-
-
-class ControlFlowMixin:
- """The synchronous version of the Mixin for controlling, scaling the Flow"""
-
- def reload(
- self,
- targets: Union[str, Sequence[str]],
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Reload the executor of certain peas/pods in the Flow
- It will start a :py:class:`CLIClient` and call :py:func:`reload`.
-
- :param targets: the regex string or list of regex strings to match the pea/pod names.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- return self._get_client(**kwargs).reload(
- targets, on_done, on_error, on_always, **kwargs
- )
diff --git a/jina/flow/mixin/crud.py b/jina/flow/mixin/crud.py
deleted file mode 100644
index cb54e4346e974..0000000000000
--- a/jina/flow/mixin/crud.py
+++ /dev/null
@@ -1,565 +0,0 @@
-import warnings
-from typing import Union, Iterable, TextIO, Dict, Optional
-
-import numpy as np
-
-from ...clients.base import InputType, InputDeleteType, CallbackFnType
-from ...enums import DataInputType
-from ...helper import deprecated_alias
-
-
-class CRUDFlowMixin:
- """The synchronous version of the Mixin for CRUD in Flow"""
-
- @deprecated_alias(input_fn=('inputs', 0))
- def train(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do training on the current Flow
-
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- warnings.warn(f'{self.train} is under heavy refactoring', FutureWarning)
- return self._get_client(**kwargs).train(
- inputs, on_done, on_error, on_always, **kwargs
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def index_ndarray(
- self,
- array: 'np.ndarray',
- axis: int = 0,
- size: Optional[int] = None,
- shuffle: bool = False,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Using numpy ndarray as the index source for the current Flow
-
- :param array: the numpy ndarray data source
- :param axis: iterate over that axis
- :param size: the maximum number of the sub arrays
- :param shuffle: shuffle the the numpy data source beforehand
- :param on_done: the callback function to invoke after indexing
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_ndarray
-
- return self._get_client(**kwargs).index(
- _input_ndarray(array, axis, size, shuffle),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def search_ndarray(
- self,
- array: 'np.ndarray',
- axis: int = 0,
- size: Optional[int] = None,
- shuffle: bool = False,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a numpy ndarray as the query source for searching on the current Flow
-
- :param array: the numpy ndarray data source
- :param axis: iterate over that axis
- :param size: the maximum number of the sub arrays
- :param shuffle: shuffle the the numpy data source beforehand
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- """
- from ...clients.sugary_io import _input_ndarray
-
- self._get_client(**kwargs).search(
- _input_ndarray(array, axis, size, shuffle),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def index_lines(
- self,
- lines: Optional[Union[Iterable[str], TextIO]] = None,
- filepath: Optional[str] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: str = 'r',
- line_format: str = 'json',
- field_resolver: Optional[Dict[str, str]] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param filepath: a text file that each line contains a document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in binary
- :param line_format: the format of each line: ``json`` or ``csv``
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_lines
-
- return self._get_client(**kwargs).index(
- _input_lines(
- lines,
- filepath,
- size=size,
- sampling_rate=sampling_rate,
- read_mode=read_mode,
- line_format=line_format,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- )
-
- def index_ndjson(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_ndjson
-
- return self._get_client(**kwargs).index(
- _input_ndjson(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- )
-
- def index_csv(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_csv
-
- return self._get_client(**kwargs).index(
- _input_csv(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- )
-
- def search_csv(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of lines as the index source for indexing on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_csv
-
- return self._get_client(**kwargs).search(
- _input_csv(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def index_files(
- self,
- patterns: Union[str, Iterable[str]],
- recursive: bool = True,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: Optional[str] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a set of files as the index source for indexing on the current Flow
- :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]'
- :param recursive: If recursive is true, the pattern '**' will match any files and
- zero or more directories and subdirectories.
- :param size: the maximum number of the files
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in binary mode
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_files
-
- return self._get_client(**kwargs).index(
- _input_files(patterns, recursive, size, sampling_rate, read_mode),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def search_files(
- self,
- patterns: Union[str, Iterable[str]],
- recursive: bool = True,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: Optional[str] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a set of files as the query source for searching on the current Flow
- :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]'
- :param recursive: If recursive is true, the pattern '**' will match any files and
- zero or more directories and subdirectories.
- :param size: the maximum number of the files
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_files
-
- return self._get_client(**kwargs).search(
- _input_files(patterns, recursive, size, sampling_rate, read_mode),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.CONTENT,
- **kwargs,
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def search_lines(
- self,
- lines: Optional[Union[Iterable[str], TextIO]] = None,
- filepath: Optional[str] = None,
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- read_mode: str = 'r',
- line_format: str = 'json',
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of files as the query source for searching on the current Flow
- :param filepath: a text file that each line contains a document
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param read_mode: specifies the mode in which the file
- is opened. 'r' for reading in text mode, 'rb' for reading in binary
- :param line_format: the format of each line ``json`` or ``csv``
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_lines
-
- return self._get_client(**kwargs).search(
- _input_lines(
- lines,
- filepath,
- size=size,
- sampling_rate=sampling_rate,
- read_mode=read_mode,
- line_format=line_format,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- )
-
- def search_ndjson(
- self,
- lines: Union[Iterable[str], TextIO],
- field_resolver: Optional[Dict[str, str]] = None,
- size: Optional[int] = None,
- sampling_rate: Optional[float] = None,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Use a list of files as the query source for searching on the current Flow
- :param lines: a list of strings, each is considered as d document
- :param size: the maximum number of the documents
- :param sampling_rate: the sampling rate between [0, 1]
- :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field
- names defined in Protobuf. This is only used when the given ``document`` is
- a JSON string or a Python dict.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- from ...clients.sugary_io import _input_ndjson
-
- return self._get_client(**kwargs).search(
- _input_ndjson(
- lines,
- size=size,
- sampling_rate=sampling_rate,
- field_resolver=field_resolver,
- ),
- on_done,
- on_error,
- on_always,
- data_type=DataInputType.AUTO,
- **kwargs,
- )
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def index(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do indexing on the current Flow
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- return self._get_client(**kwargs).index(
- inputs, on_done, on_error, on_always, **kwargs
- )
-
- @deprecated_alias(input_fn=('inputs', 0))
- def update(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Updates Documents on the current Flow
-
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- """
- self._get_client(**kwargs).update(
- inputs, on_done, on_error, on_always, **kwargs
- )
-
- def delete(
- self,
- ids: InputDeleteType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do deletion on the current Flow
-
- :param ids: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- """
- self._get_client(**kwargs).delete(ids, on_done, on_error, on_always, **kwargs)
-
- @deprecated_alias(
- input_fn=('inputs', 0),
- buffer=('inputs', 1),
- callback=('on_done', 1),
- output_fn=('on_done', 1),
- )
- def search(
- self,
- inputs: InputType,
- on_done: CallbackFnType = None,
- on_error: CallbackFnType = None,
- on_always: CallbackFnType = None,
- **kwargs,
- ):
- """Do searching on the current Flow
- It will start a :py:class:`CLIClient` and call :py:func:`search`.
-
- :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**.
- :param on_done: the function to be called when the :class:`Request` object is resolved.
- :param on_error: the function to be called when the :class:`Request` object is rejected.
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected.
- :param kwargs: accepts all keyword arguments of `jina client` CLI
- :return: results
- """
- return self._get_client(**kwargs).search(
- inputs, on_done, on_error, on_always, **kwargs
- )
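The synchronous mixin goes the same way: one `post()` with an endpoint replaces every method above. For instance, the deleted `search_csv` becomes (sketch, same generator-module assumption as before):

```python
from jina import Flow
from jina.types.document.generators import from_csv

f = Flow().add()

with f, open('toy.csv') as fp:
    f.post(
        on='/search',
        inputs=from_csv(fp, field_resolver={'question': 'text'}),
        on_done=print,
    )
```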
diff --git a/jina/helloworld/chatbot/__init__.py b/jina/helloworld/chatbot/__init__.py
index 9cbfa9558d014..e69de29bb2d1d 100644
--- a/jina/helloworld/chatbot/__init__.py
+++ b/jina/helloworld/chatbot/__init__.py
@@ -1,71 +0,0 @@
-import os
-import webbrowser
-from pathlib import Path
-
-from pkg_resources import resource_filename
-
-from ..helper import download_data
-from ... import Flow
-from ...importer import ImportExtensions
-from ...logging import default_logger
-
-
-def hello_world(args):
- """
- Execute the chatbot example.
-
- :param args: arguments passed from CLI
- """
- Path(args.workdir).mkdir(parents=True, exist_ok=True)
-
- with ImportExtensions(
- required=True,
- help_text='this demo requires Pytorch and Transformers to be installed, '
- 'if you haven\'t, please do `pip install jina[torch,transformers]`',
- ):
- import transformers, torch
-
- assert [torch, transformers] #: prevent pycharm auto remove the above line
-
- targets = {
- 'covid-csv': {
- 'url': args.index_data_url,
- 'filename': os.path.join(args.workdir, 'dataset.csv'),
- }
- }
-
- # download the data
- download_data(targets, args.download_proxy, task_name='download csv data')
-
- # this envs are referred in index and query flow YAMLs
- os.environ['HW_WORKDIR'] = args.workdir
-
- # now comes the real work
- # load index flow from a YAML file
-
- f = (
- Flow()
- .add(uses='TransformerTorchEncoder', parallel=args.parallel)
- .add(
- uses=f'{resource_filename("jina", "resources")}/chatbot/helloworld.indexer.yml'
- )
- )
-
- # index it!
- with f, open(targets['covid-csv']['filename']) as fp:
- f.index_csv(fp, field_resolver={'question': 'text', 'url': 'uri'})
-
- # switch to REST gateway
- f.use_rest_gateway(args.port_expose)
- with f:
- try:
- webbrowser.open(args.demo_url, new=2)
- except:
- pass # intentional pass, browser support isn't cross-platform
- finally:
- default_logger.success(
- f'You should see a demo page opened in your browser, '
- f'if not, you may open {args.demo_url} manually'
- )
- if not args.unblock_query_flow:
- f.block()
diff --git a/jina/helloworld/chatbot/app.py b/jina/helloworld/chatbot/app.py
new file mode 100644
index 0000000000000..e9aacd86de802
--- /dev/null
+++ b/jina/helloworld/chatbot/app.py
@@ -0,0 +1,103 @@
+import os
+import urllib.request
+import webbrowser
+from pathlib import Path
+
+from jina import Flow, Document
+from jina.importer import ImportExtensions
+from jina.logging import default_logger
+from jina.logging.profile import ProgressBar
+from jina.parsers.helloworld import set_hw_chatbot_parser
+
+if __name__ == '__main__':
+ from executors import MyTransformer, MyIndexer
+else:
+ from .executors import MyTransformer, MyIndexer
+
+
+def hello_world(args):
+ """
+ Execute the chatbot example.
+
+ :param args: arguments passed from CLI
+ """
+ Path(args.workdir).mkdir(parents=True, exist_ok=True)
+
+ with ImportExtensions(
+ required=True,
+ help_text='this demo requires PyTorch and Transformers to be installed, '
+ 'if you haven\'t, please do `pip install jina[torch,transformers]`',
+ ):
+ import transformers, torch
+
+ assert [torch, transformers] #: prevent PyCharm from auto-removing the import above
+
+ targets = {
+ 'covid-csv': {
+ 'url': args.index_data_url,
+ 'filename': os.path.join(args.workdir, 'dataset.csv'),
+ }
+ }
+
+ # download the data
+ download_data(targets, args.download_proxy, task_name='download csv data')
+
+ # now comes the real work
+ # load index flow from a YAML file
+
+ f = (
+ Flow()
+ .add(uses=MyTransformer, parallel=args.parallel)
+ .add(uses=MyIndexer, workspace=args.workdir)
+ )
+
+ # index it!
+ with f, open(targets['covid-csv']['filename']) as fp:
+ f.index(Document.from_csv(fp, field_resolver={'question': 'text'}))
+
+ # switch to REST gateway
+ url_html_path = 'file://' + os.path.abspath(
+ os.path.join(os.path.dirname(os.path.realpath(__file__)), 'static/index.html')
+ )
+ f.use_rest_gateway(args.port_expose)
+ with f:
+ try:
+ webbrowser.open(url_html_path, new=2)
+ except:
+ pass # intentional pass, browser support isn't cross-platform
+ finally:
+ default_logger.success(
+ f'You should see a demo page opened in your browser, '
+ f'if not, you may open {url_html_path} manually'
+ )
+ if not args.unblock_query_flow:
+ f.block()
+
+
+def download_data(targets, download_proxy=None, task_name='download fashion-mnist'):
+ """
+ Download data.
+
+ :param targets: a dict mapping each dataset name to its download 'url' and local 'filename'
+ :param download_proxy: download proxy (e.g. 'http', 'https')
+ :param task_name: name of the task
+ """
+ opener = urllib.request.build_opener()
+ opener.addheaders = [('User-agent', 'Mozilla/5.0')]
+ if download_proxy:
+ proxy = urllib.request.ProxyHandler(
+ {'http': download_proxy, 'https': download_proxy}
+ )
+ opener.add_handler(proxy)
+ urllib.request.install_opener(opener)
+ with ProgressBar(task_name=task_name, batch_unit='') as t:
+ for k, v in targets.items():
+ if not os.path.exists(v['filename']):
+ urllib.request.urlretrieve(
+ v['url'], v['filename'], reporthook=lambda *x: t.update_tick(0.01)
+ )
+
+
+if __name__ == '__main__':
+ args = set_hw_chatbot_parser().parse_args()
+ hello_world(args)
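The trailing `__main__` block, together with the import guard at the top of the file, lets the demo run either as part of the package (`jina hello chatbot`) or as a standalone script. Invoking it programmatically looks roughly like this (running with all-default flags is an assumption):

```python
from jina.parsers.helloworld import set_hw_chatbot_parser

from app import hello_world  # run from next to app.py, per the __main__ guard

args = set_hw_chatbot_parser().parse_args([])  # all defaults
hello_world(args)
```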
diff --git a/jina/helloworld/chatbot/executors.py b/jina/helloworld/chatbot/executors.py
new file mode 100644
index 0000000000000..746f42441f558
--- /dev/null
+++ b/jina/helloworld/chatbot/executors.py
@@ -0,0 +1,169 @@
+import os
+from pathlib import Path
+from typing import Optional, Dict, Tuple
+
+import numpy as np
+import torch
+from transformers import AutoModel, AutoTokenizer
+
+from jina import Executor, DocumentArray, requests, Document
+
+
+class MyTransformer(Executor):
+ """Transformer executor class """
+
+ def __init__(
+ self,
+ pretrained_model_name_or_path: str = 'sentence-transformers/distilbert-base-nli-stsb-mean-tokens',
+ base_tokenizer_model: Optional[str] = None,
+ pooling_strategy: str = 'mean',
+ layer_index: int = -1,
+ max_length: Optional[int] = None,
+ acceleration: Optional[str] = None,
+ embedding_fn_name: str = '__call__',
+ *args,
+ **kwargs,
+ ):
+ super().__init__(*args, **kwargs)
+ self.pretrained_model_name_or_path = pretrained_model_name_or_path
+ self.base_tokenizer_model = (
+ base_tokenizer_model or pretrained_model_name_or_path
+ )
+ self.pooling_strategy = pooling_strategy
+ self.layer_index = layer_index
+ self.max_length = max_length
+ self.acceleration = acceleration
+ self.embedding_fn_name = embedding_fn_name
+ self.tokenizer = AutoTokenizer.from_pretrained(self.base_tokenizer_model)
+ self.model = AutoModel.from_pretrained(
+ self.pretrained_model_name_or_path, output_hidden_states=True
+ )
+ self.model.to(torch.device('cpu'))
+
+ def _compute_embedding(self, hidden_states: 'torch.Tensor', input_tokens: Dict):
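+        """Mask out padded positions with the strategy's fill value, then mean-pool the chosen hidden-state layer."""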
+ fill_vals = {'cls': 0.0, 'mean': 0.0, 'max': -np.inf, 'min': np.inf}
+ fill_val = torch.tensor(
+ fill_vals[self.pooling_strategy], device=torch.device('cpu')
+ )
+
+ layer = hidden_states[self.layer_index]
+ attn_mask = input_tokens['attention_mask'].unsqueeze(-1).expand_as(layer)
+ layer = torch.where(attn_mask.bool(), layer, fill_val)
+
+ embeddings = layer.sum(dim=1) / attn_mask.sum(dim=1)
+ return embeddings.cpu().numpy()
+
+ @requests
+ def encode(self, docs: 'DocumentArray', *args, **kwargs):
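+        """Encode the content of all Documents into embeddings and store each result in `doc.embedding`."""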
+ with torch.no_grad():
+
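+            # some tokenizers ship without a pad token; add one and resize the embedding matrix to match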
+            if not self.tokenizer.pad_token:
+                self.tokenizer.add_special_tokens({'pad_token': '[PAD]'})
+                self.model.resize_token_embeddings(len(self.tokenizer))
+
+ input_tokens = self.tokenizer(
+ docs.get_attributes('content'),
+ max_length=self.max_length,
+ padding='longest',
+ truncation=True,
+ return_tensors='pt',
+ )
+ input_tokens = {
+ k: v.to(torch.device('cpu')) for k, v in input_tokens.items()
+ }
+
+ outputs = getattr(self.model, self.embedding_fn_name)(**input_tokens)
+ if isinstance(outputs, torch.Tensor):
+ return outputs.cpu().numpy()
+ hidden_states = outputs.hidden_states
+
+ embeds = self._compute_embedding(hidden_states, input_tokens)
+ for doc, embed in zip(docs, embeds):
+ doc.embedding = embed
+
+
+class MyIndexer(Executor):
+ """Simple indexer class """
+
+ def __init__(self, **kwargs):
+ super().__init__(**kwargs)
+ self._docs = DocumentArray()
+ Path(self.workspace).mkdir(parents=True, exist_ok=True)
+ self.filename = os.path.join(self.workspace, 'chatbot.ndjson')
+ if os.path.exists(self.filename):
+ self._docs = DocumentArray.load(self.filename)
+
+ def close(self) -> None:
+ self._docs.save(self.filename)
+
+ @requests(on='/index')
+ def index(self, docs: 'DocumentArray', **kwargs):
+ self._docs.extend(docs)
+
+ @requests(on='/search')
+ def search(self, docs: 'DocumentArray', **kwargs):
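+        """Attach the single closest indexed Document, ranked by cosine similarity, to each query as a match."""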
+ a = np.stack(docs.get_attributes('embedding'))
+ b = np.stack(self._docs.get_attributes('embedding'))
+ q_emb = _ext_A(_norm(a))
+ d_emb = _ext_B(_norm(b))
+ dists = _cosine(q_emb, d_emb)
+ idx, dist = self._get_sorted_top_k(dists, 1)
+ for _q, _ids, _dists in zip(docs, idx, dist):
+ for _id, _dist in zip(_ids, _dists):
+ d = Document(self._docs[int(_id)], copy=True)
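+                # `_cosine` returns a distance, so `1 - distance` recovers the cosine similarity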
+ d.score.value = 1 - _dist
+ _q.matches.append(d)
+
+ @staticmethod
+ def _get_sorted_top_k(
+        dist: 'np.ndarray', top_k: int
+ ) -> Tuple['np.ndarray', 'np.ndarray']:
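+        """Return per-row indices and values of the `top_k` smallest distances, sorted ascending; `argpartition` skips the full sort when `top_k` is smaller than the row length."""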
+ if top_k >= dist.shape[1]:
+ idx = dist.argsort(axis=1)[:, :top_k]
+ dist = np.take_along_axis(dist, idx, axis=1)
+ else:
+ idx_ps = dist.argpartition(kth=top_k, axis=1)[:, :top_k]
+ dist = np.take_along_axis(dist, idx_ps, axis=1)
+ idx_fs = dist.argsort(axis=1)
+ idx = np.take_along_axis(idx_ps, idx_fs, axis=1)
+ dist = np.take_along_axis(dist, idx_fs, axis=1)
+
+ return idx, dist
+
+
+def _get_ones(x, y):
+ return np.ones((x, y))
+
+
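+# The helpers below compute all pairwise squared Euclidean distances with a single
+# matrix product: A_ext.dot(B_ext) expands to ||a||^2 - 2*a.b + ||b||^2 per row pair.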
+def _ext_A(A):
+ nA, dim = A.shape
+ A_ext = _get_ones(nA, dim * 3)
+ A_ext[:, dim : 2 * dim] = A
+ A_ext[:, 2 * dim :] = A ** 2
+ return A_ext
+
+
+def _ext_B(B):
+ nB, dim = B.shape
+ B_ext = _get_ones(dim * 3, nB)
+ B_ext[:dim] = (B ** 2).T
+ B_ext[dim : 2 * dim] = -2.0 * B.T
+ del B
+ return B_ext
+
+
+def _euclidean(A_ext, B_ext):
+ sqdist = A_ext.dot(B_ext).clip(min=0)
+ return np.sqrt(sqdist)
+
+
+def _norm(A):
+ return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True)
+
+
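+# for L2-normalized rows the squared Euclidean distance equals 2 - 2*cos(a, b),
+# so halving it yields the cosine distance (1 - cosine similarity)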
+def _cosine(A_norm_ext, B_norm_ext):
+ return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2
diff --git a/jina/helloworld/chatbot/static/index.html b/jina/helloworld/chatbot/static/index.html
new file mode 100644
index 0000000000000..56f0fb801ecd7
--- /dev/null
+++ b/jina/helloworld/chatbot/static/index.html
@@ -0,0 +1,59 @@
+<!-- [HTML markup lost in extraction; only the visible text of the page survives] -->
+<!-- page title: "COVID-19 Simple QA Demo" -->
+<!-- heading: "Covid-19 Simple QA", subtitle: "Powered by Jina" -->
+<!-- footer: "Chatbox UI credited to https://codepen.io/supah/ Copyright reserved Fabio Ottaviani" -->
+<!-- footer: "Covid19 dataset from https://www.kaggle.com/xhlulu/covidqa" -->
diff --git a/jina/helloworld/chatbot/static/license.txt b/jina/helloworld/chatbot/static/license.txt
new file mode 100644
index 0000000000000..c9870ab94316f
--- /dev/null
+++ b/jina/helloworld/chatbot/static/license.txt
@@ -0,0 +1,8 @@
+Copyright (c) 2021 by Fabio Ottaviani (https://codepen.io/supah/pen/jqOBqp)
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
+
diff --git a/jina/helloworld/chatbot/static/script.js b/jina/helloworld/chatbot/static/script.js
new file mode 100644
index 0000000000000..990a1a91658ea
--- /dev/null
+++ b/jina/helloworld/chatbot/static/script.js
@@ -0,0 +1,100 @@
+var $messages = $('.messages-content'),
+ d, h, m,
+ i = 0;
+
+$(window).load(function () {
+ $messages.mCustomScrollbar();
+ setTimeout(function () {
+ fakeMessage("Hi there, please ask me COVID-19 related questions. For example,