diff --git a/.github/.is_A b/.github/.is_A deleted file mode 100644 index 8b137891791fe..0000000000000 --- a/.github/.is_A +++ /dev/null @@ -1 +0,0 @@ - diff --git a/.github/1500x667new.gif b/.github/1500x667new.gif deleted file mode 100644 index 7560375eb4424..0000000000000 Binary files a/.github/1500x667new.gif and /dev/null differ diff --git "a/.github/1500\321\205667.gif" "b/.github/1500\321\205667.gif" deleted file mode 100644 index c022c0d59a7c2..0000000000000 Binary files "a/.github/1500\321\205667.gif" and /dev/null differ diff --git a/.github/2.0/1.xvs2.0BaseExecutor.svg b/.github/2.0/1.xvs2.0BaseExecutor.svg new file mode 100644 index 0000000000000..c59f580311fb5 --- /dev/null +++ b/.github/2.0/1.xvs2.0BaseExecutor.svg @@ -0,0 +1,40 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/.github/2.0/cookbooks/CleanCode.md b/.github/2.0/cookbooks/CleanCode.md new file mode 100644 index 0000000000000..0cf5fadc31457 --- /dev/null +++ b/.github/2.0/cookbooks/CleanCode.md @@ -0,0 +1,84 @@ +# Temporary Cookbook on Clean Code + +Jina is designed as a lean and efficient framework. Solutions built on top of Jina also mean to be so. Here are some +tips to help you write clean & beautiful code. + + + + + + + + +1. `from jina import Document, DocumentArray, Executor, Flow, requests` is all you need. Copy-paste it as the first line of your code. + +1. No need to implement `__init__` if your `Executor` does not contain initial states. + + ✅ Do: + ```python + from jina import Executor + + class MyExecutor(Executor): + def foo(self, **kwargs): + ... + ``` + 😔 Don't: + ```python + from jina import Executor + + class MyExecutor(Executor): + def __init__(**kwargs): + super().__init__(**kwargs) + + def foo(self, **kwargs): + ... + ``` + +1. Use `@requests` without specifying `on=` if your function mean to work on all requests. You can use it for catching all requests that are not for this Executor. 
+ + ✅ Do: + ```python + from jina import Executor, requests + + class MyExecutor(Executor): + + @requests + def _skip_all(self, **kwargs): + pass + ``` + 😔 Don't: + ```python + from jina import Executor, requests + + class MyExecutor(Executor): + @requests(on='/index') + def _skip_index(self, **kwargs): + pass + + @requests(on='/search') + def _skip_search(self, **kwargs): + pass + ``` + +1. Fold unneeded arguments into `**kwargs`; declare only what you need. + + ✅ Do: + ```python + from jina import Executor, requests + + class MyExecutor(Executor): + + @requests + def foo_need_pars_only(self, parameters, **kwargs): + print(parameters) + ``` + 😔 Don't: + ```python + from jina import Executor, requests + + class MyExecutor(Executor): + + @requests + def foo_need_pars_only(self, docs, parameters, docs_matrix, groundtruths_matrix, **kwargs): + print(parameters) + ``` \ No newline at end of file diff --git a/.github/2.0/cookbooks/Document.md b/.github/2.0/cookbooks/Document.md new file mode 100644 index 0000000000000..662cb5d3d7c98 --- /dev/null +++ b/.github/2.0/cookbooks/Document.md @@ -0,0 +1,612 @@ +Document, Executor, Flow are three fundamental concepts in Jina. + +- [**Document**](Document.md) is the basic data type in Jina; +- [**Executor**](Executor.md) is how Jina processes Documents; +- [**Flow**](Flow.md) is how Jina streamlines and scales Executors. + +*Learn them all, nothing more, you are good to go.* + +--- + +# Cookbook on `Document`/`DocumentArray` 2.0 API + +`Document` is the basic data type that Jina operates with. Text, picture, video, audio, 3D mesh: they are +all `Document` in Jina. + +`DocumentArray` is a sequence container of `Document`. It is the first-class citizen of `Executor`, serving as the input +& output. + +If `Document` is to Jina what `np.float` is to Numpy, then `DocumentArray` is what `np.ndarray` is. 
+ + + +Table of Contents + +- [Minimum working example](#minimum-working-example) +- [`Document` API](#document-api) + - [`Document` Attributes](#document-attributes) + - [Construct `Document`](#construct-document) + - [Exclusivity of `doc.content`](#exclusivity-of-doccontent) + - [Conversion between `doc.content`](#conversion-between-doccontent) + - [Construct with Multiple Attributes](#construct-with-multiple-attributes) + - [Construct from Dict or JSON String](#construct-from-dict-or-json-string) + - [Construct from Another `Document`](#construct-from-another-document) + - [Construct from JSON, CSV, `ndarray` and Files](#construct-from-json-csv-ndarray-and-files) + - [Serialize `Document`](#serialize-document) + - [Add Recursion to `Document`](#add-recursion-to-document) + - [Recursive Attributes](#recursive-attributes) + - [Visualize `Document`](#visualize-document) + - [Add Relevancy to `Document`](#add-relevancy-to-document) + - [Relevance Attributes](#relevance-attributes) +- [`DocumentArray` API](#documentarray-api) + - [Construct `DocumentArray`](#construct-documentarray) + - [Persistence via `save()`/`load()`](#persistence-via-saveload) + - [Access Element](#access-element) + - [Sort Elements](#sort-elements) + - [Filter Elements](#filter-elements) + - [Use `itertools` on `DocumentArray`](#use-itertools-on-documentarray) + - [Get Attributes in Bulk](#get-attributes-in-bulk) + + + +## Minimum working example + +```python +from jina import Document + +d = Document() +``` + +## `Document` API + +### `Document` Attributes + +A `Document` object has the following attributes, which can be put into the following categories: + +| | | +|---|---| +| Content attributes | `.buffer`, `.blob`, `.text`, `.uri`, `.content`, `.embedding` | +| Meta attributes | `.id`, `.weight`, `.mime_type`, `.location`, `.tags`, `.offset`, `.modality` | +| Recursive attributes | `.chunks`, `.matches`, `.granularity`, `.adjacency` | +| Relevance attributes | `.score`, `.evaluations` | + 
+### Construct `Document` + +##### Content Attributes + +| | | +| --- | --- | +| `doc.buffer` | The raw binary content of this document | +| `doc.blob` | The `ndarray` of the image/audio/video document | +| `doc.text` | The text info of the document | +| `doc.uri` | The URI of the document: a local file path, a remote URL starting with http or https, or a data URI | +| `doc.content` | Whichever of the above fields is non-empty | +| `doc.embedding` | The embedding `ndarray` of this Document | + +You can assign `str`, `ndarray`, `buffer`, `uri` to a `Document`. + +```python +from jina import Document +import numpy as np + +d1 = Document(content='hello') +d2 = Document(content=b'\f1') +d3 = Document(content=np.array([1, 2, 3])) +d4 = Document(content='https://static.jina.ai/logo/core/notext/light/logo.png') +``` + +```text + + + + +``` + +The content will be automatically assigned to one of the `text`, `buffer`, `blob`, `uri` fields; `id` and `mime_type` are +auto-generated when not given. + +In a Jupyter notebook, or by calling `.plot()`, you can visualize a `Document` object. + + + +#### Exclusivity of `doc.content` + +![](../doc.content.svg?raw=true) + +Note that one `Document` can only contain one type of `content`: it is one of `text`, `buffer`, `blob`, `uri`. +Setting `text` first and then setting `uri` will clear the `text` field. 
+ +```python +from jina import Document + +d = Document(text='hello world') +d.uri = 'https://jina.ai/' +assert not d.text # True + +d = Document(content='https://jina.ai') +assert d.uri == 'https://jina.ai' # True +assert not d.text # True +d.text = 'hello world' + +assert d.content == 'hello world' # True +assert not d.uri # True +``` + +#### Conversion between `doc.content` + +You can use the following methods to convert between `.uri`, `.text`, `.buffer`, `.blob`: + +```python +doc.convert_buffer_to_blob() +doc.convert_blob_to_buffer() +doc.convert_uri_to_buffer() +doc.convert_buffer_to_uri() +doc.convert_text_to_uri() +doc.convert_uri_to_text() +``` + +You can convert a URI to a data URI (a URI scheme that embeds the data inline) using `doc.convert_uri_to_datauri()`. This will fetch the +resource and make it inline. + +In particular, when you work with an image `Document`, some extra helpers enable further conversions. + +```python +doc.convert_image_buffer_to_blob() +doc.convert_image_blob_to_uri() +doc.convert_image_uri_to_blob() +doc.convert_image_datauri_to_blob() +``` + +##### Set Embedding + +An embedding is the high-dimensional representation of a `Document`. You can assign any NumPy `ndarray` as its embedding. 
+ +```python +import numpy as np +from jina import Document + +d1 = Document(embedding=np.array([1, 2, 3])) +d2 = Document(embedding=np.array([[1, 2, 3], [4, 5, 6]])) +``` + +#### Construct with Multiple Attributes + +##### Meta Attributes + +| | | +| --- | --- | +| `doc.tags` | A structured data value, consisting of field which map to dynamically typed values | +| `doc.id` | A hexdigest that represents a unique document ID | +| `doc.weight` | The weight of this document | +| `doc.mime_type` | The mime type of this document | +| `doc.location` | The position of the doc, could be start and end index of a string; could be x,y (top, left) coordinate of an image crop; could be timestamp of an audio clip | +| `doc.offset` | The offset of this doc in the previous granularity document| +| `doc.modality` | An identifier to the modality this document belongs to| + +You can assign multiple attributes in the constructor via: + +```python +from jina import Document + +d = Document(content='hello', + uri='https://jina.ai', + mime_type='text/plain', + granularity=1, + adjacency=3, + tags={'foo': 'bar'}) +``` + +```text + +``` + +#### Construct from Dict or JSON String + +You can build a `Document` from `dict` or a JSON string. + +```python +from jina import Document +import json + +d = {'id': 'hello123', 'content': 'world'} +d1 = Document(d) + +d = json.dumps({'id': 'hello123', 'content': 'world'}) +d2 = Document(d) +``` + +##### Parsing Unrecognized Fields + +Unrecognized fields in Dict/JSON string are automatically put into `.tags` field. + +```python +from jina import Document + +d1 = Document({'id': 'hello123', 'foo': 'bar'}) +``` + +```text + +``` + +You can use `field_resolver` to map the external field name to `Document` attributes, e.g. 
+ +```python +from jina import Document + +d1 = Document({'id': 'hello123', 'foo': 'bar'}, field_resolver={'foo': 'content'}) +``` + +```text + +``` + +#### Construct from Another `Document` + +Assigning a `Document` object to another name does not copy it: both names refer to the same object. + +```python +from jina import Document + +d = Document(content='hello, world!') +d1 = d + +assert id(d) == id(d1) # True +``` + +To make a deep copy, use `copy=True`: + +```python +d1 = Document(d, copy=True) + +assert id(d) != id(d1) # True: `d1` is a distinct object +``` + +You can partially update a `Document` from another source `Document`: + +```python +from jina import Document + +s = Document( + id='🐲', + content='hello-world', + tags={'a': 'b'}, + chunks=[Document(id='🐢')], +) +d = Document( + id='🐦', + content='goodbye-world', + tags={'c': 'd'}, + chunks=[Document(id='🐯')], +) + +# only update `id` field +d.update(s, include_fields=('id',)) + +# update every field except `id`, which is preserved +d.update(s, exclude_fields=('id',)) +``` + +#### Construct from JSON, CSV, `ndarray` and Files + +You can also construct `Document` from common file types such as JSON, CSV, `ndarray` and text files. The following functions return a generator of `Document`, where each `Document` object corresponds to a line/row in the original format: + +| | | +| --- | --- | +| `Document.from_ndjson()` | Yield `Document` from a line-based JSON file, each line is a `Document` object | +| `Document.from_csv()` | Yield `Document` from a CSV file, each line is a `Document` object | +| `Document.from_files()` | Yield `Document` from files matching a glob pattern, each file is a `Document` object | +| `Document.from_ndarray()` | Yield `Document` from a `ndarray`, each row (depending on `axis`) is a `Document` object | + +Using a generator is often less memory-demanding, as it does not build all `Document` objects in one shot. 
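To illustrate why a generator helps here, the following is a minimal pure-Python sketch of lazy, line-by-line loading. It does not use Jina: `iter_ndjson` is a hypothetical helper that only mirrors the idea behind `Document.from_ndjson()`, and plain dicts stand in for `Document` objects.

```python
import io
import json
from typing import Dict, Iterator, TextIO


def iter_ndjson(fp: TextIO) -> Iterator[Dict]:
    """Yield one parsed record per non-empty line, without reading the whole file."""
    for line in fp:
        line = line.strip()
        if line:
            yield json.loads(line)


# an in-memory file stands in for a real NDJSON file on disk (assumption)
fake_file = io.StringIO('{"id": "a", "text": "hello"}\n{"id": "b", "text": "world"}\n')

records = iter_ndjson(fake_file)
first = next(records)      # only the first line has been parsed at this point
remaining = list(records)  # the rest is parsed on demand
```

Because records are produced one at a time, peak memory depends on the size of a single record rather than on the size of the file.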
+ +### Serialize `Document` + +You can serialize a `Document` into a JSON string, a Python dict, or a binary string via + +```python +from jina import Document + +d = Document(content='hello world') +d.json() +``` + +``` +{ + "id": "6a1c7f34-aef7-11eb-b075-1e008a366d48", + "mimeType": "text/plain", + "text": "hello world" +} +``` + +```python +d.dict() +``` + +``` +{'id': '6a1c7f34-aef7-11eb-b075-1e008a366d48', 'mimeType': 'text/plain', 'text': 'hello world'} +``` + +```python +d.binary_str() +``` + +``` +b'\n$6a1c7f34-aef7-11eb-b075-1e008a366d48R\ntext/plainj\x0bhello world' +``` + +### Add Recursion to `Document` + +#### Recursive Attributes + +A `Document` can be nested both horizontally and vertically. + +| | | +| --- | --- | +| `doc.chunks` | The list of sub-documents of this document. They have `granularity + 1` but same `adjacency` | +| `doc.matches` | The list of matched documents of this document. They have `adjacency + 1` but same `granularity` | +| `doc.granularity` | The recursion "depth" of the recursive chunks structure | +| `doc.adjacency` | The recursion "width" of the recursive match structure | + +You can add **chunks** (sub-documents) and **matches** (neighbour documents) to a `Document` in the following ways: + +- Add in constructor: + + ```python + d = Document(chunks=[Document(), Document()], matches=[Document(), Document()]) + ``` + +- Add to existing `Document`: + + ```python + d = Document() + d.chunks = [Document(), Document()] + d.matches = [Document(), Document()] + ``` + +- Add to existing `doc.chunks` or `doc.matches`: + + ```python + d = Document() + d.chunks.append(Document()) + d.matches.append(Document()) + ``` + +Note that both `doc.chunks` and `doc.matches` return `DocumentArray`, which we will introduce later. + +### Visualize `Document` + +To better see the Document's recursive structure, you can use the `.plot()` function. If you are using JupyterLab/Notebook, +all `Document` objects will be auto-rendered. + + + + + + +
+ +```python +import numpy as np +from jina import Document + +d0 = Document(id='🐲', embedding=np.array([0, 0])) +d1 = Document(id='🐦', embedding=np.array([1, 0])) +d2 = Document(id='🐢', embedding=np.array([0, 1])) +d3 = Document(id='🐯', embedding=np.array([1, 1])) + +d0.chunks.append(d1) +d0.chunks[0].chunks.append(d2) +d0.matches.append(d3) + +d0.plot() # simply `d0` on JupyterLab +``` + + + +
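The rule in the recursive attributes table (chunks get `granularity + 1`, matches get `adjacency + 1`) can be sketched with a toy class. This is only an illustration under stated assumptions: `Node`, `add_chunk` and `add_match` are hypothetical names, not Jina's `Document` API, and the real bookkeeping inside Jina may differ.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Toy stand-in for a Document; not Jina's implementation."""
    id: str
    granularity: int = 0
    adjacency: int = 0
    chunks: List['Node'] = field(default_factory=list)
    matches: List['Node'] = field(default_factory=list)

    def add_chunk(self, child: 'Node') -> None:
        # a chunk sits one level deeper, at the same adjacency
        child.granularity = self.granularity + 1
        child.adjacency = self.adjacency
        self.chunks.append(child)

    def add_match(self, other: 'Node') -> None:
        # a match sits one step sideways, at the same granularity
        other.adjacency = self.adjacency + 1
        other.granularity = self.granularity
        self.matches.append(other)


# same shape as the plotted example: d0 -> chunk d1 -> chunk d2, and d0 -> match d3
d0, d1, d2, d3 = (Node(i) for i in ('d0', 'd1', 'd2', 'd3'))
d0.add_chunk(d1)
d1.add_chunk(d2)
d0.add_match(d3)

levels = {n.id: (n.granularity, n.adjacency) for n in (d0, d1, d2, d3)}
```

Here `levels` maps each node to its `(granularity, adjacency)` pair, which makes the "depth" versus "width" distinction explicit.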
+ +### Add Relevancy to `Document` + +#### Relevance Attributes + +| | | +| --- | --- | +| `doc.score` | The relevance information of this document | +| `doc.evaluations` | The evaluation information of this document | + +You can add relevance score to a `Document` object via: + +```python +from jina import Document +d = Document() +d.score.value = 0.96 +d.score.description = 'cosine similarity' +d.score.op_name = 'cosine()' +``` + +```text + +``` + +Score information is often used jointly with `matches`. For example, you often see the indexer adding `matches` as +follows: + +```python +from jina import Document + +# some query document +q = Document() +# get match document `m` +m = Document() +m.score.value = 0.96 +q.matches.append(m) +``` + +## `DocumentArray` API + +`DocumentArray` is a list of `Document` objects. You can construct, delete, insert, sort, traverse a `DocumentArray` +like a Python `list`. + +Methods supported by `DocumentArray`: + +| | | +|--- |--- | +| Python `list`-like interface | `__getitem__`, `__setitem__`, `__delitem__`, `__len__`, `insert`, `append`, `reverse`, `extend`, `pop`, `remove`, `__iadd__`, `__add__`, `__iter__`, `__clear__`, `sort` | +| Persistence | `save`, `load` | +| Advanced getters | `get_attributes`, `get_attributes_with_docs` | + +### Construct `DocumentArray` + +One can construct a `DocumentArray` from iterable of `Document` via: + +```python +from jina import DocumentArray, Document + +# from list +da1 = DocumentArray([Document(), Document()]) + +# from generator +da2 = DocumentArray((Document() for _ in range(10))) + +# from another `DocumentArray` +da3 = DocumentArray(da2) +``` + +### Persistence via `save()`/`load()` + +To save all elements in a `DocumentArray` in a JSON lines format: + +```python +from jina import DocumentArray, Document + +da = DocumentArray([Document(), Document()]) + +da.save('data.json') +da1 = DocumentArray.load('data.json') +``` + +### Access Element + +You can access a `Document` in the 
`DocumentArray` via integer index, string `id` and `slice` indices. + +```python +from jina import DocumentArray, Document + +da = DocumentArray([Document(id='hello'), Document(id='world'), Document(id='goodbye')]) + +da[0] +# + +da['world'] +# + +da[1:2] +# +``` + +### Sort Elements + +`DocumentArray` is a subclass of `MutableSequence`, therefore you can use built-in Python `sort` to sort elements in a `DocumentArray` object, e.g. + +```python +from jina import DocumentArray, Document + +da = DocumentArray( + [ + Document(tags={'id': 1}), + Document(tags={'id': 2}), + Document(tags={'id': 3}) + ] +) + +da.sort(key=lambda d: d.tags['id'], reverse=True) +print(da) +``` + +this sorts elements in `da` in-place, using `tags[id]` value in a descending manner: + +```text + + +{'id': '6a79982a-b6b0-11eb-8a66-1e008a366d49', 'tags': {'id': 3.0}}, +{'id': '6a799744-b6b0-11eb-8a66-1e008a366d49', 'tags': {'id': 2.0}}, +{'id': '6a799190-b6b0-11eb-8a66-1e008a366d49', 'tags': {'id': 1.0}} +``` + +### Filter Elements + +You can use [built-in Python `filter()`](https://docs.python.org/3/library/functions.html#filter) to filter elements in a `DocumentArray` object, e.g. 
+ +```python +from jina import DocumentArray, Document + +da = DocumentArray([Document() for _ in range(6)]) + +for j in range(6): + da[j].score.value = j + +for d in filter(lambda d: d.score.value > 2, da): + print(d) +``` + +```text + + + +``` + +You can build a `DocumentArray` object from the filtered result: + +```python +from jina import DocumentArray, Document + +da = DocumentArray([Document(weight=j) for j in range(6)]) +da2 = DocumentArray(list(filter(lambda d: d.weight > 2, da))) + +print(da2) +``` + +```text +DocumentArray has 3 items: +{'id': '3bd0d298-b6da-11eb-b431-1e008a366d49', 'weight': 3.0}, +{'id': '3bd0d324-b6da-11eb-b431-1e008a366d49', 'weight': 4.0}, +{'id': '3bd0d392-b6da-11eb-b431-1e008a366d49', 'weight': 5.0} +``` + +### Use `itertools` on `DocumentArray` + +As `DocumentArray` is an `Iterable`, you can also use [Python's built-in `itertools` module](https://docs.python.org/3/library/itertools.html) on it. This enables advanced "iterator algebra" on the `DocumentArray`. + +### Get Attributes in Bulk + +`DocumentArray` implements powerful getters that let you fetch multiple attributes from the documents it contains +in one shot. 
+ +```python +import numpy as np + +from jina import DocumentArray, Document + +da = DocumentArray([Document(id=1, text='hello', embedding=np.array([1, 2, 3])), + Document(id=2, text='goodbye', embedding=np.array([4, 5, 6])), + Document(id=3, text='world', embedding=np.array([7, 8, 9]))]) + +da.get_attributes('id', 'text', 'embedding') +``` + +```text +[('1', '2', '3'), ('hello', 'goodbye', 'world'), (array([1, 2, 3]), array([4, 5, 6]), array([7, 8, 9]))] +``` + +This can be very useful when extracting a batch of embeddings, + +```python +import numpy as np + +np.stack(da.get_attributes('embedding')) +``` + +```text +[[1 2 3] + [4 5 6] + [7 8 9]] +``` diff --git a/.github/2.0/cookbooks/Executor.md b/.github/2.0/cookbooks/Executor.md new file mode 100644 index 0000000000000..5a49e7c2ecc69 --- /dev/null +++ b/.github/2.0/cookbooks/Executor.md @@ -0,0 +1,475 @@ +Document, Executor, Flow are three fundamental concepts in Jina. + +- [**Document**](Document.md) is the basic data type in Jina; +- [**Executor**](Executor.md) is how Jina processes Documents; +- [**Flow**](Flow.md) is how Jina streamlines and scales Executors. 
+ +*Learn them all, nothing more, you are good to go.* + +--- + +# Cookbook on `Executor` 2.0 API + + + +Table of Contents + +- [Minimum working example](#minimum-working-example) + - [Pure Python](#pure-python) + - [With YAML](#with-yaml) +- [Executor API](#executor-api) + - [Inheritance](#inheritance) + - [`__init__` Constructor](#__init__-constructor) + - [Method naming](#method-naming) + - [`@requests` decorator](#requests-decorator) + - [Default binding: `@requests` without `on=`](#default-binding-requests-without-on) + - [Multiple binding: `@requests(on=[...])`](#multiple-binding-requestson) + - [No binding](#no-binding) + - [Method Signature](#method-signature) + - [Method Arguments](#method-arguments) + - [Method Returns](#method-returns) + - [YAML Interface](#yaml-interface) + - [Load and Save Executor's YAML config](#load-and-save-executors-yaml-config) +- [Executor Built-in Features](#executor-built-in-features) + - [1.x vs 2.0](#1x-vs-20) + - [Workspace](#workspace) + - [Metas](#metas) + - [`.metas` & `.runtime_args`](#metas--runtime_args) +- [Migration in Practice](#migration-in-practice) + - [`jina hello fashion`](#jina-hello-fashion) + - [Encoder](#encoder) +- [Remarks](#remarks) + - [Joining/Merging](#joiningmerging) + + + +## Minimum working example + +### Pure Python + +```python +from jina import Executor, Flow, Document, requests + + +class MyExecutor(Executor): + + @requests + def foo(self, **kwargs): + print(kwargs) + + +f = Flow().add(uses=MyExecutor) + +with f: + f.post(on='/random_work', inputs=Document(), on_done=print) +``` + +### With YAML + +`my.yml`: + +```yaml +jtype: MyExecutor +with: + bar: 123 +metas: + name: awesomeness + description: my first awesome executor +requests: + /random_work: foo +``` + +```python +from jina import Executor, Flow, Document + + +class MyExecutor(Executor): + + def __init__(self, bar: int, **kwargs): + super().__init__(**kwargs) + self.bar = bar + + def foo(self, **kwargs): + print(f'foo says: {self.bar} 
{self.metas} {kwargs}') + + +f = Flow().add(uses='my.yml') + +with f: + f.post(on='/random_work', inputs=Document(), on_done=print) +``` + +## Executor API + +- All executors inherit directly from the `Executor` class. +- An executor class can contain an arbitrary number of functions with arbitrary names. It is a bag of functions with + shared state (via `self`). +- Functions decorated by `@requests` will be invoked according to their `on=` endpoint. + +### Inheritance + +Every new executor should inherit directly from `jina.Executor`. + +The 1.x inheritance tree is removed; `Executor` no longer has polymorphism. + +You can name your executor class freely. + +### `__init__` Constructor + +If your executor defines `__init__`, it needs to carry `**kwargs` in the signature and call `super().__init__(**kwargs)` +in the body, e.g. + +```python +from jina import Executor + + +class MyExecutor(Executor): + + def __init__(self, foo: str, bar: int, **kwargs): + super().__init__(**kwargs) + self.bar = bar + self.foo = foo +``` + +Here, `kwargs` contains `metas` and `requests` (representing the request-to-function mapping) values from YAML config, +and `runtime_args` injected on startup. Note that you can access their values in the `__init__` body via `self.metas` +/`self.requests`/`self.runtime_args`, or modify their values before passing them to `super().__init__()`. + +### Method naming + +An `Executor`'s methods can be named freely. Methods not decorated with `@requests` are irrelevant to Jina. + +### `@requests` decorator + +`@requests` defines when a function will be invoked. It has a keyword `on=` to define the endpoint. + +To call an Executor's function, use `Flow.post(on=..., ...)`. 
For example, given + +```python +from jina import Executor, Flow, requests + + +class MyExecutor(Executor): + + @requests(on='/index') + def foo(self, **kwargs): + print(kwargs) + + @requests(on='/random_work') + def bar(self, **kwargs): + print(kwargs) + + +f = Flow().add(uses=MyExecutor) + +with f: + pass +``` + +Then: + +- `f.post(on='/index', ...)` will trigger `MyExecutor.foo`; +- `f.post(on='/random_work', ...)` will trigger `MyExecutor.bar`; +- `f.post(on='/blah', ...)` will throw an error, as no function is bound to `/blah`. + +#### Default binding: `@requests` without `on=` + +A class method decorated with plain `@requests` (without `on=`) is the default handler for all endpoints. That is, it +is the fallback handler for endpoints that have no explicit binding. In the example below, `f.post(on='/blah', ...)` will invoke `MyExecutor.foo`. + +```python +from jina import Executor, requests + + +class MyExecutor(Executor): + + @requests + def foo(self, **kwargs): + print(kwargs) + + @requests(on='/index') + def bar(self, **kwargs): + print(kwargs) +``` + +#### Multiple binding: `@requests(on=[...])` + +To bind a method to multiple endpoints, one can use `@requests(on=['/foo', '/bar'])`. This allows +either `f.post(on='/foo', ...)` or `f.post(on='/bar', ...)` to invoke that function. + +#### No binding + +A class with no `@requests` binding plays no part in the Flow. The request will simply pass through without any processing. + + +### Method Signature + +A class method decorated by `@requests` follows the signature below: + +```python +def foo(docs: Optional[DocumentArray], + parameters: Dict, + docs_matrix: List[DocumentArray], + groundtruths: Optional[DocumentArray], + groundtruths_matrix: List[DocumentArray]) -> Optional[DocumentArray]: + pass +``` + +### Method Arguments + +An Executor's method receives the following arguments in order: + +| Name | Type | Description | +| --- | --- | --- | +| `docs` | `Optional[DocumentArray]` | `Request.docs`. 
When multiple requests are available, it is a concatenation of all `Request.docs` as one `DocumentArray`. When the `DocumentArray` has zero elements, it is `None`. | +| `parameters` | `Dict` | `Request.parameters`, given by `Flow.post(..., parameters=)` | +| `docs_matrix` | `List[DocumentArray]` | When multiple requests are available, it is a list of all `Request.docs`. On a single request, it is `None` | +| `groundtruths` | `Optional[DocumentArray]` | `Request.groundtruths`. Same behavior as `docs` | +| `groundtruths_matrix` | `List[DocumentArray]` | Same behavior as `docs_matrix` but on `Request.groundtruths` | + +Note that an executor's methods not decorated with `@requests` do not receive these arguments. + +The argument order is common-usage-first, not alphabetical or by semantic closeness. + +If you don't need some arguments, you can fold them into `**kwargs`. For example: + +```python +@requests +def foo(docs, **kwargs): + bar(docs) + + +@requests +def foo(docs, parameters, **kwargs): + bar(docs) + bar(parameters) + + +@requests +def foo(**kwargs): + bar(kwargs['docs_matrix']) +``` + +### Method Returns + +A method decorated with `@requests` can return `Optional[DocumentArray]`. If not `None`, then the current `Request.docs` will +be overridden by the return value. + +If the return value is just a shallow copy of `Request.docs`, then nothing happens. + +### YAML Interface + +An Executor can be loaded from and stored to a YAML file. The YAML file has the following format: + +```yaml +jtype: MyExecutor +with: + ... +metas: + ... +requests: + ... +``` + +- `jtype` is a string. Defines the class name, interchangeable with bang mark `!`; +- `with` is a map. Defines kwargs of the class `__init__` method; +- `metas` is a map. Defines the meta information of that class; compared to `1.x` it is reduced to the following + fields: + - `name` is a string. Defines the name of the executor; + - `description` is a string. Defines the description of this executor. 
It will be used in the automatic docs UI; + - `workspace` is a string. Defines the workspace of the executor; + - `py_modules` is a list of strings. Defines the Python dependencies of the executor. +- `requests` is a map. Defines the mapping from endpoint to class method name. + +### Load and Save Executor's YAML config + +You can use class method `Executor.load_config` and object method `exec.save_config` to load & save YAML config as +follows: + +```python +from jina import Executor + + +class MyExecutor(Executor): + + def __init__(self, bar: int, **kwargs): + super().__init__(**kwargs) + self.bar = bar + + def foo(self, **kwargs): + pass + + +y_literal = """ +jtype: MyExecutor +with: + bar: 123 +metas: + name: awesomeness + description: my first awesome executor +requests: + /random_work: foo +""" + +exec = Executor.load_config(y_literal) +exec.save_config('y.yml') +Executor.load_config('y.yml') +``` + +## Executor Built-in Features + +In 2.0 the Executor class has fewer built-in features than in 1.x. The design principles are (`user` here means "Executor +developer"): + +- **Do not surprise user**: keep the `Executor` class as Pythonic as possible; it should be as light and unintrusive as + a `mixin` class: + - do not customize the class constructor logic; + - do not change its builtin interface `__getstate__`, `__setstate__`; + - do not add new members to the `Executor` object unless we must. +- **Do not overpromise to user**: do not promise features that we can hardly deliver. Trying to control the interface + while delivering just loosely implemented features is bad for scaling the core framework. For example, `save`, `load` + , `on_gpu`, etc. + +We want to give programming freedom back to the user. A good Python programmer should pick +up `Executor` in no time, without spending extra time learning implicit boilerplate as in 1.x. Plus, +subclassing `Executor` should be easy. + +### 1.x vs 2.0 + +- ❌: Completely removed. 
Users have to implement it on their own. +- ✅: Preserved. + +| 1.x | 2.0 | +| --- | --- | +| `.save_config()` | ✅ | +| `.load_config()` | ✅ | +| `.close()` | ✅ | +| `workspace` interface | ✅ [Refactored](#workspace). | +| `metas` config | Moved to `self.metas.xxx`. [Number of fields are greatly reduced](#yaml-interface). | +| `._drivers` | Refactored and moved to `self.requests.xxx`. | +| `.save()` | ❌ | +| `.load()` | ❌ | +| `.logger` | ❌ | +| Pickle interface | ❌ | +| init boilerplates (`pre_init`, `post_init`) | ❌ | +| Context manager interface | ❌ | +| Inline `import` coding style | ❌ | + +![](1.xvs2.0%20BaseExecutor.svg) + +### Workspace + +Executor's workspace is inherited according to the following rule (`OR` is a python `or`, i.e. first thing first, if NA +then second): + +![](../workspace-inherit.svg?raw=true) + +### Metas + +The meta attributes of an `Executor` object are now gathered in `self.metas`, instead of directly posing them to `self`, +e.g. to access `name` use `self.metas.name`. + +### `.metas` & `.runtime_args` + +An `Executor` object by default contains two collections of attributes `.metas` and `.runtime_args`. They are both +in `SimpleNamespace` type and contain some key-value information. However, they are defined and serve differently. + +- **`.metas` are statically defined.** "Static" means, e.g. from hardcoded value in the code, from a YAML file. +- **`.runtime_args` are dynamically determined during runtime.** Means that you don't know the value before running + the `Executor`, e.g. `pea_id`, `replicas`, `replica_id`. Those values are often related to the system/network + environment around the `Executor`, and less about `Executor` itself. + +In 2.0rc1, the following fields are valid for `metas` and `runtime_args`: + +||| +| --- | --- | +| `.metas` (static values from hardcode, YAML config) | `name`, `description`, `py_modules`, `workspace` | +| `.runtime_args` (runtime values from its containers, e.g. 
`Runtime`, `Pea`, `Pod`) | `name`, `description`, `workspace`, `log_config`, `quiet`, `quiet_error`, `identity`, `port_ctrl`, `ctrl_with_ipc`, `timeout_ctrl`, `ssh_server`, `ssh_keyfile`, `ssh_password`, `uses`, `py_modules`, `port_in`, `port_out`, `host_in`, `host_out`, `socket_in`, `socket_out`, `read_only`, `memory_hwm`, `on_error_strategy`, `num_part`, `uses_internal`, `entrypoint`, `docker_kwargs`, `pull_latest`, `volumes`, `host`, `port_expose`, `quiet_remote_logs`, `upload_files`, `workspace_id`, `daemon`, `runtime_backend`, `runtime_cls`, `timeout_ready`, `env`, `expose_public`, `pea_id`, `pea_role`, `noblock_on_start`, `uses_before`, `uses_after`, `parallel`, `replicas`, `polling`, `scheduling`, `pod_role`, `peas_hosts` | + +Note that, YAML API will ignore `.runtime_args` during save & load as they are not for statically stored. + +Also note that, for any other parametrization of the Executor, you can still access its constructor arguments (defined in the class `__init__`) and the request `parameters`. + +--- + +## Migration in Practice + +### `jina hello fashion` + +#### Encoder + +Left is 1.x, right is 2.0. + +![img.png](../migration-fashion.png?raw=true) + +Line number corresponds to the 1.x code: + +- `L5`: change imports to top-level namespace `jina`; +- `L8`: all executors now subclass from `Executor` class; +- `L13-14`: there is no need to inherit from `__init__`, no signature is enforced; +- `L20`: `.touch()` is removed; for this particular encoder as long as the seed is fixed there is no need to store; +- `L22`: adding `@requests` to decorate the core method, changing signature to `docs, **kwargs`; +- `L32`: + - the content extraction and embedding assignment are now done manually; + - replacing previous `Blob2PngURI` and `ExcludeQL` driver logic using `Document` built-in + methods `convert_blob_to_uri` and `pop` + - there is nothing to return, as the change is done in-place. 
+ +## Remarks + +### Joining/Merging + +Combining `docs` from multiple requests is already done by the `ZEDRuntime` before they are fed to the Executor's function. +Hence, simple joining just returns `docs`. More complicated joining should be implemented at the +`Document`/`DocumentArray` level. + +```python +from jina import Executor, requests, Flow, Document + + +class C(Executor): + + @requests + def foo(self, docs, **kwargs): + # 6 docs: the outputs of A and B arrive already merged + return docs + + +class B(Executor): + + @requests + def foo(self, docs, **kwargs): + # 3 docs + for idx, d in enumerate(docs): + d.text = f'hello {idx}' + + +class A(Executor): + + @requests + def foo(self, docs, **kwargs): + # 3 docs + for idx, d in enumerate(docs): + d.text = f'world {idx}' + + +# A and B both read from the gateway; C waits for both of them +f = Flow().add(uses=A).add(uses=B, needs='gateway').add(uses=C, needs=['pod0', 'pod1']) + +with f: + f.post(on='/some_endpoint', + inputs=[Document() for _ in range(3)], + on_done=print) +``` + +You can also modify the docs while merging, which was not feasible in 1.x, e.g. + +```python +from jina import Executor, requests + + +class C(Executor): + + @requests + def foo(self, docs, **kwargs): + # 6 docs + for d in docs: + d.text += '!!!' + return docs +``` diff --git a/.github/2.0/cookbooks/Flow.md b/.github/2.0/cookbooks/Flow.md new file mode 100644 index 0000000000000..9632f724b564c --- /dev/null +++ b/.github/2.0/cookbooks/Flow.md @@ -0,0 +1,321 @@ +Document, Executor, Flow are three fundamental concepts in Jina. + +- [**Document**](Document.md) is the basic data type in Jina; +- [**Executor**](Executor.md) is how Jina processes Documents; +- [**Flow**](Flow.md) is how Jina streamlines and scales Executors. 
+ +*Learn them all, nothing more, you are good to go.* + +--- + +# Cookbook on `Flow` 2.0 API + + + +Table of Contents + +- [Minimum working example](#minimum-working-example) + - [Pure Python](#pure-python) + - [With YAML](#with-yaml) +- [Flow API](#flow-api) + - [Create a Flow](#create-a-flow) + - [Add Executor to a Flow](#add-executor-to-a-flow) + - [Create Inter & Intra Parallelism via `needs`](#create-inter--intra-parallelism-via-needs) + - [Decentralized Flow](#decentralized-flow) +- [Send Data to Flow](#send-data-to-flow) + - [`post` method](#post-method) + - [Fetch Result from Flow](#fetch-result-from-flow) + - [Asynchronous Flow](#asynchronous-flow) + - [REST Interface](#rest-interface) + + + +## Minimum working example + +### Pure Python + +```python +from jina import Flow, Document + +f = Flow().add(name='foo') + +with f: + f.post(on='/bar', inputs=Document(), on_done=print) +``` + +### With YAML + +`my.yml`: + +```yaml +jtype: Flow +executors: + - name: foo +``` + +```python +from jina import Flow, Document + +f = Flow.load_config('my.yml') + +with f: + f.post(on='/bar', inputs=Document(), on_done=print) +``` + +## Flow API + +A Flow is how Jina streamlines and scales Executors. A `Flow` object has the following methods: + +| | | +|---|---| +|Construct| `.add()`, `.needs()`, `.needs_all()`, `.inspect()`, `.gather_inspect()`, `.use_grpc_gateway`, `.use_rest_gateway` | +|Request| `.post()`, `.index()`, `.search()`, `.update()`, `.delete()`| + +### Create a Flow + +An empty Flow can be created via: + +```python +from jina import Flow + +f = Flow() +``` + +To use `f`, always open it as a context manager: + +```python +with f: + ... +``` + +### Add Executor to a Flow + +`Flow.add()` adds an Executor to the `Flow` object. It is often used with the `uses` parameter to specify +the [Executor](Executor.md). + +`uses` accepts multiple value types including class name, Docker image, (inline) YAML. 
+ +```python +from jina import Flow + +f = (Flow() + .add(uses=MyExecutor) # the class of a Jina Executor + .add(uses='myexecutor.yml') # YAML serialization of a Jina Executor + .add(uses=''' +jtype: MyExecutor +with: + bar: 123 +metas: + name: awesomeness + description: my first awesome executor +requests: + /random_work: foo + ''') # inline YAML + .add(uses={'jtype': 'MyBertEncoder', 'with': {'param': 1.23}})) # dict config object with `jtype` keyword +``` + +The power of Jina lies in its decentralized architecture: each `add` creates a new Executor, and these Executors can be +run as a local thread/process, a remote process, inside a Docker container, or even inside a remote Docker container. + +### Create Inter & Intra Parallelism via `needs` + +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/basic-inter-intra-parallelism.ipynb) + +Chaining `.add()`s creates a sequential Flow. For parallelism, use the `needs` parameter: + +```python +from jina import Flow + +f = (Flow() + .add(name='p1', needs='gateway') + .add(name='p2', needs='gateway') + .add(name='p3', needs='gateway') + .needs(['p1', 'p2', 'p3'], name='r1').plot()) +``` + + + +`p1`, `p2`, `p3` now subscribe to `Gateway` and conduct their work in parallel. The last `.needs()` waits for all three +Executors to finish their work. Note: parallelism can also be performed inside an Executor using `parallel`: + +```python +from jina import Flow + +f = (Flow() + .add(name='p1', needs='gateway') + .add(name='p2', needs='gateway') + .add(name='p3', parallel=3) + .needs(['p1', 'p3'], name='r1').plot()) +``` + + + +### Decentralized Flow + +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/decentralized-flow.ipynb) + +A Flow does not have to be local-only: you can place any Executor on a remote machine. 
In the example below, the `gpu_exec` Executor is placed on a remote machine via the `host` +keyword for parallelization, whereas the other Executors stay local. Extra file +dependencies that need to be uploaded are specified via the `upload_files` keyword. + + + + + + + + + + +
123.456.78.9 + +```bash +# requires Docker to be installed +docker run --name=jinad --network=host -v /var/run/docker.sock:/var/run/docker.sock jinaai/jina:latest-daemon --port-expose 8000 +# to stop it +docker rm -f jinad +``` + +
+ Local + + +```python +from jina import Flow + +f = (Flow() + .add() + .add(name='gpu_exec', + uses='mwu_encoder.yml', + host='123.456.78.9:8000', + parallel=2, + upload_files=['mwu_encoder.py']) + .add()) +``` + +
+ +## Send Data to Flow + +### `post` method + +`post` is the core method. All 1.x methods, e.g. `index`, `search`, `update`, `delete`, are just syntactic sugar for `post` +with `on='/index'`, `on='/search'`, etc. + +```python +def post( + self, + on: str, + inputs: InputType, + on_done: CallbackFnType = None, + on_error: CallbackFnType = None, + on_always: CallbackFnType = None, + parameters: Optional[dict] = None, + target_peapod: Optional[str] = None, + **kwargs, +) -> None: + """Post a general data request to the Flow. + + :param on: the endpoint used to identify the user-defined ``request_type``, labeled by ``@requests(on='/abc')`` + :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id. + :param on_done: the function to be called when the :class:`Request` object is resolved. + :param on_error: the function to be called when the :class:`Request` object is rejected. + :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected. + :param target_peapod: a regex string that selects the peas/pods the request targets + :param parameters: the kwargs that will be sent to the executor + :param kwargs: additional parameters + :return: None + """ +``` + +Compared to the 1.x Client/Flow API, the three new arguments are: + +- `on`: endpoint, as explained above +- `parameters`: the kwargs that will be sent to the executor, as explained above +- `target_peapod`: a regex string that selects the peas/pods the request targets + +### Fetch Result from Flow + +Once a request is done, callback functions are fired. Jina Flow implements a Promise-like interface: you can add +callback functions `on_done`, `on_error`, `on_always` to hook different events. In the example below, our Flow passes +the message then prints the result when successful. If something goes wrong, it beeps. Finally, the result is written +to `output.txt`. 
+ +```python +import os + +import numpy +from jina import Flow + + +def beep(*args): + # make a beep sound by echoing the BEL character + os.system('echo -n "\a";') + + +with Flow().add() as f, open('output.txt', 'w') as fp: + f.index(numpy.random.random([4, 5, 2]), + on_done=print, on_error=beep, on_always=lambda x: fp.write(x.json())) +``` + +### Asynchronous Flow + +[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/jina-ai/jupyter-notebooks/blob/main/basic-inter-intra-parallelism.ipynb) + +While synchronous from the outside, Jina runs asynchronously under the hood: it manages the event loop(s) for scheduling the +jobs. If the user wants more control over the event loop, then `AsyncFlow` can be used. + +Unlike `Flow`, the CRUD methods of `AsyncFlow` accept input and output functions +as [async generators](https://www.python.org/dev/peps/pep-0525/). This is useful when your data sources involve other +asynchronous libraries (e.g. motor for MongoDB): + +```python +import asyncio + +from jina import AsyncFlow, Document + + +async def input_function(): + for _ in range(10): + yield Document() + await asyncio.sleep(0.1) + + +# top-level `async for` runs as-is in Jupyter/IPython; in a plain script, wrap it in an `async def` +with AsyncFlow().add() as f: + async for resp in f.index(input_function): + print(resp) +``` + +`AsyncFlow` is particularly useful when Jina and another heavy-lifting job are running concurrently: + +```python +import asyncio + +import numpy +from jina import AsyncFlow + + +async def run_async_flow_5s(): # WaitDriver pause 5s makes total roundtrip ~5s + with AsyncFlow().add(uses='- !WaitDriver {}') as f: + async for resp in f.index_ndarray(numpy.random.random([5, 4])): + print(resp) + + +async def heavylifting(): # total roundtrip takes ~5s + print('heavylifting other io-bound jobs, e.g. 
download, upload, file io') + await asyncio.sleep(5) + print('heavylifting done after 5s') + + +async def concurrent_main(): # about 5s; but some dispatch cost, can't be just 5s, usually at <7s + await asyncio.gather(run_async_flow_5s(), heavylifting()) + + +if __name__ == '__main__': + asyncio.run(concurrent_main()) +``` + +`AsyncFlow` is very useful when using Jina inside a Jupyter Notebook. where it can run out-of-the-box. + +### REST Interface + +In practice, the query Flow and the client (i.e. data sender) are often physically separated. Moreover, the client may +prefer to use a REST API rather than gRPC when querying. You can set `port_expose` to a public port and turn +on [REST support](https://api.jina.ai/rest/) with `restful=True`: + +```python +f = Flow(port_expose=45678, restful=True) + +with f: + f.block() +``` + + diff --git a/.github/2.0/doc.content.svg b/.github/2.0/doc.content.svg new file mode 100644 index 0000000000000..721c6926b8528 --- /dev/null +++ b/.github/2.0/doc.content.svg @@ -0,0 +1,140 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/.github/2.0/migration-fashion.png b/.github/2.0/migration-fashion.png new file mode 100644 index 0000000000000..370bd9a8650f0 Binary files /dev/null and b/.github/2.0/migration-fashion.png differ diff --git a/.github/2.0/workspace-inherit.svg b/.github/2.0/workspace-inherit.svg new file mode 100644 index 0000000000000..094f3cd838917 --- /dev/null +++ b/.github/2.0/workspace-inherit.svg @@ -0,0 +1,162 @@ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + \ No newline at end of file diff --git a/.github/CODEOWNERS b/.github/CODEOWNERS index 1c3eac3f87ee0..a90da486b9506 100644 --- a/.github/CODEOWNERS +++ b/.github/CODEOWNERS @@ -4,7 +4,10 @@ # review when someone opens a pull request. 
* @jina-ai/engineering -# Han Xiao owns CICD and README.md -.github @hanxiao +.github/workflows @hanxiao setup.py @hanxiao -extra-requirements.txt @hanxiao \ No newline at end of file +extra-requirements.txt @hanxiao +jina/__init__.py @hanxiao +requirements.txt @hanxiao +MANIFEST.in @hanxiao +README.md @hanxiao \ No newline at end of file diff --git a/.github/banner.gif b/.github/banner.gif deleted file mode 100644 index c022c0d59a7c2..0000000000000 Binary files a/.github/banner.gif and /dev/null differ diff --git a/.github/i18n/README.de.md b/.github/i18n/README.de.md deleted file mode 100644 index 58b81a3b2aa8b..0000000000000 --- a/.github/i18n/README.de.md +++ /dev/null @@ -1,397 +0,0 @@ -

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai) -[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai) - - tweet button - -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]() -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release 
CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22) -

- -

- English • - Français • - 中文 • - 日本語 • - 한국어 • - Português • - Русский язык • - український • - Español -

- -

- Website • - Docs • - Examples • - Newsletter • - - Dashboard (beta) • - Twitter • - We are Hiring - -

- -Jina is a deep-learning-powered framework for building cross-modal and multi-modal search systems (e.g. text, images, video, audio) in the cloud. - -⏱️ **Time saver** - Build an AI system within a few minutes. - -🧠 **First-class AI models** - *The* design pattern for neural systems, with first-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html). - -🌌 **Universal search** - Scalable indexing and searching of any kind of data, e.g. videos, images, long and short texts, music, source code. - -☁️ **Cloud ready** - Decentralized architecture with built-in cloud-native features, e.g. containerization, microservices, scaling, sharding, async IO, REST, gRPC. - -🧩 **Plug-and-play** - Easily extendable with Python. - -❤️ **Made with love** - Quality comes first and is ensured without compromise by our [teams](https://jina.ai). - ---- - -

-Documentation • Hello World • Quick Start • Learn • Contribute • Jobs • Website • Slack -

- - -## Table of Contents - - - - - - -- [Install](#install) -- [Jina "Hello, World!" 👋🌍](#jina-hello-world-) -- [Getting Started](#getting-started) -- [Documentation](#documentation) -- [Contributing](#contributing) -- [Community](#community) -- [Roadmap](#roadmap) -- [License](#license) - - - -## Install - -#### Install from PyPI - -On Linux/macOS with Python >= 3.7 installed, simply run this command in your terminal: - -```bash -pip install jina -``` - -To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai). - -#### ...or run with a Docker container - -We provide a universal Docker image (only 80MB!) that supports multiple architectures (including x64, x86, arm-64/v7/v6). Simply run: - -```bash -docker run jinaai/jina -``` - -## Jina "Hello, World!" 👋🌍 - -As a starter, you are invited to try Jina's "Hello, World": a simple demo of neural image search on [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies are needed, simply run: - -```bash -jina hello-world -``` - -...or, even easier for Docker users, **no installation required**, simply: - -```bash -docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux -``` - -
-Click here to see the console output - -

- hello world console output -

- -
- -The demo downloads the Fashion-MNIST training and test data and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries and asks Jina to retrieve relevant results. After about 1 minute, it opens a web page and shows results like this: - -

- Jina banner -

- -And the implementation behind it? As simple as it should be: - - - - - - - - - - - - -
Python API index.yml Flow in Dashboard
- -```python -from jina.flow import Flow - -f = Flow.load_config('index.yml') - -with f: - f.index(inputs) -``` - - - - -```yaml -!Flow -pods: - chunk_seg: - uses: helloworld.crafter.yml - replicas: $REPLICAS - read_only: true - doc_idx: - uses: helloworld.indexer.doc.yml - encode: - uses: helloworld.encoder.yml - needs: chunk_seg - replicas: $REPLICAS - chunk_idx: - uses: helloworld.indexer.chunk.yml - replicas: $SHARDS - join_all: - uses: _pass - needs: [doc_idx, chunk_idx] - read_only: true -``` - - - - - -![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true) - -
- -All the big words you can name: computer vision, neural IR, microservice, message queue, elastic, replicas & shards, all happened in just one minute! - -Intrigued? Play with it and try different options: - -```bash -jina hello-world --help -``` - -[Make sure you continue with our Jina 101 guide](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - understand all key concepts of Jina in 3 minutes! - -## Getting Started - -### Start a project from the template - -```bash -pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina -``` - -### Tutorials - - - - - - - -
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: First Thing to Learn About Jina

-  English • - 日本語 • - français • - Português • - Русский язык • - 中文 -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
TutorialsLevel
-

Use Flow API to Compose Your Search Workflow

-Learn how to orchestrate Pods to work together: sequentially and in parallel; locally and remotely -

🐣

-

Input and Output Functions in Jina

-Learn how the input and output functions work in Jina -

🐣

-

Use Dashboard to Get Insight of Jina Workflow

-Learn to use the Dashboard to monitor and gain insight into a running workflow -

🐣

-

From BERT-as-Service to X-as-Service

-Learn how to use Jina to extract feature vectors using any deep learning representation -

🐣

-

Build a NLP Semantic Search System

-Learn how to build a script search system for South Park and practice your knowledge on Flows and Pods -

🐣

-

Build a Flower Image Search System

-Learn how to build an image search system, define your own executors, and run them in Docker -

🐣

-

Video Semantic Search in Scale with Prefetching and Sharding

-Learn how to increase the performance by using prefetching and sharding -

🕊

-

Revisit "Hello, World!" in a Client-Server Architecture

-Learn how to run a Flow remotely and use a local client to connect to it -

🕊

-

Distribute Your Workflow Remotely

-Learn to run Jina on remote instances and distribute your workflow -

🕊

-

Extend Jina by Implementing Your Own Executor

-Learn how to implement your own ideas into Jina's plugin -

🕊

-

Run Jina Pod via Docker Container

-Learn how Jina solves complex dependencies easily with Docker container -

🕊

-

Google's Big Transfer Model in (Poké-)Production

-Learn how to use SOTA visual representations for searching Pokémon! -

🚀

- - -## Documentation - - - - - -The best way to learn Jina in depth is to read our documentation. The documentation is built on every push, merge, and release event of the master branch. You can find more details about the following topics in our documentation. - -- [Jina command line interface arguments explained](https://docs.jina.ai/chapters/cli/index.html) -- [Jina Python API interface](https://docs.jina.ai/api/jina.html) -- [Jina YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html) -- [Jina Protobuf schema](https://docs.jina.ai/chapters/proto/index.html) -- [Environment variables used in Jina](https://docs.jina.ai/chapters/envs.html) -- ...[and more](https://docs.jina.ai/index.html) - -Are you a "Doc"-star? Then join us! We welcome all kinds of improvements to the documentation. - -[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases). - -## Contributing - -We welcome all kinds of contributions from the open-source community, individuals and partners. Without your active involvement, Jina cannot succeed. 
- -The following resources will help you make a good first contribution: - -- [Contributing guidelines](CONTRIBUTING.md) -- [Release cycles and development stages](RELEASE.md) - -## Community - -- [Slack channel](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - a communication platform for developers to discuss Jina -- [Community newsletter](mailto:newsletter+subscribe@jina.ai) - subscribe to the latest update, release, and event news from Jina -- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities -- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - follow us and interact with us using the hashtag `#JinaSearch` -- [Company](https://jina.ai) - learn more about our company; we are fully committed to open source! - -## Roadmap - -[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to future improvements. - -We are looking for partnerships to build an open governance model (e.g. a technical steering committee) around Jina, one that enables a healthy open-source ecosystem and a developer-friendly culture. If you are interested in participating, do not hesitate to contact us at [hello@jina.ai](mailto:hello@jina.ai). - -## License - -Copyright (c) 2020 Jina AI Limited. All rights reserved. - -Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE) diff --git a/.github/i18n/README.es.md b/.github/i18n/README.es.md deleted file mode 100644 index bd319b577a5b5..0000000000000 --- a/.github/i18n/README.es.md +++ /dev/null @@ -1,422 +0,0 @@ -

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/) -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/) -[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina) - -

- -

- English • - Français • - Deutsch • - 中文 • - 日本語 • - 한국어 • - Português • - Русский язык • - український -

- - -

- Sitio Web • - Documentos • - Ejemplos • - Hub (beta) • - Dashboard (beta) • - Jinabox (beta) • - Twitter • - Estamos contratando -

- -Jina is an AI-powered search framework that lets developers build **cross-modal and multi-modal** search systems (such as text, images, video, audio) in the cloud. - -⏱️ **Time saver** - Bootstrap an AI-powered system in just a few minutes. - -🧠 **First-class AI models** - *The* design pattern for neural search systems, with first-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html). - -🌌 **Universal search** - Large-scale indexing and querying of any kind of data on multiple platforms: video, image, long/short text, music, source code, etc. - -☁️ **Cloud ready** - Decentralized architecture with cloud-native features out of the box: containerization, microservice, scaling, sharding, async IO, REST, gRPC. - -🧩 **Plug & Play** - Easily extendable with a Pythonic interface. - -❤️ **Made with love** - Quality first, never compromised, maintained by a [full-time, company-backed team](https://jina.ai). - - -## Contents - - - - - - -- [Installation](#installation) -- [Jina "Hello, World!" 👋🌍](#jina-hello-world-) -- [Tutorials](#tutorials) -- [Documentation](#documentation) -- [Contributing](#contributing) -- [Community](#community) -- [Open Governance](#open-governance) -- [Join Us](#join-us) -- [License](#license) - - - -## Installation - -### With PyPI - -On Linux/macOS with Python >= 3.7: - -```bash -pip install jina -``` - -To install Jina with extra dependencies or on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai). - -### In a Docker container - -We provide a universal Docker image that supports multiple architectures (including x64, x86, arm-64/v7/v6). Simply run: - -```bash -docker run jinaai/jina --help -``` - -## Jina "Hello, World!" 
👋🌍 - -As a starter, you can try our "Hello, World": a simple demo of neural image search on [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies are needed, simply run: - -```bash -jina hello-world -``` - -...or, even easier for Docker users, **no installation required**: - -```bash -docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux -``` - -
-Click here to see the console output - -

- hello world console output -

- -
- -The Docker image downloads the Fashion-MNIST training and test datasets and tells Jina to index 60,000 images from the training data. It then randomly samples images from the test set, uses them as queries, and asks Jina to retrieve the relevant results. The whole process takes about 1 minute and eventually opens a web page with results that look like this: - -

- Jina banner -

- -The implementation behind it is simple: - - - - - - - - - - - - -
Python API or use YAML spec or use Dashboard
- - -```python -from jina.flow import Flow - -f = (Flow() - .add(uses='encoder.yml', parallel=2) - .add(uses='indexer.yml', shards=2)) - -with f: - f.index(fashion_mnist, batch_size=1024) -``` - - - -```yaml -!Flow -pods: - encode: - uses: encoder.yml - parallel: 2 - index: - uses: indexer.yml - shards: 2 -``` - - - -![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true) - -
- -
-Explore sharding, containerization, concatenating embeddings, and more - -#### Adding Parallelism and Sharding - -```python -from jina.flow import Flow - -f = (Flow().add(uses='encoder.yml', parallel=2) - .add(uses='indexer.yml', shards=2)) -``` - -#### [Distributing Flow](https://docs.jina.ai/chapters/remote/index.html) - -```python -from jina.flow import Flow - -f = Flow().add(uses='encoder.yml', host='192.168.0.99') -``` - -#### [Using a Docker Container](https://docs.jina.ai/chapters/hub/index.html) - -```python -from jina.flow import Flow - -f = (Flow().add(uses='jinahub/cnn-encode:0.1') - .add(uses='jinahub/faiss-index:0.2', host='192.168.0.99')) -``` - -#### Concatenating Embeddings - -```python -from jina.flow import Flow - -f = (Flow().add(name='eb1', uses='BiTImageEncoder') - .add(name='eb2', uses='KerasImageEncoder', needs='gateway') - .needs(['eb1', 'eb2'], uses='_concat')) -``` - -#### [Enabling Network Query](https://docs.jina.ai/chapters/restapi/index.html) - -```python -from jina.flow import Flow - -f = Flow(port_expose=45678, rest_api=True) - -with f: - f.block() -``` - -Intrigued? Explore other options: - -```bash -jina hello-world --help -``` -
- -### Create your first Jina project - -```bash -pip install jina[devel] -jina hub new --type app -``` - -You can easily create a Jina project from templates with a single terminal command. The command above creates a Python entrypoint, YAML configs, and a Dockerfile. You can start from there. - - -## Tutorials - - - - - - - -
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: First Thing to Learn About Jina

-  English • - 日本語 • - Français • - Português • - Deutsch • - Русский язык • - 中文 • - عربية -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Level Tutorials

🐣

-

Build an NLP Semantic Search System

-Search South Park scripts and practice with Flows and Pods -

🐣

-

Mi primera app con Jina

-Usa cookiecutter para iniciar una app con Jina -

🐣

-

Fashion Search con Query Language (lenguaje de consulta)

-Hacer que Hello-World sea más interesante con un lenguaje de consulta -

🕊

-

Use Chunk para buscar letras de canciones

-Divida los documentos para buscar a nivel detallado -

🕊

-

Mezcle y combine imágenes y subtítulos

-Busque el cross modal para obtener imágenes de sus subtítulos y viceversa -

🚀

-

Aumente la intensidad de la búsqueda semántica de video

-Mejore el rendimiento utilizando el prefetching y el sharding -
-
-## Documentation
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every update and release of the master branch.
-
-#### The Basics
-
-- [Use the Flow API to Compose Your Search Workflow](https://docs.jina.ai/chapters/flow/index.html)
-- [Input and Output Functions in Jina](https://docs.jina.ai/chapters/io/index.html)
-- [Log and Monitor with Jina's Graphical Dashboard](https://github.com/jina-ai/dashboard)
-- [Distribute Your Workflow Remotely](https://docs.jina.ai/chapters/remote/index.html)
-- [Build Your Pod into a Docker Image: How and Why](https://docs.jina.ai/chapters/hub/index.html)
-
-#### Reference
-
-- [Command Line Interface (CLI) Arguments](https://docs.jina.ai/chapters/cli/index.html)
-- [Python API Interface](https://docs.jina.ai/api/jina.html)
-- [YAML Syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Protobuf Schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment Variables](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Join us! All kinds of help with the documentation are welcome.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-All kinds of contributions from the open-source community, individuals and partners alike, are welcome. We owe our success to your active participation.
-
-- [Contributing guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-### Contributors ✨
-
-[![All Contributors](https://img.shields.io/badge/all_contributors-66-orange.svg?style=flat-square)](#contributors-)
-
-## Community
-
-- [Slack workspace](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - join #general on our Slack to meet the team and ask questions
-- [YouTube channel](https://youtube.com/c/jina-ai) - subscribe for our latest tutorials, release demos, webinars and presentations
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- [![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - follow and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company and how we are fully committed to open source.
-
-## Open Governance
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to Jina's future improvements.
-
-As part of our open governance model, we host Jina's [Engineering All Hands](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/) in public. This Zoom meeting takes place monthly, on the second Tuesday of each month, at 14:00-15:30 (CET). Anyone can join via the following calendar invite.
-
-- [Add to Google Calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)
-- [Download .ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)
-
-The meeting will be live-streamed and later published on our [YouTube channel](https://youtube.com/c/jina-ai).
-
-## Join Us
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) full-stack developers, evangelists, and PMs to build the next open-source neural search ecosystem.
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.fr.md b/.github/i18n/README.fr.md
deleted file mode 100644
index 4e2a9b0205941..0000000000000
--- a/.github/i18n/README.fr.md
+++ /dev/null
@@ -1,390 +0,0 @@
-

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai) -[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai) - - tweet button - -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]() -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release 
CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22) - -

- -

- English • - Deutsch • - 中文 • - 日本語 • - 한국어 • - Português • - Русский язык • - український • - Español -

- -

- Website • - Docs • - Examples • - Newsletter • - - Dashboard (beta) • - Twitter • - We are Hiring - -

-
-Want to build a search system backed by deep learning? You've come to the right place!
-
-Jina is the cloud-native neural search framework powered by state-of-the-art AI and deep learning. It has long-term support from a full-time, venture-backed team.
-
-🌌 **The Universal Search Solution** - Jina enables large-scale indexing and querying of any kind of data on multiple platforms and architectures. Whether you are searching for images, video clips, audio snippets, long legal documents or short tweets, Jina can handle them all.
-
-🚀 **High Performance & State of the Art** - Jina aims at AI-in-production. You can easily scale out your VideoBERT, Xception, word tokenizer, image segmenter and database to handle billion-level data. Features such as replicas and shards come off the shelf.
-
-🐣 **System Engineering Made Easy** - Jina offers a one-stop solution that frees you from handcrafting and gluing together packages, libraries and databases. With the most intuitive API and [dashboard](https://github.com/jina-ai/dashboard), building a cloud-native search system takes only minutes.
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) AI engineers, developers, evangelists and PMs to build the next neural search ecosystem in open source.
-
-## Table of Contents
-
-- [Install](#installez)
-- [Jina "Hello, World!" 👋🌍](#jina-bonjour-le-monde--)
-- [Getting Started](#pour-commencer)
-- [Documentation](#documentation)
-- [Contributing](#contribuer-%C3%A0)
-- [Community](#communaut%C3%A9)
-- [Roadmap](#feuille-de-route)
-- [License](#licence)
-
-## Install
-
-#### Install from PyPi
-
-On Linux/MacOS with Python >= 3.7 installed, simply run this command in your terminal:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai).
-
-#### ...or Run with a Docker Container
-
-We provide a universal Docker image (only 80MB!) that supports multiple architectures (including x64, x86, arm-64/v7/v6). Simply run:
-
-```bash
-docker run jinaai/jina
-```
-
-## Jina "Hello, World!" 👋🌍
-
-To get started, you are welcome to try Jina's "Hello, World" - a simple demo of image neural search for [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies are needed, simply run:
-
-```bash
-jina hello-world
-```
-
-...or even easier for Docker users, **no install required**, simply:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-Click here to see the console output - -

- hello world console output -

- -
-
-It downloads the Fashion-MNIST training and test data and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries and asks Jina to retrieve relevant results. After about one minute, it opens a web page and shows results like this:
-

- Jina banner -

-
-And the implementation behind it? As simple as it should be:
-
Python API index.yml Flow in Dashboard
-
-```python
-from jina.flow import Flow
-
-f = Flow.load_config('index.yml')
-
-with f:
-    f.index(inputs)
-```
-
-```yaml
-!Flow
-pods:
-  chunk_seg:
-    uses: helloworld.crafter.yml
-    replicas: $REPLICAS
-    read_only: true
-  doc_idx:
-    uses: helloworld.indexer.doc.yml
-  encode:
-    uses: helloworld.encoder.yml
-    needs: chunk_seg
-    replicas: $REPLICAS
-  chunk_idx:
-    uses: helloworld.indexer.chunk.yml
-    replicas: $SHARDS
-  join_all:
-    uses: _pass
-    needs: [doc_idx, chunk_idx]
-    read_only: true
-```
-
-![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true)
-
-
-All the big words you can name: computer vision, neural IR, microservices, message queues, elasticity, replicas and shards, all happening in just one minute!
-
-Intrigued? Play around and try different options:
-
-```bash
-jina hello-world --help
-```
-
-[Be sure to continue with our Jina 101 guide](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - understand all the key concepts of Jina in 3 minutes!
-
-## Getting Started
-
-### Start a Project from the Template
-
-```bash
-pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina
-```
-
-### Tutorials
-
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -

Jina 101: First Thing to Learn About Jina

- English • - 日本語 • - français • - Português • - Русский язык • - 中文 -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Tutorials | Level |
-|---|---|
-| **Use Flow API to Compose Your Search Workflow**: learn how to orchestrate Pods to work together, sequentially and in parallel, locally and remotely | 🐣 |
-| **Input and Output Functions in Jina**: learn how the input and output functions work in Jina | 🐣 |
-| **Use Dashboard to Get Insight of Jina Workflow**: learn to use the dashboard to monitor and get insight into a running workflow | 🐣 |
-| **From BERT-as-Service to X-as-Service**: learn how to use Jina to extract feature vectors using any deep learning representation | 🐣 |
-| **Build a NLP Semantic Search System**: learn how to build a script search system for South Park and practice your knowledge of Flows and Pods | 🐣 |
-| **Build a Flower Image Search System**: learn how to build an image search system, define your own executors and run them in Docker | 🐣 |
-| **Video Semantic Search in Scale with Prefetching and Sharding**: learn how to increase performance by using prefetching and sharding | 🕊 |
-| **Revisit "Hello, World!" in a Client-Server Architecture**: learn how to run a Flow remotely and use a local client to connect to it | 🕊 |
-| **Distribute Your Workflow Remotely**: learn to run Jina on remote instances and distribute your workflow | 🕊 |
-| **Extend Jina by Implementing Your Own Executor**: learn how to implement your own ideas as Jina plugins | 🕊 |
-| **Run Jina Pod via Docker Container**: learn how Jina solves complex dependencies easily with Docker containers | 🕊 |
-| **Google's Big Transfer Model in (Poké-)Production**: learn how to use a SOTA visual representation for searching Pokémon! | 🚀 |
-
-## Documentation
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every push, merge, and release event of the master branch. You can find more details on the following topics in our documentation.
-
-- [Jina command line interface arguments explained](https://docs.jina.ai/chapters/cli/index.html)
-- [Jina Python API interface](https://docs.jina.ai/api/jina.html)
-- [Jina YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Jina Protobuf schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables used in Jina](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Affirmative? Join us! We welcome all kinds of improvements to the documentation.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. Without your active involvement, Jina cannot succeed.
-
-The following resources will help you make a good first contribution:
-
-- [Contributing guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-## Community
-
-- [Slack channel](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - a communication platform for developers to discuss Jina
-- [Community newsletter](mailto:newsletter+subscribe@jina.ai) - subscribe to the latest updates, releases and news from Jina
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - follow and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company; we are fully committed to open source!
-
-## Roadmap
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to future improvements.
-
-We are looking for partnerships to build an open governance model (e.g. a technical steering committee) around Jina, one that enables a healthy open-source ecosystem and a developer-friendly culture. If you would like to take part, feel free to contact us at [hello@jina.ai](mailto:hello@jina.ai).
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.ja.md b/.github/i18n/README.ja.md
deleted file mode 100644
index 1e546d3bc44b7..0000000000000
--- a/.github/i18n/README.ja.md
+++ /dev/null
@@ -1,388 +0,0 @@
-

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai) -[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai) - - tweet button - -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]() -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release 
CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22) -

- -

- English • - Français • - Deutsch • - 中文 • - 한국어 • - Português • - Русский язык • - український • - Español -

- -

- Website • - Docs • - Examples • - Newsletter • - - Dashboard (beta) • - Twitter • - We are Hiring - -

-
-Want to build a search system backed by deep learning? You've come to the right place!
-
-Jina is a cloud-native neural search framework powered by state-of-the-art AI and deep learning. It has long-term support from a full-time, venture-backed team.
-
-🌌 **The Universal Search Solution** - Jina enables large-scale indexing and querying of any kind of data on multiple platforms and architectures. Images, video clips, audio snippets, long legal documents, short tweets: Jina can handle them all.
-
-🚀 **High Performance & State of the Art** - Jina aims at AI-in-production. You can easily scale out your VideoBERT, Xception, word tokenizer, image segmenter and database to handle billion-level data. Features such as replicas and shards are available out of the box.
-
-🐣 **System Engineering Made Easy** - Jina offers a one-stop solution that frees you from handcrafting and gluing together packages, libraries and databases. With the most intuitive API and [dashboard](https://github.com/jina-ai/dashboard), building a cloud-native search system takes very little effort.
-
-Jina is an open-source project. [We are hiring](https://jobs.jina.ai) AI engineers, full-stack developers, evangelists and PMs to build the next neural search ecosystem in open source.
-
-## Table of Contents
-
-- [Install](#%E3%82%A4%E3%83%B3%E3%82%B9%E3%83%88%E3%83%BC%E3%83%AB)
-- [Jina "Hello, World!" 👋🌍](#%E3%82%B8%E3%83%BC%E3%83%8A%E3%81%93%E3%82%93%E3%81%AB%E3%81%A1%E3%81%AF%EF%BD%97%EF%BD%97%EF%BD%97%EF%BD%97)
-- [Getting Started](#%E3%81%AF%E3%81%98%E3%82%81%E3%81%AB)
-- [Documentation](#%E3%83%89%E3%82%AD%E3%83%A5%E3%83%A1%E3%83%B3%E3%83%88)
-- [Contributing](#%E8%B2%A2%E7%8C%AE)
-- [Community](#%E3%82%B3%E3%83%9F%E3%83%A5%E3%83%8B%E3%83%86%E3%82%A3)
-- [Roadmap](#%E3%83%AD%E3%83%BC%E3%83%89%E3%83%9E%E3%83%83%E3%83%97)
-- [License](#%E3%83%A9%E3%82%A4%E3%82%BB%E3%83%B3%E3%82%B9)
-
-## Install
-
-#### Install from PyPi
-
-On Linux/MacOS with Python >= 3.7 installed, simply run this command in your terminal:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai).
-
-#### ...or Run with a Docker Container
-
-We provide a universal Docker image (only 80MB!) that supports multiple architectures (including x64, x86, arm-64/v7/v6). Simply run:
-
-```bash
-docker run jinaai/jina
-```
-
-## Jina "Hello, World!" 👋🌍
-
-As a starter, try Jina's "Hello, World", a simple demo of image neural search for [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies are needed.
-
-```bash
-jina hello-world
-```
-
-...or even easier for Docker users, **no install required**, simply:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-Click here to see the console output - -

- hello world console output -

- -
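-
-It downloads the Fashion-MNIST training and test data and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries and asks Jina to retrieve relevant results. After about one minute, a web page opens and shows results like this:
-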

- Jina banner -

-
-And the implementation behind it? As simple as it should be:
-
Python API index.yml Flow in Dashboard
-
-```python
-from jina.flow import Flow
-
-f = Flow.load_config('index.yml')
-
-with f:
-    f.index(inputs)
-```
-
-```yaml
-!Flow
-pods:
-  chunk_seg:
-    uses: helloworld.crafter.yml
-    replicas: $REPLICAS
-    read_only: true
-  doc_idx:
-    uses: helloworld.indexer.doc.yml
-  encode:
-    uses: helloworld.encoder.yml
-    needs: chunk_seg
-    replicas: $REPLICAS
-  chunk_idx:
-    uses: helloworld.indexer.chunk.yml
-    replicas: $SHARDS
-  join_all:
-    uses: _pass
-    needs: [doc_idx, chunk_idx]
-    read_only: true
-```
-
-![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true)
-
-
-Big words such as computer vision, neural IR, microservices, message queues, elasticity, replicas and shards all happened in just one minute.
-
-Intrigued? Play around and try different options.
-
-```bash
-jina hello-world --help
-```
-
-[Be sure to continue with our Jina 101 guide](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - understand all the key concepts of Jina in 3 minutes!
-
-## Getting Started
-
-### Start a Project from the Template
-
-```bash
-pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina
-```
-
-### Tutorials
-
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: First Thing to Learn About Jina

-  English • - 日本語 • - français • - Português • - Русский язык • - 中文 -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Tutorials | Level |
-|---|---|
-| **Use Flow API to Compose Your Search Workflow**: learn how to orchestrate Pods to work together, sequentially and in parallel, locally and remotely | 🐣 |
-| **Input and Output Functions in Jina**: learn how the input and output functions work in Jina | 🐣 |
-| **Use Dashboard to Get Insight of Jina Workflow**: learn to use the dashboard to monitor and get insight into a running workflow | 🐣 |
-| **From BERT-as-Service to X-as-Service**: learn how to use Jina to extract feature vectors using any deep learning representation | 🐣 |
-| **Build a NLP Semantic Search System**: learn how to build a script search system for South Park and practice your knowledge of Flows and Pods | 🐣 |
-| **Build a Flower Image Search System**: learn how to build an image search system, define your own executors and run them in Docker | 🐣 |
-| **Video Semantic Search in Scale with Prefetching and Sharding**: learn how to increase performance by using prefetching and sharding | 🕊 |
-| **Revisit "Hello, World!" in a Client-Server Architecture**: learn how to run a Flow remotely and use a local client to connect to it | 🕊 |
-| **Distribute Your Workflow Remotely**: learn to run Jina on remote instances and distribute your workflow | 🕊 |
-| **Extend Jina by Implementing Your Own Executor**: learn how to implement your own ideas as Jina plugins | 🕊 |
-| **Run Jina Pod via Docker Container**: learn how Jina solves complex dependencies easily with Docker containers | 🕊 |
-| **Google's Big Transfer Model in (Poké-)Production**: learn how to use a SOTA visual representation for searching Pokémon! | 🚀 |
-
-## Documentation
-
-The best way to learn Jina in depth is to read our documentation. The documentation is built on every push, merge, and release event of the master branch. See the documentation for details on the following topics.
-
-- [Jina command line interface arguments explained](https://docs.jina.ai/chapters/cli/index.html)
-- [Jina Python API interface](https://docs.jina.ai/api/jina.html)
-- [Jina YAML syntax for Executor, Driver and Flow](https://docs.jina.ai/chapters/yaml/yaml.html)
-- [Jina Protobuf schema](https://docs.jina.ai/chapters/proto/index.html)
-- [Environment variables used in Jina](https://docs.jina.ai/chapters/envs.html)
-- ... [and more](https://docs.jina.ai/index.html)
-
-Are you a "Doc"-star? Affirmative? Join us. We welcome improvements to the documentation.
-
-[Documentation for older versions is archived here](https://github.com/jina-ai/docs/releases).
-
-## Contributing
-
-We welcome all kinds of contributions from the open-source community, individuals and partners. Without your active involvement, Jina cannot succeed.
-
-The following resources will help you make a good first contribution.
-
-- [Contributing guidelines](CONTRIBUTING.md)
-- [Release cycles and development stages](RELEASE.md)
-
-## Community
-
-- [Slack channel](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - a communication platform for developers to discuss Jina
-- [Community newsletter](mailto:newsletter+subscribe@jina.ai) - subscribe to the latest updates, releases and event news from Jina
-- [LinkedIn](https://www.linkedin.com/company/jinaai/) - get to know Jina AI as a company and find job opportunities
-- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - follow and interact with us using the hashtag `#JinaSearch`
-- [Company](https://jina.ai) - learn more about our company; we are fully committed to open source!
-
-## Roadmap
-
-[GitHub milestones](https://github.com/jina-ai/jina/milestones) lay out the path to future improvements.
-
-We are looking for partnerships to build an open governance model (e.g. a technical steering committee) around Jina, one that enables a healthy open-source ecosystem and a developer-friendly culture. If you would like to take part, feel free to contact us at [hello@jina.ai](mailto:hello@jina.ai).
-
-## License
-
-Copyright (c) 2020 Jina AI Limited. All rights reserved.
-
-Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE)
diff --git a/.github/i18n/README.kr.md b/.github/i18n/README.kr.md
deleted file mode 100644
index afdefecbeb32b..0000000000000
--- a/.github/i18n/README.kr.md
+++ /dev/null
@@ -1,425 +0,0 @@
-

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/) -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/) -[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina) - -

- -

- English • - Français • - Deutsch • - 中文 • - 日本語 • - Português • - Русский язык • - український • - Español -

- - -

- 웹사이트 • - 문서들 • - • - 허브(허브) • - 대시보드(메시지) • - 지나복스 (beta) • - 트위터 • - 고용정보. - -

-
-Jina is an AI-powered search framework that lets developers build **cross-/multi-modal search systems** (e.g. text, images, video, audio) in the cloud. Jina is supported by a [full-time, venture-backed team](https://jina.ai).
-
-⏱️ **Time Saver** - Bootstrap an AI-powered system in just a few minutes.
-
-🧠 **First-Class AI Models** - Jina is a new design pattern for neural search systems, with first-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html).
-
-🌌 **Universal Search** - Large-scale indexing and querying of any kind of data on multiple platforms: video, images, long/short text, music, source code, and more.
-
-🚀 **Cloud Ready** - A decentralized architecture with cloud-native features such as containerization, microservices, distribution, scaling, sharding, async IO, REST and gRPC.
-
-🧩 **Plug & Play** - Easily extendable with a Pythonic interface.
-
-## Contents
-
-- [Get Started](#%EC%B0%A9%EC%88%98%ED%95%98%EB%8B%A4)
-- [Jina "Hello, World!" 👋🌍](#jina-%EC%95%88%EB%85%95-%EC%84%B8%EA%B3%84-)
-- [Tutorials](#%EC%9E%90%EC%8A%B5%EC%84%9C)
-- [Documentation](#%EB%AC%B8%EC%84%9C%ED%99%94)
-- [Contributing](#%EA%B8%B0%EC%97%AC%ED%95%98%EB%8A%94)
-- [Community](#community)
-- [Open Governance](#%EC%98%A4%ED%94%88-%EA%B1%B0%EB%B2%84%EB%84%8C%EC%8A%A4)
-- [Join Us](#%EC%B0%B8%EC%97%AC%ED%95%98%EA%B8%B0)
-- [License](#%EB%A9%B4%ED%97%88%EC%A6%9D)
-
-## Install
-
-On Linux/MacOS with Python 3.7/3.8:
-
-```bash
-pip install jina
-```
-
-To install Jina with extra dependencies, or to install it on a Raspberry Pi, [please refer to the documentation](https://docs.jina.ai).
-
-⚠️ Windows users can use Jina via the [Windows Subsystem for Linux](https://docs.jina.ai/chapters/install/via-pip.html?highlight=windows#on-windows-and-other-oses). Our community welcomes help with [Windows support](https://github.com/jina-ai/jina/issues/1252).
-
-### Docker Container
-
-We provide a universal Docker image that supports multiple architectures (including x64, x86, arm-64/v7/v6). There is nothing to install, simply run:
-
-```bash
-docker run jinaai/jina --help
-```
-
-## Jina "Hello, World!" 👋🌍
-
-As a starter, try "Hello, World!", a simple demo of image neural search for [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). No extra dependencies are needed, simply run:
-
-```bash
-jina hello-world
-```
-
-...or for Docker users, **no install required**:
-
-```bash
-docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux
-```
-
-Click here to see the console output
-

- hello world console output -

- -
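-
-This downloads the Fashion-MNIST training and test datasets and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries and asks Jina to retrieve relevant results. The whole process takes about one minute, after which a web page opens and shows results like this:
-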

- Jina banner -

-
-The implementation behind it is simple:
-
Python API or use YAML spec or use Dashboard
-
-```python
-from jina.flow import Flow
-
-f = (Flow()
-        .add(uses='encoder.yml', parallel=2)
-        .add(uses='indexer.yml', shards=2))
-
-with f:
-    f.index(fashion_mnist, batch_size=1024)
-```
-
-```yaml
-!Flow
-pods:
-  encode:
-    uses: encoder.yml
-    parallel: 2
-  index:
-    uses: indexer.yml
-    shards: 2
-```
-
-![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true)
-
- -
-샤딩, 컨테이너화, 임베딩 연결 등을 살펴보십시오. - -#### 병렬 및 샤딩 추가 - -```python -from jina.flow import Flow - -f = (Flow().add(uses='encoder.yml', parallel=2) - .add(uses='indexer.yml', shards=2)) -``` - -#### [플로우 배포](https://docs.jina.ai/chapters/remote/index.html) - -```python -from jina.flow import Flow - -f = Flow().add(uses='encoder.yml', host='192.168.0.99') -``` - -#### [Docker 컨테이너 ](https://docs.jina.ai/chapters/hub/index.html) - -```python -from jina.flow import Flow - -f = (Flow().add(uses='jinahub/cnn-encode:0.1') - .add(uses='jinahub/faiss-index:0.2', host='192.168.0.99')) -``` - -#### 연결 임베딩 - -```python -from jina.flow import Flow - -f = (Flow().add(name='eb1', uses='BiTImageEncoder') - .add(name='eb2', uses='KerasImageEncoder', needs='gateway') - .needs(['eb1', 'eb2'], uses='_concat')) -``` - -#### [네트워크 쿼리 사용](https://docs.jina.ai/chapters/restapi/index.html) - -```python -from jina.flow import Flow - -f = Flow(port_expose=45678, rest_api=True) - -with f: - f.block() -``` - -흥미롭다면? 다른 옵션으로 재생하세요: - -```bash -jina hello-world --help -``` -
- -### 첫 번째 Jina 프로젝트 생성하기 - -```bash -pip install jina[devel] -jina hub new --type app -``` - -하나의 터미널 명령으로 템플릿에서 쉽게 지나 프로젝트를 만들 수 있다. 이를 통해 Python 진입점, YAML 구성 및 Docker 파일이 생성된다. 그곳에서부터 귀하가 시작할 수 있다. - - -## 튜토리얼 - - - - - - - -
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: First Things to Learn About Jina

-  English • - 日本語 • - Français • - Português • - Deutsch • - Русский язык • - 中文 • - عربية • - Korean -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
레벨 튜토리얼

🐣

-

NLP 의미 검색 시스템 구축

-South Park의 문서를 검색하고 Flow와 Pods를 이용하여 연습해라 -

🐣

-

내 첫 Jina 앱

-Jina 앱을 bootstrap하기 위하여 cookiecutter를 사용 -

🐣

-

쿼리 언어를 사용한 패션 검색

-쿼리 언어로 Hello-World에 활기 불어넣기 -

🕊

-

청크를 사용하여 가사 검색

-Fine-grained level에서 검색하기 위하여 문서를 쪼개기 -

🕊

-

이미지와 캡션을 믹스 앤 매치

-이미지로부터 캡션 또는 캡션으로부터 이미지를 얻기 위하여 크로스 모달을 검색 -

🚀

-

스케일업 비디오 의미 검색

-프리패칭과 샤딩을 이용한 퍼포먼스의 향상 -
- -## 문서화 - - - - - -지나를 깊이 있게 배우는 가장 좋은 방법은 우리의 문서를 읽는 것이다. 문서는 마스터 브랜치의 모든 푸쉬, 머지, 릴리즈에 기초하여 작성된다. - -#### 기본 사항 - -- [Flow API를 사용하여 검색 워크플로우 구성](https://docs.jina.ai/chapters/flow/index.html) -- [Jina의 입력 및 출력 기능](https://docs.jina.ai/chapters/io/index.html) -- [Dashboard를 사용하여 jina 워크플로우의 인사이트 확보](https://github.com/jina-ai/dashboard) -- [워크플로우를 원격으로 배포](https://docs.jina.ai/chapters/remote/index.html) -- [Docker Container를 통해 Jina 포드 실행](https://docs.jina.ai/chapters/hub/index.html) - -#### 참조 - -- [command line 인터페이스 논의](https://docs.jina.ai/chapters/cli/index.html) -- [파이썬 API 인터페이스](https://docs.jina.ai/api/jina.html) -- [Executor과 Driver, Flow를 위한 VAML 문장](https://docs.jina.ai/chapters/yaml/yaml.html) -- [Protobuf 스키마](https://docs.jina.ai/chapters/proto/index.html) -- [환경변수](https://docs.jina.ai/chapters/envs.html) -- ... [그 외](https://docs.jina.ai/index.html) - -당신은 “DOC” 스타인가요? 우리와 함께해요! 우리는 문서에 대한 모든 종류의 개선을 환영합니다. - -[이전 버전에 대한 설명서는 여기에 보관되어 있다.](https://github.com/jina-ai/docs/releases). - -## 기여 - -우리는 오픈 소스 커뮤니티, 개인 및 파트너의 모든 종류의 기부를 환영한다. 우리의 성공은 당신의 적극적인 참여 덕분이다. - -- [기여 지침](CONTRIBUTING.md) -- [릴리스 주기 및 개발 단계](RELEASE.md) - -### 기부자 ✨ - - -[![모든 기부자](https://img.shields.io/badge/all_contributors-74-orange.svg?style=flat-square)](#기부자-) - - - - - - - - - - - - - - - - - - - - - -## community - -- [Slack 작업영역](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - #장군에 합류하여 우리 슬랙을 팀원들과 만나 질문하다. -- [유튜브 채널](https://youtube.com/c/jina-ai) - 최신 비디오 튜토리얼, 릴리즈 데모, 웨비나 및 프리젠테이션을 구독하십시오. -- [링크드인](https://www.linkedin.com/company/jinaai/) - 지나 AI를 기업으로서 알게 되고 취업의 기회를 찾다. -- [![트위터 팔로우](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - 해시태그로 우리와 교류하다. `#JinaSearch` -- [회사](https://jina.ai) - 우리 회사에 대해 더 많이 알고 어떻게 우리가 오픈소스에 전념하고 있는지 알고 있다.. 
- -## 오픈 거버넌스 - -[깃허브 이정표](https://github.com/jina-ai/jina/milestones)로 Jina의 미래 개선점들에 대한 윤곽을 잡았음 - -여러분은 우리의 오픈 거버넌스 모델의 일환으로 모두를 위한 Jina의 공학을 주최한다. -Zoom미팅은 매달 두 번째 화요일마다 진행을 하며 시간은 14:00-15:30(CET)이다. Calendar 초대를 통해 모두 참여가 가능하다. - -- [Google 캘린더에 추가](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL) -- [.ics다운로드 하기](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics) - -또한 이 회의는 생방송으로 송출될 것이며 이 후에 [유튜브 채널에 영상으로 제작될 것이다.](https://youtube.com/c/jina-ai). -## 참여하기 - -Jina는 오픈소스 프로젝트이다. 우리는 풀스택 개발자, evangelists, 프로젝트 매니저들을 [채용](https://jobs.jina.ai)하여 뉴럴 탐색 생태계를 오픈소스에 구축하려고 한다. - -## 라이선스 - -Copyright (c) 2020 Jina AI Limited. All rights reserved - -Jina는 Apache Licence 2.0을 사용한다. [라이선스 문서의 전문을 확인하기 위해서는 License를 참조하세요.](LICENSE) diff --git a/.github/i18n/README.pt_br.md b/.github/i18n/README.pt_br.md deleted file mode 100644 index 60b95a2eefdf1..0000000000000 --- a/.github/i18n/README.pt_br.md +++ /dev/null @@ -1,421 +0,0 @@ -

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/) -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/) -[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina) - -

- -

- English • - Français • - Deutsch • - 中文 • - 日本語 • - 한국어 • - Русский язык • - український • - Español -

- - -

- Website • - Docs • - Examples • - Hub (beta) • - Dashboard (beta) • - Jinabox (beta) • - Twitter • - We are Hiring -

- -Jina é um framework de pesquisa baseada em IA, que permite que desenvolvedores criem sistemas de busca **cross/multi-modals** (como texto, imagens, video, áudio) na nuvem. Jina é mantida a logo prazo por [um grupo movido pela aventura em tempo integral](https://jina.ai). - -⏱️ **Economia de tempo** - Inicie um sistema impulsionado por IA em poucos minutos. - -🧠 **Modelos de IA de primeira classe** - Jina é um novo padrão de design para sistemas neurais de pesquisa com apoio especial para [modelos de IA de state-of-the-art](https://docs.jina.ai/chapters/all_exec.html). - -🌌 **Busca universal** - Indexação de larga escala e consulta de qualquer tipo em várias plataformas. Vídeo, imagem, texto curto/longo, código-fonte, e mais. - -🚀 **Production Ready** - Features nativas de nuvem que funcionam "out-of-the-box" (fora da caixa), por ex. conteinerização, microsserviço, distribuição, escalabilidade (scaling), sharding, async IO, REST, gRPC. - -🧩 **Conecte e use** - Com [Jina Hub](https://github.com/jina-ai/jina-hub), é fácil expandir Jina com simples scripts em Python ou com Imagens Docker otimizadas para seu campo de pesquisa. - -## Sumário - - - - - - -- [Instalação](#instala%C3%A7%C3%A3o) -- [Jina "Olá, mundo!" 👋🌍](#jina-ol%C3%A1-mundo-) -- [Tutoriais](#tutoriais) -- [Documentação](#documenta%C3%A7%C3%A3o) -- [Contribuindo](#contribuindo) -- [Comunidade](#comunidade) -- [Governança Aberta](#governan%C3%A7a-aberta) -- [Junte-se a nós](#junte-se-a-n%C3%B3s) -- [Licença](#licen%C3%A7a) - - - -## Instalação - -### Com PyPi - -No sistemas operacionais Linux/MacOS com Python >= 3.7: - -```bash -pip install jina -``` - -Para instalar Jina em dependências adicionais ou no Raspberry Pi, [favor checar a documentação](https://docs.jina.ai). - -### Em um Container Docker - -Nós oferecemos uma Imagem Docker universal com suporte para diversos tipos de arquitetura (incluindo x64, x86, arm-64/v7/v6). 
É só rodar: - -```bash -docker run jinaai/jina --help -``` - -## Jina "Olá, mundo!" 👋🌍 - -Paara começar, você pode tentar nosso "Hello, World" (que significa "Olá, mundo") - uma simples demonstração de busca neural de imagem para [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). Nenhuma outra dependência é necessária. É só rodar: - -```bash -jina hello-world -``` - -...ou, mais facilmente, para usuários de Docker, **sem instalação necessária**: - -```bash -docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # substituir "open" por "xdg-open" no Linux -``` - -
-Clique aqui para ver a saída do console - -

- hello world console output -

- -
- -A Imagem Docker baixa o treinamento e o dataset de teste do Fashion-MNIST e diz para Jina indexar 60.000 imagens dos dados de treinamento. Em seguida, a Imagem Docker seleciona amostras aleatórias de imagens do teste, as define como queries e pede para Jina extrair os resultados relevantes. Todo esse processo leva em torno de 1 minuto, e eventualmente abre uma página web com resultados, que se parecem com esse: - -

- Jina banner -

- -A implementação por trás disso é simples: - - - - - - - - - - - - -
Python API ou use YAML spec ou use Dashboard
- - -```python -from jina.flow import Flow - -f = (Flow() - .add(uses='encoder.yml', parallel=2) - .add(uses='indexer.yml', shards=2)) - -with f: - f.index(fashion_mnist, batch_size=1024) -``` - - - -```yaml -!Flow -pods: - encode: - uses: encoder.yml - parallel: 2 - index: - uses: indexer.yml - shards: 2 -``` - - - -![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true) - -
- -
-Explore sharding, conteinerização, concatenação de embeddings, e mais - -#### Adicionando Paralelismo e Sharding - -```python -from jina.flow import Flow - -f = (Flow().add(uses='encoder.yml', parallel=2) - .add(uses='indexer.yml', shards=2)) -``` - -#### [Distribuindo Flow](https://docs.jina.ai/chapters/remote/index.html) - -```python -from jina.flow import Flow - -f = Flow().add(uses='encoder.yml', host='192.168.0.99') -``` - -#### [Usando um Container Docker](https://docs.jina.ai/chapters/hub/index.html) - -```python -from jina.flow import Flow - -f = (Flow().add(uses='jinahub/cnn-encode:0.1') - .add(uses='jinahub/faiss-index:0.2', host='192.168.0.99')) -``` - -#### Concatenando embeddings - -```python -from jina.flow import Flow - -f = (Flow().add(name='eb1', uses='BiTImageEncoder') - .add(name='eb2', uses='KerasImageEncoder', needs='gateway') - .needs(['eb1', 'eb2'], uses='_concat')) -``` - -#### [Permitindo Network Query](https://docs.jina.ai/chapters/restapi/index.html) - -```python -from jina.flow import Flow - -f = Flow(port_expose=45678, rest_api=True) - -with f: - f.block() -``` - -Ficou interessado(a)? Explore outras opções: - -```bash -jina hello-world --help -``` -
- -### Crie seu primeiro projeto com Jina - -```bash -pip install jina[devel] -jina hub new --type app -``` - -Você pode facilmente criar um projeto com Jina a partir de templates, basta um comando no terminal. Este comando acima cria um ponto de entrada (entrypoint) de Python, configurações YAML e um Dockerfile. Você pode começar a partir dali. - - -## Tutoriais - - - - - - - -
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: A primeira coisa para se aprender sobre Jina

-  English • - 日本語 • - Français • - Português • - Deutsch • - Русский язык • - 中文 • - عربية -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Nível Tutoriais

🐣

-

Construir um sistema de busca semântica com PLN (NLP)

-Procure scripts de South Park e pratique com Flows e Pods -

🐣

-

Meu primeiro App com Jina

-Use cookiecutter para iniciar um app com Jina -

🐣

-

Fashion Search com Query Language (Linguagem de consulta)

-Torne o Hello-World mais interessante com uma Query Language -

🕊

-

Use Chunk para procurar letras de músicas

-Divida documentos para procurar em um nível detalhado -

🕊

-

Misture e combine imagens e legendas

-Procure cross modal para conseguir imagens a partir de suas legendas e vice-versa -

🚀

-

Aumente a intensidade da busca semântica de vídeo

-Melhore a performance usando prefetching e sharding -
- -## Documentação - - - - - -A melhor maneira de aprender Jina de forma aprofundada é lendo nossa documentação. A documentação é construída em cima de cada push, merge, e release na branch master. - -#### O básico - -- [Use Flow API para compor o seu Workflow de busca](https://docs.jina.ai/chapters/flow/index.html) -- [Funções de Entrada e Saída em Jina](https://docs.jina.ai/chapters/io/index.html) -- [Use Dashboard para conseguir Insight do Workflow de Jina](https://github.com/jina-ai/dashboard) -- [Distribua seu Workflow remotamente](https://docs.jina.ai/chapters/remote/index.html) -- [Rode Jina Pods via Conteiner Docker](https://docs.jina.ai/chapters/hub/index.html) - -#### Referência - -- [Argumentos de interface da linha de comando](https://docs.jina.ai/chapters/cli/index.html) -- [Interface Python API](https://docs.jina.ai/api/jina.html) -- [YAML sintaxe para Executor, Driver e Flow](https://docs.jina.ai/chapters/yaml/yaml.html) -- [Protobuf schema](https://docs.jina.ai/chapters/proto/index.html) -- [Variáveis de ambiente](https://docs.jina.ai/chapters/envs.html) -- ... [e mais](https://docs.jina.ai/index.html) - -Você é um(a) "Doc"-star? Junte-se a nós! Todo tipo de ajuda na documentação é bem-vindo. - -[Documentação para versões antigas está arquivada aqui](https://github.com/jina-ai/docs/releases). - -## Contribuindo - -Todo tipo de contribuição da comunidade open-source é bem-vindo, individuais e parceiros. Nós devemos nosso sucesso à sua participação ativa. 
- -- [Orientações para contribuição](CONTRIBUTING.md) -- [Ciclos de Release e estágios de desenvolvimento](RELEASE.md) - -### Contribuidores ✨ - - -[![All Contributors](https://img.shields.io/badge/all_contributors-66-orange.svg?style=flat-square)](#contributors-) - - - - - - - - - - - - - - - - - - - - -## Comunidade - -- [Slack workspace](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - junte-se ao #general no nosso Slack para conhecer o time e fazer perguntas -- [Canal no YouTube](https://youtube.com/c/jina-ai) - inscreva-se para receber nossos tutoriais mais recentes, demos de release, webinários e apresentações -- [LinkedIn](https://www.linkedin.com/company/jinaai/) - conheça Jina AI como uma empresa e encontre oportunidades de emprego -- [![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - siga e interaja conosco usando a hashtag `#JinaSearch` -- [Empresa](https://jina.ai) - aprenda mais sobre nossa empresa e como somos totalmente comprometidos com open-source. - -## Governança Aberta - -[Marcos/milestones GitHub](https://github.com/jina-ai/jina/milestones) planeje o caminho para futuras melhoras de Jina. - -Como parte do nosso modelo de governança aberta, nós hosteamos [Engineering All Hands]((https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/)) de Jina publicamente. Essa reunião no Zoom ocorre mensalmente na segunda terça-feira de cada mês, às 14:00-15:30 (CET). Qualquer um pode se juntar por meio do convite de calendário a seguir. 
- -- [Adicionar ao Google Calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL) -- [Download .ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics) - -Será feita uma live-stream da reunião, que depois será publicada em nosso [Canal do YouTube](https://youtube.com/c/jina-ai). - -## Junte-se a nós - -Jina é um projeto open-source. [Estamos contratando](https://jobs.jina.ai) desenvolvedores full-stack, evangelists, e PMs para construir o próximo ecossistema de busca neural em open source. - - -## Licença - -Copyright (c) 2020 Jina AI Limited. All rights reserved. - -Jina is licensed under the Apache License, Version 2.0. [See LICENSE for the full license text.](LICENSE) diff --git a/.github/i18n/README.ru.md b/.github/i18n/README.ru.md deleted file mode 100644 index 1b58c806298aa..0000000000000 --- a/.github/i18n/README.ru.md +++ /dev/null @@ -1,389 +0,0 @@ -

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai) -[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai) - - tweet button - -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]() -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release 
CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22) -

- -

- English • - Français • - Deutsch • - 中文 • - 日本語 • - 한국어 • - Português • - український • - Español -

- -

- Website • - Docs • - Examples • - Newsletter • - - Dashboard (beta) • - Twitter • - We are Hiring - -

- -Хотите построить поисковую систему, подкрепленную глубоким изучением? Вы пришли в нужное место! - -Jina - это облачная нейронная поисковая система, основанная на современном ИИ и глубоком обучении. Долгосрочную поддержку ей оказывает команда, работающая на полную ставку. - -🌌**Универсальное поисковое решение** - Jina позволяет создавать крупномасштабные индексы и запросы любого типа на различных платформах и архитектурах. Ищите ли вы изображения, видеоклипы, аудиофрагменты, длинные юридические документы, короткие твиты - Jina справится со всеми этими задачами. - -🚀**Высокая производительность и state of the art подходы** - Jina нацелена на промышленное применение ИИ. Вы можете легко масштабировать ваш VideoBERT, Xception, ваш токенизатор слов, сегментацию изображений и базу данных для обработки данных масштаба миллиардов объектов. Такие функции, как репликация и шардирование, работают из коробки. - -🐣**Системный инжиниринг стал простым** - Jina предлагает универсальное решение, которое освобождает вас от ручной работы и сборки пакетов, библиотек и баз данных. С самым интуитивным API и [дашбордом](https://github.com/jina-ai/dashboard) построить облачную поисковую систему - занимает всего лишь минуту. - -Jina - проект с открытым исходным кодом. [Мы нанимаем](https://jobs.jina.ai) ИИ Инженеров, full-stack разработчиков, евангелистов, менеджеров проектов для построения новой нейронной поисковой эко-системы с открытым исходным кодом - -## Содержание - - - - - - -- [Установить](#%D0%A3%D1%81%D1%82%D0%B0%D0%BD%D0%BE%D0%B2%D0%B8%D1%82%D1%8C) -- [Jina "Hello world!" 
👋🌍](#jina-hello-world-) -- [Начало работы](#%D0%9D%D0%B0%D1%87%D0%B0%D0%BB%D0%BE-%D1%80%D0%B0%D0%B1%D0%BE%D1%82%D1%8B) -- [Документация](#%D0%94%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B0%D1%86%D0%B8%D1%8F) -- [Вклад](#%D0%92%D0%BA%D0%BB%D0%B0%D0%B4) -- [Сообщество](#%D0%A1%D0%BE%D0%BE%D0%B1%D1%89%D0%B5%D1%81%D1%82%D0%B2%D0%BE) -- [Дорожная карта"](#%D0%94%D0%BE%D1%80%D0%BE%D0%B6%D0%BD%D0%B0%D1%8F-%D0%BA%D0%B0%D1%80%D1%82%D0%B0) -- [Лицензия](#%D0%9B%D0%B8%D1%86%D0%B5%D0%BD%D0%B7%D0%B8%D1%8F) - - - -## Установить - -#### Установка из PyPi - -В Linux/MacOS с установленным Python >= 3.7 просто запустите эту команду в терминале: - -```bash -pip install jina -``` - -Чтобы установить Jina с дополнительными зависимостями, или установить его на Raspberry Pi [пожалуйста, ознакомьтесь с документацией](https://docs.jina.ai). - -#### ...или запуск из Docker контейнера.. - -Мы предоставляем универсальный образ Docker (всего 80MB!), который поддерживает несколько архитектур (включая x64, x86, arm-64/v7/v6), просто выполните эту команду: - -```bash -docker run jinaai/jina -``` - -## Jina "Hello world!" 👋🌍 - -Для начала советуем вам попробовать Jina "Hello World" - простую демо-версию нейросетевого поиска изображений для [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). Никаких дополнительных зависимостей не нужно, просто запустите следующую команду: - -```bash -jina hello-world -``` - -... или даже проще для пользователей Docker, **установка не требуется**: - -```bash -docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux -``` - -
-Click here to see the console output - -

- hello world console output -

- -
- -Docker образ загружает данные тренинга и тестов Fashion-MNIST и запускает Jina так, чтобы она индексировала 60 000 изображений из учебного набора. Затем Jina случайным образом берет образцы с тестового набора в качестве запросов и получает соответствующие результаты. Примерно через 1 минуту откроется веб-страница с результатами: - -

- Jina banner -

- -За всем этим стоит довлольно простая реализация: - - - - - - - - - - - - -
Python API index.yml Flow in Dashboard
- -```python -from jina.flow import Flow - -f = Flow.load_config('index.yml') - -with f: - f.index(inputs) -``` - - - - -```yaml -!Flow -pods: - chunk_seg: - uses: helloworld.crafter.yml - replicas: $REPLICAS - read_only: true - doc_idx: - uses: helloworld.indexer.doc.yml - encode: - uses: helloworld.encoder.yml - needs: chunk_seg - replicas: $REPLICAS - chunk_idx: - uses: helloworld.indexer.chunk.yml - replicas: $SHARDS - join_all: - uses: _pass - needs: [doc_idx, chunk_idx] - read_only: true -``` - - - - - -![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true) - -
- -Все ключевые слова, которые вы можете назвать: компьютерное зрение, нейросетевой поиск информации, микро-сервис, очередь сообщений, гибкость, репликации и шардирование заработают всего за одну минуту! - -Заинтригованы? Попробуйте разные варианты: - -```bash -jina hello-world --help -``` - -[Убедитесь в том, что вы продолжаете пользоваться нашим гидом Jina 101](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - понимание всех ключевых понятий Jina за 3 минуты! - -## Начало работы - -### Начать проект с шаблона. - -```bash -pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina -``` - -### Учебники - - - - - - - -
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: First Thing to Learn About Jina

-  English • - 日本語 • - français • - Português • - Русский язык • - 中文 -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Tutorials Level
-

Use Flow API to Compose Your Search Workflow

-Learn how to orchestrate Pods to work together: sequentially and in parallel; locally and remotely -

🐣

-

Input and Output Functions in Jina

-Learn how the input and output functions work in Jina -

🐣

-

Use Dashboard to Get Insight of Jina Workflow

-Learn to use dashboard to monitor and get insight of a running workflow -

🐣

-

From BERT-as-Service to X-as-Service

-Learn how to use Jina to extract feature vector using any deep learning representation -

🐣

-

Build a NLP Semantic Search System

-Learn how to build a script search system for South Park and practice your knowledge on Flows and Pods -

🐣

-

Build a Flower Image Search System

-Learn how to build an image search system and define your own executors and run them in docker -

🐣

-

Video Semantic Search in Scale with Prefetching and Sharding

-Learn how to increase the performance by using prefetching and sharding -

🕊

-

Revisit "Hello, World!" in a Client-Server Architecture

-Learn how to run a Flow remotely and use a local client to connect to it -

🕊

-

Distribute Your Workflow Remotely

-Learn to run Jina on remote instances and distribute your workflow -

🕊

-

Extend Jina by Implementing Your Own Executor

-Learn how to implement your own ideas into Jina's plugin -

🕊

-

Run Jina Pod via Docker Container

-Learn how Jina solves complex dependencies easily with Docker container -

🕊

-

Google's Big Transfer Model in (Poké-)Production

-Learn how to use SOTA visual representation for searching Pokémon! -

🚀

- - -## Документация - - - - - -Лучший способ глубокого изучения Jina - это прочитать нашу документацию. Документация строится на каждом нажатии, слиянии и выпуске главного ветви. Вы можете найти более подробную информацию по следующим темам в нашей документации. - -- [Объяснение аргументов интерфейса командной строки Jina](https://docs.jina.ai/chapters/cli/index.html) -- [Jina Python API интерфейс](https://docs.jina.ai/api/jina.html) -- [Синтаксис Jina YAML для Executor, Driver и Flow](https://docs.jina.ai/chapters/yaml/yaml.html) -- [схема Jina Protobuf](https://docs.jina.ai/chapters/proto/index.html) -- [Переменные окружения, используемые в Jina](https://docs.jina.ai/chapters/envs.html) -- ..[и более того](https://docs.jina.ai/index.html) - -Ты "Док"-звезда? Согласен? Присоединяйтесь к нам! Мы приветствуем всевозможные улучшения в документации - -[Документация для старых версий архивируется здесь](https://github.com/jina-ai/docs/releases). - -## Вклад - -Мы приветствуем все виды вклада со стороны сообщества с открытым исходным кодом, отдельных лиц и партнеров. Без вашего активного участия Jina не будет успешной. 
- -Следующие ресурсы помогут вам сделать хороший первый вклад: - -- [Руководство по оказанию содействия](CONTRIBUTING.md) -- [Циклы выпуска и стадии разработки](RELEASE.md) - -## Сообщество - -- [канал Slack](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - коммуникационная платформа для разработчиков для обсуждения Jina -- [Открытая рассылка](mailto:newsletter+subscribe@jina.ai) - подписаться на последние обновления, релизы и новости о событиях Jina -- [Ссылка на сайт](https://www.linkedin.com/company/jinaai/) - познакомиться с Jina ИИ как с компанией и найти работу -- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - следовать за нами и взаимодействовать с нами с помощью хэштэга`#JinaSearch` -- [Компания](https://jina.ai) - Узнайте больше о нашей компании, мы полностью привержены открытому исходному коду! - -## Дорожная карта" - -[вехи GitHub](https://github.com/jina-ai/jina/milestones) проложить путь к будущим улучшениям. - -Мы ищем партнерства для построения модели открытого управления (например, Технического руководящего комитета) вокруг Jina, которая позволит создать здоровую экосистему с открытым исходным кодом и культуру, дружелюбную к разработчикам. Если вы заинтересованы в участии, не стесняйтесь обращаться к нам по адресу[hello@jina.ai](mailto:hello@jina.ai). - -## Лицензия - -Авторское право (с) 2020 г. "Джина АИ Лимитед". Все права защищены. - -Jina лицензирована по лицензии Apache, версия 2.0[Полный текст лицензии см. в ЛИЦЕНЗИИ.](LICENSE) diff --git a/.github/i18n/README.uk.md b/.github/i18n/README.uk.md deleted file mode 100644 index b6e5acb42f463..0000000000000 --- a/.github/i18n/README.uk.md +++ /dev/null @@ -1,424 +0,0 @@ -

- Банер Jina -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina має ліцензію Apache-2.0")](#license) -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina підтримує Python 3.7 та вище")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)](https://pypi.org/project/jina/) -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina є мультиархітектурною та може працювати на пристроях з різною архітектурою")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://api.jina.ai/) -[![codecov](https://codecov.io/gh/jina-ai/jina/branch/master/graph/badge.svg)](https://codecov.io/gh/jina-ai/jina) - -

- -

- English • - Français • - Deutsch • - 中文 • - 日本語 • - 한국어 • - Português • - Русский язык • - Español -

- - -

- Сайт • - Документація • - Зразки • - Hub (beta) • - Dashboard (beta) • - Jinabox (beta) • - Twitter • - Ми наймаємо - -

- -Jina - це пошукова система на основі ШІ, яка надає розробникам можливість створювати **крос-/мульти-модальні пошукові системи** (напр. текст, зображення, відео, аудіо) у хмарі. Jina має довгострокову підтримку [командою, яка працює full-time та має венчурну підтримку](https://jina.ai). - -⏱️ **Економія часу** - Завантажте систему з ШІ лише за кілька хвилин. - -🧠 **Взірцеві моделі ШІ** - Jina являє собою новий шаблон проєктування для нейронних пошукових систем з блискучою підтримкою [найсучасніших моделей ШІ](https://docs.jina.ai/chapters/all_exec.html). - -🌌 **Універсальний пошук** - Широкомасштабне індексування та запити даних будь-якого типу на багатьох платформах. Відео, зображення, об'ємні/короткі тести, музика, вихідний код, та більше. - -🚀 **Готове до використання** - Cloud-native можливості працюють одразу "з коробки", напр. контейнеризація, мікросервіси, розповсюдження, масштабування, sharding, асинхронні IO, REST, gRPC. - -🧩 **Підключіть та грайте** - Разом з [Jina Hub](https://github.com/jina-ai/jina-hub), з легкістю розширюйте Jina за допомогою Python-скриптів або образів Docker, оптимізованих для ваших сфер пошуку. - -## Зміст - - - - - - -- [Розпочнімо](#%D0%A0%D0%BE%D0%B7%D0%BF%D0%BE%D1%87%D0%BD%D1%96%D0%BC%D0%BE) -- [Jina "Привіт, світе!" 
👋🌍](#jina-%D0%9F%D1%80%D0%B8%D0%B2%D1%96%D1%82-%D1%81%D0%B2%D1%96%D1%82%D0%B5-) -- [Туторіали](#%D0%A2%D1%83%D1%82%D0%BE%D1%80%D1%96%D0%B0%D0%BB%D0%B8) -- [Документація](#%D0%94%D0%BE%D0%BA%D1%83%D0%BC%D0%B5%D0%BD%D1%82%D0%B0%D1%86%D1%96%D1%8F) -- [Допомога проєкту](#%D0%94%D0%BE%D0%BF%D0%BE%D0%BC%D0%BE%D0%B3%D0%B0-%D0%BF%D1%80%D0%BE%D1%94%D0%BA%D1%82%D1%83) -- [Спільнота](#%D0%A1%D0%BF%D1%96%D0%BB%D1%8C%D0%BD%D0%BE%D1%82%D0%B0) -- [Відкрите управління](#%D0%92%D1%96%D0%B4%D0%BA%D1%80%D0%B8%D1%82%D0%B5-%D1%83%D0%BF%D1%80%D0%B0%D0%B2%D0%BB%D1%96%D0%BD%D0%BD%D1%8F) -- [Приєднуйтесь](#%D0%9F%D1%80%D0%B8%D1%94%D0%B4%D0%BD%D1%83%D0%B9%D1%82%D0%B5%D1%81%D1%8C) -- [Ліцензія](#%D0%9B%D1%96%D1%86%D0%B5%D0%BD%D0%B7%D1%96%D1%8F) - - - -## Розпочнімо - -### З PyPi - -На пристроях Linux/MacOS з Python >= 3.7: - -```bash -pip install jina -``` - -Для того, щоб встановити разом з Jina додаткові залежності або щоб встановити на Raspberry Pi, [будь-ласка, зверніть увагу на документацію](https://docs.jina.ai). - -### У Docker-контейнері - -Ми пропонуємо універсальний образ Docker, який підтримує різноманітні архітектури (Включаючи x64, x86, arm-64/v7/v6). Просто запустіть: - -```bash -docker run jinaai/jina --help -``` - -## Jina "Привіт, світе!" 👋🌍 - -Як новачок, ви можете спробувати наш "Привіт, світе" - просте демо нейропошуку по зображеннях для [Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/). Жодних додаткових залежностей, просто запустіть: - -```bash -jina hello-world -``` - -...або для користувачів Docker навіть ще простіше, **не потребуючи встановлення**: - -```bash -docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # замініть "open" на "xdg-open" на Linux -``` - -
-Click here to see the console output - -

- hello world console output -

- -
- -The Docker image downloads the Fashion-MNIST training and test datasets and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries and asks Jina to retrieve relevant results. The whole process takes about 1 minute and ends by opening a web page that shows results like these: - -

- Jina banner -

- -The implementation behind it is simple: - - - - - - - - - - - - -
Python API, or using a YAML spec, or using the Dashboard
- - -```python -from jina.flow import Flow - -f = (Flow() - .add(uses='encoder.yml', parallel=2) - .add(uses='indexer.yml', shards=2)) - -with f: - f.index(fashion_mnist, batch_size=1024) -``` - - - -```yaml -!Flow -pods: - encode: - uses: encoder.yml - parallel: 2 - index: - uses: indexer.yml - shards: 2 -``` - - - -![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true) - -
- -
-Дослідіть sharding, контейнеризацію, об'єднування вкладень, та більше - -#### Додавання паралелізму та Sharding - -```python -from jina.flow import Flow - -f = (Flow().add(uses='encoder.yml', parallel=2) - .add(uses='indexer.yml', shards=2)) -``` - -#### [Розподіл потоку](https://docs.jina.ai/chapters/remote/index.html) - -```python -from jina.flow import Flow - -f = Flow().add(uses='encoder.yml', host='192.168.0.99') -``` - -#### [Використання Docker-контейнера](https://docs.jina.ai/chapters/hub/index.html) - -```python -from jina.flow import Flow - -f = (Flow().add(uses='jinahub/cnn-encode:0.1') - .add(uses='jinahub/faiss-index:0.2', host='192.168.0.99')) -``` - -#### Об'єднання вкладень - -```python -from jina.flow import Flow - -f = (Flow().add(name='eb1', uses='BiTImageEncoder') - .add(name='eb2', uses='KerasImageEncoder', needs='gateway') - .needs(['eb1', 'eb2'], uses='_concat')) -``` - -#### [Увімкнення мережевих запитів](https://docs.jina.ai/chapters/restapi/index.html) - -```python -from jina.flow import Flow - -f = Flow(port_expose=45678, rest_api=True) - -with f: - f.block() -``` - -Заінтриговані? Зіграйте з різними варіантами: - -```bash -jina hello-world --help -``` -
- -### Створіть свій перший Jina-проєкт - -```bash -pip install jina[devel] -jina hub new --type app -``` - -Ви можете легко створити Jina-проєкт із шаблонів використовуючи лише одну команду терміналу. Це створює точку входу Python, конфігурації YAML та файл Docker. Ви можете почати звідти. - - -## Туторіали - - - - - - - -
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: First Thing to Learn About Jina

-  English • - 日本語 • - Français • - Português • - Deutsch • - Русский язык • - 中文 • - عربية -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Level | Tutorials |
-| --- | --- |
-| 🐣 | **Build an NLP Semantic Search System**: search "South Park" scripts and practice with Flows and Pods |
-| 🐣 | **My First Jina App**: use cookiecutter to build a Jina app step by step |
-| 🐣 | **Fancy Search with Query Language**: spice up Hello-World with the query language |
-| 🕊 | **Use Chunk to Search Lyrics**: split documents to search at a fine-grained level |
-| 🕊 | **Mix and Match Images and Captions**: search cross-modally to retrieve images from captions and vice versa |
-| 🚀 | **Scale Up Video Semantic Search**: improve performance using prefetching and sharding |
- -## Документація - - - - - -Найкращий спосіб поглиблено вивчити Jina - прочитати нашу документацію. Вона написана на основі кожного push, merge, та release головної гілки. - -#### Основи - -- [Використання Flow API для компонування пошукових процесів](https://docs.jina.ai/chapters/flow/index.html) -- [Функції введення та виведення у Jina](https://docs.jina.ai/chapters/io/index.html) -- [Використання Dashboard, щоб отримання статистики робочих процесів Jina](https://github.com/jina-ai/dashboard) -- [Віддалений розподіл робочих процесів](https://docs.jina.ai/chapters/remote/index.html) -- [Запуск Jina Pods з допомогою Docker-контейнера](https://docs.jina.ai/chapters/hub/index.html) - -#### Посилання - -- [Аргументи інтерфейсу командного рядка](https://docs.jina.ai/chapters/cli/index.html) -- [Інтерфейс Python API](https://docs.jina.ai/api/jina.html) -- [Синтаксис YAML для Executor, Driver та Flow](https://docs.jina.ai/chapters/yaml/yaml.html) -- [Схеми Protobuf](https://docs.jina.ai/chapters/proto/index.html) -- [Змінні середовища](https://docs.jina.ai/chapters/envs.html) -- ... [та більше](https://docs.jina.ai/index.html) - -Ви "Док"-зірка? Приєднуйтесь! Ми вітаємо будь-які покращення документації. - -[Документації для попередніх версій зберігаються тут](https://github.com/jina-ai/docs/releases). - -## Допомога проєкту - -Ми вітаємо буль-які внески від учасників open-source спільноти, окремих осіб та партнерів. Своїм успіхом ми завдячуємо вашій активній участі. 
- -- [Правила допомоги](CONTRIBUTING.md) -- [Цикли випуску та стадії розробки](RELEASE.md) - -### Учасники проєкту ✨ - - -[![All Contributors](https://img.shields.io/badge/all_contributors-71-orange.svg?style=flat-square)](#contributors-) - - - - - - - - - - - - - - - - - - - - - -## Спільнота - -- [Slack workspace](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - долучіться до #general на нашому Slack, щоб зустрітися з командою та задати питання -- [YouTube канал](https://youtube.com/c/jina-ai) - підпишіться заради найновіших відео-туторіалів, демо нових випусків, вебінарів та презентацій. -- [LinkedIn](https://www.linkedin.com/company/jinaai/) - познайомтесь з Jina AI як компанією та знайдіть можливості для працевлаштування -- [![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social)](https://twitter.com/JinaAI_) - слідкуйте та взаємодійте з нами використовуючи хештег `#JinaSearch` -- [Компанія](https://jina.ai) - дізнайтесь більше про нашу компанію та як ми повністю віддані open-source. - -## Відкрите управління - -[GitHub milestones](https://github.com/jina-ai/jina/milestones) викладають шлях майбутніх вдосконалень Jina. - -В рамках нашої відкритої моделі управління, ми ведемо Jina [Engineering All Hands]((https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/)) публічно. Ці Zoom-зустрічі відбуваються щомісячно у другий вівторок кожного місяця, о 14:00-15:30 (CET). Кожен може приєднатися через наступне запрошення календаря. 
- -- [Додати до Google Calendar](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL) -- [Завантажити .ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics) - -Зустріч також буде транслюватися наживо та пізніше буде опублікована на нашому [YouTube каналі](https://youtube.com/c/jina-ai). - -## Приєднуйтесь - -Jina - це проєкт з відкритим вихідним кодом. [Ми наймаємо](https://jobs.jina.ai) full-stack розробників, євангелістів та PM-ів для побудови майбутньої екосистеми з нейропошуку з відкритим вихідним кодом. - - -## Ліцензія - -Copyright (c) 2020 Jina AI Limited. All rights reserved. - -Jina is licensed under the Apache License, Version 2.0. [Повний текст ліцензії розміщено у файлі LICENSE.](LICENSE) diff --git a/.github/i18n/README.zh.md b/.github/i18n/README.zh.md deleted file mode 100644 index b315ebabe3177..0000000000000 --- a/.github/i18n/README.zh.md +++ /dev/null @@ -1,392 +0,0 @@ -

- Jina banner -

- -

- -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-badge.svg?raw=true "We fully commit to open-source")](https://jina.ai) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-hello-world-badge.svg?raw=true "Run Jina 'Hello, World!' without installing anything")](#jina-hello-world-) -[![Jina](https://github.com/jina-ai/jina/blob/master/.github/badges/license-badge.svg?raw=true "Jina is licensed under Apache-2.0")](#license) -[![Jina Docs](https://github.com/jina-ai/jina/blob/master/.github/badges/docs-badge.svg?raw=true "Checkout our docs and learn Jina")](https://docs.jina.ai) -[![We are hiring](https://github.com/jina-ai/jina/blob/master/.github/badges/jina-corp-badge-hiring.svg?raw=true "We are hiring full-time position at Jina")](https://jobs.jina.ai) - - tweet button - -[![Python 3.7 3.8](https://github.com/jina-ai/jina/blob/master/.github/badges/python-badge.svg?raw=true "Jina supports Python 3.7 and above")](https://pypi.org/project/jina/) -[![PyPI](https://img.shields.io/pypi/v/jina?color=%23099cec&label=PyPI%20package&logo=pypi&logoColor=white)]() -[![Docker](https://github.com/jina-ai/jina/blob/master/.github/badges/docker-badge.svg?raw=true "Jina is multi-arch ready, can run on different architectures")](https://hub.docker.com/r/jinaai/jina/tags) -[![Docker Image Version (latest semver)](https://img.shields.io/docker/v/jinaai/jina?color=%23099cec&label=Docker%20Image&logo=docker&logoColor=white&sort=semver)](https://hub.docker.com/r/jinaai/jina/tags) -[![CI](https://github.com/jina-ai/jina/workflows/CI/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3ACI) -[![CD](https://github.com/jina-ai/jina/workflows/CD/badge.svg?branch=master)](https://github.com/jina-ai/jina/actions?query=workflow%3ACD) -[![Release Cycle](https://github.com/jina-ai/jina/workflows/Release%20Cycle/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+Cycle%22) -[![Release 
CD](https://github.com/jina-ai/jina/workflows/Release%20CD/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22Release+CD%22) -[![API Schema](https://github.com/jina-ai/jina/workflows/API%20Schema/badge.svg)](https://github.com/jina-ai/jina/actions?query=workflow%3A%22API+Schema%22) -

- -

- English • - Français • - Deutsch • - 日本語 • - 한국어 • - Português • - Русский язык • - український • - Español -

- -

- Website • - Docs • - Examples • - Newsletter • - - Dashboard (beta) • - Twitter • - We are Hiring - -

- -想建立一个以深度学习为支撑的搜索系统?你来对地方了! - -Jina是由最先进的AI和深度学习驱动的云端神经搜索框架。它得到了一个全职的、由风险投资支持的团队的长期支持。 - -🌌**通用搜索解决方案** - Jina可以在多个平台和架构上实现任何类型的大规模索引和查询。无论您是搜索图片、视频片段、音频片段、长的法律文档、短的推文,Jina都能处理。 - -🚀**高性能和最先进的设备** - Jina的目标是AI在生产过程中。你可以轻松地扩展出你的VideoBERT、Xception、你的单词令牌器、图像分割器和数据库来处理亿级数据。复制品和碎片等功能都是现成的。 - -🐣**简单的系统工程** - Jina提供了一个一站式的解决方案,让你从手工制作和粘合包、库和数据库中解脱出来。通过最直观的API和[仪表板](https://github.com/jina-ai/dashboard)构建一个云端搜索系统只是分分钟的事情。 - - - -Jina是一个开源项目。[我们正在招聘](https://jobs.jina.ai)AI工程师、全栈开发者、传道者、PM,打造下一个开源的神经搜索生态。 - -## 目录 - - - - - - -- [安装](#%E5%AE%89%E8%A3%85) -- [Jina "你好,世界!"👋🌍](#jina-%E4%BD%A0%E5%A5%BD%E4%B8%96%E7%95%8C) -- [开始](#%E5%BC%80%E5%A7%8B) -- [文档](#%E6%96%87%E6%A1%A3) -- [贡献者](#%E8%B4%A1%E7%8C%AE%E8%80%85) -- [社区](#%E7%A4%BE%E5%8C%BA) -- [路线图](#%E8%B7%AF%E7%BA%BF%E5%9B%BE) -- [许可证](#%E8%AE%B8%E5%8F%AF%E8%AF%81) - - - -## 安装 - -#### 从PyPi安装 - -在安装了 Python >= 3.7 的 Linux/MacOS 上,只需在终端上运行此命令即可。 - -```bash -pip install jina -``` - -要在Raspberry Pi上安装Jina,或者在Raspberry Pi上安装Jina,需要额外的依赖关系。[请参考文件](https://docs.jina.ai) - -#### ...或使用Docker容器运行 - -我们提供了一个通用的Docker图像(只有80MB!),支持多种架构(包括x64、x86、x86、arm-64/v7/v6),只需做到这一点。 - -```bash -docker run jinaai/jina -``` - -## Jina "你好,世界!"👋🌍 - -作为入门者,我们邀请你尝试一下Jina的 "Hello,世界"--简单的图像神经搜索的演示[Fashion-MNIST](https://hanxiao.io/2018/09/28/Fashion-MNIST-Year-In-Review/)。不需要额外的依赖关系,只需做。 - -```bash -jina hello-world -``` - -....甚至对Docker用户来说更容易。**不需要任何安装**。简单地。 - -```bash -docker run -v "$(pwd)/j:/j" jinaai/jina hello-world --workdir /j && open j/hello-world.html # replace "open" with "xdg-open" on Linux -``` - -
-Click here to see the console output - -

- hello world console output -

- -
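- -It downloads the Fashion-MNIST training and test data and tells Jina to index 60,000 images from the training set. It then randomly samples images from the test set as queries and asks Jina to retrieve relevant results. After about 1 minute, it opens a web page showing results like this: - -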

- Jina banner -

- -那背后的实施呢?就像它应该是很简单的。 - - - - - - - - - - - - -
Python API index.yml Flow in Dashboard
- -```python -from jina.flow import Flow - -f = Flow.load_config('index.yml') - -with f: - f.index(inputs) -``` - - - - -```yaml -!Flow -pods: - chunk_seg: - uses: helloworld.crafter.yml - replicas: $REPLICAS - read_only: true - doc_idx: - uses: helloworld.indexer.doc.yml - encode: - uses: helloworld.encoder.yml - needs: chunk_seg - replicas: $REPLICAS - chunk_idx: - uses: helloworld.indexer.chunk.yml - replicas: $SHARDS - join_all: - uses: _pass - needs: [doc_idx, chunk_idx] - read_only: true -``` - - - - - -![Flow in Dashboard](https://github.com/jina-ai/jina/blob/master/docs/chapters/helloworld/hello-world-flow.png?raw=true) - -
- -所有你能说出名字的大词:计算机视觉、神经IR、微服务、消息队列、弹性、复制&碎片,都是在短短一分钟内发生的! - -有兴趣吗?玩,并尝试不同的选择。 - -```bash -jina hello-world --help -``` - -[请务必继续关注我们的《Jina101指南》。](https://github.com/jina-ai/jina#jina-101-first-thing-to-learn-about-jina) - 在3分钟内理解Jina的所有关键概念! - -## 开始 - -### 从模板开始一个项目 - -```bash -pip install cookiecutter && cookiecutter gh:jina-ai/cookiecutter-jina -``` - -### 教程 - - - - - - - -
- - Jina 101 Concept Illustration Book, Copyright by Jina AI Limited - - -  

Jina 101: First Thing to Learn About Jina

-  English • - 日本語 • - français • - Português • - Русский язык • - 中文 -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-| Tutorials | Level |
-| --- | --- |
-| **Use Flow API to Compose Your Search Workflow**: learn how to orchestrate Pods to work together, sequentially and in parallel, locally and remotely | 🐣 |
-| **Input and Output Functions in Jina**: learn how the input and output functions work in Jina | 🐣 |
-| **Use Dashboard to Get Insight of Jina Workflow**: learn to use the dashboard to monitor and gain insight into a running workflow | 🐣 |
-| **From BERT-as-Service to X-as-Service**: learn how to use Jina to extract feature vectors using any deep learning representation | 🐣 |
-| **Build a NLP Semantic Search System**: learn how to build a script search system for South Park and practice your knowledge of Flows and Pods | 🐣 |
-| **Build a Flower Image Search System**: learn how to build an image search system, define your own executors, and run them in Docker | 🐣 |
-| **Video Semantic Search in Scale with Prefetching and Sharding**: learn how to increase performance by using prefetching and sharding | 🕊 |
-| **Revisit "Hello, World!" in a Client-Server Architecture**: learn how to run a Flow remotely and use a local client to connect to it | 🕊 |
-| **Distribute Your Workflow Remotely**: learn to run Jina on remote instances and distribute your workflow | 🕊 |
-| **Extend Jina by Implementing Your Own Executor**: learn how to implement your own ideas as a Jina plugin | 🕊 |
-| **Run Jina Pod via Docker Container**: learn how Jina solves complex dependencies easily with Docker containers | 🕊 |
-| **Google's Big Transfer Model in (Poké-)Production**: learn how to use a SOTA visual representation for searching Pokémon! | 🚀 |
- - -## 文档 - - - - - -要深入学习Jina,最好的方法就是阅读我们的文档。文档建立在主分支的每个推送、合并和发布事件上。你可以在我们的文档中找到关于以下主题的更多细节。 - -- [Jina命令行接口参数解释](https://docs.jina.ai/chapters/cli/index.html) -- [Jina Python API接口](https://docs.jina.ai/api/jina.html) -- [用于执行器、驱动和流程的Jina YAML语法](https://docs.jina.ai/chapters/yaml/yaml.html) -- [Jina Protobuf方案](https://docs.jina.ai/chapters/proto/index.html) -- [Jina中使用的环境变量](https://docs.jina.ai/chapters/envs.html) -- ..[以及更多](https://docs.jina.ai/index.html) - -你是"文档"明星吗?是的?加入我们吧!我们欢迎大家对文档进行各种改进。 - -[旧版本的文档在这里存档。](https://github.com/jina-ai/docs/releases) - -## 贡献者 - -我们欢迎来自开源社区、个人和合作伙伴的各种贡献。没有你的积极参与,Jina就不会成功。 - -下面的资源将帮助你做好第一份贡献。 - -- [缴款准则](CONTRIBUTING.md) -- [释放周期和发展阶段](RELEASE.md) - -## 社区 - -- [Slack频道](https://join.slack.com/t/jina-ai/shared_invite/zt-dkl7x8p0-rVCv~3Fdc3~Dpwx7T7XG8w) - 为开发者提供交流平台,探讨Jina -- [社区通讯](mailto:newsletter+subscribe@jina.ai) - 订阅金娜的最新更新、发布和活动消息,订阅金娜的最新动态、发布和活动消息。 -- [LinkedIn](https://www.linkedin.com/company/jinaai/) - 了解Jina AI公司并寻找工作机会 -- ![Twitter Follow](https://img.shields.io/twitter/follow/JinaAI_?label=Follow%20%40JinaAI_&style=social) - 关注我们,并使用tag标签与我们互动`#JinaSearch` -- [公司](https://jina.ai) - 了解更多关于我们公司的信息,我们完全致力于开源! 
- -## 路线图 - -[GitHub的里程碑](https://github.com/jina-ai/jina/milestones)为今后的改进工作铺设了道路。 - -我们正在寻找合作伙伴,围绕Jina建立一个开放治理模式(如技术指导委员会),以建立一个健康的开源生态系统和开发者友好的文化。如果您有兴趣参与,请随时联系我们,地址是[hello@jina.ai](mailto:hello@jina.ai)。 - -## 许可证 - -Copyright (c) 2020 Jina AI Limited.保留所有权利。 - -Jina是在Apache License 2.0版本下授权的。[许可证全文见LICENSE。](LICENSE) diff --git a/.github/images/ILLUS12.png b/.github/images/ILLUS12.png deleted file mode 100644 index ccdf0a69f46ab..0000000000000 Binary files a/.github/images/ILLUS12.png and /dev/null differ diff --git a/.github/images/basic-concept.png b/.github/images/basic-concept.png deleted file mode 100644 index 08b6c62bc9d62..0000000000000 Binary files a/.github/images/basic-concept.png and /dev/null differ diff --git a/.github/images/eah-god.png b/.github/images/eah-god.png deleted file mode 100644 index 9c555582202ab..0000000000000 Binary files a/.github/images/eah-god.png and /dev/null differ diff --git a/.github/images/multimodal-search.png b/.github/images/multimodal-search.png deleted file mode 100644 index 3efdacbad1e58..0000000000000 Binary files a/.github/images/multimodal-search.png and /dev/null differ diff --git a/.github/images/pycharm-schema.gif b/.github/images/pycharm-schema.gif deleted file mode 100644 index 5c5fe0d3a3462..0000000000000 Binary files a/.github/images/pycharm-schema.gif and /dev/null differ diff --git a/.github/images/rel_notes/rel_notes_1_0.png b/.github/images/rel_notes/rel_notes_1_0.png deleted file mode 100644 index f51fb835e9c05..0000000000000 Binary files a/.github/images/rel_notes/rel_notes_1_0.png and /dev/null differ diff --git a/.github/images/rel_notes/rel_notes_1_1.png b/.github/images/rel_notes/rel_notes_1_1.png deleted file mode 100644 index 017885f0308fd..0000000000000 Binary files a/.github/images/rel_notes/rel_notes_1_1.png and /dev/null differ diff --git a/.github/images/speedup.png b/.github/images/speedup.png deleted file mode 100644 index 388cd32a72b58..0000000000000 Binary files 
a/.github/images/speedup.png and /dev/null differ diff --git a/.github/images/vscode-schema.gif b/.github/images/vscode-schema.gif deleted file mode 100644 index 57c33ce9ca34c..0000000000000 Binary files a/.github/images/vscode-schema.gif and /dev/null differ diff --git a/.github/install.png b/.github/install.png deleted file mode 100644 index 7629e28948766..0000000000000 Binary files a/.github/install.png and /dev/null differ diff --git a/.github/jina-docs.png b/.github/jina-docs.png deleted file mode 100644 index 4bfb262b46505..0000000000000 Binary files a/.github/jina-docs.png and /dev/null differ diff --git a/.github/simple-flow0.svg b/.github/simple-flow0.svg deleted file mode 100644 index 5d6186f018737..0000000000000 --- a/.github/simple-flow0.svg +++ /dev/null @@ -1 +0,0 @@ -
[SVG markup omitted. Diagram labels: gateway, pod0, gateway; edges labeled PUSH-PULL (2)]
\ No newline at end of file diff --git a/.github/simple-flow1.svg b/.github/simple-flow1.svg deleted file mode 100644 index 7f1a24f07897e..0000000000000 --- a/.github/simple-flow1.svg +++ /dev/null @@ -1 +0,0 @@ -
[SVG markup omitted. Diagram labels: pod1 (2) with pod1_0 and pod1_1, pod0 (2) with pod0_0 and pod0_1, gateway, gateway; edges labeled PUSH-PULL (3)]
\ No newline at end of file diff --git a/.github/simple-flow2.png b/.github/simple-flow2.png deleted file mode 100644 index 37e1be29a4291..0000000000000 Binary files a/.github/simple-flow2.png and /dev/null differ diff --git a/.github/simple-flow2.svg b/.github/simple-flow2.svg deleted file mode 100644 index 0ed3ca9ffc73f..0000000000000 --- a/.github/simple-flow2.svg +++ /dev/null @@ -1,1496 +0,0 @@ - - - - - - - - - - -
[SVG markup omitted. Diagram labels: indexer (3) with indexer_0 to indexer_2, audio_embed (3) with audio_embed_0 to audio_embed_2, image_embed (3) with image_embed_0 to image_embed_2, text_embed (3) with text_embed_0 to text_embed_2, gateway, preproc, check_embed, _aux_check_embed, multimodal, check_embed7, _aux_check_embed8, ranker, gather_inspect, gateway; edges labeled PUSH-PULL and PUB-SUB]
\ No newline at end of file diff --git a/.github/simple-plot3.svg b/.github/simple-plot3.svg deleted file mode 100644 index 6feac392d36d1..0000000000000 --- a/.github/simple-plot3.svg +++ /dev/null @@ -1 +0,0 @@ -
[SVG markup omitted. Diagram labels: gateway, p1, p2, p3, r1, gateway; edges labeled PUB-SUB (3) and PUSH-PULL (4)]
\ No newline at end of file diff --git a/.github/simple-plot4.svg b/.github/simple-plot4.svg deleted file mode 100644 index fa6a0a6a0a9cb..0000000000000 --- a/.github/simple-plot4.svg +++ /dev/null @@ -1 +0,0 @@ -
[SVG markup omitted. Diagram labels: p3 (3) with p3_0, p3_1 and p3_2, gateway, p1, p2, r1, gateway; edges labeled PUB-SUB (2) and PUSH-PULL (4)]
\ No newline at end of file diff --git a/.github/workflows/autocommit.yml b/.github/workflows/autocommit.yml index 92c00315e77e0..a7117b29dc051 100644 --- a/.github/workflows/autocommit.yml +++ b/.github/workflows/autocommit.yml @@ -22,25 +22,6 @@ jobs: - uses: actions/setup-python@v2 with: python-version: 3.7 - - name: Install and run black --check . - run: | - pip install black - black -S jina/drivers/__init__.py - git add -u &> /dev/null || true - export JINA_GIT_COMMIT_MSG_OLD=$(git log --format=%B -n1) - git commit -m "chore(style): reformatted by jina-dev-bot" &> /dev/null || true - git status - - name: Run copyright generator - uses: hanxiao/copyright-action@master - with: - CopyrightString: '__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."\n__license__ = "Apache-2.0"\n\n' - FileType: '.py' - Path: 'jina/' - IgnorePath: 'jina/drivers/querylang/queryset/' - - run: | - git add -u &> /dev/null || true - git commit -m "chore(copyright): update copyright header" &> /dev/null || true - git status - name: Checkout wall-of-honor uses: actions/checkout@v2 with: @@ -74,6 +55,15 @@ jobs: TOC_TITLE: '' TARGET_PATHS: 'CHANGELOG.md,CONTRIBUTING.md,RELEASE.md' CHECK_ONLY_DEFAULT_BRANCH: true + - name: Generate TOC + uses: technote-space/toc-generator@v3 + with: + MAX_HEADER_LEVEL: 4 + GITHUB_TOKEN: ${{ secrets.JINA_DEV_BOT }} + COMMIT_NAME: Jina Dev Bot + COMMIT_EMAIL: dev-bot@jina.ai + TOC_TITLE: 'Table of Contents' + TARGET_PATHS: '.github/2.0/cookbooks/*.md' - uses: ad-m/github-push-action@v0.6.0 with: github_token: ${{ secrets.JINA_DEV_BOT }} diff --git a/.github/workflows/cd.yml b/.github/workflows/cd.yml index 7371979923a35..57586431c7fcc 100644 --- a/.github/workflows/cd.yml +++ b/.github/workflows/cd.yml @@ -21,7 +21,7 @@ jobs: steps: - uses: actions/checkout@v2 with: - submodules: true +# submodules: true fetch-depth: 100 - run: | truncate --size=24KB README.md > README-trunc.md @@ -80,7 +80,7 @@ jobs: git config --local user.email 
"dev-bot@jina.ai" git config --local user.name "Jina Dev Bot" git add . && git commit -m "update ${{env.JINA_VERSION}} due to ${{github.event_name}} on ${{github.repository}}" && git push -# - name: Pre-release +# - name: Pre-release (.devN) # run: | # git fetch --depth=1 origin +refs/tags/*:refs/tags/* # pip install twine wheel @@ -89,7 +89,7 @@ jobs: # TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }} # TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }} # JINA_SLACK_WEBHOOK: ${{ secrets.JINA_SLACK_WEBHOOK }} -# # Tell docs repo jina is going to release + # Tell docs repo jina is going to release # - name: send-repository-dispatch-event-to-docs # uses: peter-evans/repository-dispatch@v1 # with: @@ -107,8 +107,8 @@ jobs: tag_pyversion: ["-py37", "-py38", "-py39"] steps: - uses: actions/checkout@v2 - with: - submodules: true # this job build devel image, devel image includes hub submodule +# with: +# submodules: true # this job build devel image, devel image includes hub submodule - name: Set envs and versions run: | echo "VCS_REF=$(git rev-parse --short HEAD)" >> $GITHUB_ENV @@ -159,8 +159,8 @@ jobs: runs-on: ubuntu-20.04 steps: - uses: actions/checkout@v2 - with: - submodules: true +# with: +# submodules: true - run: | echo "PIP_TAG='[devel]'" >> $GITHUB_ENV - name: Upload to Github Docker Registry @@ -191,8 +191,8 @@ jobs: test-path: ${{fromJson(needs.prep-testbed.outputs.matrix)}} steps: - uses: actions/checkout@v2 - with: - submodules: true +# with: +# submodules: true - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v2 with: diff --git a/.github/workflows/ci-hub.yml b/.github/workflows/ci-hub.yml index 1dd6f719162d0..371aa8f8aad4a 100644 --- a/.github/workflows/ci-hub.yml +++ b/.github/workflows/ci-hub.yml @@ -8,8 +8,8 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - with: - submodules: true +# with: +# submodules: true - name: Set up Python 3.7 uses: actions/setup-python@v2 with: diff --git a/.github/workflows/ci.yml 
b/.github/workflows/ci.yml index b2819b35e62e4..dd3a6e55944ae 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -123,7 +123,7 @@ jobs: echo "PIP_TAG='[devel]'" >> $GITHUB_ENV - run: | docker build -f Dockerfiles/pip.Dockerfile -t jinaai/jina:test-pip . - docker run jinaai/jina:test-pip hello-world + docker run jinaai/jina:test-pip hello fashion env: GITHUB_TOKEN: ${{secrets.GITHUB_TOKEN}} @@ -137,8 +137,8 @@ jobs: test-path: ${{fromJson(needs.prep-testbed.outputs.matrix)}} steps: - uses: actions/checkout@v2 - with: - submodules: true +# with: +# submodules: true - name: Set up Python ${{ matrix.python-version }} uses: actions/setup-python@v2 with: @@ -177,7 +177,7 @@ jobs: echo "flag it as jina for codeoverage" echo "::set-output name=codecov_flag::jina" fi - timeout-minutes: 15 + timeout-minutes: 20 env: JINAHUB_USERNAME: ${{ secrets.JINAHUB_USERNAME }} JINAHUB_PASSWORD: ${{ secrets.JINAHUB_PASSWORD }} diff --git a/.github/workflows/force-docker-build.yml b/.github/workflows/force-docker-build.yml index abf0e186776e1..490c193a41d35 100644 --- a/.github/workflows/force-docker-build.yml +++ b/.github/workflows/force-docker-build.yml @@ -31,7 +31,7 @@ jobs: - uses: actions/checkout@v2 with: ref: "v${{ github.event.inputs.git_tag }}" - submodules: true # this job build devel image, devel image includes hub submodule +# submodules: true # this job build devel image, devel image includes hub submodule - name: Set envs and versions run: | JINA_VERSION=${{ github.event.inputs.git_tag }} diff --git a/.github/workflows/force-release.yml b/.github/workflows/force-release.yml index d7145f314e2dd..2888f19958978 100644 --- a/.github/workflows/force-release.yml +++ b/.github/workflows/force-release.yml @@ -27,7 +27,7 @@ jobs: with: token: ${{ secrets.JINA_DEV_BOT }} fetch-depth: 100 # means max contribute history is limited to 100 lines - submodules: true +# submodules: true - uses: actions/setup-python@v2 with: python-version: 3.7 @@ -36,7 +36,7 @@ jobs: 
git fetch --depth=1 origin +refs/tags/*:refs/tags/* npm install git-release-notes pip install twine wheel - ./scripts/release.sh final "${{ github.event.inputs.release_reason }}" "${{github.actor}}" + ./scripts/release.sh rc "${{ github.event.inputs.release_reason }}" "${{github.actor}}" env: TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }} TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }} diff --git a/.github/workflows/latency-tracking.yml b/.github/workflows/latency-tracking.yml index 4bc6bf84a8648..35f21e7d253a3 100644 --- a/.github/workflows/latency-tracking.yml +++ b/.github/workflows/latency-tracking.yml @@ -8,8 +8,8 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - with: - submodules: true +# with: +# submodules: true - name: Test docker install run: | docker build --build-arg PIP_TAG="[devel]" -f Dockerfiles/pip.Dockerfile -t jinaai/jina:master . @@ -26,7 +26,7 @@ jobs: cd latency docker build --build-arg JINA_VER=master . -t latency-tracking docker run -v $(pwd)/output:/workspace/output -v $(pwd)/original:/workspace/original latency-tracking - bash batch.sh 2 + bash batch.sh 1 pip install prettytable python ppstat.py > comment.txt - id: get-comment-body diff --git a/.github/workflows/release.yml b/.github/workflows/release.yml index 36f10867d7ed1..c5c3db636ab91 100644 --- a/.github/workflows/release.yml +++ b/.github/workflows/release.yml @@ -1,13 +1,13 @@ name: Release Cycle -on: - schedule: - - cron: "00 23 * * 0" - #on: -# push: -# branches-ignore: -# - '**' # temporally disable this action +# schedule: +# - cron: "00 23 * * 0" + +on: + push: + branches-ignore: + - '**' # temporally disable this action jobs: @@ -18,7 +18,7 @@ jobs: with: token: ${{ secrets.JINA_DEV_BOT }} fetch-depth: 100 # means max contribute history is limited to 100 lines - submodules: true +# submodules: true - uses: actions/setup-python@v2 with: python-version: 3.7 @@ -27,7 +27,7 @@ jobs: git fetch --depth=1 origin +refs/tags/*:refs/tags/* npm install git-release-notes 
pip install twine wheel - ./scripts/release.sh final "Sunday night weekly patch release" "${{github.actor}}" + ./scripts/release.sh rc "Sunday night weekly patch release" "${{github.actor}}" env: TWINE_USERNAME: ${{ secrets.TWINE_USERNAME }} TWINE_PASSWORD: ${{ secrets.TWINE_PASSWORD }} diff --git a/.github/workflows/tag.yml b/.github/workflows/tag.yml index 9459d7447b4ce..b58f49479161f 100644 --- a/.github/workflows/tag.yml +++ b/.github/workflows/tag.yml @@ -5,14 +5,18 @@ on: tags: - "v*" # push to version tags trigger the build +#on: +# push: +# branches-ignore: +# - '**' # temporally disable this action jobs: update-schema: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - with: - submodules: true +# with: +# submodules: true - uses: actions/checkout@v2 with: repository: jina-ai/api @@ -95,8 +99,8 @@ jobs: runs-on: ubuntu-latest steps: - uses: actions/checkout@v2 - with: - submodules: true +# with: +# submodules: true - uses: actions/setup-python@v2 with: python-version: 3.7 @@ -122,8 +126,8 @@ jobs: tag_pyversion: ["-py37", "-py38", "-py39"] steps: - uses: actions/checkout@v2 - with: - submodules: true # this job build devel image, devel image includes hub submodule +# with: +# submodules: true # this job build devel image, devel image includes hub submodule - name: Set envs and versions run: | echo "VCS_REF=$(git rev-parse --short HEAD)" >> $GITHUB_ENV diff --git a/.pre-commit-config.yaml b/.pre-commit-config.yaml index 48b716995e1d9..344da1781976a 100644 --- a/.pre-commit-config.yaml +++ b/.pre-commit-config.yaml @@ -4,6 +4,7 @@ repos: hooks: - id: darglint files: jina/ + exclude: jina/helloworld/ args: - --message-template={path}:{line} {msg_id} {msg} - -s=sphinx @@ -14,6 +15,7 @@ repos: hooks: - id: pydocstyle files: jina/ + exclude: jina/helloworld/ args: - --select=D101,D102,D103 - repo: https://github.com/ambv/black diff --git a/CHANGELOG.md b/CHANGELOG.md index c88eb0dec401d..14cabc02cce0a 100644 --- a/CHANGELOG.md +++ b/CHANGELOG.md @@ -52,8 
+52,6 @@ - - @@ -8610,7 +8608,7 @@ Jina is released on every Friday evening. The PyPi package and Docker Image will - [[```4140961c```](https://github.com/jina-ai/jina/commit/4140961c62359e3acd540a6d88931665c6313824)] __-__ __docs__: update TOC (*Jina Dev Bot*) - [[```20e9cf9a```](https://github.com/jina-ai/jina/commit/20e9cf9a4211bbd1e9f6b540786db747e45ea7f4)] __-__ __version__: the next version will be 1.2.2 (*Jina Dev Bot*) - + ## Release Note (`1.2.3`) > Release time: 2021-05-12 07:55:51 diff --git a/MANIFEST.in b/MANIFEST.in index 79093ba247fec..6da3795e28cc3 100644 --- a/MANIFEST.in +++ b/MANIFEST.in @@ -1,4 +1,5 @@ recursive-include jina/resources * +recursive-include jina/helloworld * include extra-requirements.txt prune tests/ -prune **/tests/ \ No newline at end of file +prune **/tests/ diff --git a/README.md b/README.md index f2c5d611dd404..f87f296bfff03 100644 --- a/README.md +++ b/README.md @@ -9,73 +9,40 @@

Python 3.7 3.8 3.9 -PyPI Docker Image Version (latest semver) codecov

-
-👋 Click here to see quick demo! - - - - - - - - - - - - -
- Image search - - QA chatbot - - Multi-media search -
- -
- -Jina is geared towards building search systems for any kind of data, including [text](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences), [images](https://github.com/jina-ai/examples/tree/master/pokedex-with-bit), [audio](https://github.com/jina-ai/examples/tree/master/audio-search), [video](https://github.com/jina-ai/examples/tree/master/tumblr-gif-search) and [many more](https://github.com/jina-ai/examples). With the modular design & multi-layer abstraction, you can leverage the efficient patterns to build the system by parts, or chaining them into a [Flow](https://101.jina.ai/#Flow) for an end-to-end experience. - +Jina is geared towards building search systems for any kind of data, including text, image, audio, video, PDF etc. +Powered by deep learning and cloud-native techniques, you can leverage Jina to build a multimedia search system in +minutes. -🌌 **Search anything** - Large-scale indexing and querying of unstructured data: video, image, long/short text, music, source code, etc. +🌌 **Search anything** - Large-scale indexing and querying of unstructured data: video, image, long/short text, music, +source code, etc. ⏱️ **Save time** - *The* design pattern of neural search systems, from zero to a production-ready system in minutes. -🍱 **Own your stack** - Keep an end-to-end stack ownership of your solution, avoid the integration pitfalls with fragmented, multi-vendor, generic legacy tools. - -🧠 **First-class AI models** - First-class support for [state-of-the-art AI models](https://docs.jina.ai/chapters/all_exec.html), easily usable and extendable with a Pythonic interface. - -🌩️ **Fast & cloud-ready** - Decentralized architecture from day one. Scalable & cloud-native by design: enjoy containerizing, distributing, sharding, async, REST/gRPC/WebSocket. +🍱 **Own your stack** - Keep an end-to-end stack ownership of your solution, avoid the integration pitfalls with +fragmented, multi-vendor, generic legacy tools. 
+🌩️ **Fast & cloud-ready** - Decentralized architecture from day one. Scalable & cloud-native by design: enjoy +containerizing, distributing, sharding, async, REST/gRPC/WebSocket. ## Installation -```sh -pip install -U jina +```console +$ pip install --pre jina +$ jina -v +2.0.0rcN ``` #### via Docker -```sh -docker run jinaai/jina:latest +```console +$ docker run jinaai/jina:master -v +2.0.0rcN ```
@@ -83,145 +50,113 @@ docker run jinaai/jina:latest |
x86/64,arm/v6,v7,[v8 (Apple M1)](https://github.com/jina-ai/jina/issues/1781) | On Linux/macOS & Python 3.7/3.8/3.9 | Docker Users| | --- | --- | --- | -| Standard | `pip install -U jina` | `docker run jinaai/jina:latest` | -| Daemon | `pip install -U "jina[daemon]"` | `docker run --network=host jinaai/jina:latest-daemon` | -| With Extras | `pip install -U "jina[devel]"` | `docker run jinaai/jina:latest-devel` | +| Standard | `pip install --pre jina` | `docker run jinaai/jina:2.0.0rc` | +| Daemon | `pip install --pre "jina[daemon]"` | `docker run --network=host jinaai/jina:latest-daemon` | +| With Extras | `pip install --pre "jina[devel]"` | `docker run jinaai/jina:latest-devel` | | Dev/Pre-Release | `pip install --pre jina` | `docker run jinaai/jina:master` | -Version identifiers [are explained here](https://github.com/jina-ai/jina/blob/master/RELEASE.md). To install Jina with extra dependencies [please refer to the docs](https://docs.jina.ai/chapters/install/os/via-pip/#cherry-pick-extra-dependencies). Jina can run on [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). We welcome the community to help us with [native Windows support](https://github.com/jina-ai/jina/issues/1252). - -
- -
-💡 YAML Completion in PyCharm & VSCode - -Developing Jina app often means writing YAML configs. We provide a [JSON Schema](https://json-schema.org/) for your IDE to enable code completion, syntax validation, members listing and displaying help text. Here is a [video tutorial](https://youtu.be/qOD-6mihUzQ) to walk you through the setup. - - - - - - - - - - -
- - - -**PyCharm** - -1. Click menu `Preferences` -> `JSON Schema mappings`; -2. Add a new schema, in the `Schema File or URL` write `https://api.jina.ai/schemas/latest.json`; select `JSON Schema Version 7`; -3. Add a file path pattern and link it to `*.jaml` and `*.jina.yml`. - -
- - - -**VSCode** - -1. Install the extension: `YAML Language Support by Red Hat`; -2. In IDE-level `settings.json` add: - -```json -"yaml.schemas": { - "https://api.jina.ai/schemas/latest.json": ["/*.jina.yml", "/*.jaml"], -} -``` +Version identifiers [are explained here](https://github.com/jina-ai/jina/blob/master/RELEASE.md). Jina can run +on [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10). We welcome the community +to help us with [native Windows support](https://github.com/jina-ai/jina/issues/1252). -
## Get Started +Document, Executor, Flow are three fundamental concepts in Jina. -### Cookbook +- 📄 **Document** is the basic data type in Jina; +- ⚙️ **Executor** is how Jina processes Documents; +- 🔀 **Flow** is how Jina streamlines and distributes Executors. -[Bits, pieces and examples of Jina code](./.github/pages/snippets.md) - -### Run Quick Demo - -- [👗 Fashion image search](./.github/pages/hello-world.md#-fashion-image-search): `jina hello fashion` -- [🤖 QA chatbot](./.github/pages/hello-world.md#-covid-19-chatbot): `pip install "jina[chatbot]" && jina hello chatbot` -- [📰 Multimedia search](./.github/pages/hello-world.md#-multimodal-document-search): `pip install "jina[multimodal]" && jina hello multimodal` - -### The Basics - -- [What is neural search, and how is it different to symbolic search?](https://jina.ai/2020/07/06/What-is-Neural-Search-and-Why-Should-I-Care.html) -- [Jina 101: Learn Jina's key components](https://docs.jina.ai/chapters/101/) -- [Jina 102: Learn how Jina's components fit together](https://docs.jina.ai/chapters/102/) -- [My First Jina App: Build your first simple app](https://docs.jina.ai/chapters/my_first_jina_app/) +Copy-paste the minimum example below and run it: +```python +from jina import Document, Executor, Flow, requests -### Video Tutorials - - - - - - -
+class MyExecutor(Executor): + @requests + def foo(self, docs, parameters, **kwargs): + print(f'{parameters["p1"]} - {docs[0]}') -### Examples ([View all](https://github.com/jina-ai/examples)) - -#### [📄 NLP Semantic Wikipedia Search with Transformers and DistilBERT](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences) -    Brand new to neural search? See a simple text-search example to understand how Jina works -#### [📄 Add Incremental Indexing to Wikipedia Search](https://github.com/jina-ai/examples/tree/master/wikipedia-sentences-incremental) -    Index more effectively by adding incremental indexing to your Wikipedia search +f = Flow().add(uses=MyExecutor) -#### [📄 Search Lyrics with Transformers and PyTorch](https://github.com/jina-ai/examples/tree/master/multires-lyrics-search) -    Get a better understanding of chunks by searching a lyrics database. Now with shiny front-end! - -#### [🖼️ Google's Big Transfer Model in (Poké-)Production](https://github.com/jina-ai/examples/tree/master/pokedex-with-bit) -    Use SOTA visual representation for searching Pokémon! - -#### [🎧 Search YouTube audio data with Vggish](https://github.com/jina-ai/examples/tree/master/audio-search) -    A demo of neural search for audio data based Vggish model. - -#### [🎞️ Search Tumblr GIFs with KerasEncoder](https://github.com/jina-ai/examples/tree/master/tumblr-gif-search) -    Use prefetching and sharding to improve the performance of your index and query Flow when searching animated GIFs. +with f: + f.post(on='/bar', inputs=Document(), parameters={'p1': 'hello'}, on_done=print) +``` -Check our [examples repo](https://github.com/jina-ai/examples) for advanced and community-submitted examples. 
+### Run Quick Demo -## Documentation & Support +- [👗 Fashion image search](./.github/pages/hello-world.md#-fashion-image-search) + ```console + $ jina hello fashion + ``` +- [🤖 QA chatbot](./.github/pages/hello-world.md#-covid-19-chatbot) + ```console + $ pip install "jina[chatbot]" + $ jina hello chatbot + ``` +- [📰 Multimodal search](./.github/pages/hello-world.md#-multimodal-document-search) + ```console + $ pip install "jina[multimodal]" + $ jina hello multimodal + ``` + +#### Fork Demo & Build Your Own + +Copy the source code of a hello world to your own directory and start from there: + +```console +$ jina hello fork fashion ../my-proj/ +``` -- Docs: https://docs.jina.ai -- Join our [Slack community](https://slack.jina.ai) to chat to our engineers about your use cases, questions, and support queries. +### Read Tutorials + +- 📄 `Document` & `DocumentArray`: the basic data type in Jina. + - [Minimum working example](.github/2.0/cookbooks/Document.md#minimum-working-example) + - [`Document` API](.github/2.0/cookbooks/Document.md#document-api) + - [`DocumentArray` API](.github/2.0/cookbooks/Document.md#documentarray-api) +- ⚙️ `Executor`: how Jina processes Documents. + - [Minimum working example](.github/2.0/cookbooks/Executor.md#minimum-working-example) + - [Executor API](.github/2.0/cookbooks/Executor.md#executor-api) + - [Executor Built-in Features](.github/2.0/cookbooks/Executor.md#executor-built-in-features) + - [Migration from 1.x to 2.0 in Practice](.github/2.0/cookbooks/Executor.md#migration-in-practice) +- 🔀 `Flow`: how Jina streamlines and distributes Executors. + - [Minimum working example](.github/2.0/cookbooks/Flow.md#minimum-working-example) + - [Flow API](.github/2.0/cookbooks/Flow.md#flow-api) +- 🧼 [Write clean code in Jina](./.github/2.0/cookbooks/CleanCode.md) + +## Support + +- Join our [Slack community](https://slack.jina.ai) to chat to our engineers about your use cases, questions, and + support queries. 
- Join our Engineering All Hands meet-up to discuss your use case and learn Jina's new features. - **When?** The second Tuesday of every month - - **Where?** Zoom ([calendar link](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)/[.ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)) and [live stream on YouTube](https://youtube.com/c/jina-ai) + - **Where?** + Zoom ([calendar link](https://calendar.google.com/event?action=TEMPLATE&tmeid=MHIybG03cjAwaXE3ZzRrYmVpaDJyZ2FpZjlfMjAyMDEwMTNUMTIwMDAwWiBjXzF0NW9nZnAyZDQ1djhmaXQ5ODFqMDhtY200QGc&tmsrc=c_1t5ogfp2d45v8fit981j08mcm4%40group.calendar.google.com&scp=ALL)/[.ics](https://hanxiao.io/2020/08/06/Engineering-All-Hands-in-Public/jina-ai-public.ics)) + and [live stream on YouTube](https://youtube.com/c/jina-ai)) - Subscribe to the latest video tutorials on our [YouTube channel](https://youtube.com/c/jina-ai). +## Join Us + +Jina is backed by [Jina AI](https://jina.ai). [We are actively hiring](https://jobs.jina.ai) full-stack developers, +solution engineers to build the next neural search ecosystem in open source. ## Contributing -We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to your active involvement. +We welcome all kinds of contributions from the open-source community, individuals and partners. We owe our success to +your active involvement. 
- [Contributing guidelines](CONTRIBUTING.md) -- [Code of conduct](https://github.com/jina-ai/jina/blob/master/.github/CODE_OF_CONDUCT.md) - play nicely with the Jina community +- [Code of conduct](https://github.com/jina-ai/jina/blob/master/.github/CODE_OF_CONDUCT.md) - play nicely with the Jina + community - [Good first issues](https://github.com/jina-ai/jina/issues?q=is%3Aopen+is%3Aissue+label%3A%22good+first+issue%22) - [Release cycles and development stages](RELEASE.md) - [Upcoming features](https://portal.productboard.com/jinaai/) - what's being planned, what we're thinking about. - - [![All Contributors](https://img.shields.io/badge/all_contributors-147-orange.svg?style=flat-square)](#contributors-) @@ -250,9 +185,4 @@ We welcome all kinds of contributions from the open-source community, individual - - - -## Join Us - -Jina is backed by [Jina AI](https://jina.ai). [We are hiring](https://jobs.jina.ai) full-stack developers, evangelists, and PMs to build the next neural search ecosystem in open source. + \ No newline at end of file diff --git a/cli/__init__.py b/cli/__init__.py index f679e114df56c..f8cae4187fe28 100644 --- a/cli/__init__.py +++ b/cli/__init__.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import sys diff --git a/cli/api.py b/cli/api.py index 2dbf5f8fd27c0..95c971a73840e 100644 --- a/cli/api.py +++ b/cli/api.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - if False: from argparse import Namespace @@ -20,6 +17,10 @@ def pod(args: 'Namespace'): pass +# alias +executor = pod + + def pea(args: 'Namespace'): """ Start a Pea @@ -115,17 +116,6 @@ def export_api(args: 'Namespace'): default_logger.info(f'API is exported to {f_name}') -def hello_world(args: 'Namespace'): - """ - Run the fashion hello world example - - :param args: arguments coming from the CLI. 
- """ - from jina.helloworld.fashion import hello_world - - hello_world(args) - - def hello(args: 'Namespace'): """ Run any of the hello world examples @@ -133,15 +123,23 @@ def hello(args: 'Namespace'): :param args: arguments coming from the CLI. """ if args.hello == 'fashion': - from jina.helloworld.fashion import hello_world + from jina.helloworld.fashion.app import hello_world + + hello_world(args) elif args.hello == 'chatbot': - from jina.helloworld.chatbot import hello_world + from jina.helloworld.chatbot.app import hello_world + + hello_world(args) elif args.hello == 'multimodal': - from jina.helloworld.multimodal import hello_world - else: - raise ValueError(f'must be one of [`fashion`, `chatbot`, `multimodal`]') + from jina.helloworld.multimodal.app import hello_world + + hello_world(args) + elif args.hello == 'fork': + from jina.helloworld.fork import fork_hello - hello_world(args) + fork_hello(args) + else: + raise ValueError(f'must be one of [`fashion`, `chatbot`, `multimodal`, `fork`]') def flow(args: 'Namespace'): diff --git a/cli/autocomplete.py b/cli/autocomplete.py index aa5a7b49b2755..9347a1e97d1e1 100644 --- a/cli/autocomplete.py +++ b/cli/autocomplete.py @@ -1,7 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - - def _update_autocomplete(): from jina.parsers import get_main_parser @@ -14,21 +10,21 @@ def _gaa(key, parser): elif v.choices: _compl.extend(v.choices) for kk, vv in v.choices.items(): - _result.update(_gaa(" ".join([key, kk]).strip(), vv)) + _result.update(_gaa(' '.join([key, kk]).strip(), vv)) # filer out single dash, as they serve as abbrev - _compl = [k for k in _compl if (not k.startswith("-") or k.startswith("--"))] + _compl = [k for k in _compl if (not k.startswith('-') or k.startswith('--'))] _result.update({key: _compl}) return _result - compl = _gaa("", get_main_parser()) - cmd = compl.pop("") - compl = {"commands": cmd, "completions": compl} + compl = _gaa('', get_main_parser()) + cmd = compl.pop('') + compl = {'commands': cmd, 'completions': compl} - with open(__file__, "a") as fp: - fp.write(f"\nac_table = {compl}\n") + with open(__file__, 'a') as fp: + fp.write(f'\nac_table = {compl}\n') -if __name__ == "__main__": +if __name__ == '__main__': _update_autocomplete() ac_table = { @@ -37,33 +33,26 @@ def _gaa(key, parser): '--version', '--version-full', 'hello', + 'executor', 'pod', 'flow', - 'optimizer', - 'gateway', 'ping', - 'check', - 'hub', + 'gateway', 'pea', 'client', 'export-api', - 'hello-world', + 'check', ], 'completions': { 'hello fashion': [ '--help', '--workdir', '--download-proxy', - '--shards', - '--parallel', - '--uses-index', '--index-data-url', '--index-labels-url', - '--index-request-size', - '--uses-query', '--query-data-url', '--query-labels-url', - '--query-request-size', + '--request-size', '--num-query', '--top-k', ], @@ -72,7 +61,6 @@ def _gaa(key, parser): '--workdir', '--download-proxy', '--index-data-url', - '--demo-url', '--port-expose', '--parallel', '--unblock-query-flow', @@ -83,15 +71,15 @@ def _gaa(key, parser): '--download-proxy', '--uses', '--index-data-url', - '--demo-url', '--port-expose', '--unblock-query-flow', ], 'hello': ['--help', 'fashion', 'chatbot', 'multimodal'], - 'pod': [ 
+ 'executor': [ '--help', '--name', '--description', + '--workspace', '--log-config', '--quiet', '--quiet-error', @@ -110,8 +98,6 @@ def _gaa(key, parser): '--host-out', '--socket-in', '--socket-out', - '--load-interval', - '--dump-interval', '--read-only', '--memory-hwm', '--on-error-strategy', @@ -140,37 +126,87 @@ def _gaa(key, parser): '--uses-after', '--parallel', '--shards', + '--replicas', '--polling', '--scheduling', '--pod-role', '--peas-hosts', ], - 'flow': [ + 'pod': [ '--help', '--name', '--description', + '--workspace', '--log-config', '--quiet', '--quiet-error', '--identity', + '--port-ctrl', + '--ctrl-with-ipc', + '--timeout-ctrl', + '--ssh-server', + '--ssh-keyfile', + '--ssh-password', '--uses', - '--inspect', + '--py-modules', + '--port-in', + '--port-out', + '--host-in', + '--host-out', + '--socket-in', + '--socket-out', + '--read-only', + '--memory-hwm', + '--on-error-strategy', + '--num-part', + '--uses-internal', + '--entrypoint', + '--docker-kwargs', + '--pull-latest', + '--volumes', + '--host', + '--port-expose', + '--quiet-remote-logs', + '--upload-files', + '--workspace-id', + '--daemon', + '--runtime-backend', + '--runtime', + '--runtime-cls', + '--timeout-ready', + '--env', + '--expose-public', + '--pea-id', + '--pea-role', + '--noblock-on-start', + '--uses-before', + '--uses-after', + '--parallel', + '--shards', + '--replicas', + '--polling', + '--scheduling', + '--pod-role', + '--peas-hosts', ], - 'optimizer': [ + 'flow': [ '--help', '--name', '--description', + '--workspace', '--log-config', '--quiet', '--quiet-error', '--identity', '--uses', - '--output-dir', + '--inspect', ], + 'ping': ['--help', '--timeout', '--retries', '--print-response'], 'gateway': [ '--help', '--name', '--description', + '--workspace', '--log-config', '--quiet', '--quiet-error', @@ -189,8 +225,6 @@ def _gaa(key, parser): '--host-out', '--socket-in', '--socket-out', - '--load-interval', - '--dump-interval', '--read-only', '--memory-hwm', '--on-error-strategy', 
@@ -217,70 +251,11 @@ def _gaa(key, parser): '--pea-role', '--noblock-on-start', ], - 'ping': ['--help', '--timeout', '--retries', '--print-response'], - 'check': ['--help', '--summary-exec', '--summary-driver'], - 'hub login': ['--help'], - 'hub new': ['--help', '--output-dir', '--template', '--type', '--overwrite'], - 'hub init': ['--help', '--output-dir', '--template', '--type', '--overwrite'], - 'hub create': ['--help', '--output-dir', '--template', '--type', '--overwrite'], - 'hub build': [ - '--help', - '--username', - '--password', - '--registry', - '--repository', - '--file', - '--pull', - '--push', - '--dry-run', - '--prune-images', - '--raise-error', - '--test-uses', - '--test-level', - '--timeout-ready', - '--host-info', - '--daemon', - '--no-overwrite', - ], - 'hub push': [ - '--help', - '--username', - '--password', - '--registry', - '--repository', - '--no-overwrite', - ], - 'hub pull': [ - '--help', - '--username', - '--password', - '--registry', - '--repository', - '--no-overwrite', - ], - 'hub list': [ - '--help', - '--name', - '--kind', - '--keywords', - '--type', - '--local-only', - ], - 'hub': [ - '--help', - 'login', - 'new', - 'init', - 'create', - 'build', - 'push', - 'pull', - 'list', - ], 'pea': [ '--help', '--name', '--description', + '--workspace', '--log-config', '--quiet', '--quiet-error', @@ -299,8 +274,6 @@ def _gaa(key, parser): '--host-out', '--socket-in', '--socket-out', - '--load-interval', - '--dump-interval', '--read-only', '--memory-hwm', '--on-error-strategy', @@ -329,8 +302,6 @@ def _gaa(key, parser): 'client': [ '--help', '--request-size', - '--mode', - '--top-k', '--mime-type', '--continue-on-error', '--return-results', @@ -347,22 +318,6 @@ def _gaa(key, parser): '--port-expose', ], 'export-api': ['--help', '--yaml-path', '--json-path', '--schema-path'], - 'hello-world': [ - '--help', - '--workdir', - '--download-proxy', - '--shards', - '--parallel', - '--uses-index', - '--index-data-url', - '--index-labels-url', - 
'--index-request-size', - '--uses-query', - '--query-data-url', - '--query-labels-url', - '--query-request-size', - '--num-query', - '--top-k', - ], + 'check': ['--help', '--summary-exec'], }, } diff --git a/cli/export.py b/cli/export.py index cf4e43c57e48d..9881febd372fe 100644 --- a/cli/export.py +++ b/cli/export.py @@ -1,12 +1,12 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import argparse import os from typing import List def api_to_dict(): + """Convert Jina API to a dict + :return: dict + """ from jina import __version__ from jina.parsers import get_main_parser @@ -41,7 +41,7 @@ def api_to_dict(): def _export_parser_args(parser_fn, type_as_str: bool = False): from jina.enums import BetterEnum - from argparse import _StoreAction, _StoreTrueAction, _HelpAction, _SubParsersAction + from argparse import _StoreAction, _StoreTrueAction from jina.parsers.helper import KVAppendAction port_attr = ('help', 'choices', 'default', 'required', 'option_strings', 'dest') diff --git a/daemon/parser.py b/daemon/parser.py index 53b81532dbc04..787e195c7432e 100644 --- a/daemon/parser.py +++ b/daemon/parser.py @@ -10,6 +10,11 @@ def mixin_daemon_parser(parser): + """ + # noqa: DAR101 + # noqa: DAR102 + # noqa: DAR103 + """ gp = add_arg_group(parser, title='Daemon') gp.add_argument( @@ -19,15 +24,13 @@ def mixin_daemon_parser(parser): help='do not start fluentd, no log streaming', ) - gp.add_argument( - '--workspace', - type=str, - default='/tmp/jinad', - help='the directory for storing all uploaded dependencies', - ) - def get_main_parser(): + """ + Return main parser + :return: main parser + """ + parser = set_base_parser() mixin_remote_parser(parser) @@ -36,6 +39,7 @@ def get_main_parser(): parser.set_defaults( port_expose=8000, + workspace='/tmp/jinad', log_config=os.getenv( 'JINAD_LOG_CONFIG', resource_filename('jina', '/'.join(('resources', 'logging.daemon.yml'))), diff --git a/daemon/stores/flow.py 
b/daemon/stores/flow.py index 7cb8adaaf2f87..aea5216dc579f 100644 --- a/daemon/stores/flow.py +++ b/daemon/stores/flow.py @@ -1,5 +1,4 @@ import uuid -from fastapi.exceptions import HTTPException from typing import Optional, BinaryIO from jina.flow import Flow @@ -69,4 +68,6 @@ def update( if kind == UpdateOperationEnum.rolling_update: flow_obj.rolling_update(pod_name=pod_name, dump_path=dump_path) elif kind == UpdateOperationEnum.dump: - flow_obj.dump(pod_name=pod_name, dump_path=dump_path, shards=shards) + raise NotImplementedError( + f' sending post request does not work because asyncio loop is occupied' + ) diff --git a/extra-requirements.txt b/extra-requirements.txt index fcc59b018d751..1188c360e854a 100644 --- a/extra-requirements.txt +++ b/extra-requirements.txt @@ -34,7 +34,7 @@ onnx: framework, py37 onnxruntime: framework, py37 Pillow: cv, cicd, multimodal annoy>=1.9.5: index -sklearn: numeric, cicd +sklearn: numeric plyvel: index jieba: nlp lz4<3.1.2: devel, cicd, perf, network diff --git a/jina/__init__.py b/jina/__init__.py index c16eb3e61c57d..8f008caf87e45 100644 --- a/jina/__init__.py +++ b/jina/__init__.py @@ -7,8 +7,6 @@ """ -# DO SOME OS-WISE PATCHES - import datetime as _datetime import os as _os import platform as _platform @@ -16,28 +14,10 @@ import sys as _sys import types as _types -from google.protobuf.internal import api_implementation as _api_implementation - -if _api_implementation._default_implementation_type != 'cpp': - import warnings as _warnings - - _warnings.warn( - ''' - You are using Python protobuf backend, not the C++ version, which is much faster. 
- - This is often due to C++ implementation failed to compile while installing Protobuf - - You are using in Python 3.9 (https://github.com/jina-ai/jina/issues/1801) - - You are using on architecture other than x86_64/armv6/armv7 - - You installation is broken, try `pip install --force protobuf` - - You have C++ backend but you shut it down, try `export PROTOCOL_BUFFERS_PYTHON_IMPLEMENTATION=cpp` - - ''', - RuntimeWarning, - ) - if _sys.version_info < (3, 7, 0) or _sys.version_info >= (3, 10, 0): raise OSError(f'Jina requires Python 3.7/3.8/3.9, but yours is {_sys.version_info}') +# DO SOME OS-WISE PATCHES if _sys.version_info >= (3, 8, 0) and _platform.system() == 'Darwin': # temporary fix for python 3.8 on macos where the default start is set to "spawn" # https://docs.python.org/3/library/multiprocessing.html#contexts-and-start-methods @@ -48,15 +28,12 @@ # fix fork error on MacOS but seems no effect? must do EXPORT manually before jina start _os.environ['OBJC_DISABLE_INITIALIZE_FORK_SAFETY'] = 'YES' -# Underscore variables shared globally - -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - # do not change this line manually # this is managed by git tag and updated on every release # NOTE: this represents the NEXT release version -__version__ = '1.2.5' + +# TODO: remove 'rcN' on final release +__version__ = '2.0.0rc1' # do not change this line manually # this is managed by proto/build-proto.sh and updated on every execution @@ -70,9 +47,6 @@ # 3. 
copy all lines EXCEPT the first (which is the grep command in the last line) __jina_env__ = ( 'JINA_ARRAY_QUANT', - 'JINA_BINARY_DELIMITER', - 'JINA_CONTRIB_MODULE', - 'JINA_CONTRIB_MODULE_IS_LOADING', 'JINA_CONTROL_PORT', 'JINA_DEFAULT_HOST', 'JINA_DISABLE_UVLOOP', @@ -84,48 +58,38 @@ 'JINA_LOG_LEVEL', 'JINA_LOG_NO_COLOR', 'JINA_LOG_WORKSPACE', + 'JINA_OPTIMIZER_TRIAL_WORKSPACE', 'JINA_POD_NAME', - 'JINA_RAISE_ERROR_EARLY', 'JINA_RANDOM_PORTS', 'JINA_RANDOM_PORT_MAX', 'JINA_RANDOM_PORT_MIN', 'JINA_SOCKET_HWM', 'JINA_VCS_VERSION', 'JINA_WARN_UNNAMED', - 'JINA_WORKSPACE', ) __default_host__ = _os.environ.get('JINA_DEFAULT_HOST', '0.0.0.0') +__default_executor__ = 'BaseExecutor' +__default_endpoint__ = '/default' __ready_msg__ = 'ready and listening' __stop_msg__ = 'terminated' -__binary_delimiter__ = _os.environ.get( - 'JINA_BINARY_DELIMITER', '460841a0a8a430ae25d9ad7c1f048c57' -).encode() +__num_args_executor_func__ = 5 __root_dir__ = _os.path.dirname(_os.path.abspath(__file__)) _names_with_underscore = [ '__version__', - '__copyright__', - '__license__', '__proto_version__', '__default_host__', '__ready_msg__', '__stop_msg__', - '__binary_delimiter__', '__jina_env__', '__uptime__', '__root_dir__', + '__default_endpoint__', + '__default_executor__', + '__num_args_executor_func__', ] -# Primitive data type, -# note, they must be loaded BEFORE all executors/drivers/... 
to avoid cyclic imports -from jina.types.ndarray.generic import NdArray -from jina.types.request import Request, Response -from jina.types.message import Message -from jina.types.querylang import QueryLang -from jina.types.document import Document -from jina.types.document.multimodal import MultimodalDocument -from jina.types.arrays import DocumentArray, QueryLangArray # ADD GLOBAL NAMESPACE VARIABLES JINA_GLOBAL = _types.SimpleNamespace() @@ -133,13 +97,10 @@ JINA_GLOBAL.tensorflow_installed = None JINA_GLOBAL.torch_installed = None -import jina.importer as _ji - -# driver first, as executor may contain driver -_ji.import_classes('jina.drivers', show_import_table=False, import_once=True) -_ji.import_classes('jina.executors', show_import_table=False, import_once=True) -_ji.import_classes('jina.hub', show_import_table=False, import_once=True) - +# import jina.importer as _ji +# +# _ji.import_classes('jina.executors', show_import_table=False, import_once=True) +# _signal.signal(_signal.SIGINT, _signal.default_int_handler) @@ -191,24 +152,20 @@ def _set_nofile(nofile_atleast=4096): _set_nofile() -# Flow -from jina.flow import Flow -from jina.flow.asyncio import AsyncFlow +# ONLY FIRST CLASS CITIZENS ARE ALLOWED HERE, namely Document, Executor Flow -# Client -from jina.clients import Client -from jina.clients.asyncio import AsyncClient +# Document +from jina.types.document import Document +from jina.types.arrays.document import DocumentArray # Executor -from jina.executors import GenericExecutor as Executor -from jina.executors.classifiers import BaseClassifier as Classifier -from jina.executors.crafters import BaseCrafter as Crafter -from jina.executors.encoders import BaseEncoder as Encoder -from jina.executors.evaluators import BaseEvaluator as Evaluator -from jina.executors.indexers import BaseIndexer as Indexer -from jina.executors.rankers import BaseRanker as Ranker -from jina.executors.segmenters import BaseSegmenter as Segmenter +from jina.executors import 
BaseExecutor as Executor from jina.executors.decorators import requests +# Flow +from jina.flow import Flow +from jina.flow.asyncio import AsyncFlow + + __all__ = [_s for _s in dir() if not _s.startswith('_')] -__all__.extend([_s for _s in _names_with_underscore]) +__all__.extend(_names_with_underscore) diff --git a/jina/checker.py b/jina/checker.py index 4c29ce2a741c6..7ee3f7ad33e8b 100644 --- a/jina/checker.py +++ b/jina/checker.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import os from . import __jina_env__ @@ -38,15 +35,6 @@ def __init__(self, args: 'argparse.Namespace'): with open(args.summary_exec, 'w') as fp: _print_dep_tree_rst(fp, _r, 'Executor') - default_logger.info('\navailable drivers\n'.upper()) - _r = import_classes('jina.drivers', show_import_table=True, import_once=False) - - if args.summary_driver: - with open(args.summary_driver, 'w') as fp: - _print_dep_tree_rst(fp, _r, 'Driver') - - # check available driver group - default_logger.info('\nenvironment variables\n'.upper()) default_logger.info( '\n'.join( diff --git a/jina/clients/__init__.py b/jina/clients/__init__.py index 9a3f58c6fbbba..dd9a54948edc5 100644 --- a/jina/clients/__init__.py +++ b/jina/clients/__init__.py @@ -1,246 +1,23 @@ """Module wrapping the Client of Jina.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Union, List - -from . import request from .base import BaseClient, CallbackFnType, InputType, InputDeleteType from .helper import callback_exec +from .mixin import PostMixin from .request import GeneratorSourceType from .websocket import WebSocketClientMixin -from ..enums import RequestType -from ..helper import run_async, deprecated_alias -class Client(BaseClient): +class Client(PostMixin, BaseClient): """A simple Python client for connecting to the gRPC gateway. 
It manages the asyncio event loop internally, so all interfaces are synchronous from the outside. """ - async def _get_results(self, *args, **kwargs): - result = [] - async for resp in super()._get_results(*args, **kwargs): - if self.args.return_results: - result.append(resp) - - if self.args.return_results: - return result - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def train( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ) -> None: - """Issue 'train' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :return: None - """ - self.mode = RequestType.TRAIN - return run_async( - self._get_results, inputs, on_done, on_error, on_always, **kwargs - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def search( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ) -> None: - """Issue 'search' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. 
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :return: None - """ - self.mode = RequestType.SEARCH - self.add_default_kwargs(kwargs) - return run_async( - self._get_results, inputs, on_done, on_error, on_always, **kwargs - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def index( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ) -> None: - """Issue 'index' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :return: None - """ - self.mode = RequestType.INDEX - return run_async( - self._get_results, inputs, on_done, on_error, on_always, **kwargs - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def update( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ) -> None: - """Issue 'update' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. 
- :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :return: None - """ - self.mode = RequestType.UPDATE - return run_async( - self._get_results, inputs, on_done, on_error, on_always, **kwargs - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def delete( - self, - inputs: InputDeleteType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ) -> None: - """Issue 'update' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :return: None - """ - self.mode = RequestType.DELETE - return run_async( - self._get_results, inputs, on_done, on_error, on_always, **kwargs - ) - - def reload( - self, - targets: Union[str, List[str]], - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Send 'reload' request to the Flow. - - :param targets: the regex string or list of regex strings to match the pea/pod names. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: additional parameters - :return: None - """ - - if isinstance(targets, str): - targets = [targets] - kwargs['targets'] = targets - - self.mode = RequestType.CONTROL - return run_async( - self._get_results, - [], - on_done, - on_error, - on_always, - command='RELOAD', - **kwargs, - ) - - def dump( - self, - targets: Union[str, List[str]], - dump_path: str, - shards: int, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Send 'reload' request to the Flow. - - :param shards: nr of shards to dump for - :param dump_path: the path to which to dump - :param targets: the regex string or list of regex strings to match the pea/pod names. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :return: None - """ - if isinstance(targets, str): - targets = [targets] - kwargs['targets'] = targets - # required in order for jina.clients.request.helper._add_control_propagate - kwargs['args'] = {} - kwargs['args']['dump_path'] = dump_path - kwargs['args']['shards'] = shards + @property + def client(self) -> 'Client': + """Return the client object itself - self.mode = RequestType.CONTROL - return run_async( - self._get_results, - [], - on_done, - on_error, - on_always, - command='DUMP', - **kwargs, - ) + .. 
# noqa: DAR201""" + return self class WebSocketClient(Client, WebSocketClientMixin): diff --git a/jina/clients/asyncio.py b/jina/clients/asyncio.py index 79cd114edb952..7c3b5c9b904dd 100644 --- a/jina/clients/asyncio.py +++ b/jina/clients/asyncio.py @@ -1,14 +1,10 @@ """Module wrapping AsyncIO ops for clients.""" -from typing import Union, List, AsyncGenerator -from jina.types.request import Response - -from .base import InputType, InputDeleteType, BaseClient, CallbackFnType +from .base import BaseClient +from .mixin import AsyncPostMixin from .websocket import WebSocketClientMixin -from ..enums import RequestType -from ..helper import deprecated_alias -class AsyncClient(BaseClient): +class AsyncClient(AsyncPostMixin, BaseClient): """ :class:`AsyncClient` is the asynchronous version of the :class:`Client`. @@ -52,178 +48,6 @@ async def concurrent_main(): One can think of :class:`Client` as Jina-managed eventloop, whereas :class:`AsyncClient` is self-managed eventloop. """ - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def train( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs - ) -> AsyncGenerator[Response, None]: - """Issue 'train' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: additional parameters - :yield: result - """ - self.mode = RequestType.TRAIN - async for r in self._get_results( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def search( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs - ) -> AsyncGenerator[Response, None]: - """Issue 'search' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :yield: result - """ - self.mode = RequestType.SEARCH - self.add_default_kwargs(kwargs) - async for r in self._get_results( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def index( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs - ) -> AsyncGenerator[Response, None]: - """Issue 'index' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: additional parameters - :yield: result - """ - self.mode = RequestType.INDEX - async for r in self._get_results( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def delete( - self, - inputs: InputDeleteType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs - ) -> AsyncGenerator[Response, None]: - """Issue 'delete' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :yield: result - """ - self.mode = RequestType.DELETE - async for r in self._get_results( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def update( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs - ) -> AsyncGenerator[Response, None]: - """Issue 'update' request to the Flow. - - :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: additional parameters - :yield: result - """ - self.mode = RequestType.UPDATE - async for r in self._get_results( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - async def reload( - self, - targets: Union[str, List[str]], - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs - ): - """Send 'reload' request to the Flow. - - :param targets: the regex string or list of regex strings to match the pea/pod names. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: additional parameters - :yield: result - """ - - if isinstance(targets, str): - targets = [targets] - kwargs['targets'] = targets - - self.mode = RequestType.CONTROL - async for r in self._get_results([], on_done, on_error, on_always, **kwargs): - yield r - class AsyncWebSocketClient(AsyncClient, WebSocketClientMixin): """ diff --git a/jina/clients/base.py b/jina/clients/base.py index 4b6e921b4a271..b761109369423 100644 --- a/jina/clients/base.py +++ b/jina/clients/base.py @@ -1,17 +1,15 @@ """Module containing the Base Client for Jina.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" import argparse -import os -from typing import Callable, Union, Optional, Iterator, Iterable, Dict, AsyncIterator import asyncio +import inspect +import os +from typing import Callable, Union, Optional, Iterator, Iterable, AsyncIterator import grpc -import inspect + from .helper import callback_exec from .request import GeneratorSourceType -from ..enums import RequestType from ..excepts import BadClient, BadClientInput, ValidationError from ..helper import typename from ..logging import default_logger, JinaLogger @@ -45,31 +43,8 @@ def __init__(self, args: 'argparse.Namespace'): # affect users os-level envs. os.unsetenv('http_proxy') os.unsetenv('https_proxy') - self._mode = args.mode self._inputs = None - @property - def mode(self) -> str: - """ - Get the mode for this client (index, query etc.). - - :return: Mode of the client. - """ - return self._mode - - @mode.setter - def mode(self, value: RequestType) -> None: - """ - Set the mode. - - :param value: Request type. (e.g. INDEX, SEARCH, DELETE, UPDATE, CONTROL, TRAIN) - """ - if isinstance(value, RequestType): - self._mode = value - self.args.mode = value - else: - raise ValueError(f'{value} must be one of {RequestType}') - @staticmethod def check_input(inputs: Optional[InputType] = None, **kwargs) -> None: """Validate the inputs and print the first request if success. 
@@ -77,11 +52,17 @@ def check_input(inputs: Optional[InputType] = None, **kwargs) -> None: :param inputs: the inputs :param kwargs: keyword arguments """ + + if inputs is None: + # empty inputs is considered as valid + return + if hasattr(inputs, '__call__'): # it is a function inputs = inputs() kwargs['data'] = inputs + kwargs['exec_endpoint'] = '/' if inspect.isasyncgenfunction(inputs) or inspect.isasyncgen(inputs): raise ValidationError( @@ -123,12 +104,6 @@ def _get_requests( return request_generator(**_kwargs) - def _get_task_name(self, kwargs: Dict) -> str: - tname = str(self.mode).lower() - if 'mode' in kwargs: - tname = str(kwargs['mode']).lower() - return tname - @property def inputs(self) -> InputType: """ @@ -138,10 +113,7 @@ def inputs(self) -> InputType: :return: inputs """ - if self._inputs is not None: - return self._inputs - else: - raise BadClient('inputs are not defined') + return self._inputs @inputs.setter def inputs(self, bytes_gen: InputType) -> None: @@ -165,7 +137,6 @@ async def _get_results( ): try: self.inputs = inputs - tname = self._get_task_name(kwargs) req_iter = self._get_requests(**kwargs) async with grpc.aio.insecure_channel( f'{self.args.host}:{self.args.port_expose}', @@ -178,7 +149,7 @@ async def _get_results( self.logger.success( f'connected to the gateway at {self.args.host}:{self.args.port_expose}!' ) - with ProgressBar(task_name=tname) as p_bar, TimeContext(tname): + with ProgressBar() as p_bar, TimeContext(''): async for resp in stub.Call(req_iter): resp.as_typed_request(resp.request_type) resp.as_response() @@ -195,7 +166,7 @@ async def _get_results( except KeyboardInterrupt: self.logger.warning('user cancel the process') except asyncio.CancelledError as ex: - self.logger.warning(f'process error: {ex!r}, terminate signal send?') + self.logger.warning(f'process error: {ex!r}') except grpc.aio._call.AioRpcError as rpc_ex: # Since this object is guaranteed to be a grpc.Call, might as well include that in its name. 
my_code = rpc_ex.code() @@ -220,48 +191,3 @@ async def _get_results( ) from rpc_ex else: raise BadClient(msg) from rpc_ex - - def index(self): - """Issue 'index' request to the Flow.""" - raise NotImplementedError - - def search(self): - """Issue 'search' request to the Flow.""" - raise NotImplementedError - - def train(self): - """Issue 'train' request to the Flow.""" - raise NotImplementedError - - @staticmethod - def add_default_kwargs(kwargs: Dict): - """ - Add the default kwargs to the instance. - - :param kwargs: the kwargs to add - """ - # TODO: refactor it into load from config file - if ('top_k' in kwargs) and (kwargs['top_k'] is not None): - # associate all VectorSearchDriver and SliceQL driver to use top_k - from jina import QueryLang - - topk_ql = [ - QueryLang( - { - 'name': 'SliceQL', - 'priority': 1, - 'parameters': {'end': kwargs['top_k']}, - } - ), - QueryLang( - { - 'name': 'VectorSearchDriver', - 'priority': 1, - 'parameters': {'top_k': kwargs['top_k']}, - } - ), - ] - if 'queryset' not in kwargs: - kwargs['queryset'] = topk_ql - else: - kwargs['queryset'].extend(topk_ql) diff --git a/jina/clients/helper.py b/jina/clients/helper.py index 60d956da7b310..866e372cbcb97 100644 --- a/jina/clients/helper.py +++ b/jina/clients/helper.py @@ -1,16 +1,14 @@ """Helper functions for clients in Jina.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" from functools import wraps from typing import Callable -from .. 
import Response from ..excepts import BadClientCallback from ..helper import colored from ..importer import ImportExtensions from ..logging import JinaLogger from ..proto import jina_pb2 +from ..types.request import Response def pprint_routes(resp: 'Response', stack_limit: int = 3): @@ -18,7 +16,6 @@ def pprint_routes(resp: 'Response', stack_limit: int = 3): :param resp: the :class:`Response` object :param stack_limit: traceback limit - :return: """ from textwrap import fill diff --git a/jina/clients/mixin.py b/jina/clients/mixin.py new file mode 100644 index 0000000000000..171f21f13502e --- /dev/null +++ b/jina/clients/mixin.py @@ -0,0 +1,107 @@ +from functools import partialmethod +from typing import Optional, Dict, List, AsyncGenerator + +from .base import CallbackFnType, InputType +from ..helper import run_async +from ..types.request import Response + + +class PostMixin: + """The Post Mixin class for Client and Flow """ + + def post( + self, + on: str, + inputs: Optional[InputType] = None, + on_done: CallbackFnType = None, + on_error: CallbackFnType = None, + on_always: CallbackFnType = None, + parameters: Optional[Dict] = None, + target_peapod: Optional[str] = None, + **kwargs, + ) -> Optional[List[Response]]: + """Post a general data request to the Flow. + + :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id. + :param on: the endpoint is used for identifying the user-defined ``request_type``, labeled by ``@requests(on='/abc')`` + :param on_done: the function to be called when the :class:`Request` object is resolved. + :param on_error: the function to be called when the :class:`Request` object is rejected. + :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected.
+ :param parameters: the kwargs that will be sent to the executor + :param target_peapod: a regex string representing the peas/pods that the request targets + :param kwargs: additional parameters + :return: the list of responses if ``return_results`` is set, otherwise None + """ + + async def _get_results(*args, **kwargs): + result = [] + c = self.client + async for resp in c._get_results(*args, **kwargs): + if c.args.return_results: + result.append(resp) + + if c.args.return_results: + return result + + return run_async( + _get_results, + inputs=inputs, + on_done=on_done, + on_error=on_error, + on_always=on_always, + exec_endpoint=on, + target_peapod=target_peapod, + parameters=parameters, + **kwargs, + ) + + # ONLY CRUD, for other requests please use `.post` + index = partialmethod(post, '/index') + search = partialmethod(post, '/search') + update = partialmethod(post, '/update') + delete = partialmethod(post, '/delete') + + +class AsyncPostMixin: + """The Async Post Mixin class for AsyncClient and AsyncFlow """ + + async def post( + self, + on: str, + inputs: Optional[InputType] = None, + on_done: CallbackFnType = None, + on_error: CallbackFnType = None, + on_always: CallbackFnType = None, + parameters: Optional[Dict] = None, + target_peapod: Optional[str] = None, + **kwargs, + ) -> AsyncGenerator[Response, None]: + """Post a general data request to the Flow. + + :param inputs: input data which can be an Iterable, a function which returns an Iterable, or a single Document id. + :param on: the endpoint is used for identifying the user-defined ``request_type``, labeled by ``@requests(on='/abc')`` + :param on_done: the function to be called when the :class:`Request` object is resolved. + :param on_error: the function to be called when the :class:`Request` object is rejected. + :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected.
+ :param parameters: the kwargs that will be sent to the executor + :param target_peapod: a regex string representing the peas/pods that the request targets + :param kwargs: additional parameters + :yield: Response object + """ + async for r in self.client._get_results( + inputs=inputs, + on_done=on_done, + on_error=on_error, + on_always=on_always, + exec_endpoint=on, + target_peapod=target_peapod, + parameters=parameters, + **kwargs, + ): + yield r + + # ONLY CRUD, for other requests please use `.post` + index = partialmethod(post, '/index') + search = partialmethod(post, '/search') + update = partialmethod(post, '/update') + delete = partialmethod(post, '/delete') diff --git a/jina/clients/request/__init__.py b/jina/clients/request/__init__.py index 0aaf2fbb00ba2..91c68500e7932 100644 --- a/jina/clients/request/__init__.py +++ b/jina/clients/request/__init__.py @@ -1,16 +1,13 @@ """Module for Jina Requests.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" -from typing import Iterator, Union, Tuple, AsyncIterable, Iterable, Optional +from typing import Iterator, Union, Tuple, AsyncIterable, Iterable, Optional, Dict -from .helper import _new_data_request_from_batch, _new_data_request -from ...
import Request -from ...enums import RequestType, DataInputType +from .helper import _new_data_request_from_batch, _new_data_request +from ...enums import DataInputType from ...helper import batch_iterator from ...logging import default_logger from ...types.document import DocumentSourceType, DocumentContentType, Document -from ...types.arrays.querylang import AcceptQueryLangType +from ...types.request import Request SingletonDataType = Union[ DocumentContentType, @@ -26,25 +23,25 @@ def request_generator( + exec_endpoint: str, data: GeneratorSourceType, request_size: int = 0, - mode: RequestType = RequestType.INDEX, mime_type: Optional[str] = None, - queryset: Optional[ - Union[AcceptQueryLangType, Iterator[AcceptQueryLangType]] - ] = None, data_type: DataInputType = DataInputType.AUTO, + target_peapod: Optional[str] = None, + parameters: Optional[Dict] = None, **kwargs, # do not remove this, add on purpose to suppress unknown kwargs ) -> Iterator['Request']: """Generate a request iterator. + :param exec_endpoint: the endpoint string, by convention starts with `/` :param data: the data to use in the request :param request_size: the request size for the client - :param mode: the request mode (index, search etc.) :param mime_type: mime type - :param queryset: querylang set of queries :param data_type: if ``data`` is an iterator over self-contained document, i.e. :class:`DocumentSourceType`; or an iterator over possible Document content (set to text, blob and buffer). 
+ :param parameters: the kwargs that will be sent to the executor + :param target_peapod: a regex string represent the certain peas/pods request targeted :param kwargs: additional arguments :yield: request """ @@ -52,12 +49,23 @@ def request_generator( _kwargs = dict(mime_type=mime_type, weight=1.0, extra_kwargs=kwargs) try: - if not isinstance(data, Iterable): - data = [data] - for batch in batch_iterator(data, request_size): - yield _new_request_from_batch( - _kwargs, batch, data_type, mode, queryset, **kwargs + if data is None: + # this allows empty inputs, i.e. a data request with only parameters + yield _new_data_request( + endpoint=exec_endpoint, target=target_peapod, parameters=parameters ) + else: + if not isinstance(data, Iterable): + data = [data] + for batch in batch_iterator(data, request_size): + yield _new_data_request_from_batch( + _kwargs=kwargs, + batch=batch, + data_type=data_type, + endpoint=exec_endpoint, + target=target_peapod, + parameters=parameters, + ) except Exception as ex: # must be handled here, as grpc channel wont handle Python exception diff --git a/jina/clients/request/asyncio.py b/jina/clients/request/asyncio.py index 50643c3dbd595..5ea9914e80275 100644 --- a/jina/clients/request/asyncio.py +++ b/jina/clients/request/asyncio.py @@ -1,49 +1,60 @@ """Module for async requests generator.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" -from typing import Iterator, Union, AsyncIterator, Optional +from typing import AsyncIterator, Optional, Dict -from .helper import _new_request_from_batch +from .helper import _new_data_request_from_batch, _new_data_request from .. import GeneratorSourceType -from ... 
import Request -from ...enums import RequestType, DataInputType +from ...enums import DataInputType from ...importer import ImportExtensions from ...logging import default_logger -from ...types.arrays.querylang import AcceptQueryLangType +from ...types.request import Request async def request_generator( + exec_endpoint: str, data: GeneratorSourceType, request_size: int = 0, - mode: RequestType = RequestType.INDEX, mime_type: Optional[str] = None, - queryset: Optional[ - Union[AcceptQueryLangType, Iterator[AcceptQueryLangType]] - ] = None, data_type: DataInputType = DataInputType.AUTO, + target_peapod: Optional[str] = None, + parameters: Optional[Dict] = None, **kwargs, # do not remove this, add on purpose to suppress unknown kwargs ) -> AsyncIterator['Request']: """An async :function:`request_generator`. + :param exec_endpoint: the endpoint string, by convention starts with `/` :param data: the data to use in the request :param request_size: the request size for the client - :param mode: the request mode (index, search etc.) :param mime_type: mime type - :param queryset: querylang set of queries :param data_type: if ``data`` is an iterator over self-contained document, i.e. :class:`DocumentSourceType`; or an iterator over possible Document content (set to text, blob and buffer). - :param kwargs: additional key word arguments + :param parameters: the kwargs that will be sent to the executor + :param target_peapod: a regex string represent the certain peas/pods request targeted + :param kwargs: additional arguments :yield: request """ - _kwargs = dict(mime_type=mime_type, weight=1.0) + + _kwargs = dict(mime_type=mime_type, weight=1.0, extra_kwargs=kwargs) try: - with ImportExtensions(required=True): - import aiostream + if data is None: + # this allows empty inputs, i.e. 
a data request with only parameters + yield _new_data_request( + endpoint=exec_endpoint, target=target_peapod, parameters=parameters + ) + else: + with ImportExtensions(required=True): + import aiostream - async for batch in aiostream.stream.chunks(data, request_size): - yield _new_request_from_batch(_kwargs, batch, data_type, mode, queryset) + async for batch in aiostream.stream.chunks(data, request_size): + yield _new_data_request_from_batch( + _kwargs=kwargs, + batch=batch, + data_type=data_type, + endpoint=exec_endpoint, + target=target_peapod, + parameters=parameters, + ) except Exception as ex: # must be handled here, as grpc channel wont handle Python exception default_logger.critical(f'inputs is not valid! {ex!r}', exc_info=True) diff --git a/jina/clients/request/helper.py b/jina/clients/request/helper.py index afc30118d876c..149c4a17a66d3 100644 --- a/jina/clients/request/helper.py +++ b/jina/clients/request/helper.py @@ -1,10 +1,41 @@ """Module for helper functions for clients.""" -from typing import Tuple, Sequence +from typing import Tuple -from ... import Document, Request -from ...enums import DataInputType, RequestType +from ... 
import Document +from ...enums import DataInputType from ...excepts import BadDocType, BadRequestType -from ...excepts import RequestTypeError +from ...types.request import Request + + +def _new_data_request_from_batch( + _kwargs, batch, data_type, endpoint, target, parameters +): + req = _new_data_request(endpoint, target, parameters) + + # add docs, groundtruths fields + try: + _add_docs_groundtruths(req, batch, data_type, _kwargs) + except Exception as ex: + raise BadRequestType( + f'error when building {req.request_type} from {batch}' + ) from ex + + return req + + +def _new_data_request(endpoint, target, parameters): + req = Request() + req.request_type = 'data' + + # set up header + if endpoint: + req.header.exec_endpoint = endpoint + if target: + req.header.target_peapod = target + # add parameters field + if parameters: + req.parameters.update(parameters) + return req def _new_doc_from_data( @@ -32,42 +63,6 @@ def _build_doc_from_content(): return _build_doc_from_content() -def _new_request_from_batch(_kwargs, batch, data_type, mode, queryset, **kwargs): - req = Request() - req.request_type = str(mode) - - try: - # add type-specific fields - if ( - mode == RequestType.INDEX - or mode == RequestType.SEARCH - or mode == RequestType.TRAIN - or mode == RequestType.UPDATE - ): - if 'extra_kwargs' in _kwargs: - _kwargs.pop('extra_kwargs') #: data request do not need extra kwargs - _add_docs_groundtruths(req, batch, data_type, _kwargs) - elif mode == RequestType.DELETE: - _add_ids(req, batch) - elif mode == RequestType.CONTROL: - _add_control_propagate(req, _kwargs) - else: - raise RequestTypeError( - f'generating request from {mode} is not yet supported' - ) - except Exception as ex: - raise BadRequestType( - f'error when building {req.request_type} from {batch}' - ) from ex - - # add common fields - if isinstance(queryset, Sequence): - req.queryset.extend(queryset) - elif queryset is not None: - req.queryset.append(queryset) - return req - - def 
_add_docs_groundtruths(req, batch, data_type, _kwargs): for content in batch: if isinstance(content, tuple) and len(content) == 2: @@ -83,11 +78,6 @@ def _add_docs_groundtruths(req, batch, data_type, _kwargs): req.docs.append(d) -def _add_ids(req, batch): - string_ids = (str(doc_id) for doc_id in batch) - req.ids.extend(string_ids) - - def _add_control_propagate(req, kwargs): from ...proto import jina_pb2 @@ -113,5 +103,3 @@ def _add_control_propagate(req, kwargs): raise ValueError( f'command "{command}" is not supported, must be one of {_available_commands}' ) - req.targets.extend(extra_kwargs.get('targets', [])) - req.control.propagate = True diff --git a/jina/clients/sugary_io.py b/jina/clients/sugary_io.py deleted file mode 100644 index 2ffbb4b93ba07..0000000000000 --- a/jina/clients/sugary_io.py +++ /dev/null @@ -1,189 +0,0 @@ -"""A module for sugary API wrapper around the clients.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import csv -import glob -import itertools as it -import json -import os -import random -from typing import List, Union, Iterator, Iterable, Dict, Generator, Optional - -import numpy as np - -if False: - from jina import Document - -# https://github.com/ndjson/ndjson.github.io/issues/1#issuecomment-109935996 -_jsonl_ext = {'.jsonlines', '.ndjson', '.jsonl', '.jl', '.ldjson'} -_csv_ext = {'.csv', '.tcsv'} - - -def _sample(iterable, sampling_rate: Optional[float] = None): - for i in iterable: - if sampling_rate is None or random.random() < sampling_rate: - yield i - - -def _subsample( - iterable, size: Optional[int] = None, sampling_rate: Optional[float] = None -): - yield from it.islice(_sample(iterable, sampling_rate), size) - - -def _input_lines( - lines: Optional[Iterable[str]] = None, - filepath: Optional[str] = None, - read_mode: str = 'r', - line_format: str = 'json', - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: 
Optional[float] = None, -) -> Generator[Union[str, 'Document'], None, None]: - """Generator function for lines, json and sc. Yields documents or strings. - - :param lines: a list of strings, each is considered as a document - :param filepath: a text file that each line contains a document - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in binary - :param line_format: the format of each line ``json`` or ``csv`` - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :yields: documents - - .. note:: - This function should not be directly used, use :meth:`Flow.index_files`, :meth:`Flow.search_files` instead - """ - if filepath: - file_type = os.path.splitext(filepath)[1] - with open(filepath, read_mode) as f: - if file_type in _jsonl_ext: - yield from _input_ndjson(f) - elif file_type in _csv_ext: - yield from _input_csv(f, field_resolver, size, sampling_rate) - else: - yield from _subsample(f, size, sampling_rate) - elif lines: - if line_format == 'json': - yield from _input_ndjson(lines) - elif line_format == 'csv': - yield from _input_csv(lines, field_resolver, size, sampling_rate) - else: - yield from _subsample(lines, size, sampling_rate) - else: - raise ValueError('"filepath" and "lines" can not be both empty') - - -def _input_ndjson( - fp: Iterable[str], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, -): - from jina import Document - - for line in _subsample(fp, size, sampling_rate): - value = json.loads(line) - if 'groundtruth' in value and 'document' in value: - yield Document(value['document'], field_resolver), Document( - value['groundtruth'], 
field_resolver - ) - else: - yield Document(value, field_resolver) - - -def _input_csv( - fp: Iterable[str], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, -): - from jina import Document - - lines = csv.DictReader(fp) - for value in _subsample(lines, size, sampling_rate): - if 'groundtruth' in value and 'document' in value: - yield Document(value['document'], field_resolver), Document( - value['groundtruth'], field_resolver - ) - else: - yield Document(value, field_resolver) - - -def _input_files( - patterns: Union[str, List[str]], - recursive: bool = True, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: Optional[str] = None, -) -> Iterator[Union[str, bytes]]: - """Creates an iterator over a list of file path or the content of the files. - - :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]' - :param recursive: If recursive is true, the pattern '**' will match any files - and zero or more directories and subdirectories - :param size: the maximum number of the files - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file is opened. - 'r' for reading in text mode, 'rb' for reading in binary mode. - If `read_mode` is None, will iterate over filenames. - :yield: file paths or binary content - - .. 
note:: - This function should not be directly used, use :meth:`Flow.index_files`, :meth:`Flow.search_files` instead - """ - if read_mode not in {'r', 'rb', None}: - raise RuntimeError(f'read_mode should be "r", "rb" or None, got {read_mode}') - - def _iter_file_exts(ps): - return it.chain.from_iterable(glob.iglob(p, recursive=recursive) for p in ps) - - d = 0 - if isinstance(patterns, str): - patterns = [patterns] - for g in _iter_file_exts(patterns): - if sampling_rate is None or random.random() < sampling_rate: - if read_mode is None: - yield g - elif read_mode in {'r', 'rb'}: - with open(g, read_mode) as fp: - yield fp.read() - d += 1 - if size is not None and d > size: - break - - -def _input_ndarray( - array: 'np.ndarray', - axis: int = 0, - size: Optional[int] = None, - shuffle: bool = False, -) -> Generator['np.ndarray', None, None]: - """Create a generator for a given dimension of a numpy array. - - :param array: the numpy ndarray data source - :param axis: iterate over that axis - :param size: the maximum number of the sub arrays - :param shuffle: shuffle the numpy data source beforehand - :yield: ndarray - - .. 
note:: - This function should not be directly used, use :meth:`Flow.index_ndarray`, :meth:`Flow.search_ndarray` instead - """ - if shuffle: - # shuffle for random query - array = np.take(array, np.random.permutation(array.shape[0]), axis=axis) - d = 0 - for r in array: - yield r - d += 1 - if size is not None and d >= size: - break - - -# for back-compatibility -_input_numpy = _input_ndarray diff --git a/jina/clients/websocket.py b/jina/clients/websocket.py index 6a66b02829063..835cd227dee00 100644 --- a/jina/clients/websocket.py +++ b/jina/clients/websocket.py @@ -47,7 +47,6 @@ async def _get_results( self.inputs = inputs - tname = self._get_task_name(kwargs) req_iter = self._get_requests(**kwargs) try: client_info = f'{self.args.host}:{self.args.port_expose}' @@ -78,7 +77,7 @@ async def _send_requests(request_iterator): # There is nothing to send, disconnect gracefully await websocket.close(reason='No data to send') - with ProgressBar(task_name=tname) as p_bar, TimeContext(tname): + with ProgressBar() as p_bar, TimeContext(''): # Unlike gRPC, any arbitrary function (generator) cannot be passed via websockets. # Simply iterating through the `req_iter` makes the request-response sequential. # To make client unblocking, :func:`send_requests` and `recv_responses` are separate tasks diff --git a/jina/docker/checker.py b/jina/docker/checker.py index fc446339864eb..28495452c5def 100644 --- a/jina/docker/checker.py +++ b/jina/docker/checker.py @@ -1,6 +1,4 @@ """Module for validation functions.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" import os import re diff --git a/jina/docker/helper.py b/jina/docker/helper.py index bd37641388726..483b9fdca9a78 100644 --- a/jina/docker/helper.py +++ b/jina/docker/helper.py @@ -1,6 +1,4 @@ """Module for helper functions for Docker.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" from pathlib import Path diff --git a/jina/docker/hubio.py b/jina/docker/hubio.py index 735566689c7b6..dccc3c846a647 100644 --- a/jina/docker/hubio.py +++ b/jina/docker/hubio.py @@ -1,6 +1,4 @@ """Module for wrapping Jina Hub API calls.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" import argparse import glob diff --git a/jina/drivers/__init__.py b/jina/drivers/__init__.py deleted file mode 100644 index 391fcfe01dfb6..0000000000000 --- a/jina/drivers/__init__.py +++ /dev/null @@ -1,732 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import inspect -import typing -from functools import wraps -from typing import ( - Any, - Dict, - Callable, - Tuple, - Optional, - Sequence, - Iterable, - List, - Union, -) - -import numpy as np -from google.protobuf.struct_pb2 import Struct - -from ..enums import OnErrorStrategy -from ..excepts import LengthMismatchException -from ..executors.compound import CompoundExecutor -from ..executors.decorators import wrap_func -from ..helper import ( - convert_tuple_to_list, - cached_property, - find_request_binding, - _canonical_request_name, -) -from ..jaml import JAMLCompatible -from ..types.querylang import QueryLang -from ..types.arrays import DocumentArray - -# noinspection PyUnreachableCode -if False: - # fix type-hint complain for sphinx and flake - from ..peapods.runtimes.zmq.zed import ZEDRuntime - from ..executors import AnyExecutor - from ..logging.logger import JinaLogger - from ..types.message import Message - from ..types.request import Request - from ..types.arrays import QueryLangArray - from ..types.document import Document - - -def store_init_kwargs(func: Callable) -> Callable: - """Mark the args and kwargs of :func:`__init__` later to be stored via :func:`save_config` in YAML - - :param func: the Callable to wrap - :return: the wrapped Callable - """ - - @wraps(func) - 
def _arg_wrapper(self, *args, **kwargs): - if func.__name__ != '__init__': - raise TypeError( - 'this decorator should only be used on __init__ method of a driver' - ) - taboo = {'self', 'args', 'kwargs'} - all_pars = inspect.signature(func).parameters - tmp = {k: v.default for k, v in all_pars.items() if k not in taboo} - tmp_list = [k for k in all_pars.keys() if k not in taboo] - # set args by aligning tmp_list with arg values - for k, v in zip(tmp_list, args): - tmp[k] = v - # set kwargs - for k, v in kwargs.items(): - if k in tmp: - tmp[k] = v - - if self.store_args_kwargs: - if args: - tmp['args'] = args - if kwargs: - tmp['kwargs'] = {k: v for k, v in kwargs.items() if k not in taboo} - - if hasattr(self, '_init_kwargs_dict'): - self._init_kwargs_dict.update(tmp) - else: - self._init_kwargs_dict = tmp - convert_tuple_to_list(self._init_kwargs_dict) - f = func(self, *args, **kwargs) - return f - - return _arg_wrapper - - -class QuerySetReader: - """ - :class:`QuerySetReader` allows a driver to read arguments from the protobuf message. This allows a - driver to override its behavior based on the message it receives. Extremely useful in production, for example, - get ``top_k`` results, doing pagination, filtering. - - To register the field you want to read from the message, simply register them in :meth:`__init__`. - For example, ``__init__(self, arg1, arg2, **kwargs)`` will allow the driver to read field ``arg1`` and ``arg2`` from - the message. When they are not found in the message, the value ``_arg1`` and ``_arg2`` will be used. Note the underscore - prefix. - - .. note:: - - To set default value of ``arg1``, use ``self._arg1 =``, note the underscore in the front. - - To access ``arg1``, simply use ``self.arg1``. It automatically switch between default ``_arg1`` and ``arg1`` from the request. 
- - For successful value reading, the following condition must be met: - - - the ``name`` in the proto must match with the current class name - - the ``disabled`` field in the proto should not be ``False`` - - the ``priority`` in the proto should be strictly greater than the driver's priority (by default is 0) - - the field name must exist in proto's ``parameters`` - - .. warning:: - For the sake of cooperative multiple inheritance, do NOT implement :meth:`__init__` for this class - """ - - @property - def as_querylang(self): - """Render as QueryLang parameters. - - - .. # noqa: DAR201""" - parameters = { - name: getattr(self, name) for name in self._init_kwargs_dict.keys() - } - return QueryLang( - { - 'name': self.__class__.__name__, - 'priority': self._priority, - 'parameters': parameters, - } - ) - - def _get_parameter(self, key: str, default: Any): - if getattr(self, 'queryset', None): - for q in self.queryset: - if ( - not q.disabled - and self.__class__.__name__ == q.name - and q.priority > self._priority - and key in q.parameters - ): - ret = q.parameters[key] - return dict(ret) if isinstance(ret, Struct) else ret - return getattr(self, f'_{key}', default) - - def __getattr__(self, name: str): - # https://docs.python.org/3/reference/datamodel.html#object.__getattr__ - if name == '_init_kwargs_dict': - # raise attribute error to avoid recursive call - raise AttributeError - if name in self._init_kwargs_dict: - return self._get_parameter(name, default=self._init_kwargs_dict[name]) - raise AttributeError - - -class DriverType(type(JAMLCompatible), type): - """A meta class representing a Driver - - When a new Driver is created, it gets registered - """ - - def __new__(cls, *args, **kwargs): - """Create and register a new class with this meta class. 
- - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - :return: the newly registered class - """ - _cls = super().__new__(cls, *args, **kwargs) - return cls.register_class(_cls) - - @staticmethod - def register_class(cls): - """Register a class - - :param cls: the class - :return: the class, after being registered - """ - reg_cls_set = getattr(cls, '_registered_class', set()) - if cls.__name__ not in reg_cls_set or getattr(cls, 'force_register', False): - wrap_func(cls, ['__init__'], store_init_kwargs) - # wrap_func(cls, ['__call__'], as_reduce_method) - - reg_cls_set.add(cls.__name__) - setattr(cls, '_registered_class', reg_cls_set) - return cls - - -class BaseDriver(JAMLCompatible, metaclass=DriverType): - """A :class:`BaseDriver` is a logic unit above the :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`. - It reads the protobuf message, extracts/modifies the required information and then return - the message back to :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`. - - A :class:`BaseDriver` needs to be :attr:`attached` to a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` before - using. This is done by :func:`attach`. Note that a deserialized :class:`BaseDriver` from file is always unattached. - - :param priority: the priority of its default arg values (hardcoded in Python). 
If the - received ``QueryLang`` has a higher priority, it will override the hardcoded value - :param args: not used (kept to maintain interface) - :param kwargs: not used (kept to maintain interface) - """ - - store_args_kwargs = False #: set this to ``True`` to save ``args`` (in a list) and ``kwargs`` (in a map) in YAML config - - def __init__(self, priority: int = 0, *args, **kwargs): - self.attached = False # : represent if this driver is attached to a - # :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` (& :class:`jina.executors.BaseExecutor`) - self.runtime = None # type: Optional['ZEDRuntime'] - self._priority = priority - - def attach(self, runtime: 'ZEDRuntime', *args, **kwargs) -> None: - """Attach this driver to a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` - - :param runtime: the pea to be attached - :param args: not used (kept to maintain interface) - :param kwargs: not used (kept to maintain interface) - """ - self.runtime = runtime - self.attached = True - - @property - def req(self) -> 'Request': - """Get the current (typed) request, shortcut to ``self.runtime.request`` - - - .. # noqa: DAR201 - """ - return self.runtime.request - - @property - def partial_reqs(self) -> Sequence['Request']: - """The collected partial requests under the current ``request_id`` - - - .. # noqa: DAR401 - - - .. # noqa: DAR201 - """ - if self.expect_parts > 1: - return self.runtime.partial_requests - else: - raise ValueError( - f'trying to access all partial requests, ' - f'but {self.runtime} has only one message' - ) - - @property - def expect_parts(self) -> int: - """The expected number of partial messages - - - .. # noqa: DAR201 - """ - return self.runtime.expect_parts - - @property - def docs(self) -> 'DocumentArray': - """The DocumentArray after applying the traversal - - - .. 
# noqa: DAR201""" - from ..types.arrays import DocumentArray - - if self.expect_parts > 1: - return DocumentArray( - [d for r in reversed(self.partial_reqs) for d in r.docs] - ) - else: - return self.req.docs - - @property - def msg(self) -> 'Message': - """Get the current request, shortcut to ``self.runtime.message`` - - - .. # noqa: DAR201 - """ - return self.runtime.message - - @property - def queryset(self) -> 'QueryLangArray': - """ - - - .. # noqa: DAR101 - - - .. # noqa: DAR102 - - - .. # noqa: DAR201 - """ - if self.msg: - return self.msg.request.queryset - else: - return [] - - @property - def logger(self) -> 'JinaLogger': - """Shortcut to ``self.runtime.logger`` - - - .. # noqa: DAR201 - """ - return self.runtime.logger - - def __call__(self, *args, **kwargs) -> None: - """ - - - .. # noqa: DAR102 - - - .. # noqa: DAR101 - """ - raise NotImplementedError - - def __eq__(self, other): - return self.__class__ == other.__class__ - - def __getstate__(self) -> Dict[str, Any]: - """ - Unlike `Executor`, driver is stateless. - - Therefore, on every save, it creates a new & empty driver object and save it. - :return: the state in dict form - """ - - d = dict(self.__class__(**self._init_kwargs_dict).__dict__) - return d - - -class ContextAwareRecursiveMixin: - """ - The full data structure version of :class:`FlatRecursiveMixin`, to be mixed in with :class:`BaseRecursiveDriver`. - It uses :meth:`traverse` in :class:`DocumentArray` and allows direct manipulation of Chunk-/Match-/DocumentArrays. - - .. 
seealso:: - https://github.com/jina-ai/jina/issues/1932 - - """ - - def __call__(self, *args, **kwargs): - """Traverse with _apply_all - - :param args: args forwarded to ``_apply_all`` - :param kwargs: kwargs forwarded to ``_apply_all`` - """ - document_sets = self.docs.traverse(self._traversal_paths) - self._apply_all(document_sets, *args, **kwargs) - - def _apply_all( - self, - doc_sequences: Iterable['DocumentArray'], - *args, - **kwargs, - ) -> None: - """Apply function works on an Iterable of DocumentArray, modify the docs in-place. - - Each DocumentArray refers to a leaf (e.g. roots, matches or chunks wrapped - in a :class:`jina.DocumentArray`) in the traversal_paths. Modifications on the - DocumentArrays (e.g. adding or deleting Documents) are directly applied on the underlying objects. - Adding a chunk to a ChunkArray results in adding a chunk to the parent Document. - - :param doc_sequences: the Documents that should be handled - :param args: driver specific arguments, which might be forwarded to the Executor - :param kwargs: driver specific arguments, which might be forwarded to the Executor - """ - - -class FlatRecursiveMixin: - """ - The batch optimized version of :class:`ContextAwareRecursiveMixin`, to be mixed in with :class:`BaseRecursiveDriver`. - It uses :meth:`traverse_flattened_per_path` in :class:`DocumentArray` and yield much better performance - when no context is needed and batching is possible. - - .. 
seealso:: - https://github.com/jina-ai/jina/issues/1932 - - """ - - def __call__(self, *args, **kwargs): - """Traverse with _apply_all - - :param args: args forwarded to ``_apply_all`` - :param kwargs: kwargs forwarded to ``_apply_all`` - """ - path_documents = self.docs.traverse_flattened_per_path(self._traversal_paths) - for documents in path_documents: - if documents: - self._apply_all(documents, *args, **kwargs) - - def _apply_all( - self, - docs: 'DocumentArray', - *args, - **kwargs, - ) -> None: - """Apply function works on a list of docs, modify the docs in-place. - - The list refers to all reachable leaves of a single ``traversal_path``. - - :param docs: the Documents that should be handled - :param args: driver specific arguments, which might be forwarded to the Executor - :param kwargs: driver specific arguments, which might be forwarded to the Executor - - """ - - -class DocsExtractUpdateMixin: - """ - A Driver pattern for extracting attributes from Documents, feeding to an executor and updating the Documents with - the results. - - Drivers equipped with this mixin will have :method:`_apply_all` inherited. - - The :method:`_apply_all` implements the following logics: - - From ``docs``, it extracts the attributes defined :method:`exec_fn`'s arguments. - - It feeds the attributes to the bind executor's :method:`exec_fn`. - - It updates ``docs`` with results returned from :method:`exec_fn` - - The following shortcut logics are implemented: - - while extracting: attributes defined :method:`exec_fn`'s arguments are extracted from ``docs``; - - while extracting: attributes annotated with ``ndarray`` are stacked into Numpy NdArray objects; - - while updating: if ``exec_fn`` returns a List of Dict, then ``doc.set_attrs(**exec_result)`` is called; - - while updating: if ``exec_fn`` returns a Document, then ``doc.update(exec_result)` is called. 
- - while updating: if none of above applies, then calling :meth:`update_single_doc` - - To override the update behavior, you can choose to override: - - :meth:`update_docs` if you want to modify the behavior of updating docs in bulk - - :meth:`update_single_doc` if you want to modify the behavior of updating a single doc - """ - - @property - def _stack_document_content(self): - return self._exec_fn_required_keys_is_ndarray - - def _apply_all(self, docs: 'DocumentArray') -> None: - """Apply function works on a list of docs, modify the docs in-place. - - The list refers to all reachable leaves of a single ``traversal_path``. - - :param docs: the Documents that should be handled - """ - - contents, docs_pts = docs.extract_docs( - *self._exec_fn_required_keys, - stack_contents=self._stack_document_content, - ) - - if docs_pts: - if len(self._exec_fn_required_keys) > 1: - exec_results = self.exec_fn(*contents) - else: - exec_results = self.exec_fn(contents) - - if exec_results is not None: - # if exec_fn returns None then exec_fn is assumed to be immutable wrt. doc, hence skipped - - try: - len_results = len(exec_results) - except: - try: - len_results = exec_results.shape[0] - except: - len_results = None - - if len(docs_pts) != len_results: - msg = ( - f'mismatched {len(docs_pts)} docs from level {docs_pts[0].granularity} ' - f'and length of returned: {len_results}, their length must be the same' - ) - raise LengthMismatchException(msg) - - self.update_docs(docs_pts, exec_results) - - def update_docs( - self, - docs_pts: 'DocumentArray', - exec_results: Union[List[Dict], List['Document'], Any], - ) -> None: - """ - Update Documents with the Executor returned results. 
- - :param: docs_pts: the set of document to be updated - :param: exec_results: the results from :meth:`exec_fn` - """ - from ..types.document import Document - - if self._exec_fn_return_is_ndarray and not isinstance(exec_results, np.ndarray): - r_type = type(exec_results).__name__ - if r_type in {'EagerTensor', 'Tensor', 'list'}: - exec_results = np.array(exec_results, dtype=np.float32) - else: - raise TypeError(f'unrecognized type {exec_results!r}') - - for doc, exec_result in zip(docs_pts, exec_results): - if isinstance(exec_result, dict): - doc.set_attrs(**exec_result) - elif isinstance(exec_result, Document): - # doc id should not be override with this method - doc.update(exec_result, exclude_fields=('id',)) - else: - self.update_single_doc(doc, exec_result) - - def update_single_doc(self, doc: 'Document', exec_result: Any) -> None: - """Update a single Document with the Executor returned result. - - :param doc: the Document object - :param exec_result: the single result from :meth:`exec_fn` - """ - raise NotImplementedError - - @cached_property - def _exec_fn_required_keys(self) -> List[str]: - """Get the arguments of :attr:`exec_fn`. - - If ``strict_method_args`` set, then all arguments of :attr:`exec_fn` must be valid :class:`Document` attribute. - - :return: a list of supported arguments - """ - - if not self.exec_fn: - raise ValueError( - f'`exec_fn` is None, maybe {self} is not attached? call `self.attach`.' - ) - - required_keys = [ - k - for k in inspect.getfullargspec(inspect.unwrap(self.exec_fn)).args - if k != 'self' - ] - if not required_keys: - raise AttributeError(f'{self.exec_fn} takes no argument.') - - if not self._strict_method_args: - return required_keys - - from .. 
import Document - - support_keys = Document.get_all_attributes() - unrecognized_keys = set(required_keys).difference(support_keys) - - if not unrecognized_keys: - return required_keys - - from ..proto import jina_pb2 - - camel_keys = set(jina_pb2.DocumentProto().DESCRIPTOR.fields_by_camelcase_name) - legacy_keys = {'data'} - unrecognized_camel_keys = unrecognized_keys.intersection(camel_keys) - if unrecognized_camel_keys: - raise AttributeError( - f'{unrecognized_camel_keys} are supported but you give them in CamelCase, ' - f'please rewrite them in canonical form.' - ) - elif unrecognized_keys.intersection(legacy_keys): - raise AttributeError( - f'{unrecognized_keys.intersection(legacy_keys)} is now deprecated and not a valid argument of ' - 'the executor function, ' - 'please change `data` to `content: \'np.ndarray\'` in your executor function. ' - 'details: https://github.com/jina-ai/jina/pull/2313/' - ) - else: - raise AttributeError( - f'{unrecognized_keys} are invalid Document attributes, must come from {support_keys}' - ) - - return required_keys - - @cached_property - def _exec_fn_required_keys_is_ndarray(self) -> List[bool]: - """Return a list of boolean indicators for showing if a key is annotated as ndarray - - :return: a list of boolean idicator, True if the corresponding key is annotated as ndarray - """ - - try: - anno = typing.get_type_hints((inspect.unwrap(self.exec_fn))) - return [ - anno.get(k, None) == np.ndarray for k in self._exec_fn_required_keys - ] - except NameError: - return [False] * len(self._exec_fn_required_keys) - - @cached_property - def _exec_fn_return_is_ndarray(self) -> bool: - """Return a boolean value for showing if the return of :meth:`exec_fn` is annotated as `ndarray` - - :return: a bool indicator - """ - try: - return ( - typing.get_type_hints((inspect.unwrap(self.exec_fn))).get( - 'return', None - ) - == np.ndarray - ) - except NameError: - return False - - -class BaseRecursiveDriver(BaseDriver): - """A 
:class:`BaseRecursiveDriver` is an abstract Driver class containing information about the `traversal_paths` - that a `Driver` must apply its logic. - It is intended to be mixed in with either :class:`FlatRecursiveMixin` or :class:`ContextAwareRecursiveMixin` - """ - - def __init__(self, traversal_paths: Tuple[str] = ('c', 'r'), *args, **kwargs): - """Initialize a :class:`BaseRecursiveDriver` - - :param traversal_paths: Describes the leaves of the document tree on which _apply_all are called - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - super().__init__(*args, **kwargs) - self._traversal_paths = [path.lower() for path in traversal_paths] - - -class BaseExecutableDriver(BaseRecursiveDriver): - """A :class:`BaseExecutableDriver` is an intermediate logic unit between the :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` and :class:`jina.executors.BaseExecutor` - It reads the protobuf message, extracts/modifies the required information and then sends to the :class:`jina.executors.BaseExecutor`, - finally it returns the message back to :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`. - - A :class:`BaseExecutableDriver` needs to be :attr:`attached` to a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime` and :class:`jina.executors.BaseExecutor` before using. - This is done by :func:`attach`. Note that a deserialized :class:`BaseDriver` from file is always unattached. 
- """ - - def __init__( - self, - executor: Optional[str] = None, - method: Optional[str] = None, - strict_method_args: bool = True, - *args, - **kwargs, - ): - """Initialize a :class:`BaseExecutableDriver` - - :param executor: the name of the sub-executor, only necessary when :class:`jina.executors.compound.CompoundExecutor` is used - :param method: the function name of the executor that the driver feeds to - :param strict_method_args: if set, then the input args of ``executor.method`` must be valid :class:`Document` attributes - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - super().__init__(*args, **kwargs) - self._executor_name = executor - self._method_name = method - self._strict_method_args = strict_method_args - self._exec = None - self._exec_fn = None - - @property - def exec(self) -> 'AnyExecutor': - """the executor that to which the instance is attached - - - .. 
# noqa: DAR201 - """ - return self._exec - - @property - def exec_fn(self) -> Callable: - """the function of :func:`jina.executors.BaseExecutor` to call - - :return: the Callable to execute in the driver - """ - if not self.runtime: - return self._exec_fn - elif ( - not self.msg.is_error - or self.runtime.args.on_error_strategy < OnErrorStrategy.SKIP_EXECUTOR - ): - return self._exec_fn - else: - return lambda *args, **kwargs: None - - def attach( - self, executor: 'AnyExecutor', req_type: Optional[str] = None, *args, **kwargs - ) -> None: - """Attach the driver to a :class:`jina.executors.BaseExecutor` - - :param executor: the executor to which we attach - :param req_type: the request type to attach to - :param args: additional positional arguments for the call of super().attach() - :param kwargs: additional key value arguments for the call of super().attach() - """ - super().attach(*args, **kwargs) - if self._executor_name and isinstance(executor, CompoundExecutor): - if self._executor_name in executor: - self._exec = executor[self._executor_name] - else: - for c in executor.components: - if any( - t.__name__ == self._executor_name for t in type.mro(c.__class__) - ): - self._exec = c - break - if self._exec is None: - self.logger.critical( - f'fail to attach the driver to {executor}, ' - f'no executor is named or typed as {self._executor_name}' - ) - else: - self._exec = executor - - if not self._method_name: - decor_bindings = find_request_binding(self.exec.__class__) - if req_type: - canonic_name = _canonical_request_name(req_type) - if canonic_name in decor_bindings: - self._method_name = decor_bindings[canonic_name] - elif 'default' in decor_bindings: - self._method_name = decor_bindings['default'] - elif 'default' in decor_bindings: - self._method_name = decor_bindings['default'] - - if self._method_name: - self._exec_fn = getattr(self.exec, self._method_name) - - def __getstate__(self) -> Dict[str, Any]: - """Do not save the executor and executor function, 
as it would be cross-referencing and unserializable. - In other words, a deserialized :class:`BaseExecutableDriver` from file is always unattached. - - :return: dictionary of state - """ - d = super().__getstate__() - if '_exec' in d: - del d['_exec'] - if '_exec_fn' in d: - del d['_exec_fn'] - return d diff --git a/jina/drivers/cache.py b/jina/drivers/cache.py deleted file mode 100644 index c257dada7396a..0000000000000 --- a/jina/drivers/cache.py +++ /dev/null @@ -1,94 +0,0 @@ -"""Module for the Drivers for the Cache.""" -import hashlib -from typing import Any, Dict, List - -from .index import BaseIndexDriver - -# noinspection PyUnreachableCode -if False: - from .. import Document - from ..types.arrays import DocumentArray - - -class BaseCacheDriver(BaseIndexDriver): - """A driver related to :class:`BaseCache`. - - :param with_serialization: feed serialized Document to the CacheIndexer - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, with_serialization: bool = False, *args, **kwargs): - self.with_serialization = with_serialization - super().__init__(*args, **kwargs) - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - if self._method_name == 'update': - values = [BaseCacheDriver.hash_doc(d, self.exec.fields) for d in docs] - self.exec_fn([d.id for d in docs], values) - else: - for d in docs: - value = BaseCacheDriver.hash_doc(d, self.exec.fields) - result = self.exec[value] - if result: - self.on_hit(d, result) - else: - self.on_miss(d, value) - - def on_miss(self, req_doc: 'Document', value: bytes) -> None: - """Call when document is missing. - - The default behavior is to add to cache when miss. 
- - :param req_doc: the document in the request but missed in the cache - :param value: the data besides the `req_doc.id` to be passed through to the executors - """ - if self.with_serialization: - self.exec_fn([req_doc.id], req_doc.SerializeToString(), [value]) - else: - self.exec_fn([req_doc.id], [value]) - - def on_hit(self, req_doc: 'Document', hit_result: Any) -> None: - """Call when cache is hit for a document. - - :param req_doc: the document in the request and hit in the cache - :param hit_result: the hit result returned by the cache - """ - pass - - @staticmethod - def hash_doc(doc: 'Document', fields: List[str]) -> bytes: - """Calculate hash by which we cache. - - :param doc: the Document - :param fields: the list of fields - :return: the hash value of the fields - """ - values = doc.get_attrs(*fields).values() - data = '' - for field, value in zip(fields, values): - data += f'{field}:{value};' - digest = hashlib.sha256(bytes(data.encode('utf8'))).digest() - return digest - - -class TaggingCacheDriver(BaseCacheDriver): - """A driver for labelling the hit-cache docs with certain tags.""" - - def __init__(self, tags: Dict, *args, **kwargs): - """Create a new TaggingCacheDriver. - - :param tags: the tags to be updated on hit docs - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - super().__init__(*args, **kwargs) - self._tags = tags - - def on_hit(self, req_doc: 'Document', hit_result: Any) -> None: - """Call when cache is hit for a document. - - :param req_doc: the document requested - :param hit_result: the result of the hit - """ - req_doc.tags.update(self._tags) diff --git a/jina/drivers/control.py b/jina/drivers/control.py deleted file mode 100644 index 93e552581f231..0000000000000 --- a/jina/drivers/control.py +++ /dev/null @@ -1,238 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. 
All rights reserved." -__license__ = "Apache-2.0" - -import re -import time - -from google.protobuf.json_format import MessageToJson - -from . import BaseDriver -from ..excepts import UnknownControlCommand, RuntimeTerminated -from ..proto import jina_pb2 -from ..types.querylang.queryset.dunderkey import dunder_get - - -class BaseControlDriver(BaseDriver): - """Control driver does not have access to the executor and it - often works directly with protobuf layer instead Jina primitive types""" - - @property - def envelope(self) -> 'jina_pb2.EnvelopeProto': - """Get the current request, shortcut to ``self.runtime.message`` - - - .. # noqa: DAR201 - """ - return self.msg.envelope - - -class LogInfoDriver(BaseControlDriver): - """ - Log output the request info - - :param key: (str) that represents a first level or nested key in the dict - :param json: (bool) indicating if the log output should be formatted as json - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, key: str = 'request', json: bool = True, *args, **kwargs): - super().__init__(*args, **kwargs) - self.key = key - self.json = json - - def __call__(self, *args, **kwargs): - """Log the information. - - :param args: unused - :param kwargs: unused - """ - data = dunder_get(self.msg.proto, self.key) - if self.json: - self.logger.info(MessageToJson(data)) - else: - self.logger.info(data) - - -class WaitDriver(BaseControlDriver): - """Wait for some seconds, mainly for demo purpose""" - - def __call__(self, *args, **kwargs): - """Wait for some seconds, mainly for demo purpose - - - .. 
# noqa: DAR101 - """ - time.sleep(5) - - -class ControlReqDriver(BaseControlDriver): - """Handling the control request, by default it is installed for all :class:`jina.peapods.peas.BasePea`""" - - def __call__(self, *args, **kwargs): - """Handle the request controlling. - - :param args: unused - :param kwargs: unused - """ - if self.req.command == 'TERMINATE': - self.envelope.status.code = jina_pb2.StatusProto.SUCCESS - raise RuntimeTerminated - elif self.req.command == 'STATUS': - self.envelope.status.code = jina_pb2.StatusProto.READY - self.req.args = vars(self.runtime.args) - elif self.req.command == 'IDLE': - pass - elif self.req.command == 'CANCEL': - pass - elif self.req.command == 'DUMP': - self._dump() - elif self.req.command == 'RELOAD': - self._reload() - elif self.req.command == 'ACTIVATE': - # TODO (Joan): This is a hack, but I checked in devel-2.0 branch, this _handle_control_req will be moved into the `ZedRuntime` so this code - # aligns very well with that view - self.runtime._zmqlet._send_idle_to_router() - elif self.req.command == 'DEACTIVATE': - # TODO (Joan): This is a hack, but I checked in devel-2.0 branch, this _handle_control_req will be moved into the `ZedRuntime` so this code - # aligns very well with that view - self.runtime._zmqlet._send_cancel_to_router() - else: - raise UnknownControlCommand(f'don\'t know how to handle {self.req.command}') - - def _reload(self): - # TODO should this be removed, since we now have proper rolling update? 
- if self.req.targets and self.runtime.__class__.__name__ == 'ZEDRuntime': - patterns = self.req.targets - if isinstance(patterns, str): - patterns = [patterns] - for p in patterns: - if re.match(p, self.runtime.name): - self.logger.info( - f'reloading the Executor `{self.runtime._executor.name}` in `{self.runtime.name}`' - ) - self.runtime._load_executor() - break - - def _dump(self): - # TODO(Cristian): this is a smell, since we are accessing the private _executor - # to be reconsidered after the Executor API refactoring - if self.req.targets and self.runtime.__class__.__name__ == 'ZEDRuntime': - patterns = self.req.targets - if isinstance(patterns, str): - patterns = [patterns] - for p in patterns: - if re.match(p, self.runtime.name): - self.logger.info( - f'Dumping from Executor `{self.runtime._executor.name}` in `{self.runtime.name}`' - ) - req_dict = dict(self.req.args) - self.runtime._executor.dump( - req_dict.get('dump_path'), int(req_dict.get('shards')) - ) - break - - -class RouteDriver(ControlReqDriver): - """Ensures that data requests are forwarded to the downstream `:class:`BasePea` ensuring - that the load is balanced between parallel `:class:`BasePea` if the scheduling `:class:`SchedulerType` is LOAD_BALANCE. - - .. note:: - - The dealer never receives a control request from the router, - every time it finishes a job and sends via out_sock, it returns the envelope with control - request idle back to the router. The dealer also sends control request idle to the router - when it first starts. - - - The router receives requests from both dealer and upstream pusher. 
- if it is an upstream request, use LB to schedule the receiver, - mark it in the envelope if it is a control request in - - :param raise_no_dealer: raise a RuntimeError when no available dealer - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, raise_no_dealer: bool = False, *args, **kwargs): - super().__init__(*args, **kwargs) - self.idle_dealer_ids = set() - self.is_polling_paused = False - self.raise_no_dealer = raise_no_dealer - - def __call__(self, *args, **kwargs): - """Perform the routing. - - :param args: additional positional arguments which are just used for calling the parent - :param kwargs: additional key value arguments which are just used for calling the parent - - - .. # noqa: DAR401 - """ - if self.msg.is_data_request: - self.logger.debug(self.idle_dealer_ids) - if self.idle_dealer_ids: - dealer_id = self.idle_dealer_ids.pop() - self.envelope.receiver_id = dealer_id - if not self.idle_dealer_ids: - self.runtime._zmqlet.pause_pollin() - self.is_polling_paused = True - elif self.raise_no_dealer: - raise RuntimeError( - 'if this router connects more than one dealer, ' - 'then this error should never be raised. often when it ' - 'is raised, some Pods must fail to start, so please go ' - 'up and check the first error message in the log' - ) - # else: - # this FALLBACK to trivial message pass - # - # Explanation on the logic here: - # there are two cases that when `idle_dealer_ids` is empty - # (1) this driver is used in a PUSH-PULL fan-out setting, - # where no dealer is registered in the first place, so `idle_dealer_ids` is empty - # all the time - # (2) this driver is used in a ROUTER-DEALER fan-out setting, - # where some dealer is broken/fails to start, so `idle_dealer_ids` is empty - # IDLE requests add the dealer id to the router. 
Therefore, it knows which dealer would be available for - # new data requests. - # CANCEL requests remove the dealer id from the router. Therefore, it can not send any more data requests - # to the dealer. - elif self.req.command == 'IDLE': - self.idle_dealer_ids.add(self.envelope.receiver_id) - self.logger.debug( - f'{self.envelope.receiver_id} is idle, now I know these idle peas {self.idle_dealer_ids}' - ) - if self.is_polling_paused: - self.runtime._zmqlet.resume_pollin() - self.is_polling_paused = False - elif self.req.command == 'CANCEL': - if self.envelope.receiver_id in self.idle_dealer_ids: - self.idle_dealer_ids.remove(self.envelope.receiver_id) - self.logger.debug( - f'{self.envelope.receiver_id} is cancelled, now I know these idle peas {self.idle_dealer_ids}' - ) - else: - super().__call__(*args, **kwargs) - - -class ForwardDriver(RouteDriver): - """Alias to :class:`RouteDriver`""" - - -class WhooshDriver(BaseControlDriver): - """Play a whoosh! sound""" - - def __call__(self, *args, **kwargs): - """Play a whoosh sound, used in 2021 April fools day - - .. # noqa: DAR101 - """ - import subprocess - from pkg_resources import resource_filename - - whoosh_mp3 = resource_filename( - 'jina', '/'.join(('resources', 'soundfx', 'whoosh.mp3')) - ) - - subprocess.Popen( - f'ffplay -nodisp -autoexit {whoosh_mp3} >/dev/null 2>&1', shell=True - ) diff --git a/jina/drivers/convert.py b/jina/drivers/convert.py deleted file mode 100644 index 07d84f07d2a88..0000000000000 --- a/jina/drivers/convert.py +++ /dev/null @@ -1,96 +0,0 @@ -from ..drivers import FlatRecursiveMixin, BaseRecursiveDriver - -if False: - from ..types.arrays import DocumentArray - - -class ConvertDriver(FlatRecursiveMixin, BaseRecursiveDriver): - """Drivers that make sure that specific conversions are applied to the documents. - - .. 
note:: - The list of functions that can be applied can be found in `:class:`Document` - """ - - def __init__(self, convert_fn: str, *args, **kwargs): - """ - :param convert_fn: the method name from `:class:`Document` to be applied - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: the set of named arguments to be passed to `convert_fn` - """ - super().__init__(*args, **kwargs) - self._convert_fn = convert_fn - self._convert_fn_kwargs = kwargs - - def _apply_all( - self, - docs: 'DocumentArray', - *args, - **kwargs, - ) -> None: - for d in docs: - getattr(d, self._convert_fn)(**self._convert_fn_kwargs) - - -class URI2Buffer(ConvertDriver): - """Driver to convert URI to buffer""" - - def __init__(self, convert_fn: str = 'convert_uri_to_buffer', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) - - -class URI2DataURI(ConvertDriver): - """Driver to convert URI to data URI""" - - def __init__(self, convert_fn: str = 'convert_uri_to_data_uri', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) - - -class Buffer2URI(ConvertDriver): - """Driver to convert buffer to URI""" - - def __init__(self, convert_fn: str = 'convert_buffer_to_uri', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) - - -class BufferImage2Blob(ConvertDriver): - """Driver to convert image buffer to blob""" - - def __init__( - self, convert_fn: str = 'convert_buffer_image_to_blob', *args, **kwargs - ): - super().__init__(convert_fn, *args, **kwargs) - - -class URI2Blob(ConvertDriver): - """Driver to convert URI to blob""" - - def __init__(self, convert_fn: str = 'convert_uri_to_blob', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) - - -class DataURI2Blob(ConvertDriver): - """Driver to convert Data URI to image blob""" - - def __init__(self, convert_fn: str = 'convert_data_uri_to_blob', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) - - -class 
Text2URI(ConvertDriver): - """Driver to convert text to URI""" - - def __init__(self, convert_fn: str = 'convert_text_to_uri', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) - - -class URI2Text(ConvertDriver): - """Driver to convert URI to text""" - - def __init__(self, convert_fn: str = 'convert_uri_to_text', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) - - -class Blob2PngURI(ConvertDriver): - """Driver to convert blob to URI""" - - def __init__(self, convert_fn: str = 'convert_blob_to_uri', *args, **kwargs): - super().__init__(convert_fn, *args, **kwargs) diff --git a/jina/drivers/craft.py b/jina/drivers/craft.py deleted file mode 100644 index 1892cc18e4d65..0000000000000 --- a/jina/drivers/craft.py +++ /dev/null @@ -1,19 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Optional - -from . import FlatRecursiveMixin, BaseExecutableDriver, DocsExtractUpdateMixin - - -class CraftDriver(DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver): - """Drivers inherited from this Driver will bind :meth:`craft` by default """ - - def __init__( - self, executor: Optional[str] = None, method: str = 'craft', *args, **kwargs - ): - super().__init__(executor, method, *args, **kwargs) - - @property - def _stack_document_content(self): - return False diff --git a/jina/drivers/debug.py b/jina/drivers/debug.py deleted file mode 100644 index 48595acbef209..0000000000000 --- a/jina/drivers/debug.py +++ /dev/null @@ -1,64 +0,0 @@ -import os - -import numpy as np - -from jina.drivers import FlatRecursiveMixin, BaseRecursiveDriver -from jina.importer import ImportExtensions - -if False: - # noinspection PyUnreachableCode - from jina import DocumentArray - - -class PngToDiskDriver(FlatRecursiveMixin, BaseRecursiveDriver): - """A driver that can store an intermediate representation of a png in the workspace, under a given folder. 
- - Useful for debugging Crafters in the Flow - - :param workspace: the folder where we store the pngs - :param prefix: the subfolder to add to workspace - :param top: limit the pngs to first N - """ - - def __init__(self, workspace, prefix='', top=10, *args, **kwargs): - self.prefix = prefix - self.top = top - self.done = 0 - self.workspace = workspace - self.folder = os.path.join(self.workspace, self.prefix) - if not os.path.exists(self.folder): - os.makedirs(self.folder) - super().__init__(*args, **kwargs) - - def _apply_all( - self, - docs: 'DocumentArray', - *args, - **kwargs, - ) -> None: - def _move_channel_axis( - img: 'np.ndarray', channel_axis_to_move: int, target_channel_axis: int = -1 - ) -> 'np.ndarray': - if channel_axis_to_move == target_channel_axis: - return img - return np.moveaxis(img, channel_axis_to_move, target_channel_axis) - - def _load_image(blob: 'np.ndarray', channel_axis: int): - with ImportExtensions( - required=True, - pkg_name='Pillow', - verbose=True, - logger=self.logger, - help_text='PIL is missing. Install it with `pip install Pillow`', - ): - from PIL import Image - - img = _move_channel_axis(blob, channel_axis) - return Image.fromarray(img.astype('uint8')) - - for d in docs: - if self.done < self.top: - img = _load_image(d.blob, -1) - path = os.path.join(self.folder, f'{self.done}.png') - img.save(path) - self.done += 1 diff --git a/jina/drivers/delete.py b/jina/drivers/delete.py deleted file mode 100644 index 8ca350da7c545..0000000000000 --- a/jina/drivers/delete.py +++ /dev/null @@ -1,24 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Optional - -from . 
import BaseExecutableDriver - - -class DeleteDriver(BaseExecutableDriver): - """Drivers inherited from this Driver will bind :meth:`delete` by default """ - - def __init__( - self, executor: Optional[str] = None, method: str = 'delete', *args, **kwargs - ): - super().__init__(executor, method, *args, **kwargs) - - def __call__(self, *args, **kwargs): - """ - Call base executable driver on document ids for deletion. - - :param args: unused - :param kwargs: unused - """ - self.exec_fn(self.req.ids) diff --git a/jina/drivers/dump.py b/jina/drivers/dump.py deleted file mode 100644 index b56707d33f32f..0000000000000 --- a/jina/drivers/dump.py +++ /dev/null @@ -1,28 +0,0 @@ -from typing import Optional - -from jina.drivers import BaseExecutableDriver - - -class DumpDriver(BaseExecutableDriver): - """A Driver that calls the dump method of the Executor - - :param executor: the executor to which we attach the driver - :param args: passed to super().__init__ - :param kwargs: passed to super().__init__ - """ - - def __init__( - self, - executor: Optional[str] = None, - *args, - **kwargs, - ): - super().__init__(executor, 'dump', *args, **kwargs) - - def __call__(self, *args, **kwargs): - """Call the Dump method of the Indexer to which the Driver is attached - - :param args: passed to the exec_fn - :param kwargs: passed to the exec_fn - """ - self.exec_fn(self.req.path, self.req.shards, *args, **kwargs) diff --git a/jina/drivers/encode.py b/jina/drivers/encode.py deleted file mode 100644 index 25e259fe2bba5..0000000000000 --- a/jina/drivers/encode.py +++ /dev/null @@ -1,52 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Optional, Any, Union - -from . import BaseExecutableDriver, FlatRecursiveMixin, DocsExtractUpdateMixin - -# noinspection PyUnreachableCode -if False: - from .. 
import Document, DocumentArray, NdArray - import numpy as np - from ..proto import jina_pb2 - - -class BaseEncodeDriver(BaseExecutableDriver): - """Drivers inherited from this Driver will bind :meth:`encode` by default """ - - def __init__( - self, executor: Optional[str] = None, method: str = 'encode', *args, **kwargs - ): - super().__init__(executor, method, *args, **kwargs) - - -class EncodeDriver(DocsExtractUpdateMixin, FlatRecursiveMixin, BaseEncodeDriver): - """Extract the content from documents and call executor and do encoding""" - - def update_single_doc( - self, - doc: 'Document', - exec_result: Union['np.ndarray', 'jina_pb2.NdArrayProto', 'NdArray'], - ) -> None: - """Update the document embedding with returned ndarray result - - :param doc: the Document object - :param exec_result: the single result from :meth:`exec_fn` - """ - doc.embedding = exec_result - - -class ScipySparseEncodeDriver( - DocsExtractUpdateMixin, FlatRecursiveMixin, BaseEncodeDriver -): - """Extract the content from documents and call executor and do encoding""" - - def update_docs(self, docs_pts: 'DocumentArray', exec_results: Any) -> None: - """Update the document embedding with returned sparse matrix - - :param: docs_pts: the set of document to be updated - :param: exec_results: the results from :meth:`exec_fn` - """ - for idx, doc in enumerate(docs_pts): - doc.embedding = exec_results.getrow(idx) diff --git a/jina/drivers/evaluate.py b/jina/drivers/evaluate.py deleted file mode 100644 index c3ec7a32527c9..0000000000000 --- a/jina/drivers/evaluate.py +++ /dev/null @@ -1,229 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Any, Iterator, Optional, Tuple, Union - -import numpy as np - -from . 
import BaseExecutableDriver -from ..types.querylang.queryset.dunderkey import dunder_get -from .search import KVSearchDriver -from ..types.document import Document -from ..types.document.helper import DocGroundtruthPair -from ..helper import deprecated_alias -from ..types.arrays.doc_groundtruth import DocumentGroundtruthSequence - - -class BaseEvaluateDriver(BaseExecutableDriver): - """The Base Driver for evaluation operations. - - .. warning:: - - When ``running_avg=True``, then the running mean is returned. So far at Jina 0.8.10, - there is no way to reset the running statistics. If you have a query Flow running multiple queries, - you may want to make sure the running statistics is meaningful across multiple runs. - - :param executor: the name of the sub-executor, only necessary when :class:`jina.executors.compound.CompoundExecutor` is used - :param method: the function name of the executor that the driver feeds to - :param running_avg: always return running average instead of value of the current run - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__( - self, - executor: Optional[str] = None, - method: str = 'evaluate', - running_avg: bool = False, - *args, - **kwargs, - ): - super().__init__(executor, method, *args, **kwargs) - self._running_avg = running_avg - - def __call__(self, *args, **kwargs): - """Load the ground truth pairs. 
- - :param args: args for _traverse_apply - :param kwargs: kwargs for _traverse_apply - """ - docs_groundtruths = DocumentGroundtruthSequence( - [ - DocGroundtruthPair(doc, groundtruth) - for doc, groundtruth in zip(self.req.docs, self.req.groundtruths) - ] - ) - traversal_result = docs_groundtruths.traverse_flatten(self._traversal_paths) - self._apply_all(traversal_result, *args, **kwargs) - - def _apply_all(self, docs: Iterator['DocGroundtruthPair'], *args, **kwargs) -> None: - for doc_groundtruth in docs: - doc = doc_groundtruth.doc - groundtruth = doc_groundtruth.groundtruth - evaluation = doc.evaluations.add() - evaluation.value = self.exec_fn( - self.extract(doc), self.extract(groundtruth) - ) - if self._running_avg: - evaluation.value = self.exec.mean - - if getattr(self.exec, 'eval_at', None): - evaluation.op_name = ( - f'{self.exec.__class__.__name__}@{self.exec.eval_at}' - ) - else: - evaluation.op_name = self.exec.__class__.__name__ - evaluation.ref_id = groundtruth.id - - def extract(self, doc: 'Document') -> Any: - """Extract the to-be-evaluated field from the document. - - Drivers inherit from :class:`BaseEvaluateDriver` must implement this method. - This function will be invoked two times in :meth:`_apply_all`: - once with actual doc, once with groundtruth doc. - - .. # noqa: DAR401 - :param doc: the Document - """ - raise NotImplementedError - - -class FieldEvaluateDriver(BaseEvaluateDriver): - """ - Evaluate on the values from certain field, the extraction is implemented with :meth:`dunder_get`. - - :param field: the field name to be extracted from the Protobuf. 
- :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, field: str, *args, **kwargs): - super().__init__(*args, **kwargs) - self.field = field - - def extract(self, doc: 'Document') -> Any: - """Extract the field from the Document. - - :param doc: the Document - :return: the data in the field - """ - return dunder_get(doc, self.field) - - -class RankEvaluateDriver(BaseEvaluateDriver): - """Drivers used to pass `matches` from documents and groundtruths to an executor and add the evaluation value. - - - Example fields: - ['tags__id', 'score__value] - - :param fields: the fields names to be extracted from the Protobuf. - The differences with `:class:FieldEvaluateDriver` are: - - More than one field is allowed. For instance, for NDCGComputation you may need to have both `ID` and `Relevance` information. - - The fields are extracted from the `matches` of the `Documents` and the `Groundtruth` so it returns a sequence of values. - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - @deprecated_alias(field=('fields', 0)) - def __init__( - self, - fields: Union[str, Tuple[str]] = ( - 'id', - ), # str maintained for backwards compatibility - *args, - **kwargs, - ): - super().__init__(*args, **kwargs) - self.fields = fields - - @property - def single_field(self): - """ - Get single field. - - Property to guarantee compatibility when only one field is provided either as a string or as a unit length tuple. - - :return: a list of fields - """ - if isinstance(self.fields, str): - return self.fields - elif len(self.fields) == 1: - return self.fields[0] - - def extract(self, doc: 'Document'): - """ - Extract values of the matches from documents with fields as keys. 
- - :param doc: Documents to be extracted. - :return: a list of tuples consisting of the values from the fields. - """ - single_field = self.single_field - if single_field: - r = [dunder_get(x, single_field) for x in doc.matches] - # TODO: Clean this, optimization for `hello-world` because it passes a list of 6k elements in a single - # match. See `pseudo_match` in helloworld/helper.py _get_groundtruths - ret = list(np.array(r).flat) - else: - ret = [ - tuple(dunder_get(x, field) for field in self.fields) - for x in doc.matches - ] - - return ret - - -class NDArrayEvaluateDriver(FieldEvaluateDriver): - """Drivers used to pass `embedding` from documents and groundtruths to an executor and add the evaluation value. - - .. note:: - - Valid fields: - ['blob', 'embedding'] - - """ - - def __init__(self, field: str = 'embedding', *args, **kwargs): - super().__init__(field, *args, **kwargs) - - -class TextEvaluateDriver(FieldEvaluateDriver): - """Drivers used to pass a content field from documents and groundtruths to an executor and add the evaluation value. - - .. note:: - - Valid fields: - ['id', 'level_name', 'parent_id', 'text', 'mime_type', 'uri', 'modality'] - """ - - def __init__(self, field: str = 'text', *args, **kwargs): - super().__init__(field, *args, **kwargs) - - -class LoadGroundTruthDriver(KVSearchDriver): - """Driver used to search for the `document key` in a KVIndex to find the corresponding groundtruth. - (This driver does not use the `recursive structure` of jina Documents, and will not consider the `traversal_path` argument. - It only retrieves `groundtruth` taking documents at root as key) - This driver's job is to fill the `request` groundtruth with the corresponding groundtruth for each document if found in the corresponding KVIndexer. - - .. warning:: - The documents that are not found to have an indexed groundtruth are removed from the `request` so that the `Evaluator` only - works with documents which have groundtruth. 
- """ - - def __call__(self, *args, **kwargs): - """Load the ground truth. - - :param args: unused - :param kwargs: unused - """ - miss_idx = ( - [] - ) #: missed hit results, some documents may not have groundtruth and thus will be removed - serialized_groundtruths = self.exec_fn([d.id for d in self.docs]) - for idx, serialized_groundtruth in enumerate(serialized_groundtruths): - if serialized_groundtruth: - self.req.groundtruths.append(Document(serialized_groundtruth)) - else: - miss_idx.append(idx) - - # delete non-existed matches in reverse - for j in reversed(miss_idx): - del self.docs[j] diff --git a/jina/drivers/generic.py b/jina/drivers/generic.py deleted file mode 100644 index 56113d5b31805..0000000000000 --- a/jina/drivers/generic.py +++ /dev/null @@ -1,15 +0,0 @@ -from typing import Optional - -from . import DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver - - -class GenericExecutorDriver( - DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver -): - """Generic driver that uses extract-apply-update pattern. It automatically binds to the method - decorated with `@request`.""" - - def __init__( - self, executor: Optional[str] = None, method: str = '', *args, **kwargs - ): - super().__init__(executor, method, *args, **kwargs) diff --git a/jina/drivers/index.py b/jina/drivers/index.py deleted file mode 100644 index 115d09eb7610c..0000000000000 --- a/jina/drivers/index.py +++ /dev/null @@ -1,99 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Iterable, Optional - -from . import BaseExecutableDriver, FlatRecursiveMixin -from .. 
import Document -from ..enums import EmbeddingClsType - -if False: - from ..types.arrays import DocumentArray - - -class BaseIndexDriver(FlatRecursiveMixin, BaseExecutableDriver): - """Drivers inherited from this Driver will bind :meth:`add` by default """ - - def __init__( - self, executor: Optional[str] = None, method: str = 'add', *args, **kwargs - ): - super().__init__(executor, method, *args, **kwargs) - - def check_key_length(self, val: Iterable[str]): - """ - Check if the max length of val(e.g. doc id) is larger than key_length. - - :param val: The values to be checked - """ - m_val = max(len(v) for v in val) - if m_val > self.exec.key_length: - raise ValueError( - f'{self.exec} allows only keys of length {self.exec.key_length}, ' - f'but yours is {m_val}.' - ) - - -class VectorIndexDriver(BaseIndexDriver): - """Extracts embeddings and ids from the documents and forwards them to the executor. - In case `method` is 'delete', the embeddings are ignored. - If `method` is not 'delete', documents without content are filtered out. - """ - - @property - def exec_embedding_cls_type(self) -> EmbeddingClsType: - """Get the sparse class type of the attached executor. 
- - :return: Embedding class type of the attached executor, default value is `dense` - """ - return EmbeddingClsType.from_string(self.exec.embedding_cls_type) - - def _get_documents_embeddings(self, docs: 'DocumentArray'): - embedding_cls_type = self.exec_embedding_cls_type - if embedding_cls_type.is_dense: - return docs.all_embeddings - else: - return docs.get_all_sparse_embeddings(embedding_cls_type=embedding_cls_type) - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - embed_vecs, docs_pts = self._get_documents_embeddings(docs) - if docs_pts: - keys = [doc.id for doc in docs_pts] - self.check_key_length(keys) - self.exec_fn(keys, embed_vecs) - - -class KVIndexDriver(BaseIndexDriver): - """Forwards pairs of serialized documents and ids to the executor.""" - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - info = [(doc.id, doc.SerializeToString()) for doc in docs] - if info: - keys, values = zip(*info) - self.check_key_length(keys) - self.exec_fn(keys, values) - - -class DBMSIndexDriver(BaseIndexDriver): - """Forwards ids, vectors, serialized Document to a BaseDBMSIndexer""" - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - info = [ - ( - doc.id, - doc.embedding, - DBMSIndexDriver._doc_without_embedding(doc).SerializeToString(), - ) - for doc in docs - ] - if info: - ids, vecs, metas = zip(*info) - self.check_key_length(ids) - self.exec_fn(ids, vecs, metas) - - @staticmethod - def _doc_without_embedding(d): - from .. import Document - - new_doc = Document(d, copy=True) - new_doc.ClearField('embedding') - return new_doc diff --git a/jina/drivers/multimodal.py b/jina/drivers/multimodal.py deleted file mode 100644 index 6733f0da02594..0000000000000 --- a/jina/drivers/multimodal.py +++ /dev/null @@ -1,106 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -from collections import defaultdict -from typing import Tuple, Dict, List - -import numpy as np - -from . import FlatRecursiveMixin -from .encode import BaseEncodeDriver -from ..types.document.multimodal import MultimodalDocument - -if False: - from ..types.arrays import DocumentArray - - -class MultiModalDriver(FlatRecursiveMixin, BaseEncodeDriver): - """Extract multimodal embeddings from different modalities. - - Input-Output :: - - Input: - document: - |- chunk: {modality: mode1} - | - |- chunk: {modality: mode2} - Output: - document: (embedding: multimodal encoding) - |- chunk: {modality: mode1} - | - |- chunk: {modality: mode2} - - .. note:: - - - It traverses on the ``documents`` for which we want to apply the ``multimodal`` embedding. This way - - we can use the `batching` capabilities for the `executor`. - - .. warning:: - - It assumes that every ``chunk`` of a ``document`` belongs to a different modality. - """ - - def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - - @property - def positional_modality(self) -> List[str]: - """Get position per modality. - :return: the list of strings representing the name and order of the modality. - """ - if not self._exec.positional_modality: - raise RuntimeError( - 'Could not know which position of the ndarray to load to each modality' - ) - return self._exec.positional_modality - - def _get_executor_input_arguments( - self, content_by_modality: Dict[str, 'np.ndarray'] - ) -> List['np.ndarray']: - """From a dictionary ``content_by_modality`` it returns the arguments in the proper order so that they can be - passed to the executor. 
- - :param content_by_modality: a dictionary of `Document content` by modality name - :return: list of input arguments as np arrays - """ - return [content_by_modality[modality] for modality in self.positional_modality] - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - """Apply the driver to each of the Documents in docs. - - :param docs: the docs for which a ``multimodal embedding`` will be computed, whose chunks are of different - :param args: unused - :param kwargs: unused - """ - content_by_modality = defaultdict( - list - ) # array of num_rows equal to num_docs and num_columns equal to - - valid_docs = [] - for doc in docs: - # convert to MultimodalDocument - doc = MultimodalDocument(doc) - if doc.modality_content_map: - valid_docs.append(doc) - for modality in self.positional_modality: - content_by_modality[modality].append(doc[modality]) - else: - self.logger.warning( - f'Invalid doc {doc.id}. Only one chunk per modality is accepted' - ) - - if len(valid_docs) > 0: - # Pass a variable length argument (one argument per array) - for modality in self.positional_modality: - content_by_modality[modality] = np.stack(content_by_modality[modality]) - - # Guarantee that the arguments are provided to the executor in its desired order - input_args = self._get_executor_input_arguments(content_by_modality) - embeds = self.exec_fn(*input_args) - if len(valid_docs) != embeds.shape[0]: - self.logger.error( - f'mismatched {len(valid_docs)} docs from level {valid_docs[0].granularity} ' - f'and a {embeds.shape} shape embedding, the first dimension must be the same' - ) - for doc, embedding in zip(valid_docs, embeds): - doc.embedding = embedding diff --git a/jina/drivers/predict.py b/jina/drivers/predict.py deleted file mode 100644 index 6e46caffdd735..0000000000000 --- a/jina/drivers/predict.py +++ /dev/null @@ -1,186 +0,0 @@ -from typing import List, Any, Union, Optional - -import numpy as np - -from . 
import BaseExecutableDriver, FlatRecursiveMixin, DocsExtractUpdateMixin -from ..helper import typename - -if False: - from .. import DocumentArray, Document, NdArray - from ..proto import jina_pb2 - - -class BasePredictDriver( - DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver -): - """Drivers inherited from :class:`BasePredictDriver` will bind :meth:`predict` by default - - :param fields: name of fields to be used to predict tags, default "embeddings" - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__( - self, - executor: Optional[str] = None, - method: str = 'predict', - *args, - **kwargs, - ): - super().__init__(executor, method, *args, **kwargs) - - -class BaseLabelPredictDriver(BasePredictDriver): - """Base class of a Driver for label prediction. - - :param output_tag: output label will be written to ``doc.tags`` - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, output_tag: str = 'prediction', *args, **kwargs): - super().__init__(*args, **kwargs) - self.output_tag = output_tag - - def update_docs(self, docs_pts: 'DocumentArray', exec_results: Any): - """Update doc tags attribute with executor's return - - :param: docs_pts: the set of document to be updated - :param: exec_results: the results from :meth:`exec_fn` - """ - labels = self.prediction2label( - exec_results - ) # type: List[Union[str, List[str]]] - for doc, label in zip(docs_pts, labels): - doc.tags[self.output_tag] = label - - def prediction2label(self, prediction: 'np.ndarray') -> List[Any]: - """Converting ndarray prediction into list of readable labels - - .. 
note:: - ``len(output)`` should be the same as ``prediction.shape[0]`` - - :param prediction: the float/int numpy ndarray given by :class:`BaseClassifier` - :return: the readable label to be stored. - - - - .. # noqa: DAR401 - - - .. # noqa: DAR202 - """ - raise NotImplementedError - - -class BinaryPredictDriver(BaseLabelPredictDriver): - """Converts binary prediction into string label. This is often used with binary classifier. - - :param one_label: label when prediction is one - :param zero_label: label when prediction is zero - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, one_label: str = 'yes', zero_label: str = 'no', *args, **kwargs): - super().__init__(*args, **kwargs) - self.one_label = one_label - self.zero_label = zero_label - - def prediction2label(self, prediction: 'np.ndarray') -> List[str]: - """ - - :param prediction: a (B,) or (B, 1) zero one array - :return: the labels as either ``self.one_label`` or ``self.zero_label`` - - - .. # noqa: DAR401 - """ - p = np.squeeze(prediction) - if p.ndim > 1: - raise ValueError( - f'{typename(self)} expects prediction has ndim=1, but receiving ndim={p.ndim}' - ) - - return [self.one_label if v else self.zero_label for v in p.astype(bool)] - - -class OneHotPredictDriver(BaseLabelPredictDriver): - """Mapping prediction to one of the given labels - - Expect prediction to be 2dim array, zero-one valued. Each row corresponds to - a sample, each column corresponds to a label. Each row can have only one 1. - - This is often used with multi-class classifier. - """ - - def __init__(self, labels: List[str], *args, **kwargs): - super().__init__(*args, **kwargs) - self.labels = labels - - def validate_labels(self, prediction: 'np.ndarray'): - """Validate the labels. - - :param prediction: the predictions - - - .. 
# noqa: DAR401 - """ - if prediction.ndim != 2: - raise ValueError( - f'{typename(self)} expects prediction to have ndim=2, but received {prediction.ndim}' - ) - if prediction.shape[1] != len(self.labels): - raise ValueError( - f'{typename(self)} expects prediction.shape[1]==len(self.labels), but received {prediction.shape}' - ) - - def prediction2label(self, prediction: 'np.ndarray') -> List[str]: - """ - - :param prediction: a (B, C) array where C is the number of classes, only one element can be one - :return: the list of labels - """ - self.validate_labels(prediction) - p = np.argmax(prediction, axis=1) - return [self.labels[v] for v in p] - - -class MultiLabelPredictDriver(OneHotPredictDriver): - """Mapping prediction to a list of labels - - Expect prediction to be 2dim array, zero-one valued. Each row corresponds to - a sample, each column corresponds to a label. Each row can have only multiple 1s. - - This is often used with multi-label classifier, where each instance can have multiple labels - """ - - def prediction2label(self, prediction: 'np.ndarray') -> List[List[str]]: - """Transform the prediction into labels. - - :param prediction: the array of predictions - :return: nested list of labels - """ - self.validate_labels(prediction) - return [[self.labels[int(pp)] for pp in p.nonzero()[0]] for p in prediction] - - -class Prediction2DocBlobDriver(BasePredictDriver): - """Write the prediction result directly into ``document.blob``. - - .. warning:: - - This will erase the content in ``document.text`` and ``document.buffer``. - """ - - def update_single_doc( - self, - doc: 'Document', - exec_result: Union['np.ndarray', 'jina_pb2.NdArrayProto', 'NdArray'], - ) -> None: - """Update doc blob with executor's return. 
- - :param doc: the Document object - :param exec_result: the single result from :meth:`exec_fn` - """ - doc.blob = exec_result diff --git a/jina/drivers/querylang/filter.py b/jina/drivers/querylang/filter.py deleted file mode 100644 index acdb47515ff9e..0000000000000 --- a/jina/drivers/querylang/filter.py +++ /dev/null @@ -1,53 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Dict, Any, Iterable - -from ...types.querylang.queryset.lookup import Q -from .. import QuerySetReader, BaseRecursiveDriver, ContextAwareRecursiveMixin - -if False: - from ...types.arrays import DocumentArray - - -class FilterQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver): - """Filters incoming `docs` by evaluating a series of `lookup rules`. - - This is often useful when the proceeding Pods require only a signal, not the full message. - - Example :: - - !FilterQL - with: - lookups: {modality: mode2} - - !EncodeDriver - with: - method: encode - - ensures that the EncodeDriver will only get documents which modality field value is `mode2` by filtering - those documents at the specific levels that do not comply with this condition - - :param lookups: (dict) a dictionary where keys are interpreted by ``:class:`LookupLeaf`` to form a - an evaluation function. 
For instance, a dictionary ``{ modality__in: [mode1, mode2] }``, would create - an evaluation function that will check if the field `modality` is found in `[mode1, mode2]` - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, lookups: Dict[str, Any], *args, **kwargs): - super().__init__(*args, **kwargs) - self._lookups = lookups - - def _apply_all( - self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs - ) -> None: - for docs in doc_sequences: - if self.lookups: - _lookups = Q(**self.lookups) - miss_idx = [] - for idx, doc in enumerate(docs): - if not _lookups.evaluate(doc): - miss_idx.append(idx) - - # delete non-exit matches in reverse - for j in reversed(miss_idx): - del docs[j] diff --git a/jina/drivers/querylang/reverse.py b/jina/drivers/querylang/reverse.py deleted file mode 100644 index 671b4f409cd05..0000000000000 --- a/jina/drivers/querylang/reverse.py +++ /dev/null @@ -1,31 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Iterable, Tuple - -from .. import QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver - -if False: - from ...types.arrays import DocumentArray - - -class ReverseQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver): - """Reverses the order of the provided ``docs``. - - This is often useful when the proceeding Pods require only a signal, not the full message. 
- - Example :: - - !Chunk2DocRankerDriver {} - - !ReverseQL {} - - will reverse the order of the documents returned by the `Chunk2DocRankerDriver` before sending them to the next `Pod` - """ - - def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - - def _apply_all( - self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs - ) -> None: - for docs in doc_sequences: - docs.reverse() diff --git a/jina/drivers/querylang/select.py b/jina/drivers/querylang/select.py deleted file mode 100644 index 9e0f5446e6e18..0000000000000 --- a/jina/drivers/querylang/select.py +++ /dev/null @@ -1,107 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Union, Tuple - -from .. import QuerySetReader, FlatRecursiveMixin, BaseRecursiveDriver - -# noinspection PyUnreachableCode -if False: - from ...types.arrays import DocumentArray - - -class ExcludeQL(QuerySetReader, FlatRecursiveMixin, BaseRecursiveDriver): - """Clean some fields from the document-level protobuf to reduce the total size of the request - Example:: - - !ExcludeQL - with: - fields: - - chunks - - buffer - - ExcludeQL will avoid `buffer` and `chunks` fields to be sent to the next `Pod` - - :param fields: the pruned field names in tuple - :param traversal_paths: the traversal paths - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__( - self, - fields: Union[Tuple, str], - traversal_paths: Tuple[str] = ('r',), - *args, - **kwargs, - ): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - if isinstance(fields, str): - self._fields = [fields] - else: - self._fields = [field for field in fields] - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs): - for doc 
in docs: - for k in self.fields: - doc.ClearField(k) - - -class SelectQL(ExcludeQL): - """Selects some fields from the chunk-level protobuf to reduce the total size of the request, it works with the opposite - logic as `:class:`ExcludeQL` - - Example:: - - !SelectQL - with: - fields: - - matches - - SelectQL will ensure that the `outgoing` documents only contain the field `matches` - """ - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs): - for doc in docs: - for k in doc.DESCRIPTOR.fields_by_name.keys(): - if k not in self.fields: - doc.ClearField(k) - - -class ExcludeReqQL(ExcludeQL): - """Clean up request from the request-level protobuf message to reduce the total size of the message - - This is often useful when the proceeding Pods require only a signal, not the full message. - """ - - def __call__(self, *args, **kwargs): - """ - - - .. # noqa: DAR102 - - - .. # noqa: DAR101 - """ - for k in self.fields: - self.req.ClearField(k) - - -class SelectReqQL(ExcludeReqQL): - """Clean up request from the request-level protobuf message to reduce the total size of the message, it works with the opposite - logic as `:class:`ExcludeReqQL` - - - .. # noqa: DAR101 - """ - - def __call__(self, *args, **kwargs): - """ - - - .. # noqa: DAR102 - - - .. # noqa: DAR101 - """ - for k in self.req.DESCRIPTOR.fields_by_name.keys(): - if k not in self.fields: - self.req.ClearField(k) diff --git a/jina/drivers/querylang/slice.py b/jina/drivers/querylang/slice.py deleted file mode 100644 index 180df1a23643e..0000000000000 --- a/jina/drivers/querylang/slice.py +++ /dev/null @@ -1,58 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import sys - -from typing import Iterable, Optional - -from .. 
import QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver - -if False: - from ...types.arrays.document import DocumentArray - - -class SliceQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver): - """Restrict the size of the ``docs`` to ``k`` (given by the request) - - Example:: - - !ReduceAllDriver - with: - traversal_paths: ['m'] - - !SortQL - with: - reverse: true - field: 'score__value' - traversal_paths: ['m'] - - !SliceQL - with: - start: 0 - end: 50 - traversal_paths: ['m'] - - `SliceQL` will ensure that only the first 50 documents are returned from this `Pod` - - :param start: Zero-based index at which to start extraction. - :param end: Zero-based index before which to end extraction. - slice extracts up to but not including end. For example, take(1,4) extracts - the second element through the fourth element (elements indexed 1, 2, and 3). - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, start: int, end: Optional[int] = None, *args, **kwargs): - super().__init__(*args, **kwargs) - self._start = int(start) - if end is None: - self._end = sys.maxsize - else: - self._end = int(end) - - def _apply_all( - self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs - ) -> None: - for docs in doc_sequences: - if self.start <= 0 and (self.end is None or self.end >= len(docs)): - pass - else: - del docs[int(self.end) :] - del docs[: int(self.start)] diff --git a/jina/drivers/querylang/sort.py b/jina/drivers/querylang/sort.py deleted file mode 100644 index 17a30294feca3..0000000000000 --- a/jina/drivers/querylang/sort.py +++ /dev/null @@ -1,56 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Iterable, Tuple - -from ...types.querylang.queryset.dunderkey import dunder_get -from .. 
import QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver - -if False: - from ...types.arrays import DocumentArray - - -class SortQL(QuerySetReader, ContextAwareRecursiveMixin, BaseRecursiveDriver): - """Sorts the incoming of the documents by the value of a given field. - It can also work in reverse mode - - Example:: - - !ReduceAllDriver - with: - traversal_paths: ['m'] - - !SortQL - with: - reverse: true - field: 'score__value' - traversal_paths: ['m'] - - !SliceQL - with: - start: 0 - end: 50 - traversal_paths: ['m'] - - `SortQL` will ensure that only the documents are sorted by the score value before slicing the first top 50 documents - :param field: the value of the field drives the sort of the iterable docs - :param reverse: sort the value from big to small - :param traversal_paths: the traversal paths - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__( - self, - field: str, - reverse: bool = False, - traversal_paths: Tuple[str] = ('r',), - *args, - **kwargs, - ): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - self._reverse = reverse - self._field = field - - def _apply_all( - self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs - ) -> None: - for docs in doc_sequences: - docs.sort(key=lambda x: dunder_get(x, self.field), reverse=self.reverse) diff --git a/jina/drivers/rank/__init__.py b/jina/drivers/rank/__init__.py deleted file mode 100644 index 27b52f95a06da..0000000000000 --- a/jina/drivers/rank/__init__.py +++ /dev/null @@ -1,121 +0,0 @@ -from typing import Tuple, Optional, Iterable - -from .. 
import BaseExecutableDriver, FlatRecursiveMixin -from ...types.arrays import MatchArray -from ...types.score import NamedScore - -if False: - from ...types.arrays import DocumentArray - - -class BaseRankDriver(FlatRecursiveMixin, BaseExecutableDriver): - """Drivers inherited from this Driver will bind :meth:`rank` by default """ - - def __init__( - self, executor: Optional[str] = None, method: str = 'score', *args, **kwargs - ): - super().__init__(executor, method, *args, **kwargs) - - @property - def _exec_match_keys(self): - """Property to provide backward compatibility to executors relying in `required_keys` - :return: keys for attribute lookup in matches - """ - return getattr( - self.exec, 'match_required_keys', getattr(self.exec, 'required_keys', None) - ) - - @property - def _exec_query_keys(self): - """Property to provide backward compatibility to executors relying in `required_keys` - - :return: keys for attribute lookup in matches - """ - return getattr( - self.exec, 'query_required_keys', getattr(self.exec, 'required_keys', None) - ) - - -class Matches2DocRankDriver(BaseRankDriver): - """This driver is intended to only resort the given matches on the 0 level granularity for a document. - It gets the scores from a Ranking Executor, which does only change the scores of matches. - Afterwards, the Matches2DocRankDriver resorts all matches for a document. 
- Input-Output :: - Input: - document: {granularity: 0, adjacency: k} - |- matches: {granularity: 0, adjacency: k+1} - Output: - document: {granularity: 0, adjacency: k} - |- matches: {granularity: 0, adjacency: k+1} (Sorted according to scores from Ranker Executor) - """ - - def __init__( - self, - reverse: bool = True, - traversal_paths: Tuple[str] = ('r',), - *args, - **kwargs, - ): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - self.reverse = reverse - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - """ - - :param docs: the matches of the ``context_doc``, they are at granularity ``k`` - :param args: not used (kept to maintain interface) - :param kwargs: not used (kept to maintain interface) - - .. note:: - - This driver will change in place the ordering of ``matches`` of the ``context_doc`. - - Set the ``traversal_paths`` of this driver such that it traverses along the ``matches`` of the ``chunks`` at the level desired. - """ - old_scores = [] - queries_metas = [] - matches_metas = [] - for doc in docs: - query_meta = ( - doc.get_attrs(*self._exec_query_keys) if self._exec_query_keys else None - ) - - matches = doc.matches - old_match_scores = [] - needs_match_meta = self._exec_match_keys is not None - match_meta = [] if needs_match_meta else None - for match in matches: - old_match_scores.append(match.score.value) - if needs_match_meta: - match_meta.append(match.get_attrs(*self._exec_match_keys)) - - # if there are no matches, no need to sort them - old_scores.append(old_match_scores) - queries_metas.append(query_meta) - matches_metas.append(match_meta) - - new_scores = self.exec_fn(old_scores, queries_metas, matches_metas) - if len(new_scores) != len(docs): - msg = f'The number of scores {len(new_scores)} does not match the number of queries {len(docs)}' - self.logger.error(msg) - raise ValueError(msg) - - for doc, scores in zip(docs, new_scores): - matches = doc.matches - if len(doc.matches) != len(scores): - 
msg = ( - f'The number of matches to be scored {len(doc.matches)} do not match the number of scores returned ' - f'by the ranker {self.exec.__name__} for doc: {doc.id} ' - ) - self.logger.error(msg) - raise ValueError(msg) - self._sort_matches_in_place(matches, scores) - - def _sort_matches_in_place( - self, matches: 'MatchArray', match_scores: Iterable[float] - ) -> None: - op_name = self.exec.__class__.__name__ - ref_doc_id = matches._ref_doc.id - - for match, score in zip(matches, match_scores): - match.score = NamedScore(value=score, op_name=op_name, ref_id=ref_doc_id) - - matches.sort(key=lambda x: x.score.value, reverse=self.reverse) diff --git a/jina/drivers/rank/aggregate/__init__.py b/jina/drivers/rank/aggregate/__init__.py deleted file mode 100644 index b4e406fa6d958..0000000000000 --- a/jina/drivers/rank/aggregate/__init__.py +++ /dev/null @@ -1,307 +0,0 @@ -from typing import Dict, List, Tuple -from collections import defaultdict, namedtuple - -import numpy as np - -from ....executors.rankers import Chunk2DocRanker -from ....types.document import Document -from ....types.score import NamedScore - -from .. import BaseRankDriver - -if False: - from ....types.arrays import DocumentArray - -COL_STR_TYPE = 'U64' #: the ID column data type for score matrix - - -class BaseAggregateMatchesRankerDriver(BaseRankDriver): - """Drivers inherited from this Driver focus on aggregating scores from `chunks` to its `parents`. - - :param keep_source_matches_as_chunks: A flag to indicate if the driver must return the old matches of the query or its chunks - (at a greater granularity level (k + 1)) as the chunks of the new computed `matches` (at granularity level k) - Set it to `True` when keeping track of the chunks that lead to a retrieved result. - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - - .. 
note:: - When set `keep_source_matches_as_chunks=True`, the chunks of the match contains **ONLY** the chunks leading - to the match rather than **ALL** the chunks of the match.""" - - def __init__(self, keep_source_matches_as_chunks: bool = False, *args, **kwargs): - super().__init__(*args, **kwargs) - self.keep_source_matches_as_chunks = keep_source_matches_as_chunks - - QueryMatchInfo = namedtuple( - 'QueryMatchInfo', 'match_parent_id match_id query_id score' - ) - - def _extract_query_match_info(self, match: Document, query: Document): - return self.QueryMatchInfo( - match_parent_id=match.parent_id, - match_id=match.id, - query_id=query.id, - score=match.score.value, - ) - - def _insert_query_matches( - self, - query: Document, - parent_id_chunk_id_map: dict, - chunk_matches_by_id: dict, - docs_scores: 'np.ndarray', - ): - """ - :param query: the query Document where the resulting matches will be inserted - :param parent_id_chunk_id_map: a map with parent_id as key and list of previous matches ids as values - :param chunk_matches_by_id: the previous matches of the query (at a higher granularity) grouped by the new map (by its parent) - :param docs_scores: An `np.ndarray` resulting from the ranker executor with the `scores` of the new matches - """ - - op_name = self.exec.__class__.__name__ - for doc_id, score in docs_scores: - m = Document(id=doc_id) - m.score = NamedScore(op_name=op_name, value=score) - if self.keep_source_matches_as_chunks: - for match_chunk_id in parent_id_chunk_id_map[doc_id]: - m.chunks.append(chunk_matches_by_id[match_chunk_id]) - query.matches.append(m) - - @staticmethod - def _group_by(match_idx, col_name): - """ - Create an list of numpy arrays with the same ``col_name`` in each position of the list - - :param match_idx: Numpy array of Tuples with document id and score - :param col_name: Column name in the structured numpy array of Tuples - - :return: List of numpy arrays with the same ``doc_id`` in each position of the list - :rtype: 
np.ndarray. - """ - _sorted_m = np.sort(match_idx, order=col_name) - list_numpy_arrays = [] - prev_val = _sorted_m[col_name][0] - prev_index = 0 - for i, current_val in enumerate(_sorted_m[col_name]): - if current_val != prev_val: - list_numpy_arrays.append(_sorted_m[prev_index:i]) - prev_index = i - prev_val = current_val - list_numpy_arrays.append(_sorted_m[prev_index:]) - return list_numpy_arrays - - @staticmethod - def _sort_doc_by_score(r): - """ - Sort a numpy array of dtype (``doc_id``, ``score``) by the ``score``. - - :param r: Numpy array of Tuples with document id and score - :type r: np.ndarray[Tuple[np.str_, np.float64]] - """ - r[::-1].sort(order=Chunk2DocRanker.COL_SCORE) - - def _score( - self, match_idx: 'np.ndarray', query_chunk_meta: Dict, match_chunk_meta: Dict - ) -> 'np.ndarray': - """ - Translate the chunk-level top-k results into doc-level top-k results. Some score functions may leverage the - meta information of the query, hence the meta info of the query chunks and matched chunks are given - as arguments. - - :param match_idx: A [N x 4] numpy ``ndarray``, column-wise: - - ``match_idx[:, 0]``: ``doc_id`` of the matched chunks, integer - - ``match_idx[:, 1]``: ``chunk_id`` of the matched chunks, integer - - ``match_idx[:, 2]``: ``chunk_id`` of the query chunks, integer - - ``match_idx[:, 3]``: distance/metric/score between the query and matched chunks, float - :type match_idx: np.ndarray. - :param query_chunk_meta: The meta information of the query chunks, where the key is query chunks' ``chunk_id``, - the value is extracted by the ``query_required_keys``. - :param match_chunk_meta: The meta information of the matched chunks, where the key is matched chunks' - ``chunk_id``, the value is extracted by the ``match_required_keys``. - :return: A [N x 2] numpy ``ndarray``, where the first column is the matched documents' ``doc_id`` (integer) - the second column is the score/distance/metric between the matched doc and the query doc (float). 
- :rtype: np.ndarray. - """ - _groups = self._group_by(match_idx, Chunk2DocRanker.COL_PARENT_ID) - n_groups = len(_groups) - res = np.empty( - (n_groups,), - dtype=[ - (Chunk2DocRanker.COL_PARENT_ID, COL_STR_TYPE), - (Chunk2DocRanker.COL_SCORE, np.float64), - ], - ) - - for i, _g in enumerate(_groups): - res[i] = ( - _g[Chunk2DocRanker.COL_PARENT_ID][0], - self.exec_fn(_g, query_chunk_meta, match_chunk_meta), - ) - - self._sort_doc_by_score(res) - return res - - -class Chunk2DocRankDriver(BaseAggregateMatchesRankerDriver): - """Extract matches score from chunks and use the executor to compute the rank and assign the resulting matches to the - level above. - - Input-Output :: - Input: - document: {granularity: k-1} - |- chunks: {granularity: k} - | |- matches: {granularity: k} - | - |- chunks: {granularity: k} - |- matches: {granularity: k} - Output: - document: {granularity: k-1} - |- chunks: {granularity: k} - | |- matches: {granularity: k} - | - |- chunks: {granularity: k} - | |- matches: {granularity: k} - | - |-matches: {granularity: k-1} (Ranked according to Ranker Executor) - """ - - def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - """ - :param docs: the doc which gets bubbled up matches - :param args: not used (kept to maintain interface) - :param kwargs: not used (kept to maintain interface) - """ - for doc in docs: - chunks = doc.chunks - match_idx = [] # type: List[Tuple[str, str, str, float]] - query_meta = {} # type: Dict[str, Dict] - match_meta = {} # type: Dict[str, Dict] - parent_id_chunk_id_map = defaultdict(list) - matches_by_id = defaultdict(Document) - for chunk in chunks: - query_meta[chunk.id] = ( - chunk.get_attrs(*self._exec_query_keys) - if self._exec_query_keys - else None - ) - for match in chunk.matches: - match_info = self._extract_query_match_info( - match=match, 
query=chunk - ) - match_idx.append(match_info) - match_meta[match.id] = ( - match.get_attrs(*self._exec_match_keys) - if self._exec_match_keys - else None - ) - parent_id_chunk_id_map[match.parent_id].append(match.id) - matches_by_id[match.id] = match - - if match_idx: - match_idx = np.array( - match_idx, - dtype=[ - (Chunk2DocRanker.COL_PARENT_ID, COL_STR_TYPE), - (Chunk2DocRanker.COL_DOC_CHUNK_ID, COL_STR_TYPE), - (Chunk2DocRanker.COL_QUERY_CHUNK_ID, COL_STR_TYPE), - (Chunk2DocRanker.COL_SCORE, np.float64), - ], - ) - - docs_scores = self._score(match_idx, query_meta, match_meta) - - self._insert_query_matches( - query=doc, - parent_id_chunk_id_map=parent_id_chunk_id_map, - chunk_matches_by_id=matches_by_id, - docs_scores=docs_scores, - ) - - -class AggregateMatches2DocRankDriver(BaseAggregateMatchesRankerDriver): - """This Driver is intended to take a `document` with matches at a `given granularity > 0`, clear those matches and substitute - these matches by the documents at a lower granularity level. - Input-Output :: - Input: - document: {granularity: k} - |- matches: {granularity: k} - - Output: - document: {granularity: k} - |- matches: {granularity: k-1} (Sorted according to Ranker Executor) - - Imagine a case where we are querying a system with text documents chunked by sentences. When we query the system, - we use sentences (chunks) to query it. So at some point we will have: - `query sentence (documents of granularity 1): - matches: indexed sentences (documents of level depth 1)` - ` - But in the output we want to have the full document that better matches the `sentence`. - `query sentence (documents of granularity 1): - matches: indexed full documents (documents of granularity 0). - ` - Using this Driver before querying a Binary Index with full binary document data can be very useful to implement a search system. 
- """ - - def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - """ - - :param docs: the document at granularity ``k`` - :param args: not used (kept to maintain interface) - :param kwargs: not used (kept to maintain interface) - - .. note:: - - This driver will substitute the ``matches`` of `docs` to the corresponding ``parent documents`` of its current ``matches`` according - to the executor. - - Set the ``traversal_paths`` of this driver to identify the documents, which needs to get bubbled up matches. - """ - - for doc in docs: - matches = doc.matches - - match_idx = [] - query_meta = {} - match_meta = {} - parent_id_chunk_id_map = defaultdict(list) - matches_by_id = defaultdict(Document) - - query_meta[doc.id] = ( - doc.get_attrs(*self._exec_query_keys) if self._exec_query_keys else None - ) - - for match in matches: - match_info = self._extract_query_match_info(match=match, query=doc) - match_idx.append(match_info) - match_meta[match.id] = ( - match.get_attrs(*self._exec_match_keys) - if self._exec_match_keys - else None - ) - parent_id_chunk_id_map[match.parent_id].append(match.id) - matches_by_id[match.id] = match - - if match_idx: - match_idx = np.array( - match_idx, - dtype=[ - (Chunk2DocRanker.COL_PARENT_ID, COL_STR_TYPE), - (Chunk2DocRanker.COL_DOC_CHUNK_ID, COL_STR_TYPE), - (Chunk2DocRanker.COL_QUERY_CHUNK_ID, COL_STR_TYPE), - (Chunk2DocRanker.COL_SCORE, np.float64), - ], - ) - - docs_scores = self._score(match_idx, query_meta, match_meta) - # This ranker will change the current matches - doc.ClearField('matches') - self._insert_query_matches( - query=doc, - parent_id_chunk_id_map=parent_id_chunk_id_map, - chunk_matches_by_id=matches_by_id, - docs_scores=docs_scores, - ) diff --git a/jina/drivers/reduce.py b/jina/drivers/reduce.py deleted file mode 100644 index 47a5315edd401..0000000000000 
--- a/jina/drivers/reduce.py +++ /dev/null @@ -1,89 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Tuple, Iterable - -from collections import defaultdict - -import numpy as np - -from . import ContextAwareRecursiveMixin, BaseRecursiveDriver, FlatRecursiveMixin -from ..types.arrays import ChunkArray, MatchArray, DocumentArray - - -class ReduceAllDriver(ContextAwareRecursiveMixin, BaseRecursiveDriver): - """:class:`ReduceAllDriver` merges chunks/matches from all requests, recursively. - - .. note:: - - It uses the last request as a reference. - """ - - def __init__(self, traversal_paths: Tuple[str] = ('c',), *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - - def _apply_root(self, docs): - request = self.msg.request - request.body.ClearField('docs') - request.docs.extend(docs) - - def _apply_all( - self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs - ) -> None: - doc_pointers = {} - for docs in doc_sequences: - if isinstance(docs, (ChunkArray, MatchArray)): - context_id = docs.reference_doc.id - if context_id not in doc_pointers: - doc_pointers[context_id] = docs.reference_doc - else: - if isinstance(docs, ChunkArray): - doc_pointers[context_id].chunks.extend(docs) - else: - doc_pointers[context_id].matches.extend(docs) - else: - self._apply_root(docs) - - -class CollectEvaluationDriver(FlatRecursiveMixin, BaseRecursiveDriver): - """Merge all evaluations into one, grouped by ``doc.id`` """ - - def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - doc_pointers = {} - for doc in docs: - if doc.id not in doc_pointers: - doc_pointers[doc.id] = doc.evaluations - else: - doc_pointers[doc.id].extend(doc.evaluations) - - -class ConcatEmbedDriver(BaseRecursiveDriver): - 
"""Concat all embeddings into one, grouped by ``doc.id`` """ - - def __init__(self, traversal_paths: Tuple[str] = ('r',), *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - - def __call__(self, *args, **kwargs): - """Performs the concatenation of all embeddings in `self.docs`. - - :param args: args not used. Only for complying with parent class interface. - :param kwargs: kwargs not used. Only for complying with parent class interface. - """ - all_documents = self.docs.traverse_flatten(self._traversal_paths) - doc_pointers = self._collect_embeddings(all_documents) - - last_request_documents = self.req.docs.traverse_flatten(self._traversal_paths) - self._concat_apply(last_request_documents, doc_pointers) - - def _collect_embeddings(self, docs: 'DocumentArray'): - doc_pointers = defaultdict(list) - for doc in docs: - doc_pointers[doc.id].append(doc.embedding) - return doc_pointers - - def _concat_apply(self, docs, doc_pointers): - for doc in docs: - doc.embedding = np.concatenate(doc_pointers[doc.id], axis=0) diff --git a/jina/drivers/search.py b/jina/drivers/search.py deleted file mode 100644 index ac10a39245837..0000000000000 --- a/jina/drivers/search.py +++ /dev/null @@ -1,185 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Iterable, Tuple, Optional - -from . 
import ( - BaseExecutableDriver, - QuerySetReader, - FlatRecursiveMixin, - ContextAwareRecursiveMixin, -) -from ..enums import EmbeddingClsType -from ..types.document import Document -from ..types.score import NamedScore - -if False: - from ..types.arrays import DocumentArray - - -class BaseSearchDriver(BaseExecutableDriver): - """Drivers inherited from this Driver will bind :meth:`query` by default """ - - def __init__( - self, - executor: Optional[str] = None, - method: str = 'query', - traversal_paths: Tuple[str] = ('r', 'c'), - *args, - **kwargs, - ): - super().__init__( - executor, method, traversal_paths=traversal_paths, *args, **kwargs - ) - - -class KVSearchDriver(ContextAwareRecursiveMixin, BaseSearchDriver): - """Fill in the results using the :class:`jina.executors.indexers.meta.BinaryPbIndexer` - - .. warning:: - This driver runs a query for each document. - This may not be very efficient, as the total number of queries grows cubic with the number of documents, chunks - per document and top-k. 
- - - traversal_paths = ['m'] => D x K - - traversal_paths = ['r'] => D - - traversal_paths = ['cm'] => D x C x K - - traversal_paths = ['m', 'cm'] => D x K + D x C x K - - where: - - D is the number of queries - - C is the number of chunks per document - - K is the top-k - - :param is_update: when set to true the retrieved docs are merged into current message; - otherwise, the retrieved Document overrides the existing Document - :param traversal_paths: traversal paths for the driver - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__( - self, - is_update: bool = True, - traversal_paths: Tuple[str] = ('m',), - *args, - **kwargs, - ): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - self._is_update = is_update - - def _apply_all( - self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs - ) -> None: - - for docs in doc_sequences: - miss_idx = ( - [] - ) #: missed hit results, some search may not end with results. 
especially in shards - serialized_docs = self.exec_fn([d.id for d in docs]) - - for idx, (retrieved_doc, serialized_doc) in enumerate( - zip(docs, serialized_docs) - ): - if serialized_doc: - r = Document(serialized_doc) - if self._is_update: - retrieved_doc.update(r) - else: - retrieved_doc.CopyFrom(r) - else: - miss_idx.append(idx) - - # delete non-existed matches in reverse - for j in reversed(miss_idx): - del docs[j] - - -class VectorFillDriver(FlatRecursiveMixin, QuerySetReader, BaseSearchDriver): - """Fill in the embedding by their document id.""" - - def __init__( - self, - executor: Optional[str] = None, - method: str = 'query_by_key', - *args, - **kwargs, - ): - super().__init__(executor, method, *args, **kwargs) - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - embeds = self.exec_fn([d.id for d in docs]) - for doc, embedding in zip(docs, embeds): - doc.embedding = embedding - - -class VectorSearchDriver(FlatRecursiveMixin, QuerySetReader, BaseSearchDriver): - """Extract dense embeddings from the request for the executor to query. - - :param top_k: top-k document ids to retrieve - :param fill_embedding: fill in the embedding of the corresponding doc, - this requires the executor to implement :meth:`query_by_key` - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization""" - - def __init__(self, top_k: int = 50, fill_embedding: bool = False, *args, **kwargs): - super().__init__(*args, **kwargs) - self._top_k = top_k - self._fill_embedding = fill_embedding - - @property - def exec_embedding_cls_type(self) -> EmbeddingClsType: - """Get the sparse class type of the attached executor. 
- - :return: Embedding class type of the attached executor, default value is `dense` - """ - return EmbeddingClsType.from_string(self.exec.embedding_cls_type) - - def _get_documents_embeddings(self, docs: 'DocumentArray'): - embedding_cls_type = self.exec_embedding_cls_type - if embedding_cls_type.is_dense: - return docs.all_embeddings - else: - return docs.get_all_sparse_embeddings(embedding_cls_type=embedding_cls_type) - - def _fill_matches(self, doc, op_name, topks, scores, topk_embed): - embedding_cls_type = self.exec_embedding_cls_type - if embedding_cls_type.is_dense: - for numpy_match_id, score, vector in zip(topks, scores, topk_embed): - m = Document(id=numpy_match_id) - m.score = NamedScore(op_name=op_name, value=score) - r = doc.matches.append(m) - if vector is not None: - r.embedding = vector - else: - for idx, (numpy_match_id, score) in enumerate(zip(topks, scores)): - vector = None - if topk_embed[idx] is not None: - vector = topk_embed.getrow(idx) - m = Document(id=numpy_match_id) - m.score = NamedScore(op_name=op_name, value=score) - match = doc.matches.append(m) - if vector is not None: - match.embedding = vector - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs) -> None: - embed_vecs, doc_pts = self._get_documents_embeddings(docs) - - if not doc_pts: - return - - fill_fn = getattr(self.exec, 'query_by_key', None) - if self._fill_embedding and not fill_fn: - self.logger.warning( - f'"fill_embedding=True" but {self.exec} does not have "query_by_key" method' - ) - - idx, dist = self.exec_fn(embed_vecs, top_k=int(self.top_k)) - op_name = self.exec.__class__.__name__ - for doc, topks, scores in zip(doc_pts, idx, dist): - topk_embed = ( - fill_fn(topks) - if (self._fill_embedding and fill_fn) - else [None] * len(topks) - ) - self._fill_matches(doc, op_name, topks, scores, topk_embed) diff --git a/jina/drivers/segment.py b/jina/drivers/segment.py deleted file mode 100644 index 48fb5a0d9158b..0000000000000 --- a/jina/drivers/segment.py +++ 
/dev/null @@ -1,41 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Optional, Tuple, Dict, List - -from . import BaseExecutableDriver, FlatRecursiveMixin, DocsExtractUpdateMixin -from ..types.document import Document - - -class SegmentDriver(DocsExtractUpdateMixin, FlatRecursiveMixin, BaseExecutableDriver): - """Drivers inherited from this Driver will bind :meth:`segment` by default """ - - def __init__( - self, - executor: Optional[str] = None, - method: str = 'segment', - traversal_paths: Tuple[str] = ('r',), - *args, - **kwargs, - ): - super().__init__( - executor, method, traversal_paths=traversal_paths, *args, **kwargs - ) - - @property - def _stack_document_content(self): - return False - - def update_single_doc(self, doc: 'Document', exec_result: List[Dict]) -> None: - """Update the document's chunks field with executor's returns. - - :param doc: the Document object - :param exec_result: the single result from :meth:`exec_fn` - """ - new_chunks = [] - for chunk in exec_result: - with Document(**chunk) as c: - if not c.mime_type: - c.mime_type = doc.mime_type - new_chunks.append(c) - doc.chunks.extend(new_chunks) diff --git a/jina/drivers/train/rank/__init__.py b/jina/drivers/train/rank/__init__.py deleted file mode 100644 index 337f3e7cb20b7..0000000000000 --- a/jina/drivers/train/rank/__init__.py +++ /dev/null @@ -1,40 +0,0 @@ -from ...rank import Matches2DocRankDriver -from ....types.sets import DocumentSet - - -class RankerTrainerDriver(Matches2DocRankDriver): - """Ranker trainer driver.""" - - def __init__(self, method: str = 'train', *args, **kwargs): - super().__init__(method=method, *args, **kwargs) - - def _apply_all(self, docs: 'DocumentSet', *args, **kwargs) -> None: - """ - - :param docs: the matches of the ``context_doc``, they are at granularity ``k`` - :param args: not used (kept to maintain interface) - :param kwargs: not used (kept to maintain interface) - - 
.. note:: - - This driver will change in place the ordering of ``matches`` of the ``context_doc`. - - Set the ``traversal_paths`` of this driver such that it traverses along the ``matches`` of the ``chunks`` at the level desired. - """ - queries_metas = [] - matches_metas = [] - for doc in docs: - query_meta = ( - doc.get_attrs(*self._exec_query_keys) if self._exec_query_keys else None - ) - - matches = doc.matches - needs_match_meta = self._exec_match_keys is not None - match_meta = [] if needs_match_meta else None - for match in matches: - if needs_match_meta: - match_meta.append(match.get_attrs(*self._exec_match_keys)) - - # if there are no matches, no need to sort them - queries_metas.append(query_meta) - matches_metas.append(match_meta) - - self.exec_fn(queries_metas, matches_metas) diff --git a/jina/enums.py b/jina/enums.py index b449af810d95f..23029a6bbbeec 100644 --- a/jina/enums.py +++ b/jina/enums.py @@ -16,9 +16,6 @@ parallel_type: any """ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - from enum import IntEnum, EnumMeta @@ -211,15 +208,6 @@ def paired(self) -> 'SocketType': }[self] -class FlowOutputType(BetterEnum): - """The enum for representing flow output config.""" - - SHELL_PROC = 0 #: a shell-script, run each microservice as a process - SHELL_DOCKER = 1 #: a shell-script, run each microservice as a container - DOCKER_SWARM = 2 #: a docker-swarm YAML config - K8S = 3 #: a Kubernetes YAML config - - class FlowBuildLevel(BetterEnum): """ The enum for representing a flow's build level. 
@@ -263,12 +251,8 @@ def is_inspect(self) -> bool: class RequestType(BetterEnum): """The enum of Client mode.""" - INDEX = 0 - SEARCH = 1 - DELETE = 2 - UPDATE = 3 - CONTROL = 4 - TRAIN = 5 + DATA = 0 + CONTROL = 1 class CompressAlgo(BetterEnum): @@ -305,9 +289,8 @@ class OnErrorStrategy(BetterEnum): IGNORE = ( 0 #: Ignore it, keep running all Drivers & Executors logics in the sequel flow ) - SKIP_EXECUTOR = 1 #: Skip all Executors in the sequel, but drivers are still called - SKIP_HANDLE = 2 #: Skip all Drivers & Executors in the sequel, only `pre_hook` and `post_hook` are called - THROW_EARLY = 3 #: Immediately throw the exception, the sequel flow will not be running at all + SKIP_HANDLE = 1 #: Skip all Executors in the sequel, only `pre_hook` and `post_hook` are called + THROW_EARLY = 2 #: Immediately throw the exception, the sequel flow will not be running at all class FlowInspectType(BetterEnum): diff --git a/jina/excepts.py b/jina/excepts.py index b600e66de7b1a..cff8dfd541fa6 100644 --- a/jina/excepts.py +++ b/jina/excepts.py @@ -1,8 +1,5 @@ """This modules defines all kinds of exceptions raised in Jina.""" -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - class NoExplicitMessage(Exception): """Waiting until all partial messages are received.""" @@ -32,10 +29,6 @@ class NoAvailablePortError(Exception): """When no available random port could be found""" -class DriverError(Exception): - """Driver related exceptions.""" - - class RuntimeTerminated(KeyboardInterrupt): """The event loop of BasePea ends.""" @@ -44,18 +37,6 @@ class PodRunTimeError(Exception): """The error propagated by Pods when Executor throws an exception.""" -class DriverNotInstalled(DriverError): - """Driver is not installed in the BasePea.""" - - -class NoDriverForRequest(DriverError): - """No matched driver for this request.""" - - -class UnattachedDriver(DriverError): - """Driver is not attached to any BasePea or executor.""" - - class UnknownControlCommand(RuntimeError): """The control command received can not be recognized.""" @@ -163,10 +144,6 @@ class BadDocType(TypeError): """Exception when can not construct a document from the given data.""" -class BadQueryLangType(TypeError): - """Exception when can not construct a query language from the given data.""" - - class BadRequestType(TypeError): """Exception when can not construct a request object from given data.""" diff --git a/jina/executors/__init__.py b/jina/executors/__init__.py index 311d5bea3f632..1d254aed1b527 100644 --- a/jina/executors/__init__.py +++ b/jina/executors/__init__.py @@ -1,87 +1,31 @@ -__copyright__ = 'Copyright (c) 2020 Jina AI Limited. All rights reserved.' 
-__license__ = 'Apache-2.0' - import os -import pickle -import tempfile -from datetime import datetime -from pathlib import Path from types import SimpleNamespace -from typing import Dict, TypeVar, Type, List, Optional - -from .decorators import ( - as_update_method, - store_init_kwargs, - as_aggregate_method, - wrap_func, -) -from .metas import get_default_metas, fill_metas_with_defaults -from ..excepts import BadPersistantFile, NoDriverForRequest, UnattachedDriver -from ..helper import typename, random_identity -from ..jaml import JAMLCompatible, JAML, subvar_regex, internal_var_regex -from ..logging import JinaLogger +from typing import Dict, TypeVar, Optional, Callable -# noinspection PyUnreachableCode -if False: - from ..peapods.runtimes.zmq.zed import ZEDRuntime - from ..drivers import BaseDriver +from .decorators import store_init_kwargs, wrap_func +from .metas import get_default_metas +from .. import __default_endpoint__ +from ..helper import typename +from ..jaml import JAMLCompatible, JAML, subvar_regex, internal_var_regex -__all__ = ['BaseExecutor', 'AnyExecutor', 'ExecutorType', 'GenericExecutor'] +__all__ = ['BaseExecutor', 'AnyExecutor', 'ExecutorType'] AnyExecutor = TypeVar('AnyExecutor', bound='BaseExecutor') -# some variables may be self-referred and they must be resolved at here -_ref_desolve_map = SimpleNamespace() -_ref_desolve_map.__dict__['metas'] = SimpleNamespace() -_ref_desolve_map.__dict__['metas'].__dict__['pea_id'] = 0 -_ref_desolve_map.__dict__['metas'].__dict__['replica_id'] = -1 - class ExecutorType(type(JAMLCompatible), type): """The class of Executor type, which is the metaclass of :class:`BaseExecutor`.""" def __new__(cls, *args, **kwargs): """ - - - # noqa: DAR201 - - # noqa: DAR101 - - # noqa: DAR102 + + :return: Executor class """ _cls = super().__new__(cls, *args, **kwargs) return cls.register_class(_cls) - def __call__(cls, *args, **kwargs): - """ - - - # noqa: DAR201 - - - # noqa: DAR101 - - - # noqa: DAR102 - """ - # do 
_preload_package - getattr(cls, 'pre_init', lambda *x: None)() - - m = kwargs.pop('metas') if 'metas' in kwargs else {} - r = kwargs.pop('requests') if 'requests' in kwargs else {} - - obj = type.__call__(cls, *args, **kwargs) - - # set attribute with priority - # metas in YAML > class attribute > default_jina_config - # jina_config = expand_dict(jina_config) - - getattr(obj, '_post_init_wrapper', lambda *x: None)(m, r) - return obj - @staticmethod def register_class(cls): """ @@ -90,16 +34,12 @@ def register_class(cls): :param cls: The class. :return: The class, after being registered. """ - update_funcs = ['add', 'delete', 'update'] - aggregate_funcs = ['evaluate'] reg_cls_set = getattr(cls, '_registered_class', set()) cls_id = f'{cls.__module__}.{cls.__name__}' if cls_id not in reg_cls_set or getattr(cls, 'force_register', False): wrap_func(cls, ['__init__'], store_init_kwargs) - wrap_func(cls, update_funcs, as_update_method) - wrap_func(cls, aggregate_funcs, as_aggregate_method) reg_cls_set.add(cls_id) setattr(cls, '_registered_class', reg_cls_set) @@ -126,473 +66,153 @@ def __init__(awesomeness = 5): .. highlight:: yaml .. code-block:: yaml - !MyAwesomeExecutor + jtype: MyAwesomeExecutor with: awesomeness: 5 - To use an executor in a :class:`jina.peapods.runtimes.zmq.zed.ZEDRuntime`, - a proper :class:`jina.drivers.Driver` is required. This is because the - executor is *NOT* protobuf-aware and has no access to the key-values in the protobuf message. - - Different executor may require different :class:`Driver` with - proper :mod:`jina.drivers.handlers`, :mod:`jina.drivers.hooks` installed. - - .. seealso:: - Methods of the :class:`BaseExecutor` can be decorated via :mod:`jina.executors.decorators`. - - .. seealso:: - Meta fields :mod:`jina.executors.metas.defaults`. 
- """ - store_args_kwargs = False #: set this to ``True`` to save ``args`` (in a list) and ``kwargs`` (in a map) in YAML config - - def __init__(self, *args, **kwargs): - if isinstance(args, tuple) and len(args) > 0: - self.args = args[0] - else: - self.args = args - self.logger = JinaLogger(self.__class__.__name__) - self._snapshot_files = [] - self._post_init_vars = set() - self._last_snapshot_ts = datetime.now() - - def _post_init_wrapper( + def __init__( self, - _metas: Optional[Dict] = None, - _requests: Optional[Dict] = None, - fill_in_metas: bool = True, - ) -> None: - if fill_in_metas: - if not _metas: - _metas = get_default_metas() - - self._fill_metas(_metas) - self.fill_in_drivers(_requests) - - _before = set(list(vars(self).keys())) - self.post_init() - self._post_init_vars = {k for k in vars(self) if k not in _before} - - def fill_in_drivers(self, _requests: Optional[Dict]): - """ - Fill in drivers in a BaseExecutor. - - :param _requests: Dict containing driver information. - """ - from ..executors.requests import get_default_reqs - - default_requests = get_default_reqs(type.mro(self.__class__)) + metas: Optional[Dict] = None, + requests: Optional[Dict] = None, + runtime_args: Optional[Dict] = None, + ): + """`metas` and `requests` are always auto-filled with values from YAML config. 
+ + :param metas: a dict of metas fields + :param requests: a dict of endpoint-function mapping + :param runtime_args: a dict of arguments injected from :class:`Runtime` during runtime + """ + self._add_metas(metas) + self._add_requests(requests) + self._add_runtime_args(runtime_args) + + def _add_runtime_args(self, _runtime_args: Optional[Dict]): + if _runtime_args: + self.runtime_args = SimpleNamespace(**_runtime_args) + else: + self.runtime_args = SimpleNamespace() + + def _add_requests(self, _requests: Optional[Dict]): + request_mapping = {} # type: Dict[str, Callable] + + if _requests: + for endpoint, func in _requests.items(): + # the following line must be `getattr(self.__class__, func)` NOT `getattr(self, func)` + # this to ensure we always have `_func` as unbound method + _func = getattr(self.__class__, func) + if callable(_func): + # the target function is not decorated with `@requests` yet + request_mapping[endpoint] = _func + elif typename(_func) == 'jina.executors.decorators.FunctionMapper': + # the target function is already decorated with `@requests`, need unwrap with `.fn` + request_mapping[endpoint] = _func.fn + else: + raise TypeError( + f'expect {typename(self)}.{func} to be a function, but receiving {typename(_func)}' + ) - if not _requests: - self._drivers = self._get_drivers_from_requests(default_requests) + if hasattr(self, 'requests'): + self.requests.update(request_mapping) else: - parsed_drivers = self._get_drivers_from_requests(_requests) + self.requests = request_mapping - if _requests.get('use_default', False): - default_drivers = self._get_drivers_from_requests(default_requests) + def _add_metas(self, _metas: Optional[Dict]): - for k, v in default_drivers.items(): - if k not in parsed_drivers: - parsed_drivers[k] = v + tmp = get_default_metas() - self._drivers = parsed_drivers + if _metas: + tmp.update(_metas) - @staticmethod - def _get_drivers_from_requests(_requests): - _drivers = {} # type: Dict[str, List['BaseDriver']] - - if 
_requests and 'on' in _requests and isinstance(_requests['on'], dict): - # if control request is forget in YAML, then fill it - if 'ControlRequest' not in _requests['on']: - from ..drivers.control import ControlReqDriver - - _requests['on']['ControlRequest'] = [ControlReqDriver()] - - for req_type, drivers_spec in _requests['on'].items(): - if isinstance(req_type, str): - req_type = [req_type] - if isinstance(drivers_spec, list): - # old syntax - drivers = drivers_spec - common_kwargs = {} - elif isinstance(drivers_spec, dict): - drivers = drivers_spec.get('drivers', []) - common_kwargs = drivers_spec.get('with', {}) - else: - raise TypeError(f'unsupported type of driver spec: {drivers_spec}') - - for r in req_type: - if r not in _drivers: - _drivers[r] = list() - if _drivers[r] != drivers: - _drivers[r].extend(drivers) - - # inject common kwargs to drivers - if common_kwargs: - new_drivers = [] - for d in _drivers[r]: - new_init_kwargs_dict = { - k: v for k, v in d._init_kwargs_dict.items() - } - new_init_kwargs_dict.update(common_kwargs) - new_drivers.append(d.__class__(**new_init_kwargs_dict)) - _drivers[r].clear() - _drivers[r] = new_drivers - - if not _drivers[r]: - _drivers.pop(r) - return _drivers - - def _fill_metas(self, _metas): unresolved_attr = False + target = SimpleNamespace() # set self values filtered by those non-exist, and non-expandable - for k, v in _metas.items(): - if not hasattr(self, k): + for k, v in tmp.items(): + if not hasattr(target, k): if isinstance(v, str): if not subvar_regex.findall(v): - setattr(self, k, v) + setattr(target, k, v) else: unresolved_attr = True else: - setattr(self, k, v) - elif type(getattr(self, k)) == type(v): - setattr(self, k, v) - if not getattr(self, 'name', None): - _id = random_identity().split('-')[0] - _name = f'{typename(self)}-{_id}' - if getattr(self, 'warn_unnamed', False): - self.logger.warning( - f'this executor is not named, i will call it "{_name}". 
' - 'naming is important as it provides an unique identifier when ' - 'persisting this executor on disk.' - ) - setattr(self, 'name', _name) + setattr(target, k, v) + elif type(getattr(target, k)) == type(v): + setattr(target, k, v) + if unresolved_attr: _tmp = vars(self) - _tmp['metas'] = _metas - new_metas = JAML.expand_dict(_tmp, context=_ref_desolve_map)['metas'] + _tmp['metas'] = tmp + new_metas = JAML.expand_dict(_tmp)['metas'] - # set self values filtered by those non-exist, and non-expandable for k, v in new_metas.items(): - if not hasattr(self, k): + if not hasattr(target, k): if isinstance(v, str): if not ( subvar_regex.findall(v) or internal_var_regex.findall(v) ): - setattr(self, k, v) + setattr(target, k, v) else: raise ValueError( f'{k}={v} is not substitutable or badly referred' ) else: - setattr(self, k, v) - - def post_init(self): - """ - Initialize class attributes/members that can/should not be (de)serialized in standard way. - - Examples: + setattr(target, k, v) + # `name` is important as it serves as an identifier of the executor + # if not given, then set a name by the rule + if not getattr(target, 'name', None): + setattr(target, 'name', typename(self)) - - deep learning models - - index files - - numpy arrays + self.metas = target - .. warning:: - All class members created here will NOT be serialized when calling :func:`save`. Therefore if you - want to store them, please override the :func:`__getstate__`. + def close(self) -> None: """ - pass + Always invoked as executor is destroyed. - @classmethod - def pre_init(cls): - """This function is called before the object initiating (i.e. :func:`__call__`) - - Packages and environment variables can be set and load here. + You can write destructor & saving logic here. """ pass - @property - def save_abspath(self) -> str: - """Get the file path of the binary serialized object - - The file name ends with `.bin`. 
- - :return: the name of the file with `.bin` + def __call__(self, req_endpoint: str, **kwargs): """ - return self.get_file_from_workspace(f'{self.name}.bin') - - @property - def config_abspath(self) -> str: - """Get the file path of the YAML config - - :return: The file name ends with `.yml`. + # noqa: DAR101 + # noqa: DAR102 + # noqa: DAR201 """ - return self.get_file_from_workspace(f'{self.name}.yml') + if req_endpoint in self.requests: + return self.requests[req_endpoint]( + self, **kwargs + ) # unbound method, self is required + elif __default_endpoint__ in self.requests: + return self.requests[__default_endpoint__]( + self, **kwargs + ) # unbound method, self is required - @staticmethod - def get_shard_workspace( - workspace_folder: str, - workspace_name: str, - pea_id: int, - replica_id: int = -1, - ) -> str: + @property + def workspace(self) -> str: """ Get the path of the current shard. - :param workspace_folder: folder of the workspace. - :param workspace_name: name of the workspace. - :param pea_id: id of the pea - :param replica_id: id of the replica - :return: returns the workspace of the shard of this Executor. """ - if replica_id == -1: - return os.path.join(workspace_folder, f'{workspace_name}-{pea_id}') - else: - return os.path.join( - workspace_folder, f'{workspace_name}-{replica_id}-{pea_id}' - ) - - @property - def workspace_name(self): - """Get the name of the workspace. - - :return: returns the name of the executor - """ - return self.name - - @property - def _workspace(self): - """Property to access `workspace` if existing or default to `./`. Useful to provide good interface when - using executors directly in python. - - .. highlight:: python - .. code-block:: python - - with NumpyIndexer() as indexer: - indexer.touch() - - :return: returns the workspace property of the executor or default to './' - """ - return self.workspace or './' - - @property - def shard_workspace(self) -> str: - """Get the path of the current shard. 
- - :return: returns the workspace of the shard of this Executor - """ - return BaseExecutor.get_shard_workspace( - self._workspace, self.workspace_name, self.pea_id, self.replica_id - ) - - def get_file_from_workspace(self, name: str) -> str: - """Get a usable file path under the current workspace - - :param name: the name of the file - - :return: file path - """ - Path(self.shard_workspace).mkdir(parents=True, exist_ok=True) - return os.path.join(self.shard_workspace, name) - - @property - def physical_size(self) -> int: - """Return the size of the current workspace in bytes - - :return: byte size of the current workspace - """ - root_directory = Path(self.shard_workspace) - return sum(f.stat().st_size for f in root_directory.glob('**/*') if f.is_file()) - - def __getstate__(self): - d = dict(self.__dict__) - del d['logger'] - for k in self._post_init_vars: - del d[k] - cached = [k for k in d.keys() if k.startswith('CACHED_')] - for k in cached: - del d[k] - - d.pop('_drivers', None) - return d - - def __setstate__(self, d): - self.__dict__.update(d) - self.logger = JinaLogger(self.__class__.__name__) - try: - self._post_init_wrapper(fill_in_metas=False) - except ModuleNotFoundError as ex: - self.logger.warning( - f'{typename(ex)} is often caused by a missing component, ' - f'which often can be solved by "pip install" relevant package: {ex!r}', - exc_info=True, - ) - - def touch(self) -> None: - """Touch the executor and change ``is_updated`` to ``True`` so that one can call :func:`save`. """ - self.is_updated = True - - def save(self, filename: str = None): - """ - Persist data of this executor to the :attr:`shard_workspace`. The data could be - a file or collection of files produced/used during an executor run. - - These are some of the common data that you might want to persist: - - - binary dump/pickle of the executor - - the indexed files - - (pre)trained models - - .. 
warning:: - - Class members created in `post_init` will NOT be serialized when calling :func:`save`. Therefore if you - want to store them, please override the :func:`__getstate__`. - - It uses ``pickle`` for dumping. For members/attributes that are invalid or inefficient for ``pickle``, you - need to implement their own persistence strategy in the :func:`__getstate__`. - - :param filename: file path of the serialized file, if not given then :attr:`save_abspath` is used - """ - if not self.read_only and self.is_updated: - f = filename or self.save_abspath - if not f: - f = tempfile.NamedTemporaryFile( - 'w', delete=False, dir=os.environ.get('JINA_EXECUTOR_WORKDIR', None) - ).name - - if self.max_snapshot > 0 and os.path.exists(f): - bak_f = ( - f - + f'.snapshot-{self._last_snapshot_ts.strftime("%Y%m%d%H%M%S") or "NA"}' - ) - os.rename(f, bak_f) - self._snapshot_files.append(bak_f) - if len(self._snapshot_files) > self.max_snapshot: - d_f = self._snapshot_files.pop(0) - if os.path.exists(d_f): - os.remove(d_f) - with open(f, 'wb') as fp: - pickle.dump(self, fp) - self._last_snapshot_ts = datetime.now() - self.is_updated = False - self.logger.success( - f'artifacts of this executor ({self.name}) is persisted to {f}' + if getattr(self.runtime_args, 'workspace', None): + complete_workspace = os.path.join( + self.runtime_args.workspace, self.metas.name ) + replica_id = getattr(self.runtime_args, 'replica_id', None) + pea_id = getattr(self.runtime_args, 'pea_id', None) + if replica_id is not None and replica_id != -1: + complete_workspace = os.path.join(complete_workspace, str(replica_id)) + if pea_id is not None and pea_id != -1: + complete_workspace = os.path.join(complete_workspace, str(pea_id)) + return os.path.abspath(complete_workspace) + elif self.metas.workspace is not None: + return os.path.abspath(self.metas.workspace) else: - if not self.is_updated: - self.logger.info( - f'no update since {self._last_snapshot_ts:%Y-%m-%d %H:%M:%S%z}, will not save. 
' - 'If you really want to save it, call "touch()" before "save()" to force saving' - ) - - @classmethod - def inject_config( - cls: Type[AnyExecutor], - raw_config: Dict, - pea_id: int = 0, - replica_id: int = -1, - read_only: bool = False, - *args, - **kwargs, - ) -> Dict: - """Inject config into the raw_config before loading into an object. - - :param raw_config: raw config to work on - :param pea_id: the id of the storage of this parallel pea - :param replica_id: the id of the replica the pea is contained in - :param read_only: if the executor should be readonly - :param args: Additional arguments. - :param kwargs: Additional key word arguments. - - :return: an executor object - """ - if 'metas' not in raw_config: - raw_config['metas'] = {} - tmp = fill_metas_with_defaults(raw_config) - tmp['metas']['pea_id'] = pea_id - tmp['metas']['replica_id'] = replica_id - tmp['metas']['read_only'] = read_only - if kwargs.get('metas'): - tmp['metas'].update(kwargs['metas']) - del kwargs['metas'] - tmp.update(kwargs) - return tmp - - @staticmethod - def load(filename: str = None) -> AnyExecutor: - """Build an executor from a binary file - - :param filename: the file path of the binary serialized file - :return: an executor object - - It uses ``pickle`` for loading. - """ - if not filename: - raise FileNotFoundError - try: - with open(filename, 'rb') as fp: - return pickle.load(fp) - except EOFError: - raise BadPersistantFile(f'broken file {filename} can not be loaded') - - def close(self) -> None: - """ - Release the resources as executor is destroyed, need to be overridden - """ - self.save() - self.logger.close() + raise Exception('can not find metas.workspace or runtime_args.workspace') def __enter__(self): return self def __exit__(self, exc_type, exc_val, exc_tb): self.close() - - def attach(self, runtime: 'ZEDRuntime', *args, **kwargs): - """Attach this executor to a Basepea - - This is called inside the initializing of a :class:`jina.peapods.runtime.BasePea`. 
- - :param runtime: Runtime procedure leveraging ZMQ. - :param args: Additional arguments. - :param kwargs: Additional key word arguments. - """ - for req_type, drivers in self._drivers.items(): - for driver in drivers: - driver.attach( - executor=self, runtime=runtime, req_type=req_type, *args, **kwargs - ) - - # replacing the logger to runtime's logger - if runtime and isinstance(getattr(runtime, 'logger', None), JinaLogger): - self.logger = runtime.logger - - def __call__(self, req_type, *args, **kwargs): - """ - - - # noqa: DAR201 - - - # noqa: DAR101 - - - # noqa: DAR102 - """ - if req_type in self._drivers: - for d in self._drivers[req_type]: - if d.attached: - d() - else: - raise UnattachedDriver(d) - else: - raise NoDriverForRequest(f'{req_type} for {self}') - - def __str__(self): - return self.__class__.__name__ - - -class GenericExecutor(BaseExecutor): - """Alias to BaseExecutor, but bind with GenericDriver by default. """ diff --git a/jina/executors/classifiers/__init__.py b/jina/executors/classifiers/__init__.py deleted file mode 100644 index d263819fac568..0000000000000 --- a/jina/executors/classifiers/__init__.py +++ /dev/null @@ -1,34 +0,0 @@ -from .. import BaseExecutor - -if False: - import numpy as np - - -class BaseClassifier(BaseExecutor): - """ - The base class of Classifier Executor. Classifier Executor allows one to - perform classification and regression on given input and output the predicted - hard/soft label. - - This class should not be used directly. Subclasses should be used. - """ - - def predict(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - """ - Perform hard/soft classification on ``data``, the predicted value for each sample in X is returned. - - The output value can be zero/one, for one-hot label; or float for soft-label or regression label. 
- Use the corresponding driver to interpret these labels - - The size and type of output can be one of the follows, ``B`` is ``data.shape[0]``: - - (B,) or (B, 1); zero/one or float - - (B, L): zero/one one-hot or soft label for L-class multi-class classification - - :param content: the input data to be classified, can be a ndim array. - where axis=0 represents the batch size, i.e. data[0] is the first sample, data[1] is the second sample, data[n] is the n sample - :type content: np.ndarray - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - :rtype: np.ndarray - """ - raise NotImplementedError diff --git a/jina/executors/compound.py b/jina/executors/compound.py deleted file mode 100644 index 336d6fc9509eb..0000000000000 --- a/jina/executors/compound.py +++ /dev/null @@ -1,378 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from collections import defaultdict -from typing import Dict, List, Callable, Union, Optional - -from . import BaseExecutor, AnyExecutor - - -class CompoundExecutor(BaseExecutor): - """A :class:`CompoundExecutor` is a set of multiple executors. - The most common usage is chaining a pipeline of executors, where the - input of the current is the output of the former. - - A common use case of :class:`CompoundExecutor` is to glue multiple :class:`BaseExecutor` together, instead of breaking them into different Pods. - - :param routes: a map of function routes. The key is the function name, the value is a tuple of two pieces, - where the first element is the name of the referred component (``metas.name``) and the second element - is the name of the referred function. - - .. seealso:: - - :func:`add_route` - :param resolve_all: universally add ``*_all()`` to all functions that have the identical name - - **Example 1: a compound Chunk Indexer that does vector indexing and key-value index** - - .. highlight:: yaml - .. 
code-block:: yaml - - !CompoundExecutor - components: - - !NumpyIndexer - with: - index_filename: vec.gz - metas: - name: vecidx_exec # a customized name - workspace: ${{TEST_WORKDIR}} - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx_exec - workspace: ${{TEST_WORKDIR}} - metas: - name: chunk_compound_indexer - workspace: ${{TEST_WORKDIR}} - requests: - on: - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx_exec - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx_exec - ControlRequest: - - !ControlReqDriver {} - - **Example 2: a compound crafter that first craft the doc and then segment ** - - .. highlight:: yaml - .. code-block:: yaml - - !CompoundExecutor - components: - - !GifNameRawSplit - metas: - name: name_split # a customized name - workspace: ${{TEST_WORKDIR}} - - !GifPreprocessor - with: - every_k_frame: 2 - from_buffer: true - metas: - name: gif2chunk_preprocessor # a customized name - metas: - name: compound_crafter - workspace: ${{TEST_WORKDIR}} - py_modules: gif2chunk.py - requests: - on: - IndexRequest: - - !DocCraftDriver - with: - executor: name_split - - !SegmentDriver - with: - executor: gif2chunk_preprocessor - ControlRequest: - - !ControlReqDriver {} - - Create a new :class:`CompoundExecutor` object - - - **Example 3: ** - - We have two dummy executors as follows: - - .. highlight:: python - .. code-block:: python - - class dummyA(BaseExecutor): - def say(self): - return 'a' - - def sayA(self): - print('A: im A') - - - class dummyB(BaseExecutor): - def say(self): - return 'b' - - def sayB(self): - print('B: im B') - - and we create a :class:`CompoundExecutor` consisting of these two via - - .. highlight:: python - .. code-block:: python - - da, db = dummyA(), dummyB() - ce = CompoundExecutor() - ce.components = lambda: [da, db] - - Now the new executor ``ce`` have two new methods, i.e :func:`ce.sayA` and :func:`ce.sayB`. 
They point to the original - :func:`dummyA.sayA` and :func:`dummyB.sayB` respectively. One can say ``ce`` has inherited these two methods. - - The interesting part is :func:`say`, as this function name is shared between :class:`dummyA` and :class:`dummyB`. - It requires some resolution. When `resolve_all=True`, then a new function :func:`say_all` is add to ``ce``. - ``ce.say_all`` works as if you call :func:`dummyA.sayA` and :func:`dummyB.sayB` in a row. This - makes sense in some cases such as training, saving. In other cases, it may require a more sophisticated resolution, - where one can use :func:`add_route` to achieve that. For example, - - .. highlight:: python - .. code-block:: python - - ce.add_route('say', db.name, 'say') - assert b.say() == 'b' - - Such resolution is what we call **routes** here, and it can be specified in advance with the - arguments ``routes`` in :func:`__init__`, or using YAML. - - .. highlight:: yaml - .. code-block:: yaml - - !CompoundExecutor - components: ... - with: - resolve_all: true - routes: - say: - - dummyB-e3acc910 - - say - - .. warning:: - - When setting inner `executors` in `components` the `workspace` configuration will not be used and will be overridden - by a workspace extracted considering the name of the `CompoundExecutor`, the name of each internal `Component` and the `pea_id` - - - One can access the component of a :class:`CompoundExecutor` via index, e.g. - - .. highlight:: python - .. code-block:: python - - c = BaseExecutor.load_config('compound-example.yaml') - assertTrue(c[0] == c['dummyA-1ef90ea8']) - c[0].add(obj) - - .. note:: - Component ``workspace`` and ``pea_id`` are overridden by their :class:`CompoundExecutor` counterparts. - - .. warning:: - - When sub-component is external, ``py_modules`` must be given at root level ``metas`` not at the sub-level. 
- - """ - - class _FnWrapper: - def __init__(self, fns): - self.fns = fns - - def __call__(self, *args, **kwargs): - r = [] - for f in self.fns: - r.append(f()) - return r - - class _FnAllWrapper(_FnWrapper): - def __call__(self, *args, **kwargs): - return all(super().__call__(*args, **kwargs)) - - class _FnOrWrapper(_FnWrapper): - def __call__(self, *args, **kwargs): - return any(super().__call__(*args, **kwargs)) - - def __init__( - self, routes: Dict[str, Dict] = None, resolve_all: bool = True, *args, **kwargs - ): - super().__init__(*args, **kwargs) - self._components = None # type: Optional[List[AnyExecutor]] - self._routes = routes - self._is_updated = False #: the internal update state of this compound executor - self.resolve_all = resolve_all - - @property - def is_updated(self) -> bool: - """ - Return ``True`` if any components is updated. - - :return: only true if all components are updated or if the compound is updated - """ - return ( - self.components and any(c.is_updated for c in self.components) - ) or self._is_updated - - @is_updated.setter - def is_updated(self, val: bool) -> None: - """ - Set :attr:`is_updated` for this :class:`CompoundExecutor`. Note, not to all its components - - :param val: new value of :attr:`is_updated` - """ - self._is_updated = val - - def save(self, filename: Optional[str] = None): - """ - Serialize this compound executor along with all components in it to binary files. - It uses ``pickle`` for dumping. - - :param filename: file path of the serialized file, if not given then :attr:`save_abspath` is used - """ - for c in self.components: - c.save() - super().save( - filename=filename - ) # do i really need to save the compound executor itself - - @property - def components(self) -> List[AnyExecutor]: - """ - Return all component executors as a list. The list follows the order as defined in the YAML config or the - pre-given order when calling the setter. 
- - :return: components - """ - return self._components - - @components.setter - def components(self, comps: Callable[[], List]) -> None: - """Set the components of this executors - - :param comps: a function returns a list of executors - """ - if not callable(comps): - raise TypeError( - 'components must be a callable function that returns ' - 'a List[BaseExecutor]' - ) - - # Important to handle when loading a CompoundExecutor when `inner` executors have not been loaded from yaml - if not getattr(self, '_init_from_yaml', False): - self._components = comps() - if not isinstance(self._components, list): - raise TypeError( - f'components expect a list of executors, receiving {type(self._components)!r}' - ) - self._set_comp_workspace() - self._resolve_routes() - self._post_components() - else: - self.logger.debug( - 'components is omitted from construction, as it is initialized from yaml config' - ) - - @staticmethod - def get_component_workspace_from_compound_workspace( - compound_workspace: str, compound_name: str, pea_id: int - ) -> str: - """ - Get the name of workspace. - - :param compound_workspace: Workspace of the compound executor. - :param compound_name: Name of the compound executor. - :param pea_id: Id of the pea. - :return: The name of workspace. 
- """ - import os - - return ( - BaseExecutor.get_shard_workspace(compound_workspace, compound_name, pea_id) - if (isinstance(pea_id, int) and pea_id > 0) - else os.path.join(compound_workspace, compound_name) - ) - - def _set_comp_workspace(self) -> None: - # overrides the workspace setting for all components - for c in self.components: - if not c.workspace and self.workspace: - c_workspace = ( - CompoundExecutor.get_component_workspace_from_compound_workspace( - self.workspace, self.name, self.pea_id - ) - ) - self.logger.warning(f'Setting workspace of {c.name} to {c_workspace}') - c.workspace = c_workspace - - def _resolve_routes(self) -> None: - if self._routes: - for f, v in self._routes.items(): - for kk, vv in v.items(): - self.add_route(f, kk, vv) - - def add_route( - self, fn_name: str, comp_name: str, comp_fn_name: str, is_stored: bool = False - ) -> None: - """Create a new function for this executor which refers to the component's function - - This will create a new function :func:`fn_name` which actually refers to ``components[comp_name].comp_fn_name``. - It is useful when two components have a function with duplicated name and one wants to resolve this duplication. 
- - :param fn_name: the name of the new function - :param comp_name: the name of the referred component, defined in ``metas.name`` - :param comp_fn_name: the name of the referred function of ``comp_name`` - :param is_stored: if ``True`` then this change will be stored in the config and affects future :func:`save` and - :func:`save_config` - - """ - for c in self.components: - if ( - c.name == comp_name - and hasattr(c, comp_fn_name) - and callable(getattr(c, comp_fn_name)) - ): - setattr(self, fn_name, getattr(c, comp_fn_name)) - if is_stored: - if not self._routes: - self._routes = {} - self._routes[fn_name] = {comp_name: comp_fn_name} - self.is_updated = True - return - else: - raise AttributeError(f'bad names: {comp_name} and {comp_fn_name}') - - def close(self) -> None: - """Close all components and release the resources""" - if self.components: - for c in self.components: - c.close() - super().close() - - def __contains__(self, item: str): - if isinstance(item, str): - for c in self.components: - if c.name == item: - return True - return False - else: - raise TypeError('CompoundExecutor only support string type "in"') - - def __getitem__(self, item: Union[int, str]): - if isinstance(item, int): - return self.components[item] - elif isinstance(item, str): - for c in self.components: - if c.name == item: - return c - else: - raise TypeError('CompoundExecutor only supports int or string index') - - def __iter__(self): - return self.components.__iter__() - - def _post_components(self): - pass diff --git a/jina/executors/crafters/__init__.py b/jina/executors/crafters/__init__.py deleted file mode 100644 index a43a36fcc147b..0000000000000 --- a/jina/executors/crafters/__init__.py +++ /dev/null @@ -1,31 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Dict, Union, List - -from .. 
import BaseExecutor - - -class BaseCrafter(BaseExecutor): - """ - A :class:`BaseCrafter` transforms the content of `Document`. - It can be used for preprocessing, segmenting etc. - It is an interface for Crafters which is a family of executors intended to apply - transformations to single documents. - The apply function is :func:`craft`, where the name of the arguments will be used as keys of the content. - - :param args: Additional positional arguments which are just used for the parent initialization - :param kwargs: Additional keyword arguments which are just used for the parent initialization - """ - - def craft(self, *args, **kwargs) -> Union[List[Dict], Dict]: - """ - Apply function of this executor. - The name of the arguments are used as keys, which are then used to tell :class:`Driver` what information to extract - from the protobuf request accordingly. - The name of the arguments should be always valid keys defined in the protobuf. - - :param args: Extra variable length arguments - :param kwargs: Extra variable keyword arguments - """ - raise NotImplementedError diff --git a/jina/executors/decorators.py b/jina/executors/decorators.py index 890caa36c10bb..27b5f8210fb7f 100644 --- a/jina/executors/decorators.py +++ b/jina/executors/decorators.py @@ -1,52 +1,20 @@ """Decorators and wrappers designed for wrapping :class:`BaseExecutor` functions. """ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -import copy +import functools import inspect from functools import wraps -from itertools import islice, chain -from typing import Callable, Any, Union, Iterator, List, Optional, Dict, Iterable - -import numpy as np +from typing import ( + Callable, + Union, + List, + Optional, + Dict, + Sequence, +) from .metas import get_default_metas -from ..helper import batch_iterator, convert_tuple_to_list -from ..logging import default_logger - - -def as_aggregate_method(func: Callable) -> Callable: - """Mark a function so that it keeps track of the number of documents evaluated and a running sum - to have always access to average value - :param func: the function to decorate - :return: the wrapped function - """ - - @wraps(func) - def arg_wrapper(self, *args, **kwargs): - f = func(self, *args, **kwargs) - self._running_stats += f - return f - - return arg_wrapper - - -def as_update_method(func: Callable) -> Callable: - """Mark the function as the updating function of this executor, - calling this function will change the executor so later you can save the change via :func:`save` - Will set the is_updated property after function is called. - :param func: the function to decorate - :return: the wrapped function - """ - - @wraps(func) - def arg_wrapper(self, *args, **kwargs): - f = func(self, *args, **kwargs) - self.is_updated = True - return f - - return arg_wrapper +from .. 
import DocumentArray +from ..helper import convert_tuple_to_list def wrap_func(cls, func_lst, wrapper): @@ -63,27 +31,6 @@ def wrap_func(cls, func_lst, wrapper): setattr(cls, f_name, wrapper(getattr(cls, f_name))) -def as_ndarray(func: Callable, dtype=np.float32) -> Callable: - """Convert an :class:`BaseExecutor` function returns to a ``numpy.ndarray``, - the following type are supported: `EagerTensor`, `Tensor`, `list` - - :param func: the function to decorate - :param dtype: the converted dtype of the ``numpy.ndarray`` - :return: the wrapped function - """ - - @wraps(func) - def arg_wrapper(self, *args, **kwargs): - r = func(self, *args, **kwargs) - r_type = type(r).__name__ - if r_type in {'ndarray', 'EagerTensor', 'Tensor', 'list'}: - return np.array(r, dtype) - else: - raise TypeError(f'unrecognized type {r_type}: {type(r)}') - - return arg_wrapper - - def store_init_kwargs(func: Callable) -> Callable: """Mark the args and kwargs of :func:`__init__` later to be stored via :func:`save_config` in YAML :param func: the function to decorate @@ -96,7 +43,7 @@ def arg_wrapper(self, *args, **kwargs): raise TypeError( 'this decorator should only be used on __init__ method of an executor' ) - taboo = {'self', 'args', 'kwargs'} + taboo = {'self', 'args', 'kwargs', 'metas', 'requests', 'runtime_args'} _defaults = get_default_metas() taboo.update(_defaults.keys()) all_pars = inspect.signature(func).parameters @@ -127,302 +74,56 @@ def arg_wrapper(self, *args, **kwargs): return arg_wrapper -def _get_slice( - data: Union[Iterator[Any], List[Any], np.ndarray], total_size: int -) -> Union[Iterator[Any], List[Any], np.ndarray]: - if isinstance(data, Dict): - data = islice(data.items(), total_size) - else: - data = data[:total_size] - return data - - -def _get_size(data: Union[Iterator[Any], List[Any], np.ndarray], axis: int = 0) -> int: - if isinstance(data, np.ndarray): - total_size = data.shape[axis] - elif hasattr(data, '__len__'): - total_size = len(data) - else: - 
total_size = None - return total_size - - -def _get_total_size(full_data_size, batch_size, num_batch): - batched_data_size = batch_size * num_batch if num_batch else None - - if full_data_size is not None and batched_data_size is not None: - total_size = min(full_data_size, batched_data_size) - else: - total_size = full_data_size or batched_data_size - return total_size - - -def _merge_results_after_batching( - final_result, merge_over_axis: int = 0, flatten: bool = True +def requests( + func: Callable[ + [DocumentArray, DocumentArray, Dict, List[DocumentArray], List[DocumentArray]], + Optional[DocumentArray], + ] = None, + *, + on: Optional[Union[str, Sequence[str]]] = None, ): - if not final_result: - return - - if isinstance(final_result[0], np.ndarray): - if len(final_result[0].shape) > 1: - final_result = np.concatenate(final_result, merge_over_axis) - elif isinstance(final_result[0], list) and flatten: - final_result = list(chain.from_iterable(final_result)) - - return final_result - - -def batching( - func: Optional[Callable[[Any], np.ndarray]] = None, - batch_size: Optional[Union[int, Callable]] = None, - num_batch: Optional[int] = None, - split_over_axis: int = 0, - merge_over_axis: int = 0, - slice_on: int = 1, - slice_nargs: int = 1, - label_on: Optional[int] = None, - ordinal_idx_arg: Optional[int] = None, - flatten_output: bool = True, -) -> Any: - """Split the input of a function into small batches and call :func:`func` on each batch - , collect the merged result and return. This is useful when the input is too big to fit into memory - - :param func: function to decorate - :param batch_size: size of each batch - :param num_batch: number of batches to take, the rest will be ignored - :param split_over_axis: split over which axis into batches - :param merge_over_axis: merge over which axis into a single result - :param slice_on: the location of the data. When using inside a class, - ``slice_on`` should take ``self`` into consideration. 
- :param slice_nargs: the number of arguments - :param label_on: the location of the labels. Useful for data with any kind of accompanying labels - :param ordinal_idx_arg: the location of the ordinal indexes argument. Needed for classes - where function decorated needs to know the ordinal indexes of the data in the batch - (Not used when label_on is used) - :param flatten_output: If this is set to True, the results from different batches will be chained and the returning value is a list of the results. Otherwise, the returning value is a list of lists, in which each element is a list containing the result from one single batch. Note if there is only one batch returned, the returned result is always flatten. - :return: the merged result as if run :func:`func` once on the input. - - Example: - .. highlight:: python - .. code-block:: python - - class MemoryHungryExecutor: - - @batching - def train(self, batch: 'numpy.ndarray', *args, **kwargs): - gpu_train(batch) #: this will respect the ``batch_size`` defined as object attribute - - @batching(batch_size = 64) - def train(self, batch: 'numpy.ndarray', *args, **kwargs): - gpu_train(batch) """ + `@requests` defines when a function will be invoked. It has a keyword `on=` to define the endpoint. 
- def _batching(func): - @wraps(func) - def arg_wrapper(*args, **kwargs): - # priority: decorator > class_attribute - # by default data is in args[1] (self needs to be taken into account) - data = args[slice_on : slice_on + slice_nargs] - b_size = ( - batch_size(data) if callable(batch_size) else batch_size - ) or getattr(args[0], 'batch_size', None) - - # no batching if b_size is None - if b_size is None or data is None: - return func(*args, **kwargs) - - default_logger.debug( - f'batching enabled for {func.__qualname__} batch_size={b_size} ' - f'num_batch={num_batch} axis={split_over_axis}' - ) - - results = [] - data = (data, args[label_on]) if label_on else data - - yield_slice = [ - isinstance(args[slice_on + i], np.memmap) for i in range(0, slice_nargs) - ] - - slice_idx = None - - # split the data into batches - data_iterators = [ - batch_iterator( - data[i], - b_size, - split_over_axis, - yield_slice=yield_slice[i], - ) - for i in range(0, slice_nargs) - ] - - batch_args = list(copy.copy(args)) - - # load the batches of data and feed into the function - for _data_args in zip(*data_iterators): - _data_args = list(_data_args) - for i, (_yield_slice, _arg) in enumerate(zip(yield_slice, _data_args)): - if _yield_slice: - original_arg = args[slice_on + i] - _memmap = np.memmap( - original_arg.filename, - dtype=original_arg.dtype, - mode='r', - shape=original_arg.shape, - ) - _data_args[i] = _memmap[_arg] - slice_idx = _arg[split_over_axis] - if slice_idx.start is None or slice_idx.stop is None: - slice_idx = None - del _memmap - - # TODO: figure out what is ordinal_idx_arg - if not isinstance(_data_args[i], tuple): - if ordinal_idx_arg and slice_idx is not None: - batch_args[ordinal_idx_arg] = slice_idx - - batch_args[slice_on : slice_on + slice_nargs] = _data_args - - r = func(*batch_args, **kwargs) - - if r is not None: - results.append(r) - - return _merge_results_after_batching( - results, merge_over_axis, flatten_output - ) - - return arg_wrapper - - if 
func: - return _batching(func) - else: - return _batching - - -def single( - func: Optional[Callable[[Any], np.ndarray]] = None, - merge_over_axis: int = 0, - slice_on: int = 1, - slice_nargs: int = 1, - flatten_output: bool = False, -) -> Any: - """Guarantee that the inputs of a function with more than one argument is provided as single instances and not in batches - - :param func: function to decorate - :param merge_over_axis: merge over which axis into a single result - :param slice_on: the location of the data. When using inside a class, - ``slice_on`` should take ``self`` into consideration. - :param slice_nargs: the number of positional arguments considered as data - :param flatten_output: If this is set to True, the results from different batches will be chained and the returning value is a list of the results. Otherwise, the returning value is a list of lists, in which each element is a list containing the result from one single batch. Note if there is only one batch returned, the returned result is always flatten. - :return: the merged result as if run :func:`func` once on the input. - - ..warning: - data arguments will be taken starting from ``slice_on` to ``slice_on + num_data`` - - Example: - .. highlight:: python - .. code-block:: python - - class OneByOneCrafter: - - @single - def craft(self, text: str, id: str) -> Dict: - ... - - .. note: - Single multi input decorator will let the user interact with the executor in 3 different ways: - - Providing batches: (This decorator will make sure that the actual method receives just a single instance) - - Providing a single instance - - Providing a single instance through kwargs. - - .. highlight:: python - .. 
code-block:: python - - class OneByOneCrafter: - @single - def craft(self, text: str, id: str) -> Dict: - return {'text': f'{text}-crafted', 'id': f'{id}-crafted'} - - crafter = OneByOneCrafter() - - results = crafted.craft(['text1', 'text2'], ['id1', 'id2']) - assert len(results) == 2 - assert results[0] == {'text': 'text1-crafted', 'id': 'id1-crafted'} - assert results[1] == {'text': 'text2-crafted', 'id': 'id2-crafted'} + A class method decorated with plain `@requests` (without `on=`) is the default handler for all endpoints. + That means, it is the fallback handler for endpoints that are not found. - result = crafter.craft('text', 'id') - assert result['text'] == 'text-crafted' - assert result['id'] == 'id-crafted' - - results = crafted.craft(text='text', id='id') - assert result['text'] == 'text-crafted' - assert result['id'] == 'id-crafted' + :param func: the method to decorate + :param on: the endpoint string, by convention starts with `/` + :return: decorated function """ + from .. import __default_endpoint__, __num_args_executor_func__ - def _single_multi_input(func): - @wraps(func) - def arg_wrapper(*args, **kwargs): - # by default data is in args[1:] (self needs to be taken into account) - args = list(args) - default_logger.debug(f'batching disabled for {func.__qualname__}') - - data_iterators = args[slice_on : slice_on + slice_nargs] + class FunctionMapper: + def __init__(self, fn): - if len(args) <= slice_on: - # like this one can use the function with single kwargs - return func(*args, **kwargs) - elif len(args) < slice_on + slice_nargs: - raise IndexError( - f'can not select positional args at {slice_on}: {slice_nargs}, ' - f'your `args` has {len(args)} arguments.' + arg_spec = inspect.getfullargspec(fn) + if not arg_spec.varkw and len(arg_spec.args) < __num_args_executor_func__: + raise TypeError( + f'{fn} accepts only {arg_spec.args} which is fewer than expected, ' + f'please add `**kwargs` to the function signature.' 
) - elif ( - len(args) <= slice_on - or isinstance(data_iterators[0], str) - or isinstance(data_iterators[0], bytes) - or not isinstance(data_iterators[0], Iterable) - ): - # like this one can use the function with single kwargs - return func(*args, **kwargs) - - final_result = [] - for new_args in zip(*data_iterators): - args[slice_on : slice_on + slice_nargs] = new_args - r = func(*args, **kwargs) - - if r is not None: - final_result.append(r) - - return _merge_results_after_batching( - final_result, merge_over_axis, flatten=flatten_output - ) - return arg_wrapper - - if func: - return _single_multi_input(func) - else: - return _single_multi_input + @functools.wraps(fn) + def arg_wrapper(*args, **kwargs): + return fn(*args, **kwargs) + self.fn = arg_wrapper -def requests(func: Callable = None, on: str = 'default') -> Callable: - """Decorator for binding an Executor function to requests - - :param func: the Executor function to decorate - :param on: the request type to bind, e.g. IndexRequest, SearchRequest, UpdateRequest, DeleteRequest, etc. - you may also use `index`, `search`, `update`, `delete` as shortcut. - :return: the wrapped function - """ + def __set_name__(self, owner, name): + self.fn.class_name = owner.__name__ + if not hasattr(owner, 'requests'): + owner.requests = {} - def _requests(func): - @wraps(func) - def arg_wrapper(*args, **kwargs): - return func(*args, **kwargs) + if isinstance(on, (list, tuple)): + for o in on: + owner.requests[o] = self.fn + else: + owner.requests[on or __default_endpoint__] = self.fn - return arg_wrapper + setattr(owner, name, self.fn) if func: - return _requests(func) + return FunctionMapper(func) else: - return _requests + return FunctionMapper diff --git a/jina/executors/devices.py b/jina/executors/devices.py deleted file mode 100644 index 78b5eccf68fc6..0000000000000 --- a/jina/executors/devices.py +++ /dev/null @@ -1,270 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -from abc import abstractmethod - -from ..helper import cached_property - - -class BaseDevice: - """:class:`BaseFrameworkExecutor` is the base class for the executors using other frameworks internally, including `tensorflow`, `pytorch`, `onnx`, `faiss` and `paddlepaddle`.""" - - @cached_property - @abstractmethod - def device(self): - """ - Set the device on which the executor will be running. - - ..notes: - In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use, - please use the environment variable `CUDA_VISIBLE_DEVICES`. - """ - - @abstractmethod - def to_device(self, *args, **kwargs): - """Move the computation from GPU to CPU or vice versa.""" - - -class TorchDevice(BaseDevice): - """ - :class:`BaseTorchDeviceHandler` implements the base class for the executors using :mod:`torch` library. The common setups go into this class. - - To implement your own executor with the :mod:`torch` library, - - .. highlight:: python - .. code-block:: python - - class MyAwesomeTorchEncoder(BaseEncoder, BaseTorchDeviceHandler): - def post_init(self): - # load your awesome model - import torchvision.models as models - self.model = models.mobilenet_v2().features.eval() - self.to_device(self.model) - - def encode(self, data, *args, **kwargs): - # use your awesome model to encode/craft/score - import torch - torch.set_grad_enabled(False) - - _input = torch.as_tensor(data, device=self.device) - _output = self.model(_input).cpu() - - return _output.numpy() - - """ - - @cached_property - def device(self): - """ - Set the device on which the executors using :mod:`torch` library will be running. - - ..notes: - In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use, - please use the environment variable `CUDA_VISIBLE_DEVICES`. 
- """ - import torch - - return torch.device('cuda:0') if self.on_gpu else torch.device('cpu') - - def to_device(self, model, *args, **kwargs): - """Load the model to device.""" - model.to(self.device) - - -class PaddleDevice(BaseDevice): - """ - :class:`BasePaddleExecutor` implements the base class for the executors using :mod:`paddlepaddle` library. The common setups go into this class. - - To implement your own executor with the :mod:`paddlepaddle` library, - - .. highlight:: python - .. code-block:: python - - class MyAwesomePaddleEncoder(BasePaddleExecutor): - def post_init(self): - # load your awesome model - import paddlehub as hub - module = hub.Module(name='mobilenet_v2_imagenet') - inputs, outputs, self.model = module.context(trainable=False) - self.inputs_name = input_dict['image'].name - self.outputs_name = output_dict['feature_map'].name - self.exe = self.to_device() - - def encode(self, data, *args, **kwargs): - # use your awesome model to encode/craft/score - _output, *_ = self.exe.run( - program=self.model, - fetch_list=[self.outputs_name], - feed={self.inputs_name: data}, - return_numpy=True - ) - return feature_map - """ - - @cached_property - def device(self): - """ - Set the device on which the executors using :mod:`paddlepaddle` library will be running. - - ..notes: - In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use, - please use the environment variable `CUDA_VISIBLE_DEVICES`. - """ - import paddle.fluid as fluid - - return fluid.CUDAPlace(0) if self.on_gpu else fluid.CPUPlace() - - def to_device(self): - """Load the model to device.""" - import paddle.fluid as fluid - - return fluid.Executor(self.device) - - -class TFDevice(BaseDevice): - """ - :class:`BaseTFDeviceHandler` implements the base class for the executors using :mod:`tensorflow` library. The common setups go into this class. - - To implement your own executor with the :mod:`tensorflow` library, - - .. highlight:: python - .. 
code-block:: python - - class MyAwesomeTFEncoder(BaseTFDeviceHandler): - def post_init(self): - # load your awesome model - self.to_device() - import tensorflow as tf - model = tf.keras.applications.MobileNetV2( - input_shape=(self.img_shape, self.img_shape, 3), - include_top=False, - pooling=self.pool_strategy, - weights='imagenet') - model.trainable = False - self.model = model - - def encode(self, data, *args, **kwargs): - # use your awesome model to encode/craft/score - return self.model(data) - """ - - @cached_property - def device(self): - """ - Set the device on which the executors using :mod:`tensorflow` library will be running. - - ..notes: - In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use, - please use the environment variable `CUDA_VISIBLE_DEVICES`. - """ - import tensorflow as tf - - cpus = tf.config.experimental.list_physical_devices(device_type='CPU') - gpus = tf.config.experimental.list_physical_devices(device_type='GPU') - if self.on_gpu and len(gpus) > 0: - cpus.append(gpus[0]) - return cpus - - def to_device(self): - """Load the model to device.""" - import tensorflow as tf - - tf.config.experimental.set_visible_devices(devices=self.device) - - -class OnnxDevice(BaseDevice): - """ - :class:`OnnxDevice` implements the base class for the executors using :mod:`onnxruntime` library. The common setups go into this class. - - To implement your own executor with the :mod:`onnxruntime` library, - - .. highlight:: python - .. 
code-block:: python - - class MyAwesomeOnnxEncoder(BaseOnnxDeviceHandler): - def __init__(self, output_feature, model_path, *args, **kwargs): - super().__init__(*args, **kwargs) - self.outputs_name = output_feature - self.model_path = model_path - - def post_init(self): - import onnxruntime - self.model = onnxruntime.InferenceSession(self.model_path, None) - self.inputs_name = self.model.get_inputs()[0].name - self.to_device(self.model) - - def encode(self, data, *args, **kwargs): - # use your awesome model to encode/craft/score - results = [] - for idx in data: - data_encoded, *_ = self.model.run( - [self.outputs_name, ], {self.inputs_name: data}) - results.append(data_encoded) - return np.concatenate(results, axis=0) - - """ - - @cached_property - def device(self): - """ - Set the device on which the executors using :mod:`onnxruntime` library will be running. - - ..notes: - In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use, - please use the environment variable `CUDA_VISIBLE_DEVICES`. - """ - return ['CUDAExecutionProvider'] if self.on_gpu else ['CPUExecutionProvider'] - - def to_device(self, model, *args, **kwargs): - """Load the model to device.""" - model.set_providers(self.device) - - -class FaissDevice(BaseDevice): - """:class:`FaissDevice` implements the base class for the executors using :mod:`faiss` library. The common setups go into this class.""" - - @cached_property - def device(self): - """ - Set the device on which the executors using :mod:`faiss` library will be running. - - ..notes: - In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use, - please use the environment variable `CUDA_VISIBLE_DEVICES`. 
- """ - import faiss - - # For now, consider only one GPU, do not distribute the index - return faiss.StandardGpuResources() if self.on_gpu else None - - def to_device(self, index, *args, **kwargs): - """Load the model to device.""" - import faiss - - device = self.device - return ( - faiss.index_cpu_to_gpu(device, 0, index, None) - if device is not None - else index - ) - - -class MindsporeDevice(BaseDevice): - """:class:`MindsporeDevice` implements the base classes for the executors using :mod:`mindspore` library. The common setups go into this class.""" - - @cached_property - def device(self): - """ - Set the device on which the executors using :mod:`mindspore` library will be running. - - ..notes: - In the case of using GPUs, we only use the first gpu from the visible gpus. To specify which gpu to use, - please use the environment variable `CUDA_VISIBLE_DEVICES`. - """ - return 'GPU' if self.on_gpu else 'CPU' - - def to_device(self): - """Load the model to device.""" - import mindspore.context as context - - context.set_context(mode=context.GRAPH_MODE, device_target=self.device) diff --git a/jina/executors/encoders/__init__.py b/jina/executors/encoders/__init__.py deleted file mode 100644 index bbc6a07074bbd..0000000000000 --- a/jina/executors/encoders/__init__.py +++ /dev/null @@ -1,88 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - - -from .. import BaseExecutor - -if False: - # fix type-hint complain for sphinx and flake - from typing import TypeVar - import numpy as np - import scipy - import tensorflow as tf - import torch - - EncodingType = TypeVar( - 'EncodingType', - np.ndarray, - scipy.sparse.csr_matrix, - scipy.sparse.coo_matrix, - scipy.sparse.bsr_matrix, - scipy.sparse.csc_matrix, - torch.sparse_coo_tensor, - tf.SparseTensor, - ) - - -class BaseEncoder(BaseExecutor): - """``BaseEncoder`` encodes chunk into vector representation. - - The key function is :func:`encode`. - - .. 
seealso:: - :mod:`jina.drivers.encode` - """ - - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'EncodingType': - """Encode the data, needs to be implemented in subclass. - :param content: the data to be encoded - :param args: additional positional arguments - :param kwargs: additional key-value arguments - """ - - raise NotImplementedError - - -class BaseNumericEncoder(BaseEncoder): - """BaseNumericEncoder encodes data from a ndarray, potentially B x ([T] x D) into a ndarray of B x D""" - - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'EncodingType': - """ - :param content: a `B x ([T] x D)` numpy ``ndarray``, `B` is the size of the batch - :param args: additional positional arguments - :param kwargs: additional key-value arguments - """ - raise NotImplementedError - - -class BaseImageEncoder(BaseNumericEncoder): - """BaseImageEncoder encodes data from a ndarray, potentially B x (Height x Width) into a ndarray of B x D""" - - pass - - -class BaseVideoEncoder(BaseNumericEncoder): - """BaseVideoEncoder encodes data from a ndarray, potentially B x (Time x Height x Width) into a ndarray of B x D""" - - pass - - -class BaseAudioEncoder(BaseNumericEncoder): - """BaseAudioEncoder encodes data from a ndarray, potentially B x (Time x D) into a ndarray of B x D""" - - pass - - -class BaseTextEncoder(BaseEncoder): - """ - BaseTextEncoder encodes data from an array of string type (data.dtype.kind == 'U') of size B into a ndarray of B x D. 
- """ - - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'EncodingType': - """ - - :param content: an 1d array of string type (data.dtype.kind == 'U') in size B - :param args: additional positional arguments - :param kwargs: additional key-value arguments - """ - raise NotImplementedError diff --git a/jina/executors/encoders/frameworks.py b/jina/executors/encoders/frameworks.py deleted file mode 100644 index d8e82042bd510..0000000000000 --- a/jina/executors/encoders/frameworks.py +++ /dev/null @@ -1,164 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import os -from typing import Optional - -from . import BaseEncoder -from ..devices import OnnxDevice, PaddleDevice, TorchDevice, TFDevice, MindsporeDevice -from ...excepts import ModelCheckpointNotExist -from ...helper import is_url, cached_property - - -# mixin classes go first, base classes are read from right to left. -class BaseOnnxEncoder(OnnxDevice, BaseEncoder): - """ - :class:`BasePaddleEncoder` is the base class for implementing Encoders with models from :mod:`onnxruntime` library. - - :param output_feature: the name of the layer for feature extraction. - :param model_path: the path of the model in the format of `.onnx`. Check a list of available pretrained - models at https://github.com/onnx/models#image_classification and download the git LFS to your local path. - The ``model_path`` is the local path of the ``.onnx`` file, e.g. ``/tmp/onnx/mobilenetv2-1.0.onnx``. - """ - - def __init__( - self, - output_feature: Optional[str] = None, - model_path: Optional[str] = None, - *args, - **kwargs, - ): - super().__init__(*args, **kwargs) - self.outputs_name = output_feature - self.raw_model_path = model_path - - def post_init(self): - """ - Load the model from the `.onnx` file and add outputs for the selected layer, i.e. ``outputs_name``. The modified - models is saved at `tmp_model_path`. 
- """ - super().post_init() - model_name = self.raw_model_path.split('/')[-1] if self.raw_model_path else None - tmp_model_path = ( - self.get_file_from_workspace(f'{model_name}.tmp') if model_name else None - ) - raw_model_path = self.raw_model_path - if self.raw_model_path and is_url(self.raw_model_path): - import urllib.request - - download_path, *_ = urllib.request.urlretrieve(self.raw_model_path) - raw_model_path = download_path - self.logger.info(f'download the model at {self.raw_model_path}') - if tmp_model_path and not os.path.exists(tmp_model_path) and self.outputs_name: - self._append_outputs(raw_model_path, self.outputs_name, tmp_model_path) - self.logger.info( - f'save the model with outputs [{self.outputs_name}] at {tmp_model_path}' - ) - - if tmp_model_path and os.path.exists(tmp_model_path): - import onnxruntime - - self.model = onnxruntime.InferenceSession(tmp_model_path, None) - self.inputs_name = self.model.get_inputs()[0].name - self._device = None - self.to_device(self.model) - else: - raise ModelCheckpointNotExist(f'model at {tmp_model_path} does not exist') - - @staticmethod - def _append_outputs(inputs, outputs_name_to_append, output_fn): - import onnx - - model = onnx.load(inputs) - feature_map = onnx.helper.ValueInfoProto() - feature_map.name = outputs_name_to_append - model.graph.output.append(feature_map) - onnx.save(model, output_fn) - - -class BaseTFEncoder(TFDevice, BaseEncoder): - """:class:`BasePaddleEncoder` is the base class for implementing Encoders with models from :mod:`tensorflow` library.""" - - pass - - -class BaseTorchEncoder(TorchDevice, BaseEncoder): - """Base encoder class for :mod:`pytorch` library.""" - - pass - - -class BasePaddleEncoder(PaddleDevice, BaseEncoder): - """:class:`BasePaddleEncoder` is the base class for implementing Encoders with models from :mod:`paddlepaddle` library.""" - - pass - - -class BaseMindsporeEncoder(MindsporeDevice, BaseEncoder): - """ - :class:`BaseMindsporeEncoder` is the base class for 
implementing Encoders with models from `mindspore`. - - To implement your own executor with the :mod:`mindspore` lilbrary, - - .. highlight:: python - .. code-block:: python - import mindspore.nn as nn - - class YourAwesomeModel(nn.Cell): - def __init__(self): - ... - - def construct(self, x): - ... - - class YourAwesomeEncoder(BaseMindsporeEncoder): - def encode(self, data, *args, **kwargs): - from mindspore import Tensor - return self.model(Tensor(data)).asnumpy() - - def get_cell(self): - return YourAwesomeModel() - - :param model_path: the path of the model's checkpoint. - :param args: additional arguments - :param kwargs: additional key value arguments - """ - - def __init__(self, model_path: Optional[str] = None, *args, **kwargs): - super().__init__(*args, **kwargs) - self.model_path = model_path - - def post_init(self): - """ - Load the model from the `.ckpt` checkpoint. - """ - super().post_init() - if self.model_path and os.path.exists(self.model_path): - self.to_device() - from mindspore.train.serialization import ( - load_checkpoint, - load_param_into_net, - ) - - _param_dict = load_checkpoint(ckpt_file_name=self.model_path) - load_param_into_net(self.model, _param_dict) - else: - raise ModelCheckpointNotExist(f'model {self.model_path} does not exist') - - @cached_property - def model(self): - """ - Get the Mindspore Neural Networks Cells. - :return: model property - """ - return self.get_cell() - - def get_cell(self): - """ - Return Mindspore Neural Networks Cells. - - Pre-defined building blocks or computing units to construct Neural Networks. - A ``Cell`` could be a single neural network cell, such as conv2d, relu, batch_norm, etc. - or a composition of cells to constructing a network. 
- """ - raise NotImplementedError diff --git a/jina/executors/encoders/multimodal/__init__.py b/jina/executors/encoders/multimodal/__init__.py deleted file mode 100644 index 914bb9c1b0c56..0000000000000 --- a/jina/executors/encoders/multimodal/__init__.py +++ /dev/null @@ -1,31 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Sequence - -import numpy as np - -from ... import BaseExecutor - - -class BaseMultiModalEncoder(BaseExecutor): - """ - :class:`BaseMultiModalEncoder` encodes data from multiple inputs (``text``, ``buffer``, ``blob`` or other ``embeddings``) - into a single ``embedding`` - """ - - def __init__(self, positional_modality: Sequence[str], *args, **kwargs): - """ - :param positional_modality: the list of arguments indicating in which order the modalities they need to come - for the encoding method - :return: - """ - super().__init__(*args, **kwargs) - self.positional_modality = positional_modality - - def encode(self, *data: 'np.ndarray', **kwargs) -> 'np.ndarray': - """ - :param: data: M arguments of shape `B x (D)` numpy ``ndarray``, `B` is the size of the batch, `M` is the number of modalities - :return: a `B x D` numpy ``ndarray`` - """ - raise NotImplementedError diff --git a/jina/executors/encoders/numeric/__init__.py b/jina/executors/encoders/numeric/__init__.py deleted file mode 100644 index c953b94590610..0000000000000 --- a/jina/executors/encoders/numeric/__init__.py +++ /dev/null @@ -1,42 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Optional - -import numpy as np - -from .. import BaseNumericEncoder -from ...decorators import batching - - -class TransformEncoder(BaseNumericEncoder): - """ - :class:`TransformEncoder` encodes data from an ndarray in size `B x T` into an ndarray in size `B x D` - - :param model_path: path from where to pickle the sklearn model. 
- :param args: Extra positional arguments to be set - :param kwargs: Extra keyword arguments to be set - """ - - def __init__(self, model_path: Optional[str] = None, *args, **kwargs): - super().__init__(*args, **kwargs) - self.model_path = model_path - - def post_init(self) -> None: - """Load the model from path if :param:`model_path` is set.""" - import pickle - - self.model = None - if self.model_path: - with open(self.model_path, 'rb') as model_file: - self.model = pickle.load(model_file) - - @batching - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - """ - :param content: a `B x T` numpy ``ndarray``, `B` is the size of the batch - :return: a `B x D` numpy ``ndarray`` - :param args: Extra positional arguments to be set - :param kwargs: Extra keyword arguments to be set - """ - return self.model.transform(content) diff --git a/jina/executors/evaluators/__init__.py b/jina/executors/evaluators/__init__.py deleted file mode 100644 index 93ba4ac9db4d7..0000000000000 --- a/jina/executors/evaluators/__init__.py +++ /dev/null @@ -1,85 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Any - -from .running_stats import RunningStats -from .. 
import BaseExecutor -from ..compound import CompoundExecutor - - -class BaseEvaluator(BaseExecutor): - """A :class:`BaseEvaluator` is used to evaluate different messages coming from any kind of executor""" - - metric = '' #: Get the name of the evaluation metric - - def post_init(self): - """Initialize running stats.""" - super().post_init() - self._running_stats = RunningStats() - - def evaluate(self, actual: Any, desired: Any, *args, **kwargs) -> float: - """Evaluates difference between param:`actual` and `param:desired`, needs to be implemented in subclass.""" - raise NotImplementedError - - @property - def mean(self) -> float: - """Get the running mean.""" - return self._running_stats.mean - - @property - def std(self) -> float: - """Get the running standard variance.""" - return self._running_stats.std - - @property - def variance(self) -> float: - """Get the running variance.""" - return self._running_stats.variance - - -class FileBasedEvaluator(CompoundExecutor): - - """A Frequently used pattern for combining A :class:`BinaryPbIndexer` and :class:`BaseEvaluator`. - It will be equipped with predefined ``requests.on`` behaviors: - - - At evaluation time(query or index) - - 1. Checks for the incoming document, gets its value from the `BinaryPbIndexer` and fills the `groundtruth of the request - - 2. Filter the documents that do not have a corresponding groundtruth - - 3. The BaseEvaluator works as if the `groundtruth` had been provided by the client as it comes in the request. - - .. warning:: - The documents that are not found to have an indexed groundtruth are removed from the `request` so that the `Evaluator` only - works with documents which have groundtruth. - - One can use the :class:`FileBasedEvaluator` via - - .. highlight:: yaml - .. 
code-block:: yaml - - !FileBasedEvaluator - components: - - !BinaryPbIndexer - with: - index_filename: ground_truth.gz - metas: - name: groundtruth_index # a customized name - workspace: ${{TEST_WORKDIR}} - - !BaseEvaluator - - Without defining any ``requests.on`` logic. When load from this YAML, it will be auto equipped with - - .. highlight:: yaml - .. code-block:: yaml - - on: - [SearchRequest, IndexRequest]: - - !LoadGroundTruthDriver - with: - executor: BaseKVIndexer - - !BaseEvaluateDriver - with: - executor: BaseEvaluator - ControlRequest: - - !ControlReqDriver {} - """ diff --git a/jina/executors/evaluators/embedding/__init__.py b/jina/executors/evaluators/embedding/__init__.py deleted file mode 100644 index 97bdd8176da0b..0000000000000 --- a/jina/executors/evaluators/embedding/__init__.py +++ /dev/null @@ -1,34 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import numpy as np - -from .. import BaseEvaluator - - -class BaseEmbeddingEvaluator(BaseEvaluator): - """A :class:`BaseEmbeddingEvaluator` evaluates the difference between actual and desired embeddings""" - - def evaluate( - self, actual: 'np.array', desired: 'np.array', *args, **kwargs - ) -> float: - """ " - :param actual: the embedding of the document (resulting from an Encoder) - :param desired: the expected embedding of the document - :return the evaluation metric value for the request document - """ - raise NotImplementedError - - -def expand_vector(vec): - """ - Expand 1d vector with one dimension axis == 0. - - :param vec: Vector to be expanded. 
- :return: Expanded vector, - """ - if not isinstance(vec, np.ndarray): - vec = np.array(vec) - if len(vec.shape) == 1: - vec = np.expand_dims(vec, 0) - return vec diff --git a/jina/executors/evaluators/embedding/cosine.py b/jina/executors/evaluators/embedding/cosine.py deleted file mode 100644 index 4e6436954b5db..0000000000000 --- a/jina/executors/evaluators/embedding/cosine.py +++ /dev/null @@ -1,63 +0,0 @@ -import numpy as np - -from ..embedding import BaseEmbeddingEvaluator, expand_vector - - -class CosineEvaluator(BaseEmbeddingEvaluator): - """A :class:`CosineEvaluator` evaluates the distance between actual and desired embeddings computing - the cosine distance between them. (The smaller value the closest distance, it is not cosine similarity measure) - - .. math:: - - 1 - \\frac{u \\cdot v} - {||u||_2 ||v||_2}. - """ - - metric = 'CosineDistance' - - def evaluate( - self, actual: 'np.array', desired: 'np.array', *args, **kwargs - ) -> float: - """ " - :param actual: the embedding of the document (resulting from an Encoder) - :param desired: the expected embedding of the document - :return the evaluation metric value for the request document - """ - actual = expand_vector(actual) - desired = expand_vector(desired) - return _cosine(_ext_A(_norm(actual)), _ext_B(_norm(desired))) - - -# duplicate on purpose, to be migrated to the Hub -def _get_ones(x, y): - return np.ones((x, y)) - - -def _ext_A(A): - nA, dim = A.shape - A_ext = _get_ones(nA, dim * 3) - A_ext[:, dim : 2 * dim] = A - A_ext[:, 2 * dim :] = A ** 2 - return A_ext - - -def _ext_B(B): - nB, dim = B.shape - B_ext = _get_ones(dim * 3, nB) - B_ext[:dim] = (B ** 2).T - B_ext[dim : 2 * dim] = -2.0 * B.T - del B - return B_ext - - -def _euclidean(A_ext, B_ext): - sqdist = A_ext.dot(B_ext).clip(min=0) - return np.sqrt(sqdist) - - -def _norm(A): - return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True) - - -def _cosine(A_norm_ext, B_norm_ext): - return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2 diff --git 
a/jina/executors/evaluators/embedding/euclidean.py b/jina/executors/evaluators/embedding/euclidean.py deleted file mode 100644 index 44cae756342cb..0000000000000 --- a/jina/executors/evaluators/embedding/euclidean.py +++ /dev/null @@ -1,58 +0,0 @@ -import numpy as np - -from ..embedding import BaseEmbeddingEvaluator, expand_vector - - -class EuclideanEvaluator(BaseEmbeddingEvaluator): - """A :class:`EuclideanEvaluator` evaluates the distance between actual and desired embeddings computing - the euclidean distance between them - """ - - metric = 'EuclideanDistance' - - def evaluate( - self, actual: 'np.array', desired: 'np.array', *args, **kwargs - ) -> float: - """ " - :param actual: the embedding of the document (resulting from an Encoder) - :param desired: the expected embedding of the document - :return the evaluation metric value for the request document - """ - actual = expand_vector(actual) - desired = expand_vector(desired) - return _euclidean(_ext_A(actual), _ext_B(desired)) - - -# duplicate on purpose, to be migrated to the Hub -def _get_ones(x, y): - return np.ones((x, y)) - - -def _ext_A(A): - nA, dim = A.shape - A_ext = _get_ones(nA, dim * 3) - A_ext[:, dim : 2 * dim] = A - A_ext[:, 2 * dim :] = A ** 2 - return A_ext - - -def _ext_B(B): - nB, dim = B.shape - B_ext = _get_ones(dim * 3, nB) - B_ext[:dim] = (B ** 2).T - B_ext[dim : 2 * dim] = -2.0 * B.T - del B - return B_ext - - -def _euclidean(A_ext, B_ext): - sqdist = A_ext.dot(B_ext).clip(min=0) - return np.sqrt(sqdist) - - -def _norm(A): - return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True) - - -def _cosine(A_norm_ext, B_norm_ext): - return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2 diff --git a/jina/executors/evaluators/rank/__init__.py b/jina/executors/evaluators/rank/__init__.py deleted file mode 100644 index afe858bbebf01..0000000000000 --- a/jina/executors/evaluators/rank/__init__.py +++ /dev/null @@ -1,22 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -from typing import Sequence, Any - -from .. import BaseEvaluator - - -class BaseRankingEvaluator(BaseEvaluator): - """A :class:`BaseRankingEvaluator` evaluates the content of matches against the expected GroundTruth. - It is used to evaluate messages coming out from Indexers and Rankers and compares matches with groundtruths - """ - - def evaluate( - self, actual: Sequence[Any], desired: Sequence[Any], *args, **kwargs - ) -> float: - """ " - :param actual: the matched document identifiers from the request as matched by jina indexers and rankers - :param desired: the expected documents matches ids sorted as they are expected - :return the evaluation metric value for the request document - """ - raise NotImplementedError diff --git a/jina/executors/evaluators/rank/precision.py b/jina/executors/evaluators/rank/precision.py deleted file mode 100644 index 85b7f8b7b2b8c..0000000000000 --- a/jina/executors/evaluators/rank/precision.py +++ /dev/null @@ -1,31 +0,0 @@ -from typing import Sequence, Any, Optional - -from ..rank import BaseRankingEvaluator - - -class PrecisionEvaluator(BaseRankingEvaluator): - """A :class:`PrecisionEvaluator` evaluates the Precision of the search. 
- It computes how many of the first given `eval_at` matches are found in the groundtruth - """ - - def __init__(self, eval_at: Optional[int] = None, *args, **kwargs): - """ " - :param eval_at: the point at which evaluation is computed, if None give, will consider all the input to evaluate - """ - super().__init__(*args, **kwargs) - self.eval_at = eval_at - - def evaluate( - self, actual: Sequence[Any], desired: Sequence[Any], *args, **kwargs - ) -> float: - """ " - :param actual: the matched document identifiers from the request as matched by jina indexers and rankers - :param desired: the expected documents matches ids sorted as they are expected - :return the evaluation metric value for the request document - """ - if self.eval_at == 0: - return 0.0 - actual_at_k = actual[: self.eval_at] if self.eval_at else actual - ret = len(set(actual_at_k).intersection(set(desired))) - sub = len(actual_at_k) - return ret / sub if sub != 0 else 0.0 diff --git a/jina/executors/evaluators/rank/recall.py b/jina/executors/evaluators/rank/recall.py deleted file mode 100644 index a363d54506677..0000000000000 --- a/jina/executors/evaluators/rank/recall.py +++ /dev/null @@ -1,30 +0,0 @@ -from typing import Sequence, Any, Optional - -from . import BaseRankingEvaluator - - -class RecallEvaluator(BaseRankingEvaluator): - """A :class:`RecallEvaluator` evaluates the Precision of the search. 
- It computes how many of the first given `eval_at` groundtruth are found in the matches - """ - - def __init__(self, eval_at: Optional[int] = None, *args, **kwargs): - """ " - :param eval_at: the point at which evaluation is computed, if None give, will consider all the input to evaluate - """ - super().__init__(*args, **kwargs) - self.eval_at = eval_at - - def evaluate( - self, actual: Sequence[Any], desired: Sequence[Any], *args, **kwargs - ) -> float: - """ " - :param actual: the matched document identifiers from the request as matched by jina indexers and rankers - :param desired: the expected documents matches ids sorted as they are expected - :return the evaluation metric value for the request document - """ - if self.eval_at == 0: - return 0.0 - actual_at_k = actual[: self.eval_at] if self.eval_at else actual - ret = len(set(actual_at_k).intersection(set(desired))) - return ret / len(desired) diff --git a/jina/executors/evaluators/running_stats.py b/jina/executors/evaluators/running_stats.py deleted file mode 100644 index 72979d8a0213e..0000000000000 --- a/jina/executors/evaluators/running_stats.py +++ /dev/null @@ -1,46 +0,0 @@ -"""Decorators and wrappers designed for wrapping :class:`BaseExecutor` functions. 
""" - -from math import sqrt - - -class RunningStats: - """Computes running mean and standard deviation""" - - def __init__(self): - """Constructor.""" - self._n = 0 - self._m = None - self._s = None - - def clear(self): - """Reset the stats.""" - self._n = 0.0 - - @property - def mean(self): - """Get the running mean.""" - return self._m if self._n else 0.0 - - @property - def variance(self): - """Get the running variance.""" - return self._s / self._n if self._n else 0.0 - - @property - def std(self): - """Get the standard variance.""" - return sqrt(self.variance) - - def __add__(self, x: float): - self._n += 1 - if self._n == 1: - self._m = x - self._s = 0.0 - else: - prev_m = self._m - self._m += (x - self._m) / self._n - self._s += (x - prev_m) * (x - self._m) - return self - - def __str__(self): - return f'mean={self.mean:2.4f}, std={self.std:2.4f}' diff --git a/jina/executors/evaluators/text/__init__.py b/jina/executors/evaluators/text/__init__.py deleted file mode 100644 index 9d17747869ffd..0000000000000 --- a/jina/executors/evaluators/text/__init__.py +++ /dev/null @@ -1,18 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Any - -from .. import BaseEvaluator - - -class BaseTextEvaluator(BaseEvaluator): - """A :class:`BaseTextEvaluator` evaluates the difference between actual and desired text""" - - def evaluate(self, actual: Any, desired: Any, *args, **kwargs) -> float: - """ " - :param actual: the content of the document - :param desired: the expected content of the document - :return the evaluation metric value for the request document - """ - raise NotImplementedError diff --git a/jina/executors/evaluators/text/length.py b/jina/executors/evaluators/text/length.py deleted file mode 100644 index 1207f7c2b0097..0000000000000 --- a/jina/executors/evaluators/text/length.py +++ /dev/null @@ -1,18 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. 
All rights reserved." -__license__ = "Apache-2.0" - -from . import BaseTextEvaluator - - -class TextLengthEvaluator(BaseTextEvaluator): - """A :class:`TextLengthEvaluator` evaluates the different lengths between actual and desired text""" - - metric = 'LengthDiff' - - def evaluate(self, actual: str, desired: str, *args, **kwargs) -> float: - """ " - :param actual: the text of the document - :param desired: the expected text of the document - :return the evaluation metric value for the request document - """ - return abs(len(actual) - len(desired)) diff --git a/jina/executors/indexers/__init__.py b/jina/executors/indexers/__init__.py deleted file mode 100644 index 99246d7c281f7..0000000000000 --- a/jina/executors/indexers/__init__.py +++ /dev/null @@ -1,456 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import os -from typing import Tuple, Optional, Any, Iterable - -import numpy as np - -from .. import BaseExecutor -from ..compound import CompoundExecutor -from ...helper import call_obj_fn, cached_property, get_readable_size - -if False: - from typing import TypeVar - import scipy - import tensorflow as tf - import torch - - EncodingType = TypeVar( - 'EncodingType', - np.ndarray, - scipy.sparse.csr_matrix, - scipy.sparse.coo_matrix, - scipy.sparse.bsr_matrix, - scipy.sparse.csc_matrix, - torch.sparse_coo_tensor, - tf.SparseTensor, - ) - - -class BaseIndexer(BaseExecutor): - """Base class for storing and searching any kind of data structure. - - The key functions here are :func:`add` and :func:`query`. - One can decorate them with :func:`jina.helper.batching` and :func:`jina.logging.profile.profiling`. - - One should always inherit from either :class:`BaseVectorIndexer` or :class:`BaseKVIndexer`. - - .. seealso:: - :mod:`jina.drivers.handlers.index` - - .. note:: - Calling :func:`save` to save a :class:`BaseIndexer` will create - more than one files. 
One is the serialized version of the :class:`BaseIndexer` object, often ends with ``.bin`` - - .. warning:: - When using :class:`BaseIndexer` out of the Pod, use it with context manager - - .. highlight:: python - .. code-block:: python - - with BaseIndexer() as b: - b.add() - - So that it can safely save the data. Or you have to manually call `b.close()` to close the indexer safely. - - :param index_filename: the name of the file for storing the index, when not given metas.name is used. - :param args: Additional positional arguments which are just used for the parent initialization - :param kwargs: Additional keyword arguments which are just used for the parent initialization - """ - - def __init__( - self, - index_filename: Optional[str] = None, - key_length: int = 36, - *args, - **kwargs, - ): - super().__init__(*args, **kwargs) - self.index_filename = ( - index_filename #: the file name of the stored index, no path is required - ) - self.key_length = key_length #: the default minimum length of the key, will be expanded one time on the first batch - self._size = 0 - - def add(self, *args, **kwargs): - """ - Add documents to the index. - - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def update(self, *args, **kwargs): - """ - Update documents on the index. - - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def delete(self, *args, **kwargs): - """ - Delete documents from the index. - - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def post_init(self): - """query handler and write handler can not be serialized, thus they must be put into :func:`post_init`. 
""" - self.index_filename = self.index_filename or self.name - self.handler_mutex = True #: only one handler at a time by default - self.is_handler_loaded = False - - def query(self, *args, **kwargs): - """ - Query documents from the index. - - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - @property - def index_abspath(self) -> str: - """ - Get the file path of the index storage - - :return: absolute path - """ - return self.get_file_from_workspace(self.index_filename) - - @cached_property - def query_handler(self): - """A readable and indexable object, could be dict, map, list, numpy array etc. - - :return: read handler - - .. note:: - :attr:`query_handler` and :attr:`write_handler` are by default mutex - """ - r = None - if not self.handler_mutex or not self.is_handler_loaded: - r = self.get_query_handler() - if r is None: - self.logger.warning( - f'you can not query from {self} as its "query_handler" is not set. ' - 'If you are indexing data from scratch then it is fine. ' - 'If you are querying data then the index file must be empty or broken.' - ) - else: - self.logger.info(f'indexer size: {self.size}') - self.is_handler_loaded = True - if r is None: - r = self.null_query_handler - return r - - @cached_property - def null_query_handler(self) -> Optional[Any]: - """The empty query handler when :meth:`get_query_handler` fails - - :return: nothing - """ - return - - @property - def is_exist(self) -> bool: - """ - Check if the database is exist or not - - :return: true if the absolute index path exists, else false - """ - return os.path.exists(self.index_abspath) - - @cached_property - def write_handler(self): - """A writable and indexable object, could be dict, map, list, numpy array etc. - - :return: write handler - - .. note:: - :attr:`query_handler` and :attr:`write_handler` are by default mutex - """ - - # ! a || ( a && b ) - # = - # ! 
a || b - if not self.handler_mutex or not self.is_handler_loaded: - r = self.get_add_handler() if self.is_exist else self.get_create_handler() - - if r is None: - self.logger.warning( - '"write_handler" is None, you may not add data to this index, ' - 'unless "write_handler" is later assigned with a meaningful value' - ) - else: - self.is_handler_loaded = True - return r - - def get_query_handler(self): - """Get a *readable* index handler when the ``index_abspath`` already exist, need to be overridden""" - raise NotImplementedError - - def get_add_handler(self): - """Get a *writable* index handler when the ``index_abspath`` already exist, need to be overridden""" - raise NotImplementedError - - def get_create_handler(self): - """Get a *writable* index handler when the ``index_abspath`` does not exist, need to be overridden""" - raise NotImplementedError - - @property - def size(self) -> int: - """ - The number of vectors or documents indexed. - - :return: size - """ - return self._size - - def __getstate__(self): - d = super().__getstate__() - self.flush() - return d - - def close(self): - """Close all file-handlers and release all resources. 
""" - self.logger.info( - f'indexer size: {self.size} physical size: {get_readable_size(self.physical_size)}' - ) - self.flush() - call_obj_fn(self.write_handler, 'close') - call_obj_fn(self.query_handler, 'close') - super().close() - - def flush(self): - """Flush all buffered data to ``index_abspath`` """ - try: - # It may have already been closed by the Pea using context manager - call_obj_fn(self.write_handler, 'flush') - except: - pass - - def _filter_nonexistent_keys_values( - self, keys: Iterable, values: Iterable, existent_keys: Iterable - ) -> Tuple[Iterable, Iterable]: - f = [(key, value) for key, value in zip(keys, values) if key in existent_keys] - if f: - return zip(*f) - else: - return None, None - - def _filter_nonexistent_keys( - self, keys: Iterable, existent_keys: Iterable - ) -> Iterable: - return [key for key in keys if key in set(existent_keys)] - - def sample(self): - """Return a sample from this indexer, useful in sanity check """ - raise NotImplementedError - - def __iter__(self): - """Iterate over all entries in this indexer. """ - raise NotImplementedError - - -class BaseVectorIndexer(BaseIndexer): - """An abstract class for vector indexer. It is equipped with drivers in ``requests.on`` - - All vector indexers should inherit from it. - - It can be used to tell whether an indexer is vector indexer, via ``isinstance(a, BaseVectorIndexer)`` - """ - - embedding_cls_type = 'dense' - - def query_by_key(self, keys: Iterable[str], *args, **kwargs) -> 'np.ndarray': - """Get the vectors by id, return a subset of indexed vectors - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def add( - self, keys: Iterable[str], vectors: 'EncodingType', *args, **kwargs - ) -> None: - """Add new chunks and their vector representations - - :param keys: a list of ``id``, i.e. 
``doc.id`` in protobuf - :param vectors: vector representations in B x D - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def query( - self, vectors: 'EncodingType', top_k: int, *args, **kwargs - ) -> Tuple['np.ndarray', 'np.ndarray']: - """Find k-NN using query vectors, return chunk ids and chunk scores - - :param vectors: query vectors in ndarray, shape B x D - :param top_k: int, the number of nearest neighbour to return - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def update( - self, keys: Iterable[str], vectors: 'EncodingType', *args, **kwargs - ) -> None: - """Update vectors on the index. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param vectors: vector representations in B x D - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - """Delete vectors from the index. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - -class BaseKVIndexer(BaseIndexer): - """An abstract class for key-value indexer. - - All key-value indexers should inherit from it. - - It can be used to tell whether an indexer is key-value indexer, via ``isinstance(a, BaseKVIndexer)`` - """ - - def add( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Add the serialized documents to the index via document ids. - - :param keys: a list of ``id``, i.e. 
``doc.id`` in protobuf - :param values: serialized documents - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def query(self, key: str, *args, **kwargs) -> Optional[bytes]: - """Find the serialized document to the index via document id. - - :param key: document id - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def update( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Update the serialized documents on the index via document ids. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param values: serialized documents - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - """Delete the serialized documents from the index via document ids. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - raise NotImplementedError - - def __getitem__(self, key: Any) -> Optional[bytes]: - return self.query(key) - - -class UniqueVectorIndexer(CompoundExecutor): - """A frequently used pattern for combining a :class:`BaseVectorIndexer` and a :class:`DocCache` """ - - -class CompoundIndexer(CompoundExecutor): - """A Frequently used pattern for combining A :class:`BaseVectorIndexer` and :class:`BaseKVIndexer`. - It will be equipped with predefined ``requests.on`` behaviors: - - - In the index time - - 1. stores the vector via :class:`BaseVectorIndexer` - - 2. remove all vector information (embedding, buffer, blob, text) - - 3. store the remained meta information via :class:`BaseKVIndexer` - - In the query time - - 1. Find the knn using the vector via :class:`BaseVectorIndexer` - - 2. 
remove all vector information (embedding, buffer, blob, text) - - 3. Fill in the meta information of the document via :class:`BaseKVIndexer` - - One can use the :class:`ChunkIndexer` via - - .. highlight:: yaml - .. code-block:: yaml - - !ChunkIndexer - components: - - !NumpyIndexer - with: - index_filename: vec.gz - metas: - name: vecidx # a customized name - workspace: ${{TEST_WORKDIR}} - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx # a customized name - workspace: ${{TEST_WORKDIR}} - metas: - name: chunk_compound_indexer - workspace: ${{TEST_WORKDIR}} - - Without defining any ``requests.on`` logic. When load from this YAML, it will be auto equipped with - - .. highlight:: yaml - .. code-block:: yaml - - on: - SearchRequest: - - !VectorSearchDriver - with: - executor: BaseVectorIndexer - - !PruneDriver - with: - pruned: - - embedding - - buffer - - blob - - text - - !KVSearchDriver - with: - executor: BaseKVIndexer - IndexRequest: - - !VectorIndexDriver - with: - executor: BaseVectorIndexer - - !PruneDriver - with: - pruned: - - embedding - - buffer - - blob - - text - - !KVIndexDriver - with: - executor: BaseKVIndexer - ControlRequest: - - !ControlReqDriver {} - """ diff --git a/jina/executors/indexers/cache.py b/jina/executors/indexers/cache.py deleted file mode 100644 index b6d28c78f3593..0000000000000 --- a/jina/executors/indexers/cache.py +++ /dev/null @@ -1,175 +0,0 @@ -"""Indexer for caching.""" - -import pickle -import tempfile -from typing import Optional, Iterable, List, Tuple, Union - -from jina.executors.indexers import BaseKVIndexer -from jina.helper import deprecated_alias - -DATA_FIELD = 'data' -ID_KEY = 'id' -CONTENT_HASH_KEY = 'content_hash' - - -class BaseCache(BaseKVIndexer): - """Base class of the cache inherited :class:`BaseKVIndexer`. - - The difference between a cache and a :class:`BaseKVIndexer` is the ``handler_mutex`` is released in cache, - this allows one to query-while-indexing. 
- - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def post_init(self): - """For Cache we need to release the handler mutex to allow RW at the same time.""" - self.handler_mutex = False - - -class DocCache(BaseCache): - """A key-value indexer that specializes in caching. - - Serializes the cache to two files, one for ids, one for the actually cached field. - If fields=["id"], then the second file is redundant. The class optimizes the process - so that there are no duplicates. - - Order of fields does NOT affect the caching. - - :param index_filename: file name for storing the cache data - :param fields: fields to cache on (of Document) - :param args: additional positional arguments which are just used for the parent initialization - :param kwargs: additional key value arguments which are just used for the parent initialization - """ - - class CacheHandler: - """A handler for loading and serializing the in-memory cache of the DocCache. - - :param path: Path to the file from which to build the actual paths. - :param logger: Instance of logger. - """ - - def __init__(self, path, logger): - self.path = path - try: - self.id_to_cache_val = pickle.load(open(path + '.ids', 'rb')) - self.cache_val_to_id = pickle.load(open(path + '.cache', 'rb')) - except FileNotFoundError as e: - logger.warning( - f'File path did not exist : {path}.ids or {path}.cache: {e!r}. Creating new CacheHandler...' 
- ) - self.id_to_cache_val = dict() - self.cache_val_to_id = dict() - - def close(self): - """Flushes the in-memory cache to pickle files.""" - pickle.dump(self.id_to_cache_val, open(self.path + '.ids', 'wb')) - pickle.dump(self.cache_val_to_id, open(self.path + '.cache', 'wb')) - - default_fields = (ID_KEY,) - - @deprecated_alias(field=('fields', 0)) - def __init__( - self, - index_filename: Optional[str] = None, - fields: Optional[ - Union[str, Tuple[str]] - ] = None, # str for backwards compatibility - *args, - **kwargs, - ): - if not index_filename: - # create a new temp file if not exist - index_filename = tempfile.NamedTemporaryFile(delete=False).name - super().__init__(index_filename, *args, **kwargs) - if isinstance(fields, str): - fields = (fields,) - # order shouldn't matter - self.fields = sorted(fields or self.default_fields) - - def add( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Add a document to the cache depending. - - :param keys: document ids to be added - :param values: document cache values to be added - :param args: not used - :param kwargs: not used - """ - for key, value in zip(keys, values): - self.query_handler.id_to_cache_val[key] = value - self.query_handler.cache_val_to_id[value] = key - self._size += 1 - - def query(self, key: str, *args, **kwargs) -> bool: - """Check whether the data exists in the cache. - - :param key: the value that we cached by (combination of the Document fields) - :param args: not used - :param kwargs: not used - :return: status - """ - return key in self.query_handler.cache_val_to_id - - def update( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Update cached documents. 
- - :param keys: list of Document.id - :param values: list of values (combination of the Document fields) - :param args: not used - :param kwargs: not used - """ - - if len(self.fields) == 1 and self.fields[0] == ID_KEY: - # if we don't cache anything else, no need - return - - for key, value in zip(keys, values): - if key not in self.query_handler.id_to_cache_val: - continue - old_value = self.query_handler.id_to_cache_val[key] - self.query_handler.id_to_cache_val[key] = value - del self.query_handler.cache_val_to_id[old_value] - self.query_handler.cache_val_to_id[value] = key - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - """Delete documents from the cache. - - :param keys: list of Document.id - :param args: not used - :param kwargs: not used - """ - for key in keys: - if key not in self.query_handler.id_to_cache_val: - continue - value = self.query_handler.id_to_cache_val[key] - del self.query_handler.id_to_cache_val[key] - del self.query_handler.cache_val_to_id[value] - self._size -= 1 - - def get_add_handler(self): - """Get the CacheHandler. - - - .. # noqa: DAR201""" - return self.get_query_handler() - - def get_query_handler(self) -> CacheHandler: - """Get the CacheHandler. - - - .. # noqa: DAR201""" - return self.CacheHandler(self.save_abspath, self.logger) - - def get_create_handler(self): - """Get the CacheHandler. - - - .. 
# noqa: DAR201""" - return self.get_query_handler() diff --git a/jina/executors/indexers/dbms/__init__.py b/jina/executors/indexers/dbms/__init__.py deleted file mode 100644 index f5599ad452837..0000000000000 --- a/jina/executors/indexers/dbms/__init__.py +++ /dev/null @@ -1,60 +0,0 @@ -from typing import Optional, List - -import numpy as np -from jina.executors.indexers import BaseIndexer - - -class BaseDBMSIndexer(BaseIndexer): - """A class only meant for storing (indexing, update, delete) of data""" - - def add( - self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, **kwargs - ): - """Add to the DBMS Indexer, both vectors and metadata - - :param ids: the ids of the documents - :param vecs: the vectors - :param metas: the metadata, in binary format - :param args: not used - :param kwargs: not used - """ - raise NotImplementedError - - def update( - self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, **kwargs - ): - """Update the DBMS Indexer, both vectors and metadata - - :param ids: the ids of the documents - :param vecs: the vectors - :param metas: the metadata, in binary format - :param args: not used - :param kwargs: not used - """ - raise NotImplementedError - - def delete(self, ids: List[str], *args, **kwargs): - """Delete from the indexer by ids - - :param ids: the ids of the Documents to delete - :param args: not used - :param kwargs: not used - """ - raise NotImplementedError - - def dump(self, path: str, shards: int): - """Dump the index - - :param path: the path to which to dump - :param shards: the nr of shards to which to dump - """ - raise NotImplementedError - - def query(self, key: str, *args, **kwargs) -> Optional[bytes]: - """DBMSIndexers do NOT support querying - - :param key: the key by which to query - :param args: not used - :param kwargs: not used - """ - raise NotImplementedError('DBMSIndexers do not support querying') diff --git a/jina/executors/indexers/dbms/keyvalue.py 
b/jina/executors/indexers/dbms/keyvalue.py deleted file mode 100644 index bf0df53d68a79..0000000000000 --- a/jina/executors/indexers/dbms/keyvalue.py +++ /dev/null @@ -1,94 +0,0 @@ -import pickle -from typing import List, Tuple, Generator -import numpy as np - -from jina import Document -from jina.executors.indexers.dump import export_dump_streaming -from jina.executors.indexers.dbms import BaseDBMSIndexer -from jina.executors.indexers.keyvalue import BinaryPbWriterMixin - - -class BinaryPbDBMSIndexer(BinaryPbWriterMixin, BaseDBMSIndexer): - """A DBMS Indexer (no query method)""" - - def _get_generator( - self, ids: List[str] - ) -> Generator[Tuple[str, np.array, bytes], None, None]: - for id_ in ids: - vecs_metas_list_bytes = super()._query([id_]) - vec, meta = pickle.loads(vecs_metas_list_bytes[0]) - yield id_, vec, meta - - def dump(self, path: str, shards: int) -> None: - """Dump the index - - :param path: the path to which to dump - :param shards: the nr of shards to which to dump - """ - self.write_handler.close() - # noinspection PyPropertyAccess - del self.write_handler - self.handler_mutex = False - ids = self.query_handler.header.keys() - export_dump_streaming( - path, - shards=shards, - size=len(ids), - data=self._get_generator(ids), - ) - self.query_handler.close() - self.handler_mutex = False - # noinspection PyPropertyAccess - del self.query_handler - - def add( - self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, **kwargs - ): - """Add to the DBMS Indexer, both vectors and metadata - - :param ids: the ids of the documents - :param vecs: the vectors - :param metas: the metadata, in binary format - :param args: not used - :param kwargs: not used - """ - if not any(ids): - return - - vecs_metas = [pickle.dumps([vec, meta]) for vec, meta in zip(vecs, metas)] - with self.write_handler as write_handler: - self._add(ids, vecs_metas, write_handler) - - def update( - self, ids: List[str], vecs: List[np.array], metas: List[bytes], *args, 
**kwargs - ): - """Update the DBMS Indexer, both vectors and metadata - - :param ids: the ids of the documents - :param vecs: the vectors - :param metas: the metadata, in binary format - :param args: not used - :param kwargs: not used - """ - vecs_metas = [pickle.dumps((vec, meta)) for vec, meta in zip(vecs, metas)] - keys, vecs_metas = self._filter_nonexistent_keys_values( - ids, vecs_metas, self.query_handler.header.keys() - ) - del self.query_handler - self.handler_mutex = False - if keys: - self._delete(keys) - with self.write_handler as write_handler: - self._add(keys, vecs_metas, write_handler) - - def delete(self, ids: List[str], *args, **kwargs): - """Delete the serialized documents from the index via document ids. - - :param ids: a list of ``id``, i.e. ``doc.id`` in protobuf - :param args: not used - :param kwargs: not used""" - super(BinaryPbDBMSIndexer, self).delete(ids) - - -class KeyValueDBMSIndexer(BinaryPbDBMSIndexer): - """An alias""" diff --git a/jina/executors/indexers/dump.py b/jina/executors/indexers/dump.py deleted file mode 100644 index a7f20f600ab92..0000000000000 --- a/jina/executors/indexers/dump.py +++ /dev/null @@ -1,156 +0,0 @@ -import os -import sys -from typing import Tuple, Generator, BinaryIO, TextIO - -import numpy as np - -from jina.logging import JinaLogger - -BYTE_PADDING = 4 -DUMP_DTYPE = np.float64 - -logger = JinaLogger(__name__) - - -def export_dump_streaming( - path: str, - shards: int, - size: int, - data: Generator[Tuple[str, np.array, bytes], None, None], -): - """Export the data to a path, based on sharding, - - :param path: path to dump - :param shards: the nr of shards this pea is part of - :param size: total amount of entries - :param data: the generator of the data (ids, vectors, metadata) - """ - logger.info(f'Dumping {size} docs to {path} for {shards} shards') - _handle_dump(data, path, shards, size) - - -def _handle_dump( - data: Generator[Tuple[str, np.array, bytes], None, None], - path: str, - shards: int, - 
size: int, -): - if not os.path.exists(path): - os.makedirs(path) - - # directory must be empty to be safe - if not os.listdir(path): - size_per_shard = size // shards - extra = size % shards - shard_range = list(range(shards)) - for shard_id in shard_range: - if shard_id == shard_range[-1]: - size_this_shard = size_per_shard + extra - else: - size_this_shard = size_per_shard - _write_shard_data(data, path, shard_id, size_this_shard) - else: - raise Exception( - f'path for dump {path} contains data. Please empty. Not dumping...' - ) - - -def _write_shard_data( - data: Generator[Tuple[str, np.array, bytes], None, None], - path: str, - shard_id: int, - size_this_shard: int, -): - shard_path = os.path.join(path, str(shard_id)) - shard_docs_written = 0 - os.makedirs(shard_path) - vectors_fp, metas_fp, ids_fp = _get_file_paths(shard_path) - with open(vectors_fp, 'wb') as vectors_fh, open(metas_fp, 'wb') as metas_fh, open( - ids_fp, 'w' - ) as ids_fh: - while shard_docs_written < size_this_shard: - _write_shard_files(data, ids_fh, metas_fh, vectors_fh) - shard_docs_written += 1 - - -def _write_shard_files( - data: Generator[Tuple[str, np.array, bytes], None, None], - ids_fh: TextIO, - metas_fh: BinaryIO, - vectors_fh: BinaryIO, -): - id_, vec, meta = next(data) - # need to ensure compatibility to read time - vec = vec.astype(DUMP_DTYPE) - vec_bytes = vec.tobytes() - vectors_fh.write(len(vec_bytes).to_bytes(BYTE_PADDING, sys.byteorder) + vec_bytes) - metas_fh.write(len(meta).to_bytes(BYTE_PADDING, sys.byteorder) + meta) - ids_fh.write(id_ + '\n') - - -def import_vectors(path: str, pea_id: str): - """Import id and vectors - - :param path: the path to the dump - :param pea_id: the id of the pea (as part of the shards) - :return: the generators for the ids and for the vectors - """ - logger.info(f'Importing ids and vectors from {path} for pea_id {pea_id}') - path = os.path.join(path, pea_id) - ids_gen = _ids_gen(path) - vecs_gen = _vecs_gen(path) - return ids_gen, vecs_gen - 
- -def import_metas(path: str, pea_id: str): - """Import id and metadata - - :param path: the path of the dump - :param pea_id: the id of the pea (as part of the shards) - :return: the generators for the ids and for the metadata - """ - logger.info(f'Importing ids and metadata from {path} for pea_id {pea_id}') - path = os.path.join(path, pea_id) - ids_gen = _ids_gen(path) - metas_gen = _metas_gen(path) - return ids_gen, metas_gen - - -def _ids_gen(path: str): - with open(os.path.join(path, 'ids'), 'r') as ids_fh: - for l in ids_fh: - yield l.strip() - - -def _vecs_gen(path: str): - with open(os.path.join(path, 'vectors'), 'rb') as vectors_fh: - while True: - next_size = vectors_fh.read(BYTE_PADDING) - next_size = int.from_bytes(next_size, byteorder=sys.byteorder) - if next_size: - vec = np.frombuffer( - vectors_fh.read(next_size), - dtype=DUMP_DTYPE, - ) - yield vec - else: - break - - -def _metas_gen(path: str): - with open(os.path.join(path, 'metas'), 'rb') as metas_fh: - while True: - next_size = metas_fh.read(BYTE_PADDING) - next_size = int.from_bytes(next_size, byteorder=sys.byteorder) - if next_size: - meta = metas_fh.read(next_size) - yield meta - else: - break - - -def _get_file_paths(shard_path: str): - vectors_fp = os.path.join(shard_path, 'vectors') - metas_fp = os.path.join(shard_path, 'metas') - ids_fp = os.path.join(shard_path, 'ids') - return vectors_fp, metas_fp, ids_fp diff --git a/jina/executors/indexers/keyvalue.py b/jina/executors/indexers/keyvalue.py deleted file mode 100644 index 1941a7fe2a686..0000000000000 --- a/jina/executors/indexers/keyvalue.py +++ /dev/null @@ -1,352 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import mmap -import os -import random -from typing import Iterable, Optional, Union, List - -import numpy as np - -from . 
import BaseKVIndexer -from ..compound import CompoundExecutor - -HEADER_NONE_ENTRY = (-1, -1, -1) - - -class _WriteHandler: - """ - Write file handler. - - :param path: Path of the file. - :param mode: Writing mode. (e.g. 'ab', 'wb') - """ - - def __init__(self, path, mode): - self.path = path - self.mode = mode - self.body = open(self.path, self.mode) - self.header = open(self.path + '.head', self.mode) - - def __enter__(self): - if self.body.closed: - self.body = open(self.path, self.mode) - if self.header.closed: - self.header = open(self.path + '.head', self.mode) - return self - - def __exit__(self, exc_type, exc_val, exc_tb): - self.flush() - - def close(self): - """Close the file.""" - if not self.body.closed: - self.body.close() - if not self.header.closed: - self.header.close() - - def flush(self): - """Clear the body and header.""" - if not self.body.closed: - self.body.flush() - if not self.header.closed: - self.header.flush() - - -class _ReadHandler: - """ - Read file handler. - - :param path: Path of the file. - :param key_length: Length of key. - """ - - def __init__(self, path, key_length): - self.path = path - self.header = {} - if os.path.exists(self.path + '.head'): - with open(self.path + '.head', 'rb') as fp: - tmp = np.frombuffer( - fp.read(), - dtype=[ - ('', (np.str_, key_length)), - ('', np.int64), - ('', np.int64), - ('', np.int64), - ], - ) - self.header = { - r[0]: None - if np.array_equal((r[1], r[2], r[3]), HEADER_NONE_ENTRY) - else (r[1], r[2], r[3]) - for r in tmp - } - if os.path.exists(self.path): - self._body = open(self.path, 'r+b') - self.body = self._body.fileno() - else: - raise FileNotFoundError( - f'Path not found {self.path}. Querying will not work' - ) - else: - raise FileNotFoundError( - f'Path not found {self.path + ".head"}. 
Querying will not work' - ) - - def close(self): - """Close the file.""" - if hasattr(self, '_body'): - if not self._body.closed: - self._body.close() - - -class _CloseHandler: - def __init__(self, handler: Union['_WriteHandler', '_ReadHandler']): - self.handler = handler - - def __enter__(self): - return self - - def __exit__(self, exc_type, exc_val, exc_tb): - if self.handler is not None: - self.handler.close() - - -class BinaryPbWriterMixin: - """Mixing for providing the binarypb writing and reading methods""" - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self._start = 0 - self._page_size = mmap.ALLOCATIONGRANULARITY - - def get_add_handler(self) -> '_WriteHandler': - """ - Get write file handler. - - :return: write handler - """ - # keep _start position as in pickle serialization - return _WriteHandler(self.index_abspath, 'ab') - - def get_create_handler(self) -> '_WriteHandler': - """ - Get write file handler. - - :return: write handler. - """ - - self._start = 0 # override _start position - return _WriteHandler(self.index_abspath, 'wb') - - def get_query_handler(self) -> '_ReadHandler': - """ - Get read file handler. - - :return: read handler. - """ - return _ReadHandler(self.index_abspath, self.key_length) - - def _add( - self, keys: Iterable[str], values: Iterable[bytes], write_handler: _WriteHandler - ): - for key, value in zip(keys, values): - l = len(value) #: the length - p = ( - int(self._start / self._page_size) * self._page_size - ) #: offset of the page - r = ( - self._start % self._page_size - ) #: the remainder, i.e. 
the start position given the offset - # noinspection PyTypeChecker - write_handler.header.write( - np.array( - (key, p, r, r + l), - dtype=[ - ('', (np.str_, self.key_length)), - ('', np.int64), - ('', np.int64), - ('', np.int64), - ], - ).tobytes() - ) - self._start += l - write_handler.body.write(value) - self._size += 1 - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - """Delete the serialized documents from the index via document ids. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param args: not used - :param kwargs: not used - """ - keys = self._filter_nonexistent_keys(keys, self.query_handler.header.keys()) - del self.query_handler - self.handler_mutex = False - if keys: - self._delete(keys) - - def _delete(self, keys: Iterable[str]) -> None: - with self.write_handler as write_handler: - for key in keys: - write_handler.header.write( - np.array( - tuple(np.concatenate([[key], HEADER_NONE_ENTRY])), - dtype=[ - ('', (np.str_, self.key_length)), - ('', np.int64), - ('', np.int64), - ('', np.int64), - ], - ).tobytes() - ) - self._size -= 1 - - def _query(self, keys: Iterable[str]) -> List[bytes]: - query_results = [] - for key in keys: - pos_info = self.query_handler.header.get(key, None) - if pos_info is not None: - p, r, l = pos_info - with mmap.mmap(self.query_handler.body, offset=p, length=l) as m: - query_results.append(m[r:]) - else: - query_results.append(None) - - return query_results - - -class BinaryPbIndexer(BinaryPbWriterMixin, BaseKVIndexer): - """Simple Key-value indexer.""" - - def __init__(self, delete_on_dump: bool = False, *args, **kwargs): - super().__init__(*args, **kwargs) - self.delete_on_dump = delete_on_dump - - def __getstate__(self): - # called on pickle save - if self.delete_on_dump: - self._delete_invalid_indices() - d = super().__getstate__() - return d - - def _delete_invalid_indices(self): - # make sure the file is closed before querying. 
- with _CloseHandler(handler=self.write_handler): - pass - - keys = [] - vals = [] - # we read the valid values and write them to the intermediary file - with _CloseHandler( - handler=_ReadHandler(self.index_abspath, self.key_length) - ) as close_handler: - for key in close_handler.handler.header.keys(): - pos_info = close_handler.handler.header.get(key, None) - if pos_info: - p, r, l = pos_info - with mmap.mmap(close_handler.handler.body, offset=p, length=l) as m: - keys.append(key) - vals.append(m[r:]) - if len(keys) == 0: - return - - # intermediary file - tmp_file = self.index_abspath + '-tmp' - self._start = 0 - with _CloseHandler(handler=_WriteHandler(tmp_file, 'ab')) as close_handler: - # reset size - self._size = 0 - self._add(keys, vals, write_handler=close_handler.handler) - - # replace orig. file - # and .head file - head_path = self.index_abspath + '.head' - os.remove(self.index_abspath) - os.remove(head_path) - os.rename(tmp_file, self.index_abspath) - os.rename(tmp_file + '.head', head_path) - - def add( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Add the serialized documents to the index via document ids. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param values: serialized documents - :param args: extra arguments - :param kwargs: keyword arguments - """ - if not any(keys): - return - - need_to_remove_handler = not self.is_exist - with self.write_handler as writer_handler: - self._add(keys, values, write_handler=writer_handler) - if need_to_remove_handler: - # very hacky way to ensure write_handler will use add_handler at next computation, this must be solved - # by touching file at __init__ time - del self.write_handler - self.is_handler_loaded = False - - def sample(self) -> Optional[bytes]: - """Return a random entry from the indexer for sanity check. - - :return: A random entry from the indexer. 
- """ - k = random.sample(self.query_handler.header.keys(), k=1)[0] - return self.query([k])[0] - - def __iter__(self): - for k in self.query_handler.header.keys(): - yield self[k] - - def query(self, keys: Iterable[str], *args, **kwargs) -> Iterable[Optional[bytes]]: - """Find the serialized document to the index via document id. - - :param keys: list of document ids - :param args: extra arguments - :param kwargs: keyword arguments - :return: serialized documents - """ - return self._query(keys) - - def update( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Update the serialized documents on the index via document ids. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param values: serialized documents - :param args: extra arguments - :param kwargs: keyword arguments - """ - keys, values = self._filter_nonexistent_keys_values( - keys, values, self.query_handler.header.keys() - ) - del self.query_handler - self.handler_mutex = False - if keys: - self._delete(keys) - self.add(keys, values) - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - """Delete the serialized documents from the index via document ids. - - :param keys: a list of ``id``, i.e. 
``doc.id`` in protobuf - :param args: not used - :param kwargs: not used""" - super(BinaryPbIndexer, self).delete(keys) - - -class KeyValueIndexer(BinaryPbIndexer): - """Alias for :class:`BinaryPbIndexer` """ - - -class DataURIPbIndexer(BinaryPbIndexer): - """Alias for BinaryPbIndexer""" - - -class UniquePbIndexer(CompoundExecutor): - """A frequently used pattern for combining a :class:`BaseKVIndexer` and a :class:`DocCache` """ diff --git a/jina/executors/indexers/query/__init__.py b/jina/executors/indexers/query/__init__.py deleted file mode 100644 index d29aecbe91eac..0000000000000 --- a/jina/executors/indexers/query/__init__.py +++ /dev/null @@ -1,59 +0,0 @@ -from typing import Iterable, Optional, Dict - -from jina.executors.indexers import BaseIndexer - - -class BaseQueryIndexer(BaseIndexer): - """An indexer only for querying. It only reads once (at creation time, from a dump)""" - - def _post_init_wrapper( - self, - _metas: Optional[Dict] = None, - _requests: Optional[Dict] = None, - fill_in_metas: bool = True, - ) -> None: - super()._post_init_wrapper(_metas, _requests, fill_in_metas) - self.dump_path = _metas.get('dump_path') - # TODO this shouldn't be required - # we don't do this for Compounds, as the _components - # are not yet set at this stage. - # for Compound we use a `_post_components` - if self.dump_path and not hasattr(self, 'components'): - self._load_dump(self.dump_path) - - def _load_dump(self, dump_path): - """Load the dump at the dump_path - - :param dump_path: the path of the dump""" - raise NotImplementedError - - def _log_warn(self): - self.logger.error(f'Index {self.__class__} is write-once') - - def add( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Disabled. QueryIndexers are write-once (at instantiation time) - - - .. # noqa: DAR101 - """ - self._log_warn() - - def update( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - """Disabled. 
QueryIndexers are write-once (at instantiation time) - - - .. # noqa: DAR101 - """ - self._log_warn() - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - """Disabled. QueryIndexers are write-once (at instantiation time) - - - .. # noqa: DAR101 - """ - self._log_warn() diff --git a/jina/executors/indexers/query/compound.py b/jina/executors/indexers/query/compound.py deleted file mode 100644 index 81963ca1e2624..0000000000000 --- a/jina/executors/indexers/query/compound.py +++ /dev/null @@ -1,52 +0,0 @@ -from jina.executors.compound import CompoundExecutor -from jina.executors.indexers.query import BaseQueryIndexer - - -class CompoundQueryExecutor(CompoundExecutor, BaseQueryIndexer): - """A Compound Executor that wraps several QueryIndexers - - :param dump_path: the path to initialize from - """ - - # TODO this shouldn't be required - # we don't do this for Compounds, as the _components - # are not yet set at this stage. - # for Compound we use a `_post_components` - def _post_components(self): - if self.dump_path: - self._load_dump(self.dump_path) - - def _load_dump(self, dump_path, *args, **kwargs): - """Loads the data in the indexer - - :param dump_path: the path to the dump - :param args: passed to the inner Indexer's load_dump - :param kwargs: passed to the inner Indexer's load_dump - """ - for c in self.components: - c._load_dump(dump_path) - - def get_add_handler(self): - """required to silence NotImplementedErrors - - - .. #noqa: DAR201""" - return None - - def get_create_handler(self): - """required to silence NotImplementedErrors - - - .. #noqa: DAR201""" - return None - - def get_query_handler(self): - """required to silence NotImplementedErrors - - - .. 
#noqa: DAR201""" - return None - - -class CompoundQueryIndexer(CompoundQueryExecutor): - """Alias""" diff --git a/jina/executors/indexers/query/keyvalue.py b/jina/executors/indexers/query/keyvalue.py deleted file mode 100644 index c1be358fb267a..0000000000000 --- a/jina/executors/indexers/query/keyvalue.py +++ /dev/null @@ -1,35 +0,0 @@ -from typing import Optional, List - -from jina import Document -from jina.executors.indexers.dump import import_metas -from jina.executors.indexers.keyvalue import BinaryPbWriterMixin -from jina.executors.indexers.query import BaseQueryIndexer - - -class BinaryPbQueryIndexer(BinaryPbWriterMixin, BaseQueryIndexer): - """A write-once Key-value indexer.""" - - def _load_dump(self, dump_path): - """Load the dump at the path - - :param dump_path: the path of the dump""" - ids, metas = import_metas(dump_path, str(self.pea_id)) - with self.get_create_handler() as write_handler: - self._add(list(ids), list(metas), write_handler) - # warming up - self.query(['someid']) - - def query(self, keys: List[str], *args, **kwargs) -> List[Optional[bytes]]: - """Get a document by its id - - :param keys: the ids - :param args: not used - :param kwargs: not used - :return: List of the bytes of the Documents (or None, if not found) - """ - res = self._query(keys) - return res - - -class KeyValueQueryIndexer(BinaryPbQueryIndexer): - """An alias""" diff --git a/jina/executors/indexers/query/vector.py b/jina/executors/indexers/query/vector.py deleted file mode 100644 index 686d751b53a4c..0000000000000 --- a/jina/executors/indexers/query/vector.py +++ /dev/null @@ -1,69 +0,0 @@ -from typing import Generator - -import numpy as np - -from jina.executors.indexers.dump import import_vectors -from jina.executors.indexers.query import BaseQueryIndexer -from jina.executors.indexers.vector import NumpyIndexer - - -class NumpyQueryIndexer(NumpyIndexer, BaseQueryIndexer): - """An exhaustive vector indexers implemented with numpy and scipy. - - .. 
note:: - Metrics other than `cosine` and `euclidean` requires ``scipy`` installed. - - :param metric: The distance metric to use. `braycurtis`, `canberra`, `chebyshev`, `cityblock`, `correlation`, - `cosine`, `dice`, `euclidean`, `hamming`, `jaccard`, `jensenshannon`, `kulsinski`, - `mahalanobis`, - `matching`, `minkowski`, `rogerstanimoto`, `russellrao`, `seuclidean`, `sokalmichener`, - `sokalsneath`, `sqeuclidean`, `wminkowski`, `yule`. - :param backend: `numpy` or `scipy`, `numpy` only supports `euclidean` and `cosine` distance - :param compress_level: compression level to use - """ - - def _load_dump(self, dump_path): - """Load the dump at the path - - :param dump_path: the path of the dump""" - ids, vecs = import_vectors(dump_path, str(self.pea_id)) - self._add(ids, vecs) - self.write_handler.flush() - self.write_handler.close() - self.handler_mutex = False - self.is_handler_loaded = False - test_vecs = np.array([np.random.random(self.num_dim)], dtype=self.dtype) - assert self.query(test_vecs, 1) is not None - - def _add(self, keys: Generator, vectors: Generator, *args, **kwargs) -> None: - """Add the embeddings and document ids to the index. - - NOTE:: - - This replaces the parent class' `_add` since we - need to adapt to use Generators from the dump loading - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param vectors: embeddings - :param args: not used - :param kwargs: not used - """ - keys = np.array(list(keys), (np.str_, self.key_length)) - vectors_nr = 0 - for vector in vectors: - if not getattr(self, 'num_dim', None): - self.num_dim = vector.shape[0] - self.dtype = vector.dtype.name - self.write_handler.write(vector.tobytes()) - vectors_nr += 1 - - if vectors_nr != keys.shape[0]: - raise ValueError( - f'Different number of vectors and keys. {vectors_nr} vectors and {len(keys)} keys. 
Validate your dump' - ) - - self.valid_indices = np.concatenate( - (self.valid_indices, np.full(len(keys), True)) - ) - self.key_bytes += keys.tobytes() - self._size += keys.shape[0] diff --git a/jina/executors/indexers/vector.py b/jina/executors/indexers/vector.py deleted file mode 100644 index 67bc3c5580fa7..0000000000000 --- a/jina/executors/indexers/vector.py +++ /dev/null @@ -1,542 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import gzip -import io -import os -import random -from functools import lru_cache -from os import path -from typing import Optional, Iterable, Tuple, Dict, Union - -import numpy as np - -from . import BaseVectorIndexer -from ..decorators import batching -from ...helper import cached_property -from ...importer import ImportExtensions - - -class BaseNumpyIndexer(BaseVectorIndexer): - """ - :class:`BaseNumpyIndexer` stores and loads vector in a compresses binary file - - .. note:: - :attr:`compress_level` balances between time and space. By default, :classL`NumpyIndexer` has - :attr:`compress_level` = 0. - - Setting :attr:`compress_level`>0 gives a smaller file size on the disk in the index time. However, in the query - time it loads all data into memory at once. Not ideal for large scale application. - - Setting :attr:`compress_level`=0 enables :func:`np.memmap`, which loads data in an on-demand way and - gives smaller memory footprint in the query time. However, it often gives larger file size on the disk. - - :param compress_level: The compresslevel argument is an integer from 0 to 9 controlling the - level of compression; 1 is fastest and produces the least compression, - and 9 is slowest and produces the most compression. 0 is no compression - at all. The default is 9. - :param ref_indexer: Bootstrap the current indexer from a ``ref_indexer``. This enables user to switch - the query algorithm at the query time. 
- :param delete_on_dump: whether to delete the rows marked as delete (see ``valid_indices``) - """ - - def __init__( - self, - compress_level: int = 1, - ref_indexer: Optional['BaseNumpyIndexer'] = None, - delete_on_dump: bool = False, - *args, - **kwargs, - ): - super().__init__(*args, **kwargs) - self.num_dim = None - self.dtype = None - self.delete_on_dump = delete_on_dump - self.compress_level = compress_level - self.key_bytes = b'' - self.valid_indices = np.array([], dtype=bool) - self.ref_indexer_workspace_name = None - - if ref_indexer: - # copy the header info of the binary file - self.num_dim = ref_indexer.num_dim - self.dtype = ref_indexer.dtype - self.compress_level = ref_indexer.compress_level - self.key_bytes = ref_indexer.key_bytes - self.key_length = ref_indexer.key_length - self._size = ref_indexer._size - # point to the ref_indexer.index_filename - # so that later in `post_init()` it will load from the referred index_filename - self.valid_indices = ref_indexer.valid_indices - self.index_filename = ref_indexer.index_filename - self.logger.warning( - f'\n' - f'num_dim extracted from `ref_indexer` to {ref_indexer.num_dim} \n' - f'_size extracted from `ref_indexer` to {ref_indexer._size} \n' - f'dtype extracted from `ref_indexer` to {ref_indexer.dtype} \n' - f'compress_level overridden from `ref_indexer` to {ref_indexer.compress_level} \n' - f'index_filename overridden from `ref_indexer` to {ref_indexer.index_filename}' - ) - self.ref_indexer_workspace_name = ref_indexer.workspace_name - self.delete_on_dump = getattr(ref_indexer, 'delete_on_dump', delete_on_dump) - - def _delete_invalid_indices(self): - valid = self.valid_indices[self.valid_indices == True] # noqa - if len(valid) != len(self.valid_indices): - self._clean_memmap() - self._post_clean_memmap(valid) - - def _post_clean_memmap(self, valid): - # here we need to make sure the fields - # that depend on the valid_indices are cleaned up too - valid_key_bytes = np.frombuffer( - self.key_bytes, 
dtype=(np.str_, self.key_length) - )[self.valid_indices].tobytes() - self.key_bytes = valid_key_bytes - self._size = len(valid) - self.valid_indices = valid - del self._int2ext_id - del self._ext2int_id - - def _clean_memmap(self): - # clean up the underlying matrix of entries marked for deletion - # first we need to make sure we flush the writing handler - if self.write_handler and not self.write_handler.closed: - with self.write_handler as f: - f.flush() - self.handler_mutex = False - # force the raw_ndarray (underlying matrix) to re-read from disk - # (needed when there were writing ops to be flushed) - del self._raw_ndarray - filtered = self._raw_ndarray[self.valid_indices] - # we need an intermediary file - tmp_path = self.index_abspath + 'tmp' - - # write the bytes in the respective files - if self.compress_level > 0: - with gzip.open( - tmp_path, 'wb', compresslevel=self.compress_level - ) as new_gzip_fh: - new_gzip_fh.write(filtered.tobytes()) - else: - with open(tmp_path, 'wb') as filtered_data_fh: - filtered_data_fh.write(filtered.tobytes()) - - os.remove(self.index_abspath) - os.rename(tmp_path, self.index_abspath) - # force it to re-read again from the new file - del self._raw_ndarray - - def __getstate__(self): - # called on pickle save - if self.delete_on_dump: - self._delete_invalid_indices() - d = super().__getstate__() - return d - - @property - def workspace_name(self): - """Get the workspace name. - - - .. # noqa: DAR201 - """ - return ( - self.name - if self.ref_indexer_workspace_name is None - else self.ref_indexer_workspace_name - ) - - @property - def index_abspath(self) -> str: - """Get the file path of the index storage - - Use index_abspath - - - .. 
# noqa: DAR201 - """ - return self.get_file_from_workspace(self.index_filename) - - def get_add_handler(self) -> 'io.BufferedWriter': - """Open a binary gzip file for appending new vectors - - :return: a gzip file stream - """ - if self.compress_level > 0: - return gzip.open( - self.index_abspath, 'ab', compresslevel=self.compress_level - ) - else: - return open(self.index_abspath, 'ab') - - def get_create_handler(self) -> 'io.BufferedWriter': - """Create a new gzip file for adding new vectors. The old vectors are replaced. - - :return: a gzip file stream - """ - if self.compress_level > 0: - return gzip.open( - self.index_abspath, 'wb', compresslevel=self.compress_level - ) - else: - return open(self.index_abspath, 'wb') - - def _validate_key_vector_shapes(self, keys, vectors): - if len(vectors.shape) != 2: - raise ValueError( - f'vectors shape {vectors.shape} is not valid, expecting "vectors" to have rank of 2' - ) - - if not getattr(self, 'num_dim', None): - self.num_dim = vectors.shape[1] - self.dtype = vectors.dtype.name - elif self.num_dim != vectors.shape[1]: - raise ValueError( - f'vectors shape {vectors.shape} does not match with indexers\'s dim: {self.num_dim}' - ) - elif self.dtype != vectors.dtype.name: - raise TypeError( - f'vectors\' dtype {vectors.dtype.name} does not match with indexers\'s dtype: {self.dtype}' - ) - - if keys.shape[0] != vectors.shape[0]: - raise ValueError( - f'number of key {keys.shape[0]} not equal to number of vectors {vectors.shape[0]}' - ) - - def add(self, keys: Iterable[str], vectors: 'np.ndarray', *args, **kwargs) -> None: - """Add the embeddings and document ids to the index. - - :param keys: a list of ``id``, i.e. 
``doc.id`` in protobuf - :param vectors: embeddings - :param args: not used - :param kwargs: not used - """ - np_keys = np.array(keys, (np.str_, self.key_length)) - self._add(np_keys, vectors) - - def _add(self, keys: 'np.ndarray', vectors: 'np.ndarray'): - if keys.size and vectors.size: - self._validate_key_vector_shapes(keys, vectors) - self.write_handler.write(vectors.tobytes()) - self.valid_indices = np.concatenate( - (self.valid_indices, np.full(len(keys), True)) - ) - self.key_bytes += keys.tobytes() - self._size += keys.shape[0] - - def update( - self, keys: Iterable[str], vectors: 'np.ndarray', *args, **kwargs - ) -> None: - """Update the embeddings on the index via document ids. - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param vectors: embeddings - :param args: not used - :param kwargs: not used - """ - # noinspection PyTypeChecker - if self.size: - keys, values = self._filter_nonexistent_keys_values( - keys, vectors, self._ext2int_id.keys() - ) - if keys: - np_keys = np.array(keys, (np.str_, self.key_length)) - self._delete(np_keys) - self._add(np_keys, np.array(values)) - else: - self.logger.error(f'{self!r} is empty, update is aborted') - - def _delete(self, keys): - if keys.size: - for key in keys: - # mark as `False` in mask - self.valid_indices[self._ext2int_id[key]] = False - self._size -= 1 - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - """Delete the embeddings from the index via document ids. - - :param keys: a list of ``id``, i.e. 
``doc.id`` in protobuf - :param args: not used - :param kwargs: not used - """ - if self.size: - keys = self._filter_nonexistent_keys(keys, self._ext2int_id.keys()) - if keys: - np_keys = np.array(keys, (np.str_, self.key_length)) - self._delete(np_keys) - else: - self.logger.error(f'{self!r} is empty, deletion is aborted') - - def get_query_handler(self) -> Optional['np.ndarray']: - """Open a gzip file and load it as a numpy ndarray - - :return: a numpy ndarray of vectors - """ - if np.all(self.valid_indices): - vecs = self._raw_ndarray - else: - vecs = self._raw_ndarray[self.valid_indices] - - if vecs is not None: - return self.build_advanced_index(vecs) - - def build_advanced_index(self, vecs: 'np.ndarray'): - """Not implemented here. - - - .. # noqa: DAR201 - - - .. # noqa: DAR101 - """ - raise NotImplementedError - - def _load_gzip(self, abspath: str, mode='rb') -> Optional['np.ndarray']: - try: - self.logger.info(f'loading index from {abspath}...') - with gzip.open(abspath, mode) as fp: - return np.frombuffer(fp.read(), dtype=self.dtype).reshape( - [-1, self.num_dim] - ) - except EOFError: - self.logger.error( - f'{abspath} is broken/incomplete, perhaps forgot to ".close()" in the last usage?' - ) - - @cached_property - def _raw_ndarray(self) -> Union['np.ndarray', 'np.memmap', None]: - if not (path.exists(self.index_abspath) or self.num_dim or self.dtype): - return - - if self.compress_level > 0: - return self._load_gzip(self.index_abspath) - elif self.size is not None and os.stat(self.index_abspath).st_size: - self.logger.success(f'memmap is enabled for {self.index_abspath}') - # `==` is required. `is False` does not work in np - deleted_keys = len(self.valid_indices[self.valid_indices == False]) # noqa - return np.memmap( - self.index_abspath, - dtype=self.dtype, - mode='r', - shape=(self.size + deleted_keys, self.num_dim), - ) - - def sample(self) -> Optional[bytes]: - """Return a random entry from the indexer for sanity check. 
- - :return: A random entry from the indexer. - """ - k = random.sample(list(self._ext2int_id.values()), k=1)[0] - return self._raw_ndarray[k] - - def __iter__(self): - return self._raw_ndarray.__iter__() - - def query_by_key( - self, keys: Iterable[str], *args, **kwargs - ) -> Optional['np.ndarray']: - """ - Search the index by the external key (passed during `.add(`). - - :param keys: a list of ``id``, i.e. ``doc.id`` in protobuf - :param args: not used - :param kwargs: not used - :return: ndarray of vectors - """ - keys = self._filter_nonexistent_keys(keys, self._ext2int_id.keys()) - if keys: - indices = [self._ext2int_id[key] for key in keys] - return self._raw_ndarray[indices] - else: - return None - - @cached_property - def _int2ext_id(self) -> Optional['np.ndarray']: - """Convert internal ids (0,1,2,3,4,...) to external ids (random strings) - - - .. # noqa: DAR201 - """ - if self.key_bytes: - r = np.frombuffer(self.key_bytes, dtype=(np.str_, self.key_length)) - # `==` is required. `is False` does not work in np - deleted_keys = len(self.valid_indices[self.valid_indices == False]) # noqa - if r.shape[0] == (self.size + deleted_keys) == self._raw_ndarray.shape[0]: - return r - else: - print( - f'the size of the keys and vectors are inconsistent ' - f'({r.shape[0]}, {self._size}, {self._raw_ndarray.shape[0]}), ' - f'did you write to this index twice? or did you forget to save indexer?' - ) - self.logger.error( - f'the size of the keys and vectors are inconsistent ' - f'({r.shape[0]}, {self._size}, {self._raw_ndarray.shape[0]}), ' - f'did you write to this index twice? or did you forget to save indexer?' - ) - - @cached_property - def _ext2int_id(self) -> Optional[Dict]: - """Convert external ids (random strings) to internal ids (0,1,2,3,4,...) - - - .. 
# noqa: DAR201 - """ - if self._int2ext_id is not None: - return {k: idx for idx, k in enumerate(self._int2ext_id)} - - -@lru_cache(maxsize=3) -def _get_ones(x, y): - return np.ones((x, y)) - - -def _ext_A(A): - nA, dim = A.shape - A_ext = _get_ones(nA, dim * 3) - A_ext[:, dim : 2 * dim] = A - A_ext[:, 2 * dim :] = A ** 2 - return A_ext - - -def _ext_B(B): - nB, dim = B.shape - B_ext = _get_ones(dim * 3, nB) - B_ext[:dim] = (B ** 2).T - B_ext[dim : 2 * dim] = -2.0 * B.T - del B - return B_ext - - -def _euclidean(A_ext, B_ext): - sqdist = A_ext.dot(B_ext).clip(min=0) - return np.sqrt(sqdist) - - -def _norm(A): - return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True) - - -def _cosine(A_norm_ext, B_norm_ext): - return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2 - - -class NumpyIndexer(BaseNumpyIndexer): - """An exhaustive vector indexers implemented with numpy and scipy. - - .. note:: - Metrics other than `cosine` and `euclidean` requires ``scipy`` installed. - - :param metric: The distance metric to use. `braycurtis`, `canberra`, `chebyshev`, `cityblock`, `correlation`, - `cosine`, `dice`, `euclidean`, `hamming`, `jaccard`, `jensenshannon`, `kulsinski`, - `mahalanobis`, - `matching`, `minkowski`, `rogerstanimoto`, `russellrao`, `seuclidean`, `sokalmichener`, - `sokalsneath`, `sqeuclidean`, `wminkowski`, `yule`. - :param backend: `numpy` or `scipy`, `numpy` only supports `euclidean` and `cosine` distance - :param compress_level: compression level to use - """ - - batch_size = 512 - - def __init__( - self, - metric: str = 'cosine', - backend: str = 'numpy', - compress_level: int = 0, - *args, - **kwargs, - ): - super().__init__(*args, compress_level=compress_level, **kwargs) - self.metric = metric - self.backend = backend - - @staticmethod - def _get_sorted_top_k( - dist: 'np.array', top_k: int - ) -> Tuple['np.ndarray', 'np.ndarray']: - """Find top-k smallest distances in ascending order. 
- - Idea is to use partial sort to retrieve top-k smallest distances unsorted and then sort these - in ascending order. Equivalent to full sort but faster for n >> k. If k >= n revert to full sort. - - :param dist: the distances - :param top_k: nr to limit - :return: tuple of indices, computed distances - """ - if top_k >= dist.shape[1]: - idx = dist.argsort(axis=1)[:, :top_k] - dist = np.take_along_axis(dist, idx, axis=1) - else: - idx_ps = dist.argpartition(kth=top_k, axis=1)[:, :top_k] - dist = np.take_along_axis(dist, idx_ps, axis=1) - idx_fs = dist.argsort(axis=1) - idx = np.take_along_axis(idx_ps, idx_fs, axis=1) - dist = np.take_along_axis(dist, idx_fs, axis=1) - - return idx, dist - - def query( - self, vectors: 'np.ndarray', top_k: int, *args, **kwargs - ) -> Tuple['np.ndarray', 'np.ndarray']: - """Find the top-k vectors with smallest ``metric`` and return their ids in ascending order. - - :return: a tuple of two ndarray. - The first is ids in shape B x K (`dtype=int`), the second is metric in shape B x K (`dtype=float`) - - .. warning:: - This operation is memory-consuming. - - Distance (the smaller the better) is returned, not the score. 
- - :param vectors: the vectors with which to search - :param args: not used - :param kwargs: not used - :param top_k: nr of results to return - :return: tuple of indices within matrix and distances - """ - if self.size == 0: - return np.array([]), np.array([]) - if self.metric not in {'cosine', 'euclidean'} or self.backend == 'scipy': - dist = self._cdist(vectors, self.query_handler) - elif self.metric == 'euclidean': - _query_vectors = _ext_A(vectors) - dist = self._euclidean(_query_vectors, self.query_handler) - elif self.metric == 'cosine': - _query_vectors = _ext_A(_norm(vectors)) - dist = self._cosine(_query_vectors, self.query_handler) - - idx, dist = self._get_sorted_top_k(dist, top_k) - indices = self._int2ext_id[self.valid_indices][idx] - return indices, dist - - def build_advanced_index(self, vecs: 'np.ndarray') -> 'np.ndarray': - """ - Build advanced index structure based on in-memory numpy ndarray, e.g. graph, tree, etc. - - :param vecs: The raw numpy ndarray. - :return: Advanced index. - """ - return vecs - - @batching(merge_over_axis=1, slice_on=2) - def _euclidean(self, cached_A, raw_B): - data = _ext_B(raw_B) - return _euclidean(cached_A, data) - - @batching(merge_over_axis=1, slice_on=2) - def _cosine(self, cached_A, raw_B): - data = _ext_B(_norm(raw_B)) - return _cosine(cached_A, data) - - @batching(merge_over_axis=1, slice_on=2) - def _cdist(self, *args, **kwargs): - with ImportExtensions(required=True): - from scipy.spatial.distance import cdist - return cdist(*args, **kwargs, metric=self.metric) - - -class VectorIndexer(NumpyIndexer): - """Alias to :class:`NumpyIndexer` """ diff --git a/jina/executors/metas.py b/jina/executors/metas.py index dbf2d43917421..adc65d4e8b43d 100644 --- a/jina/executors/metas.py +++ b/jina/executors/metas.py @@ -1,226 +1,19 @@ -"""The default meta config that all executors follow, they can be overridden by the YAML config - -.. 
warning:: - - When you define your own Executor class, make sure your attributes/methods name do not - conflict with the name listed below. - - -.. note:: - Essentially, the meta config can be set in two places: as part of the YAML file, or as the class attribute - via :func:`__init__` or in class definition. When multiple meta specification exists, the overwrite priority is: - - metas defined in YAML > metas defined as class attribute > metas default values listed below - - -Any executor inherited from :class:`BaseExecutor` always has the following **meta** fields: - - .. confval:: is_updated - - indicates if the executor is updated or changed since last save, if not then :func:`save` will do nothing. - A forced save is possible to use :func:`touch` before :func:`save` - - :type: bool - :default: ``False`` - - .. confval:: batch_size - - the size of each batch, methods decorated by :func:`@batching` will respect this. useful when incoming data is - too large to fit into (GPU) memory. - - :type: int - :default: ``None`` - - .. confval:: workspace - - the working directory, for persisting the artifacts of the executor. An artifact is a file or collection of files - used during a workflow run. - - By default it is not set, if you expect your executor to be persisted or to persist any data, remember to set it - to the desired value. - - When a `BaseExecutor` is a component of a `CompoundExecutor`, its `workspace` value will be overridden by the `workspace` - coming from the `CompoundExecutor` unless a particular `workspace` value is set for the component `BaseExecutor`. - - :type: str - :default: None - - .. confval:: name - - the name of the executor. - - :type: str - :default: class name plus a random string - - .. confval:: on_gpu - - if the executor is set to run on GPU. - - :type: bool - :default: ``False`` - - - .. confval:: py_modules - - the external python module paths. 
it is useful when you want to load external python modules - using :func:`BaseExecutor.load_config` from a YAML file. If a relative path is given then the root path is set to - the path of the current YAML file. - - Example of ``py_module`` usage: - - 1. This is a valid structure and it is RECOMMENDED: - - "my_cust_module" is a python module - - all core logic of your customized executor goes to ``__init__.py`` - - to import ``foo.py``, you can use relative import, e.g. ``from .foo import bar`` - - ``helper.py`` needs to be put BEFORE `__init__.py` in YAML ``py_modules`` - - This is also the structure given by ``jina hub new`` CLI. - - .. highlight:: text - .. code-block:: text - - my_cust_module - |- __init__.py - |- helper.py - |- config.yml - |- py_modules - |- helper.py - |- __init__.py - - 2. This is a valid structure but not recommended: - - "my_cust_module" is not a python module (lack of __init__.py under the root) - - to import ``foo.py``, you must to use ``from jinahub.foo import bar`` - - ``jinahub`` is a common namespace for all plugin-modules, not changeable. - - ``helper.py`` needs to be put BEFORE `my_cust.py` in YAML ``py_modules`` - - .. highlight:: text - .. code-block:: text - - my_cust_module - |- my_cust.py - |- helper.py - |- config.yml - |- py_modules - |- helper.py - |- my_cust.py - - :type: str/List[str] - :default: ``None`` - - .. confval:: pea_id - - the integer index used for distinguish each parallel pea of this executor, required in :attr:`shard_workspace` - - :type: int - :default: ``'${{root.metas.pea_id}}'`` - - .. confval:: root_workspace - - the workspace of the root executor. It will be the same as `executor` except in the case when an `Executor` inside a `CompoundExecutor` is used, - or when a `BaseNumpyIndexer` is used with a `ref_indexer`. 
- - By default, jina will try to find if a `dump` of the executor can be found in `workspace`, otherwise it will try to find it under `root_workspace` - assuming it may be part of a `CompoundExecutor`. - - :type: str - :default: ``'${{root.metas.workspace}}'`` - - .. confval:: root_name - - the name of the root executor. It will be the same as `executor` except in the case when an `Executor` inside a `CompoundExecutor` is used, - or when a `BaseNumpyIndexer` is used with a `ref_indexer` - - :type: str - :default: ``'${{root.metas.name}}'`` - - .. confval:: read_only - - do not allow the pod to modify the model, save calls will be ignored. If set to true no serialization of the executor - - :type: bool - :default: ``False`` - - .. warning:: - ``name`` and ``workspace`` must be set if you want to serialize/deserialize this executor. - - .. note:: - - ``pea_id`` is set in a way that when the executor ``A`` is used as - a component of a :class:`jina.executors.compound.CompoundExecutor` ``B``, then ``A``'s setting will be overridden by B's counterpart. - - These **meta** fields can be accessed via `self.name` or loaded from a YAML config via :func:`load_config`: - - .. highlight:: yaml - .. code-block:: yaml - - !MyAwesomeExecutor - with: - ... - metas: - name: my_transformer # a customized name - workspace: ./ # path for serialize/deserialize - - - -""" - -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - - -from typing import Dict, Union, List - -_defaults = None +from typing import Dict def get_default_metas() -> Dict: """ Get a copy of default meta variables. - :return: default metas - """ - import copy - - global _defaults - - if _defaults is None: - from ..jaml import JAML - from pkg_resources import resource_stream - - with resource_stream( - 'jina', '/'.join(('resources', 'executors.metas.default.yml')) - ) as fp: - _defaults = JAML.load( - fp - ) # do not expand variables at here, i.e. 
DO NOT USE expand_dict(yaml.load(fp)) + NOTE: DO NOT ADD MORE ENTRIES HERE! - return copy.deepcopy(_defaults) - - -def fill_metas_with_defaults(d: Dict) -> Dict: - """Fill the incomplete ``metas`` field with complete default values - - :param d: the loaded YAML map - :return: dictionary with injected metas + :return: a deep copy of the default metas in a new dict """ - def _scan(sub_d: Union[Dict, List]): - if isinstance(sub_d, Dict): - for k, v in sub_d.items(): - if k == 'metas': - _tmp = get_default_metas() - _tmp.update(v) - sub_d[k] = _tmp - elif isinstance(v, dict): - _scan(v) - elif isinstance(v, list): - _scan(v) - elif isinstance(sub_d, List): - for idx, v in enumerate(sub_d): - if isinstance(v, dict): - _scan(v) - elif isinstance(v, list): - _scan(v) - - _scan(d) - return d + # NOTE: DO NOT ADD MORE ENTRIES HERE! + return { + 'name': '', #: a string, the name of the executor + 'description': '', #: a string, the description of this executor. It will be used in automatics docs UI + 'workspace': '', #: a string, the workspace of the executor + 'py_modules': '', #: a list of strings, the python dependencies of the executor + } diff --git a/jina/executors/rankers/__init__.py b/jina/executors/rankers/__init__.py deleted file mode 100644 index ae6d41ab4b522..0000000000000 --- a/jina/executors/rankers/__init__.py +++ /dev/null @@ -1,155 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Dict, Optional, List - -import numpy as np - -from .. import BaseExecutor - - -class BaseRanker(BaseExecutor): - """ - The base class for a `Ranker` - :param query_required_keys: Set of keys or features to be extracted from query `Document` by the `Driver` so that - they are passed as query features or metainfo. - :param match_required_keys: Set of keys or features to be extracted from match `Document` by the `Driver` so that - they are passed as match features or metainfo. 
- :param args: Extra positional arguments - :param kwargs: Extra keyword arguments - - .. note:: - See how the attributes are accessed in :class:`Document` in :meth:`get_attrs`. - - .. highlight:: python - .. code-block:: python - - query = Document({'tags': {'color': 'blue'}) - match = Document({'tags': {'color': 'blue', 'price': 1000}}) - - ranker = BaseRanker(query_required_keys=('tags__color'), match_required_keys=('tags__color, 'tags__price') - """ - - def __init__( - self, - query_required_keys: Optional[List[str]] = None, - match_required_keys: Optional[List[str]] = None, - *args, - **kwargs - ): - """ - - :param query_required_keys: Set of keys or features to be extracted from query `Document` by the `Driver` so that - they are passed as query features or metainfo. - :param match_required_keys: Set of keys or features to be extracted from match `Document` by the `Driver` so that - they are passed as match features or metainfo. - :param args: Extra positional arguments - :param kwargs: Extra keyword arguments - - .. note:: - See how the attributes are accessed in :class:`Document` in :meth:`get_attrs`. - - .. highlight:: python - .. code-block:: python - - query = Document({'tags': {'color': 'blue'}) - match = Document({'tags': {'color': 'blue', 'price': 1000}}) - - ranker = BaseRanker(query_required_keys=('tags__color'), match_required_keys=('tags__color, 'tags__price') - """ - super().__init__(*args, **kwargs) - self.query_required_keys = query_required_keys - self.match_required_keys = match_required_keys - - def score(self, *args, **kwargs): - """Calculate the score. Base class method needs to be implemented in subclass. - :param args: Extra positional arguments - :param kwargs: Extra keyword arguments - """ - raise NotImplementedError - - -class Chunk2DocRanker(BaseRanker): - """A :class:`Chunk2DocRanker` translates the chunk-wise score (distance) to the doc-wise score. 
- - In the query-time, :class:`Chunk2DocRanker` is an almost-always required component. - Because in the end we want to retrieve top-k documents of given query-document not top-k chunks of - given query-chunks. The purpose of :class:`Chunk2DocRanker` is to aggregate the already existed top-k chunks - into documents. - - The key function here is :func:`score`. - - .. seealso:: - :mod:`jina.drivers.handlers.score` - - """ - - COL_PARENT_ID = 'match_parent_id' - COL_DOC_CHUNK_ID = 'match_doc_chunk_id' - COL_QUERY_CHUNK_ID = 'match_query_chunk_id' - COL_SCORE = 'score' - - def score( - self, match_idx: 'np.ndarray', query_chunk_meta: Dict, match_chunk_meta: Dict - ) -> float: - """ - Given a set of queries (that may correspond to the chunks of a root level query) and a set of matches - corresponding to the same parent id, compute the matching score of the common parent of the set of matches. - Returns a score corresponding to the score of the parent document of the matches in `match_idx` - - :param match_idx: A [N x 4] numpy ``ndarray``, column-wise: - - ``match_idx[:, 0]``: ``parent_id`` of the matched docs, integer - - ``match_idx[:, 1]``: ``id`` of the matched chunks, integer - - ``match_idx[:, 2]``: ``id`` of the query chunks, integer - - ``match_idx[:, 3]``: distance/metric/score between the query and matched chunks, float. - All the matches belong to the same `parent` - :param query_chunk_meta: The meta information of the query chunks, where the key is query chunks' ``chunk_id``, - the value is extracted by the ``query_required_keys``. - :param match_chunk_meta: The meta information of the matched chunks, where the key is matched chunks' - ``chunk_id``, the value is extracted by the ``match_required_keys``. - - - TODO: - - ``match_idx[:, 0]`` is redundant because all the matches have the same ``parent_id``. - - """ - raise NotImplementedError - - -class Match2DocRanker(BaseRanker): - """ - Re-scores the matches for a document. 
This Ranker is only responsible for - calculating new scores and not for the actual sorting. The sorting is handled - in the respective ``Matches2DocRankDriver``. - - Possible implementations: - - ReverseRanker (reverse scores of all matches) - - BucketShuffleRanker (first buckets matches and then sort each bucket). - """ - - COL_MATCH_ID = 'match_doc_chunk_id' - COL_SCORE = 'score' - - def score( - self, - old_matches_scores: List[List[float]], - queries_metas: List[Dict], - matches_metas: List[List[Dict]], - ) -> List[List[float]]: - """ - Calculates the new scores for matches and returns them. Returns an iterable of the scores to be assigned to the matches. - The returned scores need to be returned in the same order as the input `:param old_match_scores`. - - .. note:: - The length of `old_match_scores`, `queries_metas` and `matches_metas` correspond to the amount of queries in the batch for which - one wants to score its matches. - - Every Sequence in match metas correspond to the amount of retrieved matches per query. - - The resulting list of scores will provide a list of score for every query. And every list will be ordered in the same way as the `matches_metas` lists - - :param old_matches_scores: Contains old scores in a list for every query - :param queries_metas: List of dictionaries containing all the query meta information requested by the `query_required_keys` class_variable for each query in a batch. - :param matches_metas: List of lists containing all the matches meta information requested by the `match_required_keys` class_variable for every query. Sorted in the same way as `old_match_scores` - """ - raise NotImplementedError diff --git a/jina/executors/rankers/trainer.py b/jina/executors/rankers/trainer.py deleted file mode 100644 index 57048a095e44b..0000000000000 --- a/jina/executors/rankers/trainer.py +++ /dev/null @@ -1,26 +0,0 @@ -__copyright__ = "Copyright (c) 2021 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from .. 
import BaseExecutor - - - class RankerTrainer(BaseExecutor): - """Class :class:`RankerTrainer` is used to train a ranker for ranker fine-tuning purposes, - such as offline-learning and online-learning. - """ - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def train(self, *args, **kwargs): - """Train ranker based on user feedback, updating ranker weights based on - the `loss` function. - - :param args: Additional arguments. - :param kwargs: Additional key value arguments. - """ - raise NotImplementedError - - def save(self): - """Save the ranker model.""" - raise NotImplementedError diff --git a/jina/executors/requests.py b/jina/executors/requests.py deleted file mode 100644 index 74ac0d5abcba0..0000000000000 --- a/jina/executors/requests.py +++ /dev/null @@ -1,46 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Dict, List - -from ..jaml import JAML - -_defaults = {} - - -def get_default_reqs(cls_mro: List[type]) -> Dict: - """Get a copy of default meta variables - - :param cls_mro: the MRO inherited order followed. - """ - import copy - - global _defaults - - for cls in cls_mro: - try: - if cls.__name__ not in _defaults: - from pkg_resources import resource_stream - - with resource_stream( - 'jina', - '/'.join(('resources', f'executors.requests.{cls.__name__}.yml')), - ) as fp: - _defaults[cls.__name__] = JAML.load( - fp - ) # do not expand variables at here, i.e. 
DO NOT USE expand_dict(yaml.load(fp)) - - if cls.__name__ != cls_mro[0].__name__: - from ..logging import default_logger - - default_logger.debug( - f'"requests.on" setting of {cls_mro[0]} fallback to general {cls} setting, ' - f'because you did not specify {cls_mro[0]}' - ) - return copy.deepcopy(_defaults[cls.__name__]) - except FileNotFoundError: - pass - - raise ValueError( - f'not able to find any default settings along this chain {cls_mro!r}' - ) diff --git a/jina/executors/segmenters/__init__.py b/jina/executors/segmenters/__init__.py deleted file mode 100644 index e031ce694f79b..0000000000000 --- a/jina/executors/segmenters/__init__.py +++ /dev/null @@ -1,21 +0,0 @@ -__copyright__ = "Copyright (c) 2021 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Dict, List, Union - -from .. import BaseExecutor - - -class BaseSegmenter(BaseExecutor): - """:class:`BaseSegmenter` works on doc-level, - it chunks Documents into set of Chunks - :param args: Variable length arguments - :param kwargs: Variable length keyword arguments - """ - - def segment(self, *args, **kwargs) -> Union[List[List[Dict]], List[Dict]]: - """ - :param args: Variable length arguments - :param kwargs: Variable length keyword arguments - """ - raise NotImplementedError diff --git a/jina/flow/__init__.py b/jina/flow/__init__.py index 12b7997b9c868..bc133c05ed8cd 100644 --- a/jina/flow/__init__.py +++ b/jina/flow/__init__.py @@ -1,12 +1,9 @@ from .base import BaseFlow -from .mixin.control import ControlFlowMixin -from .mixin.crud import CRUDFlowMixin +from ..clients.mixin import PostMixin -class Flow(CRUDFlowMixin, ControlFlowMixin, BaseFlow): +class Flow(PostMixin, BaseFlow): """The synchronous version of :class:`AsyncFlow`. 
For proper usage see `this guide` """ - - pass diff --git a/jina/flow/asyncio.py b/jina/flow/asyncio.py index 95c718902dbb2..067dc76440e2a 100644 --- a/jina/flow/asyncio.py +++ b/jina/flow/asyncio.py @@ -1,10 +1,9 @@ from .base import BaseFlow -from .mixin.async_crud import AsyncCRUDFlowMixin -from .mixin.async_control import AsyncControlFlowMixin from ..clients.asyncio import AsyncClient, AsyncWebSocketClient +from ..clients.mixin import AsyncPostMixin -class AsyncFlow(AsyncCRUDFlowMixin, AsyncControlFlowMixin, BaseFlow): +class AsyncFlow(AsyncPostMixin, BaseFlow): """ :class:`AsyncFlow` is the asynchronous version of the :class:`Flow`. They share the same interface, except in :class:`AsyncFlow` :meth:`train`, :meth:`index`, :meth:`search` methods are coroutines @@ -38,28 +37,7 @@ class AsyncFlow(AsyncCRUDFlowMixin, AsyncControlFlowMixin, BaseFlow): https://ipython.readthedocs.io/en/stable/interactive/autoawait.html Another example is when using Jina as an integration. Say you have another IO-bounded job ``heavylifting()``, you - can use this feature to schedule Jina ``index()`` and ``heavylifting()`` concurrently. For example, - - .. highlight:: python - .. code-block:: python - - async def run_async_flow_5s(): - # WaitDriver pause 5s makes total roundtrip ~5s - with AsyncFlow().add(uses='- !WaitDriver {}') as f: - await f.index_ndarray(np.random.random([5, 4]), on_done=validate) - - - async def heavylifting(): - # total roundtrip takes ~5s - print('heavylifting other io-bound jobs, e.g. download, upload, file io') - await asyncio.sleep(5) - print('heavylifting done after 5s') - - - async def concurrent_main(): - # about 5s; but some dispatch cost, can't be just 5s, usually at <7s - await asyncio.gather(run_async_flow_5s(), heavylifting()) - + can use this feature to schedule Jina ``index()`` and ``heavylifting()`` concurrently. One can think of :class:`Flow` as Jina-managed eventloop, whereas :class:`AsyncFlow` is self-managed eventloop. 
""" diff --git a/jina/flow/base.py b/jina/flow/base.py index 956d4a635dded..d8ce5539862a1 100644 --- a/jina/flow/base.py +++ b/jina/flow/base.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import argparse import base64 import copy @@ -8,6 +5,7 @@ import re import threading import uuid +import warnings from collections import OrderedDict, defaultdict from contextlib import ExitStack from typing import Optional, Union, Tuple, List, Set, Dict, TextIO @@ -32,7 +30,7 @@ __all__ = ['BaseFlow'] from ..peapods import Pod -from ..peapods.pods.compoundpod import CompoundPod +from ..peapods.pods.compound import CompoundPod from ..peapods.pods.factory import PodFactory @@ -106,19 +104,6 @@ def _update_args(self, args, **kwargs): args, _flow_parser ) #: for yaml dump - @property - def yaml_spec(self): - """ - get the YAML representation of the instance - - - .. # noqa: DAR401 - - - .. # noqa: DAR201 - """ - return JAML.dump(self) - @staticmethod def _parse_endpoints(op_flow, pod_name, endpoint, connect_to_last_pod=False) -> Set: # parsing needs @@ -307,6 +292,10 @@ def add( parser = set_gateway_parser() args = ArgNamespace.kwargs2namespace(kwargs, parser) + + # pod workspace if not set then derive from flow workspace + args.workspace = os.path.abspath(args.workspace or self.workspace) + op_flow._pod_nodes[pod_name] = PodFactory.build_pod(args, needs) op_flow.last_pod = pod_name @@ -364,7 +353,6 @@ def inspect(self, name: str = 'inspect', *args, **kwargs) -> 'BaseFlow': def gather_inspect( self, name: str = 'gather_inspect', - uses='_merge_eval', include_last_pod: bool = True, *args, **kwargs, @@ -378,7 +366,6 @@ def gather_inspect( in general you don't need to manually call :meth:`gather_inspect`. 
:param name: the name of the gather Pod - :param uses: the config of the executor, by default is ``_pass`` :param include_last_pod: if to include the last modified Pod in the Flow :param args: args for .add() :param kwargs: kwargs for .add() @@ -396,7 +383,6 @@ def gather_inspect( needs.append(self.last_pod) return self.add( name=name, - uses=uses, needs=needs, pod_role=PodRoleType.JOIN_INSPECT, *args, @@ -609,8 +595,13 @@ def __eq__(self, other: 'BaseFlow') -> bool: return a._pod_nodes == b._pod_nodes + @property @build_required(FlowBuildLevel.GRAPH) - def _get_client(self, **kwargs) -> 'Client': + def client(self) -> 'Client': + """Return a :class:`Client` object attach to this Flow. + + .. # noqa: DAR201""" + kwargs = {} kwargs.update(self._common_kwargs) if 'port_expose' not in kwargs: kwargs['port_expose'] = self.port_expose @@ -792,27 +783,6 @@ def _mermaid_to_url(self, mermaid_str: str, img_type: str) -> str: return f'https://mermaid.ink/{img_type}/{encoded_str}' - @build_required(FlowBuildLevel.GRAPH) - def to_swarm_yaml(self, path: TextIO): - """ - Generate the docker swarm YAML compose file - - :param path: the output yaml path - """ - swarm_yml = {'version': '3.4', 'services': {}} - - for k, v in self._pod_nodes.items(): - if v.role == PodRoleType.GATEWAY: - cmd = 'jina gateway' - else: - cmd = 'jina pod' - swarm_yml['services'][k] = { - 'command': f'{cmd} {" ".join(ArgNamespace.kwargs2list(vars(v.args)))}', - 'deploy': {'parallel': 1}, - } - - JAML.dump(swarm_yml, path) - @property @build_required(FlowBuildLevel.GRAPH) def port_expose(self) -> int: @@ -919,6 +889,13 @@ def _update_client(self): if self._pod_nodes['gateway'].args.restful: self._cls_client = WebSocketClient + @property + def workspace(self) -> str: + """Return the workspace path of the flow. + + .. 
# noqa: DAR201""" + return os.path.abspath(self.args.workspace or './') + @property def workspace_id(self) -> Dict[str, str]: """Get all Pods' ``workspace_id`` values in a dict @@ -939,9 +916,7 @@ def workspace_id(self, value: str): for k, p in self: if hasattr(p.args, 'workspace_id'): p.args.workspace_id = value - args = getattr(p, 'peas_args', None) - if args is None: - args = getattr(p, 'replicas_args', None) + args = getattr(p, 'peas_args', getattr(p, 'replicas_args', None)) if args is None: raise ValueError( f'could not find "peas_args" or "replicas_args" on {p}' @@ -990,6 +965,14 @@ def rolling_update(self, pod_name: str, dump_path: Optional[str] = None): :param dump_path: the path from which to read the dump data :param pod_name: pod to update """ + # TODO: By design, after the Flow object has started, the Flow shouldn't have memory access to its sub-objects anymore. + # All control should be issued via network requests, not via memory access. + # In the current master, the Flow.rolling_update() & Flow.dump() methods bypass this design. + # Bypassing this design makes the whole system NOT cloud-native. 
+ warnings.warn( + 'This function is experimental and facing potential refactoring', + FutureWarning, + ) compound_pod = self._pod_nodes[pod_name] if isinstance(compound_pod, CompoundPod): @@ -998,13 +981,3 @@ def rolling_update(self, pod_name: str, dump_path: Optional[str] = None): raise ValueError( f'The BasePod {pod_name} is not a CompoundPod and does not support updating' ) - - def dump(self, pod_name: str, dump_path: str, shards: int, timeout=-1): - """Emit a Dump request to a specific Pod - :param shards: the nr of shards in the dump - :param dump_path: the path to which to dump - :param pod_name: the name of the pod - :param timeout: time to wait (seconds) - """ - pod: BasePod = self._pod_nodes[pod_name] - pod.dump(pod_name, dump_path, shards, timeout) diff --git a/jina/flow/builder.py b/jina/flow/builder.py index c38c8cacfe98f..15afb9af34eca 100644 --- a/jina/flow/builder.py +++ b/jina/flow/builder.py @@ -5,12 +5,11 @@ from .. import __default_host__ from ..enums import SocketType, FlowBuildLevel, PodRoleType from ..excepts import FlowBuildLevelError, SocketTypeError -from ..peapods.pods import BasePod +from ..peapods import BasePod # noinspection PyUnreachableCode if False: from . 
import Flow - from ..peapods import BasePod def build_required(required_level: 'FlowBuildLevel'): diff --git a/jina/flow/mixin/async_control.py b/jina/flow/mixin/async_control.py deleted file mode 100644 index 2e8cdf4cad01b..0000000000000 --- a/jina/flow/mixin/async_control.py +++ /dev/null @@ -1,30 +0,0 @@ -from typing import Union, Sequence - -from ...clients.base import CallbackFnType - - -class AsyncControlFlowMixin: - """The asynchronous version of the Mixin for controlling, scaling the Flow""" - - async def reload( - self, - targets: Union[str, Sequence[str]], - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Reload the executor of certain peas/pods in the Flow - It will start a :py:class:`CLIClient` and call :py:func:`reload`. - - :param targets: the regex string or list of regex strings to match the pea/pod names. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yield: result - """ - async for r in self._get_client(**kwargs).reload( - targets, on_done, on_error, on_always, **kwargs - ): - yield r diff --git a/jina/flow/mixin/async_crud.py b/jina/flow/mixin/async_crud.py deleted file mode 100644 index f72ad5b39fcae..0000000000000 --- a/jina/flow/mixin/async_crud.py +++ /dev/null @@ -1,611 +0,0 @@ -import warnings -from typing import Union, Iterable, TextIO, Dict, Optional - -import numpy as np - -from ...clients.base import InputType, CallbackFnType -from ...enums import DataInputType -from ...helper import deprecated_alias - - -class AsyncCRUDFlowMixin: - """The asynchronous version of the Mixin for CRUD in Flow""" - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def train( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do training on the current Flow - - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - warnings.warn(f'{self.train} is under heavy refactoring', FutureWarning) - async for r in self._get_client(**kwargs).train( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def index_ndarray( - self, - array: 'np.ndarray', - axis: int = 0, - size: Optional[int] = None, - shuffle: bool = False, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Using numpy ndarray as the index source for the current Flow - - :param array: the numpy ndarray data source - :param axis: iterate over that axis - :param size: the maximum number of the sub arrays - :param shuffle: shuffle the numpy data source beforehand - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_ndarray - - async for r in self._get_client(**kwargs).index( - _input_ndarray(array, axis, size, shuffle), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def search_ndarray( - self, - array: 'np.ndarray', - axis: int = 0, - size: Optional[int] = None, - shuffle: bool = False, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a numpy ndarray as the query source for searching on the current Flow - - :param array: the numpy ndarray data source - :param axis: iterate over that axis - :param size: the maximum number of the sub arrays - :param shuffle: shuffle the the numpy data source beforehand - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_ndarray - - async for r in self._get_client(**kwargs).search( - _input_ndarray(array, axis, size, shuffle), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def index_lines( - self, - lines: Optional[Union[Iterable[str], TextIO]] = None, - filepath: Optional[str] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: str = 'r', - line_format: str = 'json', - field_resolver: Optional[Dict[str, str]] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - - :param lines: a list of strings, each is considered as d document - :param filepath: a text file that each line contains a document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in binary - :param line_format: the format of each line: ``json`` or ``csv`` - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_lines - - async for r in self._get_client(**kwargs).index( - _input_lines( - lines, - filepath, - size=size, - sampling_rate=sampling_rate, - read_mode=read_mode, - line_format=line_format, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ): - yield r - - async def index_csv( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Dict[str, str] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_csv - - async for r in self._get_client(**kwargs).index( - _input_csv( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ): - yield r - - async def index_ndjson( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_ndjson - - async for r in self._get_client(**kwargs).index( - _input_ndjson( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def index_files( - self, - patterns: Union[str, Iterable[str]], - recursive: bool = True, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: Optional[str] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a set of files as the index source for indexing on the current Flow - - :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]' - :param recursive: If recursive is true, the pattern '**' will match any files and - zero or more directories and subdirectories. - :param size: the maximum number of the files - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in binary mode - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_files - - async for r in self._get_client(**kwargs).index( - _input_files(patterns, recursive, size, sampling_rate, read_mode), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def search_files( - self, - patterns: Union[str, Iterable[str]], - recursive: bool = True, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: Optional[str] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a set of files as the query source for searching on the current Flow - - :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]' - :param recursive: If recursive is true, the pattern '**' will match any files and - zero or more directories and subdirectories. - :param size: the maximum number of the files - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_files - - async for r in self._get_client(**kwargs).search( - _input_files(patterns, recursive, size, sampling_rate, read_mode), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ): - yield r - - async def search_ndjson( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of files as the query source for searching on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_ndjson - - async for r in self._get_client(**kwargs).search( - _input_ndjson( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ): - yield r - - async def search_csv( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_csv - - async for r in self._get_client(**kwargs).search( - _input_csv( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def search_lines( - self, - lines: Optional[Union[Iterable[str], TextIO]] = None, - filepath: Optional[str] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: str = 'r', - line_format: str = 'json', - field_resolver: Optional[Dict[str, str]] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of files as the query source for searching on the current Flow - - :param filepath: a text file that each line contains a document - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in binary - :param line_format: the format of each line: ``json`` or ``csv`` - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - from ...clients.sugary_io import _input_lines - - async for r in self._get_client(**kwargs).search( - _input_lines( - lines, - filepath, - size=size, - sampling_rate=sampling_rate, - read_mode=read_mode, - line_format=line_format, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def index( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do indexing on the current Flow - - It will start a :py:class:`CLIClient` and call :py:func:`index`. - - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - async for r in self._get_client(**kwargs).index( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def update( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do updates on the current Flow - - It will start a :py:class:`CLIClient` and call :py:func:`index`. - - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. 
- :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - async for r in self._get_client(**kwargs).update( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def delete( - self, - ids: Iterable[str], - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do deletion on the current Flow - - :param ids: An iterable of ids - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - async for r in self._get_client(**kwargs).delete( - ids, on_done, on_error, on_always, **kwargs - ): - yield r - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - async def search( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do searching on the current Flow - - It will start a :py:class:`CLIClient` and call :py:func:`search`. - - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. 
- :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: accepts all keyword arguments of `jina client` CLI - :yields: results - """ - async for r in self._get_client(**kwargs).search( - inputs, on_done, on_error, on_always, **kwargs - ): - yield r diff --git a/jina/flow/mixin/control.py b/jina/flow/mixin/control.py deleted file mode 100644 index 90b3306bc8d77..0000000000000 --- a/jina/flow/mixin/control.py +++ /dev/null @@ -1,29 +0,0 @@ -from typing import Union, Sequence - -from ...clients.base import CallbackFnType - - -class ControlFlowMixin: - """The synchronous version of the Mixin for controlling, scaling the Flow""" - - def reload( - self, - targets: Union[str, Sequence[str]], - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Reload the executor of certain peas/pods in the Flow - It will start a :py:class:`CLIClient` and call :py:func:`reload`. - - :param targets: the regex string or list of regex strings to match the pea/pod names. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - return self._get_client(**kwargs).reload( - targets, on_done, on_error, on_always, **kwargs - ) diff --git a/jina/flow/mixin/crud.py b/jina/flow/mixin/crud.py deleted file mode 100644 index cb54e4346e974..0000000000000 --- a/jina/flow/mixin/crud.py +++ /dev/null @@ -1,565 +0,0 @@ -import warnings -from typing import Union, Iterable, TextIO, Dict, Optional - -import numpy as np - -from ...clients.base import InputType, InputDeleteType, CallbackFnType -from ...enums import DataInputType -from ...helper import deprecated_alias - - -class CRUDFlowMixin: - """The synchronous version of the Mixin for CRUD in Flow""" - - @deprecated_alias(input_fn=('inputs', 0)) - def train( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do training on the current Flow - - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - warnings.warn(f'{self.train} is under heavy refactoring', FutureWarning) - return self._get_client(**kwargs).train( - inputs, on_done, on_error, on_always, **kwargs - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def index_ndarray( - self, - array: 'np.ndarray', - axis: int = 0, - size: Optional[int] = None, - shuffle: bool = False, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Using numpy ndarray as the index source for the current Flow - - :param array: the numpy ndarray data source - :param axis: iterate over that axis - :param size: the maximum number of the sub arrays - :param shuffle: shuffle the the numpy data source beforehand - :param on_done: the callback function to invoke after indexing - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_ndarray - - return self._get_client(**kwargs).index( - _input_ndarray(array, axis, size, shuffle), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def search_ndarray( - self, - array: 'np.ndarray', - axis: int = 0, - size: Optional[int] = None, - shuffle: bool = False, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a numpy ndarray as the query source for searching on the current Flow - - :param array: the numpy ndarray data source - :param axis: iterate over that axis - :param size: the maximum number of the sub arrays - :param shuffle: shuffle the the numpy data source beforehand - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - """ - from ...clients.sugary_io import _input_ndarray - - self._get_client(**kwargs).search( - _input_ndarray(array, axis, size, shuffle), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def index_lines( - self, - lines: Optional[Union[Iterable[str], TextIO]] = None, - filepath: Optional[str] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: str = 'r', - line_format: str = 'json', - field_resolver: Optional[Dict[str, str]] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - :param lines: a list of strings, each is considered as d document - :param filepath: a text file that each line contains a document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in binary - :param line_format: the format of each line: ``json`` or ``csv`` - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_lines - - return self._get_client(**kwargs).index( - _input_lines( - lines, - filepath, - size=size, - sampling_rate=sampling_rate, - read_mode=read_mode, - line_format=line_format, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ) - - def index_ndjson( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_ndjson - - return self._get_client(**kwargs).index( - _input_ndjson( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ) - - def index_csv( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_csv - - return self._get_client(**kwargs).index( - _input_csv( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ) - - def search_csv( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of lines as the index source for indexing on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_csv - - return self._get_client(**kwargs).search( - _input_csv( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def index_files( - self, - patterns: Union[str, Iterable[str]], - recursive: bool = True, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: Optional[str] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a set of files as the index source for indexing on the current Flow - :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]' - :param recursive: If recursive is true, the pattern '**' will match any files and - zero or more directories and subdirectories. - :param size: the maximum number of the files - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in binary mode - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_files - - return self._get_client(**kwargs).index( - _input_files(patterns, recursive, size, sampling_rate, read_mode), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def search_files( - self, - patterns: Union[str, Iterable[str]], - recursive: bool = True, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: Optional[str] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a set of files as the query source for searching on the current Flow - :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]' - :param recursive: If recursive is true, the pattern '**' will match any files and - zero or more directories and subdirectories. - :param size: the maximum number of the files - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_files - - return self._get_client(**kwargs).search( - _input_files(patterns, recursive, size, sampling_rate, read_mode), - on_done, - on_error, - on_always, - data_type=DataInputType.CONTENT, - **kwargs, - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def search_lines( - self, - lines: Optional[Union[Iterable[str], TextIO]] = None, - filepath: Optional[str] = None, - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - read_mode: str = 'r', - line_format: str = 'json', - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of files as the query source for searching on the current Flow - :param filepath: a text file that each line contains a document - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param read_mode: specifies the mode in which the file - is opened. 'r' for reading in text mode, 'rb' for reading in binary - :param line_format: the format of each line ``json`` or ``csv`` - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_lines - - return self._get_client(**kwargs).search( - _input_lines( - lines, - filepath, - size=size, - sampling_rate=sampling_rate, - read_mode=read_mode, - line_format=line_format, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ) - - def search_ndjson( - self, - lines: Union[Iterable[str], TextIO], - field_resolver: Optional[Dict[str, str]] = None, - size: Optional[int] = None, - sampling_rate: Optional[float] = None, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Use a list of files as the query source for searching on the current Flow - :param lines: a list of strings, each is considered as d document - :param size: the maximum number of the documents - :param sampling_rate: the sampling rate between [0, 1] - :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field - names defined in Protobuf. This is only used when the given ``document`` is - a JSON string or a Python dict. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - from ...clients.sugary_io import _input_ndjson - - return self._get_client(**kwargs).search( - _input_ndjson( - lines, - size=size, - sampling_rate=sampling_rate, - field_resolver=field_resolver, - ), - on_done, - on_error, - on_always, - data_type=DataInputType.AUTO, - **kwargs, - ) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def index( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do indexing on the current Flow - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - return self._get_client(**kwargs).index( - inputs, on_done, on_error, on_always, **kwargs - ) - - @deprecated_alias(input_fn=('inputs', 0)) - def update( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Updates Documents on the current Flow - - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - """ - self._get_client(**kwargs).update( - inputs, on_done, on_error, on_always, **kwargs - ) - - def delete( - self, - ids: InputDeleteType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do deletion on the current Flow - - :param ids: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. - :param kwargs: accepts all keyword arguments of `jina client` CLI - """ - self._get_client(**kwargs).delete(ids, on_done, on_error, on_always, **kwargs) - - @deprecated_alias( - input_fn=('inputs', 0), - buffer=('inputs', 1), - callback=('on_done', 1), - output_fn=('on_done', 1), - ) - def search( - self, - inputs: InputType, - on_done: CallbackFnType = None, - on_error: CallbackFnType = None, - on_always: CallbackFnType = None, - **kwargs, - ): - """Do searching on the current Flow - It will start a :py:class:`CLIClient` and call :py:func:`search`. - - :param inputs: An iterator of bytes. If not given, then you have to specify it in **kwargs**. - :param on_done: the function to be called when the :class:`Request` object is resolved. - :param on_error: the function to be called when the :class:`Request` object is rejected. - :param on_always: the function to be called when the :class:`Request` object is is either resolved or rejected. 
- :param kwargs: accepts all keyword arguments of `jina client` CLI - :return: results - """ - return self._get_client(**kwargs).search( - inputs, on_done, on_error, on_always, **kwargs - ) diff --git a/jina/helloworld/chatbot/__init__.py b/jina/helloworld/chatbot/__init__.py index 9cbfa9558d014..e69de29bb2d1d 100644 --- a/jina/helloworld/chatbot/__init__.py +++ b/jina/helloworld/chatbot/__init__.py @@ -1,71 +0,0 @@ -import os -import webbrowser -from pathlib import Path - -from pkg_resources import resource_filename - -from ..helper import download_data -from ... import Flow -from ...importer import ImportExtensions -from ...logging import default_logger - - -def hello_world(args): - """ - Execute the chatbot example. - - :param args: arguments passed from CLI - """ - Path(args.workdir).mkdir(parents=True, exist_ok=True) - - with ImportExtensions( - required=True, - help_text='this demo requires Pytorch and Transformers to be installed, ' - 'if you haven\'t, please do `pip install jina[torch,transformers]`', - ): - import transformers, torch - - assert [torch, transformers] #: prevent pycharm auto remove the above line - - targets = { - 'covid-csv': { - 'url': args.index_data_url, - 'filename': os.path.join(args.workdir, 'dataset.csv'), - } - } - - # download the data - download_data(targets, args.download_proxy, task_name='download csv data') - - # this envs are referred in index and query flow YAMLs - os.environ['HW_WORKDIR'] = args.workdir - - # now comes the real work - # load index flow from a YAML file - - f = ( - Flow() - .add(uses='TransformerTorchEncoder', parallel=args.parallel) - .add( - uses=f'{resource_filename("jina", "resources")}/chatbot/helloworld.indexer.yml' - ) - ) - - # index it! 
- with f, open(targets['covid-csv']['filename']) as fp: - f.index_csv(fp, field_resolver={'question': 'text', 'url': 'uri'}) - - # switch to REST gateway - f.use_rest_gateway(args.port_expose) - with f: - try: - webbrowser.open(args.demo_url, new=2) - except: - pass # intentional pass, browser support isn't cross-platform - finally: - default_logger.success( - f'You should see a demo page opened in your browser, ' - f'if not, you may open {args.demo_url} manually' - ) - if not args.unblock_query_flow: - f.block() diff --git a/jina/helloworld/chatbot/app.py b/jina/helloworld/chatbot/app.py new file mode 100644 index 0000000000000..e9aacd86de802 --- /dev/null +++ b/jina/helloworld/chatbot/app.py @@ -0,0 +1,103 @@ +import os +import urllib.request +import webbrowser +from pathlib import Path + +from jina import Flow, Document +from jina.importer import ImportExtensions +from jina.logging import default_logger +from jina.logging.profile import ProgressBar +from jina.parsers.helloworld import set_hw_chatbot_parser + +if __name__ == '__main__': + from executors import MyTransformer, MyIndexer +else: + from .executors import MyTransformer, MyIndexer + + +def hello_world(args): + """ + Execute the chatbot example. 
+ + :param args: arguments passed from CLI + """ + Path(args.workdir).mkdir(parents=True, exist_ok=True) + + with ImportExtensions( + required=True, + help_text='this demo requires Pytorch and Transformers to be installed, ' + 'if you haven\'t, please do `pip install jina[torch,transformers]`', + ): + import transformers, torch + + assert [torch, transformers] #: prevent pycharm auto remove the above line + + targets = { + 'covid-csv': { + 'url': args.index_data_url, + 'filename': os.path.join(args.workdir, 'dataset.csv'), + } + } + + # download the data + download_data(targets, args.download_proxy, task_name='download csv data') + + # now comes the real work + # load index flow from a YAML file + + f = ( + Flow() + .add(uses=MyTransformer, parallel=args.parallel) + .add(uses=MyIndexer, workspace=args.workdir) + ) + + # index it! + with f, open(targets['covid-csv']['filename']) as fp: + f.index(Document.from_csv(fp, field_resolver={'question': 'text'})) + + # switch to REST gateway + url_html_path = 'file://' + os.path.abspath( + os.path.join(os.path.dirname(os.path.realpath(__file__)), 'static/index.html') + ) + f.use_rest_gateway(args.port_expose) + with f: + try: + webbrowser.open(url_html_path, new=2) + except: + pass # intentional pass, browser support isn't cross-platform + finally: + default_logger.success( + f'You should see a demo page opened in your browser, ' + f'if not, you may open {url_html_path} manually' + ) + if not args.unblock_query_flow: + f.block() + + +def download_data(targets, download_proxy=None, task_name='download fashion-mnist'): + """ + Download data. + + :param targets: target path for data. + :param download_proxy: download proxy (e.g. 
'http', 'https') + :param task_name: name of the task + """ + opener = urllib.request.build_opener() + opener.addheaders = [('User-agent', 'Mozilla/5.0')] + if download_proxy: + proxy = urllib.request.ProxyHandler( + {'http': download_proxy, 'https': download_proxy} + ) + opener.add_handler(proxy) + urllib.request.install_opener(opener) + with ProgressBar(task_name=task_name, batch_unit='') as t: + for k, v in targets.items(): + if not os.path.exists(v['filename']): + urllib.request.urlretrieve( + v['url'], v['filename'], reporthook=lambda *x: t.update_tick(0.01) + ) + + +if __name__ == '__main__': + args = set_hw_chatbot_parser().parse_args() + hello_world(args) diff --git a/jina/helloworld/chatbot/executors.py b/jina/helloworld/chatbot/executors.py new file mode 100644 index 0000000000000..746f42441f558 --- /dev/null +++ b/jina/helloworld/chatbot/executors.py @@ -0,0 +1,169 @@ +import os +from pathlib import Path +from typing import Optional, Dict, Tuple + +import numpy as np +import torch +from transformers import AutoModel, AutoTokenizer + +from jina import Executor, DocumentArray, requests, Document + + +class MyTransformer(Executor): + """Transformer executor class """ + + def __init__( + self, + pretrained_model_name_or_path: str = 'sentence-transformers/distilbert-base-nli-stsb-mean-tokens', + base_tokenizer_model: Optional[str] = None, + pooling_strategy: str = 'mean', + layer_index: int = -1, + max_length: Optional[int] = None, + acceleration: Optional[str] = None, + embedding_fn_name: str = '__call__', + *args, + **kwargs, + ): + super().__init__(*args, **kwargs) + self.pretrained_model_name_or_path = pretrained_model_name_or_path + self.base_tokenizer_model = ( + base_tokenizer_model or pretrained_model_name_or_path + ) + self.pooling_strategy = pooling_strategy + self.layer_index = layer_index + self.max_length = max_length + self.acceleration = acceleration + self.embedding_fn_name = embedding_fn_name + self.tokenizer = 
AutoTokenizer.from_pretrained(self.base_tokenizer_model) + self.model = AutoModel.from_pretrained( + self.pretrained_model_name_or_path, output_hidden_states=True + ) + self.model.to(torch.device('cpu')) + + def _compute_embedding(self, hidden_states: 'torch.Tensor', input_tokens: Dict): + import torch + + fill_vals = {'cls': 0.0, 'mean': 0.0, 'max': -np.inf, 'min': np.inf} + fill_val = torch.tensor( + fill_vals[self.pooling_strategy], device=torch.device('cpu') + ) + + layer = hidden_states[self.layer_index] + attn_mask = input_tokens['attention_mask'].unsqueeze(-1).expand_as(layer) + layer = torch.where(attn_mask.bool(), layer, fill_val) + + embeddings = layer.sum(dim=1) / attn_mask.sum(dim=1) + return embeddings.cpu().numpy() + + @requests + def encode(self, docs: 'DocumentArray', *args, **kwargs): + import torch + + with torch.no_grad(): + + if not self.tokenizer.pad_token: + self.tokenizer.add_special_tokens({'pad_token': '[PAD]'}) + self.model.resize_token_embeddings(len(self.tokenizer.vocab)) + + input_tokens = self.tokenizer( + docs.get_attributes('content'), + max_length=self.max_length, + padding='longest', + truncation=True, + return_tensors='pt', + ) + input_tokens = { + k: v.to(torch.device('cpu')) for k, v in input_tokens.items() + } + + outputs = getattr(self.model, self.embedding_fn_name)(**input_tokens) + if isinstance(outputs, torch.Tensor): + return outputs.cpu().numpy() + hidden_states = outputs.hidden_states + + embeds = self._compute_embedding(hidden_states, input_tokens) + for doc, embed in zip(docs, embeds): + doc.embedding = embed + + +class MyIndexer(Executor): + """Simple indexer class """ + + def __init__(self, **kwargs): + super().__init__(**kwargs) + self._docs = DocumentArray() + Path(self.workspace).mkdir(parents=True, exist_ok=True) + self.filename = os.path.join(self.workspace, 'chatbot.ndjson') + if os.path.exists(self.filename): + self._docs = DocumentArray.load(self.filename) + + def close(self) -> None: + 
self._docs.save(self.filename) + + @requests(on='/index') + def index(self, docs: 'DocumentArray', **kwargs): + self._docs.extend(docs) + + @requests(on='/search') + def search(self, docs: 'DocumentArray', **kwargs): + a = np.stack(docs.get_attributes('embedding')) + b = np.stack(self._docs.get_attributes('embedding')) + q_emb = _ext_A(_norm(a)) + d_emb = _ext_B(_norm(b)) + dists = _cosine(q_emb, d_emb) + idx, dist = self._get_sorted_top_k(dists, 1) + for _q, _ids, _dists in zip(docs, idx, dist): + for _id, _dist in zip(_ids, _dists): + d = Document(self._docs[int(_id)], copy=True) + d.score.value = 1 - _dist + _q.matches.append(d) + + @staticmethod + def _get_sorted_top_k( + dist: 'np.array', top_k: int + ) -> Tuple['np.ndarray', 'np.ndarray']: + if top_k >= dist.shape[1]: + idx = dist.argsort(axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx, axis=1) + else: + idx_ps = dist.argpartition(kth=top_k, axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx_ps, axis=1) + idx_fs = dist.argsort(axis=1) + idx = np.take_along_axis(idx_ps, idx_fs, axis=1) + dist = np.take_along_axis(dist, idx_fs, axis=1) + + return idx, dist + + +def _get_ones(x, y): + return np.ones((x, y)) + + +def _ext_A(A): + nA, dim = A.shape + A_ext = _get_ones(nA, dim * 3) + A_ext[:, dim : 2 * dim] = A + A_ext[:, 2 * dim :] = A ** 2 + return A_ext + + +def _ext_B(B): + nB, dim = B.shape + B_ext = _get_ones(dim * 3, nB) + B_ext[:dim] = (B ** 2).T + B_ext[dim : 2 * dim] = -2.0 * B.T + del B + return B_ext + + +def _euclidean(A_ext, B_ext): + sqdist = A_ext.dot(B_ext).clip(min=0) + return np.sqrt(sqdist) + + +def _norm(A): + return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True) + + +def _cosine(A_norm_ext, B_norm_ext): + return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2 diff --git a/jina/helloworld/chatbot/static/index.html b/jina/helloworld/chatbot/static/index.html new file mode 100644 index 0000000000000..56f0fb801ecd7 --- /dev/null +++ b/jina/helloworld/chatbot/static/index.html @@ 
-0,0 +1,59 @@ + + + + + COVID-19 Simple QA Demo + + + + + + + + + + +
+
+

Covid-19 Simple QA

+

Powered by Jina

+
+ +
+
+
+
+
+
+ + +
+
+ +
+ Chatbox UI credited to https://codepen.io/supah/ Copyright reserved Fabio Ottaviani
+ Covid19 dataset from https://www.kaggle.com/xhlulu/covidqa
+
+ +
+
+ + + + + + + + diff --git a/jina/helloworld/chatbot/static/license.txt b/jina/helloworld/chatbot/static/license.txt new file mode 100644 index 0000000000000..c9870ab94316f --- /dev/null +++ b/jina/helloworld/chatbot/static/license.txt @@ -0,0 +1,8 @@ +Copyright (c) 2021 by Fabio Ottaviani (https://codepen.io/supah/pen/jqOBqp) + +Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. + diff --git a/jina/helloworld/chatbot/static/script.js b/jina/helloworld/chatbot/static/script.js new file mode 100644 index 0000000000000..990a1a91658ea --- /dev/null +++ b/jina/helloworld/chatbot/static/script.js @@ -0,0 +1,100 @@ +var $messages = $('.messages-content'), + d, h, m, + i = 0; + +$(window).load(function () { + $messages.mCustomScrollbar(); + setTimeout(function () { + fakeMessage("Hi there, please ask me COVID-19 related questions. For example,
What is Covid-19?
"); + }, 100); +}); + +function updateScrollbar() { + $messages.mCustomScrollbar("update").mCustomScrollbar('scrollTo', 'bottom', { + scrollInertia: 10, + timeout: 0 + }); +} + +function setDate() { + d = new Date() + if (m != d.getMinutes()) { + m = d.getMinutes(); + $('
' + d.getHours() + ':' + m + '
').appendTo($('.message:last')); + } +} + +function insertMessage() { + msg = $('.message-input').val(); + if ($.trim(msg) == '') { + return false; + } + $('
' + msg + '
').appendTo($('.mCSB_container')).addClass('new'); + setDate(); + $('.message-input').val(null); + updateScrollbar(); + setTimeout(function () { + jinaMessage(msg); + }, 1000 + (Math.random() * 20) * 100); +} + +$('.message-submit').click(function () { + insertMessage(); +}); + +$(window).on('keydown', function (e) { + if (e.which == 13) { + insertMessage(); + return false; + } +}) + + +function fakeMessage(msg) { + if ($('.message-input').val() != '') { + return false; + } + $('
').appendTo($('.mCSB_container')); + updateScrollbar(); + + setTimeout(function () { + $('.message.loading').remove(); + $('
' + msg + '
').appendTo($('.mCSB_container')).addClass('new'); + setDate(); + updateScrollbar(); + }, 200); +} + + +function jinaMessage(question) { + if ($('.message-input').val() != '') { + return false; + } + + $('
').appendTo($('.mCSB_container')); + updateScrollbar(); + + $.ajax({ + type: "POST", + url: $('#jina-server-addr').val() + "/search", + data: JSON.stringify({"data": [question], "top_k": 3}), + contentType: "application/json; charset=utf-8", + dataType: "json", + }).success(function (data, textStatus, jqXHR) { + console.info(data) + var top_answer = data['data']['docs'][0]['matches'][0] + $('.message.loading').remove(); + $('
' + + '
' + + '
' + + '
' + top_answer["text"] + '
' + + top_answer["tags"]["answer"] + + '
').appendTo($('.mCSB_container')).addClass('new'); + setDate(); + updateScrollbar(); + }).fail(function () { + setTimeout(function () { + fakeMessage("Connection failed, did you run
jina hello chatbot
on local? Is your address
" + $('#jina-server-addr').val() + "
?"); + }, 100); + }); +} diff --git a/jina/helloworld/chatbot/static/style.css b/jina/helloworld/chatbot/static/style.css new file mode 100644 index 0000000000000..8f3b3d4cadf61 --- /dev/null +++ b/jina/helloworld/chatbot/static/style.css @@ -0,0 +1,458 @@ +/*-------------------- +Mixins +--------------------*/ +/*-------------------- +Body +--------------------*/ +*, +*::before, +*::after { + box-sizing: border-box; +} + +html, +body { + height: 100%; +} + +body { + /*background: linear-gradient(135deg, #044f48, #2a7561);*/ + background-size: cover; + font-family: "Open Sans", sans-serif; + font-size: 12px; + line-height: 1.3; + overflow: hidden; +} + +.bg { + width: 100%; + height: 100%; + top: 0; + left: 0; + z-index: 1; + /*background: url("https://images.unsplash.com/photo-1451186859696-371d9477be93?crop=entropy&fit=crop&fm=jpg&h=975&ixjsv=2.1.0&ixlib=rb-0.3.5&q=80&w=1925") no-repeat 0 0;*/ + filter: blur(80px); + transform: scale(1.2); +} + +/*-------------------- +Chat +--------------------*/ +.chat { + position: absolute; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + width: 300px; + height: 50vh; + max-height: 500px; + z-index: 2; + overflow: hidden; + box-shadow: 0 5px 30px rgba(0, 0, 0, 0.2); + background: rgba(0, 0, 0, 0.5); + border-radius: 20px; + display: flex; + justify-content: space-between; + flex-direction: column; +} + +/*-------------------- +Chat Title +--------------------*/ +.chat-title { + flex: 0 1 45px; + position: relative; + z-index: 2; + background: rgba(0, 0, 0, 0.2); + color: #fff; + text-transform: uppercase; + text-align: left; + padding: 10px 10px 10px 50px; +} + +.chat-title h1, .chat-title h2 { + font-weight: normal; + font-size: 10px; + margin: 0; + padding: 0; +} + +.chat-title h2 { + color: rgba(255, 255, 255, 0.5); + font-size: 8px; + letter-spacing: 1px; +} + +.chat-title .avatar { + position: absolute; + z-index: 1; + top: 8px; + left: 9px; + border-radius: 30px; + width: 30px; + height: 30px; + overflow: 
hidden; + margin: 0; + padding: 0; + border: 2px solid rgba(255, 255, 255, 0.24); +} + +.chat-title .avatar img { + width: 100%; + height: auto; +} + +/*-------------------- +Messages +--------------------*/ +.messages { + flex: 1 1 auto; + color: rgba(255, 255, 255, 0.5); + overflow: hidden; + position: relative; + width: 100%; +} + +.messages .messages-content { + position: absolute; + top: 0; + left: 0; + height: 101%; + width: 100%; +} + +.messages .message { + max-width: 80%; + clear: both; + float: left; + padding: 6px 10px 7px; + border-radius: 10px 10px 10px 0; + background: rgba(0, 0, 0, 0.3); + margin: 8px 0; + font-size: 11px; + line-height: 1.4; + margin-left: 35px; + position: relative; + text-shadow: 0 1px 1px rgba(0, 0, 0, 0.2); +} + +.messages .message .timestamp { + position: absolute; + bottom: -15px; + font-size: 9px; + color: rgba(255, 255, 255, 0.3); +} + +.messages .message::before { + content: ""; + position: absolute; + bottom: -6px; + border-top: 6px solid rgba(0, 0, 0, 0.3); + left: 0; + border-right: 7px solid transparent; +} + +.messages .message .avatar { + position: absolute; + z-index: 1; + bottom: -15px; + left: -35px; + border-radius: 30px; + width: 30px; + height: 30px; + overflow: hidden; + margin: 0; + padding: 0; + border: 2px solid rgba(255, 255, 255, 0.24); +} + +.messages .message .avatar img { + width: 100%; + height: auto; +} + +.messages .message.message-personal { + float: right; + color: #fff; + text-align: right; + background: linear-gradient(120deg, #248A52, #257287); + border-radius: 10px 10px 0 10px; +} + +.messages .message.message-personal::before { + left: auto; + right: 0; + border-right: none; + border-left: 5px solid transparent; + border-top: 4px solid #257287; + bottom: -4px; +} + +.messages .message:last-child { + margin-bottom: 30px; +} + +.messages .message.new { + transform: scale(0); + transform-origin: 0 0; + -webkit-animation: bounce 500ms linear both; + animation: bounce 500ms linear both; +} + 
+.messages .message.loading::before { + position: absolute; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + content: ""; + display: block; + width: 3px; + height: 3px; + border-radius: 50%; + background: rgba(255, 255, 255, 0.5); + z-index: 2; + margin-top: 4px; + -webkit-animation: ball 0.45s cubic-bezier(0, 0, 0.15, 1) alternate infinite; + animation: ball 0.45s cubic-bezier(0, 0, 0.15, 1) alternate infinite; + border: none; + -webkit-animation-delay: 0.15s; + animation-delay: 0.15s; +} + +.messages .message.loading span { + display: block; + font-size: 0; + width: 20px; + height: 10px; + position: relative; +} + +.messages .message.loading span::before { + position: absolute; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + content: ""; + display: block; + width: 3px; + height: 3px; + border-radius: 50%; + background: rgba(255, 255, 255, 0.5); + z-index: 2; + margin-top: 4px; + -webkit-animation: ball 0.45s cubic-bezier(0, 0, 0.15, 1) alternate infinite; + animation: ball 0.45s cubic-bezier(0, 0, 0.15, 1) alternate infinite; + margin-left: -7px; +} + +.messages .message.loading span::after { + position: absolute; + top: 50%; + left: 50%; + transform: translate(-50%, -50%); + content: ""; + display: block; + width: 3px; + height: 3px; + border-radius: 50%; + background: rgba(255, 255, 255, 0.5); + z-index: 2; + margin-top: 4px; + -webkit-animation: ball 0.45s cubic-bezier(0, 0, 0.15, 1) alternate infinite; + animation: ball 0.45s cubic-bezier(0, 0, 0.15, 1) alternate infinite; + margin-left: 7px; + -webkit-animation-delay: 0.3s; + animation-delay: 0.3s; +} + +/*-------------------- +Message Box +--------------------*/ +.message-box { + flex: 0 1 40px; + width: 100%; + background: rgba(0, 0, 0, 0.3); + padding: 10px; + position: relative; +} + +.message-box .message-input { + background: none; + border: none; + outline: none !important; + resize: none; + color: rgba(255, 255, 255, 0.7); + font-size: 11px; + height: 17px; + margin: 0; + 
padding-right: 20px; + width: 265px; +} + +.message-box textarea:focus:-webkit-placeholder { + color: transparent; +} + +.message-box .message-submit { + position: absolute; + z-index: 1; + top: 9px; + right: 10px; + color: #fff; + border: none; + background: #248A52; + font-size: 10px; + text-transform: uppercase; + line-height: 1; + padding: 6px 10px; + border-radius: 10px; + outline: none !important; + transition: background 0.2s ease; +} + +.message-box .message-submit:hover { + background: #1D7745; +} + +/*-------------------- +Custom Srollbar +--------------------*/ +.mCSB_scrollTools { + margin: 1px -3px 1px 0; + opacity: 0; +} + +.mCSB_inside > .mCSB_container { + margin-right: 0px; + padding: 0 10px; +} + +.mCSB_scrollTools .mCSB_dragger .mCSB_dragger_bar { + background-color: rgba(0, 0, 0, 0.5) !important; +} + +/*-------------------- +Bounce +--------------------*/ +@-webkit-keyframes bounce { + 0% { + transform: matrix3d(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 4.7% { + transform: matrix3d(0.45, 0, 0, 0, 0, 0.45, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 9.41% { + transform: matrix3d(0.883, 0, 0, 0, 0, 0.883, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 14.11% { + transform: matrix3d(1.141, 0, 0, 0, 0, 1.141, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 18.72% { + transform: matrix3d(1.212, 0, 0, 0, 0, 1.212, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 24.32% { + transform: matrix3d(1.151, 0, 0, 0, 0, 1.151, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 29.93% { + transform: matrix3d(1.048, 0, 0, 0, 0, 1.048, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 35.54% { + transform: matrix3d(0.979, 0, 0, 0, 0, 0.979, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 41.04% { + transform: matrix3d(0.961, 0, 0, 0, 0, 0.961, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 52.15% { + transform: matrix3d(0.991, 0, 0, 0, 0, 0.991, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 63.26% { + transform: matrix3d(1.007, 0, 0, 0, 0, 1.007, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 85.49% { + transform: matrix3d(0.999, 0, 0, 0, 0, 0.999, 0, 0, 
0, 0, 1, 0, 0, 0, 0, 1); + } + 100% { + transform: matrix3d(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } +} + +@keyframes bounce { + 0% { + transform: matrix3d(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 4.7% { + transform: matrix3d(0.45, 0, 0, 0, 0, 0.45, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 9.41% { + transform: matrix3d(0.883, 0, 0, 0, 0, 0.883, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 14.11% { + transform: matrix3d(1.141, 0, 0, 0, 0, 1.141, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 18.72% { + transform: matrix3d(1.212, 0, 0, 0, 0, 1.212, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 24.32% { + transform: matrix3d(1.151, 0, 0, 0, 0, 1.151, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 29.93% { + transform: matrix3d(1.048, 0, 0, 0, 0, 1.048, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 35.54% { + transform: matrix3d(0.979, 0, 0, 0, 0, 0.979, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 41.04% { + transform: matrix3d(0.961, 0, 0, 0, 0, 0.961, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 52.15% { + transform: matrix3d(0.991, 0, 0, 0, 0, 0.991, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 63.26% { + transform: matrix3d(1.007, 0, 0, 0, 0, 1.007, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 85.49% { + transform: matrix3d(0.999, 0, 0, 0, 0, 0.999, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } + 100% { + transform: matrix3d(1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 1); + } +} + +@-webkit-keyframes ball { + from { + transform: translateY(0) scaleY(0.8); + } + to { + transform: translateY(-10px); + } +} + +@keyframes ball { + from { + transform: translateY(0) scaleY(0.8); + } + to { + transform: translateY(-10px); + } +} + +.question { + white-space: nowrap; + text-overflow: ellipsis; + overflow: hidden; + max-width: 80%; + padding-right: 1em; + margin-top: .5em; + margin-bottom: .5em; + background: white; + color: black; + border-radius: 5px; + padding-left: 1em; + opacity: 50%; +} + +.service-addr { + display: inline-block; + z-index: 2; + position: absolute; +} \ No newline at end of file diff --git 
a/jina/helloworld/fashion/__init__.py b/jina/helloworld/fashion/__init__.py index 424851829930d..e69de29bb2d1d 100644 --- a/jina/helloworld/fashion/__init__.py +++ b/jina/helloworld/fashion/__init__.py @@ -1,106 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import os -from pathlib import Path - -from pkg_resources import resource_filename - -from ..helper import ( - print_result, - write_html, - download_data, - index_generator, - query_generator, -) -from ...flow import Flow -from ...helper import countdown, colored - - -def hello_world(args): - """ - Runs Jina's Hello World. - - Usage: - Use it via CLI :command:`jina hello-world`. - - Description: - It downloads Fashion-MNIST dataset and :term:`Indexer` 50,000 images. - The index is stored into 4 *shards*. It randomly samples 128 unseen images as :term:`Queries` - Results are shown in a webpage. - - More options can be found in :command:`jina hello-world --help` - """ - - Path(args.workdir).mkdir(parents=True, exist_ok=True) - - targets = { - 'index-labels': { - 'url': args.index_labels_url, - 'filename': os.path.join(args.workdir, 'index-labels'), - }, - 'query-labels': { - 'url': args.query_labels_url, - 'filename': os.path.join(args.workdir, 'query-labels'), - }, - 'index': { - 'url': args.index_data_url, - 'filename': os.path.join(args.workdir, 'index-original'), - }, - 'query': { - 'url': args.query_data_url, - 'filename': os.path.join(args.workdir, 'query-original'), - }, - } - - # download the data - download_data(targets, args.download_proxy) - - # this envs are referred in index and query flow YAMLs - os.environ['PATH'] += ( - os.pathsep + resource_filename('jina', 'resources') + '/fashion/' - ) - os.environ['SHARDS'] = str(args.shards) - os.environ['PARALLEL'] = str(args.parallel) - os.environ['HW_WORKDIR'] = args.workdir - - # reduce the network load by using `fp16`, or even `uint8` - os.environ['JINA_ARRAY_QUANT'] = 'fp16' - - # now 
comes the real work - # load index flow from a YAML file - f = Flow.load_config(args.uses_index) - - # run it! - with f: - f.index( - index_generator(num_docs=targets['index']['data'].shape[0], target=targets), - request_size=args.index_request_size, - ) - - # wait for couple of seconds - countdown( - 8, - reason=colored( - 'behold! im going to switch to query mode', - 'cyan', - attrs=['underline', 'bold', 'reverse'], - ), - ) - - # now load query flow from another YAML file - f = Flow.load_config(args.uses_query) - # run it! - with f: - f.search( - query_generator( - num_docs=args.num_query, target=targets, with_groundtruth=True - ), - shuffle=True, - on_done=print_result, - request_size=args.query_request_size, - top_k=args.top_k, - ) - - # write result to html - write_html(os.path.join(args.workdir, 'hello-world.html')) diff --git a/jina/helloworld/fashion/app.py b/jina/helloworld/fashion/app.py new file mode 100644 index 0000000000000..5c00956d898d5 --- /dev/null +++ b/jina/helloworld/fashion/app.py @@ -0,0 +1,125 @@ +import os +from pathlib import Path + +from jina import Flow +from jina.helper import countdown +from jina.parsers.helloworld import set_hw_parser + +if __name__ == '__main__': + from helper import ( + print_result, + write_html, + download_data, + index_generator, + query_generator, + colored, + ) + from executors import MyEncoder, MyIndexer, MyEvaluator +else: + from .helper import ( + print_result, + write_html, + download_data, + index_generator, + query_generator, + colored, + ) + from .executors import MyEncoder, MyIndexer, MyEvaluator + +cur_dir = os.path.dirname(os.path.abspath(__file__)) + + +def hello_world(args): + """ + Runs Jina's Hello World. + + Usage: + Use it via CLI :command:`jina hello-world`. + + Description: + It downloads Fashion-MNIST dataset and :term:`Indexer` 50,000 images. + The index is stored into 4 *shards*. It randomly samples 128 unseen images as :term:`Queries` + Results are shown in a webpage. 
+ + More options can be found in :command:`jina hello-world --help` + + :param args: Argparse object + """ + + Path(args.workdir).mkdir(parents=True, exist_ok=True) + + targets = { + 'index-labels': { + 'url': args.index_labels_url, + 'filename': os.path.join(args.workdir, 'index-labels'), + }, + 'query-labels': { + 'url': args.query_labels_url, + 'filename': os.path.join(args.workdir, 'query-labels'), + }, + 'index': { + 'url': args.index_data_url, + 'filename': os.path.join(args.workdir, 'index-original'), + }, + 'query': { + 'url': args.query_data_url, + 'filename': os.path.join(args.workdir, 'query-original'), + }, + } + + # download the data + download_data(targets, args.download_proxy) + + # reduce the network load by using `fp16`, or even `uint8` + os.environ['JINA_ARRAY_QUANT'] = 'fp16' + os.environ['HW_WORKDIR'] = args.workdir + + # now comes the real work + # load index flow from a YAML file + f = Flow().add(uses=MyEncoder, parallel=2).add(uses=MyIndexer).add(uses=MyEvaluator) + + # run it! + with f: + f.index( + index_generator(num_docs=targets['index']['data'].shape[0], target=targets), + request_size=args.request_size, + ) + + # wait for couple of seconds + countdown( + 3, + reason=colored( + 'behold! 
im going to switch to query mode', + 'cyan', + attrs=['underline', 'bold', 'reverse'], + ), + ) + + # f.search( + # query_generator( + # num_docs=args.num_query, target=targets, with_groundtruth=True + # ), + # shuffle=True, + # on_done=print_result, + # request_size=args.request_size, + # parameters={'top_k': args.top_k}, + # ) + + f.post( + '/eval', + query_generator( + num_docs=args.num_query, target=targets, with_groundtruth=True + ), + shuffle=True, + on_done=print_result, + request_size=args.request_size, + parameters={'top_k': args.top_k}, + ) + + # write result to html + write_html(os.path.join(args.workdir, 'demo.html')) + + +if __name__ == '__main__': + args = set_hw_parser().parse_args() + hello_world(args) diff --git a/jina/resources/fashion/helloworld.html b/jina/helloworld/fashion/demo.html similarity index 99% rename from jina/resources/fashion/helloworld.html rename to jina/helloworld/fashion/demo.html index b2994f591cf11..ecf55af2e0fb7 100644 --- a/jina/resources/fashion/helloworld.html +++ b/jina/helloworld/fashion/demo.html @@ -21,9 +21,11 @@

What just happened?


Intrigued? Learn more about Jina and check out our GitHub! - +

Precision@{% TOP_K %}: {% PRECISION_EVALUATION %} Recall@{% TOP_K %}: {% RECALL_EVALUATION %}

+ +
{% RESULT %}

Query

Top-K Results
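
The `{% TOP_K %}`, `{% PRECISION_EVALUATION %}` and `{% RECALL_EVALUATION %}` placeholders in the `demo.html` template above are not rendered by a template engine: `write_html()` in `fashion/helper.py` fills them with plain `str.replace` calls. Below is a minimal sketch of that substitution. The `fill_template` helper and the metric values are assumptions made for illustration and are not part of this PR.

```python
# Hypothetical helper (not in the PR): mirrors the str.replace() approach
# that write_html() applies to the demo.html template.
def fill_template(template: str, values: dict) -> str:
    """Replace every `{% KEY %}` placeholder with the string form of its value."""
    for key, value in values.items():
        template = template.replace('{% ' + key + ' %}', str(value))
    return template


# Template text taken from the demo.html hunk; the numbers are made up.
template = (
    'Precision@{% TOP_K %}: {% PRECISION_EVALUATION %} '
    'Recall@{% TOP_K %}: {% RECALL_EVALUATION %}'
)
filled = fill_template(
    template,
    {
        'TOP_K': 50,
        # same '{:.2f}%' formatting that write_html() uses for both metrics
        'PRECISION_EVALUATION': '{:.2f}%'.format(0.5 * 100.0),
        'RECALL_EVALUATION': '{:.2f}%'.format(0.125 * 100.0),
    },
)
print(filled)
```

Because the replacement is purely textual, a placeholder that has no entry in the dictionary is left in the output unchanged instead of raising an error.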

diff --git a/jina/helloworld/fashion/executors.py b/jina/helloworld/fashion/executors.py new file mode 100644 index 0000000000000..387f52a25cdae --- /dev/null +++ b/jina/helloworld/fashion/executors.py @@ -0,0 +1,149 @@ +from typing import Tuple, Dict + +import numpy as np + +from jina import Executor, DocumentArray, requests, Document + + +class MyIndexer(Executor): + def __init__(self, **kwargs): + super().__init__(**kwargs) + self._docs = DocumentArray() + + @requests(on='/index') + def index(self, docs: 'DocumentArray', **kwargs): + self._docs.extend(docs) + + @requests(on=['/search', '/eval']) + def search(self, docs: 'DocumentArray', parameters: Dict, **kwargs): + + a = np.stack(docs.get_attributes('embedding')) + b = np.stack(self._docs.get_attributes('embedding')) + q_emb = _ext_A(_norm(a)) + d_emb = _ext_B(_norm(b)) + dists = _cosine(q_emb, d_emb) + idx, dist = self._get_sorted_top_k(dists, int(parameters['top_k'])) + for _q, _ids, _dists in zip(docs, idx, dist): + for _id, _dist in zip(_ids, _dists): + d = Document(self._docs[int(_id)], copy=True) + d.score.value = 1 - _dist + _q.matches.append(d) + + @staticmethod + def _get_sorted_top_k( + dist: 'np.array', top_k: int + ) -> Tuple['np.ndarray', 'np.ndarray']: + if top_k >= dist.shape[1]: + idx = dist.argsort(axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx, axis=1) + else: + idx_ps = dist.argpartition(kth=top_k, axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx_ps, axis=1) + idx_fs = dist.argsort(axis=1) + idx = np.take_along_axis(idx_ps, idx_fs, axis=1) + dist = np.take_along_axis(dist, idx_fs, axis=1) + + return idx, dist + + +class MyEncoder(Executor): + def __init__(self, **kwargs): + super().__init__(**kwargs) + np.random.seed(1337) + # generate a random orthogonal matrix + H = np.random.rand(784, 64) + u, s, vh = np.linalg.svd(H, full_matrices=False) + self.oth_mat = u @ vh + + @requests + def encode(self, docs: 'DocumentArray', **kwargs): + # reduce dimension to 50 by random 
orthogonal projection + content = np.stack(docs.get_attributes('content')) + embeds = (content.reshape([-1, 784]) / 255) @ self.oth_mat + for doc, embed in zip(docs, embeds): + doc.embedding = embed + doc.convert_image_blob_to_uri(width=28, height=28) + doc.pop('blob') + + +def _get_ones(x, y): + return np.ones((x, y)) + + +def _ext_A(A): + nA, dim = A.shape + A_ext = _get_ones(nA, dim * 3) + A_ext[:, dim : 2 * dim] = A + A_ext[:, 2 * dim :] = A ** 2 + return A_ext + + +def _ext_B(B): + nB, dim = B.shape + B_ext = _get_ones(dim * 3, nB) + B_ext[:dim] = (B ** 2).T + B_ext[dim : 2 * dim] = -2.0 * B.T + del B + return B_ext + + +def _euclidean(A_ext, B_ext): + sqdist = A_ext.dot(B_ext).clip(min=0) + return np.sqrt(sqdist) + + +def _norm(A): + return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True) + + +def _cosine(A_norm_ext, B_norm_ext): + return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2 + + +class MyEvaluator(Executor): + def __init__(self, **kwargs): + super().__init__(**kwargs) + self.eval_at = 50 + self.num_docs = 0 + self.total_precision = 0 + self.total_recall = 0 + + @property + def avg_precision(self): + return self.total_precision / self.num_docs + + @property + def avg_recall(self): + return self.total_recall / self.num_docs + + def _precision(self, actual, desired): + if self.eval_at == 0: + return 0.0 + actual_at_k = actual[: self.eval_at] if self.eval_at else actual + ret = len(set(actual_at_k).intersection(set(desired))) + sub = len(actual_at_k) + return ret / sub if sub != 0 else 0.0 + + def _recall(self, actual, desired): + if self.eval_at == 0: + return 0.0 + actual_at_k = actual[: self.eval_at] if self.eval_at else actual + ret = len(set(actual_at_k).intersection(set(desired))) + return ret / len(desired) + + @requests(on='/eval') + def evaluate(self, docs: 'DocumentArray', groundtruths: 'DocumentArray', **kwargs): + for doc, groundtruth in zip(docs, groundtruths): + self.num_docs += 1 + actual = [match.tags['id'] for match in doc.matches] + 
desired = groundtruth.matches[0].tags['id'] # pseudo_match + precision_score = doc.evaluations.add() + self.total_precision += self._precision(actual, desired) + self.total_recall += self._recall(actual, desired) + precision_score.value = self.avg_precision + precision_score.op_name = f'Precision' + doc.evaluations.append(precision_score) + recall_score = doc.evaluations.add() + recall_score.value = self.avg_recall + recall_score.op_name = f'Recall' + doc.evaluations.append(recall_score) diff --git a/jina/helloworld/helper.py b/jina/helloworld/fashion/helper.py similarity index 87% rename from jina/helloworld/helper.py rename to jina/helloworld/fashion/helper.py index 1ae8294854958..ed444c21717d0 100644 --- a/jina/helloworld/helper.py +++ b/jina/helloworld/fashion/helper.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import gzip import os import random @@ -9,17 +6,16 @@ from collections import defaultdict import numpy as np -from pkg_resources import resource_filename -from .. import Document -from ..helper import colored -from ..logging import default_logger -from ..logging.profile import ProgressBar +from jina import Document +from jina.helper import colored +from jina.logging import default_logger +from jina.logging.profile import ProgressBar result_html = [] +top_k = 0 num_docs_evaluated = 0 evaluation_value = defaultdict(float) -top_k = 0 def _get_groundtruths(target, pseudo_match=True): @@ -29,7 +25,7 @@ def _get_groundtruths(target, pseudo_match=True): a = a[a[:, 0].argsort()] lbl_group = np.split(a[:, 1], np.unique(a[:, 0], return_index=True)[1][1:]) - # each label has one groundtruth, i.e. all docs have the same label are considered as matches + # each label has one groundtruth, i.e. 
all docs that have the same label are considered as matches groundtruths = {lbl: Document() for lbl in range(10)} for lbl, doc_ids in enumerate(lbl_group): if not pseudo_match: @@ -75,8 +71,10 @@ def query_generator(num_docs: int, target: dict, with_groundtruth: bool = True): num_data = len(target['query-labels']['data']) idx = random.randint(0, num_data - 1) d = Document(content=(target['query']['data'][idx])) + if with_groundtruth: - yield d, gts[target['query-labels']['data'][idx][0]] + gt = gts[target['query-labels']['data'][idx][0]] + yield d, gt else: yield d @@ -87,9 +85,9 @@ def print_result(resp): :param resp: returned response with data """ - global evaluation_value global top_k - for d in resp.search.docs: + global evaluation_value + for d in resp.docs: vi = d.uri result_html.append(f'') top_k = len(d.matches) @@ -110,27 +108,21 @@ def write_html(html_path): :param html_path: path of the written html """ - global num_docs_evaluated - global evaluation_value with open( - resource_filename( - 'jina', '/'.join(('resources', 'fashion', 'helloworld.html')) - ), - 'r', + os.path.join(os.path.dirname(os.path.realpath(__file__)), 'demo.html') ) as fp, open(html_path, 'w') as fw: t = fp.read() t = t.replace('{% RESULT %}', '\n'.join(result_html)) t = t.replace( '{% PRECISION_EVALUATION %}', - '{:.2f}%'.format(evaluation_value['PrecisionEvaluator'] * 100.0), + '{:.2f}%'.format(evaluation_value['Precision'] * 100.0), ) t = t.replace( '{% RECALL_EVALUATION %}', - '{:.2f}%'.format(evaluation_value['RecallEvaluator'] * 100.0), + '{:.2f}%'.format(evaluation_value['Recall'] * 100.0), ) t = t.replace('{% TOP_K %}', str(top_k)) - fw.write(t) url_html_path = 'file://' + os.path.abspath(html_path) diff --git a/jina/helloworld/fork.py b/jina/helloworld/fork.py new file mode 100644 index 0000000000000..f0143113fff75 --- /dev/null +++ b/jina/helloworld/fork.py @@ -0,0 +1,18 @@ +import os +import shutil + +from jina.logging import default_logger + + +def fork_hello(args): + 
from_path = os.path.join(os.path.dirname(__file__), args.project) + shutil.copytree(from_path, args.destination) + full_path = os.path.abspath(args.destination) + default_logger.success(f'{args.project} project is forked to {full_path}') + default_logger.info( + f''' + To run the project: + ~$ cd {full_path} + ~$ python app.py + ''' + ) diff --git a/jina/helloworld/multimodal/__init__.py b/jina/helloworld/multimodal/__init__.py index b5465b5b97845..e69de29bb2d1d 100644 --- a/jina/helloworld/multimodal/__init__.py +++ b/jina/helloworld/multimodal/__init__.py @@ -1,75 +0,0 @@ -import os -import webbrowser -from pathlib import Path - -from pkg_resources import resource_filename - -from ..helper import download_data -from ... import Flow -from ...importer import ImportExtensions -from ...logging import default_logger - - -def hello_world(args): - """ - Execute the multimodal example. - - :param args: arguments passed from CLI - """ - Path(args.workdir).mkdir(parents=True, exist_ok=True) - - with ImportExtensions( - required=True, - help_text='this demo requires Pytorch and Transformers to be installed, ' - 'if you haven\'t, please do `pip install jina[torch,transformers]`', - ): - import transformers, torch - - assert [torch, transformers] #: prevent pycharm auto remove the above line - - targets = { - 'people-img': { - 'url': args.index_data_url, - 'filename': os.path.join(args.workdir, 'dataset.zip'), - } - } - - # download the data - download_data(targets, args.download_proxy, task_name='download zip data') - import zipfile - - with zipfile.ZipFile(targets['people-img']['filename'], 'r') as fp: - fp.extractall(args.workdir) - - # this envs are referred in index and query flow YAMLs - os.environ['HW_WORKDIR'] = args.workdir - os.environ['PATH'] += os.pathsep + os.path.join( - resource_filename('jina', 'resources'), 'multimodal' - ) - - # now comes the real work - # load index flow from a YAML file - - # index it! 
- f = Flow.load_config('flow-index.yml') - with f, open(f'{args.workdir}/people-img/meta.csv') as fp: - f.index_csv(fp, request_size=64) - - # search it! - - f = Flow.load_config('flow-query.yml') - # switch to REST gateway - f.use_rest_gateway(args.port_expose) - - with f: - try: - webbrowser.open(args.demo_url, new=2) - except: - pass # intentional pass, browser support isn't cross-platform - finally: - default_logger.success( - f'You should see a demo page opened in your browser, ' - f'if not, you may open {args.demo_url} manually' - ) - if not args.unblock_query_flow: - f.block() diff --git a/jina/helloworld/multimodal/app.py b/jina/helloworld/multimodal/app.py new file mode 100644 index 0000000000000..4328b3c8cdf3e --- /dev/null +++ b/jina/helloworld/multimodal/app.py @@ -0,0 +1,108 @@ +import os +import urllib.request +import webbrowser +from pathlib import Path + +from jina import Flow, Document +from jina.importer import ImportExtensions +from jina.logging import default_logger +from jina.logging.profile import ProgressBar +from jina.parsers.helloworld import set_hw_multimodal_parser + + +def hello_world(args): + """ + Execute the multimodal example. 
+ + :param args: arguments passed from CLI + """ + Path(args.workdir).mkdir(parents=True, exist_ok=True) + + with ImportExtensions( + required=True, + help_text='this demo requires Pytorch and Transformers to be installed, ' + 'if you haven\'t, please do `pip install jina[torch,transformers]`', + ): + import transformers, torch, torchvision + + assert [ + torch, + transformers, + torchvision, + ] #: prevent pycharm auto remove the above line + + # args.workdir = '0bae16ce-5bb2-43be-bcd4-6f1969e8068f' + targets = { + 'people-img': { + 'url': args.index_data_url, + 'filename': os.path.join(args.workdir, 'dataset.zip'), + } + } + + # download the data + download_data(targets, args.download_proxy, task_name='download zip data') + import zipfile + + with zipfile.ZipFile(targets['people-img']['filename'], 'r') as fp: + fp.extractall(args.workdir) + + # this envs are referred in index and query flow YAMLs + os.environ['HW_WORKDIR'] = args.workdir + # now comes the real work + # load index flow from a YAML file + + # index it! + f = Flow.load_config('flow-index.yml') + + with f, open(f'{args.workdir}/people-img/meta.csv', newline='') as fp: + f.index(inputs=Document.from_csv(fp), request_size=10) + + # search it! + f = Flow.load_config('flow-search.yml') + # switch to REST gateway + f.use_rest_gateway(args.port_expose) + + url_html_path = 'file://' + os.path.abspath( + os.path.join(os.path.dirname(os.path.realpath(__file__)), 'static/index.html') + ) + with f: + try: + webbrowser.open(url_html_path, new=2) + except: + pass # intentional pass, browser support isn't cross-platform + finally: + default_logger.success( + f'You should see a demo page opened in your browser, ' + f'if not, you may open {url_html_path} manually' + ) + if not args.unblock_query_flow: + f.block() + + +def download_data(targets, download_proxy=None, task_name='download fashion-mnist'): + """ + Download data. + + :param targets: target path for data. + :param download_proxy: download proxy (e.g. 
'http', 'https') + :param task_name: name of the task + """ + opener = urllib.request.build_opener() + opener.addheaders = [('User-agent', 'Mozilla/5.0')] + if download_proxy: + proxy = urllib.request.ProxyHandler( + {'http': download_proxy, 'https': download_proxy} + ) + opener.add_handler(proxy) + urllib.request.install_opener(opener) + with ProgressBar(task_name=task_name, batch_unit='') as t: + for k, v in targets.items(): + if not os.path.exists(v['filename']): + urllib.request.urlretrieve( + v['url'], v['filename'], reporthook=lambda *x: t.update_tick(0.01) + ) + + +if __name__ == '__main__': + args = set_hw_multimodal_parser().parse_args() + hello_world(args) diff --git a/jina/helloworld/multimodal/executors.py b/jina/helloworld/multimodal/executors.py new file mode 100644 index 0000000000000..3516cac17de58 --- /dev/null +++ b/jina/helloworld/multimodal/executors.py @@ -0,0 +1,469 @@ +import os +from typing import Dict, Optional, List, Iterable, Union, Tuple + +import numpy as np +import torch +import torchvision.models as models +from transformers import AutoModel, AutoTokenizer + +from jina import Executor, DocumentArray, requests, Document + + +class Segmenter(Executor): + @requests + def segment(self, docs: DocumentArray, **kwargs): + """ + Read the data and add tags. + + :param docs: received documents. + :param .. 
+ """ + for doc in docs: + text = doc.tags['caption'] + uri = f'{os.environ["HW_WORKDIR"]}/people-img/{doc.tags["image"]}' + chunk_text = Document(text=text, mime_type='text/plain') + chunk_uri = Document(uri=uri, mime_type='image/jpeg') + doc.chunks = [chunk_text, chunk_uri] + doc.uri = uri + doc.convert_uri_to_datauri() + + +class TextEncoder(Executor): + """Transformer executor class """ + + def __init__( + self, + pretrained_model_name_or_path: str = 'sentence-transformers/distilbert-base-nli-stsb-mean-tokens', + base_tokenizer_model: Optional[str] = None, + pooling_strategy: str = 'mean', + layer_index: int = -1, + max_length: Optional[int] = None, + acceleration: Optional[str] = None, + embedding_fn_name: str = '__call__', + *args, + **kwargs, + ): + super().__init__(*args, **kwargs) + self.pretrained_model_name_or_path = pretrained_model_name_or_path + self.base_tokenizer_model = ( + base_tokenizer_model or pretrained_model_name_or_path + ) + self.pooling_strategy = pooling_strategy + self.layer_index = layer_index + self.max_length = max_length + self.acceleration = acceleration + self.embedding_fn_name = embedding_fn_name + self.tokenizer = AutoTokenizer.from_pretrained(self.base_tokenizer_model) + self.model = AutoModel.from_pretrained( + self.pretrained_model_name_or_path, output_hidden_states=True + ) + self.model.to(torch.device('cpu')) + + def _compute_embedding(self, hidden_states: 'torch.Tensor', input_tokens: Dict): + fill_vals = {'cls': 0.0, 'mean': 0.0, 'max': -np.inf, 'min': np.inf} + fill_val = torch.tensor( + fill_vals[self.pooling_strategy], device=torch.device('cpu') + ) + + layer = hidden_states[self.layer_index] + attn_mask = input_tokens['attention_mask'].unsqueeze(-1).expand_as(layer) + layer = torch.where(attn_mask.bool(), layer, fill_val) + + embeddings = layer.sum(dim=1) / attn_mask.sum(dim=1) + return embeddings.cpu().numpy() + + @requests + def encode(self, docs: 'DocumentArray', *args, **kwargs): + + chunks = DocumentArray( + list( 
+ filter(lambda d: d.mime_type == 'text/plain', docs.traverse_flat(['c'])) + ) + ) + + texts = chunks.get_attributes('text') + + with torch.no_grad(): + + if not self.tokenizer.pad_token: + self.tokenizer.add_special_tokens({'pad_token': '[PAD]'}) + self.model.resize_token_embeddings(len(self.tokenizer.vocab)) + + input_tokens = self.tokenizer( + texts, + max_length=self.max_length, + padding='longest', + truncation=True, + return_tensors='pt', + ) + input_tokens = { + k: v.to(torch.device('cpu')) for k, v in input_tokens.items() + } + + outputs = getattr(self.model, self.embedding_fn_name)(**input_tokens) + if isinstance(outputs, torch.Tensor): + return outputs.cpu().numpy() + hidden_states = outputs.hidden_states + + embeds = self._compute_embedding(hidden_states, input_tokens) + for doc, embed in zip(chunks, embeds): + doc.embedding = embed + + return chunks + + +class ImageCrafter(Executor): + def __init__( + self, + target_size: Union[Iterable[int], int] = 224, + img_mean: Tuple[float] = (0, 0, 0), + img_std: Tuple[float] = (1, 1, 1), + resize_dim: int = 256, + channel_axis: int = -1, + target_channel_axis: int = -1, + *args, + **kwargs, + ): + """Set Constructor.""" + super().__init__(*args, **kwargs) + self.target_size = target_size + self.resize_dim = resize_dim + self.img_mean = np.array(img_mean).reshape((1, 1, 3)) + self.img_std = np.array(img_std).reshape((1, 1, 3)) + self.channel_axis = channel_axis + self.target_channel_axis = target_channel_axis + + def craft(self, docs: DocumentArray, fn): + chunks = DocumentArray( + list( + filter(lambda d: d.mime_type == 'image/jpeg', docs.traverse_flat(['c'])) + ) + ) + for doc in chunks: + getattr(doc, fn)() + raw_img = _load_image(doc.blob, self.channel_axis) + _img = self._normalize(raw_img) + # move the channel_axis to target_channel_axis to better fit different models + img = _move_channel_axis(_img, -1, self.target_channel_axis) + doc.blob = img + return chunks + + @requests(on='/index') + def 
craft_index(self, docs: DocumentArray, **kwargs): + return self.craft(docs, 'convert_image_uri_to_blob') + + @requests(on='/search') + def craft_search(self, docs: DocumentArray, **kwargs): + return self.craft(docs, 'convert_image_datauri_to_blob') + + def _normalize(self, img): + img = _resize_short(img, target_size=self.resize_dim) + img, _, _ = _crop_image(img, target_size=self.target_size, how='center') + img = np.array(img).astype('float32') / 255 + img -= self.img_mean + img /= self.img_std + return img + + +class ImageEncoder(Executor): + def __init__( + self, + model_name: str = 'mobilenet_v2', + pool_strategy: str = 'mean', + channel_axis: int = -1, + *args, + **kwargs, + ): + + super().__init__(*args, **kwargs) + self.channel_axis = channel_axis + # axis 0 is the batch + self._default_channel_axis = 1 + self.model_name = model_name + self.pool_strategy = pool_strategy + self.pool_fn = getattr(np, self.pool_strategy) + model = getattr(models, self.model_name)(pretrained=True) + self.model = model.features.eval() + self.model.to(torch.device('cpu')) + + def _get_features(self, content): + content = content.permute(0, 3, 1, 2) + return self.model(content) + + def _get_pooling(self, feature_map: 'np.ndarray') -> 'np.ndarray': + if feature_map.ndim == 2 or self.pool_strategy is None: + return feature_map + return self.pool_fn(feature_map, axis=(2, 3)) + + @requests + def encode(self, docs: DocumentArray, **kwargs): + content = np.stack(docs.get_attributes('blob')) + _input = torch.from_numpy(content.astype('float32')) + _features = self._get_features(_input).detach() + _features = _features.numpy() + _features = self._get_pooling(_features) + for doc, feature in zip(docs, _features): + doc.embedding = feature + + +class DocVectorIndexer(Executor): + def __init__(self, index_file_name: str, **kwargs): + super().__init__(**kwargs) + self.index_file_name = index_file_name + if os.path.exists(self.save_path): + self._docs = DocumentArray.load(self.save_path) + 
else: + self._docs = DocumentArray() + + @property + def save_path(self): + if not os.path.exists(self.workspace): + os.makedirs(self.workspace) + return os.path.join(self.workspace, self.index_file_name) + + def close(self): + self._docs.save(self.save_path) + + @requests(on='/index') + def index(self, docs: 'DocumentArray', **kwargs): + self._docs.extend(docs) + + @requests(on='/search') + def search(self, docs: 'DocumentArray', parameters: Dict, **kwargs): + a = np.stack(docs.get_attributes('embedding')) + b = np.stack(self._docs.get_attributes('embedding')) + q_emb = _ext_A(_norm(a)) + d_emb = _ext_B(_norm(b)) + dists = _cosine(q_emb, d_emb) + idx, dist = self._get_sorted_top_k(dists, int(parameters['top_k'])) + for _q, _ids, _dists in zip(docs, idx, dist): + for _id, _dist in zip(_ids, _dists): + d = Document(self._docs[int(_id)], copy=True) + d.score.value = 1 - _dist + _q.matches.append(d) + + @staticmethod + def _get_sorted_top_k( + dist: 'np.array', top_k: int + ) -> Tuple['np.ndarray', 'np.ndarray']: + if top_k >= dist.shape[1]: + idx = dist.argsort(axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx, axis=1) + else: + idx_ps = dist.argpartition(kth=top_k, axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx_ps, axis=1) + idx_fs = dist.argsort(axis=1) + idx = np.take_along_axis(idx_ps, idx_fs, axis=1) + dist = np.take_along_axis(dist, idx_fs, axis=1) + + return idx, dist + + +class KeyValueIndexer(Executor): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + if os.path.exists(self.save_path): + self._docs = DocumentArray.load(self.save_path) + else: + self._docs = DocumentArray() + + @property + def save_path(self): + if not os.path.exists(self.workspace): + os.makedirs(self.workspace) + return os.path.join(self.workspace, 'kv.json') + + def close(self): + self._docs.save(self.save_path) + + @requests(on='/index') + def index(self, docs: DocumentArray, **kwargs): + self._docs.extend(docs) + + @requests(on='/search') + 
def query(self, docs: DocumentArray, **kwargs): + for doc in docs: + for match in doc.matches: + extracted_doc = self._docs[match.parent_id] + match.MergeFrom(extracted_doc) + + +class WeightedRanker(Executor): + @requests(on='/search') + def rank( + self, docs_matrix: List['DocumentArray'], parameters: Dict, **kwargs + ) -> 'DocumentArray': + """ + :param docs: the doc which gets bubbled up matches + :param kwargs: not used (kept to maintain interface) + """ + + result_da = DocumentArray() # length: 1 as every time there is only one query + for d_mod1, d_mod2 in zip(*docs_matrix): + + final_matches = {} # type: Dict[str, Document] + + for m in d_mod1.matches: + m.score.value *= d_mod1.weight + final_matches[m.parent_id] = Document(m, copy=True) + + for m in d_mod2.matches: + if m.parent_id in final_matches: + final_matches[m.parent_id].score.value += ( + m.score.value * d_mod2.weight + ) + else: + m.score.value *= d_mod2.weight + final_matches[m.parent_id] = Document(m, copy=True) + + da = DocumentArray(list(final_matches.values())) + da.sort(key=lambda ma: ma.score.value, reverse=True) + d = Document(matches=da[: int(parameters['top_k'])]) + result_da.append(d) + return result_da + + +def _get_ones(x, y): + return np.ones((x, y)) + + +def _ext_A(A): + nA, dim = A.shape + A_ext = _get_ones(nA, dim * 3) + A_ext[:, dim : 2 * dim] = A + A_ext[:, 2 * dim :] = A ** 2 + return A_ext + + +def _ext_B(B): + nB, dim = B.shape + B_ext = _get_ones(dim * 3, nB) + B_ext[:dim] = (B ** 2).T + B_ext[dim : 2 * dim] = -2.0 * B.T + del B + return B_ext + + +def _euclidean(A_ext, B_ext): + sqdist = A_ext.dot(B_ext).clip(min=0) + return np.sqrt(sqdist) + + +def _norm(A): + return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True) + + +def _cosine(A_norm_ext, B_norm_ext): + return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2 + + +def _move_channel_axis( + img: 'np.ndarray', channel_axis_to_move: int, target_channel_axis: int = -1 +) -> 'np.ndarray': + """ + Ensure the color channel axis 
is the default axis. + """ + if channel_axis_to_move == target_channel_axis: + return img + return np.moveaxis(img, channel_axis_to_move, target_channel_axis) + + +def _load_image(blob: 'np.ndarray', channel_axis: int): + """ + Load an image array and return a `PIL.Image` object. + """ + + from PIL import Image + + img = _move_channel_axis(blob, channel_axis) + return Image.fromarray(img.astype('uint8')) + + +def _crop_image( + img, + target_size: Union[Tuple[int, int], int], + top: int = None, + left: int = None, + how: str = 'precise', +): + """ + Crop the input :py:mod:`PIL` image. + + :param img: :py:mod:`PIL.Image`, the image to be resized + :param target_size: desired output size. If size is a sequence like + (h, w), the output size will be matched to this. If size is an int, + the output will have the same height and width as the `target_size`. + :param top: the vertical coordinate of the top left corner of the crop box. + :param left: the horizontal coordinate of the top left corner of the crop box. + :param how: the way of cropping. Valid values include `center`, `random`, and, `precise`. Default is `precise`. + - `center`: crop the center part of the image + - `random`: crop a random part of the image + - `precise`: crop the part of the image specified by the crop box with the given ``top`` and ``left``. + .. warning:: When `precise` is used, ``top`` and ``left`` must be fed valid value. 
+ + """ + import PIL.Image as Image + + assert isinstance(img, Image.Image), 'img must be a PIL.Image' + img_w, img_h = img.size + if isinstance(target_size, int): + target_h = target_w = target_size + elif isinstance(target_size, Tuple) and len(target_size) == 2: + target_h, target_w = target_size + else: + raise ValueError( + f'target_size should be an integer or a tuple of two integers: {target_size}' + ) + w_beg = left + h_beg = top + if how == 'center': + w_beg = int((img_w - target_w) / 2) + h_beg = int((img_h - target_h) / 2) + elif how == 'random': + w_beg = np.random.randint(0, img_w - target_w + 1) + h_beg = np.random.randint(0, img_h - target_h + 1) + elif how == 'precise': + assert w_beg is not None and h_beg is not None + assert ( + 0 <= w_beg <= (img_w - target_w) + ), f'left must be within [0, {img_w - target_w}]: {w_beg}' + assert ( + 0 <= h_beg <= (img_h - target_h) + ), f'top must be within [0, {img_h - target_h}]: {h_beg}' + else: + raise ValueError(f'unknown input how: {how}') + if not isinstance(w_beg, int): + raise ValueError(f'left must be int number between 0 and {img_w}: {left}') + if not isinstance(h_beg, int): + raise ValueError(f'top must be int number between 0 and {img_h}: {top}') + w_end = w_beg + target_w + h_end = h_beg + target_h + img = img.crop((w_beg, h_beg, w_end, h_end)) + return img, h_beg, w_beg + + +def _resize_short(img, target_size, how: str = 'LANCZOS'): + """ + Resize the input :py:mod:`PIL` image. + :param img: :py:mod:`PIL.Image`, the image to be resized + :param target_size: desired output size. If size is a sequence like (h, w), the output size will be matched to + this. If size is an int, the smaller edge of the image will be matched to this number while maintaining the aspect + ratio. + :param how: the interpolation method. Valid values include `NEAREST`, `BILINEAR`, `BICUBIC`, and `LANCZOS`. + Default is `LANCZOS`. Please refer to `PIL.Image` for details. 
+ """ + import PIL.Image as Image + + assert isinstance(img, Image.Image), 'img must be a PIL.Image' + if isinstance(target_size, int): + percent = float(target_size) / min(img.size[0], img.size[1]) + target_w = int(round(img.size[0] * percent)) + target_h = int(round(img.size[1] * percent)) + elif isinstance(target_size, Tuple) and len(target_size) == 2: + target_h, target_w = target_size + else: + raise ValueError( + f'target_size should be an integer or a tuple of two integers: {target_size}' + ) + img = img.resize((target_w, target_h), getattr(Image, how)) + return img diff --git a/jina/helloworld/multimodal/flow-index.yml b/jina/helloworld/multimodal/flow-index.yml new file mode 100644 index 0000000000000..32b8ad504b447 --- /dev/null +++ b/jina/helloworld/multimodal/flow-index.yml @@ -0,0 +1,36 @@ +jtype: Flow +version: '1' +with: + workspace: $HW_WORKDIR + py_modules: + - executors.py +executors: + - name: segment + uses: + jtype: Segmenter + - name: encodeText + uses: + jtype: TextEncoder + - name: textChunkIndexer + uses: + jtype: DocVectorIndexer + with: + index_file_name: "text.json" + - name: craftImage + uses: + jtype: ImageCrafter + needs: segment + - name: encodeImage + uses: + jtype: ImageEncoder + - name: imageChunkIndexer + uses: + jtype: DocVectorIndexer + with: + index_file_name: "image.json" + - name: keyValueIndexer + uses: + jtype: KeyValueIndexer + needs: segment + - name: joinAll + needs: [ textChunkIndexer, imageChunkIndexer, keyValueIndexer ] diff --git a/jina/helloworld/multimodal/flow-search.yml b/jina/helloworld/multimodal/flow-search.yml new file mode 100644 index 0000000000000..b65ea14dcef9b --- /dev/null +++ b/jina/helloworld/multimodal/flow-search.yml @@ -0,0 +1,35 @@ +jtype: Flow +version: '1' +with: + workspace: $HW_WORKDIR + py_modules: + - executors.py +executors: + - name: encodeText + uses: + jtype: TextEncoder + - name: textChunkIndexer + uses: + jtype: DocVectorIndexer + with: + index_file_name: "text.json" + - name: 
craftImage + uses: + jtype: ImageCrafter + needs: gateway + - name: encodeImage + uses: + jtype: ImageEncoder + - name: imageChunkIndexer + uses: + jtype: DocVectorIndexer + with: + index_file_name: "image.json" + - name: weightedRanker + uses: + jtype: WeightedRanker + needs: [ textChunkIndexer, imageChunkIndexer ] + - name: keyvalueIndexer + uses: + jtype: KeyValueIndexer + needs: weightedRanker \ No newline at end of file diff --git a/jina/helloworld/multimodal/static/index.html b/jina/helloworld/multimodal/static/index.html new file mode 100644 index 0000000000000..32fed868dce99 --- /dev/null +++ b/jina/helloworld/multimodal/static/index.html @@ -0,0 +1,176 @@ + + + + + + Jina Multimodal Hello World! + + + + +
+ A multimodal document as query (Refresh for new image)
+ Results rely more on
+ Text {{textWeight}}
+ Image {{imageWeight}}
+ REST Payload
+ {{payload}}
+ {{item.tags.caption}}
+ score: {{item.score.value}}
+ filename: {{item.tags.image}}
+ + + + + + \ No newline at end of file diff --git a/jina/helper.py b/jina/helper.py index 61e06c5a90d6a..7518ce1586414 100644 --- a/jina/helper.py +++ b/jina/helper.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import asyncio import functools import json @@ -17,6 +14,7 @@ from contextlib import contextmanager from datetime import datetime from itertools import islice +from pathlib import Path from types import SimpleNamespace from typing import ( Tuple, @@ -52,11 +50,9 @@ 'convert_tuple_to_list', 'run_async', 'deprecated_alias', - 'deprecated_class', + 'countdown', ] -from jina.excepts import NotSupportedError - def deprecated_alias(**aliases): """ @@ -72,6 +68,7 @@ def deprecated_alias(**aliases): :param aliases: maps aliases to new arguments :return: wrapper """ + from .excepts import NotSupportedError def _rename_kwargs(func_name: str, kwargs, aliases): """ @@ -129,55 +126,6 @@ def wrapper(*args, **kwargs): return deco -def deprecated_class(dep_class=None, new_class=None, custom_msg=None): - """ - After applying `deprecated_class` the `dep_class` will only be an alias for the `new_class`. - Any code inside of `dep_class` will be completely overwritten. - - For example: - .. highlight:: python - .. code-block:: python - @deprecated_class(new_class=NewClass) - @deprecated_class(new_class=NewClass, custom_msg="custom message") - - :param dep_class: deprecated class - :param new_class: new class - :param custom_msg: a custom message to describe the new class - :return: wrapper - """ - - if not dep_class: - return functools.partial( - deprecated_class, new_class=new_class, custom_msg=custom_msg - ) - - new_init = new_class.__init__ - - @functools.wraps(dep_class) - def wrapper(*args, **kwargs): - """ - Set wrapper function. - - :param args: wrapper arguments - :param kwargs: wrapper key word arguments - :return: class instance of the new type. 
- """ - - warnings_msg = f'{dep_class.__name__} class is deprecated and will be removed in the next version. ' - if new_class and not custom_msg: - warnings_msg += f'A new name of the class is {new_class.__name__}. ' - if custom_msg: - warnings_msg += f'{custom_msg}' - warnings.warn( - warnings_msg, - DeprecationWarning, - ) - return new_init(*args, **kwargs) - - new_class.__init__ = wrapper - return new_class - - def get_readable_size(num_bytes: Union[int, float]) -> str: """ Transform the bytes into readable value with different units (e.g. 1 KB, 20 MB, 30.1 GB). @@ -1099,7 +1047,7 @@ def get_public_ip(): """ import urllib.request - timeout = 0.5 + timeout = 0.2 results = [] @@ -1318,6 +1266,7 @@ def find_request_binding(target): :return: a dictionary with key as request type and value as method name """ import ast, inspect + from . import __default_endpoint__ res = {} @@ -1328,9 +1277,8 @@ def visit_function_def(node): if isinstance(e, ast.Call) and e.func.id == 'requests': req_name = e.keywords[0].value.s elif isinstance(e, ast.Name) and e.id == 'requests': - req_name = 'default' + req_name = __default_endpoint__ if req_name: - req_name = _canonical_request_name(req_name) if req_name in res: raise ValueError( f'you already bind `{res[req_name]}` with `{req_name}` request' @@ -1350,4 +1298,58 @@ def _canonical_request_name(req_name: str): :param req_name: the original request name :return: canonical form of the request """ - return req_name.lower().replace('request', '') + if req_name.startswith('/'): + # new data request + return f'data://{req_name}' + else: + # legacy request type + return req_name.lower().replace('request', '') + + +def physical_size(directory: str) -> int: + """Return the size of the given directory in bytes + + :param directory: directory as :str: + :return: byte size of the given directory + """ + root_directory = Path(directory) + return sum(f.stat().st_size for f in root_directory.glob('**/*') if f.is_file()) + + +def dunder_get(_dict: 
Any, key: str) -> Any: + """Returns value for a specified dunderkey + A "dunderkey" is just a fieldname that may or may not contain + double underscores (dunderscores!) for referencing nested keys in + a dict. eg:: + >>> data = {'a': {'b': 1}} + >>> dunder_get(data, 'a__b') + 1 + key 'b' can be referrenced as 'a__b' + :param _dict : (dict, list, struct or object) which we want to index into + :param key : (str) that represents a first level or nested key in the dict + :return: (mixed) value corresponding to the key + """ + + try: + part1, part2 = key.split('__', 1) + except ValueError: + part1, part2 = key, '' + + try: + part1 = int(part1) # parse int parameter + except ValueError: + pass + + from google.protobuf.struct_pb2 import Struct + + if isinstance(part1, int): + result = _dict[part1] + elif isinstance(_dict, (dict, Struct)): + if part1 in _dict: + result = _dict[part1] + else: + result = None + else: + result = getattr(_dict, part1) + + return dunder_get(result, part2) if part2 else result diff --git a/jina/hub b/jina/hub index 03569f442491d..919dff1dfa1da 160000 --- a/jina/hub +++ b/jina/hub @@ -1 +1 @@ -Subproject commit 03569f442491d9a649cf358fd674e6134e5ad5ea +Subproject commit 919dff1dfa1da0190fd0b89c8b9b537b3de9cbc5 diff --git a/jina/importer.py b/jina/importer.py index 7e623b54f1565..b91ef5c0a7de9 100644 --- a/jina/importer.py +++ b/jina/importer.py @@ -1,17 +1,15 @@ import os import sys import warnings +from collections import defaultdict from types import SimpleNamespace, ModuleType from typing import Optional, List, Any, Dict -from collections import defaultdict IMPORTED = SimpleNamespace() IMPORTED.executors = False IMPORTED.executors = False -IMPORTED.drivers = False IMPORTED.hub = False IMPORTED.schema_executors = {} -IMPORTED.schema_drivers = {} def import_classes( @@ -25,15 +23,8 @@ def import_classes( :param import_once: import everything only once, to avoid repeated import :return: the dependency tree of the imported classes under the 
`namespace` """ - _namespace2type = { - 'jina.executors': 'ExecutorType', - 'jina.drivers': 'DriverType', - 'jina.hub': 'ExecutorType', - } - _import_type = _namespace2type.get(namespace) - if _import_type is None: - raise TypeError(f'namespace: {namespace} is unrecognized') + _import_type = 'ExecutorType' _imported_property = namespace.split('.')[-1] _is_imported = getattr(IMPORTED, _imported_property) if import_once and _is_imported: @@ -172,6 +163,10 @@ def add_modules(*paths) -> Optional[ModuleType]: :param paths: Paths of the modules. :return: The target module. """ + from .jaml.helper import complete_path + + paths = [complete_path(m) for m in paths] + for p in paths: if not os.path.exists(p): raise FileNotFoundError( @@ -260,14 +255,14 @@ def _raise_bad_imports_warnings(bad_imports, namespace): if namespace != 'jina.hub': warnings.warn( f'theses modules or classes can not be imported {bad_imports}. ' - f'You can use `jina check` to list all executors and drivers' + f'You can use `jina check` to list all executors' ) else: warnings.warn( f'due to the missing dependencies or bad implementations, ' f'{bad_imports} can not be imported ' - f'if you are using these executors/drivers, they wont work. ' - f'You can use `jina check` to list all executors and drivers' + f'if you are using these executors, they wont work. ' + f'You can use `jina check` to list all executors' ) @@ -316,17 +311,6 @@ def _filter_modules(modules): return {m for m in modules if not _ignored_module_pattern.findall(m)} -def _load_default_exc_config(cls_obj): - from .executors.requests import get_default_reqs - - try: - _request = get_default_reqs(type.mro(cls_obj)) - except ValueError as ex: - warnings.warn( - f'Please ensure a config yml is given for {cls_obj.__name__}. 
{ex}' - ) - - def _update_depend_tree(cls_obj, module_name, cur_tree): d = cur_tree for vvv in cls_obj.mro()[:-1][::-1]: @@ -351,15 +335,10 @@ def _import_module(module_name, import_type, depend_tree, load_stat): try: _update_depend_tree(_cls_obj, module_name, depend_tree) if _cls_obj.__class__.__name__ == 'ExecutorType': - _load_default_exc_config(_cls_obj) IMPORTED.schema_executors[ f'Jina::Executors::{_cls_obj.__name__}' ] = _jina_class_to_schema(_cls_obj) - else: - IMPORTED.schema_drivers[ - f'Jina::Drivers::{_cls_obj.__name__}' - ] = _jina_class_to_schema(_cls_obj) - # TODO: _success_msg is never used + _success_msg = colored('▸', 'green').join( f'{vvv.__name__}' for vvv in _cls_obj.mro()[:-1][::-1] ) diff --git a/jina/jaml/__init__.py b/jina/jaml/__init__.py index 3108399cef658..a0b02c59935b8 100644 --- a/jina/jaml/__init__.py +++ b/jina/jaml/__init__.py @@ -501,7 +501,7 @@ def load_config( no_tag_yml = JAML.load_no_tags(fp) if no_tag_yml: # extra arguments are parsed to inject_config - no_tag_yml = cls.inject_config(no_tag_yml, **kwargs) + no_tag_yml.update(**kwargs) else: raise BadConfigSource( f'can not construct {cls} from an empty {source}. nothing to read from there' @@ -529,17 +529,3 @@ def load_config( # load into object, no more substitute return JAML.load(tag_yml, substitute=False) - - @classmethod - def inject_config(cls, raw_config: Dict, *args, **kwargs) -> Dict: - """Inject/modify the config before loading it into an object. - - .. note:: - This function is most likely to be overridden by its subclass. 
- - :param raw_config: raw config to work on - :param args: not used - :param kwargs: not used - :return: the config - """ - return raw_config diff --git a/jina/jaml/helper.py b/jina/jaml/helper.py index c744347a989b6..bd40f4116cc12 100644 --- a/jina/jaml/helper.py +++ b/jina/jaml/helper.py @@ -110,7 +110,6 @@ def parse_config_source( allow_yaml_file: bool = True, allow_builtin_resource: bool = True, allow_raw_yaml_content: bool = True, - allow_raw_driver_yaml_content: bool = True, allow_class_type: bool = True, allow_dict: bool = True, allow_json: bool = True, @@ -126,7 +125,6 @@ def parse_config_source( :param allow_yaml_file: flag :param allow_builtin_resource: flag :param allow_raw_yaml_content: flag - :param allow_raw_driver_yaml_content: flag :param allow_class_type: flag :param allow_dict: flag :param allow_json: flag @@ -167,27 +165,6 @@ def parse_config_source( # possible YAML content path = path.replace('|', '\n with: ') return io.StringIO(path), None - elif allow_raw_driver_yaml_content and path.lstrip().startswith(('- !', '- jtype')): - # possible driver YAML content, right now it is only used for debugging - with open( - resource_filename( - 'jina', - '/'.join( - ( - 'resources', - 'executors.base.all.yml' - if path.lstrip().startswith('- !!') - else 'executors.base.yml', - ) - ), - ) - ) as fp: - _defaults = fp.read() - path = path.replace('- !!', '- !').replace( - '|', '\n with: ' - ) # for indent, I know, its nasty - path = _defaults.replace('*', path) - return io.StringIO(path), None elif allow_class_type and path.isidentifier(): # possible class name return io.StringIO(f'!{path}'), None diff --git a/jina/jaml/parsers/__init__.py b/jina/jaml/parsers/__init__.py index 8df2df4327e48..38fa40027ddf2 100644 --- a/jina/jaml/parsers/__init__.py +++ b/jina/jaml/parsers/__init__.py @@ -14,12 +14,9 @@ def _get_all_parser(cls: Type['JAMLCompatible']): """ from ...executors import BaseExecutor from ...flow import BaseFlow - from ...drivers import BaseDriver if 
issubclass(cls, BaseFlow): return _get_flow_parser() - elif issubclass(cls, BaseDriver): - return _get_driver_parser() elif issubclass(cls, BaseExecutor): return _get_exec_parser() else: @@ -39,12 +36,6 @@ def _get_exec_parser(): return [LegacyParser], LegacyParser -def _get_driver_parser(): - from .driver.legacy import LegacyParser - - return [LegacyParser], LegacyParser - - def _get_default_parser(): from .default.v1 import V1Parser diff --git a/jina/jaml/parsers/base.py b/jina/jaml/parsers/base.py index bb09a22957d11..fed4943d1d3a9 100644 --- a/jina/jaml/parsers/base.py +++ b/jina/jaml/parsers/base.py @@ -3,7 +3,6 @@ if False: from ...flow.base import BaseFlow from ...executors import BaseExecutor - from ...drivers import BaseDriver class VersionedYAMLParser: @@ -17,9 +16,7 @@ class VersionedYAMLParser: version = 'legacy' #: the version number this parser designed for - def parse( - self, cls: type, data: Dict - ) -> Union['BaseFlow', 'BaseExecutor', 'BaseDriver']: + def parse(self, cls: type, data: Dict) -> Union['BaseFlow', 'BaseExecutor']: """Return the Flow YAML parser given the syntax version number @@ -29,7 +26,7 @@ def parse( """ raise NotImplementedError - def dump(self, data: Union['BaseFlow', 'BaseExecutor', 'BaseDriver']) -> Dict: + def dump(self, data: Union['BaseFlow', 'BaseExecutor']) -> Dict: """Return the dictionary given a versioned flow object diff --git a/jina/jaml/parsers/driver/legacy.py b/jina/jaml/parsers/driver/legacy.py deleted file mode 100644 index 5092b54df780e..0000000000000 --- a/jina/jaml/parsers/driver/legacy.py +++ /dev/null @@ -1,31 +0,0 @@ -from typing import Dict, Type - -from ..base import VersionedYAMLParser -from ....drivers import BaseDriver - - -class LegacyParser(VersionedYAMLParser): - """Legacy parser for driver.""" - - version = 'legacy' # the version number this parser designed for - - def parse(self, cls: Type['BaseDriver'], data: Dict) -> 'BaseDriver': - """ - :param cls: target class type to parse into, must be a 
:class:`JAMLCompatible` type - :param data: flow yaml file loaded as python dict - :return: the Flow YAML parser given the syntax version number - """ - - obj = cls(**data.get('with', {})) - return obj - - def dump(self, data: 'BaseDriver') -> Dict: - """ - :param data: versioned flow object - :return: dictionary given a versioned flow object - """ - a = {k: v for k, v in data._init_kwargs_dict.items()} - r = {} - if a: - r['with'] = a - return r diff --git a/jina/jaml/parsers/executor/legacy.py b/jina/jaml/parsers/executor/legacy.py index 6b0f884ccd8c4..a072327325ecb 100644 --- a/jina/jaml/parsers/executor/legacy.py +++ b/jina/jaml/parsers/executor/legacy.py @@ -1,11 +1,10 @@ -import os import inspect -from typing import Dict, Any, Type, Set from functools import reduce +from typing import Dict, Type, Set from ..base import VersionedYAMLParser -from ....executors import BaseExecutor, get_default_metas -from ....executors.compound import CompoundExecutor +from ....executors import BaseExecutor +from ....executors.metas import get_default_metas class LegacyParser(VersionedYAMLParser): @@ -52,112 +51,53 @@ def _accumulate_classes(c, cs): args = list(map(lambda x: get_class_arguments(x), all_classes)) return set(reduce(lambda x, y: x + y, args)) - @staticmethod - def _get_dump_path_from_config(meta_config: Dict): - if 'name' in meta_config: - work_dir = meta_config['workspace'] - name = meta_config['name'] - pea_id = meta_config['pea_id'] - if work_dir: - # then try to see if it can be loaded from its regular expected workspace (ref_indexer) - dump_path = BaseExecutor.get_shard_workspace(work_dir, name, pea_id) - bin_dump_path = os.path.join(dump_path, f'{name}.bin') - if os.path.exists(bin_dump_path): - return bin_dump_path - - root_work_dir = meta_config['root_workspace'] - root_name = meta_config['root_name'] - if root_name != name: - # try to load from the corresponding file as if it was a CompoundExecutor, if the `.bin` does not exist, - # we should try to see if 
from its workspace can be loaded as it may be a `ref_indexer` - compound_work_dir = ( - CompoundExecutor.get_component_workspace_from_compound_workspace( - root_work_dir, root_name, pea_id - ) - ) - dump_path = BaseExecutor.get_shard_workspace( - compound_work_dir, name, pea_id - ) - bin_dump_path = os.path.join(dump_path, f'{name}.{"bin"}') - if os.path.exists(bin_dump_path): - return bin_dump_path - def parse(self, cls: Type['BaseExecutor'], data: Dict) -> 'BaseExecutor': """ :param cls: target class type to parse into, must be a :class:`JAMLCompatible` type :param data: flow yaml file loaded as python dict :return: the Flow YAML parser given the syntax version number """ + from ....logging import default_logger + _meta_config = get_default_metas() _meta_config.update(data.get('metas', {})) if _meta_config: data['metas'] = _meta_config - dump_path = self._get_dump_path_from_config(data.get('metas', {})) - load_from_dump = False - if dump_path: - obj = cls.load(dump_path) - obj.logger.success(f'restore {cls.__name__} from {dump_path}') - # consider the case where `dump_path` is not based on `obj.workspace`. 
This is needed - # for - workspace_loaded_from = data.get('metas', {})['workspace'] - workspace_in_dump = getattr(obj, 'workspace', None) - if workspace_in_dump != workspace_loaded_from: - obj.workspace = workspace_loaded_from - - obj.fill_in_drivers(data.get('requests', None)) - load_from_dump = True - else: - cls._init_from_yaml = True - - if cls.store_args_kwargs: - p = data.get('with', {}) # type: Dict[str, Any] - a = p.pop('args') if 'args' in p else () - k = p.pop('kwargs') if 'kwargs' in p else {} - # maybe there are some hanging kwargs in "parameters" - # tmp_a = (expand_env_var(v) for v in a) - # tmp_p = {kk: expand_env_var(vv) for kk, vv in {**k, **p}.items()} - tmp_a = a - tmp_p = {kk: vv for kk, vv in {**k, **p}.items()} - obj = cls( - *tmp_a, - **tmp_p, - metas=data.get('metas', {}), - requests=data.get('requests', {}), - ) - else: - # tmp_p = {kk: expand_env_var(vv) for kk, vv in data.get('with', {}).items()} - obj = cls( - **data.get('with', {}), - metas=data.get('metas', {}), - requests=data.get('requests', {}), - ) - cls._init_from_yaml = False - - # check if the yaml file used to instanciate 'cls' has arguments that are not in 'cls' - arguments_from_cls = LegacyParser._get_all_arguments(cls) - arguments_from_yaml = set(data.get('with', {})) - difference_set = arguments_from_yaml - arguments_from_cls - if any(difference_set): - obj.logger.warning( - f'The arguments {difference_set} defined in the YAML are not expected in the ' - f'class {cls.__name__}' - ) - - obj.logger.success(f'successfully built {cls.__name__} from a yaml config') + cls._init_from_yaml = True + # tmp_p = {kk: expand_env_var(vv) for kk, vv in data.get('with', {}).items()} + obj = cls( + **data.get('with', {}), + metas=data.get('metas', {}), + requests=data.get('requests', {}), + runtime_args=data.get('runtime_args', {}), + ) + cls._init_from_yaml = False + + # check if the yaml file used to instanciate 'cls' has arguments that are not in 'cls' + arguments_from_cls = 
LegacyParser._get_all_arguments(cls) + arguments_from_yaml = set(data.get('with', {})) + difference_set = arguments_from_yaml - arguments_from_cls + if any(difference_set): + default_logger.warning( + f'The arguments {difference_set} defined in the YAML are not expected in the ' + f'class {cls.__name__}' + ) + + default_logger.success(f'successfully built {cls.__name__} from a yaml config') # if node.tag in {'!CompoundExecutor'}: # os.environ['JINA_WARN_UNNAMED'] = 'YES' if not _meta_config: - obj.logger.warning( + default_logger.warning( '"metas" config is not found in this yaml file, ' 'this map is important as it provides an unique identifier when ' 'persisting the executor on disk.' ) # for compound executor - if not load_from_dump and 'components' in data: + if 'components' in data: obj.components = lambda: data['components'] obj.is_updated = False @@ -170,7 +110,15 @@ def dump(self, data: 'BaseExecutor') -> Dict: """ # note: we only save non-default property for the sake of clarity _defaults = get_default_metas() - p = {k: getattr(data, k) for k, v in _defaults.items() if getattr(data, k) != v} + p = ( + { + k: getattr(data.metas, k) + for k, v in _defaults.items() + if getattr(data.metas, k) != v + } + if hasattr(data, 'metas') + else {} + ) a = {k: v for k, v in data._init_kwargs_dict.items() if k not in _defaults} r = {} if a: @@ -178,8 +126,8 @@ def dump(self, data: 'BaseExecutor') -> Dict: if p: r['metas'] = p - if hasattr(data, '_drivers'): - r['requests'] = {'on': data._drivers} + if hasattr(data, 'requests'): + r['requests'] = {k: v.__name__ for k, v in data.requests.items()} if hasattr(data, 'components'): r['components'] = data.components diff --git a/jina/jaml/parsers/flow/v1.py b/jina/jaml/parsers/flow/v1.py index df0066b8a8308..ee39402198386 100644 --- a/jina/jaml/parsers/flow/v1.py +++ b/jina/jaml/parsers/flow/v1.py @@ -35,11 +35,9 @@ class V1Parser(VersionedYAMLParser): pods: - name: pod0 # notice the change here, name is now an attribute 
method: add # by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - name: pod1 # notice the change here, name is now an attribute method: add # by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - method: inspect # add an inspect node on pod1 - method: needs # let's try something new in Flow YAML v1: needs @@ -65,7 +63,7 @@ def parse(self, cls: type, data: Dict) -> 'Flow': tmp_p = {kk: expand_env_var(vv) for kk, vv in {**k, **p}.items()} obj = cls(*tmp_a, env=envs, **tmp_p) - pp = data.get('pods', []) + pp = data.get('executors', data.get('pods', [])) for pods in pp: p_pod_attr = {kk: expand_env_var(vv) for kk, vv in pods.items()} # in v1 YAML, flow is an optional argument @@ -92,7 +90,7 @@ def dump(self, data: 'Flow') -> Dict: r['with'] = data._kwargs if data._pod_nodes: - r['pods'] = [] + r['executors'] = [] last_name = 'gateway' for k, v in data._pod_nodes.items(): @@ -116,5 +114,5 @@ def dump(self, data: 'Flow') -> Dict: if t in kwargs: kwargs.pop(t) last_name = kwargs['name'] - r['pods'].append(kwargs) + r['executors'].append(kwargs) return r diff --git a/jina/logging/__init__.py b/jina/logging/__init__.py index 0dca52fd1dff3..4d834e27d17da 100644 --- a/jina/logging/__init__.py +++ b/jina/logging/__init__.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - from pkg_resources import resource_filename from .logger import JinaLogger diff --git a/jina/logging/logger.py b/jina/logging/logger.py index 09782cf2d507a..bb97f85f333d5 100644 --- a/jina/logging/logger.py +++ b/jina/logging/logger.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - import logging import logging.handlers import os diff --git a/jina/logging/profile.py b/jina/logging/profile.py index 62eb50726e89d..aefd90307fa7e 100644 --- a/jina/logging/profile.py +++ b/jina/logging/profile.py @@ -1,14 +1,11 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import sys import time from collections import defaultdict from functools import wraps from typing import Optional -from ..importer import ImportExtensions from ..helper import colored, get_readable_size, get_readable_time +from ..importer import ImportExtensions if False: # fix type-hint complain for sphinx and flake diff --git a/jina/parsers/__init__.py b/jina/parsers/__init__.py index 6e509a92b0d55..533b3ef491621 100644 --- a/jina/parsers/__init__.py +++ b/jina/parsers/__init__.py @@ -1,7 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - - def set_pea_parser(parser=None): """Set the parser for the Pea @@ -121,37 +117,43 @@ def get_main_parser(): from .base import set_base_parser from .helloworld import set_hw_parser, set_hello_parser from .helper import _chf, _SHOW_ALL_ARGS + from .check import set_check_parser from .export_api import set_export_api_parser from .flow import set_flow_parser - from .hub import set_hub_parser from .ping import set_ping_parser - from .optimizer import set_optimizer_parser + + # from .hub import set_hub_parser + # from .optimizer import set_optimizer_parser # create the top-level parser parser = set_base_parser() sp = parser.add_subparsers( dest='cli', - description='use `%(prog)-8s [sub-command] --help` ' - 'to get detailed information about each sub-command', + description=''' + Use `%(prog)-8s [sub-command] --help` to get detailed information about each sub-command. + + To show all commands, run `JINA_FULL_CLI=1 jina --help`. 
+ ''', required=True, ) set_hello_parser( sp.add_parser( 'hello', - help='👋 Hello World! Hello Jina!', - description='Start hello-world demos', + help='👋 Hello Jina!', + description='Start hello world demos.', formatter_class=_chf, ) ) set_pod_parser( sp.add_parser( - 'pod', - help='Start a Pod', - description='Start a Jina Pod', + 'executor', + aliases=['pod'], + help='Start an Executor', + description='Start an Executor. Executor is how Jina processes Document.', formatter_class=_chf, ) ) @@ -159,17 +161,17 @@ def get_main_parser(): set_flow_parser( sp.add_parser( 'flow', - description='Start a Flow that orchestrates multiple pods', + description='Start a Flow. Flow is how Jina streamlines and distributes Executors.', help='Start a Flow', formatter_class=_chf, ) ) - set_optimizer_parser( + set_ping_parser( sp.add_parser( - 'optimizer', - description='Start a FlowOptimizer from a YAML configuration file', - help='Start an FlowOptimizer from a YAML file', + 'ping', + help='Ping an Executor', + description='Ping a Pod and check its network connectivity.', formatter_class=_chf, ) ) @@ -178,48 +180,30 @@ def get_main_parser(): sp.add_parser( 'gateway', description='Start a Gateway that receives client Requests via gRPC/REST interface', - help='Start a Gateway', - formatter_class=_chf, - ) - ) - - set_ping_parser( - sp.add_parser( - 'ping', - help='Ping a pod and check its connectivity', - description='Ping a remote pod and check the network connectivity', - formatter_class=_chf, - ) - ) - - set_check_parser( - sp.add_parser( - 'check', - help='Check the import of all Executors and Drivers', - description='Check the import status of all executors and drivers', + **(dict(help='Start a Gateway')) if _SHOW_ALL_ARGS else {}, formatter_class=_chf, ) ) - set_hub_parser( - sp.add_parser( - 'hub', - help='Build, push, pull Jina Hub images', - description='Build, push, pull Jina Hub images', - formatter_class=_chf, - ) - ) + # set_hub_parser( + # sp.add_parser( + # 'hub', + # 
help='Build, push, pull Jina Hub images', + # description='Build, push, pull Jina Hub images', + # formatter_class=_chf, + # ) + # ) # Below are low-level / internal / experimental CLIs, hidden from users by default set_pea_parser( sp.add_parser( 'pea', - description='Start a Jina pea. ' + description='Start a Pea. ' 'You should rarely use this directly unless you ' 'are doing low-level orchestration', formatter_class=_chf, - **(dict(help='start a pea')) if _SHOW_ALL_ARGS else {} + **(dict(help='Start a Pea')) if _SHOW_ALL_ARGS else {}, ) ) @@ -228,7 +212,7 @@ def get_main_parser(): 'client', description='Start a Python client that connects to a remote Jina gateway', formatter_class=_chf, - **(dict(help='start a client')) if _SHOW_ALL_ARGS else {} + **(dict(help='Start a Client')) if _SHOW_ALL_ARGS else {}, ) ) @@ -237,18 +221,27 @@ def get_main_parser(): 'export-api', description='Export Jina API to JSON/YAML file for 3rd party applications', formatter_class=_chf, - **(dict(help='export Jina API to file')) if _SHOW_ALL_ARGS else {} + **(dict(help='Export Jina API to file')) if _SHOW_ALL_ARGS else {}, ) ) - set_hw_parser( + # set_optimizer_parser( + # sp.add_parser( + # 'optimizer', + # description='Start a FlowOptimizer from a YAML configuration file', + # **(dict(help='Start an FlowOptimizer from a YAML file')) if _SHOW_ALL_ARGS else {}, + # formatter_class=_chf, + # ) + # ) + + set_check_parser( sp.add_parser( - 'hello-world', - description='Start the hello-world demo, a simple end2end image index and search demo ' - 'without any extra dependencies.', + 'check', + **(dict(help='Check the import of all Executors')) + if _SHOW_ALL_ARGS + else {}, + description='Check the import status of all executors', formatter_class=_chf, - **(dict(help='👋 Hello World! 
Hello Jina!')) if _SHOW_ALL_ARGS else {} ) ) - return parser diff --git a/jina/parsers/base.py b/jina/parsers/base.py index 9ace44a670e49..a2b828f9b724d 100644 --- a/jina/parsers/base.py +++ b/jina/parsers/base.py @@ -14,10 +14,7 @@ def set_base_parser(): # create the top-level parser urls = { - 'Jina 101': ('🐣', 'https://101.jina.ai'), - 'Docs': ('📚', 'https://docs.jina.ai'), - 'Examples': ('🚀‍', 'https://learn.jina.ai'), - 'Code': ('🧑‍💻', 'https://opensource.jina.ai'), + 'Code': ('🧑‍💻', 'https://github.com/jina-ai/jina'), 'Hiring!': ('🙌', 'https://career.jina.ai'), } url_str = '\n'.join( @@ -27,8 +24,7 @@ def set_base_parser(): parser = argparse.ArgumentParser( epilog=f''' -Jina (v{colored(__version__, "green")}) is the cloud-native neural search solution powered by AI & deep learning. -It is a universal solution to large-scale index and query of unstructured & multimedia data. +Jina (v{colored(__version__, "green")}) is the cloud-native neural search framework powered by deep learning. {url_str} diff --git a/jina/parsers/check.py b/jina/parsers/check.py index a4309c550cdd7..17682b5180c3c 100644 --- a/jina/parsers/check.py +++ b/jina/parsers/check.py @@ -16,9 +16,4 @@ def set_check_parser(parser=None): type=str, help='The markdown file path for all executors summary', ) - parser.add_argument( - '--summary-driver', - type=str, - help='The markdown file path for all drivers summary', - ) return parser diff --git a/jina/parsers/client.py b/jina/parsers/client.py index a626561d204a2..f49f8a614688e 100644 --- a/jina/parsers/client.py +++ b/jina/parsers/client.py @@ -1,6 +1,5 @@ """Module for argparse for Client""" from .helper import add_arg_group -from ..enums import RequestType def mixin_client_cli_parser(parser): @@ -17,27 +16,6 @@ def mixin_client_cli_parser(parser): help='The number of Documents in each Request.', ) - gp.add_argument( - '--mode', - choices=list(RequestType), - type=RequestType.from_string, - # required=True, - help=''' -The Request mode. 
This applies to all Requests sent from this client. - -- INDEX: store new Documents into the system -- SEARCH: query Documents from an indexed system -- UPDATE: update existing Documents in an indexed system -- DELETE: delete existing Documents from an indexed system -- CONTROL: (advance) control Pea/Pod such as shutdown, status -- TRAIN: (experimental) train the system - ''', - ) - gp.add_argument( - '--top-k', - type=int, - help='At maximum k results are returned.', - ) gp.add_argument('--mime-type', type=str, help='MIME type of the input Documents.') gp.add_argument( '--continue-on-error', diff --git a/jina/parsers/flow.py b/jina/parsers/flow.py index 1fb150c595437..bedd1faa7735b 100644 --- a/jina/parsers/flow.py +++ b/jina/parsers/flow.py @@ -27,5 +27,6 @@ def set_flow_parser(parser=None): If `REMOVE` is given then all inspect pods are removed when building the flow. ''', ) + parser.set_defaults(workspace='./') return parser diff --git a/jina/parsers/helloworld.py b/jina/parsers/helloworld.py index fbf87f82adb36..c6a6914329ac2 100644 --- a/jina/parsers/helloworld.py +++ b/jina/parsers/helloworld.py @@ -27,7 +27,11 @@ def mixin_hw_base_parser(parser): def set_hello_parser(parser=None): - """Set the hello parser""" + """ + Set the hello parser + + :param parser: the parser configure + """ if not parser: parser = set_base_parser() @@ -53,11 +57,11 @@ def set_hello_parser(parser=None): spp.add_parser( 'chatbot', help=''' -Start a simple Covid-19 chatbot. +Start a simple Covid-19 chatbot. Remarks: -- Pytorch, transformers & FastAPI are required to run this demo. To install all dependencies, use +- Pytorch, transformers & FastAPI are required to run this demo. To install all dependencies, use pip install "jina[chatbot]" @@ -72,7 +76,7 @@ def set_hello_parser(parser=None): spp.add_parser( 'multimodal', help=''' -Start a simple multimodal document search. +Start a simple multimodal document search. 
Remarks: @@ -88,6 +92,15 @@ def set_hello_parser(parser=None): ) ) + set_hw_fork_parser( + spp.add_parser( + 'fork', + help='Fork a hello world project to a local directory, and start to build your own project on it.', + description='Fork a hello world project to a local directory.', + formatter_class=_chf, + ) + ) + def set_hw_parser(parser=None): """Set the hello world parser @@ -99,28 +112,7 @@ def set_hw_parser(parser=None): parser = set_base_parser() mixin_hw_base_parser(parser) - gp = add_arg_group(parser, title='Scalability') - gp.add_argument( - '--shards', - type=int, - default=2, - help='The number of shards when index and query', - ) - gp.add_argument( - '--parallel', - type=int, - default=2, - help='The number of parallel when index and query', - ) gp = add_arg_group(parser, title='Index') - gp.add_argument( - '--uses-index', - type=str, - default=resource_filename( - 'jina', '/'.join(('resources', 'fashion', 'helloworld.flow.index.yml')) - ), - help='The yaml path of the index flow', - ) gp.add_argument( '--index-data-url', type=str, @@ -133,23 +125,8 @@ def set_hw_parser(parser=None): default='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-labels-idx1-ubyte.gz', help='The url of index labels data (should be in idx3-ubyte.gz format)', ) - gp.add_argument( - '--index-request-size', - type=int, - default=1024, - help='The request size in indexing (the maximum number of documents that will be included in a ' - 'Request before sending it)', - ) gp = add_arg_group(parser, title='Search') - gp.add_argument( - '--uses-query', - type=str, - default=resource_filename( - 'jina', '/'.join(('resources', 'fashion', 'helloworld.flow.query.yml')) - ), - help='The yaml path of the query flow', - ) gp.add_argument( '--query-data-url', type=str, @@ -162,13 +139,15 @@ def set_hw_parser(parser=None): default='http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/t10k-labels-idx1-ubyte.gz', help='The url of query labels data (should be in 
idx3-ubyte.gz format)', ) + gp.add_argument( - '--query-request-size', + '--request-size', type=int, - default=32, - help='The request size in searching (the maximum number of documents that will be included in a ' + default=1024, + help='The request size in indexing (the maximum number of documents that will be included in a ' 'Request before sending it)', ) + gp.add_argument( '--num-query', type=int, default=128, help='The number of queries to visualize' ) @@ -194,12 +173,6 @@ def set_hw_chatbot_parser(parser=None): default='https://static.jina.ai/chatbot/dataset.csv', help='The url of index csv data', ) - parser.add_argument( - '--demo-url', - type=str, - default='https://static.jina.ai/chatbot/', - help='The url of the demo page', - ) parser.add_argument( '--port-expose', type=int, @@ -221,6 +194,31 @@ def set_hw_chatbot_parser(parser=None): return parser +def set_hw_fork_parser(parser=None): + """Set the parser for forking hello world demo + + :param parser: the parser configure + :return: the new parser + """ + if not parser: + parser = set_base_parser() + + parser.add_argument( + 'project', + type=str, + choices=['fashion', 'chatbot', 'multimodal'], + help='The hello world project to fork', + ) + + parser.add_argument( + 'destination', + type=str, + help='The dest directory of the forked project. 
Note, it can not be an existing path.', + ) + + return parser + + def set_hw_multimodal_parser(parser=None): """Set the parser for the hello world multimodal @@ -245,12 +243,6 @@ def set_hw_multimodal_parser(parser=None): default='https://static.jina.ai/multimodal/people-img.zip', help='The url of index csv data', ) - parser.add_argument( - '--demo-url', - type=str, - default='https://static.jina.ai/multimodal/', - help='The url of the demo page', - ) parser.add_argument( '--port-expose', type=int, diff --git a/jina/parsers/peapods/base.py b/jina/parsers/peapods/base.py index 7edaff3fa543d..b16a862d969fe 100644 --- a/jina/parsers/peapods/base.py +++ b/jina/parsers/peapods/base.py @@ -36,6 +36,13 @@ def mixin_base_ppr_parser(parser): help='The description of this object. It will be used in automatics docs UI.', ) + gp.add_argument( + '--workspace', + type=str, + help='The working directory for any IO operations in this object. ' + 'If not set, then derive from its parent `workspace`.', + ) + gp.add_argument( '--log-config', type=str, diff --git a/jina/parsers/peapods/pea.py b/jina/parsers/peapods/pea.py index e333de3de5d95..2d97cd364e6cf 100644 --- a/jina/parsers/peapods/pea.py +++ b/jina/parsers/peapods/pea.py @@ -3,7 +3,6 @@ from ..helper import add_arg_group, _SHOW_ALL_ARGS, KVAppendAction from ...enums import PeaRoleType, RuntimeBackendType -from ...peapods.runtimes import list_all_runtimes def mixin_pea_parser(parser): @@ -33,7 +32,6 @@ def mixin_pea_parser(parser): gp.add_argument( '--runtime-cls', type=str, - choices=list_all_runtimes(), default='ZEDRuntime', help='The runtime class to run inside the Pea', ) diff --git a/jina/parsers/peapods/runtimes/container.py b/jina/parsers/peapods/runtimes/container.py index 525f9a2a105b3..8df9044c69bc0 100644 --- a/jina/parsers/peapods/runtimes/container.py +++ b/jina/parsers/peapods/runtimes/container.py @@ -7,11 +7,12 @@ def mixin_container_runtime_parser(parser): :param parser: the parser instance to which we add 
arguments """ gp = add_arg_group(parser, title='ContainerRuntime') + from jina import __default_executor__ gp.add_argument( '--uses-internal', type=str, - default='BaseExecutor', + default=__default_executor__, help=''' The config runs inside the Docker container. diff --git a/jina/parsers/peapods/runtimes/zed.py b/jina/parsers/peapods/runtimes/zed.py index a259e13b98011..c8329dcfc161c 100644 --- a/jina/parsers/peapods/runtimes/zed.py +++ b/jina/parsers/peapods/runtimes/zed.py @@ -13,20 +13,19 @@ def mixin_zed_runtime_parser(parser): """ gp = add_arg_group(parser, title='ZEDRuntime') + from jina import __default_executor__ gp.add_argument( '--uses', type=str, - default='_pass', + default=__default_executor__, help=''' The config of the executor, it could be one of the followings: * an Executor-level YAML file path (.yml, .yaml, .jaml) * a name of a class inherited from `jina.Executor` * a docker image (must start with `docker://`) - * builtin executors, e.g. `_pass`, `_logforward`, `_merge` * the string literal of a YAML config (must start with `!`) * the string literal of a JSON config - * the string literal of a YAML driver config (must start with `- !!`) When use it under Python, one can use the following values additionally: - a Python dict that represents the config @@ -88,21 +87,6 @@ def mixin_zed_runtime_parser(parser): help='The socket type for output port', ) - gp.add_argument( - '--load-interval', - type=int, - default=-1, - help='Reload the Executor in the Pod on every n seconds. ' - '-1 or 0 means do not reload. ', - ) - - gp.add_argument( - '--dump-interval', - type=int, - default=240, - help='Serialize the Executor in the Pod every n seconds if model changes. ' - '-1 means --read-only. ', - ) gp.add_argument( '--read-only', action='store_true', @@ -127,9 +111,8 @@ def mixin_zed_runtime_parser(parser): help=''' The skip strategy on exceptions. 
-- IGNORE: Ignore it, keep running all Drivers & Executors logics in the sequel flow -- SKIP_EXECUTOR: Skip all Executors in the sequel, but drivers are still called -- SKIP_HANDLE: Skip all Drivers & Executors in the sequel, only `pre_hook` and `post_hook` are called +- IGNORE: Ignore it, keep running all Executors in the sequel flow +- SKIP_HANDLE: Skip all Executors in the sequel, only `pre_hook` and `post_hook` are called - THROW_EARLY: Immediately throw the exception, the sequel flow will not be running at all Note, `IGNORE`, `SKIP_EXECUTOR` and `SKIP_HANDLE` do not guarantee the success execution in the sequel flow. If something diff --git a/jina/peapods/__init__.py b/jina/peapods/__init__.py index 3e6e9b705b022..b510511c3ecb4 100644 --- a/jina/peapods/__init__.py +++ b/jina/peapods/__init__.py @@ -1,7 +1,4 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - from .peas import BasePea as Pea from .pods import BasePod from .pods import Pod -from .pods.compoundpod import CompoundPod +from .pods.compound import CompoundPod diff --git a/jina/peapods/pods/__init__.py b/jina/peapods/pods/__init__.py index 8cb4b072d2473..1a7e738c1cd54 100644 --- a/jina/peapods/pods/__init__.py +++ b/jina/peapods/pods/__init__.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import copy import sys from abc import abstractmethod @@ -10,14 +7,11 @@ from typing import Dict, Union, Set from typing import List, Optional -from jina.peapods.zmq import send_ctrl_message -from jina.types.message.dump import DumpMessage from ..peas import BasePea -from ... import __default_host__ +from ... import __default_host__, __default_executor__ from ... 
import helper from ...enums import SchedulerType, PodRoleType, SocketType, PeaRoleType, PollingType from ...helper import get_public_ip, get_internal_ip, random_identity -from ...types.message.common import ControlMessage class ExitFIFO(ExitStack): @@ -212,7 +206,7 @@ def _copy_to_head_args( else: _head_args.socket_out = SocketType.PUB_BIND if as_router: - _head_args.uses = args.uses_before or '_pass' + _head_args.uses = args.uses_before or __default_executor__ if as_router: _head_args.pea_role = PeaRoleType.HEAD @@ -247,7 +241,7 @@ def _copy_to_tail_args( _tail_args.uses = None if as_router: - _tail_args.uses = args.uses_after or '_pass' + _tail_args.uses = args.uses_after or __default_executor__ if args.name: _tail_args.name = f'{args.name}/tail' else: @@ -575,7 +569,7 @@ def _set_after_to_pass(self, args): # remark 1: i think it's related to route driver. if hasattr(args, 'polling') and args.polling.is_push: # ONLY reset when it is push - args.uses_after = '_pass' + args.uses_after = __default_executor__ @staticmethod def _set_peas_args( @@ -635,6 +629,9 @@ def _set_peas_args( bind_args=tail_args, connect_args=_args ) + # pea workspace if not set then derive from workspace + if not _args.workspace: + _args.workspace = args.workspace result.append(_args) return result @@ -652,8 +649,12 @@ def _parse_base_pod_args(self, args): head_args=parsed_args['head'], tail_args=parsed_args['tail'], ) - elif (getattr(args, 'uses_before', None) and args.uses_before != '_pass') or ( - getattr(args, 'uses_after', None) and args.uses_after != '_pass' + elif ( + getattr(args, 'uses_before', None) + and args.uses_before != __default_executor__ + ) or ( + getattr(args, 'uses_after', None) + and args.uses_after != __default_executor__ ): args.scheduling = SchedulerType.ROUND_ROBIN if getattr(args, 'uses_before', None): @@ -674,26 +675,3 @@ def _parse_base_pod_args(self, args): # note that peas_args['peas'][0] exist either way and carries the original property return parsed_args - - 
def dump(self, pod_name, dump_path, shards, timeout): - """Emit a Dump request to its Peas - - :param pod_name: the pod to target - :param shards: the nr of shards in the dump - :param dump_path: the path to which to dump - :param timeout: time to wait (seconds) - """ - for pea in self.peas: - if pea.inner: - send_ctrl_message( - pea.runtime.ctrl_addr, - ControlMessage( - command='DUMP', - pod_name=pod_name, - args={ - 'dump_path': dump_path, - 'shards': shards, - }, - ), - timeout=timeout, - ) diff --git a/jina/peapods/pods/compoundpod.py b/jina/peapods/pods/compound.py similarity index 98% rename from jina/peapods/pods/compoundpod.py rename to jina/peapods/pods/compound.py index 9587ce7f5ac1c..5aad3c940aba1 100644 --- a/jina/peapods/pods/compoundpod.py +++ b/jina/peapods/pods/compound.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import copy from argparse import Namespace from itertools import cycle diff --git a/jina/peapods/pods/factory.py b/jina/peapods/pods/factory.py index 30013be39180b..4f472fcc07880 100644 --- a/jina/peapods/pods/factory.py +++ b/jina/peapods/pods/factory.py @@ -1,9 +1,9 @@ -from typing import Union, Optional, Dict, Set from argparse import Namespace +from typing import Union, Optional, Dict, Set -from .. import Pod +from .compound import CompoundPod from .. import BasePod -from .compoundpod import CompoundPod +from .. 
import Pod class PodFactory: diff --git a/jina/peapods/runtimes/asyncio/grpc/async_call.py b/jina/peapods/runtimes/asyncio/grpc/async_call.py index 5b3bb844973a1..38b974c2d2094 100644 --- a/jina/peapods/runtimes/asyncio/grpc/async_call.py +++ b/jina/peapods/runtimes/asyncio/grpc/async_call.py @@ -1,5 +1,4 @@ import asyncio -from typing import Iterable, Any from .....helper import random_identity, typename from .....logging import JinaLogger diff --git a/jina/peapods/runtimes/asyncio/rest/app.py b/jina/peapods/runtimes/asyncio/rest/app.py index 14ad8dfe6026b..b63e912787271 100644 --- a/jina/peapods/runtimes/asyncio/rest/app.py +++ b/jina/peapods/runtimes/asyncio/rest/app.py @@ -1,15 +1,13 @@ import argparse import asyncio -import warnings from typing import Any -from google.protobuf.json_format import MessageToDict, MessageToJson +from google.protobuf.json_format import MessageToJson from ..grpc.async_call import AsyncPrefetchCall from ....zmq import AsyncZmqlet from ..... import __version__ from .....clients.request import request_generator -from .....enums import RequestType from .....helper import get_full_version, random_identity from .....importer import ImportExtensions from .....logging import JinaLogger, default_logger @@ -27,7 +25,7 @@ def get_fastapi_app(args: 'argparse.Namespace', logger: 'JinaLogger'): :return: fastapi app """ with ImportExtensions(required=True): - from fastapi import FastAPI, WebSocket, Body + from fastapi import FastAPI, WebSocket from fastapi.responses import JSONResponse from fastapi.middleware.cors import CORSMiddleware from starlette.endpoints import WebSocketEndpoint @@ -40,7 +38,7 @@ def get_fastapi_app(args: 'argparse.Namespace', logger: 'JinaLogger'): JinaDeleteRequestModel, JinaUpdateRequestModel, JinaSearchRequestModel, - JinaReloadRequestModel, + JinaRequestModel, ) app = FastAPI( @@ -90,7 +88,7 @@ async def startup(): path='/status', summary='Get the status of Jina', response_model=JinaStatusModel, - tags=['jina'], + 
tags=['Management'], ) async def _status(): _info = get_full_version() @@ -100,41 +98,21 @@ async def _status(): 'used_memory': used_memory_readable(), } - @app.post(path='/api/{mode}', deprecated=True) - async def api(mode: str, body: Any = Body(...)): + @app.post( + path='/post', summary='Post a general data request to Jina', tags=['General'] + ) + async def post(body: JinaRequestModel): """ Request mode service and return results in JSON, a deprecated interface. - :param mode: INDEX, SEARCH, DELETE, UPDATE, CONTROL, TRAIN. :param body: Request body. :return: Results in JSONresponse. """ - warnings.warn('this interface will be retired soon', DeprecationWarning) - if mode.upper() not in RequestType.__members__: - return error(reason=f'unsupported mode {mode}', status_code=405) - - if 'data' not in body: - return error('"data" field is empty', 406) - - body['mode'] = RequestType.from_string(mode) - from .....clients import BaseClient - - BaseClient.add_default_kwargs(body) - req_iter = request_generator(**body) - results = await get_result_in_json(req_iter=req_iter) - return JSONResponse(content=results[0], status_code=200) - - async def get_result_in_json(req_iter): - """ - Convert message to JSON data. - :param req_iter: Request iterator - :return: Results in JSON format - """ - return [ - MessageToDict(k) - async for k in servicer.Call(request_iterator=req_iter, context=None) - ] + bd = body.dict() + return StreamingResponse( + result_in_stream(request_generator(**bd)), media_type='application/json' + ) @app.post(path='/index', summary='Index documents into Jina', tags=['CRUD']) async def index_api(body: JinaIndexRequestModel): @@ -144,10 +122,8 @@ async def index_api(body: JinaIndexRequestModel): :param body: index request. :return: Response of the results. 
""" - from .....clients import BaseClient bd = body.dict() - bd['mode'] = RequestType.INDEX return StreamingResponse( result_in_stream(request_generator(**bd)), media_type='application/json' ) @@ -160,10 +136,8 @@ async def search_api(body: JinaSearchRequestModel): :param body: search request. :return: Response of the results. """ - from .....clients import BaseClient bd = body.dict() - bd['mode'] = RequestType.SEARCH return StreamingResponse( result_in_stream(request_generator(**bd)), media_type='application/json' ) @@ -176,10 +150,8 @@ async def update_api(body: JinaUpdateRequestModel): :param body: update request. :return: Response of the results. """ - from .....clients import BaseClient bd = body.dict() - bd['mode'] = RequestType.UPDATE return StreamingResponse( result_in_stream(request_generator(**bd)), media_type='application/json' ) @@ -192,34 +164,12 @@ async def delete_api(body: JinaDeleteRequestModel): :param body: delete request. :return: Response of the results. """ - from .....clients import BaseClient bd = body.dict() - bd['mode'] = RequestType.DELETE return StreamingResponse( result_in_stream(request_generator(**bd)), media_type='application/json' ) - @app.post( - path='/reload', summary='Reload the executor of certain Peas/Pods in the Flow' - ) - async def reload_api(body: JinaReloadRequestModel): - """ - Reload the executor of certain peas/pods in the Flow - - :param body: reload request. - :return: Response of the results. 
- """ - from .....clients import BaseClient - - bd = body.dict() - bd['mode'] = RequestType.CONTROL - bd['command'] = 'RELOAD' - return StreamingResponse( - result_in_stream(request_generator(data=[], **bd)), - media_type='application/json', - ) - async def result_in_stream(req_iter): """ Streams results from AsyncPrefetchCall as json diff --git a/jina/peapods/runtimes/asyncio/rest/models.py b/jina/peapods/runtimes/asyncio/rest/models.py index 42633764ab2b3..50593457f8afe 100644 --- a/jina/peapods/runtimes/asyncio/rest/models.py +++ b/jina/peapods/runtimes/asyncio/rest/models.py @@ -6,6 +6,8 @@ from google.protobuf.descriptor import Descriptor, FieldDescriptor from google.protobuf.pyext.cpp_message import GeneratedProtocolMessageType +from pydantic import Field, BaseModel, BaseConfig, create_model, root_validator + from jina.enums import DataInputType from jina.parsers import set_client_cli_parser from jina.proto.jina_pb2 import ( @@ -19,10 +21,8 @@ StatusProto, MessageProto, RequestProto, - QueryLangProto, ) from jina.types.document import Document -from pydantic import Field, BaseModel, BaseConfig, create_model, root_validator DEFAULT_REQUEST_SIZE = set_client_cli_parser().parse_args([]).request_size PROTO_TO_PYDANTIC_MODELS = SimpleNamespace() @@ -228,7 +228,6 @@ def protobuf_to_pydantic_model( StatusProto, MessageProto, RequestProto, - QueryLangProto, ): protobuf_to_pydantic_model(proto) @@ -257,83 +256,45 @@ class JinaRequestModel(BaseModel): """ # To avoid an error while loading the request model schema on swagger, we've added an example. 
- data: Union[ - List[PROTO_TO_PYDANTIC_MODELS.DocumentProto], - List[Dict[str, Any]], - List[str], - List[bytes], - ] = Field(..., example=[Document().dict()]) + exec_endpoint: Optional[str] = None + data: Optional[ + Union[ + List[PROTO_TO_PYDANTIC_MODELS.DocumentProto], + List[Dict[str, Any]], + List[str], + List[bytes], + ] + ] = Field(None, example=[Document().dict()]) request_size: Optional[int] = DEFAULT_REQUEST_SIZE mime_type: Optional[str] = '' - queryset: Optional[List[PROTO_TO_PYDANTIC_MODELS.QueryLangProto]] = None data_type: DataInputType = DataInputType.AUTO - - @root_validator(pre=True, allow_reuse=True) - def add_default_kwargs(cls, kwargs: dict): - """ - Replicates jina.clients.base.BaseClient.add_default_kwargs for Pydantic - - :param kwargs: arguments passed to the Pydantic model - :type kwargs: dict - :return: kwargs - """ - if ('top_k' in kwargs) and (kwargs['top_k'] is not None): - # associate all VectorSearchDriver and SliceQL driver to use top_k - topk_ql = [ - PROTO_TO_PYDANTIC_MODELS.QueryLangProto( - **{ - 'name': 'SliceQL', - 'priority': 1, - 'parameters': {'end': kwargs['top_k']}, - } - ), - PROTO_TO_PYDANTIC_MODELS.QueryLangProto( - **{ - 'name': 'VectorSearchDriver', - 'priority': 1, - 'parameters': {'top_k': kwargs['top_k']}, - } - ), - ] - if 'queryset' not in kwargs: - kwargs['queryset'] = topk_ql - else: - kwargs['queryset'].extend(topk_ql) - - return kwargs + target_peapod: Optional[str] = '' + parameters: Optional[Dict] = None class JinaIndexRequestModel(JinaRequestModel): """Index request model.""" - pass + exec_endpoint = '/index' class JinaSearchRequestModel(JinaRequestModel): """Search request model.""" - pass + exec_endpoint = '/search' class JinaUpdateRequestModel(JinaRequestModel): """Update request model.""" - pass + exec_endpoint = '/update' class JinaDeleteRequestModel(JinaRequestModel): """Delete request model.""" - data: List[str] + exec_endpoint = '/delete' class JinaControlRequestModel(JinaRequestModel): """Control 
request model.""" - - pass - - -class JinaTrainRequestModel(JinaRequestModel): - """Train request model.""" - - pass diff --git a/jina/peapods/runtimes/zmq/zed.py b/jina/peapods/runtimes/zmq/zed.py index 61ca348295fbf..4742d8aabc298 100644 --- a/jina/peapods/runtimes/zmq/zed.py +++ b/jina/peapods/runtimes/zmq/zed.py @@ -1,13 +1,14 @@ +import re import time from collections import defaultdict from typing import Dict, List import zmq +from google.protobuf.json_format import MessageToDict from .base import ZMQRuntime from ...zmq import ZmqStreamlet -from .... import Message -from .... import Request +from .... import __default_endpoint__ from ....enums import OnErrorStrategy from ....excepts import ( NoExplicitMessage, @@ -16,15 +17,19 @@ ChainedPodException, BadConfigSource, RuntimeTerminated, + UnknownControlCommand, ) from ....executors import BaseExecutor -from ....helper import random_identity -from ....logging.profile import used_memory, TimeDict +from ....helper import random_identity, typename +from ....logging.profile import used_memory from ....proto import jina_pb2 +from ....types.arrays.document import DocumentArray +from ....types.message import Message +from ....types.request import Request class ZEDRuntime(ZMQRuntime): - """Runtime procedure leveraging :class:`ZmqStreamlet` for Executor, Driver.""" + """Runtime procedure leveraging :class:`ZmqStreamlet` for Executor.""" def run_forever(self): """Start the `ZmqStreamlet`.""" @@ -34,10 +39,6 @@ def setup(self): """Initialize private parameters and execute private loading functions.""" self._id = random_identity() self._last_active_time = time.perf_counter() - self._last_dump_time = time.perf_counter() - self._last_load_time = time.perf_counter() - - self._timer = TimeDict() self._request = None self._message = None @@ -47,6 +48,9 @@ def setup(self): self._partial_requests = None self._partial_messages = None + # idle_dealer_ids only becomes non-None when it receives IDLE ControlRequest + 
self._idle_dealer_ids = set() + self._load_zmqlet() self._load_plugins() self._load_executor() @@ -72,13 +76,8 @@ def _load_executor(self): try: self._executor = BaseExecutor.load_config( self.args.uses, - pea_id=self.args.pea_id, - replica_id=getattr(self.args, 'replica_id', -1), - read_only=self.args.read_only, - # only used by QueryIndexers - metas={'dump_path': getattr(self.args, 'dump_path', None)}, + runtime_args=vars(self.args), ) - self._executor.attach(runtime=self) except BadConfigSource as ex: self.logger.error( f'fail to load config from {self.args.uses}, if you are using docker image for --uses, ' @@ -101,17 +100,6 @@ def _load_plugins(self): #: Private methods required by :meth:`teardown` - def _save_executor(self): - """Save the contained executor according to the `dump_interval` parameter.""" - if (time.perf_counter() - self._last_dump_time) > self.args.dump_interval > 0: - self._executor.save() - self._last_dump_time = time.perf_counter() - - def _reload_executor(self): - if (time.perf_counter() - self._last_load_time) > self.args.load_interval > 0: - self._load_executor() - self._last_load_time = time.perf_counter() - def _check_memory_watermark(self): """Check the memory watermark.""" if used_memory() > self.args.memory_hwm > 0: @@ -126,7 +114,6 @@ def _pre_hook(self, msg: 'Message') -> 'ZEDRuntime': :return: `ZEDRuntime` """ msg.add_route(self.name, self._id) - self._reload_executor() self._request = msg.request self._message = msg @@ -138,9 +125,28 @@ def _pre_hook(self, msg: 'Message') -> 'ZEDRuntime': self._partial_requests = [v.request for v in self._partial_messages] part_str = f'({len(self.partial_requests)}/{self.expect_parts} parts)' - self.logger.info( - f'recv {msg.envelope.request_type} {part_str} from {msg.colored_route}' - ) + info_msg = f'recv {msg.envelope.request_type} ' + if self.request_type == 'DataRequest': + info_msg += f'({self.envelope.header.exec_endpoint}) ' + elif self.request_type == 'ControlRequest': + info_msg += 
f'({self.request.command}) ' + info_msg += f'{part_str} from {msg.colored_route}' + self.logger.info(info_msg) + + if self.expect_parts > 1 and self.expect_parts > len(self.partial_requests): + # NOTE: reduce priority is higher than chain exception + # otherwise a reducer will lose its function when earlier pods raise exception + raise NoExplicitMessage + + if self.request_type == 'ControlRequest': + self._handle_control_req() + + if ( + msg.envelope.status.code == jina_pb2.StatusProto.ERROR + and self.args.on_error_strategy >= OnErrorStrategy.SKIP_HANDLE + ): + raise ChainedPodException + return self def _post_hook(self, msg: 'Message') -> 'ZEDRuntime': @@ -150,8 +156,10 @@ def _post_hook(self, msg: 'Message') -> 'ZEDRuntime': :param msg: received message :return: `ZEDRuntime` """ + # do NOT access `msg.request.*` in the _pre_hook, as it will trigger the deserialization + # all meta information should be stored and accessed via `msg.envelope` + self._last_active_time = time.perf_counter() - self._save_executor() self._zmqlet.print_stats() self._check_memory_watermark() @@ -170,20 +178,81 @@ def _handle(self, msg: 'Message') -> 'ZEDRuntime': :param msg: the message received :return: ZEDRuntime procedure. 
""" - if self.expect_parts > 1 and self.expect_parts > len(self.partial_requests): - # NOTE: reduce priority is higher than chain exception - # otherwise a reducer will lose its function when eailier pods raise exception - raise NoExplicitMessage if ( - msg.envelope.status.code != jina_pb2.StatusProto.ERROR - or self.args.on_error_strategy < OnErrorStrategy.SKIP_HANDLE + not re.match(self.envelope.header.target_peapod, self.name) + or self.request_type != 'DataRequest' ): - self._executor(self.request_type) - else: - raise ChainedPodException + return self + + # migrated from the previously RouteDriver logic + if self._idle_dealer_ids: + dealer_id = self._idle_dealer_ids.pop() + self.envelope.receiver_id = dealer_id + + # when no available dealer, pause the pollin from upstream + if not self._idle_dealer_ids: + self._zmqlet.pause_pollin() + + if ( + self.envelope.header.exec_endpoint not in self._executor.requests + and __default_endpoint__ not in self._executor.requests + ): + return self + + # executor logic + r_docs = self._executor( + req_endpoint=self.envelope.header.exec_endpoint, + docs=self.docs, + parameters=MessageToDict(self.request.parameters), + docs_matrix=self.docs_matrix, + groundtruths=self.groundtruths, + groundtruths_matrix=self.groundtruths_matrix, + ) + + # assigning result back to request + # 1. Return none: do nothing + # 2. Return nonempty and non-DocumentArray: raise error + # 3. Return DocArray, but the memory pointer says it is the same as self.docs: do nothing + # 4. 
Return DocArray and its not a shallow copy of self.docs: assign self.request.docs + if r_docs is not None: + if not isinstance(r_docs, DocumentArray): + raise TypeError( + f'return type must be {DocumentArray!r} or None, but getting {typename(r_docs)}' + ) + elif r_docs != self.request.docs: + # this means the returned DocArray is a completely new one + self.request.docs.clear() + self.request.docs.extend(r_docs) + return self + def _handle_control_req(self): + # migrated from previous ControlDriver logic + if self.request.command == 'TERMINATE': + self.envelope.status.code = jina_pb2.StatusProto.SUCCESS + raise RuntimeTerminated + elif self.request.command == 'STATUS': + self.envelope.status.code = jina_pb2.StatusProto.READY + self.request.parameters = vars(self.args) + elif self.request.command == 'IDLE': + self._idle_dealer_ids.add(self.envelope.receiver_id) + self._zmqlet.resume_pollin() + self.logger.debug( + f'{self.envelope.receiver_id} is idle, now I know these idle peas {self._idle_dealer_ids}' + ) + elif self.request.command == 'CANCEL': + if self.envelope.receiver_id in self._idle_dealer_ids: + self._idle_dealer_ids.remove(self.envelope.receiver_id) + elif self.request.command == 'ACTIVATE': + self._zmqlet._send_idle_to_router() + elif self.request.command == 'DEACTIVATE': + self._zmqlet._send_cancel_to_router() + else: + raise UnknownControlCommand( + f'don\'t know how to handle {self.request.command}' + ) + def _callback(self, msg: 'Message'): self.is_post_hook_done = False #: if the post_hook is called self._pre_hook(msg)._handle(msg)._post_hook(msg) @@ -232,14 +301,10 @@ def _msg_callback(self, msg: 'Message') -> None: if self.args.on_error_strategy == OnErrorStrategy.THROW_EARLY: raise if isinstance(ex, ChainedPodException): + # the error is print from previous pod, no need to show it again + # hence just add exception and propagate further + # please do NOT add logger.error here! 
msg.add_exception() - self.logger.error( - f'{ex!r}' - + f'\n add "--quiet-error" to suppress the exception details' - if not self.args.quiet_error - else '', - exc_info=not self.args.quiet_error, - ) else: msg.add_exception(ex, executor=getattr(self, '_executor')) self.logger.error( @@ -316,3 +381,69 @@ def partial_messages(self) -> List['Message']: """ return self._partial_messages + + def _get_docs(self, field: str) -> 'DocumentArray': + if self.expect_parts > 1: + result = DocumentArray( + [d for r in reversed(self.partial_requests) for d in getattr(r, field)] + ) + else: + result = getattr(self.request, field) + + # to unify all length=0 DocumentArray (or any other results) will simply considered as None + # otherwise the executor has to handle DocArray(0) + if len(result): + return result + + def _get_docs_matrix(self, field) -> List['DocumentArray']: + """DocumentArray from (multiple) requests + + :param field: either `docs` or `groundtruths` + + .. # noqa: DAR201""" + if self.expect_parts > 1: + result = [getattr(r, field) for r in reversed(self.partial_requests)] + else: + result = [getattr(self.request, field)] + + # to unify all length=0 DocumentArray (or any other results) will simply considered as None + # otherwise, the executor has to handle [None, None, None] or [DocArray(0), DocArray(0), DocArray(0)] + len_r = sum(len(r) for r in result) + if len_r: + return result + + @property + def docs(self) -> 'DocumentArray': + """Return a DocumentArray by concatenate (multiple) ``requests.docs`` + + .. # noqa: DAR201""" + return self._get_docs('docs') + + @property + def groundtruths(self) -> 'DocumentArray': + """Return a DocumentArray by concatenate (multiple) ``requests.groundtruths`` + + .. # noqa: DAR201""" + return self._get_docs('groundtruths') + + @property + def docs_matrix(self) -> List['DocumentArray']: + """Return a list of DocumentArray from multiple requests + + .. 
# noqa: DAR201""" + return self._get_docs_matrix('docs') + + @property + def groundtruths_matrix(self) -> List['DocumentArray']: + """A flattened DocumentArray from (multiple) requests + + .. # noqa: DAR201""" + return self._get_docs_matrix('groundtruths') + + @property + def envelope(self) -> 'jina_pb2.EnvelopeProto': + """Get the current message envelope + + .. # noqa: DAR201 + """ + return self._message.envelope diff --git a/jina/peapods/zmq/__init__.py b/jina/peapods/zmq/__init__.py index 2ec0aea0f9f03..30cab8375e4b1 100644 --- a/jina/peapods/zmq/__init__.py +++ b/jina/peapods/zmq/__init__.py @@ -1,6 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - import argparse import asyncio import os @@ -13,13 +10,14 @@ from zmq.eventloop.zmqstream import ZMQStream from zmq.ssh import tunnel_connection -from ... import __default_host__, Request +from ... import __default_host__ from ...enums import SocketType from ...helper import colored, random_identity, get_readable_size, get_or_reuse_loop from ...importer import ImportExtensions from ...logging import default_logger, profile_logger, JinaLogger from ...types.message import Message from ...types.message.common import ControlMessage +from ...types.request import Request class Zmqlet: @@ -59,6 +57,7 @@ def __init__( self.msg_recv = 0 self.msg_sent = 0 self.is_closed = False + self.is_polling_paused = False self.opened_socks = [] # this must be here for `close()` self.ctx, self.in_sock, self.out_sock, self.ctrl_sock = self._init_sockets() self._register_pollin() @@ -411,10 +410,13 @@ def close(self, flush: bool = True, *args, **kwargs): def pause_pollin(self): """Remove :attr:`in_sock` from the poller """ self.in_sock.stop_on_recv() + self.is_polling_paused = True def resume_pollin(self): """Put :attr:`in_sock` back to the poller """ - self.in_sock.on_recv(self._in_sock_callback) + if self.is_polling_paused: + self.in_sock.on_recv(self._in_sock_callback) + 
self.is_polling_paused = False def start(self, callback: Callable[['Message'], 'Message']): """ diff --git a/jina/proto/jina.proto b/jina/proto/jina.proto index f5182d232c05a..cb163630f0ce2 100644 --- a/jina/proto/jina.proto +++ b/jina/proto/jina.proto @@ -83,10 +83,7 @@ message DocumentProto { // the width of the recursive match structure uint32 adjacency = 22; - // the name of the level - string level_name = 15; - - // the parent id from the previous level + // the parent id from the previous granularity string parent_id = 16; oneof content { @@ -98,36 +95,36 @@ message DocumentProto { // a text document string text = 13; + + // a uri of the document could be: a local file path, a remote url starts with http or https or data URI scheme + string uri = 9; } - // list of the chunks of this document (recursive structure) + // list of the sub-documents of this document (recursive structure) repeated DocumentProto chunks = 4; - // the weight of this document + // The weight of this document float weight = 5; // total number of siblings of this document (docs that are in the same granularity and have the same parent_id) uint32 siblings = 25; - // the top-k matched Docs on the same level (recursive structure) + // the matched documents on the same level (recursive structure) repeated DocumentProto matches = 8; // mime type of this document, for buffer content, this is required; for other contents, this can be guessed string mime_type = 10; - // a uri of the document could be: a local file path, a remote url starts with http or https or data URI scheme - string uri = 9; - // a structured data value, consisting of field which map to dynamically typed values. 
google.protobuf.Struct tags = 11; // the position of the doc, could be start and end index of a string; could be x,y (top, left) coordinate of an image crop; could be timestamp of an audio clip repeated uint32 location = 17; - // the offset of this doc in the previous level document + // the offset of this doc in the previous granularity document uint32 offset = 18; - // the embedding array of this doc + // the embedding `ndarray` of this document NdArrayProto embedding = 19; // TODO: List of matching scores performed on the document, each element corresponds to a metric @@ -168,8 +165,6 @@ message EnvelopeProto { // timeout in second until this message is dropped uint32 timeout = 4; - repeated RouteProto routes = 5; // a list of routes this message goes through - /** * Represents a the version information */ @@ -181,11 +176,9 @@ message EnvelopeProto { VersionProto version = 6; // version info - StatusProto status = 7; // status info, when present, it is the first exception that routes carry - - string request_type = 9; // type of the request: TrainRequest, IndexRequest, SearchRequest, ControlRequest + string request_type = 7; // type of the request: DataRequest, ControlRequest - bool check_version = 10; // check local Protobuf version on every Pod that this message flows to + bool check_version = 8; // check local Protobuf version on every Pod that this message flows to /** * Represents a config for the compression algorithm @@ -200,12 +193,37 @@ message EnvelopeProto { google.protobuf.Struct parameters = 4; // other parameters that can be accepted by the algorithm } - CompressConfigProto compression = 11; // compress configuration used for request + + CompressConfigProto compression = 9; // compress configuration used for request + + repeated RouteProto routes = 10; // status info on every routes + + StatusProto status = 11; // status info + + HeaderProto header = 12; // header contains meta info defined by the user, copied from Request, for lazy serialization } 
+/** + * Represents a Header. + * - The header's content will be defined by the user request. + * - It will be copied to the envelope.header + * - In-flow operations will modify the envelope.header + * - While returning, copy envelope.header back to request.header + */ +message HeaderProto { + string exec_endpoint = 1; // the endpoint specified by `@requests(on='/abc')` + + string target_peapod = 2; // if set, the request is targeted to certain peas/pods, regex strings + + bool no_propagate = 3; // if set, then this request is not propagated over the Flow topology + +} +/** + * Represents a Status + */ message StatusProto { enum StatusCode { @@ -237,9 +255,6 @@ message StatusProto { // the name of the executor bind to that peapod (if applicable) string executor = 4; - // the name of the driver bind to that peapod (if applicable) - string driver = 5; - } @@ -255,6 +270,10 @@ message MessageProto { RequestProto request = 2; // the request body } +message DocumentArrayProto { + repeated DocumentProto docs = 1; // a list of Documents +} + /** * Represents a Request */ @@ -263,62 +282,16 @@ message RequestProto { string request_id = 1; // the unique ID of this request.
Multiple requests with the same ID will be gathered oneof body { - TrainRequestProto train = 2; // a train request - IndexRequestProto index = 3; // an index request - SearchRequestProto search = 4; // a search request - ControlRequestProto control = 5; // a control request - DeleteRequestProto delete = 9; // a delete request - UpdateRequestProto update = 10; // an update request - DumpRequestProto dump = 11; // an dump request - } - - /** - * Represents a train request - */ - message TrainRequestProto { - repeated DocumentProto docs = 1; // a list of Documents to train - - repeated DocumentProto groundtruths = 2; // the groundtruth document you want to evaluate it with - } - - /** - * Represents an index request - */ - message IndexRequestProto { - repeated DocumentProto docs = 1; // a list of Documents to index - - repeated DocumentProto groundtruths = 2; // the groundtruth document you want to evaluate it with - } - - /** - * Represents an update request - */ - message UpdateRequestProto { - repeated DocumentProto docs = 1; // a list of Documents to update - } - - /** - * Represents a delete request - */ - message DeleteRequestProto { - repeated string ids = 1; // a list of Document Ids to delete + ControlRequestProto control = 2; // a control request + DataRequestProto data = 3; // a data request } /** - * Represents a search request + * Represents a general data request */ - message SearchRequestProto { + message DataRequestProto { repeated DocumentProto docs = 1; // a list of Documents to query - - repeated DocumentProto groundtruths = 2; // the groundtruth document you want to evaluate it with - } - - /** - * Represents a dump request - */ - message DumpRequestProto { - string path = 1; // path for the index - int32 shards = 2; // nr of shards in the dump + repeated DocumentProto groundtruths = 2; // a list of groundtruth Document you want to evaluate it with } /** @@ -328,35 +301,24 @@ message RequestProto { enum Command { TERMINATE = 0; // shutdown the 
BasePod STATUS = 1; // check the status of the BasePod - IDLE = 3; // used in ROUTER-DEALER pattern, tells the router that the dealer is idle - CANCEL = 4; // used in ROUTER-DEALER pattern, tells the router that the dealer is busy (or closed) - RELOAD = 5; // triggers a Pod/Pea to (re)load the executor - SCALE = 6; // scale up/down a Pod - ACTIVATE = 7; // used in ROUTER-DEALER pattern, Indicate a Pea that it can activate itself and send the IDLE command to their router - DEACTIVATE = 8; // used in ROUTER-DEALER pattern, Indicate a Pea that it can deactivate itself and send the CANCEL command to their router - DUMP = 9; // Dump the data to a path, for Rolling Update + IDLE = 2; // used in ROUTER-DEALER pattern, tells the router that the dealer is idle + CANCEL = 3; // used in ROUTER-DEALER pattern, tells the router that the dealer is busy (or closed) + SCALE = 4; // scale up/down a Pod + ACTIVATE = 5; // used in ROUTER-DEALER pattern, Indicate a Pea that it can activate itself and send the IDLE command to their router + DEACTIVATE = 6; // used in ROUTER-DEALER pattern, Indicate a Pea that it can deactivate itself and send the CANCEL command to their router } Command command = 1; // the control command + } - google.protobuf.Struct args = 2; // the pea arguments, useful in STATUS + HeaderProto header = 4; // header contains meta info defined by the user - bool propagate = 3; // if set, then this request is propagate over the Flow topology just like a data request + google.protobuf.Struct parameters = 5; // extra kwargs that will be used in executor - repeated string targets = 4; // if set, the control request is targeted to certain peas/pods, regex strings - } + repeated RouteProto routes = 6; // status info on every routes - StatusProto status = 6; // status info + StatusProto status = 7; // status info - repeated RouteProto routes = 8; // status info on every routes - - repeated QueryLangProto queryset = 7; // the query set -} -message QueryLangProto { - string 
name = 1; // the name of the query lang, map to the corresponding driver - google.protobuf.Struct parameters = 2; // the parameter of the query lang - bool disabled = 3; // when set, then is query lang is disabled - int32 priority = 4; // when query lang is set on both driver and request levels, higher priority query lang is applied, lower priority is dropped } diff --git a/jina/proto/jina_pb2.py b/jina/proto/jina_pb2.py index e1369dbf854cf..31e02ff43f3f7 100644 --- a/jina/proto/jina_pb2.py +++ b/jina/proto/jina_pb2.py @@ -21,7 +21,7 @@ syntax='proto3', serialized_options=None, create_key=_descriptor._internal_create_key, - serialized_pb=b'\n\njina.proto\x12\x04jina\x1a\x1fgoogle/protobuf/timestamp.proto\x1a\x1cgoogle/protobuf/struct.proto\"\xfd\x01\n\x11\x44\x65nseNdArrayProto\x12\x0e\n\x06\x62uffer\x18\x01 \x01(\x0c\x12\r\n\x05shape\x18\x02 \x03(\r\x12\r\n\x05\x64type\x18\x03 \x01(\t\x12>\n\x0cquantization\x18\x04 \x01(\x0e\x32(.jina.DenseNdArrayProto.QuantizationMode\x12\x0f\n\x07max_val\x18\x05 \x01(\x02\x12\x0f\n\x07min_val\x18\x06 \x01(\x02\x12\r\n\x05scale\x18\x07 \x01(\x02\x12\x16\n\x0eoriginal_dtype\x18\x08 \x01(\t\"1\n\x10QuantizationMode\x12\x08\n\x04NONE\x10\x00\x12\x08\n\x04\x46P16\x10\x01\x12\t\n\x05UINT8\x10\x02\"o\n\x0cNdArrayProto\x12(\n\x05\x64\x65nse\x18\x01 \x01(\x0b\x32\x17.jina.DenseNdArrayProtoH\x00\x12*\n\x06sparse\x18\x02 \x01(\x0b\x32\x18.jina.SparseNdArrayProtoH\x00\x42\t\n\x07\x63ontent\"|\n\x12SparseNdArrayProto\x12(\n\x07indices\x18\x01 \x01(\x0b\x32\x17.jina.DenseNdArrayProto\x12\'\n\x06values\x18\x02 \x01(\x0b\x32\x17.jina.DenseNdArrayProto\x12\x13\n\x0b\x64\x65nse_shape\x18\x03 \x03(\x03\"\x7f\n\x0fNamedScoreProto\x12\r\n\x05value\x18\x01 \x01(\x02\x12\x0f\n\x07op_name\x18\x02 \x01(\t\x12\x13\n\x0b\x64\x65scription\x18\x03 \x01(\t\x12\'\n\x08operands\x18\x04 \x03(\x0b\x32\x15.jina.NamedScoreProto\x12\x0e\n\x06ref_id\x18\x05 \x01(\t\"\xb2\x04\n\rDocumentProto\x12\n\n\x02id\x18\x01 \x01(\t\x12\x14\n\x0c\x63ontent_hash\x18\x18 
\x01(\t\x12\x13\n\x0bgranularity\x18\x0e \x01(\r\x12\x11\n\tadjacency\x18\x16 \x01(\r\x12\x12\n\nlevel_name\x18\x0f \x01(\t\x12\x11\n\tparent_id\x18\x10 \x01(\t\x12\x10\n\x06\x62uffer\x18\x03 \x01(\x0cH\x00\x12\"\n\x04\x62lob\x18\x0c \x01(\x0b\x32\x12.jina.NdArrayProtoH\x00\x12\x0e\n\x04text\x18\r \x01(\tH\x00\x12#\n\x06\x63hunks\x18\x04 \x03(\x0b\x32\x13.jina.DocumentProto\x12\x0e\n\x06weight\x18\x05 \x01(\x02\x12\x10\n\x08siblings\x18\x19 \x01(\r\x12$\n\x07matches\x18\x08 \x03(\x0b\x32\x13.jina.DocumentProto\x12\x11\n\tmime_type\x18\n \x01(\t\x12\x0b\n\x03uri\x18\t \x01(\t\x12%\n\x04tags\x18\x0b \x01(\x0b\x32\x17.google.protobuf.Struct\x12\x10\n\x08location\x18\x11 \x03(\r\x12\x0e\n\x06offset\x18\x12 \x01(\r\x12%\n\tembedding\x18\x13 \x01(\x0b\x32\x12.jina.NdArrayProto\x12$\n\x05score\x18\x14 \x01(\x0b\x32\x15.jina.NamedScoreProto\x12\x10\n\x08modality\x18\x15 \x01(\t\x12*\n\x0b\x65valuations\x18\x17 \x03(\x0b\x32\x15.jina.NamedScoreProtoB\t\n\x07\x63ontent\"\xaa\x01\n\nRouteProto\x12\x0b\n\x03pod\x18\x01 \x01(\t\x12\x0e\n\x06pod_id\x18\x02 \x01(\t\x12.\n\nstart_time\x18\x03 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12,\n\x08\x65nd_time\x18\x04 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12!\n\x06status\x18\x05 \x01(\x0b\x32\x11.jina.StatusProto\"\xf6\x03\n\rEnvelopeProto\x12\x11\n\tsender_id\x18\x01 \x01(\t\x12\x13\n\x0breceiver_id\x18\x02 \x01(\t\x12\x12\n\nrequest_id\x18\x03 \x01(\t\x12\x0f\n\x07timeout\x18\x04 \x01(\r\x12 \n\x06routes\x18\x05 \x03(\x0b\x32\x10.jina.RouteProto\x12\x31\n\x07version\x18\x06 \x01(\x0b\x32 .jina.EnvelopeProto.VersionProto\x12!\n\x06status\x18\x07 \x01(\x0b\x32\x11.jina.StatusProto\x12\x14\n\x0crequest_type\x18\t \x01(\t\x12\x15\n\rcheck_version\x18\n \x01(\x08\x12<\n\x0b\x63ompression\x18\x0b \x01(\x0b\x32\'.jina.EnvelopeProto.CompressConfigProto\x1a\x38\n\x0cVersionProto\x12\x0c\n\x04jina\x18\x01 \x01(\t\x12\r\n\x05proto\x18\x02 \x01(\t\x12\x0b\n\x03vcs\x18\x03 
\x01(\t\x1a{\n\x13\x43ompressConfigProto\x12\x11\n\talgorithm\x18\x01 \x01(\t\x12\x11\n\tmin_bytes\x18\x02 \x01(\x04\x12\x11\n\tmin_ratio\x18\x03 \x01(\x02\x12+\n\nparameters\x18\x04 \x01(\x0b\x32\x17.google.protobuf.Struct\"\xdf\x02\n\x0bStatusProto\x12*\n\x04\x63ode\x18\x01 \x01(\x0e\x32\x1c.jina.StatusProto.StatusCode\x12\x13\n\x0b\x64\x65scription\x18\x02 \x01(\t\x12\x33\n\texception\x18\x03 \x01(\x0b\x32 .jina.StatusProto.ExceptionProto\x1a^\n\x0e\x45xceptionProto\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\x0c\n\x04\x61rgs\x18\x02 \x03(\t\x12\x0e\n\x06stacks\x18\x03 \x03(\t\x12\x10\n\x08\x65xecutor\x18\x04 \x01(\t\x12\x0e\n\x06\x64river\x18\x05 \x01(\t\"z\n\nStatusCode\x12\x0b\n\x07SUCCESS\x10\x00\x12\x0b\n\x07PENDING\x10\x01\x12\t\n\x05READY\x10\x02\x12\t\n\x05\x45RROR\x10\x03\x12\x13\n\x0f\x45RROR_DUPLICATE\x10\x04\x12\x14\n\x10\x45RROR_NOTALLOWED\x10\x05\x12\x11\n\rERROR_CHAINED\x10\x06\"Z\n\x0cMessageProto\x12%\n\x08\x65nvelope\x18\x01 \x01(\x0b\x32\x13.jina.EnvelopeProto\x12#\n\x07request\x18\x02 \x01(\x0b\x32\x12.jina.RequestProto\"\xf7\t\n\x0cRequestProto\x12\x12\n\nrequest_id\x18\x01 \x01(\t\x12\x35\n\x05train\x18\x02 \x01(\x0b\x32$.jina.RequestProto.TrainRequestProtoH\x00\x12\x35\n\x05index\x18\x03 \x01(\x0b\x32$.jina.RequestProto.IndexRequestProtoH\x00\x12\x37\n\x06search\x18\x04 \x01(\x0b\x32%.jina.RequestProto.SearchRequestProtoH\x00\x12\x39\n\x07\x63ontrol\x18\x05 \x01(\x0b\x32&.jina.RequestProto.ControlRequestProtoH\x00\x12\x37\n\x06\x64\x65lete\x18\t \x01(\x0b\x32%.jina.RequestProto.DeleteRequestProtoH\x00\x12\x37\n\x06update\x18\n \x01(\x0b\x32%.jina.RequestProto.UpdateRequestProtoH\x00\x12\x33\n\x04\x64ump\x18\x0b \x01(\x0b\x32#.jina.RequestProto.DumpRequestProtoH\x00\x12!\n\x06status\x18\x06 \x01(\x0b\x32\x11.jina.StatusProto\x12 \n\x06routes\x18\x08 \x03(\x0b\x32\x10.jina.RouteProto\x12&\n\x08queryset\x18\x07 \x03(\x0b\x32\x14.jina.QueryLangProto\x1a\x61\n\x11TrainRequestProto\x12!\n\x04\x64ocs\x18\x01 
\x03(\x0b\x32\x13.jina.DocumentProto\x12)\n\x0cgroundtruths\x18\x02 \x03(\x0b\x32\x13.jina.DocumentProto\x1a\x61\n\x11IndexRequestProto\x12!\n\x04\x64ocs\x18\x01 \x03(\x0b\x32\x13.jina.DocumentProto\x12)\n\x0cgroundtruths\x18\x02 \x03(\x0b\x32\x13.jina.DocumentProto\x1a\x37\n\x12UpdateRequestProto\x12!\n\x04\x64ocs\x18\x01 \x03(\x0b\x32\x13.jina.DocumentProto\x1a!\n\x12\x44\x65leteRequestProto\x12\x0b\n\x03ids\x18\x01 \x03(\t\x1a\x62\n\x12SearchRequestProto\x12!\n\x04\x64ocs\x18\x01 \x03(\x0b\x32\x13.jina.DocumentProto\x12)\n\x0cgroundtruths\x18\x02 \x03(\x0b\x32\x13.jina.DocumentProto\x1a\x30\n\x10\x44umpRequestProto\x12\x0c\n\x04path\x18\x01 \x01(\t\x12\x0e\n\x06shards\x18\x02 \x01(\x05\x1a\x9c\x02\n\x13\x43ontrolRequestProto\x12?\n\x07\x63ommand\x18\x01 \x01(\x0e\x32..jina.RequestProto.ControlRequestProto.Command\x12%\n\x04\x61rgs\x18\x02 \x01(\x0b\x32\x17.google.protobuf.Struct\x12\x11\n\tpropagate\x18\x03 \x01(\x08\x12\x0f\n\x07targets\x18\x04 \x03(\t\"y\n\x07\x43ommand\x12\r\n\tTERMINATE\x10\x00\x12\n\n\x06STATUS\x10\x01\x12\x08\n\x04IDLE\x10\x03\x12\n\n\x06\x43\x41NCEL\x10\x04\x12\n\n\x06RELOAD\x10\x05\x12\t\n\x05SCALE\x10\x06\x12\x0c\n\x08\x41\x43TIVATE\x10\x07\x12\x0e\n\nDEACTIVATE\x10\x08\x12\x08\n\x04\x44UMP\x10\tB\x06\n\x04\x62ody\"o\n\x0eQueryLangProto\x12\x0c\n\x04name\x18\x01 \x01(\t\x12+\n\nparameters\x18\x02 \x01(\x0b\x32\x17.google.protobuf.Struct\x12\x10\n\x08\x64isabled\x18\x03 \x01(\x08\x12\x10\n\x08priority\x18\x04 \x01(\x05\x32?\n\x07JinaRPC\x12\x34\n\x04\x43\x61ll\x12\x12.jina.RequestProto\x1a\x12.jina.RequestProto\"\x00(\x01\x30\x01\x62\x06proto3' + serialized_pb=b'\n\njina.proto\x12\x04jina\x1a\x1fgoogle/protobuf/timestamp.proto\x1a\x1cgoogle/protobuf/struct.proto\"\xfd\x01\n\x11\x44\x65nseNdArrayProto\x12\x0e\n\x06\x62uffer\x18\x01 \x01(\x0c\x12\r\n\x05shape\x18\x02 \x03(\r\x12\r\n\x05\x64type\x18\x03 \x01(\t\x12>\n\x0cquantization\x18\x04 \x01(\x0e\x32(.jina.DenseNdArrayProto.QuantizationMode\x12\x0f\n\x07max_val\x18\x05 
\x01(\x02\x12\x0f\n\x07min_val\x18\x06 \x01(\x02\x12\r\n\x05scale\x18\x07 \x01(\x02\x12\x16\n\x0eoriginal_dtype\x18\x08 \x01(\t\"1\n\x10QuantizationMode\x12\x08\n\x04NONE\x10\x00\x12\x08\n\x04\x46P16\x10\x01\x12\t\n\x05UINT8\x10\x02\"o\n\x0cNdArrayProto\x12(\n\x05\x64\x65nse\x18\x01 \x01(\x0b\x32\x17.jina.DenseNdArrayProtoH\x00\x12*\n\x06sparse\x18\x02 \x01(\x0b\x32\x18.jina.SparseNdArrayProtoH\x00\x42\t\n\x07\x63ontent\"|\n\x12SparseNdArrayProto\x12(\n\x07indices\x18\x01 \x01(\x0b\x32\x17.jina.DenseNdArrayProto\x12\'\n\x06values\x18\x02 \x01(\x0b\x32\x17.jina.DenseNdArrayProto\x12\x13\n\x0b\x64\x65nse_shape\x18\x03 \x03(\x03\"\x7f\n\x0fNamedScoreProto\x12\r\n\x05value\x18\x01 \x01(\x02\x12\x0f\n\x07op_name\x18\x02 \x01(\t\x12\x13\n\x0b\x64\x65scription\x18\x03 \x01(\t\x12\'\n\x08operands\x18\x04 \x03(\x0b\x32\x15.jina.NamedScoreProto\x12\x0e\n\x06ref_id\x18\x05 \x01(\t\"\xa0\x04\n\rDocumentProto\x12\n\n\x02id\x18\x01 \x01(\t\x12\x14\n\x0c\x63ontent_hash\x18\x18 \x01(\t\x12\x13\n\x0bgranularity\x18\x0e \x01(\r\x12\x11\n\tadjacency\x18\x16 \x01(\r\x12\x11\n\tparent_id\x18\x10 \x01(\t\x12\x10\n\x06\x62uffer\x18\x03 \x01(\x0cH\x00\x12\"\n\x04\x62lob\x18\x0c \x01(\x0b\x32\x12.jina.NdArrayProtoH\x00\x12\x0e\n\x04text\x18\r \x01(\tH\x00\x12\r\n\x03uri\x18\t \x01(\tH\x00\x12#\n\x06\x63hunks\x18\x04 \x03(\x0b\x32\x13.jina.DocumentProto\x12\x0e\n\x06weight\x18\x05 \x01(\x02\x12\x10\n\x08siblings\x18\x19 \x01(\r\x12$\n\x07matches\x18\x08 \x03(\x0b\x32\x13.jina.DocumentProto\x12\x11\n\tmime_type\x18\n \x01(\t\x12%\n\x04tags\x18\x0b \x01(\x0b\x32\x17.google.protobuf.Struct\x12\x10\n\x08location\x18\x11 \x03(\r\x12\x0e\n\x06offset\x18\x12 \x01(\r\x12%\n\tembedding\x18\x13 \x01(\x0b\x32\x12.jina.NdArrayProto\x12$\n\x05score\x18\x14 \x01(\x0b\x32\x15.jina.NamedScoreProto\x12\x10\n\x08modality\x18\x15 \x01(\t\x12*\n\x0b\x65valuations\x18\x17 \x03(\x0b\x32\x15.jina.NamedScoreProtoB\t\n\x07\x63ontent\"\xaa\x01\n\nRouteProto\x12\x0b\n\x03pod\x18\x01 
\x01(\t\x12\x0e\n\x06pod_id\x18\x02 \x01(\t\x12.\n\nstart_time\x18\x03 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12,\n\x08\x65nd_time\x18\x04 \x01(\x0b\x32\x1a.google.protobuf.Timestamp\x12!\n\x06status\x18\x05 \x01(\x0b\x32\x11.jina.StatusProto\"\x99\x04\n\rEnvelopeProto\x12\x11\n\tsender_id\x18\x01 \x01(\t\x12\x13\n\x0breceiver_id\x18\x02 \x01(\t\x12\x12\n\nrequest_id\x18\x03 \x01(\t\x12\x0f\n\x07timeout\x18\x04 \x01(\r\x12\x31\n\x07version\x18\x06 \x01(\x0b\x32 .jina.EnvelopeProto.VersionProto\x12\x14\n\x0crequest_type\x18\x07 \x01(\t\x12\x15\n\rcheck_version\x18\x08 \x01(\x08\x12<\n\x0b\x63ompression\x18\t \x01(\x0b\x32\'.jina.EnvelopeProto.CompressConfigProto\x12 \n\x06routes\x18\n \x03(\x0b\x32\x10.jina.RouteProto\x12!\n\x06status\x18\x0b \x01(\x0b\x32\x11.jina.StatusProto\x12!\n\x06header\x18\x0c \x01(\x0b\x32\x11.jina.HeaderProto\x1a\x38\n\x0cVersionProto\x12\x0c\n\x04jina\x18\x01 \x01(\t\x12\r\n\x05proto\x18\x02 \x01(\t\x12\x0b\n\x03vcs\x18\x03 \x01(\t\x1a{\n\x13\x43ompressConfigProto\x12\x11\n\talgorithm\x18\x01 \x01(\t\x12\x11\n\tmin_bytes\x18\x02 \x01(\x04\x12\x11\n\tmin_ratio\x18\x03 \x01(\x02\x12+\n\nparameters\x18\x04 \x01(\x0b\x32\x17.google.protobuf.Struct\"Q\n\x0bHeaderProto\x12\x15\n\rexec_endpoint\x18\x01 \x01(\t\x12\x15\n\rtarget_peapod\x18\x02 \x01(\t\x12\x14\n\x0cno_propagate\x18\x03 \x01(\x08\"\xcf\x02\n\x0bStatusProto\x12*\n\x04\x63ode\x18\x01 \x01(\x0e\x32\x1c.jina.StatusProto.StatusCode\x12\x13\n\x0b\x64\x65scription\x18\x02 \x01(\t\x12\x33\n\texception\x18\x03 \x01(\x0b\x32 .jina.StatusProto.ExceptionProto\x1aN\n\x0e\x45xceptionProto\x12\x0c\n\x04name\x18\x01 \x01(\t\x12\x0c\n\x04\x61rgs\x18\x02 \x03(\t\x12\x0e\n\x06stacks\x18\x03 \x03(\t\x12\x10\n\x08\x65xecutor\x18\x04 
\x01(\t\"z\n\nStatusCode\x12\x0b\n\x07SUCCESS\x10\x00\x12\x0b\n\x07PENDING\x10\x01\x12\t\n\x05READY\x10\x02\x12\t\n\x05\x45RROR\x10\x03\x12\x13\n\x0f\x45RROR_DUPLICATE\x10\x04\x12\x14\n\x10\x45RROR_NOTALLOWED\x10\x05\x12\x11\n\rERROR_CHAINED\x10\x06\"Z\n\x0cMessageProto\x12%\n\x08\x65nvelope\x18\x01 \x01(\x0b\x32\x13.jina.EnvelopeProto\x12#\n\x07request\x18\x02 \x01(\x0b\x32\x12.jina.RequestProto\"7\n\x12\x44ocumentArrayProto\x12!\n\x04\x64ocs\x18\x01 \x03(\x0b\x32\x13.jina.DocumentProto\"\xcf\x04\n\x0cRequestProto\x12\x12\n\nrequest_id\x18\x01 \x01(\t\x12\x39\n\x07\x63ontrol\x18\x02 \x01(\x0b\x32&.jina.RequestProto.ControlRequestProtoH\x00\x12\x33\n\x04\x64\x61ta\x18\x03 \x01(\x0b\x32#.jina.RequestProto.DataRequestProtoH\x00\x12!\n\x06header\x18\x04 \x01(\x0b\x32\x11.jina.HeaderProto\x12+\n\nparameters\x18\x05 \x01(\x0b\x32\x17.google.protobuf.Struct\x12 \n\x06routes\x18\x06 \x03(\x0b\x32\x10.jina.RouteProto\x12!\n\x06status\x18\x07 \x01(\x0b\x32\x11.jina.StatusProto\x1a`\n\x10\x44\x61taRequestProto\x12!\n\x04\x64ocs\x18\x01 \x03(\x0b\x32\x13.jina.DocumentProto\x12)\n\x0cgroundtruths\x18\x02 \x03(\x0b\x32\x13.jina.DocumentProto\x1a\xbb\x01\n\x13\x43ontrolRequestProto\x12?\n\x07\x63ommand\x18\x01 \x01(\x0e\x32..jina.RequestProto.ControlRequestProto.Command\"c\n\x07\x43ommand\x12\r\n\tTERMINATE\x10\x00\x12\n\n\x06STATUS\x10\x01\x12\x08\n\x04IDLE\x10\x02\x12\n\n\x06\x43\x41NCEL\x10\x03\x12\t\n\x05SCALE\x10\x04\x12\x0c\n\x08\x41\x43TIVATE\x10\x05\x12\x0e\n\nDEACTIVATE\x10\x06\x42\x06\n\x04\x62ody2?\n\x07JinaRPC\x12\x34\n\x04\x43\x61ll\x12\x12.jina.RequestProto\x1a\x12.jina.RequestProto\"\x00(\x01\x30\x01\x62\x06proto3' , dependencies=[google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,]) @@ -102,8 +102,8 @@ ], containing_type=None, serialized_options=None, - serialized_start=2180, - serialized_end=2302, + serialized_start=2264, + serialized_end=2386, ) _sym_db.RegisterEnumDescriptor(_STATUSPROTO_STATUSCODE) @@ -125,45 
+125,35 @@ type=None, create_key=_descriptor._internal_create_key), _descriptor.EnumValueDescriptor( - name='IDLE', index=2, number=3, + name='IDLE', index=2, number=2, serialized_options=None, type=None, create_key=_descriptor._internal_create_key), _descriptor.EnumValueDescriptor( - name='CANCEL', index=3, number=4, + name='CANCEL', index=3, number=3, serialized_options=None, type=None, create_key=_descriptor._internal_create_key), _descriptor.EnumValueDescriptor( - name='RELOAD', index=4, number=5, + name='SCALE', index=4, number=4, serialized_options=None, type=None, create_key=_descriptor._internal_create_key), _descriptor.EnumValueDescriptor( - name='SCALE', index=5, number=6, + name='ACTIVATE', index=5, number=5, serialized_options=None, type=None, create_key=_descriptor._internal_create_key), _descriptor.EnumValueDescriptor( - name='ACTIVATE', index=6, number=7, - serialized_options=None, - type=None, - create_key=_descriptor._internal_create_key), - _descriptor.EnumValueDescriptor( - name='DEACTIVATE', index=7, number=8, - serialized_options=None, - type=None, - create_key=_descriptor._internal_create_key), - _descriptor.EnumValueDescriptor( - name='DUMP', index=8, number=9, + name='DEACTIVATE', index=6, number=6, serialized_options=None, type=None, create_key=_descriptor._internal_create_key), ], containing_type=None, serialized_options=None, - serialized_start=3539, - serialized_end=3660, + serialized_start=3022, + serialized_end=3121, ) _sym_db.RegisterEnumDescriptor(_REQUESTPROTO_CONTROLREQUESTPROTO_COMMAND) @@ -437,40 +427,40 @@ is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='level_name', full_name='jina.DocumentProto.level_name', index=4, - number=15, type=9, cpp_type=9, label=1, - has_default_value=False, default_value=b"".decode('utf-8'), - message_type=None, enum_type=None, containing_type=None, - is_extension=False, 
extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='parent_id', full_name='jina.DocumentProto.parent_id', index=5, + name='parent_id', full_name='jina.DocumentProto.parent_id', index=4, number=16, type=9, cpp_type=9, label=1, has_default_value=False, default_value=b"".decode('utf-8'), message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='buffer', full_name='jina.DocumentProto.buffer', index=6, + name='buffer', full_name='jina.DocumentProto.buffer', index=5, number=3, type=12, cpp_type=9, label=1, has_default_value=False, default_value=b"", message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='blob', full_name='jina.DocumentProto.blob', index=7, + name='blob', full_name='jina.DocumentProto.blob', index=6, number=12, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='text', full_name='jina.DocumentProto.text', index=8, + name='text', full_name='jina.DocumentProto.text', index=7, number=13, type=9, cpp_type=9, label=1, has_default_value=False, default_value=b"".decode('utf-8'), message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), + _descriptor.FieldDescriptor( + name='uri', full_name='jina.DocumentProto.uri', index=8, + number=9, type=9, cpp_type=9, label=1, + 
has_default_value=False, default_value=b"".decode('utf-8'), + message_type=None, enum_type=None, containing_type=None, + is_extension=False, extension_scope=None, + serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( name='chunks', full_name='jina.DocumentProto.chunks', index=9, number=4, type=11, cpp_type=10, label=3, @@ -507,56 +497,49 @@ is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='uri', full_name='jina.DocumentProto.uri', index=14, - number=9, type=9, cpp_type=9, label=1, - has_default_value=False, default_value=b"".decode('utf-8'), - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='tags', full_name='jina.DocumentProto.tags', index=15, + name='tags', full_name='jina.DocumentProto.tags', index=14, number=11, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='location', full_name='jina.DocumentProto.location', index=16, + name='location', full_name='jina.DocumentProto.location', index=15, number=17, type=13, cpp_type=3, label=3, has_default_value=False, default_value=[], message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='offset', full_name='jina.DocumentProto.offset', index=17, + name='offset', full_name='jina.DocumentProto.offset', index=16, number=18, type=13, cpp_type=3, label=1, 
has_default_value=False, default_value=0, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='embedding', full_name='jina.DocumentProto.embedding', index=18, + name='embedding', full_name='jina.DocumentProto.embedding', index=17, number=19, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='score', full_name='jina.DocumentProto.score', index=19, + name='score', full_name='jina.DocumentProto.score', index=18, number=20, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='modality', full_name='jina.DocumentProto.modality', index=20, + name='modality', full_name='jina.DocumentProto.modality', index=19, number=21, type=9, cpp_type=9, label=1, has_default_value=False, default_value=b"".decode('utf-8'), message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='evaluations', full_name='jina.DocumentProto.evaluations', index=21, + name='evaluations', full_name='jina.DocumentProto.evaluations', index=20, number=23, type=11, cpp_type=10, label=3, has_default_value=False, default_value=[], message_type=None, enum_type=None, containing_type=None, @@ -580,7 +563,7 @@ fields=[]), ], serialized_start=708, - serialized_end=1270, + serialized_end=1252, ) @@ 
-639,8 +622,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=1273, - serialized_end=1443, + serialized_start=1255, + serialized_end=1425, ) @@ -685,8 +668,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=1767, - serialized_end=1823, + serialized_start=1784, + serialized_end=1840, ) _ENVELOPEPROTO_COMPRESSCONFIGPROTO = _descriptor.Descriptor( @@ -737,8 +720,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=1825, - serialized_end=1948, + serialized_start=1842, + serialized_end=1965, ) _ENVELOPEPROTO = _descriptor.Descriptor( @@ -778,51 +761,104 @@ is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='routes', full_name='jina.EnvelopeProto.routes', index=4, - number=5, type=11, cpp_type=10, label=3, + name='version', full_name='jina.EnvelopeProto.version', index=4, + number=6, type=11, cpp_type=10, label=1, + has_default_value=False, default_value=None, + message_type=None, enum_type=None, containing_type=None, + is_extension=False, extension_scope=None, + serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), + _descriptor.FieldDescriptor( + name='request_type', full_name='jina.EnvelopeProto.request_type', index=5, + number=7, type=9, cpp_type=9, label=1, + has_default_value=False, default_value=b"".decode('utf-8'), + message_type=None, enum_type=None, containing_type=None, + is_extension=False, extension_scope=None, + serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), + _descriptor.FieldDescriptor( + name='check_version', full_name='jina.EnvelopeProto.check_version', index=6, + number=8, type=8, cpp_type=7, label=1, + has_default_value=False, default_value=False, + message_type=None, enum_type=None, containing_type=None, + is_extension=False, extension_scope=None, + serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), + 
_descriptor.FieldDescriptor( + name='compression', full_name='jina.EnvelopeProto.compression', index=7, + number=9, type=11, cpp_type=10, label=1, + has_default_value=False, default_value=None, + message_type=None, enum_type=None, containing_type=None, + is_extension=False, extension_scope=None, + serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), + _descriptor.FieldDescriptor( + name='routes', full_name='jina.EnvelopeProto.routes', index=8, + number=10, type=11, cpp_type=10, label=3, has_default_value=False, default_value=[], message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='version', full_name='jina.EnvelopeProto.version', index=5, - number=6, type=11, cpp_type=10, label=1, + name='status', full_name='jina.EnvelopeProto.status', index=9, + number=11, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='status', full_name='jina.EnvelopeProto.status', index=6, - number=7, type=11, cpp_type=10, label=1, + name='header', full_name='jina.EnvelopeProto.header', index=10, + number=12, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), + ], + extensions=[ + ], + nested_types=[_ENVELOPEPROTO_VERSIONPROTO, _ENVELOPEPROTO_COMPRESSCONFIGPROTO, ], + enum_types=[ + ], + serialized_options=None, + is_extendable=False, + syntax='proto3', + extension_ranges=[], + oneofs=[ + ], + serialized_start=1428, + 
serialized_end=1965, +) + + +_HEADERPROTO = _descriptor.Descriptor( + name='HeaderProto', + full_name='jina.HeaderProto', + filename=None, + file=DESCRIPTOR, + containing_type=None, + create_key=_descriptor._internal_create_key, + fields=[ _descriptor.FieldDescriptor( - name='request_type', full_name='jina.EnvelopeProto.request_type', index=7, - number=9, type=9, cpp_type=9, label=1, + name='exec_endpoint', full_name='jina.HeaderProto.exec_endpoint', index=0, + number=1, type=9, cpp_type=9, label=1, has_default_value=False, default_value=b"".decode('utf-8'), message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='check_version', full_name='jina.EnvelopeProto.check_version', index=8, - number=10, type=8, cpp_type=7, label=1, - has_default_value=False, default_value=False, + name='target_peapod', full_name='jina.HeaderProto.target_peapod', index=1, + number=2, type=9, cpp_type=9, label=1, + has_default_value=False, default_value=b"".decode('utf-8'), message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='compression', full_name='jina.EnvelopeProto.compression', index=9, - number=11, type=11, cpp_type=10, label=1, - has_default_value=False, default_value=None, + name='no_propagate', full_name='jina.HeaderProto.no_propagate', index=2, + number=3, type=8, cpp_type=7, label=1, + has_default_value=False, default_value=False, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), ], extensions=[ ], - nested_types=[_ENVELOPEPROTO_VERSIONPROTO, _ENVELOPEPROTO_COMPRESSCONFIGPROTO, ], + nested_types=[], enum_types=[ 
], serialized_options=None, @@ -831,8 +867,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=1446, - serialized_end=1948, + serialized_start=1967, + serialized_end=2048, ) @@ -872,13 +908,6 @@ message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='driver', full_name='jina.StatusProto.ExceptionProto.driver', index=4, - number=5, type=9, cpp_type=9, label=1, - has_default_value=False, default_value=b"".decode('utf-8'), - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), ], extensions=[ ], @@ -891,8 +920,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=2084, - serialized_end=2178, + serialized_start=2184, + serialized_end=2262, ) _STATUSPROTO = _descriptor.Descriptor( @@ -937,8 +966,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=1951, - serialized_end=2302, + serialized_start=2051, + serialized_end=2386, ) @@ -976,33 +1005,26 @@ extension_ranges=[], oneofs=[ ], - serialized_start=2304, - serialized_end=2394, + serialized_start=2388, + serialized_end=2478, ) -_REQUESTPROTO_TRAINREQUESTPROTO = _descriptor.Descriptor( - name='TrainRequestProto', - full_name='jina.RequestProto.TrainRequestProto', +_DOCUMENTARRAYPROTO = _descriptor.Descriptor( + name='DocumentArrayProto', + full_name='jina.DocumentArrayProto', filename=None, file=DESCRIPTOR, containing_type=None, create_key=_descriptor._internal_create_key, fields=[ _descriptor.FieldDescriptor( - name='docs', full_name='jina.RequestProto.TrainRequestProto.docs', index=0, + name='docs', full_name='jina.DocumentArrayProto.docs', index=0, number=1, type=11, cpp_type=10, label=3, has_default_value=False, default_value=[], message_type=None, enum_type=None, containing_type=None, 
is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='groundtruths', full_name='jina.RequestProto.TrainRequestProto.groundtruths', index=1, - number=2, type=11, cpp_type=10, label=3, - has_default_value=False, default_value=[], - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), ], extensions=[ ], @@ -1015,127 +1037,28 @@ extension_ranges=[], oneofs=[ ], - serialized_start=2935, - serialized_end=3032, + serialized_start=2480, + serialized_end=2535, ) -_REQUESTPROTO_INDEXREQUESTPROTO = _descriptor.Descriptor( - name='IndexRequestProto', - full_name='jina.RequestProto.IndexRequestProto', - filename=None, - file=DESCRIPTOR, - containing_type=None, - create_key=_descriptor._internal_create_key, - fields=[ - _descriptor.FieldDescriptor( - name='docs', full_name='jina.RequestProto.IndexRequestProto.docs', index=0, - number=1, type=11, cpp_type=10, label=3, - has_default_value=False, default_value=[], - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='groundtruths', full_name='jina.RequestProto.IndexRequestProto.groundtruths', index=1, - number=2, type=11, cpp_type=10, label=3, - has_default_value=False, default_value=[], - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - ], - extensions=[ - ], - nested_types=[], - enum_types=[ - ], - serialized_options=None, - is_extendable=False, - syntax='proto3', - extension_ranges=[], - oneofs=[ - ], - serialized_start=3034, - serialized_end=3131, -) 
-_REQUESTPROTO_UPDATEREQUESTPROTO = _descriptor.Descriptor( - name='UpdateRequestProto', - full_name='jina.RequestProto.UpdateRequestProto', +_REQUESTPROTO_DATAREQUESTPROTO = _descriptor.Descriptor( + name='DataRequestProto', + full_name='jina.RequestProto.DataRequestProto', filename=None, file=DESCRIPTOR, containing_type=None, create_key=_descriptor._internal_create_key, fields=[ _descriptor.FieldDescriptor( - name='docs', full_name='jina.RequestProto.UpdateRequestProto.docs', index=0, + name='docs', full_name='jina.RequestProto.DataRequestProto.docs', index=0, number=1, type=11, cpp_type=10, label=3, has_default_value=False, default_value=[], message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - ], - extensions=[ - ], - nested_types=[], - enum_types=[ - ], - serialized_options=None, - is_extendable=False, - syntax='proto3', - extension_ranges=[], - oneofs=[ - ], - serialized_start=3133, - serialized_end=3188, -) - -_REQUESTPROTO_DELETEREQUESTPROTO = _descriptor.Descriptor( - name='DeleteRequestProto', - full_name='jina.RequestProto.DeleteRequestProto', - filename=None, - file=DESCRIPTOR, - containing_type=None, - create_key=_descriptor._internal_create_key, - fields=[ _descriptor.FieldDescriptor( - name='ids', full_name='jina.RequestProto.DeleteRequestProto.ids', index=0, - number=1, type=9, cpp_type=9, label=3, - has_default_value=False, default_value=[], - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - ], - extensions=[ - ], - nested_types=[], - enum_types=[ - ], - serialized_options=None, - is_extendable=False, - syntax='proto3', - extension_ranges=[], - oneofs=[ - ], - serialized_start=3190, - serialized_end=3223, -) - -_REQUESTPROTO_SEARCHREQUESTPROTO = _descriptor.Descriptor( - 
name='SearchRequestProto', - full_name='jina.RequestProto.SearchRequestProto', - filename=None, - file=DESCRIPTOR, - containing_type=None, - create_key=_descriptor._internal_create_key, - fields=[ - _descriptor.FieldDescriptor( - name='docs', full_name='jina.RequestProto.SearchRequestProto.docs', index=0, - number=1, type=11, cpp_type=10, label=3, - has_default_value=False, default_value=[], - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='groundtruths', full_name='jina.RequestProto.SearchRequestProto.groundtruths', index=1, + name='groundtruths', full_name='jina.RequestProto.DataRequestProto.groundtruths', index=1, number=2, type=11, cpp_type=10, label=3, has_default_value=False, default_value=[], message_type=None, enum_type=None, containing_type=None, @@ -1153,46 +1076,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=3225, - serialized_end=3323, -) - -_REQUESTPROTO_DUMPREQUESTPROTO = _descriptor.Descriptor( - name='DumpRequestProto', - full_name='jina.RequestProto.DumpRequestProto', - filename=None, - file=DESCRIPTOR, - containing_type=None, - create_key=_descriptor._internal_create_key, - fields=[ - _descriptor.FieldDescriptor( - name='path', full_name='jina.RequestProto.DumpRequestProto.path', index=0, - number=1, type=9, cpp_type=9, label=1, - has_default_value=False, default_value=b"".decode('utf-8'), - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='shards', full_name='jina.RequestProto.DumpRequestProto.shards', index=1, - number=2, type=5, cpp_type=1, label=1, - has_default_value=False, default_value=0, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, 
extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - ], - extensions=[ - ], - nested_types=[], - enum_types=[ - ], - serialized_options=None, - is_extendable=False, - syntax='proto3', - extension_ranges=[], - oneofs=[ - ], - serialized_start=3325, - serialized_end=3373, + serialized_start=2835, + serialized_end=2931, ) _REQUESTPROTO_CONTROLREQUESTPROTO = _descriptor.Descriptor( @@ -1210,27 +1095,6 @@ message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='args', full_name='jina.RequestProto.ControlRequestProto.args', index=1, - number=2, type=11, cpp_type=10, label=1, - has_default_value=False, default_value=None, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='propagate', full_name='jina.RequestProto.ControlRequestProto.propagate', index=2, - number=3, type=8, cpp_type=7, label=1, - has_default_value=False, default_value=False, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='targets', full_name='jina.RequestProto.ControlRequestProto.targets', index=3, - number=4, type=9, cpp_type=9, label=3, - has_default_value=False, default_value=[], - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), ], extensions=[ ], @@ -1244,8 +1108,8 @@ extension_ranges=[], oneofs=[ ], - serialized_start=3376, - serialized_end=3660, + serialized_start=2934, + 
serialized_end=3121, ) _REQUESTPROTO = _descriptor.Descriptor( @@ -1264,79 +1128,51 @@ is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='train', full_name='jina.RequestProto.train', index=1, + name='control', full_name='jina.RequestProto.control', index=1, number=2, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='index', full_name='jina.RequestProto.index', index=2, + name='data', full_name='jina.RequestProto.data', index=2, number=3, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='search', full_name='jina.RequestProto.search', index=3, + name='header', full_name='jina.RequestProto.header', index=3, number=4, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='control', full_name='jina.RequestProto.control', index=4, + name='parameters', full_name='jina.RequestProto.parameters', index=4, number=5, type=11, cpp_type=10, label=1, has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='delete', 
full_name='jina.RequestProto.delete', index=5, - number=9, type=11, cpp_type=10, label=1, - has_default_value=False, default_value=None, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='update', full_name='jina.RequestProto.update', index=6, - number=10, type=11, cpp_type=10, label=1, - has_default_value=False, default_value=None, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='dump', full_name='jina.RequestProto.dump', index=7, - number=11, type=11, cpp_type=10, label=1, - has_default_value=False, default_value=None, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='status', full_name='jina.RequestProto.status', index=8, - number=6, type=11, cpp_type=10, label=1, - has_default_value=False, default_value=None, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='routes', full_name='jina.RequestProto.routes', index=9, - number=8, type=11, cpp_type=10, label=3, + name='routes', full_name='jina.RequestProto.routes', index=5, + number=6, type=11, cpp_type=10, label=3, has_default_value=False, default_value=[], message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), _descriptor.FieldDescriptor( - name='queryset', 
full_name='jina.RequestProto.queryset', index=10, - number=7, type=11, cpp_type=10, label=3, - has_default_value=False, default_value=[], + name='status', full_name='jina.RequestProto.status', index=6, + number=7, type=11, cpp_type=10, label=1, + has_default_value=False, default_value=None, message_type=None, enum_type=None, containing_type=None, is_extension=False, extension_scope=None, serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), ], extensions=[ ], - nested_types=[_REQUESTPROTO_TRAINREQUESTPROTO, _REQUESTPROTO_INDEXREQUESTPROTO, _REQUESTPROTO_UPDATEREQUESTPROTO, _REQUESTPROTO_DELETEREQUESTPROTO, _REQUESTPROTO_SEARCHREQUESTPROTO, _REQUESTPROTO_DUMPREQUESTPROTO, _REQUESTPROTO_CONTROLREQUESTPROTO, ], + nested_types=[_REQUESTPROTO_DATAREQUESTPROTO, _REQUESTPROTO_CONTROLREQUESTPROTO, ], enum_types=[ ], serialized_options=None, @@ -1350,61 +1186,8 @@ create_key=_descriptor._internal_create_key, fields=[]), ], - serialized_start=2397, - serialized_end=3668, -) - - -_QUERYLANGPROTO = _descriptor.Descriptor( - name='QueryLangProto', - full_name='jina.QueryLangProto', - filename=None, - file=DESCRIPTOR, - containing_type=None, - create_key=_descriptor._internal_create_key, - fields=[ - _descriptor.FieldDescriptor( - name='name', full_name='jina.QueryLangProto.name', index=0, - number=1, type=9, cpp_type=9, label=1, - has_default_value=False, default_value=b"".decode('utf-8'), - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='parameters', full_name='jina.QueryLangProto.parameters', index=1, - number=2, type=11, cpp_type=10, label=1, - has_default_value=False, default_value=None, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, 
create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='disabled', full_name='jina.QueryLangProto.disabled', index=2, - number=3, type=8, cpp_type=7, label=1, - has_default_value=False, default_value=False, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - _descriptor.FieldDescriptor( - name='priority', full_name='jina.QueryLangProto.priority', index=3, - number=4, type=5, cpp_type=1, label=1, - has_default_value=False, default_value=0, - message_type=None, enum_type=None, containing_type=None, - is_extension=False, extension_scope=None, - serialized_options=None, file=DESCRIPTOR, create_key=_descriptor._internal_create_key), - ], - extensions=[ - ], - nested_types=[], - enum_types=[ - ], - serialized_options=None, - is_extendable=False, - syntax='proto3', - extension_ranges=[], - oneofs=[ - ], - serialized_start=3670, - serialized_end=3781, + serialized_start=2538, + serialized_end=3129, ) _DENSENDARRAYPROTO.fields_by_name['quantization'].enum_type = _DENSENDARRAYPROTO_QUANTIZATIONMODE @@ -1436,71 +1219,45 @@ _DOCUMENTPROTO.oneofs_by_name['content'].fields.append( _DOCUMENTPROTO.fields_by_name['text']) _DOCUMENTPROTO.fields_by_name['text'].containing_oneof = _DOCUMENTPROTO.oneofs_by_name['content'] +_DOCUMENTPROTO.oneofs_by_name['content'].fields.append( + _DOCUMENTPROTO.fields_by_name['uri']) +_DOCUMENTPROTO.fields_by_name['uri'].containing_oneof = _DOCUMENTPROTO.oneofs_by_name['content'] _ROUTEPROTO.fields_by_name['start_time'].message_type = google_dot_protobuf_dot_timestamp__pb2._TIMESTAMP _ROUTEPROTO.fields_by_name['end_time'].message_type = google_dot_protobuf_dot_timestamp__pb2._TIMESTAMP _ROUTEPROTO.fields_by_name['status'].message_type = _STATUSPROTO _ENVELOPEPROTO_VERSIONPROTO.containing_type = _ENVELOPEPROTO 
_ENVELOPEPROTO_COMPRESSCONFIGPROTO.fields_by_name['parameters'].message_type = google_dot_protobuf_dot_struct__pb2._STRUCT _ENVELOPEPROTO_COMPRESSCONFIGPROTO.containing_type = _ENVELOPEPROTO -_ENVELOPEPROTO.fields_by_name['routes'].message_type = _ROUTEPROTO _ENVELOPEPROTO.fields_by_name['version'].message_type = _ENVELOPEPROTO_VERSIONPROTO -_ENVELOPEPROTO.fields_by_name['status'].message_type = _STATUSPROTO _ENVELOPEPROTO.fields_by_name['compression'].message_type = _ENVELOPEPROTO_COMPRESSCONFIGPROTO +_ENVELOPEPROTO.fields_by_name['routes'].message_type = _ROUTEPROTO +_ENVELOPEPROTO.fields_by_name['status'].message_type = _STATUSPROTO +_ENVELOPEPROTO.fields_by_name['header'].message_type = _HEADERPROTO _STATUSPROTO_EXCEPTIONPROTO.containing_type = _STATUSPROTO _STATUSPROTO.fields_by_name['code'].enum_type = _STATUSPROTO_STATUSCODE _STATUSPROTO.fields_by_name['exception'].message_type = _STATUSPROTO_EXCEPTIONPROTO _STATUSPROTO_STATUSCODE.containing_type = _STATUSPROTO _MESSAGEPROTO.fields_by_name['envelope'].message_type = _ENVELOPEPROTO _MESSAGEPROTO.fields_by_name['request'].message_type = _REQUESTPROTO -_REQUESTPROTO_TRAINREQUESTPROTO.fields_by_name['docs'].message_type = _DOCUMENTPROTO -_REQUESTPROTO_TRAINREQUESTPROTO.fields_by_name['groundtruths'].message_type = _DOCUMENTPROTO -_REQUESTPROTO_TRAINREQUESTPROTO.containing_type = _REQUESTPROTO -_REQUESTPROTO_INDEXREQUESTPROTO.fields_by_name['docs'].message_type = _DOCUMENTPROTO -_REQUESTPROTO_INDEXREQUESTPROTO.fields_by_name['groundtruths'].message_type = _DOCUMENTPROTO -_REQUESTPROTO_INDEXREQUESTPROTO.containing_type = _REQUESTPROTO -_REQUESTPROTO_UPDATEREQUESTPROTO.fields_by_name['docs'].message_type = _DOCUMENTPROTO -_REQUESTPROTO_UPDATEREQUESTPROTO.containing_type = _REQUESTPROTO -_REQUESTPROTO_DELETEREQUESTPROTO.containing_type = _REQUESTPROTO -_REQUESTPROTO_SEARCHREQUESTPROTO.fields_by_name['docs'].message_type = _DOCUMENTPROTO -_REQUESTPROTO_SEARCHREQUESTPROTO.fields_by_name['groundtruths'].message_type = 
_DOCUMENTPROTO -_REQUESTPROTO_SEARCHREQUESTPROTO.containing_type = _REQUESTPROTO -_REQUESTPROTO_DUMPREQUESTPROTO.containing_type = _REQUESTPROTO +_DOCUMENTARRAYPROTO.fields_by_name['docs'].message_type = _DOCUMENTPROTO +_REQUESTPROTO_DATAREQUESTPROTO.fields_by_name['docs'].message_type = _DOCUMENTPROTO +_REQUESTPROTO_DATAREQUESTPROTO.fields_by_name['groundtruths'].message_type = _DOCUMENTPROTO +_REQUESTPROTO_DATAREQUESTPROTO.containing_type = _REQUESTPROTO _REQUESTPROTO_CONTROLREQUESTPROTO.fields_by_name['command'].enum_type = _REQUESTPROTO_CONTROLREQUESTPROTO_COMMAND -_REQUESTPROTO_CONTROLREQUESTPROTO.fields_by_name['args'].message_type = google_dot_protobuf_dot_struct__pb2._STRUCT _REQUESTPROTO_CONTROLREQUESTPROTO.containing_type = _REQUESTPROTO _REQUESTPROTO_CONTROLREQUESTPROTO_COMMAND.containing_type = _REQUESTPROTO_CONTROLREQUESTPROTO -_REQUESTPROTO.fields_by_name['train'].message_type = _REQUESTPROTO_TRAINREQUESTPROTO -_REQUESTPROTO.fields_by_name['index'].message_type = _REQUESTPROTO_INDEXREQUESTPROTO -_REQUESTPROTO.fields_by_name['search'].message_type = _REQUESTPROTO_SEARCHREQUESTPROTO _REQUESTPROTO.fields_by_name['control'].message_type = _REQUESTPROTO_CONTROLREQUESTPROTO -_REQUESTPROTO.fields_by_name['delete'].message_type = _REQUESTPROTO_DELETEREQUESTPROTO -_REQUESTPROTO.fields_by_name['update'].message_type = _REQUESTPROTO_UPDATEREQUESTPROTO -_REQUESTPROTO.fields_by_name['dump'].message_type = _REQUESTPROTO_DUMPREQUESTPROTO -_REQUESTPROTO.fields_by_name['status'].message_type = _STATUSPROTO +_REQUESTPROTO.fields_by_name['data'].message_type = _REQUESTPROTO_DATAREQUESTPROTO +_REQUESTPROTO.fields_by_name['header'].message_type = _HEADERPROTO +_REQUESTPROTO.fields_by_name['parameters'].message_type = google_dot_protobuf_dot_struct__pb2._STRUCT _REQUESTPROTO.fields_by_name['routes'].message_type = _ROUTEPROTO -_REQUESTPROTO.fields_by_name['queryset'].message_type = _QUERYLANGPROTO -_REQUESTPROTO.oneofs_by_name['body'].fields.append( - 
_REQUESTPROTO.fields_by_name['train']) -_REQUESTPROTO.fields_by_name['train'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] -_REQUESTPROTO.oneofs_by_name['body'].fields.append( - _REQUESTPROTO.fields_by_name['index']) -_REQUESTPROTO.fields_by_name['index'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] -_REQUESTPROTO.oneofs_by_name['body'].fields.append( - _REQUESTPROTO.fields_by_name['search']) -_REQUESTPROTO.fields_by_name['search'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] +_REQUESTPROTO.fields_by_name['status'].message_type = _STATUSPROTO _REQUESTPROTO.oneofs_by_name['body'].fields.append( _REQUESTPROTO.fields_by_name['control']) _REQUESTPROTO.fields_by_name['control'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] _REQUESTPROTO.oneofs_by_name['body'].fields.append( - _REQUESTPROTO.fields_by_name['delete']) -_REQUESTPROTO.fields_by_name['delete'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] -_REQUESTPROTO.oneofs_by_name['body'].fields.append( - _REQUESTPROTO.fields_by_name['update']) -_REQUESTPROTO.fields_by_name['update'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] -_REQUESTPROTO.oneofs_by_name['body'].fields.append( - _REQUESTPROTO.fields_by_name['dump']) -_REQUESTPROTO.fields_by_name['dump'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] -_QUERYLANGPROTO.fields_by_name['parameters'].message_type = google_dot_protobuf_dot_struct__pb2._STRUCT + _REQUESTPROTO.fields_by_name['data']) +_REQUESTPROTO.fields_by_name['data'].containing_oneof = _REQUESTPROTO.oneofs_by_name['body'] DESCRIPTOR.message_types_by_name['DenseNdArrayProto'] = _DENSENDARRAYPROTO DESCRIPTOR.message_types_by_name['NdArrayProto'] = _NDARRAYPROTO DESCRIPTOR.message_types_by_name['SparseNdArrayProto'] = _SPARSENDARRAYPROTO @@ -1508,10 +1265,11 @@ DESCRIPTOR.message_types_by_name['DocumentProto'] = _DOCUMENTPROTO DESCRIPTOR.message_types_by_name['RouteProto'] = _ROUTEPROTO DESCRIPTOR.message_types_by_name['EnvelopeProto'] = 
_ENVELOPEPROTO +DESCRIPTOR.message_types_by_name['HeaderProto'] = _HEADERPROTO DESCRIPTOR.message_types_by_name['StatusProto'] = _STATUSPROTO DESCRIPTOR.message_types_by_name['MessageProto'] = _MESSAGEPROTO +DESCRIPTOR.message_types_by_name['DocumentArrayProto'] = _DOCUMENTARRAYPROTO DESCRIPTOR.message_types_by_name['RequestProto'] = _REQUESTPROTO -DESCRIPTOR.message_types_by_name['QueryLangProto'] = _QUERYLANGPROTO _sym_db.RegisterFileDescriptor(DESCRIPTOR) DenseNdArrayProto = _reflection.GeneratedProtocolMessageType('DenseNdArrayProto', (_message.Message,), { @@ -1579,6 +1337,13 @@ _sym_db.RegisterMessage(EnvelopeProto.VersionProto) _sym_db.RegisterMessage(EnvelopeProto.CompressConfigProto) +HeaderProto = _reflection.GeneratedProtocolMessageType('HeaderProto', (_message.Message,), { + 'DESCRIPTOR' : _HEADERPROTO, + '__module__' : 'jina_pb2' + # @@protoc_insertion_point(class_scope:jina.HeaderProto) + }) +_sym_db.RegisterMessage(HeaderProto) + StatusProto = _reflection.GeneratedProtocolMessageType('StatusProto', (_message.Message,), { 'ExceptionProto' : _reflection.GeneratedProtocolMessageType('ExceptionProto', (_message.Message,), { @@ -1601,47 +1366,19 @@ }) _sym_db.RegisterMessage(MessageProto) -RequestProto = _reflection.GeneratedProtocolMessageType('RequestProto', (_message.Message,), { - - 'TrainRequestProto' : _reflection.GeneratedProtocolMessageType('TrainRequestProto', (_message.Message,), { - 'DESCRIPTOR' : _REQUESTPROTO_TRAINREQUESTPROTO, - '__module__' : 'jina_pb2' - # @@protoc_insertion_point(class_scope:jina.RequestProto.TrainRequestProto) - }) - , - - 'IndexRequestProto' : _reflection.GeneratedProtocolMessageType('IndexRequestProto', (_message.Message,), { - 'DESCRIPTOR' : _REQUESTPROTO_INDEXREQUESTPROTO, - '__module__' : 'jina_pb2' - # @@protoc_insertion_point(class_scope:jina.RequestProto.IndexRequestProto) - }) - , - - 'UpdateRequestProto' : _reflection.GeneratedProtocolMessageType('UpdateRequestProto', (_message.Message,), { - 'DESCRIPTOR' : 
_REQUESTPROTO_UPDATEREQUESTPROTO, - '__module__' : 'jina_pb2' - # @@protoc_insertion_point(class_scope:jina.RequestProto.UpdateRequestProto) - }) - , - - 'DeleteRequestProto' : _reflection.GeneratedProtocolMessageType('DeleteRequestProto', (_message.Message,), { - 'DESCRIPTOR' : _REQUESTPROTO_DELETEREQUESTPROTO, - '__module__' : 'jina_pb2' - # @@protoc_insertion_point(class_scope:jina.RequestProto.DeleteRequestProto) - }) - , +DocumentArrayProto = _reflection.GeneratedProtocolMessageType('DocumentArrayProto', (_message.Message,), { + 'DESCRIPTOR' : _DOCUMENTARRAYPROTO, + '__module__' : 'jina_pb2' + # @@protoc_insertion_point(class_scope:jina.DocumentArrayProto) + }) +_sym_db.RegisterMessage(DocumentArrayProto) - 'SearchRequestProto' : _reflection.GeneratedProtocolMessageType('SearchRequestProto', (_message.Message,), { - 'DESCRIPTOR' : _REQUESTPROTO_SEARCHREQUESTPROTO, - '__module__' : 'jina_pb2' - # @@protoc_insertion_point(class_scope:jina.RequestProto.SearchRequestProto) - }) - , +RequestProto = _reflection.GeneratedProtocolMessageType('RequestProto', (_message.Message,), { - 'DumpRequestProto' : _reflection.GeneratedProtocolMessageType('DumpRequestProto', (_message.Message,), { - 'DESCRIPTOR' : _REQUESTPROTO_DUMPREQUESTPROTO, + 'DataRequestProto' : _reflection.GeneratedProtocolMessageType('DataRequestProto', (_message.Message,), { + 'DESCRIPTOR' : _REQUESTPROTO_DATAREQUESTPROTO, '__module__' : 'jina_pb2' - # @@protoc_insertion_point(class_scope:jina.RequestProto.DumpRequestProto) + # @@protoc_insertion_point(class_scope:jina.RequestProto.DataRequestProto) }) , @@ -1656,21 +1393,9 @@ # @@protoc_insertion_point(class_scope:jina.RequestProto) }) _sym_db.RegisterMessage(RequestProto) -_sym_db.RegisterMessage(RequestProto.TrainRequestProto) -_sym_db.RegisterMessage(RequestProto.IndexRequestProto) -_sym_db.RegisterMessage(RequestProto.UpdateRequestProto) -_sym_db.RegisterMessage(RequestProto.DeleteRequestProto) 
-_sym_db.RegisterMessage(RequestProto.SearchRequestProto) -_sym_db.RegisterMessage(RequestProto.DumpRequestProto) +_sym_db.RegisterMessage(RequestProto.DataRequestProto) _sym_db.RegisterMessage(RequestProto.ControlRequestProto) -QueryLangProto = _reflection.GeneratedProtocolMessageType('QueryLangProto', (_message.Message,), { - 'DESCRIPTOR' : _QUERYLANGPROTO, - '__module__' : 'jina_pb2' - # @@protoc_insertion_point(class_scope:jina.QueryLangProto) - }) -_sym_db.RegisterMessage(QueryLangProto) - _JINARPC = _descriptor.ServiceDescriptor( @@ -1680,8 +1405,8 @@ index=0, serialized_options=None, create_key=_descriptor._internal_create_key, - serialized_start=3783, - serialized_end=3846, + serialized_start=3131, + serialized_end=3194, methods=[ _descriptor.MethodDescriptor( name='Call', diff --git a/jina/proto/serializer.py b/jina/proto/serializer.py index ad5ecdb66abd1..bcd0a6794ce3a 100644 --- a/jina/proto/serializer.py +++ b/jina/proto/serializer.py @@ -1,4 +1,4 @@ -from .. import Request +from ..types.request import Request class RequestProto: @@ -10,8 +10,18 @@ class RequestProto: @staticmethod def SerializeToString(x: 'Request'): + """ + # noqa: DAR101 + # noqa: DAR102 + # noqa: DAR201 + """ return x.proto.SerializeToString() @staticmethod def FromString(x: bytes): + """ + # noqa: DAR101 + # noqa: DAR102 + # noqa: DAR201 + """ return Request(x) diff --git a/jina/resources/chatbot/helloworld.indexer.yml b/jina/resources/chatbot/helloworld.indexer.yml deleted file mode 100644 index 5b75c2a3c77bf..0000000000000 --- a/jina/resources/chatbot/helloworld.indexer.yml +++ /dev/null @@ -1,16 +0,0 @@ -jtype: CompoundIndexer -components: - - jtype: VectorIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx # a customized name - - jtype: KeyValueIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx -metas: - name: chunk_indexer - workspace: $HW_WORKDIR \ No newline at end of file diff --git a/jina/resources/executors._clear.yml 
b/jina/resources/executors._clear.yml deleted file mode 100644 index b3935ff35650c..0000000000000 --- a/jina/resources/executors._clear.yml +++ /dev/null @@ -1,33 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: clear -requests: - on: - SearchRequest: - - !ExcludeReqQL - with: - fields: - - search - TrainRequest: - - !ExcludeReqQL - with: - fields: - - train - IndexRequest: - - !ExcludeReqQL - with: - fields: - - index - DeleteRequest: - - !ExcludeReqQL - with: - fields: - - delete - UpdateRequest: - - !ExcludeReqQL - with: - fields: - - update - ControlRequest: - - !ControlReqDriver {} diff --git a/jina/resources/executors._concat.yml b/jina/resources/executors._concat.yml deleted file mode 100644 index 3916dddf63100..0000000000000 --- a/jina/resources/executors._concat.yml +++ /dev/null @@ -1,8 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: concat -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - - !ConcatEmbedDriver {} \ No newline at end of file diff --git a/jina/resources/executors._eval_pr.yml b/jina/resources/executors._eval_pr.yml deleted file mode 100644 index 4b23ae3386597..0000000000000 --- a/jina/resources/executors._eval_pr.yml +++ /dev/null @@ -1,24 +0,0 @@ -!CompoundExecutor -components: - - !PrecisionEvaluator - metas: - name: precision # a customized name - - !RecallEvaluator - metas: - name: recall # a customized name -metas: - name: eval_pr # a customized name -requests: - on: - SearchRequest: - with: - traversal_paths: ['r'] - running_avg: true - fields: [tags__id] - drivers: - - !RankEvaluateDriver - with: - executor: precision - - !RankEvaluateDriver - with: - executor: recall diff --git a/jina/resources/executors._index.yml b/jina/resources/executors._index.yml deleted file mode 100644 index f4a3d2991192e..0000000000000 --- a/jina/resources/executors._index.yml +++ /dev/null @@ -1,17 +0,0 @@ -!CompoundIndexer -components: - - !VectorIndexer - with: - index_filename: vec.gz - metric: euclidean - 
delete_on_dump: true - metas: - name: vecidx # a customized name - - !KeyValueIndexer - with: - index_filename: chunk.gz - metas: - name: docidx -metas: - name: simple_indexer - workspace: ./ diff --git a/jina/resources/executors._logforward.yml b/jina/resources/executors._logforward.yml deleted file mode 100644 index 12e3fefe773a9..0000000000000 --- a/jina/resources/executors._logforward.yml +++ /dev/null @@ -1,10 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: logforward -requests: - on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - - !LogInfoDriver {} \ No newline at end of file diff --git a/jina/resources/executors._merge.yml b/jina/resources/executors._merge.yml deleted file mode 100644 index 21e8a86c564b3..0000000000000 --- a/jina/resources/executors._merge.yml +++ /dev/null @@ -1,12 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: merge -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - - !ReduceAllDriver - with: - traversal_paths: ['r'] - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors._merge_chunks.yml b/jina/resources/executors._merge_chunks.yml deleted file mode 100644 index fe3c635d06b2a..0000000000000 --- a/jina/resources/executors._merge_chunks.yml +++ /dev/null @@ -1,10 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: merge_chunks -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - - !ReduceAllDriver {} - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors._merge_eval.yml b/jina/resources/executors._merge_eval.yml deleted file mode 100644 index ed4e4a6d74ee0..0000000000000 --- a/jina/resources/executors._merge_eval.yml +++ /dev/null @@ -1,10 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: merge_eval -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, 
DeleteRequest, UpdateRequest]: - - !CollectEvaluationDriver {} - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors._merge_matches.yml b/jina/resources/executors._merge_matches.yml deleted file mode 100644 index db7319c76a626..0000000000000 --- a/jina/resources/executors._merge_matches.yml +++ /dev/null @@ -1,12 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: merge_matches -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - - !ReduceAllDriver - with: - traversal_paths: ['m'] - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors._merge_matches_topk.yml b/jina/resources/executors._merge_matches_topk.yml deleted file mode 100644 index b088406960dc9..0000000000000 --- a/jina/resources/executors._merge_matches_topk.yml +++ /dev/null @@ -1,21 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: merge_matches_topk -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - with: - traversal_paths: [ 'm' ] - drivers: - - !ReduceAllDriver {} - - !SortQL - with: - reverse: False - field: 'score__value' - - !SliceQL - with: - start: 0 - end: 10 # is overwritten by the QueryLangDriver - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors._merge_root.yml b/jina/resources/executors._merge_root.yml deleted file mode 100644 index 8a08466d090b9..0000000000000 --- a/jina/resources/executors._merge_root.yml +++ /dev/null @@ -1,12 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: merge_root -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - - !ReduceAllDriver - with: - traversal_paths: ['r'] - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors._pass.yml b/jina/resources/executors._pass.yml deleted file mode 100644 index b4b8ddbaa17ad..0000000000000 
--- a/jina/resources/executors._pass.yml +++ /dev/null @@ -1,8 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: forward -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest, ControlRequest]: - - !RouteDriver {} \ No newline at end of file diff --git a/jina/resources/executors._whoosh.yml b/jina/resources/executors._whoosh.yml deleted file mode 100644 index e1a518fce6289..0000000000000 --- a/jina/resources/executors._whoosh.yml +++ /dev/null @@ -1,8 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: whoosh -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest, ControlRequest]: - - !WhooshDriver {} \ No newline at end of file diff --git a/jina/resources/executors.base.all.yml b/jina/resources/executors.base.all.yml deleted file mode 100644 index d51215f5e16e8..0000000000000 --- a/jina/resources/executors.base.all.yml +++ /dev/null @@ -1,8 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: clear -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest, ControlRequest]: - * # replace by drivers \ No newline at end of file diff --git a/jina/resources/executors.base.yml b/jina/resources/executors.base.yml deleted file mode 100644 index 1e2b530748b2d..0000000000000 --- a/jina/resources/executors.base.yml +++ /dev/null @@ -1,10 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: clear -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, DeleteRequest, UpdateRequest]: - * # replace by drivers - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors.metas.default.yml b/jina/resources/executors.metas.default.yml deleted file mode 100644 index 2aa6d2e17e9b9..0000000000000 --- a/jina/resources/executors.metas.default.yml +++ /dev/null @@ -1,14 +0,0 @@ -is_updated: false -batch_size: -workspace: -name: -on_gpu: false -warn_unnamed: false -max_snapshot: 0 -py_modules: -root_workspace: ${{root.metas.workspace}} # 
needed to extract the `dump path` for inner executor of CompoundExecutors -root_name: ${{root.metas.name}} # needed to extract the `dump path` for inner executor of CompoundExecutors -pea_id: ${{root.metas.pea_id}} # this may result in self-referred -replica_id: ${{root.metas.replica_id}} -read_only: false -description: # the description of this executor. It will be used in automatics docs UI \ No newline at end of file diff --git a/jina/resources/executors.requests.BaseCrafter.yml b/jina/resources/executors.requests.BaseCrafter.yml deleted file mode 100644 index f57d77cf841f8..0000000000000 --- a/jina/resources/executors.requests.BaseCrafter.yml +++ /dev/null @@ -1,7 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, TrainRequest, IndexRequest]: - - !CraftDriver {} - [DeleteRequest, UpdateRequest]: - - !RouteDriver {} \ No newline at end of file diff --git a/jina/resources/executors.requests.BaseEmbeddingEvaluator.yml b/jina/resources/executors.requests.BaseEmbeddingEvaluator.yml deleted file mode 100644 index c6c8be7ed613f..0000000000000 --- a/jina/resources/executors.requests.BaseEmbeddingEvaluator.yml +++ /dev/null @@ -1,7 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, TrainRequest, IndexRequest]: - - !NDArrayEvaluateDriver {} - [DeleteRequest, UpdateRequest]: - - !RouteDriver {} diff --git a/jina/resources/executors.requests.BaseEncoder.yml b/jina/resources/executors.requests.BaseEncoder.yml deleted file mode 100644 index 1ec3a603274c8..0000000000000 --- a/jina/resources/executors.requests.BaseEncoder.yml +++ /dev/null @@ -1,7 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, TrainRequest, IndexRequest]: - - !EncodeDriver {} - [DeleteRequest, UpdateRequest]: - - !RouteDriver {} \ No newline at end of file diff --git a/jina/resources/executors.requests.BaseExecutor.yml b/jina/resources/executors.requests.BaseExecutor.yml deleted file mode 100644 index d3204b26a7c79..0000000000000 --- 
a/jina/resources/executors.requests.BaseExecutor.yml +++ /dev/null @@ -1,3 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} diff --git a/jina/resources/executors.requests.BaseIndexer.yml b/jina/resources/executors.requests.BaseIndexer.yml deleted file mode 100644 index 5a3d4c6f1d7d8..0000000000000 --- a/jina/resources/executors.requests.BaseIndexer.yml +++ /dev/null @@ -1,3 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors.requests.BaseKVIndexer.yml b/jina/resources/executors.requests.BaseKVIndexer.yml deleted file mode 100644 index a760f23f13576..0000000000000 --- a/jina/resources/executors.requests.BaseKVIndexer.yml +++ /dev/null @@ -1,9 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - - !KVSearchDriver {} - [IndexRequest, UpdateRequest]: - - !KVIndexDriver {} - DeleteRequest: - - !DeleteDriver {} diff --git a/jina/resources/executors.requests.BaseRanker.yml b/jina/resources/executors.requests.BaseRanker.yml deleted file mode 100644 index 905f852f8f32d..0000000000000 --- a/jina/resources/executors.requests.BaseRanker.yml +++ /dev/null @@ -1,18 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - with: - traversal_paths: [ 'm' ] - drivers: - - !ExcludeQL - with: - fields: - - embedding - - !SortQL - with: - field: 'score__value' - - !SliceQL - with: - start: 0 - end: 50 diff --git a/jina/resources/executors.requests.BaseRankingEvaluator.yml b/jina/resources/executors.requests.BaseRankingEvaluator.yml deleted file mode 100644 index e9dd2e8893aab..0000000000000 --- a/jina/resources/executors.requests.BaseRankingEvaluator.yml +++ /dev/null @@ -1,16 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - with: - traversal_paths: ['r'] - drivers: - - !ExcludeQL - with: - fields: - - embedding - - buffer - - blob - - text - - !RankEvaluateDriver {} - diff --git a/jina/resources/executors.requests.BaseSegmenter.yml 
b/jina/resources/executors.requests.BaseSegmenter.yml deleted file mode 100644 index 67f580af0fd6a..0000000000000 --- a/jina/resources/executors.requests.BaseSegmenter.yml +++ /dev/null @@ -1,7 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, TrainRequest, IndexRequest]: - - !SegmentDriver {} - [DeleteRequest, UpdateRequest]: - - !RouteDriver {} \ No newline at end of file diff --git a/jina/resources/executors.requests.BaseSparseVectorIndexer.yml b/jina/resources/executors.requests.BaseSparseVectorIndexer.yml deleted file mode 100644 index 83ee0a7d4e95d..0000000000000 --- a/jina/resources/executors.requests.BaseSparseVectorIndexer.yml +++ /dev/null @@ -1,9 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - - !SparseVectorSearchDriver {} - [IndexRequest, UpdateRequest]: - - !SparseVectorIndexDriver {} - DeleteRequest: - - !DeleteDriver {} \ No newline at end of file diff --git a/jina/resources/executors.requests.BaseTextEvaluator.yml b/jina/resources/executors.requests.BaseTextEvaluator.yml deleted file mode 100644 index 8ef302c49cd59..0000000000000 --- a/jina/resources/executors.requests.BaseTextEvaluator.yml +++ /dev/null @@ -1,7 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, TrainRequest, IndexRequest]: - - !TextEvaluateDriver {} - [DeleteRequest, UpdateRequest]: - - !RouteDriver {} diff --git a/jina/resources/executors.requests.BaseVectorIndexer.yml b/jina/resources/executors.requests.BaseVectorIndexer.yml deleted file mode 100644 index e31a0883ea923..0000000000000 --- a/jina/resources/executors.requests.BaseVectorIndexer.yml +++ /dev/null @@ -1,9 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - - !VectorSearchDriver {} - [IndexRequest, UpdateRequest]: - - !VectorIndexDriver {} - DeleteRequest: - - !DeleteDriver {} \ No newline at end of file diff --git a/jina/resources/executors.requests.CompoundIndexer.yml 
b/jina/resources/executors.requests.CompoundIndexer.yml deleted file mode 100644 index 2c02f2ccc1f41..0000000000000 --- a/jina/resources/executors.requests.CompoundIndexer.yml +++ /dev/null @@ -1,34 +0,0 @@ -on: - SearchRequest: - - !VectorSearchDriver - with: - executor: BaseVectorIndexer - - !ExcludeQL - with: - fields: - - embedding - - !KVSearchDriver - with: - executor: BaseKVIndexer - is_update: True - traversal_paths: ['m'] - [IndexRequest, UpdateRequest]: - - !VectorIndexDriver - with: - executor: BaseVectorIndexer - - !ExcludeQL - with: - fields: - - embedding - - !KVIndexDriver - with: - executor: BaseKVIndexer - ControlRequest: - - !ControlReqDriver {} - DeleteRequest: - - !DeleteDriver - with: - executor: BaseKVIndexer - - !DeleteDriver - with: - executor: BaseVectorIndexer diff --git a/jina/resources/executors.requests.DataURIPbIndexer.yml b/jina/resources/executors.requests.DataURIPbIndexer.yml deleted file mode 100644 index dea0ebc81d95f..0000000000000 --- a/jina/resources/executors.requests.DataURIPbIndexer.yml +++ /dev/null @@ -1,15 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - - !KVSearchDriver {} - [IndexRequest, UpdateRequest]: - - !Buffer2URI {} - - !ExcludeQL - with: - fields: - - chunks # we don't want to save nested structure - - buffer - - !KVIndexDriver {} - DeleteRequest: - - !DeleteDriver {} diff --git a/jina/resources/executors.requests.GenericExecutor.yml b/jina/resources/executors.requests.GenericExecutor.yml deleted file mode 100644 index ded1e95ed5b79..0000000000000 --- a/jina/resources/executors.requests.GenericExecutor.yml +++ /dev/null @@ -1,15 +0,0 @@ -on: - SearchRequest: - - !GenericExecutorDriver { } - IndexRequest: - - !GenericExecutorDriver { } - UpdateRequest: - - !GenericExecutorDriver { } - DeleteRequest: - - !GenericExecutorDriver { } - DumpRequest: - - !GenericExecutorDriver { } - TrainRequest: - - !GenericExecutorDriver { } - ControlRequest: - - !ControlReqDriver { } diff --git 
a/jina/resources/executors.requests.Match2DocRanker.yml b/jina/resources/executors.requests.Match2DocRanker.yml deleted file mode 100644 index 902c3f694b60b..0000000000000 --- a/jina/resources/executors.requests.Match2DocRanker.yml +++ /dev/null @@ -1,6 +0,0 @@ -on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - drivers: - - !Matches2DocRankDriver {} diff --git a/jina/resources/executors.requests.UniquePbIndexer.yml b/jina/resources/executors.requests.UniquePbIndexer.yml deleted file mode 100644 index 55bd94284f9f9..0000000000000 --- a/jina/resources/executors.requests.UniquePbIndexer.yml +++ /dev/null @@ -1,24 +0,0 @@ -on: - [IndexRequest, UpdateRequest]: - - !TaggingCacheDriver - with: - executor: DocCache - tags: - is_indexed: true - - !FilterQL - with: - lookups: {tags__is_indexed__neq: true} - - !ExcludeQL - with: - fields: - - chunks - - buffer - - !KVIndexDriver - with: - executor: BinaryPbIndexer - SearchRequest: - - !KVSearchDriver - with: - executor: BinaryPbIndexer - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/executors.requests.UniqueVectorIndexer.yml b/jina/resources/executors.requests.UniqueVectorIndexer.yml deleted file mode 100644 index 8da2e970a4395..0000000000000 --- a/jina/resources/executors.requests.UniqueVectorIndexer.yml +++ /dev/null @@ -1,19 +0,0 @@ -on: - [IndexRequest, UpdateRequest]: - - !TaggingCacheDriver - with: - executor: DocCache - tags: - is_indexed: true - - !FilterQL - with: - lookups: {tags__is_indexed__neq: true} - - !VectorIndexDriver - with: - executor: BaseVectorIndexer - SearchRequest: - - !VectorSearchDriver - with: - executor: BaseVectorIndexer - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/jina/resources/extra-requirements.txt b/jina/resources/extra-requirements.txt index fcc59b018d751..1188c360e854a 100644 --- a/jina/resources/extra-requirements.txt +++ b/jina/resources/extra-requirements.txt @@ -34,7 +34,7 @@ onnx: 
framework, py37 onnxruntime: framework, py37 Pillow: cv, cicd, multimodal annoy>=1.9.5: index -sklearn: numeric, cicd +sklearn: numeric plyvel: index jieba: nlp lz4<3.1.2: devel, cicd, perf, network diff --git a/jina/resources/fashion/components.py b/jina/resources/fashion/components.py deleted file mode 100644 index a0984ad641d47..0000000000000 --- a/jina/resources/fashion/components.py +++ /dev/null @@ -1,33 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.encoders import BaseImageEncoder - - -class MyEncoder(BaseImageEncoder): - """Simple Encoder used in :command:`jina hello-world`, - it transforms the original 784-dim vector into a 64-dim vector using - a random orthogonal matrix, which is stored and shared in index and query time""" - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - np.random.seed(1337) - # generate a random orthogonal matrix - H = np.random.rand(784, 64) - u, s, vh = np.linalg.svd(H, full_matrices=False) - self.oth_mat = u @ vh - self.touch() - - def encode(self, content: 'np.ndarray', *args, **kwargs): - """ - Encode data and reduce dimension - - :param content: image data - :param args: arguments - :param kwargs: keyword arguments - :return: encoded data - """ - # reduce dimension to 50 by random orthogonal projection - return (content.reshape([-1, 784]) / 255) @ self.oth_mat diff --git a/jina/resources/fashion/helloworld.encoder.yml b/jina/resources/fashion/helloworld.encoder.yml deleted file mode 100644 index 4506bccd47791..0000000000000 --- a/jina/resources/fashion/helloworld.encoder.yml +++ /dev/null @@ -1,19 +0,0 @@ -jtype: MyEncoder -metas: - name: myenc # a customized name - workspace: $HW_WORKDIR - py_modules: - - components.py -requests: - on: - [IndexRequest, SearchRequest]: - - jtype: Blob2PngURI - with: - height: 28 - width: 28 - - jtype: EncodeDriver {} - - jtype: ExcludeQL - with: - fields: - 
- buffer - - chunks diff --git a/jina/resources/fashion/helloworld.flow.index.yml b/jina/resources/fashion/helloworld.flow.index.yml deleted file mode 100644 index 99f736b372a71..0000000000000 --- a/jina/resources/fashion/helloworld.flow.index.yml +++ /dev/null @@ -1,11 +0,0 @@ -jtype: Flow -version: '1.0' -with: - compress_hwm: 1024 -pods: - - name: encode - uses: helloworld.encoder.yml - parallel: $PARALLEL - - name: index - uses: helloworld.indexer.yml - shards: $SHARDS diff --git a/jina/resources/fashion/helloworld.flow.query.yml b/jina/resources/fashion/helloworld.flow.query.yml deleted file mode 100644 index db27f197cd4eb..0000000000000 --- a/jina/resources/fashion/helloworld.flow.query.yml +++ /dev/null @@ -1,17 +0,0 @@ -jtype: Flow -version: '1.0' -with: - read_only: true # better add this in the query time - compress_hwm: 1024 -pods: - - name: encode - uses: helloworld.encoder.yml - parallel: $PARALLEL - - name: index - uses: helloworld.indexer.yml - shards: $SHARDS - polling: all - uses_after: helloworld.reduce.yml - timeout_ready: 100000 # larger timeout as in query time will read all the data - - name: evaluate # optional evaluation, do another step for precision/recall computing - uses: _eval_pr # use internal evaluator on precision & recall diff --git a/jina/resources/fashion/helloworld.indexer.yml b/jina/resources/fashion/helloworld.indexer.yml deleted file mode 100644 index 5b75c2a3c77bf..0000000000000 --- a/jina/resources/fashion/helloworld.indexer.yml +++ /dev/null @@ -1,16 +0,0 @@ -jtype: CompoundIndexer -components: - - jtype: VectorIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx # a customized name - - jtype: KeyValueIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx -metas: - name: chunk_indexer - workspace: $HW_WORKDIR \ No newline at end of file diff --git a/jina/resources/fashion/helloworld.merge-evaluate.yml b/jina/resources/fashion/helloworld.merge-evaluate.yml deleted file mode 
100644 index 1dc86911907ee..0000000000000 --- a/jina/resources/fashion/helloworld.merge-evaluate.yml +++ /dev/null @@ -1,10 +0,0 @@ -jtype: BaseExecutor -with: {} -metas: - name: merge -requests: - on: - [SearchRequest, TrainRequest, IndexRequest]: - - jtype: CollectEvaluationDriver {} - ControlRequest: - - jtype: ControlReqDriver {} diff --git a/jina/resources/fashion/helloworld.reduce.yml b/jina/resources/fashion/helloworld.reduce.yml deleted file mode 100644 index 2a1772b337edf..0000000000000 --- a/jina/resources/fashion/helloworld.reduce.yml +++ /dev/null @@ -1,21 +0,0 @@ -jtype: BaseExecutor -with: {} -metas: - name: top50 -requests: - on: - SearchRequest: - with: - traversal_paths: ['m'] - drivers: - - jtype: ReduceAllDriver {} - - jtype: SortQL - with: - reverse: true - field: 'score__value' - - jtype: SliceQL - with: - start: 0 - end: 20 - ControlRequest: - - jtype: ControlReqDriver {} diff --git a/jina/resources/multimodal/flow-index.yml b/jina/resources/multimodal/flow-index.yml deleted file mode 100644 index 07812e891b107..0000000000000 --- a/jina/resources/multimodal/flow-index.yml +++ /dev/null @@ -1,37 +0,0 @@ -jtype: Flow -version: '1' -pods: - - name: segment - uses: pods/segment.yml - # first pathway - - name: filter_text - uses: pods/filter.yml - env: - filter_mime: text/plain - - name: textEncoder - uses: pods/encode-text.yml - - name: textModIndexer - uses: pods/index-comp.yml - env: - indexer_name: text - # second pathway, in parallel - - name: filter_image - uses: pods/filter.yml - env: - filter_mime: image/jpeg - needs: segment - - name: imageCrafter - uses: pods/crafte-image.yml - - name: imageEncoder - uses: pods/encode-image.yml - - name: imageModIndexer - uses: pods/index-comp.yml - env: - indexer_name: image - # third pathway, in parallel - - name: docIndexer - uses: pods/index-doc.yml - needs: segment - # join all parallel works - - needs: [docIndexer, imageModIndexer, textModIndexer] - name: joiner diff --git 
a/jina/resources/multimodal/flow-query.yml b/jina/resources/multimodal/flow-query.yml deleted file mode 100644 index 90913e9c7c072..0000000000000 --- a/jina/resources/multimodal/flow-query.yml +++ /dev/null @@ -1,37 +0,0 @@ -jtype: Flow -version: '1' -read_only: true -pods: - # first pathway - - name: filter_text - uses: pods/filter.yml - env: - filter_mime: text/plain - - name: textEncoder - uses: pods/encode-text.yml - - name: textModIndexer - uses: pods/index-comp.yml - env: - indexer_name: text - # second pathway, in parallel - - name: filter_image - uses: pods/filter.yml - env: - filter_mime: image/jpeg - needs: gateway - - name: imageCrafter - uses: pods/crafte-image.yml - - name: imageEncoder - uses: pods/encode-image.yml - - name: imageModIndexer - uses: pods/index-comp.yml - env: - indexer_name: image - # join - - needs: [imageModIndexer, textModIndexer] - name: joiner - uses: _merge_chunks - - uses: pods/ranker.yml - name: ranker - - name: docIndexer - uses: pods/index-doc.yml diff --git a/jina/resources/multimodal/pods/__init__.py b/jina/resources/multimodal/pods/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/jina/resources/multimodal/pods/crafte-image.yml b/jina/resources/multimodal/pods/crafte-image.yml deleted file mode 100644 index ee18d0bd271e1..0000000000000 --- a/jina/resources/multimodal/pods/crafte-image.yml +++ /dev/null @@ -1,15 +0,0 @@ -jtype: ImageNormalizer -requests: - on: - IndexRequest: - with: - traversal_paths: ['c'] - drivers: - - jtype: URI2Blob {} - - jtype: CraftDriver {} - SearchRequest: - with: - traversal_paths: [ 'c' ] - drivers: - - jtype: DataURI2Blob { } - - jtype: CraftDriver { } \ No newline at end of file diff --git a/jina/resources/multimodal/pods/encode-image.yml b/jina/resources/multimodal/pods/encode-image.yml deleted file mode 100644 index 42c81b02a9415..0000000000000 --- a/jina/resources/multimodal/pods/encode-image.yml +++ /dev/null @@ -1,15 +0,0 @@ -jtype: ImageTorchEncoder 
-with: - channel_axis: -1 -requests: - use_default: true - on: - [IndexRequest, SearchRequest]: - with: - traversal_paths: ['c'] - drivers: - - jtype: EncodeDriver {} - - jtype: ExcludeQL - with: - fields: - - blob diff --git a/jina/resources/multimodal/pods/encode-text.yml b/jina/resources/multimodal/pods/encode-text.yml deleted file mode 100644 index aae2c120d333a..0000000000000 --- a/jina/resources/multimodal/pods/encode-text.yml +++ /dev/null @@ -1,9 +0,0 @@ -jtype: TransformerTorchEncoder -requests: - use_default: true - on: - [SearchRequest, IndexRequest]: - with: - traversal_paths: ['c'] - drivers: - - jtype: EncodeDriver {} \ No newline at end of file diff --git a/jina/resources/multimodal/pods/filter.yml b/jina/resources/multimodal/pods/filter.yml deleted file mode 100644 index 6033978646b26..0000000000000 --- a/jina/resources/multimodal/pods/filter.yml +++ /dev/null @@ -1,10 +0,0 @@ -jtype: BaseExecutor -requests: - use_default: true - on: - [IndexRequest, SearchRequest]: - - jtype: FilterQL - with: - lookups: - mime_type: '${{ENV.filter_mime}}' - traversal_paths: ['c'] \ No newline at end of file diff --git a/jina/resources/multimodal/pods/index-comp.yml b/jina/resources/multimodal/pods/index-comp.yml deleted file mode 100644 index 6c18b5e164565..0000000000000 --- a/jina/resources/multimodal/pods/index-comp.yml +++ /dev/null @@ -1,50 +0,0 @@ -jtype: CompoundIndexer -components: - - jtype: VectorIndexer - with: - index_filename: vec.gz - metric: cosine - metas: - name: vecidx # a customized name - - jtype: KeyValueIndexer - with: - index_filename: chunk.gz - metas: - name: docidx -metas: - name: '${{ ENV.indexer_name }}' - workspace: ${{ENV.HW_WORKDIR}}/indexed -requests: - use_default: true - on: - SearchRequest: - drivers: - - jtype: VectorSearchDriver - with: - executor: BaseVectorIndexer - traversal_paths: [ 'c' ] - - jtype: KVSearchDriver - with: - executor: BaseKVIndexer - traversal_paths: [ 'cm' ] - - jtype: ExcludeQL - with: - fields: - - 
embedding - - blob - traversal_paths: [ 'cm', 'c' ] - [ IndexRequest, UpdateRequest ]: - with: - traversal_paths: [ 'c' ] - drivers: - - jtype: VectorIndexDriver - with: - executor: BaseVectorIndexer - - jtype: ExcludeQL - with: - fields: - - embedding - - blob - - jtype: KVIndexDriver - with: - executor: BaseKVIndexer diff --git a/jina/resources/multimodal/pods/index-doc.yml b/jina/resources/multimodal/pods/index-doc.yml deleted file mode 100644 index fe3c79b9be422..0000000000000 --- a/jina/resources/multimodal/pods/index-doc.yml +++ /dev/null @@ -1,20 +0,0 @@ -jtype: KeyValueIndexer -with: - index_filename: 'doc.gz' -metas: - name: kv_indexer - workspace: ${{ENV.HW_WORKDIR}}/indexed -requests: - use_default: true - on: - IndexRequest: - - jtype: ExcludeQL - with: - fields: - - chunks - - jtype: URI2DataURI {} - - jtype: ExcludeQL - with: - fields: - - buffer - - jtype: KVIndexDriver {} \ No newline at end of file diff --git a/jina/resources/multimodal/pods/ranker.yml b/jina/resources/multimodal/pods/ranker.yml deleted file mode 100644 index 84ebb7d1d108c..0000000000000 --- a/jina/resources/multimodal/pods/ranker.yml +++ /dev/null @@ -1,28 +0,0 @@ -jtype: WeightedRanker -with: - query_required_keys: - - weight -metas: - py_modules: - - weighted_ranker.py -requests: - use_default: true - on: - SearchRequest: - - jtype: Chunk2DocRankDriver - with: - traversal_paths: ['r'] - - jtype: SortQL - with: - reverse: true - field: 'score__value' - traversal_paths: ['m'] - - jtype: SliceQL - with: - start: 0 - end: 10 # is overwritten by the QueryLangDriver - traversal_paths: ['m'] - - jtype: ExcludeQL - with: - fields: - - chunks \ No newline at end of file diff --git a/jina/resources/multimodal/pods/segment.yml b/jina/resources/multimodal/pods/segment.yml deleted file mode 100644 index 9b4f8f4cc326c..0000000000000 --- a/jina/resources/multimodal/pods/segment.yml +++ /dev/null @@ -1,22 +0,0 @@ -jtype: CompoundExecutor -components: - - jtype: SimpleCrafter - metas: - name: 
craft - - jtype: BiSegmenter - metas: - name: seg -metas: - name: converter - py_modules: - - segmenter.py -requests: - use_default: true - on: - [IndexRequest, SearchRequest]: - - jtype: CraftDriver - with: - executor: craft - - jtype: SegmentDriver - with: - executor: seg \ No newline at end of file diff --git a/jina/resources/multimodal/pods/segmenter.py b/jina/resources/multimodal/pods/segmenter.py deleted file mode 100644 index e62904d0e8cbe..0000000000000 --- a/jina/resources/multimodal/pods/segmenter.py +++ /dev/null @@ -1,39 +0,0 @@ -import os - -from jina import Segmenter, Crafter -from jina.executors.decorators import single - - -class SimpleCrafter(Crafter): - """Simple crafter for multimodal example.""" - - @single - def craft(self, tags): - """ - Read the data and add tags. - - :param tags: tags of data - :return: crafted data - """ - return { - 'text': tags['caption'], - 'uri': f'{os.environ["HW_WORKDIR"]}/people-img/{tags["image"]}', - } - - -class BiSegmenter(Segmenter): - """Segmenter for multimodal example.""" - - @single(slice_nargs=2) - def segment(self, text, uri): - """ - Segment data into text and uri. - - :param text: text data - :param uri: uri data of images - :return: Segmented data. - """ - return [ - {'text': text, 'mime_type': 'text/plain'}, - {'uri': uri, 'mime_type': 'image/jpeg'}, - ] diff --git a/jina/resources/multimodal/pods/weighted_ranker.py b/jina/resources/multimodal/pods/weighted_ranker.py deleted file mode 100644 index a864695a4f83e..0000000000000 --- a/jina/resources/multimodal/pods/weighted_ranker.py +++ /dev/null @@ -1,42 +0,0 @@ -from jina.executors.rankers import Chunk2DocRanker - - -class WeightedRanker(Chunk2DocRanker): - """ - Ranker for multimodal example. - - Will give the scores to chunk data according to weight. 
- """ - - match_required_keys = None - query_required_keys = {'weight'} - - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - """ - Given a set of queries (that may correspond to the chunks of a root level query) and a set of matches - corresponding to the same parent id, compute the matching score of the common parent of the set of matches. - Returns a score corresponding to the score of the parent document of the matches in `match_idx` - - :param match_idx: A [N x 4] numpy ``ndarray``, column-wise: - - ``match_idx[:, 0]``: ``parent_id`` of the matched docs, integer - - ``match_idx[:, 1]``: ``id`` of the matched chunks, integer - - ``match_idx[:, 2]``: ``id`` of the query chunks, integer - - ``match_idx[:, 3]``: distance/metric/score between the query and matched chunks, float. - All the matches belong to the same `parent` - :param query_chunk_meta: The meta information of the query chunks, where the key is query chunks' ``chunk_id``, - the value is extracted by the ``query_required_keys``. - :param match_chunk_meta: The meta information of the matched chunks, where the key is matched chunks' - ``chunk_id``, the value is extracted by the ``match_required_keys``. 
- :param args: Extra positional arguments - :param kwargs: Extra keyword arguments - :return: Return the score associated to the parent id of the matches - """ - - scores = match_idx[[self.COL_QUERY_CHUNK_ID, self.COL_SCORE]] - - weight_score = 0.0 - for k, v in scores: - vv = 1 / (1 + v) - weight_score += query_chunk_meta[k]['weight'] * vv - - return weight_score diff --git a/jina/resources/soundfx/bell.mp3 b/jina/resources/soundfx/bell.mp3 deleted file mode 100644 index e5583a091b301..0000000000000 Binary files a/jina/resources/soundfx/bell.mp3 and /dev/null differ diff --git a/jina/resources/soundfx/whoosh.mp3 b/jina/resources/soundfx/whoosh.mp3 deleted file mode 100644 index 607f1b38f5dbd..0000000000000 Binary files a/jina/resources/soundfx/whoosh.mp3 and /dev/null differ diff --git a/jina/schemas/__init__.py b/jina/schemas/__init__.py index 34cc770cfb2a9..e60bb13536d7b 100644 --- a/jina/schemas/__init__.py +++ b/jina/schemas/__init__.py @@ -1,38 +1,28 @@ def get_full_schema() -> dict: - """ - Return the full schema for Jina core as a dict. + """Get full schema + :return: the full schema for Jina core as a dict. """ from .. 
import __version__ from ..importer import IMPORTED - from .driver import schema_all_drivers from .executor import schema_all_executors from .flow import schema_flow from .meta import schema_metas - from .request import schema_requests from .pod import schema_pod definitions = {} for s in [ - schema_all_drivers, schema_all_executors, schema_flow, schema_metas, - schema_requests, schema_pod, IMPORTED.schema_executors, - IMPORTED.schema_drivers, ]: definitions.update(s) - # fix CompoundExecutor - definitions['Jina::Executors::CompoundExecutor']['properties']['components'] = { - '$ref': '#/definitions/Jina::Executors::All' - } - return { '$id': f'https://api.jina.ai/schemas/{__version__}.json', '$schema': 'http://json-schema.org/draft-07/schema#', - 'description': 'The YAML schema of Jina objects (Flow, Executor, Drivers).', + 'description': 'The YAML schema of Jina objects (Flow, Executor).', 'type': 'object', 'oneOf': [{'$ref': '#/definitions/Jina::Flow'}] + [{"$ref": f"#/definitions/{k}"} for k in IMPORTED.schema_executors.keys()], diff --git a/jina/schemas/driver.py b/jina/schemas/driver.py deleted file mode 100644 index 7f963d9b8448d..0000000000000 --- a/jina/schemas/driver.py +++ /dev/null @@ -1,13 +0,0 @@ -from ..importer import IMPORTED - -schema_all_drivers = { - 'Jina::Drivers::All': { - 'type': 'array', - 'items': { - 'oneOf': [ - {'$ref': f'#/definitions/{k}'} for k in IMPORTED.schema_drivers.keys() - ] - }, - 'minItems': 1, - } -} diff --git a/jina/schemas/executor.py b/jina/schemas/executor.py index 1c7da395374eb..43696c419e08d 100644 --- a/jina/schemas/executor.py +++ b/jina/schemas/executor.py @@ -1,13 +1,6 @@ -from ..importer import IMPORTED - schema_all_executors = { 'Jina::Executors::All': { 'type': 'array', - 'items': { - 'oneOf': [ - {'$ref': f'#/definitions/{k}'} for k in IMPORTED.schema_executors.keys() - ] - }, 'minItems': 1, } } diff --git a/jina/schemas/flow.py b/jina/schemas/flow.py index 8f0712678a13e..8d3d057ce3b67 100644 --- 
a/jina/schemas/flow.py +++ b/jina/schemas/flow.py @@ -13,7 +13,7 @@ 'properties': { 'with': _schema_flow_with, 'jtype': { - 'description': 'The type of Jina object (Flow, Executor, Driver).\n' + 'description': 'The type of Jina object (Flow, Executor).\n' 'A Flow is made up of several sub-tasks, and it manages the states and context of these sub-tasks.\n' 'The input and output data of Flows are Documents.', 'type': 'string', @@ -25,7 +25,7 @@ 'type': 'string', 'default': '\'1\'', }, - 'pods': { + 'executors': { 'description': 'Define the steps in the Flow.\n' 'A Pod is a container and interface for one or multiple Peas that have the same properties.', 'type': 'array', diff --git a/jina/schemas/helper.py b/jina/schemas/helper.py index a26d9d64526ee..ab161f00cea58 100644 --- a/jina/schemas/helper.py +++ b/jina/schemas/helper.py @@ -150,7 +150,7 @@ def _jina_class_to_schema(cls): }, 'with': { 'type': 'object', - 'description': 'The arguments of this Jina Executor/Driver', + 'description': 'The arguments of this Jina Executor', 'properties': kwargs, 'additionalProperties': False, }, diff --git a/jina/schemas/meta.py b/jina/schemas/meta.py index 5338411832902..dcd7ef7286f9b 100644 --- a/jina/schemas/meta.py +++ b/jina/schemas/meta.py @@ -5,29 +5,6 @@ 'required': [], 'additionalProperties': False, 'properties': { - 'is_updated': { - 'description': 'Indicates if the executor is updated or changed since last save. ' - 'If not, then save() will do nothing. A forced save is possible to use `touch()` before `save()`', - 'type': 'boolean', - 'default': False, - }, - 'batch_size': { - 'description': 'The size of each batch, methods decorated by `@batching` will respect this. ' - 'Useful when incoming data is too large to fit into (GPU) memory.', - 'type': 'number', - 'default': None, - }, - 'workspace': { - 'description': ''' -The working directory, for persisting the artifacts of the executor. An artifact is a file or collection of files used during a workflow run. 
- -By default it is not set, if you expect your executor to be persisted or to persist any data, remember to set it to the desired value. - -When a BaseExecutor is a component of a `CompoundExecutor`, its workspace value will be overridden by the workspace coming from the `CompoundExecutor` unless a particular workspace value is set for the component BaseExecutor. - ''', - 'type': 'string', - 'default': None, - }, 'name': { 'description': 'The name of the executor.', 'type': 'string', @@ -38,17 +15,6 @@ 'type': 'string', 'default': None, }, - 'read_only': { - 'description': 'Do not allow the Pod to modify the Executor, save calls will be ignored. ' - 'If set to true no serialization of the Executor', - 'type': 'boolean', - 'default': False, - }, - 'on_gpu': { - 'description': 'If the executor is set to run on GPU.', - 'type': 'boolean', - 'default': False, - }, 'py_modules': { 'type': 'array', 'default': None, diff --git a/jina/schemas/request.py b/jina/schemas/request.py deleted file mode 100644 index 4d1faa54a9e2a..0000000000000 --- a/jina/schemas/request.py +++ /dev/null @@ -1,44 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -schema_requests = { - 'Jina::Requests': { - 'description': 'Define how the Executor behaves under network requests.', - 'type': 'object', - 'additionalProperties': False, - 'properties': { - 'use_default': { - 'description': 'If set, then inherit from the default `Driver` settings for this type of Executor.', - 'type': 'boolean', - 'default': False, - }, - 'on': { - 'description': 'Defines how the `Executor` behaves under different types of request', - 'type': 'object', - 'properties': { - f'{r_type}Request': { - 'type': 'object', - 'properties': { - 'with': { - 'type': 'object', - 'description': 'The common kwargs that all drivers defined under this Request.', - }, - 'drivers': {'$ref': f'#/definitions/Jina::Drivers::All'}, - }, - 'additionalProperties': False, - 'description': f'Defines how the `Executor` behaves under {r_type} request.', - } - for r_type in [ - 'Index', - 'Train', - 'Search', - 'Update', - 'Delete', - 'Control', - ] - }, - 'additionalProperties': False, - }, - }, - } -} diff --git a/jina/types/arrays/__init__.py b/jina/types/arrays/__init__.py index f71c98c738d67..3f9cc84c4fe79 100644 --- a/jina/types/arrays/__init__.py +++ b/jina/types/arrays/__init__.py @@ -1,7 +1,3 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - from .document import DocumentArray -from .querylang import QueryLangArray from .chunk import ChunkArray from .match import MatchArray diff --git a/jina/types/arrays/chunk.py b/jina/types/arrays/chunk.py index 5a688e4894e13..851f196a771ef 100644 --- a/jina/types/arrays/chunk.py +++ b/jina/types/arrays/chunk.py @@ -11,15 +11,19 @@ class ChunkArray(DocumentArray): :class:`ChunkArray` inherits from :class:`DocumentArray`. It's a subset of Documents. 
- :param docs_proto: List of sub-documents (i.e chunks) of `reference_doc` - :type docs_proto: :class:`Document` + :param docs_proto: Set of sub-documents (i.e chunks) of `reference_doc` :param reference_doc: Reference :class:`Document` for the sub-documents - :type reference_doc: :class:`Document` """ def __init__(self, docs_proto, reference_doc: 'Document'): - super().__init__(docs_proto) + """ + Set constructor method. + + :param docs_proto: protobuf representation of the chunks + :param reference_doc: parent document + """ self._ref_doc = reference_doc + super().__init__(docs_proto) def append(self, document: 'Document', **kwargs) -> 'Document': """Add a sub-document (i.e chunk) to the current Document. @@ -37,17 +41,16 @@ def append(self, document: 'Document', **kwargs) -> 'Document': from ..document import Document - c = self._docs_proto.add() - c.CopyFrom(document.proto) - chunk = Document(c) + chunk = Document(document, copy=True) - chunk.set_attrs( + chunk.set_attributes( parent_id=self._ref_doc.id, granularity=self.granularity, **kwargs ) if not chunk.mime_type: chunk.mime_type = self._ref_doc.mime_type chunk.update_content_hash() + super().append(chunk) return chunk def extend(self, iterable: Iterable['Document']) -> None: diff --git a/jina/types/arrays/doc_groundtruth.py b/jina/types/arrays/doc_groundtruth.py deleted file mode 100644 index 95073113ec4df..0000000000000 --- a/jina/types/arrays/doc_groundtruth.py +++ /dev/null @@ -1,20 +0,0 @@ -from typing import Sequence -from .traversable import TraversableSequence - -if False: - from ..document.helper import DocGroundtruthPair - - -class DocumentGroundtruthSequence(TraversableSequence): - """ - :class:`DocumentGroundtruthSequence` holds a list of `DocGrundtruthPair` objects. - It is mostly intented to be used with Evaluators. - - :param pairs: a sequence of `DocGrundtruthPair` objects. 
- """ - - def __init__(self, pairs: Sequence['DocGroundtruthPair']): - self._pairs = pairs - - def __iter__(self): - yield from self._pairs diff --git a/jina/types/arrays/document.py b/jina/types/arrays/document.py index 1459b1a28920a..fcf0563f7d448 100644 --- a/jina/types/arrays/document.py +++ b/jina/types/arrays/document.py @@ -1,11 +1,12 @@ -from collections.abc import MutableSequence -from typing import Union, Iterable, Tuple, Sequence, List +import json +from collections.abc import MutableSequence, Iterable as Itr +from contextlib import nullcontext +from typing import Union, Iterable, Tuple, List, Iterator, TextIO -import numpy as np - -from ...helper import typename +from .traversable import TraversableSequence +from ...helper import typename, cached_property from ...logging import default_logger -from ...enums import EmbeddingClsType +from ...proto.jina_pb2 import DocumentProto try: # when protobuf using Cpp backend @@ -18,89 +19,77 @@ RepeatedCompositeFieldContainer as RepeatedContainer, ) -from ...proto.jina_pb2 import DocumentProto -from .traversable import TraversableSequence - __all__ = ['DocumentArray'] if False: from ..document import Document - from scipy.sparse import coo_matrix - - # fix type-hint complain for sphinx and flake - from typing import TypeVar - import scipy - import tensorflow as tf - import torch - - EmbeddingType = TypeVar( - 'EncodingType', - np.ndarray, - scipy.sparse.csr_matrix, - scipy.sparse.coo_matrix, - scipy.sparse.bsr_matrix, - scipy.sparse.csc_matrix, - torch.sparse_coo_tensor, - tf.SparseTensor, - ) - - SparseEmbeddingType = TypeVar( - 'SparseEmbeddingType', - np.ndarray, - scipy.sparse.csr_matrix, - scipy.sparse.coo_matrix, - scipy.sparse.bsr_matrix, - scipy.sparse.csc_matrix, - torch.sparse_coo_tensor, - tf.SparseTensor, - ) -class DocumentArray(TraversableSequence, MutableSequence): +class DocumentArray(TraversableSequence, MutableSequence, Itr): """ :class:`DocumentArray` is a mutable sequence of :class:`Document`. 
- It gives an efficient view of an array of Documents. - One can iterate over it like a generator but ALSO modify it, count it, - get item, or union two 'DocumentArray's using the '+' and '+=' operators. + It gives an efficient view of a list of Document. One can iterate over it like + a generator but ALSO modify it, count it, get item, or union two 'DocumentArray's using the '+' and '+=' operators. :param docs_proto: A list of :class:`Document` :type docs_proto: Union['RepeatedContainer', Sequence['Document']] """ - def __init__(self, docs_proto: Union['RepeatedContainer', Sequence['Document']]): + def __init__( + self, docs_proto: Union['RepeatedContainer', Iterable['Document'], None] = None + ): super().__init__() - self._docs_proto = docs_proto - self._docs_map = {} + if docs_proto is not None: + self._docs_proto = docs_proto + else: + self._docs_proto = [] def insert(self, index: int, doc: 'Document') -> None: """ - Insert :param:`doc.proto` at :param:`index` into the array of `:class:`DocumentArray` . + Insert :param:`doc.proto` at :param:`index` into the list of `:class:`DocumentArray` . + :param index: Position of the insertion. - :param doc: The doc to be inserted. + :param doc: The doc needs to be inserted. 
""" self._docs_proto.insert(index, doc.proto) def __setitem__(self, key, value: 'Document'): - if isinstance(key, int): - self._docs_proto[key].CopyFrom(value) - elif isinstance(key, str): - self._docs_map[key].CopyFrom(value) + if isinstance(key, (int, str)): + self[key].CopyFrom(value) else: raise IndexError(f'do not support this index {key}') - def __delitem__(self, index): - del self._docs_proto[index] + def __delitem__(self, index: Union[int, str, slice]): + if isinstance(index, int): + del self._docs_proto[index] + elif isinstance(index, str): + del self._docs_map[index] + elif isinstance(index, slice): + del self._docs_proto[index] + else: + raise IndexError( + f'do not support this index type {typename(index)}: {index}' + ) + + def __eq__(self, other): + return ( + type(self._docs_proto) is type(other._docs_proto) + and self._docs_proto == other._docs_proto + ) def __len__(self): return len(self._docs_proto) - def __iter__(self) -> 'Document': + def __iter__(self) -> Iterator['Document']: from ..document import Document for d in self._docs_proto: yield Document(d) - def __getitem__(self, item): + def __contains__(self, item: str): + return item in self._docs_map + + def __getitem__(self, item: Union[int, str, slice]): from ..document import Document if isinstance(item, int): @@ -110,39 +99,33 @@ def __getitem__(self, item): elif isinstance(item, slice): return DocumentArray(self._docs_proto[item]) else: - raise IndexError(f'do not support this index {item}') + raise IndexError(f'do not support this index type {typename(item)}: {item}') - def __add__(self, other: 'DocumentArray'): - v = DocumentArray([]) + def __add__(self, other: Iterable['Document']): + v = DocumentArray() for doc in self: - v.add(doc) + v.append(doc) for doc in other: - v.add(doc) + v.append(doc) return v - def __iadd__(self, other: 'DocumentArray'): + def __iadd__(self, other: Iterable['Document']): for doc in other: - self.add(doc) + self.append(doc) return self - def append(self, doc: 
'Document') -> 'Document': + def append(self, doc: 'Document'): """ Append :param:`doc` in :class:`DocumentArray`. - :param doc: The doc needs to be appended. - :return: Appended internal list. - """ - return self._docs_proto.append(doc.proto) - def add(self, doc: 'Document') -> 'Document': - """Shortcut to :meth:`append`, do not override this method. - :param doc: the document to add to the array - :return: Appended internal list. + :param doc: The doc needs to be appended. """ - return self.append(doc) + self._docs_proto.append(doc.proto) def extend(self, iterable: Iterable['Document']) -> None: """ Extend the :class:`DocumentArray` by appending all the items from the iterable. + :param iterable: the iterable of Documents to extend this array with """ for doc in iterable: @@ -166,166 +149,56 @@ def reverse(self): elif isinstance(self._docs_proto, list): self._docs_proto.reverse() - def build(self): - """Build a doc_id to doc mapping so one can later index a Document using doc_id as string key.""" - self._docs_map = {d.id: d for d in self._docs_proto} + @cached_property + def _docs_map(self): + """Returns a doc_id to doc mapping so one can later index a Document using doc_id as string key. + + .. # noqa: DAR201""" + return {d.id: d for d in self._docs_proto} def sort(self, *args, **kwargs): """ Sort the items of the :class:`DocumentArray` in place. - - :param args: variable list of arguments to pass to the sorting underlying function + :param args: variable set of arguments to pass to the sorting underlying function :param kwargs: keyword arguments to pass to the sorting underlying function """ self._docs_proto.sort(*args, **kwargs) - @property - def all_embeddings(self) -> Tuple['np.ndarray', 'DocumentArray']: - """Return all embeddings from every document in this array as a ndarray - :return: The corresponding documents in a :class:`DocumentArray`, - and the documents have no embedding in a :class:`DocumentArray`. 
- :rtype: A tuple of embedding in :class:`np.ndarray` + def get_attributes(self, *fields: str) -> Union[List, List[List]]: + """Return all nonempty values of the fields from all docs this array contains + + :param fields: Variable length argument with the name of the fields to extract + :return: Returns a list of the values for these fields. + When `fields` has multiple values, then it returns a list of list. """ - return self.extract_docs('embedding', stack_contents=True) + return self.get_attributes_with_docs(*fields)[0] - def get_all_sparse_embeddings( - self, embedding_cls_type: EmbeddingClsType - ) -> Tuple['SparseEmbeddingType', 'DocumentArray']: - """Return all embeddings from every document in this array as a sparse array + def get_attributes_with_docs( + self, + *fields: str, + ) -> Tuple[Union[List, List[List]], 'DocumentArray']: + """Return all nonempty values of the fields together with their nonempty docs - :param embedding_cls_type: Type of sparse matrix backend, e.g. `scipy`, `torch` or `tf`. - :return: The corresponding documents in a :class:`DocumentArray`, - and the documents have no embedding in a :class:`DocumentArray`. - :rtype: A tuple of embedding and DocumentArray as sparse arrays + :param fields: Variable length argument with the name of the fields to extract + :return: Returns a tuple. The first element is a list of the values for these fields. + When `fields` has multiple values, then it returns a list of list. The second element is the non-empty docs. 
""" - def stack_embeddings(embeddings): - if embedding_cls_type.is_scipy: - import scipy - - return scipy.sparse.vstack(embeddings) - elif embedding_cls_type.is_torch: - import torch - - return torch.vstack(embeddings) - elif embedding_cls_type.is_tf: - return embeddings - else: - raise ValueError( - f'Trying to stack sparse embeddings with embedding_cls_type {embedding_cls_type} failed' - ) - - def get_sparse_ndarray_type_kwargs(): - if embedding_cls_type.is_scipy: - from jina.types.ndarray.sparse.scipy import SparseNdArray - - if not embedding_cls_type.is_scipy_stackable not in ['coo', 'csr']: - default_logger.warning( - f'found `{embedding_cls_type.name}` matrix, recommend to use `coo` or `csr` type.' - ) - return SparseNdArray, {'sp_format': embedding_cls_type.scipy_cls_type} - elif embedding_cls_type.is_torch: - from jina.types.ndarray.sparse.pytorch import SparseNdArray - - return SparseNdArray, {} - elif embedding_cls_type.is_tf: - from jina.types.ndarray.sparse.tensorflow import SparseNdArray - - return SparseNdArray, {} - else: - raise ValueError( - f'Trying to get sparse embeddings with embedding_cls_type {embedding_cls_type} failed' - ) - - embeddings = [] + contents = [] docs_pts = [] bad_docs = [] - sparse_ndarray_type, sparse_kwargs = get_sparse_ndarray_type_kwargs() + for doc in self: - embedding = doc.get_sparse_embedding( - sparse_ndarray_cls_type=sparse_ndarray_type, **sparse_kwargs - ) - if embedding is None: + r = doc.get_attributes(*fields) + if r is None: bad_docs.append(doc) continue - embeddings.append(embedding) + contents.append(r) docs_pts.append(doc) - if bad_docs: - default_logger.warning( - f'found {len(bad_docs)} docs at granularity {bad_docs[0].granularity} are missing sparse_embedding' - ) - - return stack_embeddings(embeddings), docs_pts - - @property - def all_contents(self) -> Tuple['np.ndarray', 'DocumentArray']: - """Return all embeddings from every document in this array as a ndarray - :return: The corresponding documents in a 
:class:`DocumentArray`, - and the documents have no contents in a :class:`DocumentArray`. - :rtype: A tuple of embedding in :class:`np.ndarray` - """ - # stack true for backward compatibility, but will not work if content is blob of different shapes - return self.extract_docs('content', stack_contents=True) - - def extract_docs( - self, *fields: str, stack_contents: Union[bool, List[bool]] = False - ) -> Tuple[Union['np.ndarray', List['np.ndarray']], 'DocumentArray']: - """Return in batches all the values of the fields - :param fields: Variable length argument with the name of the fields to extract - :param stack_contents: boolean flag indicating if output arrays should be stacked with `np.stack` - :return: Returns an :class:`np.ndarray` or an array of :class:`np.ndarray` with the batches for these fields - """ - - list_of_contents_output = len(fields) > 1 - contents = [[] for _ in fields if list_of_contents_output] - docs_pts = [] - bad_docs = [] - - if list_of_contents_output: - for doc in self: - content = doc.get_attrs_values(*fields) - if content is None: - bad_docs.append(doc) - continue - for i, c in enumerate(content): - contents[i].append(c) - docs_pts.append(doc) - for idx, c in enumerate(contents): - if not c: - continue - if ( - isinstance(stack_contents, bool) - and stack_contents - and not isinstance(c[0], bytes) - ) or ( - isinstance(stack_contents, list) - and stack_contents[idx] - and not isinstance(c[0], bytes) - ): - contents[idx] = np.stack(c) - else: - for doc in self: - content = doc.get_attrs_values(*fields)[0] - if content is None: - bad_docs.append(doc) - continue - contents.append(content) - docs_pts.append(doc) - - if not contents: - contents = None - elif ( - isinstance(stack_contents, bool) - and stack_contents - and not isinstance(contents[0], bytes) - ) or ( - isinstance(stack_contents, list) - and stack_contents[0] - and not isinstance(contents[0], bytes) - ): - contents = np.stack(contents) + if len(fields) > 1: + contents = 
list(map(list, zip(*contents))) if bad_docs: default_logger.warning( @@ -340,24 +213,25 @@ def extract_docs( def __bool__(self): """To simulate ```l = []; if l: ...``` + :return: returns true if the length of the array is larger than 0 """ return len(self) > 0 - def new(self) -> 'Document': - """Create a new empty document appended to the end of the array. - :return: a new Document appended to the internal list - """ + def __str__(self): from ..document import Document - return self.append(Document()) + if hasattr(self._docs_proto, '__len__'): + content = f'{self.__class__.__name__} has {len(self._docs_proto)} items' - def __str__(self): - from ..document import Document + if len(self._docs_proto) > 3: + content += ' (showing first three)' + else: + content = 'unknown length array' + + content += ':\n' + content += ',\n'.join(str(Document(d)) for d in self._docs_proto[:3]) - content = ',\n'.join(str(Document(d)) for d in self._docs_proto[:3]) - if len(self._docs_proto) > 3: - content += f'in total {len(self._docs_proto)} items' return content def __repr__(self): @@ -367,3 +241,40 @@ def __repr__(self): content += f' at {id(self)}' content = content.strip() return f'<{typename(self)} {content}>' + + def save(self, file: Union[str, TextIO]) -> None: + """Save array elements into a JSON file. + + :param file: File or filename to which the data is saved. + """ + if hasattr(file, 'write'): + file_ctx = nullcontext(file) + else: + file_ctx = open(file, 'w') + + with file_ctx as fp: + for d in self: + json.dump(d.dict(), fp) + fp.write('\n') + + @staticmethod + def load(file: Union[str, TextIO]) -> 'DocumentArray': + """Load array elements from a JSON file. + + :param file: File or filename from which the data is loaded. 
+ + :return: a DocumentArray object + """ + + if hasattr(file, 'read'): + file_ctx = nullcontext(file) + else: + file_ctx = open(file) + + from jina import Document + + da = DocumentArray() + with file_ctx as fp: + for v in fp: + da.append(Document(v)) + return da diff --git a/jina/types/arrays/match.py b/jina/types/arrays/match.py index 26d2badc0922b..6d70dd21e476d 100644 --- a/jina/types/arrays/match.py +++ b/jina/types/arrays/match.py @@ -9,15 +9,13 @@ class MatchArray(DocumentArray): :class:`MatchArray` inherits from :class:`DocumentArray`. It's a subset of Documents that represents the matches - :param docs_proto: List of matches of the `reference_doc` - :type docs_proto: :class:`Document` + :param docs_proto: Set of matches of the `reference_doc` :param reference_doc: Reference :class:`Document` for the sub-documents - :type reference_doc: :class:`Document` """ def __init__(self, docs_proto, reference_doc: 'Document'): - super().__init__(docs_proto) self._ref_doc = reference_doc + super().__init__(docs_proto) def append(self, document: 'Document', **kwargs) -> 'Document': """Add a matched document to the current Document. 
@@ -30,15 +28,14 @@ def append(self, document: 'Document', **kwargs) -> 'Document': """ from ..document import Document - m = self._docs_proto.add() - m.CopyFrom(document.proto) - match = Document(m) + match = Document(document, copy=True) - match.set_attrs( + match.set_attributes( granularity=self.granularity, adjacency=self.adjacency, **kwargs ) match.score.ref_id = self._ref_doc.id + super().append(match) return match @property diff --git a/jina/types/arrays/querylang.py b/jina/types/arrays/querylang.py deleted file mode 100644 index 9715d7ec5f2c3..0000000000000 --- a/jina/types/arrays/querylang.py +++ /dev/null @@ -1,122 +0,0 @@ -from collections.abc import MutableSequence -from typing import Iterable, Union, Dict - - -try: - # when protobuf using Cpp backend - from google.protobuf.pyext._message import ( - RepeatedCompositeContainer as RepeatedContainer, - ) -except: - # when protobuf using Python backend - from google.protobuf.internal.containers import ( - RepeatedCompositeFieldContainer as RepeatedContainer, - ) - - -from ..querylang import QueryLang -from ...helper import typename -from ...proto.jina_pb2 import QueryLangProto - -AcceptQueryLangType = Union[QueryLang, QueryLangProto, Dict] - -__all__ = ['QueryLangArray', 'AcceptQueryLangType'] - - -class QueryLangArray(MutableSequence): - """ - :class:`QueryLangArray` is a mutable sequence of :class:`QueryLang`. - It gives an efficient view of an array of Documents. One can iterate over it like - a generator but ALSO modify it, count it, get item. - - :param querylang_protos: A list of :class:`QueryLangProto` - :type querylang_protos: :class:`RepeatedCompositeContainer` - """ - - def __init__(self, querylang_protos: 'RepeatedContainer'): - super().__init__() - self._querylangs_proto = querylang_protos - self._querylangs_map = {} - - def insert(self, index: int, ql: 'QueryLang') -> None: - """ - Insert :param:`ql` at :param:`index` into `_querylangs_proto`. - - :param index: Position of the insertion. 
- :param ql: The querylang to be inserted. - """ - - self._querylangs_proto.insert(index, ql.proto) - - def __setitem__(self, key, value: 'QueryLang'): - if isinstance(key, int): - self._querylangs_proto[key].CopyFrom(value.proto) - elif isinstance(key, str): - self._querylangs_map[key].CopyFrom(value.proto) - else: - raise IndexError(f'do not support this index {key}') - - def __delitem__(self, index): - del self._querylangs_proto[index] - - def __len__(self): - return len(self._querylangs_proto) - - def __iter__(self): - for d in self._querylangs_proto: - yield QueryLang(d) - - def __getitem__(self, item): - if isinstance(item, int): - return QueryLang(self._querylangs_proto[item]) - elif isinstance(item, str): - return QueryLang(self._querylangs_map[item]) - else: - raise IndexError(f'do not support this index {item}') - - def append(self, value: 'AcceptQueryLangType'): - """ - Append :param:`value` in `_querylangs_proto`. - - :param value: The value to be appended. - """ - - q_pb = self._querylangs_proto.add() - if isinstance(value, Dict): - q_pb.CopyFrom(QueryLang(value).proto) - elif isinstance(value, QueryLangProto): - q_pb.CopyFrom(value) - elif isinstance(value, QueryLang): - q_pb.CopyFrom(value.proto) - else: - raise TypeError(f'unknown type {typename(value)}') - - def extend(self, iterable: Iterable[AcceptQueryLangType]) -> None: - """ - Extend an iterable to :class:QueryLangArray. 
- - :param iterable: the iterable of AcceptQueryLangType to extend this array with - """ - - for q in iterable: - self.append(q) - - def clear(self): - """Clear `_querylangs_proto` list.""" - del self._querylangs_proto[:] - - def reverse(self): - """Reverse order of `_querylangs_proto` list.""" - size = len(self._querylangs_proto) - hi_idx = size - 1 - for i in range(int(size / 2)): - tmp = QueryLangProto() - tmp.CopyFrom(self._querylangs_proto[hi_idx]) - self._querylangs_proto[hi_idx].CopyFrom(self._querylangs_proto[i]) - self._querylangs_proto[i].CopyFrom(tmp) - hi_idx -= 1 - - def build(self): - """Build a name to QueryLang mapping so one can later index a QueryLang using name as string key.""" - # TODO This is a temp fix, QueryLangProto do not have an id field. - self._querylangs_map = {q.name: q for q in self._querylangs_proto} diff --git a/jina/types/arrays/traversable.py b/jina/types/arrays/traversable.py index 278d193652ac6..c157cf0156656 100644 --- a/jina/types/arrays/traversable.py +++ b/jina/types/arrays/traversable.py @@ -1,8 +1,5 @@ -from typing import Iterable import itertools - -if False: - from ..document.traversable import Traversable +from typing import Iterable class TraversableSequence: @@ -10,9 +7,6 @@ class TraversableSequence: A mixin used for traversing a `Sequence[Traversable]`. 
""" - def __iter__(self) -> Iterable['Traversable']: - raise NotImplementedError - def traverse( self, traversal_paths: Iterable[str] ) -> Iterable['TraversableSequence']: @@ -50,10 +44,14 @@ def _traverse(docs: 'TraversableSequence', path: str): elif loc == 'c': for d in docs: yield from TraversableSequence._traverse(d.chunks, path[1:]) + else: + raise ValueError( + f'`path`:{loc} is invalid, must be one of `c`, `r`, `m`' + ) else: yield docs - def traverse_flattened_per_path( + def traverse_flat_per_path( self, traversal_paths: Iterable[str] ) -> Iterable['TraversableSequence']: """ @@ -66,7 +64,7 @@ def traverse_flattened_per_path( for p in traversal_paths: yield self._flatten(self._traverse(self, p)) - def traverse_flatten(self, traversal_paths: Iterable[str]) -> 'TraversableSequence': + def traverse_flat(self, traversal_paths: Iterable[str]) -> 'TraversableSequence': """ Returns a single flattened :class:``TraversableSequence`` with all Documents, that are reached via the :param:``traversal_paths``. @@ -77,7 +75,7 @@ def traverse_flatten(self, traversal_paths: Iterable[str]) -> 'TraversableSequen behavior then in :method:``traverse`` and :method:``traverse_flattened_per_path``! :param traversal_paths: a list of string that represents the traversal path - :return: a singel :class:``TraversableSequence`` containing the document of all leaves when applying the traversal_paths. + :return: a single :class:``TraversableSequence`` containing the document of all leaves when applying the traversal_paths. 
""" leaves = self.traverse(traversal_paths) return self._flatten(leaves) diff --git a/jina/types/document/__init__.py b/jina/types/document/__init__.py index e627253f82d9b..56cba3ab94ab4 100644 --- a/jina/types/document/__init__.py +++ b/jina/types/document/__init__.py @@ -1,32 +1,46 @@ import base64 +import csv +import glob import io +import itertools as it import json import mimetypes import os +import random import urllib.parse import urllib.request import warnings from hashlib import blake2b -from typing import Union, Dict, Optional, TypeVar, Any, Tuple, List, Type +from typing import ( + Iterable, + Generator, + Union, + Dict, + Optional, + TypeVar, + Any, + Tuple, + List, + Type, +) import numpy as np from google.protobuf import json_format from google.protobuf.field_mask_pb2 import FieldMask from .converters import png_to_buffer, to_datauri, guess_mime, to_image_blob -from .traversable import Traversable +from ..arrays.chunk import ChunkArray +from ..arrays.match import MatchArray from ..mixin import ProtoTypeMixin from ..ndarray.generic import NdArray, BaseSparseNdArray -from ..querylang.queryset.dunderkey import dunder_get from ..score import NamedScore -from ..arrays.chunk import ChunkArray -from ..arrays.match import MatchArray from ...excepts import BadDocType from ...helper import ( is_url, typename, random_identity, download_mermaid_url, + dunder_get, ) from ...importer import ImportExtensions from ...logging import default_logger @@ -36,14 +50,12 @@ from scipy.sparse import coo_matrix # fix type-hint complain for sphinx and flake - from typing import TypeVar - import numpy as np import scipy import tensorflow as tf import torch EmbeddingType = TypeVar( - 'EncodingType', + 'EmbeddingType', np.ndarray, scipy.sparse.csr_matrix, scipy.sparse.coo_matrix, @@ -72,15 +84,10 @@ 'DocumentSourceType', jina_pb2.DocumentProto, bytes, str, Dict ) -_document_fields = set( - list(jina_pb2.DocumentProto().DESCRIPTOR.fields_by_camelcase_name) - + 
list(jina_pb2.DocumentProto().DESCRIPTOR.fields_by_name) -) - _all_mime_types = set(mimetypes.types_map.values()) -class Document(ProtoTypeMixin, Traversable): +class Document(ProtoTypeMixin): """ :class:`Document` is one of the **primitive data type** in Jina. @@ -194,17 +201,37 @@ def __init__( } user_fields = set(document.keys()) - if _document_fields.issuperset(user_fields): + support_fields = set( + self.attributes( + include_proto_fields_camelcase=True, include_properties=False + ) + ) + + if support_fields.issuperset(user_fields): json_format.ParseDict(document, self._pb_body) else: - _intersect = _document_fields.intersection(user_fields) + _intersect = support_fields.intersection(user_fields) _remainder = user_fields.difference(_intersect) if _intersect: json_format.ParseDict( {k: document[k] for k in _intersect}, self._pb_body ) if _remainder: - self._pb_body.tags.update({k: document[k] for k in _remainder}) + support_prop = set( + self.attributes( + include_proto_fields=False, include_properties=True + ) + ) + _intersect2 = support_prop.intersection(_remainder) + _remainder2 = _remainder.difference(_intersect2) + + if _intersect2: + self.set_attributes(**{p: document[p] for p in _intersect2}) + + if _remainder2: + self._pb_body.tags.update( + {k: document[k] for k in _remainder} + ) elif isinstance(document, bytes): # directly parsing from binary string gives large false-positive # fortunately protobuf throws a warning when the parsing seems go wrong @@ -238,9 +265,21 @@ def __init__( if self._pb_body.id is None or not self._pb_body.id: self.id = random_identity(use_uuid1=True) - self.set_attrs(**kwargs) + self.set_attributes(**kwargs) self._mermaid_id = random_identity() #: for mermaid visualize id + def pop(self, *fields) -> None: + """Remove the values from the given fields of this Document. 
+ + :param fields: field names + """ + for k in fields: + self._pb_body.ClearField(k) + + def clear(self) -> None: + """Remove all values from all fields of this Document. """ + self._pb_body.Clear() + @property def siblings(self) -> int: """ @@ -623,26 +662,37 @@ def _update_ndarray(self, k, v): def matches(self) -> 'MatchArray': """Get all matches of the current document. - :return: the set of matches attached to this document + :return: the array of matches attached to this document """ return MatchArray(self._pb_body.matches, reference_doc=self) + @matches.setter + def matches(self, value: Iterable['Document']): + """Get all chunks of the current document. + + :param value: value to set + """ + self.pop('matches') + self.matches.extend(value) + @property def chunks(self) -> 'ChunkArray': """Get all chunks of the current document. - :return: the set of chunks of this document + :return: the array of chunks of this document """ return ChunkArray(self._pb_body.chunks, reference_doc=self) - def __getattr__(self, item): - if hasattr(self._pb_body, item): - value = getattr(self._pb_body, item) - else: - value = dunder_get(self._pb_body, item) - return value + @chunks.setter + def chunks(self, value: Iterable['Document']): + """Get all chunks of the current document. - def set_attrs(self, **kwargs): + :param value: the array of chunks of this document + """ + self.pop('chunks') + self.chunks.extend(value) + + def set_attributes(self, **kwargs): """Bulk update Document fields with key-value specified in kwargs .. seealso:: @@ -676,52 +726,7 @@ def set_attrs(self, **kwargs): else: raise AttributeError(f'{k} is not recognized') - def get_attrs(self, *args) -> Dict[str, Any]: - """Bulk fetch Document fields and return a dict of the key-value pairs - - .. seealso:: - :meth:`update` for bulk set/update attributes - - .. note:: - Arguments will be extracted using `dunder_get` - .. highlight:: python - .. 
code-block:: python - - d = Document({'id': '123', 'hello': 'world', 'tags': {'id': 'external_id', 'good': 'bye'}}) - - assert d.id == '123' # true - assert d.tags['hello'] == 'world' # true - assert d.tags['good'] == 'bye' # true - assert d.tags['id'] == 'external_id' # true - - res = d.get_attrs(*['id', 'tags__hello', 'tags__good', 'tags__id']) - - assert res['id'] == '123' # true - assert res['tags__hello'] == 'world' # true - assert res['tags__good'] == 'bye' # true - assert res['tags__id'] == 'external_id' # true - - :param args: the variable length values to extract from the document - :return: a dictionary mapping the fields in `:param:args` to the actual attributes of this document - """ - - ret = {} - for k in args: - try: - value = getattr(self, k) - - if value is None: - raise ValueError - - ret[k] = value - except (AttributeError, ValueError): - default_logger.warning( - f'Could not get attribute `{typename(self)}.{k}`, returning `None`' - ) - ret[k] = None - return ret - - def get_attrs_values(self, *args) -> List[Any]: + def get_attributes(self, *fields: str) -> Union[Any, List[Any]]: """Bulk fetch Document fields and return a list of the values of these fields .. 
note:: @@ -740,12 +745,12 @@ def get_attrs_values(self, *args) -> List[Any]: assert res == ['123', 'world', 'bye', 'external_id'] - :param args: the variable length values to extract from the document + :param fields: the variable length values to extract from the document :return: a list with the attributes of this document ordered as the args """ ret = [] - for k in args: + for k in fields: try: value = getattr(self, k) @@ -759,6 +764,10 @@ def get_attrs_values(self, *args) -> List[Any]: ) ret.append(None) + # unboxing if args is single + if len(fields) == 1: + ret = ret[0] + return ret @property @@ -828,17 +837,8 @@ def uri(self, value: str): :param value: acceptable URI/URL, raise ``ValueError`` when it is not a valid URI """ - scheme = urllib.parse.urlparse(value).scheme - if ( - (scheme in {'http', 'https'} and is_url(value)) - or (scheme in {'data'}) - or os.path.exists(value) - or os.access(os.path.dirname(value), os.W_OK) - ): - self._pb_body.uri = value - self.mime_type = guess_mime(value) - else: - raise ValueError(f'{value} is not a valid URI') + self._pb_body.uri = value + self.mime_type = guess_mime(value) @property def mime_type(self) -> str: @@ -905,13 +905,9 @@ def content(self, value: DocumentContentType): if isinstance(value, bytes): self.buffer = value elif isinstance(value, str): - # TODO(Han): this implicit fallback is too much but that's - # how the original _generate function implement. And a lot of - # tests depend on this logic. 
Stay in this - # way to keep all tests passing until I got time to refactor this part - try: + if _is_uri(value): self.uri = value - except ValueError: + else: self.text = value elif isinstance(value, np.ndarray): self.blob = value @@ -972,67 +968,67 @@ def score(self, value: Union[jina_pb2.NamedScoreProto, NamedScore]): else: raise TypeError(f'score is in unsupported type {typename(value)}') - def convert_buffer_to_blob(self, **kwargs): - """Assuming the :attr:`buffer` is a _valid_ buffer of Numpy ndarray, - set :attr:`blob` accordingly. - - :param kwargs: reserved for maximum compatibility when using with ConvertDriver - - .. note:: - One can only recover values not shape information from pure buffer. - """ - self.blob = np.frombuffer(self.buffer) - - def convert_buffer_image_to_blob(self, color_axis: int = -1, **kwargs): + def convert_image_buffer_to_blob(self, color_axis: int = -1): """Convert an image buffer to blob :param color_axis: the axis id of the color channel, ``-1`` indicates the color channel info at the last axis - :param kwargs: reserved for maximum compatibility when using with ConvertDriver """ self.blob = to_image_blob(io.BytesIO(self.buffer), color_axis) - def convert_blob_to_uri( - self, width: int, height: int, resize_method: str = 'BILINEAR', **kwargs + def convert_image_blob_to_uri( + self, width: int, height: int, resize_method: str = 'BILINEAR' ): """Assuming :attr:`blob` is a _valid_ image, set :attr:`uri` accordingly :param width: the width of the blob :param height: the height of the blob :param resize_method: the resize method name - :param kwargs: reserved for maximum compatibility when using with ConvertDriver """ png_bytes = png_to_buffer(self.blob, width, height, resize_method) self.uri = 'data:image/png;base64,' + base64.b64encode(png_bytes).decode() - def convert_uri_to_blob( - self, color_axis: int = -1, uri_prefix: Optional[str] = None, **kwargs + def convert_image_uri_to_blob( + self, color_axis: int = -1, uri_prefix: 
Optional[str] = None ): """Convert uri to blob :param color_axis: the axis id of the color channel, ``-1`` indicates the color channel info at the last axis :param uri_prefix: the prefix of the uri - :param kwargs: reserved for maximum compatibility when using with ConvertDriver """ self.blob = to_image_blob( (uri_prefix + self.uri) if uri_prefix else self.uri, color_axis ) - def convert_data_uri_to_blob(self, color_axis: int = -1, **kwargs): + def convert_image_datauri_to_blob(self, color_axis: int = -1): """Convert data URI to image blob :param color_axis: the axis id of the color channel, ``-1`` indicates the color channel info at the last axis - :param kwargs: reserved for maximum compatibility when using with ConvertDriver """ req = urllib.request.Request(self.uri, headers={'User-Agent': 'Mozilla/5.0'}) with urllib.request.urlopen(req) as fp: buffer = fp.read() self.blob = to_image_blob(io.BytesIO(buffer), color_axis) - def convert_uri_to_buffer(self, **kwargs): + def convert_buffer_to_blob(self, dtype=None, count=-1, offset=0): + """Assuming the :attr:`buffer` is a _valid_ buffer of Numpy ndarray, + set :attr:`blob` accordingly. + + :param dtype: Data-type of the returned array; default: float. + :param count: Number of items to read. ``-1`` means all data in the buffer. + :param offset: Start reading the buffer from this offset (in bytes); default: 0. + + .. note:: + One can only recover values not shape information from pure buffer. + """ + self.blob = np.frombuffer(self.buffer, dtype, count, offset) + + def convert_blob_to_buffer(self): + """Convert blob to buffer """ + self.buffer = self.blob.tobytes() + + def convert_uri_to_buffer(self): """Convert uri to buffer Internally it downloads from the URI and set :attr:`buffer`. 
- :param kwargs: reserved for maximum compatibility when using with ConvertDriver - """ if urllib.parse.urlparse(self.uri).scheme in {'http', 'https', 'data'}: req = urllib.request.Request( @@ -1046,22 +1042,20 @@ def convert_uri_to_buffer(self, **kwargs): else: raise FileNotFoundError(f'{self.uri} is not a URL or a valid local path') - def convert_uri_to_data_uri( - self, charset: str = 'utf-8', base64: bool = False, **kwargs - ): + def convert_uri_to_datauri(self, charset: str = 'utf-8', base64: bool = False): """Convert uri to data uri. Internally it reads uri into buffer and convert it to data uri :param charset: charset may be any character set registered with IANA :param base64: used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8 bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters. - :param kwargs: reserved for maximum compatibility when using with ConvertDriver """ - self.convert_uri_to_buffer() - self.uri = to_datauri(self.mime_type, self.buffer, charset, base64, binary=True) + if not _is_datauri(self.uri): + self.convert_uri_to_buffer() + self.uri = to_datauri( + self.mime_type, self.buffer, charset, base64, binary=True + ) - def convert_buffer_to_uri( - self, charset: str = 'utf-8', base64: bool = False, **kwargs - ): + def convert_buffer_to_uri(self, charset: str = 'utf-8', base64: bool = False): """Convert buffer to data uri. Internally it first reads into buffer and then converts it to data URI. @@ -1069,7 +1063,6 @@ def convert_buffer_to_uri( :param base64: used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8 bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters. 
- :param kwargs: reserved for maximum compatibility when using with ConvertDriver """ if not self.mime_type: @@ -1079,33 +1072,24 @@ def convert_buffer_to_uri( self.uri = to_datauri(self.mime_type, self.buffer, charset, base64, binary=True) - def convert_text_to_uri( - self, charset: str = 'utf-8', base64: bool = False, **kwargs - ): + def convert_text_to_uri(self, charset: str = 'utf-8', base64: bool = False): """Convert text to data uri. :param charset: charset may be any character set registered with IANA :param base64: used to encode arbitrary octet sequences into a form that satisfies the rules of 7bit. Designed to be efficient for non-text 8 bit and binary data. Sometimes used for text data that frequently uses non-US-ASCII characters. - :param kwargs: reserved for maximum compatibility when using with ConvertDriver """ self.uri = to_datauri(self.mime_type, self.text, charset, base64, binary=False) - def convert_uri_to_text(self, **kwargs): - """Assuming URI is text, convert it to text - - :param kwargs: reserved for maximum compatibility when using with ConvertDriver - """ + def convert_uri_to_text(self): + """Assuming URI is text, convert it to text""" self.convert_uri_to_buffer() self.text = self.buffer.decode() - def convert_content_to_uri(self, **kwargs): - """Convert content in URI with best effort - - :param kwargs: reserved for maximum compatibility when using with ConvertDriver - """ + def convert_content_to_uri(self): + """Convert content in URI with best effort""" if self.text: self.convert_text_to_uri() elif self.buffer: @@ -1114,14 +1098,14 @@ def convert_content_to_uri(self, **kwargs): raise NotImplementedError def MergeFrom(self, doc: 'Document'): - """Merge the content of target :param:doc into current document. + """Merge the content of target :param doc: the document to merge from """ self._pb_body.MergeFrom(doc.proto) def CopyFrom(self, doc: 'Document'): - """Copy the content of target :param:doc into current document. 
+ """Copy the content of target :param doc: the document to copy from """ @@ -1173,10 +1157,10 @@ def _mermaid_to_url(self, img_type: str) -> str: mermaid_str = ( """ - %%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#FFC666'}}}%% - classDiagram - - """ + %%{init: {'theme': 'base', 'themeVariables': { 'primaryColor': '#FFC666'}}}%% + classDiagram + + """ + self.__mermaid_str__() ) @@ -1229,28 +1213,248 @@ def non_empty_fields(self) -> Tuple[str]: """ return tuple(field[0].name for field in self.ListFields()) - @property - def raw(self) -> 'Document': - """Return self as a document object. - - :return: this Document - """ - return self - @staticmethod - def get_all_attributes() -> List[str]: + def attributes( + include_proto_fields: bool = True, + include_proto_fields_camelcase: bool = False, + include_properties: bool = False, + ) -> List[str]: """Return all attributes supported by the Document, which can be accessed by ``doc.attribute`` + :param include_proto_fields: if set, then include all protobuf fields + :param include_proto_fields_camelcase: if set, then include all protobuf fields in CamelCase + :param include_properties: if set, then include all properties defined for Document class :return: a list of attributes in string. 
""" import inspect - support_keys = list(jina_pb2.DocumentProto().DESCRIPTOR.fields_by_name) + support_keys = [] - support_keys += [ - name - for (name, value) in inspect.getmembers( - Document, lambda x: isinstance(x, property) + if include_proto_fields: + support_keys = list(jina_pb2.DocumentProto().DESCRIPTOR.fields_by_name) + if include_proto_fields_camelcase: + support_keys += list( + jina_pb2.DocumentProto().DESCRIPTOR.fields_by_camelcase_name ) - ] + + if include_properties: + support_keys += [ + name + for (name, value) in inspect.getmembers( + Document, lambda x: isinstance(x, property) + ) + ] return list(set(support_keys)) + + @staticmethod + def from_lines( + lines: Optional[Iterable[str]] = None, + filepath: Optional[str] = None, + read_mode: str = 'r', + line_format: str = 'json', + field_resolver: Optional[Dict[str, str]] = None, + size: Optional[int] = None, + sampling_rate: Optional[float] = None, + ) -> Generator['Document', None, None]: + """Generator function for lines, json and csv. Yields documents or strings. + + :param lines: a list of strings, each is considered as a document + :param filepath: a text file that each line contains a document + :param read_mode: specifies the mode in which the file + is opened. 'r' for reading in text mode, 'rb' for reading in binary + :param line_format: the format of each line ``json`` or ``csv`` + :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field + names defined in Protobuf. This is only used when the given ``document`` is + a JSON string or a Python dict. 
+ :param size: the maximum number of the documents + :param sampling_rate: the sampling rate between [0, 1] + :yield: documents + + """ + if filepath: + file_type = os.path.splitext(filepath)[1] + with open(filepath, read_mode) as f: + if file_type in _jsonl_ext: + yield from Document.from_ndjson(f) + elif file_type in _csv_ext: + yield from Document.from_csv(f, field_resolver, size, sampling_rate) + else: + yield from _subsample(f, size, sampling_rate) + elif lines: + if line_format == 'json': + yield from Document.from_ndjson(lines) + elif line_format == 'csv': + yield from Document.from_csv(lines, field_resolver, size, sampling_rate) + else: + yield from _subsample(lines, size, sampling_rate) + else: + raise ValueError('"filepath" and "lines" can not be both empty') + + @staticmethod + def from_ndjson( + fp: Iterable[str], + field_resolver: Optional[Dict[str, str]] = None, + size: Optional[int] = None, + sampling_rate: Optional[float] = None, + ) -> Generator['Document', None, None]: + """Generator function for line separated JSON. Yields documents. + + :param fp: file paths + :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field + names defined in Protobuf. This is only used when the given ``document`` is + a JSON string or a Python dict. + :param size: the maximum number of the documents + :param sampling_rate: the sampling rate between [0, 1] + :yield: documents + + """ + for line in _subsample(fp, size, sampling_rate): + value = json.loads(line) + if 'groundtruth' in value and 'document' in value: + yield Document(value['document'], field_resolver), Document( + value['groundtruth'], field_resolver + ) + else: + yield Document(value, field_resolver) + + @staticmethod + def from_csv( + fp: Iterable[str], + field_resolver: Optional[Dict[str, str]] = None, + size: Optional[int] = None, + sampling_rate: Optional[float] = None, + ) -> Generator['Document', None, None]: + """Generator function for CSV. Yields documents. 
+ + :param fp: file paths + :param field_resolver: a map from field names defined in ``document`` (JSON, dict) to the field + names defined in Protobuf. This is only used when the given ``document`` is + a JSON string or a Python dict. + :param size: the maximum number of the documents + :param sampling_rate: the sampling rate between [0, 1] + :yield: documents + + """ + lines = csv.DictReader(fp) + for value in _subsample(lines, size, sampling_rate): + if 'groundtruth' in value and 'document' in value: + yield Document(value['document'], field_resolver), Document( + value['groundtruth'], field_resolver + ) + else: + yield Document(value, field_resolver) + + @staticmethod + def from_files( + patterns: Union[str, List[str]], + recursive: bool = True, + size: Optional[int] = None, + sampling_rate: Optional[float] = None, + read_mode: Optional[str] = None, + ) -> Generator['Document', None, None]: + """Creates an iterator over a list of file path or the content of the files. + + :param patterns: The pattern may contain simple shell-style wildcards, e.g. '\*.py', '[\*.zip, \*.gz]' + :param recursive: If recursive is true, the pattern '**' will match any files + and zero or more directories and subdirectories + :param size: the maximum number of the files + :param sampling_rate: the sampling rate between [0, 1] + :param read_mode: specifies the mode in which the file is opened. + 'r' for reading in text mode, 'rb' for reading in binary mode. + If `read_mode` is None, will iterate over filenames. + :yield: file paths or binary content + + .. 
note:: + This function should not be directly used, use :meth:`Flow.index_files`, :meth:`Flow.search_files` instead + """ + if read_mode not in {'r', 'rb', None}: + raise RuntimeError( + f'read_mode should be "r", "rb" or None, got {read_mode}' + ) + + def _iter_file_exts(ps): + return it.chain.from_iterable( + glob.iglob(p, recursive=recursive) for p in ps + ) + + d = 0 + if isinstance(patterns, str): + patterns = [patterns] + for g in _iter_file_exts(patterns): + if sampling_rate is None or random.random() < sampling_rate: + if read_mode is None: + yield Document(uri=g) + elif read_mode in {'r', 'rb'}: + with open(g, read_mode) as fp: + yield Document(content=fp.read()) + d += 1 + if size is not None and d > size: + break + + @staticmethod + def from_ndarray( + array: 'np.ndarray', + axis: int = 0, + size: Optional[int] = None, + shuffle: bool = False, + ) -> Generator['Document', None, None]: + """Create a generator for a given dimension of a numpy array. + + :param array: the numpy ndarray data source + :param axis: iterate over that axis + :param size: the maximum number of the sub arrays + :param shuffle: shuffle the numpy data source beforehand + :yield: ndarray + + .. 
note:: + This function should not be directly used, use :meth:`Flow.index_ndarray`, :meth:`Flow.search_ndarray` instead + """ + + if shuffle: + # shuffle for random query + array = np.take(array, np.random.permutation(array.shape[0]), axis=axis) + d = 0 + for r in array: + yield Document(content=r) + d += 1 + if size is not None and d >= size: + break + + def __getattr__(self, item): + if hasattr(self._pb_body, item): + value = getattr(self._pb_body, item) + else: + value = dunder_get(self._pb_body, item) + return value + + +# https://github.com/ndjson/ndjson.github.io/issues/1#issuecomment-109935996 +_jsonl_ext = {'.jsonlines', '.ndjson', '.jsonl', '.jl', '.ldjson'} +_csv_ext = {'.csv', '.tcsv'} + + +def _sample(iterable, sampling_rate: Optional[float] = None): + for i in iterable: + if sampling_rate is None or random.random() < sampling_rate: + yield i + + +def _subsample( + iterable, size: Optional[int] = None, sampling_rate: Optional[float] = None +): + yield from it.islice(_sample(iterable, sampling_rate), size) + + +def _is_uri(value: str) -> bool: + scheme = urllib.parse.urlparse(value).scheme + return ( + (scheme in {'http', 'https'} and is_url(value)) + or (scheme in {'data'}) + or os.path.exists(value) + or os.access(os.path.dirname(value), os.W_OK) + ) + + +def _is_datauri(value: str) -> bool: + scheme = urllib.parse.urlparse(value).scheme + return is_url(value) and scheme in {'data'} diff --git a/jina/types/document/helper.py b/jina/types/document/helper.py index 062ff9b421b69..628e4f33e0364 100644 --- a/jina/types/document/helper.py +++ b/jina/types/document/helper.py @@ -1,12 +1,12 @@ from typing import Iterable -from . import Document -from .traversable import Traversable - __all__ = ['DocGroundtruthPair'] +if False: + from . import Document + -class DocGroundtruthPair(Traversable): +class DocGroundtruthPair: """ Helper class to expose common interface to the traversal logic of the BaseExecutable Driver. 
It is important to note that it checks the matching structure of `docs` and `groundtruths`. It is important while @@ -18,20 +18,30 @@ class DocGroundtruthPair(Traversable): """ def __init__(self, doc: 'Document', groundtruth: 'Document'): - """Set constructor method.""" + """Set constructor method. + + :param doc: actual Document + :param groundtruth: groundtruth Document + """ self.doc = doc self.groundtruth = groundtruth @property def matches(self) -> Iterable['DocGroundtruthPair']: - """Get the pairs between matches and Groundtruth.""" + """Get the pairs between matches and Groundtruth. + + :yields: DocGroundtruthPair object + """ assert len(self.doc.matches) == len(self.groundtruth.matches) for doc, groundtruth in zip(self.doc.matches, self.groundtruth.matches): yield DocGroundtruthPair(doc, groundtruth) @property def chunks(self) -> Iterable['DocGroundtruthPair']: - """Get the pairs between chunks and Groundtruth.""" + """Get the pairs between chunks and Groundtruth. + + :yields: DocGroundtruthPair object + """ assert len(self.doc.chunks) == len(self.groundtruth.chunks) for doc, groundtruth in zip(self.doc.chunks, self.groundtruth.chunks): yield DocGroundtruthPair(doc, groundtruth) diff --git a/jina/types/document/multimodal.py b/jina/types/document/multimodal.py index 77d09f3d29d36..8612d0a78fc40 100644 --- a/jina/types/document/multimodal.py +++ b/jina/types/document/multimodal.py @@ -119,7 +119,7 @@ def modality_content_map(self, value: Dict[str, Any]): with Document() as chunk: chunk.modality = modality chunk.content = content - self.chunks.add(chunk) + self.chunks.append(chunk) def __getitem__(self, modality: str) -> DocumentContentType: """Extract content by the name of the modality. 
diff --git a/jina/types/document/traversable.py b/jina/types/document/traversable.py deleted file mode 100644 index 41912870255ea..0000000000000 --- a/jina/types/document/traversable.py +++ /dev/null @@ -1,21 +0,0 @@ -from typing import Iterable - - -class Traversable: - """ - Helper class to expose common interface to the traversal logic of the BaseExecutable Driver. - It is important to note that it checks the matching structure of `docs` and `groundtruths`. It is important while - traversing to ensure that then the driver can be applied at a comparable level of granularity and adjacency. - This does not imply that you can't compare at the end a document with 10 matches with a groundtruth with 20 matches - - :param doc: Target `Document`. - :param groundtruth: The :class:`Document` with desired state. - """ - - @property - def matches(self) -> Iterable['Traversable']: - """Get the pairs between matches and Groundtruth.""" - - @property - def chunks(self) -> Iterable['Traversable']: - """Get the pairs between chunks and Groundtruth.""" diff --git a/jina/types/message/__init__.py b/jina/types/message/__init__.py index 3d84653a985b7..f53668df16de9 100644 --- a/jina/types/message/__init__.py +++ b/jina/types/message/__init__.py @@ -67,6 +67,8 @@ def __init__( if isinstance(self.request, Request): self.request._envelope = self.envelope + self.envelope.header.CopyFrom(self.request.header) + if self.envelope.check_version: self._check_version() @@ -126,9 +128,7 @@ def is_data_request(self) -> bool: :return: boolean which states if data is requested """ - return ( - self.envelope.request_type != 'ControlRequest' or self.request.propagate - ) and self.envelope.request_type != 'DumpRequest' + return self.envelope.request_type == 'DataRequest' def _add_envelope( self, diff --git a/jina/types/message/common.py b/jina/types/message/common.py index a55fd2aa58b3d..b3b3245b299fc 100644 --- a/jina/types/message/common.py +++ b/jina/types/message/common.py @@ -14,14 +14,14 @@ 
class ControlMessage(Message): Class of the protobuf message. :param command: Command with string content. (e.g. 'IDLE', 'CANCEL', 'TERMINATE', 'STATUS') - :param pod_name: Name of the current pod. + :param pod_name: Name of the current pod, to represent routes only. :param identity: The identity of the current pod :param args: Additional positional arguments which are just used for the parent initialization :param kwargs: Additional keyword arguments which are just used for the parent initialization """ def __init__( - self, command: str, pod_name: str = 'ctl', identity: str = '', *args, **kwargs + self, command: str, pod_name: str = '', identity: str = '', *args, **kwargs ): req = Request(jina_pb2.RequestProto()) if command in _available_commands: @@ -39,5 +39,3 @@ def __init__( args = kwargs.get('args', None) if args: req.args = args - - req.targets.extend([pod_name]) diff --git a/jina/types/message/dump.py b/jina/types/message/dump.py deleted file mode 100644 index 3557828e3b54f..0000000000000 --- a/jina/types/message/dump.py +++ /dev/null @@ -1,39 +0,0 @@ -from . import Message -from ..request import Request -from ...proto import jina_pb2 - -_available_commands = dict( - jina_pb2.RequestProto.DumpRequestProto.DESCRIPTOR.enum_values_by_name -) - -__all__ = ['DumpMessage'] - - -class DumpMessage(Message): - """ - Class of the protobuf message. - - :param path: the path to which to dump - :param shards: the nr of shards to which to dump - :param pod_name: Name of the current pod. 
- :param identity: The identity of the current pod - :param args: Additional positional arguments which are just used for the parent initialization - :param kwargs: Additional keyword arguments which are just used for the parent initialization - """ - - def __init__( - self, - path: str, - shards: int, - pod_name: str = 'ctl', - identity: str = '', - *args, - **kwargs - ): - req = Request(jina_pb2.RequestProto()) - req.dump.path = path - req.dump.shards = shards - super().__init__( - None, req, pod_name=pod_name, identity=identity, *args, **kwargs - ) - req.request_type = 'dump' diff --git a/jina/types/mixin.py b/jina/types/mixin.py index 5a1cf647fc945..8eca0bd7b242a 100644 --- a/jina/types/mixin.py +++ b/jina/types/mixin.py @@ -1,4 +1,3 @@ -import pprint from typing import Dict from google.protobuf.json_format import MessageToJson, MessageToDict @@ -16,10 +15,7 @@ def json(self) -> str: :return: JSON string of the object """ return MessageToJson( - self._pb_body, - including_default_value_fields=True, - preserving_proto_field_name=True, - use_integers_for_enums=True, + self._pb_body, preserving_proto_field_name=True, sort_keys=True ) def dict(self) -> Dict: @@ -27,11 +23,12 @@ def dict(self) -> Dict: :return: dict representation of the object """ + + # NOTE: PLEASE DO NOT ADD `including_default_value_fields`, + # it makes the output very verbose! return MessageToDict( self._pb_body, - including_default_value_fields=True, preserving_proto_field_name=True, - use_integers_for_enums=True, ) @property @@ -42,7 +39,6 @@ def proto(self) -> 'jina_pb2._reflection.GeneratedProtocolMessageType': """ return self._pb_body - @property def binary_str(self) -> bytes: """Return the serialized the message to a string. @@ -50,6 +46,14 @@ def binary_str(self) -> bytes: """ return self._pb_body.SerializeToString() + @property + def nbytes(self) -> int: + """Return total bytes consumed by protobuf. 
+ + :return: number of bytes + """ + return len(self.binary_str()) + def __getattr__(self, name: str): return getattr(self._pb_body, name) diff --git a/jina/types/querylang/__init__.py b/jina/types/querylang/__init__.py deleted file mode 100644 index abd26ecf63d0d..0000000000000 --- a/jina/types/querylang/__init__.py +++ /dev/null @@ -1,118 +0,0 @@ -import warnings -from typing import TypeVar, Dict, Optional - -from google.protobuf import json_format - -from ..mixin import ProtoTypeMixin -from ...excepts import BadQueryLangType -from ...helper import typename -from ...proto import jina_pb2 - -QueryLangSourceType = TypeVar( - 'QueryLangSourceType', jina_pb2.QueryLangProto, bytes, str, Dict -) - -__all__ = ['QueryLang'] - - -class QueryLang(ProtoTypeMixin): - """ - :class:`QueryLang` is one of the **primitive data type** in Jina. - - It offers a Pythonic interface to allow users access and manipulate - :class:`jina.jina_pb2.QueryLangProto` object without working with Protobuf itself. - - To create a :class:`QueryLang` object from a Dict containing the name of a :class:`BaseDriver`, - and the parameters to override, simply: - - .. highlight:: python - .. code-block:: python - - from jina import QueryLang - ql = QueryLang({name: 'SliceQL', priority: 1, parameters: {'start': 3, 'end': 1}}) - - .. warning:: - The `BaseDriver` needs to be a `QuerySetReader` to be able to read the `QueryLang` - - One can also build a :class`QueryLang` from JSON string, bytes, dict or directly from a protobuf object. - - A :class:`QueryLang` object (no matter how it is constructed) can be converted to - protobuf object by using: - - .. highlight:: python - .. code-block:: python - - # to protobuf object - ql.as_pb_object - - :param querylang: the query language source to construct from, acceptable types include: - :class:`jina_pb2.QueryLangProto`, :class:`bytes`, :class:`str`, :class:`Dict`, Tuple. 
- :type querylang: Optional[QueryLangSourceType] - :param copy: when ``querylang`` is given as a :class:`QueryLangProto` object, build a - view (i.e. weak reference) from it or a deep copy from it. - :type copy: bool - """ - - def __init__( - self, querylang: Optional[QueryLangSourceType] = None, copy: bool = False - ): - """Set constructor method.""" - self._pb_body = jina_pb2.QueryLangProto() - try: - if isinstance(querylang, jina_pb2.QueryLangProto): - if copy: - self._pb_body.CopyFrom(querylang) - else: - self._pb_body = querylang - elif isinstance(querylang, dict): - json_format.ParseDict(querylang, self._pb_body) - elif isinstance(querylang, str): - json_format.Parse(querylang, self._pb_body) - elif isinstance(querylang, bytes): - # directly parsing from binary string gives large false-positive - # fortunately protobuf throws a warning when the parsing seems go wrong - # the context manager below converts this warning into exception and throw it - # properly - with warnings.catch_warnings(): - warnings.filterwarnings( - 'error', 'Unexpected end-group tag', category=RuntimeWarning - ) - try: - self._pb_body.ParseFromString(querylang) - except RuntimeWarning as ex: - raise BadQueryLangType( - 'fail to construct a query language' - ) from ex - elif querylang is not None: - # note ``None`` is not considered as a bad type - raise ValueError(f'{typename(querylang)} is not recognizable') - except Exception as ex: - raise BadQueryLangType('fail to construct a query language') from ex - - @property - def priority(self) -> int: - """ - Get the priority of this query language. 
- The query language only takes effect when if it has a higher priority than the internal one with the same name - """ - return self._pb_body.priority - - @priority.setter - def priority(self, value: int): - """Set the priority of this query language with :param:`value`.""" - self._pb_body.priority = value - - @property - def name(self) -> str: - """Get the name of the driver that the query language attached to.""" - return self._pb_body.name - - @name.setter - def name(self, value: str): - """ - Set the name of the driver that the query language attached to. - - :param value: Name of the driver - :type value: str - """ - self._pb_body.name = value diff --git a/jina/types/querylang/queryset/__init__.py b/jina/types/querylang/queryset/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/jina/types/querylang/queryset/dunderkey.py b/jina/types/querylang/queryset/dunderkey.py deleted file mode 100644 index 36137f0dabc4a..0000000000000 --- a/jina/types/querylang/queryset/dunderkey.py +++ /dev/null @@ -1,218 +0,0 @@ -""" - -Originally from https://github.com/naiquevin/lookupy - -The library is provided as-is under the MIT License - -Copyright (c) 2013 Vineet Naik (naikvin@gmail.com) - -Permission is hereby granted, free of charge, to any person obtaining -a copy of this software and associated documentation files (the -"Software"), to deal in the Software without restriction, including -without limitation the rights to use, copy, modify, merge, publish, -distribute, sublicense, and/or sell copies of the Software, and to -permit persons to whom the Software is furnished to do so, subject to -the following conditions: - -The above copyright notice and this permission notice shall be -included in all copies or substantial portions of the Software. 
- -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, -EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF -MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE -LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION -OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION -WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. - -""" - -## This module deals with code regarding handling the double -## underscore separated keys - - -from typing import Tuple, Dict, Any, Optional - -from google.protobuf.struct_pb2 import Struct - -from .helper import * - - -def dunderkey(*args: str) -> str: - """Produces a nested key from multiple args separated by double - underscore - - >>> dunderkey('a', 'b', 'c') - >>> 'a__b__c' - - :param args: the multiple strings - :return:: the joined key - """ - return '__'.join(args) - - -def dunder_partition(key: str) -> Tuple[str, Optional[str]]: - """Split a dunderkey into 2 parts. 
- - The first part is everything before the final double underscore - The second part is after the final double underscore - - >>> dunder_partition('a__b__c') - >>> ('a__b', 'c') - - :param key : the dunder string - :return: the two parts - - """ - part1: str - part2: Optional[str] - try: - part1, part2 = key.rsplit('__', 1) - except ValueError: - part1, part2 = key, None - return part1, part2 - - -def dunder_init(key: str) -> str: - """Returns the initial part of the dunder key - - >>> dunder_init('a__b__c') - >>> 'a__b' - - :param key : the dunder string - :return: the first part (None if invalid dunder str) - """ - return dunder_partition(key)[0] - - -def dunder_last(key: str) -> Optional[str]: - """Returns the last part of the dunder key - - >>> dunder_last('a__b__c') - >>> 'c' - - :param key : the dunder string - :return: the last part (None if invalid dunder string) - """ - return dunder_partition(key)[1] - - -def dunder_get(_dict: Any, key: str) -> Any: - """Returns value for a specified dunderkey - - A "dunderkey" is just a fieldname that may or may not contain - double underscores (dunderscores!) for referencing nested keys in - a dict. 
eg:: - - >>> data = {'a': {'b': 1}} - >>> dunder_get(data, 'a__b') - 1 - - key 'b' can be referrenced as 'a__b' - - :param _dict : (dict, list, struct or object) which we want to index into - :param key : (str) that represents a first level or nested key in the dict - :return: (mixed) value corresponding to the key - - """ - - try: - part1, part2 = key.split('__', 1) - except ValueError: - part1, part2 = key, '' - - try: - part1 = int(part1) # parse int parameter - except ValueError: - pass - - if isinstance(part1, int): - result = guard_iter(_dict)[part1] - elif isinstance(_dict, (dict, Struct)): - if part1 in _dict: - result = _dict[part1] - else: - result = None - else: - result = getattr(_dict, part1) - - return dunder_get(result, part2) if part2 else result - - -def undunder_keys(_dict: Dict) -> Dict: - """Returns dict with the dunder keys converted back to nested dicts - - eg:: - - >>> undunder_keys({'a': 'hello', 'b__c': 'world'}) - {'a': 'hello', 'b': {'c': 'world'}} - - :param _dict : (dict) flat dict - :return: (dict) nested dict - - """ - - def f(keys, value): - """ - Recursively undunder the keys. - - :param keys: keys to undunder - :param value: related value - :return: undundered keys - """ - return {keys[0]: f(keys[1:], value)} if keys else value - - def merge(dict1, dict2): - """ - Merge second dictionary into the first one. - - :param dict1: dictionary which gets modified - :param dict2: dictionary to read from - """ - key, val = list(dict2.items())[0] - - if key in dict1: - merge(dict1[key], val) - else: - dict1[key] = val - - result = {} - for k, v in _dict.items(): - merge(result, f(k.split('__'), v)) - - return result - - -def dunder_truncate(_dict: Dict) -> Dict: - """Returns dict with dunder keys truncated to only the last part - - In other words, replaces the dunder keys with just last part of - it. 
In case many identical last parts are encountered, they are - not truncated further - - eg:: - - >>> dunder_truncate({'a__p': 3, 'b__c': 'no'}) - {'c': 'no', 'p': 3} - >>> dunder_truncate({'a__p': 'yay', 'b__p': 'no', 'c__z': 'dunno'}) - {'a__p': 'yay', 'b__p': 'no', 'z': 'dunno'} - - :param _dict : (dict) to flatten - :return: (dict) flattened result - - """ - keylist = list(_dict.keys()) - - def decide_key(k, klist): - """ - Get the truncated key. - - :param k: One element of key list. - :param klist: List of current keys. - :return: Original k if truncated key is not unique else return truncated key. - """ - newkey = dunder_last(k) - return newkey if list(map(dunder_last, klist)).count(newkey) == 1 else k - - original_keys = [decide_key(key, keylist) for key in keylist] - return dict(zip(original_keys, _dict.values())) diff --git a/jina/types/querylang/queryset/helper.py b/jina/types/querylang/queryset/helper.py deleted file mode 100644 index fcb51f860f6ac..0000000000000 --- a/jina/types/querylang/queryset/helper.py +++ /dev/null @@ -1,41 +0,0 @@ -from functools import partial -from typing import Callable, List, Type, Union, Iterable - -from ....excepts import LookupyError - - -def iff(precond: Callable, val: Union[int, str], f: Callable) -> bool: - """If and only if the precond is True - - Shortcut function for precond(val) and f(val). It is mainly used - to create partial functions for commonly required preconditions - - :param precond : (function) represents the precondition - :param val: (mixed) value to which the functions are applied - :param f: (function) the actual function - :return: whether or not the cond is satisfied - """ - return False if not precond(val) else f(val) - - -iff_not_none = partial(iff, lambda x: x is not None) - - -def guard_type( - classinfo: Union[Type[str], Type[Iterable]], val: Union[str, List[int]] -) -> Union[str, List[int]]: - """ - Make sure the type of :param:`val` is :param:`classinfo`. - - :param classinfo: Guard type. 
- :param val: Target object. - :return: :param:`val` if it has correct type. - """ - if not isinstance(val, classinfo): - raise LookupyError(f'Value not a {classinfo}') - return val - - -guard_str = partial(guard_type, str) -guard_iter = partial(guard_type, Iterable) -guard_int = partial(guard_type, int) diff --git a/jina/types/querylang/queryset/lookup.py b/jina/types/querylang/queryset/lookup.py deleted file mode 100644 index 9bf3fe8f45930..0000000000000 --- a/jina/types/querylang/queryset/lookup.py +++ /dev/null @@ -1,339 +0,0 @@ -""" - -Originally from https://github.com/naiquevin/lookupy - -The library is provided as-is under the MIT License - -Copyright (c) 2013 Vineet Naik (naikvin@gmail.com) - -Permission is hereby granted, free of charge, to any person obtaining -a copy of this software and associated documentation files (the -"Software"), to deal in the Software without restriction, including -without limitation the rights to use, copy, modify, merge, publish, -distribute, sublicense, and/or sell copies of the Software, and to -permit persons to whom the Software is furnished to do so, subject to -the following conditions: - -The above copyright notice and this permission notice shall be -included in all copies or substantial portions of the Software. - -THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, -EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF -MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND -NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE -LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION -OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION -WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE. 
- -""" - -import re - -from .dunderkey import dunder_get, dunder_partition, undunder_keys, dunder_truncate -from .helper import * -from typing import Dict, Iterable, Any - - -class QuerySet: - """Provides an interface to filter data and select specific fields - from the data - - QuerySet is used for filtering data and also selecting only - relevant fields out of it. This object is internally created which - means usually you, the user wouldn't need to create it. - - :param data: an iterable of dicts - - """ - - def __init__(self, data): - """Set constructor method.""" - self.data = data - - def filter(self, *args, **kwargs) -> 'QuerySet': - """Filters data using the _lookup parameters - - Lookup parameters can be passed as, - - 1. keyword arguments of type `field__lookuptype=value` where - lookuptype specifies how to "query" eg:: - - >>> c.items.filter(language__contains='java') - - above will match all items where the language field - contains the substring 'java' such as 'java', - 'javascript'. Each look up is treated as a conditional - clause and if multiple of them are passed, they are - combined using logical the ``and`` operator - - For nested fields, double underscore can be used eg:: - - >>> data = [{'a': {'b': 3}}, {'a': {'b': 10}}] - >>> c = Collection(data) - >>> c.items.filter(a__b__gt=5) - - above _lookup will match the 2nd element (b > 5) - - For the list of supported _lookup parameter, see - documentation on Github - - 2. pos arguments of the type ``field__lookuptype=Q(...)``. 
- These can be useful to build conditional clauses that - need to be combined using logical `or` or negated using - `not` - - >>> c.items.filter(Q(language__exact='Python') - | - Q(language__exact='Ruby') - - above query will only filter the data where language is - either 'Python' or 'Ruby' - - For more documentation see README on Github - - :param args : ``Q`` objects - :param kwargs : _lookup parameters - :return: QuerySet of the filtered docs - - """ - return self.__class__(filter_items(self.data, *args, **kwargs)) - - def select(self, *args, **kwargs) -> 'QuerySet': - """Selects specific fields of the data - - e.g. to select just the keys 'framework' and 'type' from many - keys, :: - - >>> c.items.select('framework', 'type') - - - :param args : field names to select - :param kwargs : optional keyword args - :return: subset of fields - """ - flatten = kwargs.pop('flatten', False) - f = dunder_truncate if flatten else undunder_keys - result = (f(d) for d in include_keys(self.data, args)) - return self.__class__(result) - - def __iter__(self): - for d in self.data: - yield d - - -# QuerySet given an alias for backward compatibility -Collection = QuerySet - - -## filter and _lookup functions - - -def filter_items(items: Iterable, *args, **kwargs) -> Iterable: - """Filters an iterable using _lookup parameters - - :param items : iterable - :param args : ``Q`` objects - :param kwargs : _lookup parameters - :return: lazy iterable (generator) - - """ - q1 = list(args) if args else [] - q2 = [Q(**kwargs)] if kwargs else [] - lookup_groups = q1 + q2 - pred = lambda item: all(lg.evaluate(item) for lg in lookup_groups) - return (item for item in items if pred(item)) - - -def _lookup(key: str, val: Any, item: Dict) -> bool: - """Checks if key-val pair exists in item using various _lookup types - - The _lookup types are derived from the `key` and then used to check - if the _lookup holds true for the item:: - - >>> _lookup('request__url__exact', 'http://example.com', item) - 
- The above will return True if item['request']['url'] == - 'http://example.com' else False - - :param key : (str) that represents the field name to find - :param val : (mixed) object to match the value in the item against - :param item : (dict) - :return: (boolean) True if field-val exists else False - - """ - init, last = dunder_partition(key) - if last == 'exact': - return dunder_get(item, init) == val - elif last == 'neq': - return dunder_get(item, init) != val - elif last == 'contains': - val = guard_str(val) - return iff_not_none(dunder_get(item, init), lambda y: val in y) - elif last == 'icontains': - val = guard_str(val) - return iff_not_none(dunder_get(item, init), lambda y: val.lower() in y.lower()) - elif last == 'in': - val = guard_iter(val) - return dunder_get(item, init) in val - elif last == 'startswith': - val = guard_str(val) - return iff_not_none(dunder_get(item, init), lambda y: y.startswith(val)) - elif last == 'istartswith': - val = guard_str(val) - return iff_not_none( - dunder_get(item, init), lambda y: y.lower().startswith(val.lower()) - ) - elif last == 'endswith': - val = guard_str(val) - return iff_not_none(dunder_get(item, init), lambda y: y.endswith(val)) - elif last == 'iendswith': - val = guard_str(val) - return iff_not_none( - dunder_get(item, init), lambda y: y.lower().endswith(val.lower()) - ) - elif last == 'gt': - return iff_not_none(dunder_get(item, init), lambda y: y > val) - elif last == 'gte': - return iff_not_none(dunder_get(item, init), lambda y: y >= val) - elif last == 'lt': - return iff_not_none(dunder_get(item, init), lambda y: y < val) - elif last == 'lte': - return iff_not_none(dunder_get(item, init), lambda y: y <= val) - elif last == 'regex': - return iff_not_none( - dunder_get(item, init), lambda y: re.search(val, y) is not None - ) - elif last == 'filter': - val = guard_Q(val) - result = guard_iter(dunder_get(item, init)) - return len(list(filter_items(result, val))) > 0 - else: - return dunder_get(item, key) == 
val - - -## Classes to compose compound lookups (Q object) - - -class LookupTreeElem: - """Base class for a child in the _lookup expression tree""" - - def __init__(self): - """Set constructor method.""" - self.negate = False - - def evaluate(self, item: Dict) -> bool: - """Evaluates the expression represented by the object for the item, needs to be implemented in subclass. - - :param item : (dict) item - :return: (boolean) whether _lookup passed or failed - """ - raise NotImplementedError - - def __or__(self, other): - node = LookupNode() - node.op = 'or' - node.add_child(self) - node.add_child(other) - return node - - def __and__(self, other): - node = LookupNode() - node.add_child(self) - node.add_child(other) - return node - - -class LookupNode(LookupTreeElem): - """A node (element having children) in the _lookup expression tree - - Typically it's any object composed of two ``Q`` objects eg:: - - >>> Q(language__neq='Ruby') | Q(framework__startswith='S') - >>> ~Q(language__exact='PHP') - - """ - - def __init__(self): - """Set constructor method.""" - super().__init__() - self.children = [] - self.op = 'and' - - def add_child(self, child): - """ - Add child node into `self.children` list. - - :param child: Node needs to be added. 
- """ - self.children.append(child) - - def evaluate(self, item: Dict) -> bool: - """Evaluates the expression represented by the object for the item - - :param item : (dict) item - :return: (boolean) whether _lookup passed or failed - """ - results = map(lambda x: x.evaluate(item), self.children) - result = any(results) if self.op == 'or' else all(results) - return not result if self.negate else result - - def __invert__(self): - newnode = LookupNode() - for c in self.children: - newnode.add_child(c) - newnode.negate = not self.negate - return newnode - - -class LookupLeaf(LookupTreeElem): - """Class for a leaf in the _lookup expression tree""" - - def __init__(self, **kwargs): - """Set constructor method.""" - super().__init__() - self.lookups = kwargs - - def evaluate(self, item: Dict) -> bool: - """Evaluates the expression represented by the object for the item - - :param item : (dict) item - :return: (boolean) whether _lookup passed or failed - """ - result = all(_lookup(k, v, item) for k, v in self.lookups.items()) - return not result if self.negate else result - - def __invert__(self): - newleaf = LookupLeaf(**self.lookups) - newleaf.negate = not self.negate - return newleaf - - -# alias LookupLeaf to Q -Q = LookupLeaf - - -## functions that work on the keys in a dict - - -def include_keys( - items: Iterable[Dict[str, Any]], fields: Iterable[str] -) -> Iterable[Dict]: - """Function to keep only specified fields in data - - Returns a list of dict with only the keys mentioned in the - `fields` param:: - - >>> include_keys(items, ['request__url', 'response__status']) - - Note: the resulting keys are "dundered", as they appear in `fields`, - rather than nested as they are in `items`. 
- - :param items : iterable of dicts - :param fields : (iterable) fieldnames to keep - :return: lazy iterable - - """ - return ({f: dunder_get(item, f) for f in fields} for item in items) - - -guard_Q = partial(guard_type, Q) diff --git a/jina/types/request/__init__.py b/jina/types/request/__init__.py index c60016016d905..5431830d5acf7 100644 --- a/jina/types/request/__init__.py +++ b/jina/types/request/__init__.py @@ -3,7 +3,6 @@ from google.protobuf import json_format from ..mixin import ProtoTypeMixin -from ..arrays import QueryLangArray from ...enums import CompressAlgo, RequestType from ...excepts import BadRequestType from ...helper import random_identity, typename @@ -13,10 +12,8 @@ _trigger_body_fields = set( kk for v in [ - jina_pb2.RequestProto.IndexRequestProto, - jina_pb2.RequestProto.SearchRequestProto, - jina_pb2.RequestProto.TrainRequestProto, jina_pb2.RequestProto.ControlRequestProto, + jina_pb2.RequestProto.DataRequestProto, ] for kk in v.DESCRIPTOR.fields_by_name.keys() ) @@ -130,29 +127,14 @@ def as_typed_request(self, request_type: str): :param request_type: string representation of the request type :return: self """ - from .train import TrainRequest - from .search import SearchRequest from .control import ControlRequest - from .index import IndexRequest - from .delete import DeleteRequest - from .update import UpdateRequest - from .dump import DumpRequest + from .data import DataRequest rt = request_type.upper() - if rt.startswith(str(RequestType.TRAIN)): - self.__class__ = TrainRequest - elif rt.startswith(str(RequestType.DELETE)): - self.__class__ = DeleteRequest - elif rt.startswith(str(RequestType.INDEX)): - self.__class__ = IndexRequest - elif rt.startswith(str(RequestType.SEARCH)): - self.__class__ = SearchRequest - elif rt.startswith(str(RequestType.UPDATE)): - self.__class__ = UpdateRequest + if rt.startswith(str(RequestType.DATA)): + self.__class__ = DataRequest elif rt.startswith(str(RequestType.CONTROL)): self.__class__ = 
ControlRequest - elif rt.startswith(str(RequestType.DUMP)): - self.__class__ = DumpRequest else: raise TypeError(f'{request_type} is not recognized') return self @@ -243,16 +225,6 @@ def SerializeToString(self) -> bytes: # no touch, skip serialization, return original return self._buffer - @property - def queryset(self) -> 'QueryLangArray': - """ - Get the queryset in :class:`QueryLangArray` type. - - :return: query lang set - """ - self.is_used = True - return QueryLangArray(self.proto.queryset) - def as_response(self): """ Return a weak reference of this object but as :class:`Response` object. It gives a more diff --git a/jina/types/request/common.py b/jina/types/request/common.py deleted file mode 100644 index 47b1b860b6236..0000000000000 --- a/jina/types/request/common.py +++ /dev/null @@ -1,50 +0,0 @@ -from . import Request -from ...proto import jina_pb2 - -__all__ = [ - 'TrainDryRunRequest', - 'IndexDryRunRequest', - 'SearchDryRunRequest', - 'ControlDryRunRequest', - 'DryRunRequest', -] - - -class DryRunRequest(Request): - """Base empty request for dry run.""" - - -class TrainDryRunRequest(DryRunRequest): - """Empty train request for dry run.""" - - def __init__(self): - """Set the constructor.""" - super().__init__() - self.proto.train.CopyFrom(jina_pb2.RequestProto.TrainRequestProto()) - - -class IndexDryRunRequest(DryRunRequest): - """Empty index request for dry run.""" - - def __init__(self): - """Set the constructor.""" - super().__init__() - self.proto.index.CopyFrom(jina_pb2.RequestProto.IndexRequestProto()) - - -class SearchDryRunRequest(DryRunRequest): - """Empty search request for dry run.""" - - def __init__(self): - """Set the constructor.""" - super().__init__() - self.proto.search.CopyFrom(jina_pb2.RequestProto.SearchRequestProto()) - - -class ControlDryRunRequest(DryRunRequest): - """Empty control request for dry run.""" - - def __init__(self): - """Set the constructor.""" - super().__init__() - 
self.proto.control.CopyFrom(jina_pb2.RequestProto.ControlRequestProto()) diff --git a/jina/types/request/control.py b/jina/types/request/control.py index 59b0e6940a14b..e30f2717c5659 100644 --- a/jina/types/request/control.py +++ b/jina/types/request/control.py @@ -1,5 +1,3 @@ -from typing import Dict - from . import Request from .mixin import CommandMixin @@ -8,14 +6,4 @@ class ControlRequest(Request, CommandMixin): """Control request class.""" - @property - def args(self): - """struct args - - - .. #noqa: DAR201""" - return self.proto._args - - @args.setter - def args(self, value: Dict): - self.args.update(value) + pass diff --git a/jina/types/request/data.py b/jina/types/request/data.py new file mode 100644 index 0000000000000..9d5bf69934bfc --- /dev/null +++ b/jina/types/request/data.py @@ -0,0 +1,22 @@ +from . import Request +from .mixin import * + + +class DataRequest(Request, DocsPropertyMixin, GroundtruthPropertyMixin): + """Data request class.""" + + @property + def endpoint(self) -> str: + """Get the endpoint. + + # noqa: DAR201""" + return self.body.endpoint + + @endpoint.setter + def endpoint(self, val: str): + """Set the endpoint. + + # noqa: DAR201 + # noqa: DAR101 + """ + self.body.endpoint = val diff --git a/jina/types/request/delete.py b/jina/types/request/delete.py deleted file mode 100644 index 60e96ee732d43..0000000000000 --- a/jina/types/request/delete.py +++ /dev/null @@ -1,8 +0,0 @@ -from . import Request -from .mixin import IdsMixin - - -class DeleteRequest(Request, IdsMixin): - """Delete request class.""" - - pass diff --git a/jina/types/request/dump.py b/jina/types/request/dump.py deleted file mode 100644 index ac31254fe3e69..0000000000000 --- a/jina/types/request/dump.py +++ /dev/null @@ -1,37 +0,0 @@ -from . import Request - - -class DumpRequest(Request): - """A request telling an Indexer to dump its data""" - - @property - def path(self): - """ - - - .. # noqa: DAR102 - - - .. 
# noqa: DAR201 - """ - return self.body.path - - @path.setter - def path(self, value): - self.body.path = value - - @property - def shards(self): - """ - - - .. # noqa: DAR102 - - - .. # noqa: DAR201 - """ - return self.body.shards - - @shards.setter - def shards(self, value): - self.body.shards = value diff --git a/jina/types/request/index.py b/jina/types/request/index.py deleted file mode 100644 index 7b406487c57f6..0000000000000 --- a/jina/types/request/index.py +++ /dev/null @@ -1,8 +0,0 @@ -from . import Request -from .mixin import * - - -class IndexRequest(Request, DocsPropertyMixin, GroundtruthPropertyMixin): - """Index request class.""" - - pass diff --git a/jina/types/request/mixin.py b/jina/types/request/mixin.py index b307871da4a65..97f189040dc54 100644 --- a/jina/types/request/mixin.py +++ b/jina/types/request/mixin.py @@ -1,4 +1,4 @@ -from ... import DocumentArray +from ..arrays import DocumentArray from ...proto import jina_pb2 @@ -9,8 +9,7 @@ class DocsPropertyMixin: def docs(self) -> 'DocumentArray': """Get the :class: `DocumentArray` with sequence `body.docs` as content. - - .. #noqa: DAR201""" + .. # noqa: DAR201""" self.is_used = True return DocumentArray(self.body.docs) @@ -22,24 +21,11 @@ class GroundtruthPropertyMixin: def groundtruths(self) -> 'DocumentArray': """Get the groundtruths in :class: `DocumentArray` type. - - .. #noqa: DAR201""" + .. # noqa: DAR201""" self.is_used = True return DocumentArray(self.body.groundtruths) -class IdsMixin: - """Mixin class of ids property.""" - - @property - def ids(self): - """Get the ids. - - - .. #noqa: DAR201""" - return self.body.ids - - class CommandMixin: """Mixin class of command property.""" @@ -47,7 +33,6 @@ class CommandMixin: def command(self) -> str: """Get the command. - .. 
#noqa: DAR201""" self.is_used = True return jina_pb2.RequestProto.ControlRequestProto.Command.Name( diff --git a/jina/types/request/search.py b/jina/types/request/search.py deleted file mode 100644 index 7493da5a056e6..0000000000000 --- a/jina/types/request/search.py +++ /dev/null @@ -1,8 +0,0 @@ -from . import Request -from .mixin import * - - -class SearchRequest(Request, DocsPropertyMixin, GroundtruthPropertyMixin): - """Search request class.""" - - pass diff --git a/jina/types/request/train.py b/jina/types/request/train.py deleted file mode 100644 index 784131fa59d7a..0000000000000 --- a/jina/types/request/train.py +++ /dev/null @@ -1,8 +0,0 @@ -from . import Request -from .mixin import * - - -class TrainRequest(Request, DocsPropertyMixin, GroundtruthPropertyMixin): - """Train request class.""" - - pass diff --git a/jina/types/request/update.py b/jina/types/request/update.py deleted file mode 100644 index 0a54497e7ff71..0000000000000 --- a/jina/types/request/update.py +++ /dev/null @@ -1,8 +0,0 @@ -from . import Request -from .mixin import * - - -class UpdateRequest(Request, DocsPropertyMixin, GroundtruthPropertyMixin): - """Update request class.""" - - pass diff --git a/jina/types/score/__init__.py b/jina/types/score/__init__.py index 9de8aa2c126c3..bb50faad39b9c 100644 --- a/jina/types/score/__init__.py +++ b/jina/types/score/__init__.py @@ -70,6 +70,22 @@ def __init__( self.set_attrs(**kwargs) + @property + def value(self) -> float: + """ + Return the ``value`` of this NamedScore, the `id` of which this NamedScore is a score. + :return:: the score value + """ + return self._pb_body.value + + @value.setter + def value(self, val: float): + """ + Set the ``value`` to :attr:`value`. 
+ :param val: The score value to set + """ + self._pb_body.value = val + @property def ref_id(self) -> str: """ @@ -86,6 +102,38 @@ def ref_id(self, val: str): """ self._pb_body.ref_id = val + @property + def op_name(self) -> str: + """ + Return the ``op_name`` of this NamedScore + :return:: the op_name + """ + return self._pb_body.op_name + + @op_name.setter + def op_name(self, val: str): + """ + Set the ``op_name`` to :param: `val`. + :param val: The op_name value to set + """ + self._pb_body.op_name = val + + @property + def description(self) -> str: + """ + Return the ``description`` of this NamedScore + :return:: the description + """ + return self._pb_body.description + + @description.setter + def description(self, val: str): + """ + Set the ``description`` to :param: `val`. + :param val: The description value to set + """ + self._pb_body.description = val + @property def operands(self) -> List['NamedScore']: """ diff --git a/jina/types/sets/__init__.py b/jina/types/sets/__init__.py deleted file mode 100644 index 13748b4839a7d..0000000000000 --- a/jina/types/sets/__init__.py +++ /dev/null @@ -1,7 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from .document import DocumentSet -from .querylang import QueryLangSet -from .chunk import ChunkSet -from .match import MatchSet diff --git a/jina/types/sets/chunk.py b/jina/types/sets/chunk.py deleted file mode 100644 index 0e7b9956a954b..0000000000000 --- a/jina/types/sets/chunk.py +++ /dev/null @@ -1,12 +0,0 @@ -from .document import DocumentSet -from ...helper import deprecated_class -from ..arrays.chunk import ChunkArray - - -@deprecated_class(new_class=ChunkArray) -class ChunkSet(DocumentSet): - """ - :class:`ChunkSet` is deprecated. A new class name is ChunkArray. 
- """ - - pass diff --git a/jina/types/sets/doc_groundtruth.py b/jina/types/sets/doc_groundtruth.py deleted file mode 100644 index 0b592391bdea6..0000000000000 --- a/jina/types/sets/doc_groundtruth.py +++ /dev/null @@ -1,15 +0,0 @@ -from .traversable import TraversableSequence -from ..arrays.doc_groundtruth import DocumentGroundtruthSequence -from ...helper import deprecated_class - - -@deprecated_class( - new_class=DocumentGroundtruthSequence, - custom_msg="The class has been moved to '..types.arrays', keeping its original name.", -) -class DocumentGroundtruthSequence(TraversableSequence): - """ - :class:`DocumentGroundtruthSequence` is deprecated. It moved to `jina.types.array.doc_groundtruth`. - """ - - pass diff --git a/jina/types/sets/document.py b/jina/types/sets/document.py deleted file mode 100644 index 535267f8bb124..0000000000000 --- a/jina/types/sets/document.py +++ /dev/null @@ -1,18 +0,0 @@ -from collections.abc import MutableSequence - -from ...helper import deprecated_class - -from .traversable import TraversableSequence - -from ..arrays.document import DocumentArray - -__all__ = ['DocumentSet'] - - -@deprecated_class(new_class=DocumentArray) -class DocumentSet(TraversableSequence, MutableSequence): - """ - :class:`DocumentSet` is deprecated. A new class name is ChunkArray. - """ - - pass diff --git a/jina/types/sets/match.py b/jina/types/sets/match.py deleted file mode 100644 index 44554fcefaed2..0000000000000 --- a/jina/types/sets/match.py +++ /dev/null @@ -1,13 +0,0 @@ -from .document import DocumentSet -from ...helper import deprecated_class - -from ..arrays.match import MatchArray - - -@deprecated_class(new_class=MatchArray) -class MatchSet(DocumentSet): - """ - :class:`MatchSet` is deprecated. A new class name is MatchArray. 
- """ - - pass diff --git a/jina/types/sets/querylang.py b/jina/types/sets/querylang.py deleted file mode 100644 index ef7da31108aee..0000000000000 --- a/jina/types/sets/querylang.py +++ /dev/null @@ -1,22 +0,0 @@ -from collections.abc import MutableSequence -from typing import Iterable, Union, Dict - -from ...helper import deprecated_class - -from ..querylang import QueryLang -from ...proto.jina_pb2 import QueryLangProto - -from ..arrays.querylang import QueryLangArray - -AcceptQueryLangType = Union[QueryLang, QueryLangProto, Dict] - -__all__ = ['QueryLangSet', 'AcceptQueryLangType'] - - -@deprecated_class(new_class=QueryLangArray) -class QueryLangSet(MutableSequence): - """ - :class:`QueryLangSet` is deprecated. A new class name is QueryLangArray. - """ - - pass diff --git a/jina/types/sets/traversable.py b/jina/types/sets/traversable.py deleted file mode 100644 index cd7c237340d43..0000000000000 --- a/jina/types/sets/traversable.py +++ /dev/null @@ -1,94 +0,0 @@ -from typing import Iterable -import itertools - -from ..arrays.traversable import TraversableSequence -from ...helper import deprecated_class - -if False: - from ..document.traversable import Traversable - - -@deprecated_class( - new_class=TraversableSequence, - custom_msg="The class has been moved to '..types.arrays', keeping its original name.", -) -class TraversableSequence: - """ - A mixin used for traversing a `Sequence[Traversable]`. - """ - - def __iter__(self) -> Iterable['Traversable']: - raise NotImplementedError - - def traverse( - self, traversal_paths: Iterable[str] - ) -> Iterable['TraversableSequence']: - """ - Return an Iterator of :class:``TraversableSequence`` of the leaves when applying the traversal_paths. - Each :class:``TraversableSequence`` is either the root Documents, a ChunkArray or a MatchArray. - - :param traversal_paths: a list of string that represents the traversal path - :yield: :class:``TraversableSequence`` of the leaves when applying the traversal_paths. 
- - Example on ``traversal_paths``: - - - [`r`]: docs in this TraversableSequence - - [`m`]: all match-documents at adjacency 1 - - [`c`]: all child-documents at granularity 1 - - [`cc`]: all child-documents at granularity 2 - - [`mm`]: all match-documents at adjacency 2 - - [`cm`]: all match-document at adjacency 1 and granularity 1 - - [`r`, `c`]: docs in this TraversableSequence and all child-documents at granularity 1 - - """ - - for p in traversal_paths: - yield from self._traverse(self, p) - - @staticmethod - def _traverse(docs: 'TraversableSequence', path: str): - if path: - loc = path[0] - if loc == 'r': - yield from TraversableSequence._traverse(docs, path[1:]) - elif loc == 'm': - for d in docs: - yield from TraversableSequence._traverse(d.matches, path[1:]) - elif loc == 'c': - for d in docs: - yield from TraversableSequence._traverse(d.chunks, path[1:]) - else: - yield docs - - def traverse_flattened_per_path( - self, traversal_paths: Iterable[str] - ) -> Iterable['TraversableSequence']: - """ - Returns a flattened :class:``TraversableSequence`` per path in :param:``traversal_paths`` - with all Documents, that are reached by the path. - - :param traversal_paths: a list of string that represents the traversal path - :yield: :class:``TraversableSequence`` containing the document of all leaves per path. - """ - for p in traversal_paths: - yield self._flatten(self._traverse(self, p)) - - def traverse_flatten(self, traversal_paths: Iterable[str]) -> 'TraversableSequence': - """ - Returns a single flattened :class:``TraversableSequence`` with all Documents, that are reached - via the :param:``traversal_paths``. - - .. warning:: - When defining the :param:``traversal_paths`` with multiple paths, the returned - :class:``Documents`` are determined at once and not on the fly. This is a different - behavior then in :method:``traverse`` and :method:``traverse_flattened_per_path``! 
- - :param traversal_paths: a list of string that represents the traversal path - :return: a singel :class:``TraversableSequence`` containing the document of all leaves when applying the traversal_paths. - """ - leaves = self.traverse(traversal_paths) - return self._flatten(leaves) - - @classmethod - def _flatten(cls, sequence): - return cls(list(itertools.chain.from_iterable(sequence))) diff --git a/scripts/docstrings_lint.sh b/scripts/docstrings_lint.sh index 79fea02b87162..98382c4a31c12 100755 --- a/scripts/docstrings_lint.sh +++ b/scripts/docstrings_lint.sh @@ -11,6 +11,8 @@ arrVar=() for changed_file in $CHANGED_FILES; do case ${changed_file} in tests/* | \ + .github/* | \ + jina/helloworld/* | \ jina/proto/jina_pb2.py | \ jina/proto/jina_pb2_grpc.py) ;;*) diff --git a/scripts/release.sh b/scripts/release.sh index 8161506f421e5..4ae1b463277f7 100755 --- a/scripts/release.sh +++ b/scripts/release.sh @@ -81,6 +81,8 @@ LAST_VER=$(git tag -l | sort -V | tail -n1) printf "last version: \e[1;32m$LAST_VER\e[0m\n" if [[ $1 == "final" ]]; then + printf "final release 2.0 is disabled until 2.0.0rc is mature!" + exit 1; printf "this will be a final release: \e[1;33m$RELEASE_VER\e[0m\n" NEXT_VER=$(echo $RELEASE_VER | awk -F. -v OFS=. 'NF==1{print ++$NF}; NF>1{$NF=sprintf("%0*d", length($NF), ($NF+1)); print}') @@ -93,9 +95,26 @@ if [[ $1 == "final" ]]; then pub_pypi VER_TAG_NEXT=$VER_TAG\'${NEXT_VER}\' + update_ver_line "$VER_TAG" "$VER_TAG_NEXT" "$INIT_FILE" RELEASE_REASON="$2" RELEASE_ACTOR="$3" + git_commit + slack_notif +elif [[ $1 == 'rc' ]]; then + printf "this will be a release candidate: \e[1;33m$RELEASE_VER\e[0m\n" + DOT_RELEASE_VER=$(echo $RELEASE_VER | sed "s/rc/\./") + NEXT_VER=$(echo $DOT_RELEASE_VER | awk -F. -v OFS=. 
'NF==1{print ++$NF}; NF>1{$NF=sprintf("%0*d", length($NF), ($NF+1)); print}') + NEXT_VER=$(echo $NEXT_VER | sed "s/\.\([^.]*\)$/rc\1/") + printf "bump master version to: \e[1;32m$NEXT_VER\e[0m, this will be the next version\n" + + make_release_note + + pub_pypi + + VER_TAG_NEXT=$VER_TAG\'${NEXT_VER}\' update_ver_line "$VER_TAG" "$VER_TAG_NEXT" "$INIT_FILE" + RELEASE_REASON="$2" + RELEASE_ACTOR="$3" git_commit slack_notif else diff --git a/tests/__init__.py b/tests/__init__.py index 07765ab701b76..00974e4fd678d 100644 --- a/tests/__init__.py +++ b/tests/__init__.py @@ -1,13 +1,10 @@ import os -import shutil import sys from typing import Iterator import numpy as np -import pytest + from jina import Document -from jina.executors.decorators import batching, as_ndarray -from jina.executors.encoders import BaseTextEncoder file_dir = os.path.dirname(__file__) sys.path.append(os.path.dirname(file_dir)) @@ -45,151 +42,8 @@ def random_docs( yield d -def rm_files(file_paths): - for file_path in file_paths: - if os.path.exists(file_path): - if os.path.isfile(file_path): - os.remove(file_path) - elif os.path.isdir(file_path): - shutil.rmtree(file_path, ignore_errors=False, onerror=None) - - def validate_callback(mock, validate_func): for args, kwargs in mock.call_args_list: validate_func(*args, **kwargs) mock.assert_called() - - -np.random.seed(0) -d_embedding = np.array([1, 1, 1, 1, 1, 1, 1]) -c_embedding = np.array([2, 2, 2, 2, 2, 2, 2]) - - -def get_documents(chunks, same_content, nr=10, index_start=0, same_tag_content=None): - next_chunk_id = nr + index_start - for i in range(index_start, nr + index_start): - with Document() as d: - d.id = i - if same_content: - d.text = 'hello world' - d.embedding = d_embedding - else: - d.text = f'hello world {i}' - d.embedding = np.random.random(d_embedding.shape) - if same_tag_content: - d.tags['tag_field'] = 'tag data' - elif same_tag_content is False: - d.tags['tag_field'] = f'tag data {i}' - for j in range(chunks): - with Document() 
as c: - c.id = next_chunk_id - if same_content: - c.text = 'hello world from chunk' - c.embedding = c_embedding - else: - c.text = f'hello world from chunk {j}' - c.embedding = np.random.random(d_embedding.shape) - if same_tag_content: - c.tags['tag field'] = 'tag data' - elif same_tag_content is False: - c.tags['tag field'] = f'tag data {next_chunk_id}' - next_chunk_id += 1 - d.chunks.append(c) - yield d - - -def check_docs(chunk_content, chunks, same_content, docs, ids_used, index_start=0): - for i, d in enumerate(docs): - i += index_start - id_int = d.id - assert id_int not in ids_used - ids_used.add(id_int) - - if same_content: - assert d.text == 'hello world' - np.testing.assert_almost_equal(d.embedding, d_embedding) - else: - assert d.text == f'hello world {i}' - assert d.embedding.shape == d_embedding.shape - - assert len(d.chunks) == chunks - - for j, c in enumerate(d.chunks): - id_int = c.id - assert id_int not in ids_used - ids_used.add(id_int) - if same_content: - if chunk_content is None: - chunk_content = c.content_hash - assert c.content_hash == chunk_content - assert c.text == 'hello world from chunk' - np.testing.assert_almost_equal(c.embedding, c_embedding) - else: - assert c.text == f'hello world from chunk {j}' - assert c.embedding.shape == c_embedding.shape - - -@pytest.mark.parametrize('chunks', [0, 3, 5]) -@pytest.mark.parametrize('same_content', [False, True]) -@pytest.mark.parametrize('nr', [0, 10, 100, 201]) -def test_docs_generator(chunks, same_content, nr): - chunk_content = None - docs = list(get_documents(chunks=chunks, same_content=same_content, nr=nr)) - assert len(docs) == nr - ids_used = set() - check_docs(chunk_content, chunks, same_content, docs, ids_used) - - if nr > 0: - index_start = 1 + len(list(ids_used)) - else: - index_start = 1 - new_docs = list( - get_documents( - chunks=chunks, same_content=same_content, nr=nr, index_start=index_start - ) - ) - new_ids = set([d.id for d in new_docs]) - assert 
len(new_ids.intersection(ids_used)) == 0 - check_docs(chunk_content, chunks, same_content, new_docs, ids_used, index_start) - - -class DummyOneHotTextEncoder(BaseTextEncoder): - """ - One-hot Encoder encodes the characters into one-hot vectors. ONLY FOR TESTING USAGES. - :param on_value: the default value for the locations represented by characters - :param off_value: the default value for the locations not represented by characters - """ - - def __init__(self, on_value: float = 1, off_value: float = 0, *args, **kwargs): - super().__init__(*args, **kwargs) - self.offset = 32 - self.dim = ( - 127 - self.offset + 2 - ) # only the Unicode code point between 32 and 127 are embedded, and the rest are considered as ``UNK``` - self.unk = self.dim - self.on_value = on_value - self.off_value = off_value - self.embeddings = None - - def post_init(self): - self.embeddings = ( - np.eye(self.dim) * self.on_value - + (np.ones((self.dim, self.dim)) - np.eye(self.dim)) * self.off_value - ) - - @batching - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - """ - - :param content: each row is one character, an 1d array of string type (data.dtype.kind == 'U') in size B - :return: an ndarray of `B x D` - """ - output = [] - for r in content: - r_emb = [ - ord(c) - self.offset if self.offset <= ord(c) <= 127 else self.unk - for c in r - ] - output.append(self.embeddings[r_emb, :].sum(axis=0)) - return output diff --git a/tests/daemon/unit/api/endpoints/bad_flow.yml b/tests/daemon/unit/api/endpoints/bad_flow.yml index 1e88d86103fa9..8cd2238ca2d16 100644 --- a/tests/daemon/unit/api/endpoints/bad_flow.yml +++ b/tests/daemon/unit/api/endpoints/bad_flow.yml @@ -1,4 +1,4 @@ !Flow version: '1.0' pods: - - use: BADFLow \ No newline at end of file + - uses: BADFLow \ No newline at end of file diff --git a/tests/daemon/unit/api/endpoints/good_flow.yml b/tests/daemon/unit/api/endpoints/good_flow.yml index 53298401a9e9a..14a39e7fa9c6d 100644 --- 
a/tests/daemon/unit/api/endpoints/good_flow.yml +++ b/tests/daemon/unit/api/endpoints/good_flow.yml @@ -1,5 +1,5 @@ !Flow version: '1.0' pods: - - uses: _pass - - uses: _pass \ No newline at end of file + - name: hello + - name: world \ No newline at end of file diff --git a/tests/daemon/unit/api/endpoints/good_flow_dep.yml b/tests/daemon/unit/api/endpoints/good_flow_dep.yml index a0ba66252d49f..08d55256bbf92 100644 --- a/tests/daemon/unit/api/endpoints/good_flow_dep.yml +++ b/tests/daemon/unit/api/endpoints/good_flow_dep.yml @@ -2,4 +2,4 @@ version: '1.0' pods: - uses: mwu_encoder.yml - - uses: _pass \ No newline at end of file + - name: hello \ No newline at end of file diff --git a/tests/daemon/unit/api/endpoints/good_flow_jtype.yml b/tests/daemon/unit/api/endpoints/good_flow_jtype.yml index 13699b0b6e83a..c5ae3a0646814 100644 --- a/tests/daemon/unit/api/endpoints/good_flow_jtype.yml +++ b/tests/daemon/unit/api/endpoints/good_flow_jtype.yml @@ -1,5 +1,5 @@ jtype: Flow version: '1.0' pods: - - uses: _pass - - uses: _pass \ No newline at end of file + - name: hello + - name: world \ No newline at end of file diff --git a/tests/daemon/unit/api/endpoints/mwu_encoder.py b/tests/daemon/unit/api/endpoints/mwu_encoder.py index ff74b8d7897c3..de26fc54c97f4 100644 --- a/tests/daemon/unit/api/endpoints/mwu_encoder.py +++ b/tests/daemon/unit/api/endpoints/mwu_encoder.py @@ -1,16 +1,13 @@ from typing import Any -import numpy as np +from jina import Executor, requests -from jina.executors.encoders import BaseEncoder - -class MWUEncoder(BaseEncoder): +class MWUEncoder(Executor): def __init__(self, greetings: str, *args, **kwargs): super().__init__(*args, **kwargs) self._greetings = greetings - self.logger.success(f'look at me! 
{greetings}') - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - self.logger.info(f'{self._greetings} {content}') - return np.random.random([content.shape[0], 3]) + @requests + def encode(self, *args, **kwargs) -> Any: + pass diff --git a/tests/daemon/unit/stores/flow.yml b/tests/daemon/unit/stores/flow.yml index 1fe47c5146c16..52ca71b2a3f42 100644 --- a/tests/daemon/unit/stores/flow.yml +++ b/tests/daemon/unit/stores/flow.yml @@ -2,5 +2,4 @@ version: '1.0' pods: - name: pod1 - uses: _pass read_only: true diff --git a/tests/daemon/unit/stores/test_peapodstore.py b/tests/daemon/unit/stores/test_peapodstore.py index 725538045e8e4..f0376003a6bbe 100644 --- a/tests/daemon/unit/stores/test_peapodstore.py +++ b/tests/daemon/unit/stores/test_peapodstore.py @@ -1,7 +1,7 @@ import pytest from daemon.stores import PeaStore, PodStore -from jina import Crafter +from jina import Executor from jina.parsers import set_pea_parser, set_pod_parser @@ -36,7 +36,7 @@ def test_peastore_multi_add(parser, store): 'parser, store', [(set_pea_parser, PeaStore), (set_pod_parser, PodStore)] ) def test_peapod_store_add_bad(parser, store): - class BadCrafter(Crafter): + class BadCrafter(Executor): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) raise NotImplementedError diff --git a/tests/distributed/helpers.py b/tests/distributed/helpers.py index aec7f054b7c93..8df80e50ddb77 100644 --- a/tests/distributed/helpers.py +++ b/tests/distributed/helpers.py @@ -4,7 +4,6 @@ import requests -from jina import Client from jina.parsers import set_client_cli_parser @@ -123,11 +122,3 @@ def create_flow( print(f'Checking if the flow creation is succeeded: {r.json()}') assert r.status_code == 201 return r.json() - - -def get_client(port): - args = set_client_cli_parser().parse_args( - ['--host', 'localhost', '--port-expose', str(port), '--return-results'] - ) - - return Client(args) diff --git a/tests/distributed/test_against_external_daemon/delayed_executor.py 
b/tests/distributed/test_against_external_daemon/delayed_executor.py index d7515334082ac..4d71527ddf9d6 100644 --- a/tests/distributed/test_against_external_daemon/delayed_executor.py +++ b/tests/distributed/test_against_external_daemon/delayed_executor.py @@ -1,14 +1,12 @@ import time -from jina.executors.decorators import as_ndarray -from jina.executors.encoders import BaseEncoder +from jina.executors import BaseExecutor -class DelayedExecutor(BaseEncoder): +class DelayedExecutor(BaseExecutor): def post_init(self): self.logger.info('sleeping for 8 secs') time.sleep(8) - @as_ndarray - def encode(self, content: 'np.ndarray', *args, **kwargs): - return [[1, 2]] * len(content) + def encode(self, **kwargs): + pass diff --git a/tests/distributed/test_against_external_daemon/mwu_encoder.py b/tests/distributed/test_against_external_daemon/mwu_encoder.py index ff74b8d7897c3..7886e31439035 100644 --- a/tests/distributed/test_against_external_daemon/mwu_encoder.py +++ b/tests/distributed/test_against_external_daemon/mwu_encoder.py @@ -1,16 +1,14 @@ from typing import Any -import numpy as np +from jina.executors.decorators import requests +from jina.executors import BaseExecutor -from jina.executors.encoders import BaseEncoder - -class MWUEncoder(BaseEncoder): +class MWUEncoder(BaseExecutor): def __init__(self, greetings: str, *args, **kwargs): super().__init__(*args, **kwargs) self._greetings = greetings - self.logger.success(f'look at me! 
{greetings}') - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - self.logger.info(f'{self._greetings} {content}') - return np.random.random([content.shape[0], 3]) + @requests + def encode(self, **kwargs) -> Any: + pass diff --git a/tests/distributed/test_against_external_daemon/test_single_instance.py b/tests/distributed/test_against_external_daemon/test_single_instance.py index 2bbda0a7702d9..5fc04aa59f3ec 100644 --- a/tests/distributed/test_against_external_daemon/test_single_instance.py +++ b/tests/distributed/test_against_external_daemon/test_single_instance.py @@ -3,7 +3,7 @@ import numpy as np import pytest -from jina import Flow +from jina import Flow, Document from tests import random_docs cur_dir = os.path.dirname(os.path.abspath(__file__)) @@ -22,7 +22,10 @@ def test_r_l_simple(silent_log, parallels, mocker): .add(parallel=parallels) ) with f: - f.index(('hello' for _ in range(NUM_DOCS)), on_done=response_mock) + f.index( + inputs=(Document(text='hello') for _ in range(NUM_DOCS)), + on_done=response_mock, + ) response_mock.assert_called() @@ -38,7 +41,10 @@ def test_l_r_simple(silent_log, parallels, mocker): .add(host=CLOUD_HOST, parallel=parallels, quiet_remote_logs=silent_log) ) with f: - f.index(('hello' for _ in range(NUM_DOCS)), on_done=response_mock) + f.index( + inputs=(Document(text='hello') for _ in range(NUM_DOCS)), + on_done=response_mock, + ) response_mock.assert_called() @@ -54,7 +60,10 @@ def test_r_l_r_simple(silent_log, parallels, mocker): .add(host=CLOUD_HOST, parallel=parallels, quiet_remote_logs=silent_log) ) with f: - f.index(('hello' for _ in range(NUM_DOCS)), on_done=response_mock) + f.index( + inputs=(Document(text='hello') for _ in range(NUM_DOCS)), + on_done=response_mock, + ) response_mock.assert_called() @@ -70,7 +79,10 @@ def test_r_r_r_simple(silent_log, parallels, mocker): .add(host=CLOUD_HOST, parallel=parallels, quiet_remote_logs=silent_log) ) with f: - f.index(('hello' for _ in range(NUM_DOCS)), 
on_done=response_mock) + f.index( + inputs=(Document(text='hello') for _ in range(NUM_DOCS)), + on_done=response_mock, + ) response_mock.assert_called() @@ -86,7 +98,10 @@ def test_l_r_l_simple(silent_log, parallels, mocker): .add() ) with f: - f.index(('hello' for _ in range(NUM_DOCS)), on_done=response_mock) + f.index( + inputs=(Document(text='hello') for _ in range(NUM_DOCS)), + on_done=response_mock, + ) response_mock.assert_called() @@ -107,7 +122,10 @@ def test_l_r_l_with_upload(silent_log, parallels, mocker): .add() ) with f: - f.index_ndarray(np.random.random([NUM_DOCS, 100]), on_done=response_mock) + f.index( + inputs=(Document(blob=np.random.random([1, 100])) for _ in range(NUM_DOCS)), + on_done=response_mock, + ) response_mock.assert_called() @@ -143,7 +161,10 @@ def test_l_r_l_with_upload_remote(silent_log, parallels, docker_image, mocker): .add() ) with f: - f.index_ndarray(np.random.random([NUM_DOCS, 100]), on_done=response_mock) + f.index( + inputs=(Document(blob=np.random.random([1, 100])) for _ in range(NUM_DOCS)), + on_done=response_mock, + ) response_mock.assert_called() @@ -162,4 +183,4 @@ def test_create_pea_timeout(parallels): .add() ) with f: - f.index(random_docs(10)) + f.index(inputs=random_docs(10)) diff --git a/tests/distributed/test_index_query/.env b/tests/distributed/test_index_query/.env deleted file mode 100644 index d91821045429b..0000000000000 --- a/tests/distributed/test_index_query/.env +++ /dev/null @@ -1,4 +0,0 @@ -JINA_ENCODER_HOST=encoder -JINA_INDEXER_HOST=indexer -JINA_GATEWAY_REST_PORT=45678 -JINA_WORKSPACE=workspace diff --git a/tests/distributed/test_index_query/__init__.py b/tests/distributed/test_index_query/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_index_query/docker-compose.yml b/tests/distributed/test_index_query/docker-compose.yml deleted file mode 100644 index 5e966aef08777..0000000000000 --- a/tests/distributed/test_index_query/docker-compose.yml 
+++ /dev/null @@ -1,28 +0,0 @@ -version: "3.3" -services: - flow: - image: test_index_query - build: - context: . - dockerfile: Dockerfiles/debianx.Dockerfile - ports: - - "8000:8000" - - "45678:45678" - env_file: - - tests/distributed/test_index_query/.env - expose: - - 10000-60000 - encoder: - image: test_index_query - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_index_query/.env - indexer: - image: test_index_query - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_index_query/.env diff --git a/tests/distributed/test_index_query/flow.yml b/tests/distributed/test_index_query/flow.yml deleted file mode 100644 index 9b273384dcc32..0000000000000 --- a/tests/distributed/test_index_query/flow.yml +++ /dev/null @@ -1,18 +0,0 @@ -!Flow -version: '1' -with: - read_only: true - rest_api: true - port_expose: $JINA_GATEWAY_REST_PORT -pods: - - name: encoder - uses: encode.yml - host: $JINA_ENCODER_HOST - port_expose: 8000 - upload_files: - - dummy-encoder.py - - name: indexer - uses: index.yml - read_only: False - host: $JINA_INDEXER_HOST - port_expose: 8000 diff --git a/tests/distributed/test_index_query/pods/__init__.py b/tests/distributed/test_index_query/pods/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_index_query/pods/dummy-encoder.py b/tests/distributed/test_index_query/pods/dummy-encoder.py deleted file mode 100644 index 5490ee4375fde..0000000000000 --- a/tests/distributed/test_index_query/pods/dummy-encoder.py +++ /dev/null @@ -1,16 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.encoders import BaseEncoder -from jina.executors.decorators import batching - - -class DummyEncoder(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=1) - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - return np.random.random((1, 3)) diff --git a/tests/distributed/test_index_query/pods/encode.yml b/tests/distributed/test_index_query/pods/encode.yml deleted file mode 100644 index 922f18188bc80..0000000000000 --- a/tests/distributed/test_index_query/pods/encode.yml +++ /dev/null @@ -1,5 +0,0 @@ -!DummyEncoder -with: - {} -metas: - py_modules: dummy-encoder.py diff --git a/tests/distributed/test_index_query/pods/index.yml b/tests/distributed/test_index_query/pods/index.yml deleted file mode 100644 index dcce922fee8df..0000000000000 --- a/tests/distributed/test_index_query/pods/index.yml +++ /dev/null @@ -1,16 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx -metas: - name: indexer - workspace: $JINA_WORKSPACE diff --git a/tests/distributed/test_index_query/test_integration.py b/tests/distributed/test_index_query/test_integration.py deleted file mode 100644 index b4f3212c34b9d..0000000000000 --- a/tests/distributed/test_index_query/test_integration.py +++ /dev/null @@ -1,67 +0,0 @@ -import os - -import pytest - -from ..helpers import create_workspace, create_flow_2, assert_request - -cur_dir = os.path.dirname(os.path.abspath(__file__)) -compose_yml = os.path.join(cur_dir, 'docker-compose.yml') -flow_yaml = os.path.join(cur_dir, 'flow.yml') -pod_dir = os.path.join(cur_dir, 'pods') -dependencies = [ - f'{pod_dir}/index.yml', - f'{pod_dir}/encode.yml', - f'{pod_dir}/dummy-encoder.py', -] - - -@pytest.mark.parametrize('docker_compose', [compose_yml], 
indirect=['docker_compose']) -def test_index_query(docker_compose): - print(f'\nCreating workspace with dependencies') - workspace_id = create_workspace(filepaths=dependencies) - - print(f'\nCreating Flow: {flow_yaml} with workspace_id: {workspace_id}') - index_flow_id = create_flow_2(flow_yaml=flow_yaml, workspace_id=workspace_id) - - print(f'\nIndexing: `hey, dude`') - r = assert_request( - method='post', - url='http://localhost:45678/index', - payload={'top_k': 10, 'data': ['text:hey, dude']}, - ) - text_indexed = r['index']['docs'][0]['text'] - assert text_indexed == 'text:hey, dude' - - print(f'\nFetching index flow id: {index_flow_id}') - r = assert_request(method='get', url=f'http://localhost:8000/flows/{index_flow_id}') - - print(f'\nDeleting index flow id: {index_flow_id}, but keeping the workspace alive') - r = assert_request( - method='delete', - url=f'http://localhost:8000/flows/{index_flow_id}', - payload={'workspace': False}, - ) - - print(f'\nCreating query Flow {flow_yaml} with workspace_id: {workspace_id}') - query_flow_id = create_flow_2(flow_yaml=flow_yaml, workspace_id=workspace_id) - assert query_flow_id is not None - - print(f'\nQuerying any text') - r = assert_request( - method='post', - url='http://0.0.0.0:45678/search', - payload={'top_k': 10, 'data': ['text:anything will match the same']}, - ) - print(f'returned: {r}') - text_matched = r['search']['docs'][0]['matches'][0]['text'] - assert text_matched == 'text:hey, dude' - - print(f'\nFetching query flow id: {query_flow_id}') - assert_request(method='get', url=f'http://localhost:8000/flows/{query_flow_id}') - - print(f'\nDeleting query flow id: {index_flow_id}, along with the workspace') - assert_request( - method='delete', - url=f'http://localhost:8000/flows/{query_flow_id}', - payload={'workspace': True}, - ) diff --git a/tests/distributed/test_index_query_with_shards/.env b/tests/distributed/test_index_query_with_shards/.env deleted file mode 100644 index d91821045429b..0000000000000 
--- a/tests/distributed/test_index_query_with_shards/.env +++ /dev/null @@ -1,4 +0,0 @@ -JINA_ENCODER_HOST=encoder -JINA_INDEXER_HOST=indexer -JINA_GATEWAY_REST_PORT=45678 -JINA_WORKSPACE=workspace diff --git a/tests/distributed/test_index_query_with_shards/__init__.py b/tests/distributed/test_index_query_with_shards/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_index_query_with_shards/docker-compose.yml b/tests/distributed/test_index_query_with_shards/docker-compose.yml deleted file mode 100644 index f05890875e33e..0000000000000 --- a/tests/distributed/test_index_query_with_shards/docker-compose.yml +++ /dev/null @@ -1,28 +0,0 @@ -version: "3.3" -services: - flow: - image: test_index_query_with_shards - build: - context: . - dockerfile: Dockerfiles/debianx.Dockerfile - ports: - - "8000:8000" - - "45678:45678" - env_file: - - tests/distributed/test_index_query_with_shards/.env - expose: - - 10000-60000 - encoder: - image: test_index_query_with_shards - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_index_query_with_shards/.env - indexer: - image: test_index_query_with_shards - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_index_query_with_shards/.env diff --git a/tests/distributed/test_index_query_with_shards/flow.yml b/tests/distributed/test_index_query_with_shards/flow.yml deleted file mode 100644 index d2e6ce0296f9e..0000000000000 --- a/tests/distributed/test_index_query_with_shards/flow.yml +++ /dev/null @@ -1,24 +0,0 @@ -!Flow -version: '1' -with: - read_only: true - rest_api: true - port_expose: $JINA_GATEWAY_REST_PORT -pods: - - name: encoder - uses: encode.yml - host: $JINA_ENCODER_HOST - port_expose: 8000 - upload_files: - - dummy-encoder.py - - name: indexer - uses: index.yml - read_only: False - parallel: 3 - polling: all - host: $JINA_INDEXER_HOST - port_expose: 8000 - - name: slice - uses: slice.yml - host: $JINA_INDEXER_HOST - port_expose: 
8000 diff --git a/tests/distributed/test_index_query_with_shards/flow_distributed_peas_in_pod.yml b/tests/distributed/test_index_query_with_shards/flow_distributed_peas_in_pod.yml deleted file mode 100644 index 93ff97319bee1..0000000000000 --- a/tests/distributed/test_index_query_with_shards/flow_distributed_peas_in_pod.yml +++ /dev/null @@ -1,26 +0,0 @@ -!Flow -version: '1' -with: - read_only: true - rest_api: true - port_expose: $JINA_GATEWAY_REST_PORT -pods: - - name: encoder - uses: encode.yml - host: $JINA_ENCODER_HOST - port_expose: 8000 - upload_files: - - dummy-encoder.py - - name: indexer - uses: index.yml - read_only: False - parallel: 3 - polling: all - host: $JINA_INDEXER_HOST - peas_hosts: - - $JINA_ENCODER_HOST - port_expose: 8000 - - name: slice - uses: slice.yml - host: $JINA_INDEXER_HOST - port_expose: 8000 diff --git a/tests/distributed/test_index_query_with_shards/pods/__init__.py b/tests/distributed/test_index_query_with_shards/pods/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_index_query_with_shards/pods/dummy-encoder.py b/tests/distributed/test_index_query_with_shards/pods/dummy-encoder.py deleted file mode 100644 index 5490ee4375fde..0000000000000 --- a/tests/distributed/test_index_query_with_shards/pods/dummy-encoder.py +++ /dev/null @@ -1,16 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.encoders import BaseEncoder -from jina.executors.decorators import batching - - -class DummyEncoder(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=1) - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - return np.random.random((1, 3)) diff --git a/tests/distributed/test_index_query_with_shards/pods/encode.yml b/tests/distributed/test_index_query_with_shards/pods/encode.yml deleted file mode 100644 index 922f18188bc80..0000000000000 --- a/tests/distributed/test_index_query_with_shards/pods/encode.yml +++ /dev/null @@ -1,5 +0,0 @@ -!DummyEncoder -with: - {} -metas: - py_modules: dummy-encoder.py diff --git a/tests/distributed/test_index_query_with_shards/pods/index.yml b/tests/distributed/test_index_query_with_shards/pods/index.yml deleted file mode 100644 index dcce922fee8df..0000000000000 --- a/tests/distributed/test_index_query_with_shards/pods/index.yml +++ /dev/null @@ -1,16 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx -metas: - name: indexer - workspace: $JINA_WORKSPACE diff --git a/tests/distributed/test_index_query_with_shards/pods/slice.yml b/tests/distributed/test_index_query_with_shards/pods/slice.yml deleted file mode 100644 index ccd7177aa135d..0000000000000 --- a/tests/distributed/test_index_query_with_shards/pods/slice.yml +++ /dev/null @@ -1,16 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: forward -requests: - on: - IndexRequest: - - !ForwardDriver {} - SearchRequest: - - !SliceQL - with: - start: 0 - end: 10 - traversal_paths: ['m'] - ControlRequest: - - !ControlReqDriver {} diff --git a/tests/distributed/test_index_query_with_shards/test_integration.py b/tests/distributed/test_index_query_with_shards/test_integration.py 
deleted file mode 100644 index d23d3995ce0a6..0000000000000 --- a/tests/distributed/test_index_query_with_shards/test_integration.py +++ /dev/null @@ -1,73 +0,0 @@ -import os - -import pytest - -from ..helpers import create_workspace, create_flow_2, assert_request - -cur_dir = os.path.dirname(os.path.abspath(__file__)) -compose_yml = os.path.join(cur_dir, 'docker-compose.yml') -pod_dir = os.path.join(cur_dir, 'pods') -dependencies = [ - f'{pod_dir}/index.yml', - f'{pod_dir}/encode.yml', - f'{pod_dir}/slice.yml', - f'{pod_dir}/dummy-encoder.py', -] - - -@pytest.mark.parametrize( - 'flow_yml', - [ - os.path.join(cur_dir, 'flow.yml'), - os.path.join(cur_dir, 'flow_distributed_peas_in_pod.yml'), - ], -) -@pytest.mark.parametrize('docker_compose', [compose_yml], indirect=['docker_compose']) -def test_flow(flow_yml, docker_compose): - print(f'\nCreating workspace with dependencies') - workspace_id = create_workspace(filepaths=dependencies) - - print(f'\nCreating Flow: {flow_yml} with workspace_id: {workspace_id}') - index_flow_id = create_flow_2(flow_yaml=flow_yml, workspace_id=workspace_id) - - for x in range(100): - text = 'text:hey, dude ' + str(x) - print(f'Indexing with text: {text}') - r = assert_request( - method='post', - url='http://0.0.0.0:45678/index', - payload={'top_k': 10, 'data': [text]}, - ) - text_indexed = r['index']['docs'][0]['text'] - print(f'Got response text_indexed: {text_indexed}') - # assert text_indexed == text - - assert_request(method='get', url=f'http://localhost:8000/flows/{index_flow_id}') - - assert_request( - method='delete', - url=f'http://localhost:8000/flows/{index_flow_id}', - payload={'workspace': False}, - ) - - print(f'\nCreating query Flow {flow_yml} with workspace_id: {workspace_id}') - query_flow_id = create_flow_2(flow_yaml=flow_yml, workspace_id=workspace_id) - assert query_flow_id is not None - - print(f'\nQuerying any text') - r = assert_request( - method='post', - url='http://0.0.0.0:45678/search', - payload={'top_k': 100, 
'data': ['text:anything will match the same']}, - ) - print(f'returned: {r}') - texts_matched = r['search']['docs'][0]['matches'] - assert len(texts_matched) == 100 - - assert_request(method='get', url=f'http://localhost:8000/flows/{query_flow_id}') - - assert_request( - method='delete', - url=f'http://localhost:8000/flows/{query_flow_id}', - payload={'workspace': True}, - ) diff --git a/tests/distributed/test_join_local_from_remote/flow.yml b/tests/distributed/test_join_local_from_remote/flow.yml index 0e13f5fa87ab8..b0e70d89f0af3 100644 --- a/tests/distributed/test_join_local_from_remote/flow.yml +++ b/tests/distributed/test_join_local_from_remote/flow.yml @@ -6,11 +6,9 @@ with: rest_api: false pods: - name: pod1 - uses: _pass host: $JINA_POD1_HOST port_expose: 8000 - name: pod2 - uses: _pass parallel: 2 host: $JINA_POD2_HOST port_expose: 8000 @@ -20,7 +18,7 @@ pods: port_expose: 8000 needs: [gateway] - name: join_all - uses: _merge method: needs + uses: merge.yml needs: [pod2, pod3] read_only: true diff --git a/jina/drivers/querylang/__init__.py b/tests/distributed/test_join_local_from_remote/pods/__init__.py similarity index 100% rename from jina/drivers/querylang/__init__.py rename to tests/distributed/test_join_local_from_remote/pods/__init__.py diff --git a/tests/distributed/test_join_local_from_remote/pods/merge.py b/tests/distributed/test_join_local_from_remote/pods/merge.py new file mode 100644 index 0000000000000..fb2548565ed65 --- /dev/null +++ b/tests/distributed/test_join_local_from_remote/pods/merge.py @@ -0,0 +1,8 @@ +from jina import requests, Executor + + +class MergeExecutor(Executor): + @requests + def foo(self, docs, **kwargs): + print(f' MERGE EXECUTOR HERE {len(docs)}') + return docs diff --git a/tests/distributed/test_join_local_from_remote/pods/merge.yml b/tests/distributed/test_join_local_from_remote/pods/merge.yml new file mode 100644 index 0000000000000..608e632bd6e48 --- /dev/null +++ 
b/tests/distributed/test_join_local_from_remote/pods/merge.yml @@ -0,0 +1,3 @@ +!MergeExecutor +metas: + py_modules: merge.py \ No newline at end of file diff --git a/tests/distributed/test_join_local_from_remote/test_integration.py b/tests/distributed/test_join_local_from_remote/test_integration.py index 428a2a0dc1cc3..672bf96aa0dee 100644 --- a/tests/distributed/test_join_local_from_remote/test_integration.py +++ b/tests/distributed/test_join_local_from_remote/test_integration.py @@ -2,10 +2,11 @@ import pytest -from ..helpers import create_flow_2, assert_request -from jina import Client, Document +from jina import Document +from jina.clients import Client from jina.parsers import set_client_cli_parser from tests import validate_callback +from ..helpers import create_flow, assert_request cur_dir = os.path.dirname(os.path.abspath(__file__)) compose_yml = os.path.join(cur_dir, 'docker-compose.yml') @@ -33,12 +34,12 @@ def client(): @pytest.mark.parametrize('docker_compose', [compose_yml], indirect=['docker_compose']) def test_flow(docker_compose, doc_to_index, client, mocker): def validate_resp(resp): - assert len(resp.search.docs) == 2 - assert resp.search.docs[0].text == 'test' - assert resp.search.docs[1].text == 'test' + assert len(resp.data.docs) == 2 + assert resp.data.docs[0].text == 'test' + assert resp.data.docs[1].text == 'test' mock = mocker.Mock() - flow_id = create_flow_2(flow_yaml=flow_yaml) + flow_id = create_flow(flow_yaml=flow_yaml, pod_dir=os.path.join(cur_dir, 'pods')) client.search(inputs=[doc_to_index], on_done=mock) diff --git a/tests/distributed/test_local_flow_local_remote_local/flow.yml b/tests/distributed/test_local_flow_local_remote_local/flow.yml index 2c51744576748..5c5084096a06e 100644 --- a/tests/distributed/test_local_flow_local_remote_local/flow.yml +++ b/tests/distributed/test_local_flow_local_remote_local/flow.yml @@ -6,10 +6,8 @@ with: pods: - name: crafter # for now all the uploaded files are stored in the same folder, so no 
subpaths allowed in remote - uses: _pass read_only: False - name: encoder - uses: _pass host: $JINA_ENCODER_HOST port_in: 45003 port_ctrl: 45006 @@ -17,7 +15,6 @@ pods: needs: $JINA_ENCODER_NEEDS - name: indexer # for now all the uploaded files are stored in the same folder, so no subpaths allowed in remote - uses: _pass read_only: False method: $JINA_INDEXER_METHOD needs: $JINA_INDEXER_NEEDS diff --git a/tests/distributed/test_local_flow_local_remote_local/test_integration.py b/tests/distributed/test_local_flow_local_remote_local/test_integration.py index c6e4e9bb89c6a..34358a4ec3d62 100644 --- a/tests/distributed/test_local_flow_local_remote_local/test_integration.py +++ b/tests/distributed/test_local_flow_local_remote_local/test_integration.py @@ -22,8 +22,8 @@ def test_flow( text = 'cats rules' def validate_output(resp): - assert len(resp.index.docs) == 1 - assert resp.index.docs[0].text == text + assert len(resp.data.docs) == 1 + assert resp.data.docs[0].text == text os.environ['JINA_ENCODER_HOST'] = '172.28.1.1' os.environ['JINA_WORKSPACE'] = str(tmpdir) diff --git a/tests/distributed/test_local_flow_remote_local_remote/flow.yml b/tests/distributed/test_local_flow_remote_local_remote/flow.yml index a5181406c85ea..311dcae8371ce 100644 --- a/tests/distributed/test_local_flow_remote_local_remote/flow.yml +++ b/tests/distributed/test_local_flow_remote_local_remote/flow.yml @@ -5,18 +5,15 @@ with: pods: - name: crafter # for now all the uploaded files are stored in the same folder, so no subpaths allowed in remote - uses: _pass host: $JINA_CRAFTER_HOST read_only: False port_in: 45000 port_ctrl: 45001 port_expose: 8000 - name: encoder - uses: _pass needs: $JINA_ENCODER_NEEDS - name: indexer # for now all the uploaded files are stored in the same folder, so no subpaths allowed in remote - uses: _pass read_only: False host: $JINA_INDEXER_HOST port_in: 45004 diff --git a/tests/distributed/test_local_flow_remote_local_remote/test_integration.py 
b/tests/distributed/test_local_flow_remote_local_remote/test_integration.py index c44080b6ad785..40065064f2210 100644 --- a/tests/distributed/test_local_flow_remote_local_remote/test_integration.py +++ b/tests/distributed/test_local_flow_remote_local_remote/test_integration.py @@ -20,8 +20,8 @@ def test_flow(docker_compose, mocker, encoder_needs, indexer_needs): text = 'cats rules' def validate_output(resp): - assert len(resp.index.docs) == 1 - assert resp.index.docs[0].text == text + assert len(resp.data.docs) == 1 + assert resp.data.docs[0].text == text os.environ['JINA_CRAFTER_HOST'] = '172.28.1.1' os.environ['JINA_INDEXER_HOST'] = '172.28.1.2' diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/docker-compose.yml b/tests/distributed/test_remote_flow_dump_rolling_update/docker-compose.yml index b19ed0da91538..3a3435c9306c5 100644 --- a/tests/distributed/test_remote_flow_dump_rolling_update/docker-compose.yml +++ b/tests/distributed/test_remote_flow_dump_rolling_update/docker-compose.yml @@ -1,7 +1,7 @@ version: "3.3" services: jinad: - image: test_index_query + image: test_remote_flow_dump_reload build: context: . 
dockerfile: Dockerfiles/debianx.Dockerfile diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/dump_size.txt b/tests/distributed/test_remote_flow_dump_rolling_update/dump_size.txt deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/flow_query.yml b/tests/distributed/test_remote_flow_dump_rolling_update/flow_query.yml index 4f6b4b15ecfae..b55ae4127fda2 100644 --- a/tests/distributed/test_remote_flow_dump_rolling_update/flow_query.yml +++ b/tests/distributed/test_remote_flow_dump_rolling_update/flow_query.yml @@ -10,5 +10,5 @@ pods: replicas: 2 parallel: 3 polling: all - uses_after: _merge_matches + uses_after: merge_matches.yml timeout_ready: '-1' \ No newline at end of file diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/indexer_dbms.yml b/tests/distributed/test_remote_flow_dump_rolling_update/indexer_dbms.yml deleted file mode 100644 index 1c8d8d4eb5675..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update/indexer_dbms.yml +++ /dev/null @@ -1,16 +0,0 @@ -jtype: KeyValueDBMSIndexer -with: - index_filename: keyvalue.bin -metas: - workspace: workspace_dbms - name: keyvalue -requests: - on: - [IndexRequest, UpdateRequest]: - - jtype: DBMSIndexDriver - with: - executor: keyvalue - DeleteRequest: - - jtype: DeleteDriver - with: - executor: keyvalue diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/indexer_query.yml b/tests/distributed/test_remote_flow_dump_rolling_update/indexer_query.yml deleted file mode 100644 index 1aadbffafc97e..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update/indexer_query.yml +++ /dev/null @@ -1,28 +0,0 @@ -jtype: CompoundQueryExecutor -components: - - jtype: NumpyQueryIndexer - with: - index_filename: vec.bin - metric: euclidean - metas: - name: vecidx - - jtype: BinaryPbQueryIndexer - with: - index_filename: kv.bin - metas: - name: kvidx -metas: - name: compound_indexer 
- workspace: workspace_query -requests: - on: - SearchRequest: - - jtype: VectorSearchDriver - with: - executor: vecidx - fill_embedding: True # to test embeddings - - jtype: KVSearchDriver - with: - executor: kvidx - is_update: True - traversal_paths: ['m'] \ No newline at end of file diff --git a/jina/drivers/train/__init__.py b/tests/distributed/test_remote_flow_dump_rolling_update/pods/__init__.py similarity index 100% rename from jina/drivers/train/__init__.py rename to tests/distributed/test_remote_flow_dump_rolling_update/pods/__init__.py diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/pods/executors.py b/tests/distributed/test_remote_flow_dump_rolling_update/pods/executors.py new file mode 100644 index 0000000000000..a40cbb588f31c --- /dev/null +++ b/tests/distributed/test_remote_flow_dump_rolling_update/pods/executors.py @@ -0,0 +1,109 @@ +import os + +from typing import Optional + +import numpy as np +from jina import Executor, requests, DocumentArray, Document +from jina.logging import JinaLogger + + +class KeyValueDBMSIndexer(Executor): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self._docs = DocumentArray() + self.logger = JinaLogger('KeyValueDBMSIndexer') + + @requests(on='/index') + def index(self, docs: 'DocumentArray', *args, **kwargs): + self._docs.extend(docs) + + @requests(on='/dump') + def dump(self, parameters, *args, **kwargs): + dump_path = parameters['dump_path'] + shards = int(parameters['shards']) + # TODO: maybe put some logic for shards here + self._docs.save(dump_path) + + +class CompoundQueryExecutor(Executor): + def __init__(self, dump_path: Optional[str] = None, *args, **kwargs): + super().__init__(*args, **kwargs) + self.logger = JinaLogger('CompoundQueryExecutor') + self._dump_path = dump_path + if self._dump_path is not None and os.path.exists(self._dump_path): + self._docs = DocumentArray.load(self._dump_path) + else: + self._docs = DocumentArray() + + @staticmethod + def 
_get_sorted_top_k(dist: 'np.array', top_k: int): + if top_k >= dist.shape[1]: + idx = dist.argsort(axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx, axis=1) + else: + idx_ps = dist.argpartition(kth=top_k, axis=1)[:, :top_k] + dist = np.take_along_axis(dist, idx_ps, axis=1) + idx_fs = dist.argsort(axis=1) + idx = np.take_along_axis(idx_ps, idx_fs, axis=1) + dist = np.take_along_axis(dist, idx_fs, axis=1) + + return idx, dist + + @requests(on='/search') + def search(self, docs: 'DocumentArray', parameters, **kwargs): + if len(self._docs) > 0: + a = np.stack(docs.get_attributes('embedding')) + b = np.stack(self._docs.get_attributes('embedding')) + q_emb = _ext_A(_norm(a)) + d_emb = _ext_B(_norm(b)) + dists = _cosine(q_emb, d_emb) + idx, dist = self._get_sorted_top_k(dists, int(parameters['top_k'])) + for _q, _ids, _dists in zip(docs, idx, dist): + for _id, _dist in zip(_ids, _dists): + d = Document(self._docs[int(_id)], copy=True) + d.score.value = 1 - _dist + _q.matches.append(d) + + +def _get_ones(x, y): + return np.ones((x, y)) + + +def _ext_A(A): + nA, dim = A.shape + A_ext = _get_ones(nA, dim * 3) + A_ext[:, dim : 2 * dim] = A + A_ext[:, 2 * dim :] = A ** 2 + return A_ext + + +def _ext_B(B): + nB, dim = B.shape + B_ext = _get_ones(dim * 3, nB) + B_ext[:dim] = (B ** 2).T + B_ext[dim : 2 * dim] = -2.0 * B.T + del B + return B_ext + + +def _euclidean(A_ext, B_ext): + sqdist = A_ext.dot(B_ext).clip(min=0) + return np.sqrt(sqdist) + + +def _norm(A): + return A / np.linalg.norm(A, ord=2, axis=1, keepdims=True) + + +def _cosine(A_norm_ext, B_norm_ext): + return A_norm_ext.dot(B_norm_ext).clip(min=0) / 2 + + +class MergeMatchesExecutor(Executor): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) + self.logger = JinaLogger('MergeMatchesExecutor') + + @requests + def merge(self, *args, **kwargs): + pass diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/pods/indexer_dbms.yml 
b/tests/distributed/test_remote_flow_dump_rolling_update/pods/indexer_dbms.yml new file mode 100644 index 0000000000000..1569a30d24765 --- /dev/null +++ b/tests/distributed/test_remote_flow_dump_rolling_update/pods/indexer_dbms.yml @@ -0,0 +1,5 @@ +jtype: KeyValueDBMSIndexer +metas: + workspace: workspace_dbms + name: keyvalue + py_modules: [executors.py] diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/pods/indexer_query.yml b/tests/distributed/test_remote_flow_dump_rolling_update/pods/indexer_query.yml new file mode 100644 index 0000000000000..c5d4f407ffd00 --- /dev/null +++ b/tests/distributed/test_remote_flow_dump_rolling_update/pods/indexer_query.yml @@ -0,0 +1,7 @@ +jtype: CompoundQueryExecutor +with: + dump_path: /tmp/dump # TEMPORARY SOLUTION UNTIL WE FIND HOW TO RESTART WITH DUMP_PATH +metas: + name: compound_indexer + workspace: workspace_query + py_modules: [executors.py] diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/pods/merge_matches.yml b/tests/distributed/test_remote_flow_dump_rolling_update/pods/merge_matches.yml new file mode 100644 index 0000000000000..e2f5cd6c770d2 --- /dev/null +++ b/tests/distributed/test_remote_flow_dump_rolling_update/pods/merge_matches.yml @@ -0,0 +1,4 @@ +jtype: MergeMatchesExecutor +metas: + name: merge_matches + py_modules: [executors.py] \ No newline at end of file diff --git a/tests/distributed/test_remote_flow_dump_rolling_update/test_dumb_dbms_remote.py b/tests/distributed/test_remote_flow_dump_rolling_update/test_dump_dbms_remote.py similarity index 63% rename from tests/distributed/test_remote_flow_dump_rolling_update/test_dumb_dbms_remote.py rename to tests/distributed/test_remote_flow_dump_rolling_update/test_dump_dbms_remote.py index db690f8df5c7e..fa6af277756fb 100644 --- a/tests/distributed/test_remote_flow_dump_rolling_update/test_dumb_dbms_remote.py +++ b/tests/distributed/test_remote_flow_dump_rolling_update/test_dump_dbms_remote.py @@ -1,6 +1,5 @@ import os -from 
contextlib import ExitStack -from typing import List +from ..helpers import create_flow import numpy as np import pytest @@ -38,20 +37,17 @@ def _path_size_remote(this_dump_path): def _create_flows(): # create dbms flow - dbms_deps = [os.path.join(cur_dir, 'indexer_dbms.yml')] - dbms_flow_id = _create_flow( - dbms_flow_yml, - dbms_deps, - flow_url=f'http://localhost:{JINAD_PORT_DBMS}/flows', - ws_url=f'http://localhost:{JINAD_PORT_DBMS}/workspaces', + dbms_flow_id = create_flow( + flow_yaml=dbms_flow_yml, + pod_dir=os.path.join(cur_dir, 'pods'), + url=f'http://localhost:{JINAD_PORT_DBMS}', ) + # create query flow - query_deps = [os.path.join(cur_dir, 'indexer_query.yml')] - query_flow_id = _create_flow( - query_flow_yml, - query_deps, - flow_url=f'http://localhost:{JINAD_PORT_QUERY}/flows', - ws_url=f'http://localhost:{JINAD_PORT_QUERY}/workspaces', + query_flow_id = create_flow( + flow_yaml=query_flow_yml, + pod_dir=os.path.join(cur_dir, 'pods'), + url=f'http://localhost:{JINAD_PORT_QUERY}', ) return dbms_flow_id, query_flow_id @@ -72,8 +68,8 @@ def test_dump_dbms_remote(docker_compose): ) # TODO some times it was None assert ( - r['search']['docs'][0].get('matches') is None - or r['search']['docs'][0].get('matches') == [] + r['data']['docs'][0].get('matches') is None + or r['data']['docs'][0].get('matches') == [] ) _send_rest_request(REST_PORT_DBMS, 'index', 'post', [doc.dict() for doc in docs]) @@ -99,45 +95,33 @@ def test_dump_dbms_remote(docker_compose): # data request goes to client r = _send_rest_request( - REST_PORT_QUERY, 'search', 'post', [doc.dict() for doc in docs[:nr_search]] + REST_PORT_QUERY, + 'search', + 'post', + [doc.dict() for doc in docs[:nr_search]], + params={'top_k': 100}, ) - for doc in r['search']['docs']: + for doc in r['data']['docs']: assert len(doc.get('matches')) == nr_docs -def _create_flow( - flow_yaml: str, - deps: List[str], - flow_url: str, - ws_url: str, -) -> str: - workspace_id = _create_workspace(deps, url=ws_url) - with 
open(flow_yaml, 'rb') as f: - r = requests.post( - flow_url, data={'workspace_id': workspace_id}, files={'flow': f} - ) - logger.info(f'Checking if the flow creation has succeeded: {r.json()}') - assert r.status_code == 201 - return r.json() - - -def _create_workspace(filepaths: List[str], url: str) -> str: - with ExitStack() as file_stack: - files = [ - ('files', file_stack.enter_context(open(filepath, 'rb'))) - for filepath in filepaths - ] - logger.info(f'uploading {files}') - r = requests.post(url, files=files) - assert r.status_code == 201 - - workspace_id = r.json() - logger.info(f'Got workspace_id: {workspace_id}') - return workspace_id - - -def _send_rest_request(port_expose, endpoint, method, data, timeout=13): +def _send_rest_request( + port_expose, + endpoint, + method, + data, + exec_endpoint=None, + params=None, + target_peapod=None, + timeout=13, +): json = {'data': data} + if params: + json['parameters'] = params + if target_peapod: + json['target_peapod'] = target_peapod + if exec_endpoint: + json['exec_endpoint'] = exec_endpoint url = f'http://0.0.0.0:{port_expose}/{endpoint}' r = getattr(requests, method)(url, json=json, timeout=timeout) @@ -161,15 +145,21 @@ def _get_documents(nr=10, index_start=0, emb_size=7): def _jinad_dump(pod_name, dump_path, shards, url): params = { - 'kind': 'dump', 'pod_name': pod_name, 'dump_path': dump_path, 'shards': shards, } # url params - logger.info(f'sending PUT req to dump') - r = requests.put(url, params=params) - assert r.status_code == 200 + logger.info(f'sending dump request') + _send_rest_request( + REST_PORT_DBMS, + 'post', + 'post', + data=[], + exec_endpoint='/dump', + params=params, + target_peapod=pod_name, + ) def _jinad_rolling_update(pod_name, dump_path, url): diff --git a/tests/distributed/test_remote_flow_dump_rolling_update_stress/__init__.py b/tests/distributed/test_remote_flow_dump_rolling_update_stress/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git 
a/tests/distributed/test_remote_flow_dump_rolling_update_stress/docker-compose.yml b/tests/distributed/test_remote_flow_dump_rolling_update_stress/docker-compose.yml deleted file mode 100644 index b19ed0da91538..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update_stress/docker-compose.yml +++ /dev/null @@ -1,14 +0,0 @@ -version: "3.3" -services: - jinad: - image: test_index_query - build: - context: . - dockerfile: Dockerfiles/debianx.Dockerfile - ports: - - "8001:8000" - - "9000:9000" - - "9001:9001" - - "45678:45678" - expose: - - 10000-60000 diff --git a/tests/distributed/test_remote_flow_dump_rolling_update_stress/dump_size.txt b/tests/distributed/test_remote_flow_dump_rolling_update_stress/dump_size.txt deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_remote_flow_dump_rolling_update_stress/flow_dbms.yml b/tests/distributed/test_remote_flow_dump_rolling_update_stress/flow_dbms.yml deleted file mode 100644 index 77b07e55813a6..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update_stress/flow_dbms.yml +++ /dev/null @@ -1,9 +0,0 @@ -jtype: Flow -version: '1' -with: - read_only: false - rest_api: true - port_expose: 9000 -pods: - - name: indexer_dbms - uses: indexer_dbms.yml diff --git a/tests/distributed/test_remote_flow_dump_rolling_update_stress/flow_query.yml b/tests/distributed/test_remote_flow_dump_rolling_update_stress/flow_query.yml deleted file mode 100644 index 4f6b4b15ecfae..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update_stress/flow_query.yml +++ /dev/null @@ -1,14 +0,0 @@ -jtype: Flow -version: '1' -with: - read_only: false - rest_api: true - port_expose: 9001 -pods: - - name: indexer_query - uses: indexer_query.yml - replicas: 2 - parallel: 3 - polling: all - uses_after: _merge_matches - timeout_ready: '-1' \ No newline at end of file diff --git a/tests/distributed/test_remote_flow_dump_rolling_update_stress/indexer_dbms.yml 
b/tests/distributed/test_remote_flow_dump_rolling_update_stress/indexer_dbms.yml deleted file mode 100644 index 1c8d8d4eb5675..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update_stress/indexer_dbms.yml +++ /dev/null @@ -1,16 +0,0 @@ -jtype: KeyValueDBMSIndexer -with: - index_filename: keyvalue.bin -metas: - workspace: workspace_dbms - name: keyvalue -requests: - on: - [IndexRequest, UpdateRequest]: - - jtype: DBMSIndexDriver - with: - executor: keyvalue - DeleteRequest: - - jtype: DeleteDriver - with: - executor: keyvalue diff --git a/tests/distributed/test_remote_flow_dump_rolling_update_stress/indexer_query.yml b/tests/distributed/test_remote_flow_dump_rolling_update_stress/indexer_query.yml deleted file mode 100644 index 1aadbffafc97e..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update_stress/indexer_query.yml +++ /dev/null @@ -1,28 +0,0 @@ -jtype: CompoundQueryExecutor -components: - - jtype: NumpyQueryIndexer - with: - index_filename: vec.bin - metric: euclidean - metas: - name: vecidx - - jtype: BinaryPbQueryIndexer - with: - index_filename: kv.bin - metas: - name: kvidx -metas: - name: compound_indexer - workspace: workspace_query -requests: - on: - SearchRequest: - - jtype: VectorSearchDriver - with: - executor: vecidx - fill_embedding: True # to test embeddings - - jtype: KVSearchDriver - with: - executor: kvidx - is_update: True - traversal_paths: ['m'] \ No newline at end of file diff --git a/tests/distributed/test_remote_flow_dump_rolling_update_stress/test_dumb_dbms_remote.py b/tests/distributed/test_remote_flow_dump_rolling_update_stress/test_dumb_dbms_remote.py deleted file mode 100644 index 76b6c7099f8f4..0000000000000 --- a/tests/distributed/test_remote_flow_dump_rolling_update_stress/test_dumb_dbms_remote.py +++ /dev/null @@ -1,310 +0,0 @@ -import os -import sys -import time -import traceback -from contextlib import ExitStack -from threading import Thread -from typing import List - -import numpy as 
np -import pytest -import requests -from requests.exceptions import ConnectionError -from urllib3.exceptions import ReadTimeoutError, NewConnectionError - -from jina import Document, Client -from jina.logging import JinaLogger - -cur_dir = os.path.dirname(os.path.abspath(__file__)) -dbms_flow_yml = os.path.join(cur_dir, 'flow_dbms.yml') -query_flow_yml = os.path.join(cur_dir, 'flow_query.yml') -compose_yml = os.path.join(cur_dir, 'docker-compose.yml') - -JINAD_PORT_DBMS = '8001' -JINAD_PORT_QUERY = '8001' -REST_PORT_DBMS = '9000' -REST_PORT_QUERY = '9001' - -DUMP_PATH_DOCKER = '/tmp/dump' - -logger = JinaLogger('test-dump') - -SHARDS = 3 -EMB_SIZE = 10 - -# global between threads -KEEP_RUNNING = True -INDEX_TIMES = 0 -QUERY_TIMES = 0 -DUMP_ROLL_UPDATE_TIME = 0 - - -class MyThread(Thread): - def run(self) -> None: - try: - super().run() - except Exception as e: - logger.error(sys.exc_info()) - raise e - - -def _index_client(nr_docs_index): - global INDEX_TIMES - logger.info(f'starting index thread. KEEP_RUNNING = {KEEP_RUNNING}') - while KEEP_RUNNING: - docs = list( - _get_documents( - nr=nr_docs_index, - index_start=INDEX_TIMES * nr_docs_index, - emb_size=EMB_SIZE, - ) - ) - Client.check_input(docs) - logger.info(f'indexing {len(docs)} docs...') - _send_rest_request( - REST_PORT_DBMS, 'index', 'post', [doc.dict() for doc in docs] - ) - INDEX_TIMES += 1 - time.sleep(7) - - -def _query_client(nr_docs_query): - global QUERY_TIMES - logger.info(f'starting query thread. 
KEEP_RUNNING = {KEEP_RUNNING}') - prev_len_matches = 0 - docs = list(_get_documents(nr=nr_docs_query, index_start=0, emb_size=EMB_SIZE)) - Client.check_input(docs) - query_docs = [doc.dict() for doc in docs] - while KEEP_RUNNING: - try: - logger.info(f'querying...') - r = _send_rest_request( - REST_PORT_QUERY, - 'search', - 'post', - query_docs, - timeout=8, - ) - for doc in r['search']['docs']: - len_matches = len(doc.get('matches')) - assert len_matches >= prev_len_matches - logger.info(f'got {len_matches} matches') - if len_matches != prev_len_matches: - # only count queries after a change in index size - QUERY_TIMES += 1 - prev_len_matches = len_matches - time.sleep(3) - except (ConnectionError, ReadTimeoutError) as e: - logger.error(f'querying failed: {e}. trying again...') - logger.error(traceback.format_exc()) - except (NewConnectionError, Exception) as e: - logger.error(f'error in query thread: {e!r}') - raise e - - -def _dump_roll_update(dbms_flow_id, query_flow_id): - global DUMP_ROLL_UPDATE_TIME - logger.info(f'starting _dump_roll_update thread. 
KEEP_RUNNING = {KEEP_RUNNING}') - folder_id = 10 - while KEEP_RUNNING: - this_dump_path = os.path.join(DUMP_PATH_DOCKER, f'dump-{folder_id}') - # jinad is used for ctrl requests - logger.info(f'dumping...') - _jinad_dump( - 'indexer_dbms', - this_dump_path, # the internal path in the docker container - SHARDS, - f'http://localhost:{JINAD_PORT_DBMS}/flows/{dbms_flow_id}', - ) - - logger.info(f'checking size...') - dir_size = _path_size_remote(this_dump_path) - assert dir_size > 0 - logger.info(f'dump path size: {dir_size}') - - # jinad is used for ctrl requests - logger.info(f'rolling update...') - _jinad_rolling_update( - 'indexer_query', - this_dump_path, # the internal path in the docker container - f'http://localhost:{JINAD_PORT_QUERY}/flows/{query_flow_id}', - ) - folder_id += 1 - logger.info(f'rolling update done!') - DUMP_ROLL_UPDATE_TIME += 1 - time.sleep(10) - - -def _path_size_remote(this_dump_path): - os.system( - f'docker exec jina_jinad_1 /bin/bash -c "du -sh {this_dump_path}" > dump_size.txt' - ) - contents = open('dump_size.txt').readline() - dir_size = float(contents.split('K')[0].split('M')[0]) - return dir_size - - -@pytest.mark.parametrize('docker_compose', [compose_yml], indirect=['docker_compose']) -def test_dump_dbms_remote_stress(docker_compose, reraise): - def _inner_query_client(nr_docs_search): - with reraise: - _query_client(nr_docs_search) - - def _inner_index_client(nr_docs_index): - with reraise: - _index_client(nr_docs_index) - - def _inner_dump_rolling_update(dbms_flow_id, query_flow_id): - with reraise: - _dump_roll_update(dbms_flow_id, query_flow_id) - - global KEEP_RUNNING - nr_docs_index = 20 - nr_docs_search = 3 - - time.sleep(2) - dbms_flow_id, query_flow_id = _create_flows() - time.sleep(4) - - query_thread = MyThread( - target=_inner_query_client, - name='_query_client', - args=(nr_docs_search,), - daemon=True, - ) - query_thread.start() - - index_thread = MyThread( - target=_inner_index_client, - name='_index_client', - 
args=(nr_docs_index,), - daemon=True, - ) - index_thread.start() - - # give it time to index - time.sleep(2) - dump_roll_update_thread = MyThread( - target=_inner_dump_rolling_update, - name='_dump_roll_update', - args=(dbms_flow_id, query_flow_id), - daemon=True, - ) - dump_roll_update_thread.start() - - threads = [query_thread, index_thread, dump_roll_update_thread] - - logger.info('sleeping') - time.sleep(60) - KEEP_RUNNING = False - - for t in threads: - if not t.is_alive(): - logger.warning(f'something went wrong in thread {t.name}') - t.join() - assert False, f'check error from thread {t.name}' - - assert INDEX_TIMES > 3 - assert QUERY_TIMES > 3 - assert DUMP_ROLL_UPDATE_TIME > 2 - - logger.info(f'ending and exit threads') - - -def _create_flows(): - # create dbms flow - dbms_deps = [os.path.join(cur_dir, 'indexer_dbms.yml')] - dbms_flow_id = _create_flow( - dbms_flow_yml, - dbms_deps, - flow_url=f'http://localhost:{JINAD_PORT_DBMS}/flows', - ws_url=f'http://localhost:{JINAD_PORT_DBMS}/workspaces', - ) - # create query flow - query_deps = [os.path.join(cur_dir, 'indexer_query.yml')] - query_flow_id = _create_flow( - query_flow_yml, - query_deps, - flow_url=f'http://localhost:{JINAD_PORT_QUERY}/flows', - ws_url=f'http://localhost:{JINAD_PORT_QUERY}/workspaces', - ) - return dbms_flow_id, query_flow_id - - -def _create_flow( - flow_yaml: str, - deps: List[str], - flow_url: str, - ws_url: str, -) -> str: - workspace_id = _create_workspace(deps, url=ws_url) - with open(flow_yaml, 'rb') as f: - r = requests.post( - flow_url, data={'workspace_id': workspace_id}, files={'flow': f} - ) - logger.info(f'Checking if the flow creation has succeeded: {r.json()}') - assert r.status_code == 201 - return r.json() - - -def _create_workspace(filepaths: List[str], url: str) -> str: - with ExitStack() as file_stack: - files = [ - ('files', file_stack.enter_context(open(filepath, 'rb'))) - for filepath in filepaths - ] - logger.info(f'uploading {files}') - r = requests.post(url, 
files=files) - assert r.status_code == 201 - - workspace_id = r.json() - logger.info(f'Got workspace_id: {workspace_id}') - return workspace_id - - -def _send_rest_request(port_expose, endpoint, method, data, timeout=13): - json = {'data': data} - url = f'http://0.0.0.0:{port_expose}/{endpoint}' - r = getattr(requests, method)(url, json=json, timeout=timeout) - - if r.status_code != 200: - # TODO status_code should be 201 for index - raise Exception( - f'api request failed, url: {url}, status: {r.status_code}, content: {r.content} data: {data}' - ) - return r.json() - - -def _get_documents(nr=10, index_start=0, emb_size=7): - for i in range(index_start, nr + index_start): - with Document() as d: - d.id = i - d.text = f'hello world {i}' - d.embedding = np.random.random(emb_size) - d.tags['tag_field'] = f'tag data {i}' - yield d - - -def _jinad_dump(pod_name, dump_path, shards, url): - params = { - 'kind': 'dump', - 'pod_name': pod_name, - 'dump_path': dump_path, - 'shards': shards, - } - # url params - logger.info(f'sending PUT req to dump') - r = requests.put(url, params=params) - assert r.status_code == 200 - - -def _jinad_rolling_update(pod_name, dump_path, url): - params = { - 'kind': 'rolling_update', - 'pod_name': pod_name, - 'dump_path': dump_path, - } - # url params - logger.info(f'sending PUT to roll update') - r = requests.put(url, params=params) - assert r.status_code == 200 diff --git a/tests/distributed/test_simple_distributed/.env b/tests/distributed/test_simple_distributed/.env deleted file mode 100644 index d91821045429b..0000000000000 --- a/tests/distributed/test_simple_distributed/.env +++ /dev/null @@ -1,4 +0,0 @@ -JINA_ENCODER_HOST=encoder -JINA_INDEXER_HOST=indexer -JINA_GATEWAY_REST_PORT=45678 -JINA_WORKSPACE=workspace diff --git a/tests/distributed/test_simple_distributed/__init__.py b/tests/distributed/test_simple_distributed/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git 
a/tests/distributed/test_simple_distributed/docker-compose.yml b/tests/distributed/test_simple_distributed/docker-compose.yml deleted file mode 100644 index 23d6ce5c6b419..0000000000000 --- a/tests/distributed/test_simple_distributed/docker-compose.yml +++ /dev/null @@ -1,28 +0,0 @@ -version: "3.3" -services: - flow: - image: test_simple_distributed - build: - context: . - dockerfile: Dockerfiles/debianx.Dockerfile - ports: - - "8000:8000" - - "45678:45678" - env_file: - - tests/distributed/test_simple_distributed/.env - expose: - - 10000-60000 - encoder: - image: test_simple_distributed - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_simple_distributed/.env - indexer: - image: test_simple_distributed - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_simple_distributed/.env diff --git a/tests/distributed/test_simple_distributed/flow.yml b/tests/distributed/test_simple_distributed/flow.yml deleted file mode 100644 index f274bded7910a..0000000000000 --- a/tests/distributed/test_simple_distributed/flow.yml +++ /dev/null @@ -1,20 +0,0 @@ -!Flow -version: '1' -with: - read_only: true - rest_api: true - port_expose: $JINA_GATEWAY_REST_PORT -pods: - - name: encoder - # for now all the uploaded files are stored in the same folder, so no subpaths allowed in remote - uses: encode.yml - host: $JINA_ENCODER_HOST - port_expose: 8000 - upload_files: - - dummy-encoder.py - - name: indexer - # for now all the uploaded files are stored in the same folder, so no subpaths allowed in remote - uses: index.yml - read_only: False - host: $JINA_INDEXER_HOST - port_expose: 8000 \ No newline at end of file diff --git a/tests/distributed/test_simple_distributed/pods/__init__.py b/tests/distributed/test_simple_distributed/pods/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_simple_distributed/pods/dummy-encoder.py b/tests/distributed/test_simple_distributed/pods/dummy-encoder.py 
deleted file mode 100644 index 5490ee4375fde..0000000000000 --- a/tests/distributed/test_simple_distributed/pods/dummy-encoder.py +++ /dev/null @@ -1,16 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.encoders import BaseEncoder -from jina.executors.decorators import batching - - -class DummyEncoder(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=1) - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - return np.random.random((1, 3)) diff --git a/tests/distributed/test_simple_distributed/pods/encode.yml b/tests/distributed/test_simple_distributed/pods/encode.yml deleted file mode 100644 index 922f18188bc80..0000000000000 --- a/tests/distributed/test_simple_distributed/pods/encode.yml +++ /dev/null @@ -1,5 +0,0 @@ -!DummyEncoder -with: - {} -metas: - py_modules: dummy-encoder.py diff --git a/tests/distributed/test_simple_distributed/pods/index.yml b/tests/distributed/test_simple_distributed/pods/index.yml deleted file mode 100644 index dcce922fee8df..0000000000000 --- a/tests/distributed/test_simple_distributed/pods/index.yml +++ /dev/null @@ -1,16 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx -metas: - name: indexer - workspace: $JINA_WORKSPACE diff --git a/tests/distributed/test_simple_distributed/test_integration.py b/tests/distributed/test_simple_distributed/test_integration.py deleted file mode 100644 index e0fce4e52761a..0000000000000 --- a/tests/distributed/test_simple_distributed/test_integration.py +++ /dev/null @@ -1,34 +0,0 @@ -import os - -import pytest - -from ..helpers import create_flow, assert_request - -cur_dir = os.path.dirname(os.path.abspath(__file__)) -compose_yml = os.path.join(cur_dir, 
'docker-compose.yml') -flow_yml = os.path.join(cur_dir, 'flow.yml') -pod_dir = os.path.join(cur_dir, 'pods') - - -@pytest.mark.parametrize('docker_compose', [compose_yml], indirect=['docker_compose']) -def test_flow(docker_compose): - flow_id = create_flow(flow_yml, pod_dir) - print(f'Flow created with id {flow_id}') - - r = assert_request( - method='post', - url='http://0.0.0.0:45678/search', - payload={'top_k': 10, 'data': ['text:cats rulessss']}, - expect_rcode=200, - ) - text_indexed = r['search']['docs'][0]['text'] - print(f'Got response text_indexed: {text_indexed}') - assert text_indexed == 'text:cats rulessss' - - assert_request( - method='get', url=f'http://localhost:8000/flows/{flow_id}', expect_rcode=200 - ) - - assert_request( - method='delete', url=f'http://localhost:8000/flows/{flow_id}', expect_rcode=200 - ) diff --git a/tests/distributed/test_simple_distributed_with_shards/.env b/tests/distributed/test_simple_distributed_with_shards/.env deleted file mode 100644 index 7d2fd46a1bf62..0000000000000 --- a/tests/distributed/test_simple_distributed_with_shards/.env +++ /dev/null @@ -1,3 +0,0 @@ -JINA_POD1_HOST=pod1 -JINA_POD2_HOST=pod2 -JINA_GATEWAY_REST_PORT=45678 diff --git a/tests/distributed/test_simple_distributed_with_shards/__init__.py b/tests/distributed/test_simple_distributed_with_shards/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_simple_distributed_with_shards/docker-compose.yml b/tests/distributed/test_simple_distributed_with_shards/docker-compose.yml deleted file mode 100644 index 18a4dc7ade805..0000000000000 --- a/tests/distributed/test_simple_distributed_with_shards/docker-compose.yml +++ /dev/null @@ -1,28 +0,0 @@ -version: "3.3" -services: - flow: - image: test_simple_distributed_with_shards - build: - context: . 
- dockerfile: Dockerfiles/debianx.Dockerfile - ports: - - "8000:8000" - - "45678:45678" - env_file: - - tests/distributed/test_simple_distributed_with_shards/.env - expose: - - 10000-60000 - pod1: - image: test_simple_distributed_with_shards - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_simple_distributed_with_shards/.env - pod2: - image: test_simple_distributed_with_shards - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_simple_distributed_with_shards/.env diff --git a/tests/distributed/test_simple_distributed_with_shards/flow.yml b/tests/distributed/test_simple_distributed_with_shards/flow.yml deleted file mode 100644 index c578bb1019303..0000000000000 --- a/tests/distributed/test_simple_distributed_with_shards/flow.yml +++ /dev/null @@ -1,17 +0,0 @@ -!Flow -version: '1' -with: - read_only: true - rest_api: true - port_expose: $JINA_GATEWAY_REST_PORT -pods: - - name: pod1 - uses: _pass - parallel: 3 - host: $JINA_POD1_HOST - port_expose: 8000 - - name: pod2 - uses: _pass - parallel: 3 - host: $JINA_POD2_HOST - port_expose: 8000 diff --git a/tests/distributed/test_simple_distributed_with_shards/flow_distributed_peas_in_pod.yml b/tests/distributed/test_simple_distributed_with_shards/flow_distributed_peas_in_pod.yml deleted file mode 100644 index 11482f91b98ec..0000000000000 --- a/tests/distributed/test_simple_distributed_with_shards/flow_distributed_peas_in_pod.yml +++ /dev/null @@ -1,21 +0,0 @@ -!Flow -version: '1' -with: - read_only: true - rest_api: true - port_expose: $JINA_GATEWAY_REST_PORT -pods: - - name: pod1 - uses: _pass - parallel: 3 - host: $JINA_POD1_HOST - peas_hosts: - - $JINA_POD2_HOST - port_expose: 8000 - - name: pod2 - uses: _pass - parallel: 3 - host: $JINA_POD2_HOST - peas_hosts: - - $JINA_POD1_HOST - port_expose: 8000 diff --git a/tests/distributed/test_simple_distributed_with_shards/test_integration.py b/tests/distributed/test_simple_distributed_with_shards/test_integration.py deleted file 
mode 100644 index f088a38329c6a..0000000000000 --- a/tests/distributed/test_simple_distributed_with_shards/test_integration.py +++ /dev/null @@ -1,36 +0,0 @@ -import os - -import pytest - -from ..helpers import create_flow, assert_request - -cur_dir = os.path.dirname(os.path.abspath(__file__)) -compose_yml = os.path.join(cur_dir, 'docker-compose.yml') -pod_dir = os.path.join(cur_dir, 'pods') - - -@pytest.mark.parametrize( - 'flow_yml', - [ - os.path.join(cur_dir, 'flow.yml'), - os.path.join(cur_dir, 'flow_distributed_peas_in_pod.yml'), - ], -) -@pytest.mark.parametrize('docker_compose', [compose_yml], indirect=['docker_compose']) -def test_flow(flow_yml, docker_compose): - flow_id = create_flow(flow_yml, pod_dir) - print(f'Flow created with id {flow_id}') - - r = assert_request( - method='post', - url='http://0.0.0.0:45678/search', - payload={'top_k': 10, 'data': ['text:cats rulessss']}, - ) - assert r is not None - text_indexed = r['search']['docs'][0]['text'] - print(f'Got response text_indexed: {text_indexed}') - assert text_indexed == 'text:cats rulessss' - - assert_request(method='get', url=f'http://localhost:8000/flows/{flow_id}') - - assert_request(method='delete', url=f'http://localhost:8000/flows/{flow_id}') diff --git a/tests/distributed/test_simple_hub_pods/test_integration.py b/tests/distributed/test_simple_hub_pods/test_integration.py index 02c9f91cba753..8e09a54ae36cc 100644 --- a/tests/distributed/test_simple_hub_pods/test_integration.py +++ b/tests/distributed/test_simple_hub_pods/test_integration.py @@ -33,7 +33,7 @@ def test_simple_hub_pods(docker_compose): payload={'top_k': 10, 'data': ['text:anything will match the same']}, ) print(f'returned: {r}') - text_matched = r['search']['docs'][0]['matches'][0]['text'] + text_matched = r['data']['docs'][0]['matches'][0]['text'] assert expected_text == text_matched assert_request(method='get', url=f'http://0.0.0.0:8000/flow/{index_flow_id}') diff --git a/tests/distributed/test_simple_local_remote/.env 
b/tests/distributed/test_simple_local_remote/.env deleted file mode 100644 index 09dbb6a92b4c3..0000000000000 --- a/tests/distributed/test_simple_local_remote/.env +++ /dev/null @@ -1 +0,0 @@ -JINA_WORKSPACE=workspace diff --git a/tests/distributed/test_simple_local_remote/__init__.py b/tests/distributed/test_simple_local_remote/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/distributed/test_simple_local_remote/docker-compose.yml b/tests/distributed/test_simple_local_remote/docker-compose.yml deleted file mode 100644 index d741aa02fd469..0000000000000 --- a/tests/distributed/test_simple_local_remote/docker-compose.yml +++ /dev/null @@ -1,35 +0,0 @@ -version: "3.3" -services: - a: - image: test_simple_local_remote - build: - context: . - dockerfile: Dockerfiles/debianx.Dockerfile - ports: - - "8000:8000" - - "45000:45000" - - "45001:45001" - - "45002:45002" - env_file: - - tests/distributed/test_simple_local_remote/.env - expose: - - 10000-60000 - networks: - testing_net: - ipv4_address: 172.28.1.1 - b: - image: test_simple_local_remote - expose: - - 8000 - - 10000-60000 - env_file: - - tests/distributed/test_simple_local_remote/.env - networks: - testing_net: - ipv4_address: 172.28.1.2 -networks: - testing_net: - ipam: - driver: default - config: - - subnet: 172.28.0.0/16 diff --git a/tests/distributed/test_simple_local_remote/dummy-encoder.py b/tests/distributed/test_simple_local_remote/dummy-encoder.py deleted file mode 100644 index 5490ee4375fde..0000000000000 --- a/tests/distributed/test_simple_local_remote/dummy-encoder.py +++ /dev/null @@ -1,16 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.encoders import BaseEncoder -from jina.executors.decorators import batching - - -class DummyEncoder(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=1) - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - return np.random.random((1, 3)) diff --git a/tests/distributed/test_simple_local_remote/encode.yml b/tests/distributed/test_simple_local_remote/encode.yml deleted file mode 100644 index 922f18188bc80..0000000000000 --- a/tests/distributed/test_simple_local_remote/encode.yml +++ /dev/null @@ -1,5 +0,0 @@ -!DummyEncoder -with: - {} -metas: - py_modules: dummy-encoder.py diff --git a/tests/distributed/test_simple_local_remote/flow.yml b/tests/distributed/test_simple_local_remote/flow.yml deleted file mode 100644 index 6819e65c58937..0000000000000 --- a/tests/distributed/test_simple_local_remote/flow.yml +++ /dev/null @@ -1,24 +0,0 @@ -!Flow -version: '1' -with: - read_only: true -pods: - - name: encoder - uses: encode.yml - host: $JINA_ENCODER_HOST - port_in: 45000 - port_ctrl: 45002 - port_expose: 8000 - upload_files: - - encode.yml - - dummy-encoder.py - - name: indexer - uses: index.yml - read_only: False - host: $JINA_INDEXER_HOST - port_in: 45000 - port_out: 45001 - port_ctrl: 45002 - port_expose: 8000 - upload_files: - - index.yml diff --git a/tests/distributed/test_simple_local_remote/index.yml b/tests/distributed/test_simple_local_remote/index.yml deleted file mode 100644 index dcce922fee8df..0000000000000 --- a/tests/distributed/test_simple_local_remote/index.yml +++ /dev/null @@ -1,16 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx -metas: - name: indexer - workspace: $JINA_WORKSPACE diff --git 
a/tests/distributed/test_simple_local_remote/test_integration.py b/tests/distributed/test_simple_local_remote/test_integration.py deleted file mode 100644 index 767159bbf8bba..0000000000000 --- a/tests/distributed/test_simple_local_remote/test_integration.py +++ /dev/null @@ -1,34 +0,0 @@ -import os - -import pytest - -from jina import Document -from jina.flow import Flow -from tests import validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) -compose_yml = os.path.join(cur_dir, 'docker-compose.yml') -flow_yml = os.path.join(cur_dir, 'flow.yml') - - -@pytest.mark.skip('skip until `workspace-id` is fully implemented') -@pytest.mark.parametrize('docker_compose', [compose_yml], indirect=['docker_compose']) -def test_flow(docker_compose, mocker): - text = 'cats rules' - - def validate_output(resp): - assert len(resp.index.docs) == 1 - assert resp.index.docs[0].text == text - - os.environ['JINA_ENCODER_HOST'] = '172.28.1.1' - os.environ['JINA_INDEXER_HOST'] = '172.28.1.2' - - with Document() as doc: - doc.content = text - - mock = mocker.Mock() - with Flow.load_config(flow_yml) as f: - f.index([doc], on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_output) diff --git a/tests/integration/crud/__init__.py b/tests/integration/crud/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/crud/advanced/__init__.py b/tests/integration/crud/advanced/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/crud/advanced/flow-index.yml b/tests/integration/crud/advanced/flow-index.yml deleted file mode 100644 index 3b45790202c18..0000000000000 --- a/tests/integration/crud/advanced/flow-index.yml +++ /dev/null @@ -1,21 +0,0 @@ -!Flow -version: '1.0' -with: - restful: False -pods: - - name: segmenter - uses: yaml/segmenter.yml - read_only: true - - name: encoder - uses: yaml/encoder.yml - timeout_ready: 60000 - read_only: true - - name: 
chunk_indexer - uses: yaml/index-chunk.yml - - name: doc_indexer - uses: yaml/index-doc.yml - needs: gateway - - name: join_all - uses: _pass - needs: [doc_indexer, chunk_indexer] - read_only: true diff --git a/tests/integration/crud/advanced/flow-query.yml b/tests/integration/crud/advanced/flow-query.yml deleted file mode 100644 index e5cedf18ea495..0000000000000 --- a/tests/integration/crud/advanced/flow-query.yml +++ /dev/null @@ -1,17 +0,0 @@ -!Flow -version: '1.0' -with: - restful: False -pods: - - name: encoder - uses: yaml/encoder.yml - read_only: true - - name: chunk_indexer - uses: yaml/index-chunk.yml - read_only: true - - name: ranker - uses: yaml/rank.yml - read_only: true - - name: doc_indexer - uses: yaml/index-doc.yml - read_only: true diff --git a/tests/integration/crud/advanced/test_crud_advanced_example.py b/tests/integration/crud/advanced/test_crud_advanced_example.py deleted file mode 100644 index 25fd3947268d2..0000000000000 --- a/tests/integration/crud/advanced/test_crud_advanced_example.py +++ /dev/null @@ -1,158 +0,0 @@ -import os -import pytest - -from jina import Document -from jina.executors.indexers import BaseIndexer -from jina.flow import Flow - -from tests import validate_callback - - -@pytest.fixture -def config(tmpdir): - os.environ['JINA_CRUD_ADVANCED_WORKSPACE'] = str(tmpdir) - yield - del os.environ['JINA_CRUD_ADVANCED_WORKSPACE'] - - -def get_docs_to_index(contents): - for i, content in enumerate(contents): - document = Document() - document.id = str(f'{i}' * 16) - document.text = content - yield document - - -def get_docs_to_delete(doc_id_to_chunk_ids): - for i, (doc_id, chunks) in enumerate(doc_id_to_chunk_ids.items()): - document = Document() - document.id = str(f'{i}' * 16) - for chunk in chunks: - document.chunks.append(chunk) - assert chunk.parent_id == document.id - yield document - - -def validate_index(tmpdir, validation_data): - assert len(validation_data) > 0 - for index_file, expected_size in validation_data: - index 
= BaseIndexer.load(str(os.path.join(tmpdir, index_file))) - assert index.size == expected_size - - -def test_crud_advanced_example(tmpdir, config, mocker, monkeypatch): - """ - This test indexes documents into an example flow and updates one document. - The update is implemented as delete & index. - """ - - # generate documents to index - index_data = list( - get_docs_to_index( - [ - '0,1,2,3,4,5,6,7,8,9', - 'a ijk,b ijk,c jk', - 'w mno,x no,y op,z i', - ] - ) - ) - - response_docs = [] - - def on_index_done(resp): - response_docs.extend(resp.docs) - - # insert documents into the indexers - # response_docs is used to store the chunks generated by the segmenter via on_index_done - # at the moment the deletion of chunks via document_id is not possible - # therefore, the chunks are needed later on when when deleting the documents - with Flow.load_config('flow-index.yml') as index_flow: - index_flow.index(index_data, on_done=on_index_done) - - validate_index( - tmpdir, - validation_data=[ - ('docIndexer-0/docIndexer.bin', 3), - ('chunkidx-0/chunkidx.bin', 17), - ('vecidx-0/vecidx.bin', 17), - ], - ) - - # pick document 0 to be deleted - delete_data = list(get_docs_to_delete({0: response_docs[0].chunks})) - - # delete the docs and all its chunks - # 'a ijk,b ijk,c jk' is deleted? 
- delete_idx = [] - - for d in delete_data: - delete_idx.append(d.id) - for c in d.chunks: - delete_idx.append(c.id) - # assert ids not overlapping - assert len(delete_idx) == len(set(delete_idx)) - - # run flow for deletion - with Flow.load_config('flow-index.yml') as delete_flow: - delete_flow.delete(delete_idx) - - validate_index( - tmpdir, - validation_data=[ - ('docIndexer-0/docIndexer.bin', 2), - ('chunkidx-0/chunkidx.bin', 7), - ('vecidx-0/vecidx.bin', 7), - ], - ) - - # generate a new document 0 as a replacement for the deleted one - updated_data = get_docs_to_index( - [ - '1 ijk,2 jk,3 k', - ] - ) - - # insert the updated document - with Flow.load_config('flow-index.yml') as index_flow: - index_flow.index(updated_data) - - validate_index( - tmpdir, - validation_data=[ - ('docIndexer-0/docIndexer.bin', 3), - ('chunkidx-0/chunkidx.bin', 10), - ('vecidx-0/vecidx.bin', 10), - ], - ) - - def validate_granularity_1(resp): - assert len(resp.docs) == 3 - for doc in resp.docs: - assert doc.granularity == 0 - assert len(doc.matches) == 3 - assert doc.matches[0].granularity == 0 - - assert resp.docs[0].text == '2 jk' - assert resp.docs[0].matches[0].text == '1 ijk,2 jk,3 k' - - assert resp.docs[1].text == 'i' - assert resp.docs[1].matches[0].text == 'w mno,x no,y op,z i' - - assert resp.docs[2].text == 'm' - assert resp.docs[2].matches[0].text == 'w mno,x no,y op,z i' - - search_data = [ - '2 jk', - 'i', - 'm', - ] - - mock = mocker.Mock() - with Flow.load_config('flow-query.yml') as search_flow: - search_flow.search( - inputs=search_data, - on_done=mock, - ) - - mock.assert_called_once() - validate_callback(mock, validate_granularity_1) diff --git a/tests/integration/crud/advanced/yaml/custom_executor.py b/tests/integration/crud/advanced/yaml/custom_executor.py deleted file mode 100644 index c445f186c1ba1..0000000000000 --- a/tests/integration/crud/advanced/yaml/custom_executor.py +++ /dev/null @@ -1,53 +0,0 @@ -import re -import string -from typing import Dict, 
List - -from jina.executors.decorators import single -from jina.executors.rankers import Chunk2DocRanker -from jina.executors.segmenters import BaseSegmenter - - -class DummySentencizer(BaseSegmenter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - punct_chars = [','] - self._slit_pat = re.compile( - '\s*([^{0}]+)(? List[Dict]: - """ - Split the text into sentences. - - :param text: the raw text - :return: a list of chunks - """ - results = [] - ret = [ - (m.group(0), m.start(), m.end()) for m in re.finditer(self._slit_pat, text) - ] - if not ret: - ret = [(text, 0, len(text))] - for ci, (r, s, e) in enumerate(ret): - f = ''.join(filter(lambda x: x in string.printable, r)) - f = re.sub('\n+', ' ', f).strip() - f = f[:100] - results.append(dict(text=f)) - return results - - -class DummyMinRanker(Chunk2DocRanker): - """ - :class:`MinRanker` calculates the score of the matched doc from the matched chunks. For each matched doc, the score - is `1 / (1 + s)`, where `s` is the minimal score from all the matched chunks belonging to this doc. - - .. warning:: Here we suppose that the smaller chunk score means the more similar. 
- """ - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return 1.0 / (1.0 + match_idx[self.COL_SCORE].min()) diff --git a/tests/integration/crud/advanced/yaml/encoder.yml b/tests/integration/crud/advanced/yaml/encoder.yml deleted file mode 100644 index 440cb1c67d29f..0000000000000 --- a/tests/integration/crud/advanced/yaml/encoder.yml +++ /dev/null @@ -1,17 +0,0 @@ -!DummyOneHotTextEncoder -metas: - name: encoder - py_modules: - - custom_executor.py -requests: - on: - IndexRequest: - - !EncodeDriver - with: - traversal_paths: ['c'] - SearchRequest: - - !EncodeDriver - with: - traversal_paths: ['r'] - DeleteRequest: - - !ForwardDriver {} diff --git a/tests/integration/crud/advanced/yaml/index-chunk.yml b/tests/integration/crud/advanced/yaml/index-chunk.yml deleted file mode 100644 index 5b35123b8612d..0000000000000 --- a/tests/integration/crud/advanced/yaml/index-chunk.yml +++ /dev/null @@ -1,48 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: cosine - delete_on_dump: true - metas: - name: vecidx - workspace: $JINA_CRUD_ADVANCED_WORKSPACE - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx - workspace: $JINA_CRUD_ADVANCED_WORKSPACE -metas: - name: chunk_indexer - workspace: $JINA_CRUD_ADVANCED_WORKSPACE -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['c'] - - !KVIndexDriver - with: - executor: chunkidx - traversal_paths: ['c'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVSearchDriver - with: - executor: chunkidx - is_update: true - traversal_paths: ['m'] - DeleteRequest: - - !DeleteDriver - with: - executor: vecidx - - !DeleteDriver - with: - executor: chunkidx - diff --git a/tests/integration/crud/advanced/yaml/index-doc.yml 
b/tests/integration/crud/advanced/yaml/index-doc.yml deleted file mode 100644 index 08e34b37e8903..0000000000000 --- a/tests/integration/crud/advanced/yaml/index-doc.yml +++ /dev/null @@ -1,25 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gzip - delete_on_dump: true -metas: - name: docIndexer - workspace: $JINA_CRUD_ADVANCED_WORKSPACE -requests: - on: - IndexRequest: - - !KVIndexDriver - with: - executor: docIndexer - traversal_paths: ['r'] - SearchRequest: - - !KVSearchDriver - with: - executor: docIndexer - is_update: true - traversal_paths: ['m'] - DeleteRequest: - - !DeleteDriver - with: - executor: docIndexer - diff --git a/tests/integration/crud/advanced/yaml/rank.yml b/tests/integration/crud/advanced/yaml/rank.yml deleted file mode 100644 index 511af32c6fd41..0000000000000 --- a/tests/integration/crud/advanced/yaml/rank.yml +++ /dev/null @@ -1,11 +0,0 @@ -!DummyMinRanker -metas: - name: ranker - py_modules: - - custom_executor.py -requests: - on: - SearchRequest: - - !AggregateMatches2DocRankDriver - with: - traversal_paths: ['r'] diff --git a/tests/integration/crud/advanced/yaml/segmenter.yml b/tests/integration/crud/advanced/yaml/segmenter.yml deleted file mode 100644 index bfe5b8b2e906d..0000000000000 --- a/tests/integration/crud/advanced/yaml/segmenter.yml +++ /dev/null @@ -1,15 +0,0 @@ -!DummySentencizer -with: - min_sent_len: 3 - max_sent_len: 128 - punct_chars: '.,;!?:' -metas: - name: sentencizer - py_modules: - - custom_executor.py -requests: - on: - [SearchRequest, IndexRequest]: - - !SegmentDriver {} - DeleteRequest: - - !ForwardDriver {} diff --git a/tests/integration/crud/rest/__init__.py b/tests/integration/crud/rest/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/crud/rest/flow.yml b/tests/integration/crud/rest/flow.yml deleted file mode 100644 index 4bbae290e0512..0000000000000 --- a/tests/integration/crud/rest/flow.yml +++ /dev/null @@ -1,7 +0,0 @@ -!Flow -version: '1' -with: - 
restful: True -pods: - - name: indexer - uses: yaml/index.yml diff --git a/tests/integration/crud/rest/flow_kv.yml b/tests/integration/crud/rest/flow_kv.yml deleted file mode 100644 index db166c02ca493..0000000000000 --- a/tests/integration/crud/rest/flow_kv.yml +++ /dev/null @@ -1,7 +0,0 @@ -!Flow -version: '1' -with: - restful: True -pods: - - name: indexer - uses: yaml/index_kv.yml diff --git a/tests/integration/crud/rest/flow_vector.yml b/tests/integration/crud/rest/flow_vector.yml deleted file mode 100644 index 69daf7fc22984..0000000000000 --- a/tests/integration/crud/rest/flow_vector.yml +++ /dev/null @@ -1,7 +0,0 @@ -!Flow -version: '1' -with: - restful: True -pods: - - name: indexer - uses: yaml/index_vector.yml diff --git a/tests/integration/crud/rest/test_rest.py b/tests/integration/crud/rest/test_rest.py deleted file mode 100644 index fc9a73c585d5d..0000000000000 --- a/tests/integration/crud/rest/test_rest.py +++ /dev/null @@ -1,213 +0,0 @@ -import os -import random -import string -from itertools import chain -from pathlib import Path - -import numpy as np -import pytest -import requests - -from jina.executors.indexers import BaseIndexer - -from jina import Document -from jina.flow import Flow - -random.seed(0) -np.random.seed(0) - - -@pytest.fixture -def config(tmpdir): - os.environ['JINA_REST_DIR'] = str(tmpdir) - yield - del os.environ['JINA_REST_DIR'] - - -def send_rest_request(flow_file, endpoint, method, data): - json = {'data': data} - with Flow.load_config(flow_file) as flow: - url = f'http://0.0.0.0:{flow.port_expose}/{endpoint}' - r = getattr(requests, method)(url, json=json) - - if r.status_code != 200: - # TODO status_code should be 201 for index - raise Exception( - f'api request failed, url: {url}, status: {r.status_code}, content: {r.content} data: {data}' - ) - return r - - -def send_rest_index_request(flow_file, documents): - data = [document.dict() for document in documents] - return send_rest_request(flow_file, 'index', 'post', data) - 
- -def send_rest_update_request(flow_file, documents): - data = [document.dict() for document in documents] - return send_rest_request(flow_file, 'update', 'put', data) - - -def send_rest_delete_request(flow_file, ids): - return send_rest_request(flow_file, 'delete', 'delete', ids) - - -def send_rest_search_request(flow_file, documents): - data = [document.dict() for document in documents] - return send_rest_request(flow_file, 'search', 'post', data) - - -def random_docs(start, end): - documents = [] - for j in range(start, end): - d = Document() - d.id = j - d.tags['id'] = j - d.text = ''.join( - random.choice(string.ascii_lowercase) for _ in range(10) - ).encode('utf8') - d.embedding = np.random.random([10 + np.random.randint(0, 1)]) - documents.append(d) - return documents - - -def get_ids_to_delete(start, end): - return [str(idx) for idx in range(start, end)] - - -def validate_index_size(num_indexed_docs): - from jina.executors.compound import CompoundExecutor - - path_compound = Path( - CompoundExecutor.get_component_workspace_from_compound_workspace( - os.environ['JINA_REST_DIR'], 'chunk_indexer', 0 - ) - ) - path = Path(os.environ['JINA_REST_DIR']) - bin_files = list(path_compound.glob('**/*.bin')) + list(path.glob('**/*.bin')) - assert len(bin_files) > 0 - for index_file in bin_files: - index = BaseIndexer.load(str(index_file)) - assert index.size == num_indexed_docs - - -@pytest.mark.parametrize('flow_file', ['flow.yml', 'flow_vector.yml']) -def test_delete_vector(config, flow_file): - NUMBER_OF_SEARCHES = 5 - - def validate_results(resp, num_matches): - documents = resp.json()['search']['docs'] - assert len(documents) == NUMBER_OF_SEARCHES - for doc in documents: - # TODO if there are no matches, the rest api should return an empty list instead of not having the attribute - assert len(Document(doc).matches) == num_matches - - send_rest_index_request(flow_file, random_docs(0, 10)) - validate_index_size(10) - - search_result = send_rest_search_request( - 
flow_file, random_docs(0, NUMBER_OF_SEARCHES) - ) - validate_results(search_result, 10) - - delete_ids = [] - for d in random_docs(0, 10): - delete_ids.append(d.id) - for c in d.chunks: - delete_ids.append(c.id) - - send_rest_delete_request(flow_file, delete_ids) - - validate_index_size(0) - - search_result = send_rest_search_request( - flow_file, random_docs(0, NUMBER_OF_SEARCHES) - ) - validate_results(search_result, 0) - - -def test_delete_kv(config): - flow_file = 'flow_kv.yml' - - def validate_results(resp, num_matches): - assert len(resp.json()['search']['docs']) == num_matches - - send_rest_index_request(flow_file, random_docs(0, 10)) - validate_index_size(10) - - search_result = send_rest_search_request( - flow_file, chain(random_docs(2, 5), random_docs(100, 120)) - ) - validate_results(search_result, 3) - - send_rest_delete_request(flow_file, get_ids_to_delete(0, 3)) - validate_index_size(7) - - search_result = send_rest_search_request(flow_file, random_docs(2, 4)) - validate_results(search_result, 1) - - -@pytest.mark.parametrize('flow_file', ['flow.yml', 'flow_vector.yml']) -def test_update_vector(config, flow_file): - NUMBER_OF_SEARCHES = 1 - docs_before = list(random_docs(0, 10)) - docs_updated = list(random_docs(0, 10)) - - def validate_results(resp, has_changed): - docs = resp.json()['search']['docs'] - assert len(docs) == NUMBER_OF_SEARCHES - hash_set_before = [hash(d.embedding.tobytes()) for d in docs_before] - hash_set_updated = [hash(d.embedding.tobytes()) for d in docs_updated] - for doc_dictionary in docs: - doc = Document(doc_dictionary) - assert len(doc.matches) == 10 - for match in doc.matches: - h = hash(match.embedding.tobytes()) - if has_changed: - assert h not in hash_set_before - assert h in hash_set_updated - else: - assert h in hash_set_before - assert h not in hash_set_updated - - send_rest_index_request(flow_file, docs_before) - validate_index_size(10) - - search_result = send_rest_search_request( - flow_file, random_docs(0, 
NUMBER_OF_SEARCHES) - ) - validate_results(search_result, has_changed=False) - - send_rest_update_request(flow_file, docs_updated) - validate_index_size(10) - - search_result = send_rest_search_request( - flow_file, random_docs(0, NUMBER_OF_SEARCHES) - ) - validate_results(search_result, has_changed=True) - - -def test_update_kv(config): - flow_file = 'flow_kv.yml' - NUMBER_OF_SEARCHES = 1 - docs_before = list(random_docs(0, 10)) - docs_updated = list(random_docs(0, 10)) - - def validate_results(resp): - assert len(resp.json()['search']['docs']) == NUMBER_OF_SEARCHES - - send_rest_index_request(flow_file, docs_before) - validate_index_size(10) - - search_result = send_rest_search_request( - flow_file, random_docs(0, NUMBER_OF_SEARCHES) - ) - validate_results(search_result) - - send_rest_update_request(flow_file, docs_updated) - validate_index_size(10) - - search_result = send_rest_search_request( - flow_file, random_docs(0, NUMBER_OF_SEARCHES) - ) - validate_results(search_result) diff --git a/tests/integration/crud/rest/yaml/index.yml b/tests/integration/crud/rest/yaml/index.yml deleted file mode 100644 index 394377c2b9ddd..0000000000000 --- a/tests/integration/crud/rest/yaml/index.yml +++ /dev/null @@ -1,56 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: cosine - delete_on_dump: true - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - delete_on_dump: true - metas: - name: docidx -metas: - name: chunk_indexer - workspace: $JINA_REST_DIR -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - executor: vecidx - traversal_paths: [ 'r' ] - - !KVIndexDriver - with: - method: update - executor: docidx - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver - with: - executor: vecidx - - !DeleteDriver - with: - executor: docidx - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVIndexDriver - with: - executor: 
docidx - traversal_paths: ['r'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVSearchDriver - with: - executor: docidx - traversal_paths: ['m'] diff --git a/tests/integration/crud/rest/yaml/index_kv.yml b/tests/integration/crud/rest/yaml/index_kv.yml deleted file mode 100644 index 8823784d9428b..0000000000000 --- a/tests/integration/crud/rest/yaml/index_kv.yml +++ /dev/null @@ -1,30 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gzip - delete_on_dump: true -metas: - name: docIndexer - workspace: $JINA_REST_DIR - -requests: - on: - IndexRequest: - - !KVIndexDriver - with: - executor: docIndexer - traversal_paths: ['r'] - SearchRequest: - - !KVSearchDriver - with: - executor: docIndexer - traversal_paths: ['r'] - UpdateRequest: - - !KVIndexDriver - with: - method: update - executor: docIndexer - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver - with: - executor: docIndexer \ No newline at end of file diff --git a/tests/integration/crud/rest/yaml/index_vector.yml b/tests/integration/crud/rest/yaml/index_vector.yml deleted file mode 100644 index 72bc8675f0486..0000000000000 --- a/tests/integration/crud/rest/yaml/index_vector.yml +++ /dev/null @@ -1,32 +0,0 @@ -!NumpyIndexer -with: - index_filename: vec.gz - metric: cosine - delete_on_dump: true -metas: - name: vecidx - workspace: $JINA_REST_DIR - -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - executor: vecidx - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver - with: - executor: vecidx - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - fill_embedding: True - traversal_paths: ['r'] diff --git a/tests/integration/crud/simple/__init__.py b/tests/integration/crud/simple/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git 
a/tests/integration/crud/simple/chunks/__init__.py b/tests/integration/crud/simple/chunks/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/crud/simple/chunks/flow_kv.yml b/tests/integration/crud/simple/chunks/flow_kv.yml deleted file mode 100644 index 68c426016b1bb..0000000000000 --- a/tests/integration/crud/simple/chunks/flow_kv.yml +++ /dev/null @@ -1,5 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: yaml/index_kv.yml diff --git a/tests/integration/crud/simple/chunks/flow_vector.yml b/tests/integration/crud/simple/chunks/flow_vector.yml deleted file mode 100644 index c0b41822f2413..0000000000000 --- a/tests/integration/crud/simple/chunks/flow_vector.yml +++ /dev/null @@ -1,5 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: yaml/index_vector.yml diff --git a/tests/integration/crud/simple/chunks/test_chunks.py b/tests/integration/crud/simple/chunks/test_chunks.py deleted file mode 100644 index 93e34161575a1..0000000000000 --- a/tests/integration/crud/simple/chunks/test_chunks.py +++ /dev/null @@ -1,180 +0,0 @@ -import os -import random -from pathlib import Path - -import numpy as np -import pytest - -from jina import Document -from jina.executors.indexers import BaseIndexer -from jina.flow import Flow - -from tests import validate_callback - -TOP_K = 10 - - -@pytest.fixture -def config(tmpdir): - random.seed(0) - np.random.seed(0) - os.environ['JINA_CRUD_CHUNKS'] = str(tmpdir) - os.environ['JINA_TOPK'] = str(TOP_K) - yield - del os.environ['JINA_CRUD_CHUNKS'] - del os.environ['JINA_TOPK'] - - -d_embedding = np.random.random([9]) -c_embedding = np.random.random([9]) - - -def document_generator(start, num_docs, num_chunks): - chunk_id = num_docs - for idx in range(start, num_docs): - with Document() as doc: - doc.id = idx - doc.tags['id'] = idx - doc.text = f'I have {idx} cats' - doc.embedding = np.random.random([9]) - for chunk_idx in range(num_chunks): - with Document() as chunk: - 
chunk.id = chunk_id - chunk.tags['id'] = chunk_idx - chunk.text = f'I have {chunk_idx} chunky cats' - chunk.embedding = np.random.random([9]) - chunk_id += 1 - doc.chunks.append(chunk) - yield doc - - -def validate_index_size(num_indexed_docs): - path = Path(os.environ['JINA_CRUD_CHUNKS']) - index_files = list(path.glob('**/*.bin')) - assert len(index_files) > 0 - for index_file in index_files: - index = BaseIndexer.load(str(index_file)) - assert index.size == num_indexed_docs - - -@pytest.mark.parametrize('flow_file', ['flow_vector.yml']) -def test_delete_vector(config, mocker, flow_file): - num_searches = 10 - num_docs = 10 - num_chunks = 5 - - def validate_result_factory(num_matches): - def validate_results(resp): - assert len(resp.docs) == num_searches - for doc in resp.docs: - assert len(doc.matches) == num_matches - - return validate_results - - with Flow.load_config(flow_file) as index_flow: - index_flow.index( - inputs=document_generator(start=0, num_docs=num_docs, num_chunks=num_chunks) - ) - validate_index_size(num_chunks * num_docs) # 5 chunks for each of the 10 docs - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search( - inputs=document_generator( - start=0, num_docs=num_docs, num_chunks=num_chunks - ), - on_done=mock, - ) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOP_K)) - - delete_ids = [] - for d in document_generator(start=0, num_docs=num_docs, num_chunks=num_chunks): - delete_ids.append(d.id) - for c in d.chunks: - delete_ids.append(c.id) - - with Flow.load_config(flow_file) as index_flow: - index_flow.delete(ids=delete_ids) - validate_index_size(0) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search( - inputs=document_generator( - start=0, num_docs=num_docs, num_chunks=num_chunks - ), - on_done=mock, - ) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) - - 
-@pytest.mark.parametrize('flow_file', ['flow_vector.yml']) -def test_update_vector(config, mocker, flow_file): - num_searches = 10 - num_docs = 10 - num_chunks = 5 - - docs_before = list( - document_generator(start=0, num_docs=num_docs, num_chunks=num_chunks) - ) - docs_updated = list( - document_generator(start=10, num_docs=20, num_chunks=num_chunks) - ) - ids_before = list() - ids_updated = list() - - def validate_result_factory(has_changed, num_matches): - def validate_results(resp): - assert len(resp.docs) == num_searches - for d in docs_before: - ids_before.append(d.id) - for d in docs_updated: - ids_updated.append(d.id) - for doc in resp.docs: - assert len(doc.matches) == num_matches - if has_changed: - assert doc.id in ids_updated - assert doc.id not in ids_before - else: - assert doc.id in ids_before - assert doc.id not in ids_updated - - return validate_results - - with Flow.load_config(flow_file) as index_flow: - index_flow.index(inputs=docs_before) - validate_index_size( - num_chunks * num_docs - ) # num_docs per all its chunks, 50 in this case - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search( - inputs=document_generator( - start=0, num_docs=num_docs, num_chunks=num_chunks - ), - on_done=mock, - ) - mock.assert_called_once() - validate_callback( - mock, validate_result_factory(has_changed=False, num_matches=TOP_K) - ) - - with Flow.load_config(flow_file) as index_flow: - index_flow.update(inputs=docs_updated) - validate_index_size( - num_chunks * num_docs - ) # num_docs per all its chunks, 50 in this case - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search( - inputs=document_generator(start=10, num_docs=20, num_chunks=num_chunks), - on_done=mock, - ) - mock.assert_called_once() - validate_callback( - mock, validate_result_factory(has_changed=True, num_matches=num_docs) - ) diff --git a/tests/integration/crud/simple/chunks/yaml/index_kv.yml 
b/tests/integration/crud/simple/chunks/yaml/index_kv.yml deleted file mode 100644 index 67c2dcceec201..0000000000000 --- a/tests/integration/crud/simple/chunks/yaml/index_kv.yml +++ /dev/null @@ -1,29 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gzip - delete_on_dump: true -metas: - name: docIndexer - workspace: $JINA_CRUD_CHUNKS -requests: - on: - IndexRequest: - - !KVIndexDriver - with: - executor: docIndexer - traversal_paths: ['c'] - SearchRequest: - - !KVSearchDriver - with: - executor: docIndexer - traversal_paths: ['r'] - UpdateRequest: - - !KVIndexDriver - with: - method: update - executor: docIndexer - traversal_paths: [ 'c' ] - DeleteRequest: - - !DeleteDriver - with: - executor: docIndexer \ No newline at end of file diff --git a/tests/integration/crud/simple/chunks/yaml/index_vector.yml b/tests/integration/crud/simple/chunks/yaml/index_vector.yml deleted file mode 100644 index c05cf3cfa72a0..0000000000000 --- a/tests/integration/crud/simple/chunks/yaml/index_vector.yml +++ /dev/null @@ -1,33 +0,0 @@ -!NumpyIndexer -with: - index_filename: vec.gz - metric: cosine - delete_on_dump: true -metas: - name: vecidx - workspace: $JINA_CRUD_CHUNKS - -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - executor: vecidx - traversal_paths: [ 'c' ] - DeleteRequest: - - !DeleteDriver - with: - executor: vecidx - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['c'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - top_k: $JINA_TOPK - fill_embedding: True - traversal_paths: ['r'] diff --git a/tests/integration/crud/simple/flow.yml b/tests/integration/crud/simple/flow.yml deleted file mode 100644 index b570968b5f103..0000000000000 --- a/tests/integration/crud/simple/flow.yml +++ /dev/null @@ -1,5 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: yaml/index.yml diff --git a/tests/integration/crud/simple/flow_kv.yml b/tests/integration/crud/simple/flow_kv.yml 
deleted file mode 100644 index 68c426016b1bb..0000000000000 --- a/tests/integration/crud/simple/flow_kv.yml +++ /dev/null @@ -1,5 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: yaml/index_kv.yml diff --git a/tests/integration/crud/simple/flow_vector.yml b/tests/integration/crud/simple/flow_vector.yml deleted file mode 100644 index c0b41822f2413..0000000000000 --- a/tests/integration/crud/simple/flow_vector.yml +++ /dev/null @@ -1,5 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: yaml/index_vector.yml diff --git a/tests/integration/crud/simple/test_crud.py b/tests/integration/crud/simple/test_crud.py deleted file mode 100644 index e2d4d31583a43..0000000000000 --- a/tests/integration/crud/simple/test_crud.py +++ /dev/null @@ -1,224 +0,0 @@ -import os -import random -import string -from itertools import chain -from pathlib import Path - -import numpy as np -import pytest -from jina.executors.indexers import BaseIndexer - -from jina import Document -from jina.flow import Flow - -from tests import validate_callback - -random.seed(0) -np.random.seed(0) - -TOPK = 9 - - -@pytest.fixture -def config(tmpdir): - os.environ['JINA_TOPK_DIR'] = str(tmpdir) - os.environ['JINA_TOPK'] = str(TOPK) - yield - del os.environ['JINA_TOPK_DIR'] - del os.environ['JINA_TOPK'] - - -def random_docs(start, end, embed_dim=10, jitter=1, has_content=True): - for j in range(start, end): - d = Document() - d.id = j - if has_content: - d.tags['id'] = j - d.text = ''.join( - random.choice(string.ascii_lowercase) for _ in range(10) - ).encode('utf8') - assert d.embedding is None - d.embedding = np.random.random([embed_dim + np.random.randint(0, jitter)]) - yield d - - -def get_ids_to_delete(start, end, as_string): - if as_string: - return (str(idx) for idx in range(start, end)) - return range(start, end) - - -def validate_index_size(num_indexed_docs, compound=False): - from jina.executors.compound import CompoundExecutor - - if compound: - path = Path( - 
CompoundExecutor.get_component_workspace_from_compound_workspace( - os.environ['JINA_TOPK_DIR'], 'chunk_indexer', 0 - ) - ) - else: - path = Path(os.environ['JINA_TOPK_DIR']) - bin_files = list(path.glob('**/*.bin')) - assert len(bin_files) > 0 - for index_file in bin_files: - index = BaseIndexer.load(str(index_file)) - assert index.size == num_indexed_docs - - -@pytest.mark.parametrize( - 'flow_file, has_content, compound', - [ - ['flow.yml', True, True], - ['flow_vector.yml', True, False], - ['flow.yml', False, True], - ['flow_vector.yml', False, False], - ], -) -def test_delete_vector(config, mocker, flow_file, has_content, compound): - NUMBER_OF_SEARCHES = 5 - - def validate_result_factory(num_matches): - def validate_results(resp): - assert len(resp.docs) == NUMBER_OF_SEARCHES - for doc in resp.docs: - assert len(doc.matches) == num_matches - - return validate_results - - with Flow.load_config(flow_file) as index_flow: - index_flow.index(inputs=random_docs(0, 10)) - validate_index_size(10, compound) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search(inputs=random_docs(0, NUMBER_OF_SEARCHES), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOPK)) - - delete_ids = [] - for d in random_docs(0, 10, has_content=has_content): - delete_ids.append(d.id) - for c in d.chunks: - delete_ids.append(c.id) - - with Flow.load_config(flow_file) as index_flow: - index_flow.delete(ids=delete_ids) - validate_index_size(0, compound) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search(inputs=random_docs(0, NUMBER_OF_SEARCHES), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) - - -@pytest.mark.parametrize('as_string', [True, False]) -def test_delete_kv(config, mocker, as_string): - flow_file = 'flow_kv.yml' - - def validate_result_factory(num_matches): - def validate_results(resp): - assert 
len(resp.docs) == num_matches - - return validate_results - - with Flow.load_config(flow_file) as index_flow: - index_flow.index(inputs=random_docs(0, 10)) - validate_index_size(10) - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search( - inputs=chain(random_docs(2, 5), random_docs(100, 120)), on_done=mock - ) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(3)) - - with Flow.load_config(flow_file) as index_flow: - index_flow.delete(ids=get_ids_to_delete(0, 3, as_string)) - validate_index_size(7) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search(inputs=random_docs(2, 4), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(1)) - - -@pytest.mark.parametrize( - 'flow_file, compound', [('flow.yml', True), ('flow_vector.yml', False)] -) -def test_update_vector(config, mocker, flow_file, compound): - NUMBER_OF_SEARCHES = 1 - docs_before = list(random_docs(0, 10)) - docs_updated = list(random_docs(0, 10)) - - def validate_result_factory(has_changed): - def validate_results(resp): - assert len(resp.docs) == NUMBER_OF_SEARCHES - hash_set_before = [hash(d.embedding.tobytes()) for d in docs_before] - hash_set_updated = [hash(d.embedding.tobytes()) for d in docs_updated] - for doc in resp.docs: - assert len(doc.matches) == TOPK - for match in doc.matches: - h = hash(match.embedding.tobytes()) - if has_changed: - assert h not in hash_set_before - assert h in hash_set_updated - else: - assert h in hash_set_before - assert h not in hash_set_updated - - return validate_results - - with Flow.load_config(flow_file) as index_flow: - index_flow.index(inputs=docs_before) - validate_index_size(10, compound) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_docs = list(random_docs(0, NUMBER_OF_SEARCHES)) - search_flow.search(inputs=search_docs, on_done=mock) - mock.assert_called_once() - 
validate_callback(mock, validate_result_factory(has_changed=False)) - - with Flow.load_config(flow_file) as index_flow: - index_flow.update(inputs=docs_updated) - validate_index_size(10, compound) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search(inputs=random_docs(0, NUMBER_OF_SEARCHES), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(has_changed=True)) - - -def test_update_kv(config, mocker): - flow_file = 'flow_kv.yml' - NUMBER_OF_SEARCHES = 1 - docs_before = list(random_docs(0, 10)) - docs_updated = list(random_docs(0, 10)) - - def validate_results(resp): - assert len(resp.docs) == NUMBER_OF_SEARCHES - - with Flow.load_config(flow_file) as index_flow: - index_flow.index(inputs=docs_before) - validate_index_size(10) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_docs = list(random_docs(0, NUMBER_OF_SEARCHES)) - search_flow.search(inputs=search_docs, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_results) - - with Flow.load_config(flow_file) as index_flow: - index_flow.update(inputs=docs_updated) - validate_index_size(10) - - mock = mocker.Mock() - with Flow.load_config(flow_file) as search_flow: - search_flow.search(inputs=random_docs(0, NUMBER_OF_SEARCHES), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_results) diff --git a/tests/integration/crud/simple/test_crud_readme.py b/tests/integration/crud/simple/test_crud_readme.py deleted file mode 100644 index 97c905e124d42..0000000000000 --- a/tests/integration/crud/simple/test_crud_readme.py +++ /dev/null @@ -1,104 +0,0 @@ -import numpy as np -import shutil -import os -import pytest - -from jina import Flow, Document - -from tests import validate_callback - - -@pytest.fixture -def clean_me_up_mock(): - yield - to_be_deleted_folder = 'simple_indexer' - if os.path.isdir(to_be_deleted_folder): - shutil.rmtree(to_be_deleted_folder) 
- - -def test_crud_in_readme(mocker, clean_me_up_mock): - docs = [ - Document( - id='🐲', - embedding=np.array([0, 0]), - tags={'guardian': 'Azure Dragon', 'position': 'East'}, - ), - Document( - id='🐦', - embedding=np.array([1, 0]), - tags={'guardian': 'Vermilion Bird', 'position': 'South'}, - ), - Document( - id='🐢', - embedding=np.array([0, 1]), - tags={'guardian': 'Black Tortoise', 'position': 'North'}, - ), - Document( - id='🐯', - embedding=np.array([1, 1]), - tags={'guardian': 'White Tiger', 'position': 'West'}, - ), - ] - - # create - mock = mocker.Mock() - with Flow().add(uses='_index') as f: - f.index(docs, on_done=mock) - - mock.assert_called_once() - - # read - def validate(req): - assert len(req.docs[0].matches) == 3 - for match in req.docs[0].matches: - assert match.id != '🐯' - assert 'position' in match.tags - assert 'guardian' in match.tags - assert match.score.ref_id == req.docs[0].id - - mock = mocker.Mock() - - with f: - f.search(docs[0], top_k=3, on_done=mock) - validate_callback(mock, validate) - - # update - mock = mocker.Mock() - - d = docs[0] - d.embedding = np.array([1, 1]) - with f: - f.update(d, on_done=mock) - mock.assert_called_once() - - # search again - - def validate(req): - assert len(req.docs[0].matches) == 1 - req.docs[0].matches[0].id = req.docs[0].id - # embeddings are removed in the CompoundIndexer via ExcludeQL - np.testing.assert_array_equal(req.docs[0].matches[0].embedding, np.array(None)) - - mock = mocker.Mock() - - with f: - f.search(docs[0], top_k=1, on_done=mock) - validate_callback(mock, validate) - - # delete - mock = mocker.Mock() - - with f: - f.delete(['🐦', '🐲'], on_done=mock) - mock.assert_called_once() - - # search again - - def validate(req): - assert len(req.docs[0].matches) == 2 - - mock = mocker.Mock() - - with f: - f.search(docs[0], top_k=4, on_done=mock) - validate_callback(mock, validate) diff --git a/tests/integration/crud/simple/yaml/index.yml b/tests/integration/crud/simple/yaml/index.yml deleted file 
mode 100644 index e1c4b74c024e4..0000000000000 --- a/tests/integration/crud/simple/yaml/index.yml +++ /dev/null @@ -1,56 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: cosine - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - delete_on_dump: true - metas: - name: docidx -metas: - name: chunk_indexer - workspace: $JINA_TOPK_DIR -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - executor: vecidx - traversal_paths: [ 'r' ] - - !KVIndexDriver - with: - method: update - executor: docidx - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver - with: - executor: vecidx - - !DeleteDriver - with: - executor: docidx - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVIndexDriver - with: - executor: docidx - traversal_paths: ['r'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - top_k: $JINA_TOPK - traversal_paths: ['r'] - - !KVSearchDriver - with: - executor: docidx - traversal_paths: ['m'] diff --git a/tests/integration/crud/simple/yaml/index_kv.yml b/tests/integration/crud/simple/yaml/index_kv.yml deleted file mode 100644 index 067b49340bcbb..0000000000000 --- a/tests/integration/crud/simple/yaml/index_kv.yml +++ /dev/null @@ -1,30 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gzip - delete_on_dump: true -metas: - name: docIndexer - workspace: $JINA_TOPK_DIR - -requests: - on: - IndexRequest: - - !KVIndexDriver - with: - executor: docIndexer - traversal_paths: ['r'] - SearchRequest: - - !KVSearchDriver - with: - executor: docIndexer - traversal_paths: ['r'] - UpdateRequest: - - !KVIndexDriver - with: - method: update - executor: docIndexer - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver - with: - executor: docIndexer \ No newline at end of file diff --git a/tests/integration/crud/simple/yaml/index_vector.yml b/tests/integration/crud/simple/yaml/index_vector.yml 
deleted file mode 100644 index e2a32746717e9..0000000000000 --- a/tests/integration/crud/simple/yaml/index_vector.yml +++ /dev/null @@ -1,33 +0,0 @@ -!NumpyIndexer -with: - index_filename: vec.gz - metric: cosine - delete_on_dump: true -metas: - name: vecidx - workspace: $JINA_TOPK_DIR - -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - executor: vecidx - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver - with: - executor: vecidx - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - top_k: $JINA_TOPK - fill_embedding: True - traversal_paths: ['r'] diff --git a/tests/integration/crud_corrupted_docs/__init__.py b/tests/integration/crud_corrupted_docs/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/crud_corrupted_docs/flow-parallel.yml b/tests/integration/crud_corrupted_docs/flow-parallel.yml deleted file mode 100644 index 3494f649828b2..0000000000000 --- a/tests/integration/crud_corrupted_docs/flow-parallel.yml +++ /dev/null @@ -1,10 +0,0 @@ -!Flow -version: '1' -pods: - - name: vector - uses: vector.yml - - name: kv - uses: kv.yml - needs: [gateway] - - name: last - needs: [kv, vector] diff --git a/tests/integration/crud_corrupted_docs/flow.yml b/tests/integration/crud_corrupted_docs/flow.yml deleted file mode 100644 index 6a501ed1de5d4..0000000000000 --- a/tests/integration/crud_corrupted_docs/flow.yml +++ /dev/null @@ -1,5 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: index.yml diff --git a/tests/integration/crud_corrupted_docs/index.yml b/tests/integration/crud_corrupted_docs/index.yml deleted file mode 100644 index 70ad753b3228a..0000000000000 --- a/tests/integration/crud_corrupted_docs/index.yml +++ /dev/null @@ -1,58 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: cosine - metas: - name: 
vecidx - workspace: $JINA_CORRUPTED_DOCS_TEST_DIR - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx - workspace: $JINA_CORRUPTED_DOCS_TEST_DIR -metas: - name: chunk_indexer - workspace: $JINA_CORRUPTED_DOCS_TEST_DIR -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - executor: vecidx - traversal_paths: [ 'r' ] - - !KVIndexDriver - with: - method: update - executor: docidx - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver - with: - executor: vecidx - - !DeleteDriver - with: - executor: docidx - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVIndexDriver - with: - executor: docidx - traversal_paths: ['r'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - top_k: $JINA_TOPK - traversal_paths: ['r'] - - !KVSearchDriver - with: - executor: docidx - top_k: $JINA_TOPK - traversal_paths: ['m'] diff --git a/tests/integration/crud_corrupted_docs/kv.yml b/tests/integration/crud_corrupted_docs/kv.yml deleted file mode 100644 index cd0532f6b794d..0000000000000 --- a/tests/integration/crud_corrupted_docs/kv.yml +++ /dev/null @@ -1,28 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gz -metas: - name: docidx - workspace: $JINA_CORRUPTED_DOCS_TEST_DIR -requests: - on: - UpdateRequest: - - !KVIndexDriver - with: - method: update - executor: docidx - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver {} - IndexRequest: - - !KVIndexDriver - with: - executor: docidx - traversal_paths: ['r'] - SearchRequest: - - !KVSearchDriver - with: - executor: docidx - top_k: $JINA_TOPK - traversal_paths: ['m'] - diff --git a/tests/integration/crud_corrupted_docs/test_crud_corrupted_docs.py b/tests/integration/crud_corrupted_docs/test_crud_corrupted_docs.py deleted file mode 100644 index fcf32e84904a8..0000000000000 --- a/tests/integration/crud_corrupted_docs/test_crud_corrupted_docs.py +++ /dev/null @@ -1,334 +0,0 @@ -import os -from pathlib 
import Path - -import numpy as np -import pytest - -from jina import Document -from jina.executors.indexers import BaseIndexer -from jina.flow import Flow - -from tests import validate_callback - - -def random_docs_only_tags(nr_docs, start=0): - for j in range(start, nr_docs + start): - d = Document() - d.tags['id'] = j - d.tags['something'] = f'abcdef {j}' - yield d - - -def validate_index_size(num_indexed_docs, expected_indices): - path = Path(os.environ['JINA_CORRUPTED_DOCS_TEST_DIR']) - index_files = list(path.glob('**/*.bin')) - assert len(index_files) == expected_indices - for index_file in index_files: - index = BaseIndexer.load(str(index_file)) - assert index.size == num_indexed_docs - - -TOPK = 5 -NR_DOCS_INDEX = 20 -NUMBER_OF_SEARCHES = 5 -# since there is no content or embedding to match on -EXPECTED_ONLY_TAGS_RESULTS = 0 - - -def config_environ(path): - os.environ['JINA_CORRUPTED_DOCS_TEST_DIR'] = str(path) - os.environ['JINA_TOPK'] = str(TOPK) - - -def test_only_tags(tmp_path, mocker): - config_environ(path=tmp_path) - flow_file = 'flow.yml' - docs = list(random_docs_only_tags(NR_DOCS_INDEX)) - docs_update = list(random_docs_only_tags(NR_DOCS_INDEX, start=len(docs) + 1)) - all_docs_indexed = docs.copy() - all_docs_indexed.extend(docs_update) - docs_search = list( - random_docs_only_tags( - NUMBER_OF_SEARCHES, start=len(docs) + len(docs_update) + 1 - ) - ) - f = Flow.load_config(flow_file) - - def validate_result_factory(num_matches): - def validate_results(resp): - assert len(resp.docs) == NUMBER_OF_SEARCHES - for doc in resp.docs: - assert len(doc.matches) == num_matches - - return validate_results - - with f: - f.index(inputs=docs) - validate_index_size(NR_DOCS_INDEX, expected_indices=1) - - mock = mocker.Mock() - with f: - f.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(EXPECTED_ONLY_TAGS_RESULTS)) - - # this won't increase the index size as the ids are new - with f: - 
f.update(inputs=docs_update) - validate_index_size(NR_DOCS_INDEX, expected_indices=1) - - mock = mocker.Mock() - with f: - f.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(EXPECTED_ONLY_TAGS_RESULTS)) - - mock = mocker.Mock() - with f: - f.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) - - -np.random.seed(0) - -EMBEDDING_SHAPE = 7 - -ORIGINAL_MIME_TYPE = 'image/jpeg' - - -def random_docs_content_field(nr_docs, field, start=0): - for i in range(start, nr_docs + start): - with Document() as d: - d.id = i - d.embedding = np.random.random(EMBEDDING_SHAPE) - d.mime_type = ORIGINAL_MIME_TYPE - if field == 'content': - # mime type will be overridden because it's `str` - d.content = 'I am text' - elif field == 'buffer': - # mime type will be preserved and ignored - d.buffer = b'hidden text in bytes' - elif field == 'blob': - # mime type is ignored and preserved - d.blob = np.random.random(EMBEDDING_SHAPE) - - yield d - - -@pytest.mark.parametrize('field', ['content', 'buffer', 'blob']) -def test_only_embedding_and_mime_type(tmp_path, mocker, field): - config_environ(path=tmp_path) - flow_file = 'flow.yml' - docs = list(random_docs_content_field(NR_DOCS_INDEX, field=field)) - docs_update = list( - random_docs_content_field(NR_DOCS_INDEX, field=field, start=len(docs) + 1) - ) - all_docs_indexed = docs.copy() - all_docs_indexed.extend(docs_update) - docs_search = list( - random_docs_content_field( - NUMBER_OF_SEARCHES, field=field, start=len(docs) + len(docs_update) + 1 - ) - ) - f = Flow.load_config(flow_file) - - def validate_result_factory(num_matches): - def validate_results(resp): - assert len(resp.docs) == NUMBER_OF_SEARCHES - for doc in resp.docs: - assert len(doc.matches) == num_matches - for m in doc.matches: - if field == 'content': - assert m.content == 'I am text' - assert m.mime_type == 'text/plain' - elif field == 
'buffer': - # mime type will be preserved from when we set it to the Doc - assert m.buffer == b'hidden text in bytes' - assert m.mime_type == ORIGINAL_MIME_TYPE - elif field == 'blob': - assert m.blob.shape == (EMBEDDING_SHAPE,) - assert m.mime_type == ORIGINAL_MIME_TYPE - - return validate_results - - with f: - f.index(inputs=docs) - validate_index_size(NR_DOCS_INDEX, expected_indices=2) - - mock = mocker.Mock() - with f: - f.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOPK)) - - # this won't increase the index size as the ids are new - with f: - f.update(inputs=docs_update) - validate_index_size(NR_DOCS_INDEX, expected_indices=2) - - mock = mocker.Mock() - with f: - f.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOPK)) - - with f: - f.delete(ids=[d.id for d in all_docs_indexed]) - validate_index_size(0, expected_indices=2) - - mock = mocker.Mock() - with f: - f.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) - - -def random_docs_image_mime_text_content(nr_docs, start=0): - for i in range(start, nr_docs + start): - with Document() as d: - d.embedding = np.random.random(EMBEDDING_SHAPE) - d.mime_type = 'image/jpeg' - d.text = f'document {i}' - yield d - - -def test_wrong_mime_type(tmp_path, mocker): - """we assign text to .text, 'image/jpeg' to .mime_type""" - config_environ(path=tmp_path) - flow_file = 'flow-parallel.yml' - flow_query_file = 'flow.yml' - docs = list(random_docs_image_mime_text_content(NR_DOCS_INDEX)) - docs_update = list( - random_docs_image_mime_text_content(NR_DOCS_INDEX, start=len(docs) + 1) - ) - all_docs_indexed = docs.copy() - all_docs_indexed.extend(docs_update) - docs_search = list( - random_docs_image_mime_text_content( - NUMBER_OF_SEARCHES, start=len(docs) + len(docs_update) + 1 - ) - ) - f_index = 
Flow.load_config(flow_file) - f_query = Flow.load_config(flow_query_file) - - def validate_result_factory(num_matches): - def validate_results(resp): - assert len(resp.docs) == NUMBER_OF_SEARCHES - for doc in resp.docs: - assert len(doc.matches) == num_matches - for m in doc.matches: - assert m.mime_type == 'text/plain' - - return validate_results - - with f_index: - f_index.index(inputs=docs) - validate_index_size(NR_DOCS_INDEX, expected_indices=2) - - mock = mocker.Mock() - with f_query: - f_query.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOPK)) - - # this won't increase the index size as the ids are new - with f_index: - f_index.update(inputs=docs_update) - validate_index_size(NR_DOCS_INDEX, expected_indices=2) - - mock = mocker.Mock() - with f_query: - f_query.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOPK)) - - with f_index: - f_index.delete(ids=[d.id for d in all_docs_indexed]) - validate_index_size(0, expected_indices=2) - - mock = mocker.Mock() - with f_query: - f_query.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) - - -START_SHAPE = 7 -INDEX2_SHAPE = 6 -UPDATE_SHAPE = 7 - - -def random_docs_with_shapes(nr_docs, emb_shape, start=0): - for i in range(start, nr_docs + start): - with Document() as d: - d.id = i - d.embedding = np.random.random(emb_shape) - yield d - - -def test_dimensionality_search_wrong(tmp_path, mocker): - """will fail because search docs have diff shape in embedding""" - config_environ(path=tmp_path) - flow_file = 'flow.yml' - flow_query_file = 'flow.yml' - docs = list(random_docs_with_shapes(NR_DOCS_INDEX, START_SHAPE)) - docs_update = list( - random_docs_with_shapes(NR_DOCS_INDEX, INDEX2_SHAPE, start=len(docs) + 1) - ) - all_docs_indexed = docs.copy() - all_docs_indexed.extend(docs_update) - docs_search = list( 
- random_docs_with_shapes( - NUMBER_OF_SEARCHES, INDEX2_SHAPE, start=len(docs) + len(docs_update) + 1 - ) - ) - f_index = Flow.load_config(flow_file) - f_query = Flow.load_config(flow_query_file) - - def validate_result_factory(num_matches): - def validate_results(resp): - assert len(resp.docs) == NUMBER_OF_SEARCHES - for doc in resp.docs: - assert len(doc.matches) == num_matches - - return validate_results - - with f_index: - f_index.index(inputs=docs) - validate_index_size(NR_DOCS_INDEX, expected_indices=2) - - mock = mocker.Mock() - with f_query: - f_query.search( - inputs=docs_search, - # 0 because search docs have wrong shape - on_done=mock, - ) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) - - # this won't increase the index size as the ids are new - with f_index: - f_index.update(inputs=docs_update) - validate_index_size(NR_DOCS_INDEX, expected_indices=2) - - mock = mocker.Mock() - with f_query: - f_query.search( - inputs=docs_search, - # 0 because search docs have wrong shape - on_done=mock, - ) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) - - with f_index: - f_index.delete(ids=[d.id for d in all_docs_indexed]) - validate_index_size(0, expected_indices=2) - - mock = mocker.Mock() - with f_query: - f_query.search(inputs=docs_search, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) diff --git a/tests/integration/crud_corrupted_docs/vector.yml b/tests/integration/crud_corrupted_docs/vector.yml deleted file mode 100644 index 08718a7a07c57..0000000000000 --- a/tests/integration/crud_corrupted_docs/vector.yml +++ /dev/null @@ -1,25 +0,0 @@ -!NumpyIndexer -with: - index_filename: vec.gz - metric: cosine -metas: - name: vecidx - workspace: $JINA_CORRUPTED_DOCS_TEST_DIR -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - traversal_paths: [ 'r' ] - DeleteRequest: - - !DeleteDriver {} - IndexRequest: - - 
!VectorIndexDriver - with: - traversal_paths: ['r'] - SearchRequest: - - !VectorSearchDriver - with: - top_k: $JINA_TOPK - traversal_paths: ['r'] diff --git a/tests/integration/debug/__init__.py b/tests/integration/debug/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/debug/components.py b/tests/integration/debug/components.py deleted file mode 100644 index fba7c9a3717c3..0000000000000 --- a/tests/integration/debug/components.py +++ /dev/null @@ -1,103 +0,0 @@ -import io -from typing import Tuple, Dict, Union - -import numpy as np - -from jina.executors.crafters import BaseCrafter -from jina.executors.decorators import single -from jina.importer import ImportExtensions -from .helper import _crop_image, _move_channel_axis, _load_image - - -class ImageReader(BaseCrafter): - """ - Load image file and craft it into image matrix. - - :class:`ImageReader` loads the image from the given file - path and save the `ndarray` of the image in the Document. - - :param channel_axis: the axis id of the color channel. - The ``-1`` indicates the color channel info at the last axis - """ - - def __init__(self, channel_axis: int = -1, *args, **kwargs): - super().__init__(*args, **kwargs) - self.channel_axis = channel_axis - - @single(slice_nargs=2) - def craft(self, buffer: bytes, uri: str, *args, **kwargs) -> Dict: - """ - Read image file and craft it into image matrix. - - Read the image from the given file path that specified in `buffer` and save the `ndarray` of the image in - the `blob` of the document. - - :param buffer: the image in raw bytes - :param uri: the image file path - - """ - with ImportExtensions( - required=True, - verbose=True, - pkg_name='Pillow', - logger=self.logger, - help_text='PIL is missing. 
Install it with `pip install Pillow`', - ): - from PIL import Image - - if buffer: - raw_img = Image.open(io.BytesIO(buffer)) - elif uri: - raw_img = Image.open(uri) - else: - raise ValueError('no value found in "buffer" and "uri"') - raw_img = raw_img.convert('RGB') - img = np.array(raw_img).astype('float32') - if self.channel_axis != -1: - img = np.moveaxis(img, -1, self.channel_axis) - return dict(blob=img) - - -class CenterImageCropper(BaseCrafter): - """ - Crop the image with the center crop box. - - The coordinate is the same coordinate-system in the - :py:mode:`PIL.Image`. - - :param target_size: Desired output size. If size - is a sequence like (h, w), the output size will - be matched to this. If size is an int, the - output will have the same height and width as - the `target_size`. - :param channel_axis: Axis for channel - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - """ - - def __init__( - self, - target_size: Union[Tuple[int, int], int] = 224, - channel_axis: int = -1, - *args, - **kwargs - ): - """Set Constructor.""" - super().__init__(*args, **kwargs) - self.target_size = target_size - self.channel_axis = channel_axis - - @single - def craft(self, blob: 'np.ndarray', *args, **kwargs) -> Dict: - """ - Crop the input image array. 
- - :param blob: The ndarray of the image - :param args: Additional positional arguments - :param kwargs: Additional keyword arguments - :return: A dict with the cropped image - """ - raw_img = _load_image(blob, self.channel_axis) - _img, top, left = _crop_image(raw_img, self.target_size, how='center') - img = _move_channel_axis(np.asarray(_img), -1, self.channel_axis) - return dict(offset=0, blob=img.astype('float32'), location=(top, left)) diff --git a/tests/integration/debug/craft.yml b/tests/integration/debug/craft.yml deleted file mode 100644 index 656862e58e947..0000000000000 --- a/tests/integration/debug/craft.yml +++ /dev/null @@ -1,45 +0,0 @@ -!CompoundExecutor -components: - - !ImageReader - metas: - name: img_read - - !ImageNormalizer - with: - img_mean: [ 0.485, 0.456, 0.406 ] - img_std: [ 0.229, 0.224, 0.225 ] - metas: - name: img_norm - - !CenterImageCropper - with: - target_size: 96 - channel_axis: -1 - metas: - name: img_cropper -metas: - name: compound_crafter - workspace: $JINA_WORKSPACE -requests: - on: - IndexRequest: - - !CraftDriver - with: - traversal_paths: [ 'r' ] - executor: img_read - - !CraftDriver - with: - traversal_paths: [ 'r' ] - executor: img_norm - - !PngToDiskDriver - with: - prefix: 'norm' - workspace: $JINA_WORKSPACE - top: $PNG_TOP - - !CraftDriver - with: - traversal_paths: [ 'r' ] - executor: img_cropper - - !PngToDiskDriver - with: - prefix: 'crop' - workspace: $JINA_WORKSPACE - top: $PNG_TOP \ No newline at end of file diff --git a/tests/integration/debug/helper.py b/tests/integration/debug/helper.py deleted file mode 100644 index 9bc0062a87d13..0000000000000 --- a/tests/integration/debug/helper.py +++ /dev/null @@ -1,125 +0,0 @@ -__copyright__ = "Copyright (c) 2021 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -from typing import Tuple, Union - -import numpy as np - -from jina.importer import ImportExtensions - - -def _move_channel_axis( - img: 'np.ndarray', channel_axis_to_move: int, target_channel_axis: int = -1 -) -> 'np.ndarray': - """ - Ensure the color channel axis is the default axis. - """ - if channel_axis_to_move == target_channel_axis: - return img - return np.moveaxis(img, channel_axis_to_move, target_channel_axis) - - -def _load_image(blob: 'np.ndarray', channel_axis: int): - """ - Load an image array and return a `PIL.Image` object. - """ - with ImportExtensions( - required=True, - verbose=True, - pkg_name='Pillow', - help_text='PIL is missing. Install it with `pip install Pillow`', - ): - from PIL import Image - - img = _move_channel_axis(blob, channel_axis) - return Image.fromarray(img.astype('uint8')) - - -def _crop_image( - img, - target_size: Union[Tuple[int, int], int], - top: int = None, - left: int = None, - how: str = 'precise', -): - """ - Crop the input :py:mod:`PIL` image. - - :param img: :py:mod:`PIL.Image`, the image to be resized - :param target_size: desired output size. If size is a sequence like - (h, w), the output size will be matched to this. If size is an int, - the output will have the same height and width as the `target_size`. - :param top: the vertical coordinate of the top left corner of the crop box. - :param left: the horizontal coordinate of the top left corner of the crop box. - :param how: the way of cropping. Valid values include `center`, `random`, and, `precise`. Default is `precise`. - - `center`: crop the center part of the image - - `random`: crop a random part of the image - - `precise`: crop the part of the image specified by the crop box with the given ``top`` and ``left``. - .. warning:: When `precise` is used, ``top`` and ``left`` must be fed valid value. 
- - """ - import PIL.Image as Image - - assert isinstance(img, Image.Image), 'img must be a PIL.Image' - img_w, img_h = img.size - if isinstance(target_size, int): - target_h = target_w = target_size - elif isinstance(target_size, Tuple) and len(target_size) == 2: - target_h, target_w = target_size - else: - raise ValueError( - f'target_size should be an integer or a tuple of two integers: {target_size}' - ) - w_beg = left - h_beg = top - if how == 'center': - w_beg = int((img_w - target_w) / 2) - h_beg = int((img_h - target_h) / 2) - elif how == 'random': - w_beg = np.random.randint(0, img_w - target_w + 1) - h_beg = np.random.randint(0, img_h - target_h + 1) - elif how == 'precise': - assert w_beg is not None and h_beg is not None - assert ( - 0 <= w_beg <= (img_w - target_w) - ), f'left must be within [0, {img_w - target_w}]: {w_beg}' - assert ( - 0 <= h_beg <= (img_h - target_h) - ), f'top must be within [0, {img_h - target_h}]: {h_beg}' - else: - raise ValueError(f'unknown input how: {how}') - if not isinstance(w_beg, int): - raise ValueError(f'left must be int number between 0 and {img_w}: {left}') - if not isinstance(h_beg, int): - raise ValueError(f'top must be int number between 0 and {img_h}: {top}') - w_end = w_beg + target_w - h_end = h_beg + target_h - img = img.crop((w_beg, h_beg, w_end, h_end)) - return img, h_beg, w_beg - - -def _resize_short(img, target_size: Union[Tuple[int, int], int], how: str = 'LANCZOS'): - """ - Resize the input :py:mod:`PIL` image. - :param img: :py:mod:`PIL.Image`, the image to be resized - :param target_size: desired output size. If size is a sequence like (h, w), the output size will be matched to - this. If size is an int, the smaller edge of the image will be matched to this number maintain the aspect - ratio. - :param how: the interpolation method. Valid values include `NEAREST`, `BILINEAR`, `BICUBIC`, and `LANCZOS`. - Default is `LANCZOS`. Please refer to `PIL.Image` for detaisl. 
- """ - import PIL.Image as Image - - assert isinstance(img, Image.Image), 'img must be a PIL.Image' - if isinstance(target_size, int): - percent = float(target_size) / min(img.size[0], img.size[1]) - target_w = int(round(img.size[0] * percent)) - target_h = int(round(img.size[1] * percent)) - elif isinstance(target_size, Tuple) and len(target_size) == 2: - target_h, target_w = target_size - else: - raise ValueError( - f'target_size should be an integer or a tuple of two integers: {target_size}' - ) - img = img.resize((target_w, target_h), getattr(Image, how)) - return img diff --git a/tests/integration/debug/png/4.png b/tests/integration/debug/png/4.png deleted file mode 100644 index cd2b04e284eab..0000000000000 Binary files a/tests/integration/debug/png/4.png and /dev/null differ diff --git a/tests/integration/debug/png/5.png b/tests/integration/debug/png/5.png deleted file mode 100644 index 00e1a9a1e9f1d..0000000000000 Binary files a/tests/integration/debug/png/5.png and /dev/null differ diff --git a/tests/integration/debug/png/6.png b/tests/integration/debug/png/6.png deleted file mode 100644 index b00c7e5be3f84..0000000000000 Binary files a/tests/integration/debug/png/6.png and /dev/null differ diff --git a/tests/integration/debug/png/7.png b/tests/integration/debug/png/7.png deleted file mode 100644 index 955e2f893d2da..0000000000000 Binary files a/tests/integration/debug/png/7.png and /dev/null differ diff --git a/tests/integration/debug/png/8.png b/tests/integration/debug/png/8.png deleted file mode 100644 index ae349e4e32994..0000000000000 Binary files a/tests/integration/debug/png/8.png and /dev/null differ diff --git a/tests/integration/debug/png/9.png b/tests/integration/debug/png/9.png deleted file mode 100644 index 5a2ef7e7b2c9d..0000000000000 Binary files a/tests/integration/debug/png/9.png and /dev/null differ diff --git a/tests/integration/debug/test_pngtodisk.py b/tests/integration/debug/test_pngtodisk.py deleted file mode 100644 index 
24e4f43a7cef6..0000000000000 --- a/tests/integration/debug/test_pngtodisk.py +++ /dev/null @@ -1,35 +0,0 @@ -import os -from glob import glob - -import pytest - -from jina import Flow - -# noinspection PyUnresolvedReferences -from . import components - - -@pytest.mark.parametrize('top', [1, 3, 99]) -def test_pngtodisk(tmpdir, top): - tmpdir = str(tmpdir) - cur_dir = os.path.dirname(os.path.abspath(__file__)) - image_src = os.path.join(cur_dir, 'png/**.png') - files = list(glob(image_src)) - assert files - - os.environ['JINA_WORKSPACE'] = tmpdir - os.environ['PNG_TOP'] = str(top) - - with Flow().add(uses='craft.yml') as f: - f.index_files(image_src, read_mode='rb') - - assert os.path.exists(os.path.join(tmpdir, 'norm')) - assert os.path.exists(os.path.join(tmpdir, 'crop')) - - results_expected = min((len(files), top)) - - glob_results_norm = glob(os.path.join(tmpdir, 'norm', '*.png')) - assert len(glob_results_norm) == results_expected - - glob_results_crop = glob(os.path.join(tmpdir, 'crop', '*.png')) - assert len(glob_results_crop) == results_expected diff --git a/tests/integration/doccache/__init__.py b/tests/integration/doccache/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/doccache/_merge_matches_topk.yml b/tests/integration/doccache/_merge_matches_topk.yml deleted file mode 100644 index d1b3aa2754838..0000000000000 --- a/tests/integration/doccache/_merge_matches_topk.yml +++ /dev/null @@ -1,19 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: merge_matches_topk -requests: - on: - [SearchRequest, TrainRequest, IndexRequest]: - - !ReduceAllDriver - with: - traversal_paths: ['m'] - - !SliceQL - with: - start: 0 - end: $JINA_TOPK - traversal_paths: ['m'] - ControlRequest: - - !ControlReqDriver {} - [DeleteRequest, UpdateRequest]: - - !RouteDriver {} \ No newline at end of file diff --git a/tests/integration/doccache/cache.yml b/tests/integration/doccache/cache.yml deleted file mode 100644 index 
9d6a082777e28..0000000000000 --- a/tests/integration/doccache/cache.yml +++ /dev/null @@ -1,25 +0,0 @@ -!DocCache -with: - fields: [$JINA_CACHE_FIELD] - index_filename: cache -metas: - workspace: $JINA_TEST_CACHE_CRUD_WORKSPACE - name: cache -requests: - on: - IndexRequest: - - !TaggingCacheDriver - with: - executor: cache - traversal_paths: [r, c] - tags: - is_indexed: true - UpdateRequest: - - !BaseCacheDriver - with: - method: update - executor: cache - traversal_paths: [r, c] - DeleteRequest: - - !DeleteDriver { } - diff --git a/tests/integration/doccache/crud_cache_flow_index.yml b/tests/integration/doccache/crud_cache_flow_index.yml deleted file mode 100644 index 7c19d1cd447ba..0000000000000 --- a/tests/integration/doccache/crud_cache_flow_index.yml +++ /dev/null @@ -1,26 +0,0 @@ -!Flow -version: '1' -pods: - - name: cache - uses: cache.yml - - name: vector - uses: vector.yml - shards: $JINA_SHARDS - uses_after: '_merge_matches' - polling: $JINA_POLLING - timeout_ready: '-1' - - needs: cache - - name: kv - uses: kv.yml - shards: $JINA_SHARDS - uses_after: '_merge_matches' - polling: $JINA_POLLING - timeout_ready: '-1' - - needs: $JINA_KV_NEEDS - - name: final - needs: $JINA_MERGER_NEEDS - workspace: $JINA_TEST_CACHE_CRUD_WORKSPACE - - diff --git a/tests/integration/doccache/crud_cache_flow_query.yml b/tests/integration/doccache/crud_cache_flow_query.yml deleted file mode 100644 index 06514d8e46fb1..0000000000000 --- a/tests/integration/doccache/crud_cache_flow_query.yml +++ /dev/null @@ -1,17 +0,0 @@ -!Flow -version: '1' -pods: - - name: vector - uses: vector.yml - shards: $JINA_SHARDS - uses_after: '_merge_matches_topk.yml' - polling: $JINA_POLLING - timeout_ready: '-1' - - name: kv - uses: kv.yml - shards: $JINA_SHARDS - uses_after: '_merge_matches_topk.yml' - polling: $JINA_POLLING - timeout_ready: '-1' - - needs: $JINA_KV_NEEDS diff --git a/tests/integration/doccache/kv.yml b/tests/integration/doccache/kv.yml deleted file mode 100644 index 
1406f4578d5d4..0000000000000 --- a/tests/integration/doccache/kv.yml +++ /dev/null @@ -1,37 +0,0 @@ -!CompoundIndexer -components: - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: $JINA_KV_IDX_NAME -metas: - name: inc_docindexer - workspace: $JINA_TEST_CACHE_CRUD_WORKSPACE -requests: - on: - UpdateRequest: - - !KVIndexDriver - with: - method: update - executor: $JINA_KV_IDX_NAME - traversal_paths: [r, c] - DeleteRequest: - - !DeleteDriver - with: - executor: $JINA_KV_IDX_NAME - IndexRequest: - - !FilterQL - with: - lookups: {tags__is_indexed__neq: true} - - !KVIndexDriver - with: - executor: $JINA_KV_IDX_NAME - traversal_paths: [r, c] - SearchRequest: - - !KVSearchDriver - with: - is_update: false - executor: $JINA_KV_IDX_NAME - top_k: $JINA_TOPK - traversal_paths: [m] diff --git a/tests/integration/doccache/test_crud_cache.py b/tests/integration/doccache/test_crud_cache.py deleted file mode 100644 index 7dc7b2642e0af..0000000000000 --- a/tests/integration/doccache/test_crud_cache.py +++ /dev/null @@ -1,253 +0,0 @@ -import os - -import numpy as np -import pytest - -from jina import Flow -from jina.executors.indexers import BaseIndexer -from jina.executors.indexers.cache import DocCache -from jina.executors.indexers.keyvalue import BinaryPbIndexer -from jina.executors.indexers.vector import NumpyIndexer -from tests import get_documents, validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - -KV_IDX_FILENAME = 'kv_idx.bin' -VEC_IDX_FILENAME = 'vec_idx.bin' -DOCS_TO_SEARCH = 1 -TOP_K = 5 -REQUEST_SIZE = 4 -DOCS_TO_INDEX = 10 - - -def config_env(field, tmp_workspace, shards, indexers, polling): - os.environ['JINA_SHARDS'] = str(shards) - os.environ['JINA_CACHE_FIELD'] = field - os.environ['JINA_POLLING'] = polling - os.environ['JINA_TOPK'] = str(TOP_K) - os.environ['JINA_TEST_CACHE_CRUD_WORKSPACE'] = str(tmp_workspace) - os.environ['JINA_KV_IDX_NAME'] = KV_IDX_FILENAME.split('.bin')[0] - os.environ['JINA_VEC_IDX_NAME'] = 
VEC_IDX_FILENAME.split('.bin')[0] - if indexers == 'parallel': - # the second indexer will be directly connected to entry gateway - os.environ['JINA_KV_NEEDS'] = 'cache' - os.environ['JINA_MERGER_NEEDS'] = '[vector, kv]' - else: - # else it requires to be in serial connection, after the first indexer - os.environ['JINA_KV_NEEDS'] = 'vector' - os.environ['JINA_MERGER_NEEDS'] = 'kv' - - -def get_index_flow(field, tmp_path, shards, indexers): - config_env(field, tmp_path, shards, indexers, polling='any') - f = Flow.load_config(os.path.join(cur_dir, 'crud_cache_flow_index.yml')) - return f - - -def get_query_flow(field, tmp_path, shards): - # searching must always be sequential - config_env(field, tmp_path, shards, 'sequential', polling='all') - f = Flow.load_config(os.path.join(cur_dir, 'crud_cache_flow_query.yml')) - return f - - -def get_delete_flow(field, tmp_path, shards, indexers): - config_env(field, tmp_path, shards, indexers, polling='all') - f = Flow.load_config(os.path.join(cur_dir, 'crud_cache_flow_index.yml')) - return f - - -def check_indexers_size( - chunks, nr_docs, field, tmp_path, same_content, shards, post_op -): - cache_indexer_path = os.path.join(tmp_path, 'cache-0', 'cache.bin') - with BaseIndexer.load(cache_indexer_path) as cache: - assert isinstance(cache, DocCache) - cache_full_size = cache.size - - for indexer_fname in [KV_IDX_FILENAME, VEC_IDX_FILENAME]: - indexers_full_size = 0 - for i in range(shards): - from jina.executors.compound import CompoundExecutor - - compound_name = ( - 'inc_docindexer' - if KV_IDX_FILENAME in indexer_fname - else 'inc_vecindexer' - ) - workspace_folder = ( - CompoundExecutor.get_component_workspace_from_compound_workspace( - tmp_path, compound_name, i - ) - ) - indexer_path = os.path.join( - BaseIndexer.get_shard_workspace( - workspace_folder=workspace_folder, - workspace_name=indexer_fname.rstrip('.bin'), - pea_id=i, - ), - f'{indexer_fname}', - ) - - # in the configuration of content-hash / same_content=True - 
# there aren't enough docs to satisfy batch size, only 1 shard will have it - if os.path.exists(indexer_path): - with BaseIndexer.load(indexer_path) as indexer: - if indexer_fname == KV_IDX_FILENAME: - assert isinstance(indexer, BinaryPbIndexer) - else: - assert isinstance(indexer, NumpyIndexer) - indexers_full_size += indexer.size - - if post_op == 'delete': - assert indexers_full_size == 0 - assert cache_full_size == 0 - else: - if field == 'content_hash' and same_content: - if chunks > 0: - # one content from Doc, one from chunk - expected = 2 - assert indexers_full_size == expected - assert cache_full_size == 2 - else: - assert indexers_full_size == 1 - assert cache_full_size == 1 - else: - nr_expected = ( - (nr_docs + chunks * nr_docs) * 2 - if post_op == 'index2' - else nr_docs + chunks * nr_docs - ) - assert indexers_full_size == nr_expected - assert cache_full_size == nr_expected - - -@pytest.mark.parametrize( - 'indexers, field, shards, chunks, same_content', - [ - ('sequential', 'id', 1, 5, False), - ('sequential', 'id', 3, 5, False), - ('sequential', 'id', 3, 5, True), - ('sequential', 'content_hash', 1, 0, False), - ('sequential', 'content_hash', 1, 0, True), - ('sequential', 'content_hash', 1, 5, False), - ('sequential', 'content_hash', 1, 5, True), - ('sequential', 'content_hash', 3, 5, True), - ('parallel', 'id', 3, 5, False), - ('parallel', 'id', 3, 5, True), - ('parallel', 'content_hash', 3, 5, False), - ('parallel', 'content_hash', 3, 5, True), - ], -) -def test_cache_crud(tmp_path, mocker, indexers, field, shards, chunks, same_content): - flow_index = get_index_flow( - field=field, tmp_path=tmp_path, shards=shards, indexers=indexers - ) - flow_query = get_query_flow(field=field, tmp_path=tmp_path, shards=shards) - flow_delete = get_delete_flow( - field=field, tmp_path=tmp_path, shards=shards, indexers=indexers - ) - - def validate_result_factory(num_matches): - def validate_results(resp): - assert len(resp.docs) == DOCS_TO_SEARCH - for d in 
resp.docs: - matches = list(d.matches) - # this differs depending on cache settings - # it could be lower - if num_matches != 0: - if field == 'content_hash' and same_content: - if chunks: - assert len(matches) == 2 - else: - assert len(matches) == 1 - else: - assert len(matches) == num_matches - - return validate_results - - docs = list( - get_documents(chunks=chunks, same_content=same_content, nr=DOCS_TO_INDEX) - ) - # ids in order to ensure no matches in KV - search_docs = list( - get_documents(chunks=0, same_content=False, nr=DOCS_TO_SEARCH, index_start=9999) - ) - - # INDEX - with flow_index as f: - f.index(docs, request_size=REQUEST_SIZE) - check_indexers_size( - chunks, len(docs), field, tmp_path, same_content, shards, 'index' - ) - - # INDEX (with new documents) - chunks_ids = np.concatenate([d.chunks for d in docs]) - index_start_new_docs = 1 + len(docs) + len(chunks_ids) - - new_docs = list( - get_documents( - chunks=chunks, same_content=same_content, index_start=index_start_new_docs - ) - ) - with flow_index as f: - f.index(new_docs, request_size=REQUEST_SIZE) - - check_indexers_size( - chunks, len(docs), field, tmp_path, same_content, shards, 'index2' - ) - - # QUERY - mock = mocker.Mock() - with flow_query as f: - f.search(search_docs, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOP_K)) - - # UPDATE - docs.extend(new_docs) - del new_docs - - # id stays the same, we change the content - for d in docs: - d_content_hash_before = d.content_hash - d.content = f'this is some new content for doc {d.id}' - d.update_content_hash() - assert d.content_hash != d_content_hash_before - for chunk in d.chunks: - c_content_hash_before = chunk.content_hash - chunk.content = f'this is some new content for chunk {chunk.id}' - chunk.update_content_hash() - assert chunk.content_hash != c_content_hash_before - - with flow_index as f: - f.update(docs) - - check_indexers_size( - chunks, len(docs) / 2, field, tmp_path, 
same_content, shards, 'index2' - ) - - # QUERY - mock = mocker.Mock() - with flow_query as f: - f.search(search_docs, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(TOP_K)) - - # DELETE - delete_ids = [] - for d in docs: - delete_ids.append(d.id) - for c in d.chunks: - delete_ids.append(c.id) - with flow_delete as f: - f.delete(delete_ids) - - check_indexers_size(chunks, 0, field, tmp_path, same_content, shards, 'delete') - - # QUERY - mock = mocker.Mock() - with flow_query as f: - f.search(search_docs, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_result_factory(0)) diff --git a/tests/integration/doccache/vector.yml b/tests/integration/doccache/vector.yml deleted file mode 100644 index b7f8fb696d186..0000000000000 --- a/tests/integration/doccache/vector.yml +++ /dev/null @@ -1,39 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - delete_on_dump: true - metas: - name: $JINA_VEC_IDX_NAME -metas: - name: inc_vecindexer - workspace: $JINA_TEST_CACHE_CRUD_WORKSPACE -requests: - on: - UpdateRequest: - - !VectorIndexDriver - with: - method: update - executor: $JINA_VEC_IDX_NAME - traversal_paths: [r, c] - DeleteRequest: - - !DeleteDriver - with: - executor: $JINA_VEC_IDX_NAME - IndexRequest: - - !FilterQL - with: - lookups: {tags__is_indexed__neq: true} - - !VectorIndexDriver - with: - executor: $JINA_VEC_IDX_NAME - traversal_paths: [r, c] - SearchRequest: - - !VectorSearchDriver - with: - executor: $JINA_VEC_IDX_NAME - top_k: $JINA_TOPK - fill_embedding: True - traversal_paths: [r] diff --git a/tests/integration/dump/__init__.py b/tests/integration/dump/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/dump/basic/index.yml b/tests/integration/dump/basic/index.yml deleted file mode 100644 index 2f52e47288047..0000000000000 --- a/tests/integration/dump/basic/index.yml +++ /dev/null @@ 
-1,12 +0,0 @@ -jtype: BinaryPbIndexer -with: - index_filename: binarypb.bin -metas: - workspace: $BASIC_INDEX_WS - name: binarypb -requests: - on: - IndexRequest: - - jtype: KVIndexDriver - with: - executor: binarypb diff --git a/tests/integration/dump/basic/query.yml b/tests/integration/dump/basic/query.yml deleted file mode 100644 index 19b2638379491..0000000000000 --- a/tests/integration/dump/basic/query.yml +++ /dev/null @@ -1,37 +0,0 @@ -jtype: CompoundExecutor -components: - - jtype: NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - jtype: BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: kvidx -metas: - name: compound_indexer - workspace: $BASIC_QUERY_WS -requests: - on: - SearchRequest: - - jtype: VectorSearchDriver - with: - executor: vecidx - fill_embedding: True # to test embeddings - - jtype: KVSearchDriver - with: - executor: kvidx - is_update: True - traversal_paths: ['m'] - IndexRequest: - - jtype: VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - - jtype: KVIndexDriver - with: - executor: kvidx - traversal_paths: ['r'] \ No newline at end of file diff --git a/tests/integration/dump/basic/query_kv.yml b/tests/integration/dump/basic/query_kv.yml deleted file mode 100644 index e734c5fbd0af7..0000000000000 --- a/tests/integration/dump/basic/query_kv.yml +++ /dev/null @@ -1,13 +0,0 @@ -jtype: BinaryPbQueryIndexer -with: - index_filename: kv.bin -metas: - name: kvidx -requests: - on: - SearchRequest: - - jtype: KVSearchDriver - with: - executor: kvidx - is_update: True - traversal_paths: ['r'] diff --git a/tests/integration/dump/basic/query_np.yml b/tests/integration/dump/basic/query_np.yml deleted file mode 100644 index 0da4744858221..0000000000000 --- a/tests/integration/dump/basic/query_np.yml +++ /dev/null @@ -1,13 +0,0 @@ -jtype: NumpyQueryIndexer -with: - index_filename: vec.bin - metric: euclidean -metas: - name: vecidx -requests: - on: - SearchRequest: - - jtype: 
VectorSearchDriver - with: - executor: vecidx - fill_embedding: True diff --git a/tests/integration/dump/flow_dbms.yml b/tests/integration/dump/flow_dbms.yml deleted file mode 100644 index 92f4302c3659a..0000000000000 --- a/tests/integration/dump/flow_dbms.yml +++ /dev/null @@ -1,7 +0,0 @@ -jtype: Flow -version: '1' -with: - restful: False -pods: - - name: indexer_dbms - uses: indexer_dbms.yml diff --git a/tests/integration/dump/flow_query.yml b/tests/integration/dump/flow_query.yml deleted file mode 100644 index 0abe0067920de..0000000000000 --- a/tests/integration/dump/flow_query.yml +++ /dev/null @@ -1,12 +0,0 @@ -jtype: Flow -version: '1' -with: - restful: False -pods: - - name: indexer_query - uses: indexer_query.yml - replicas: 2 - parallel: $QUERY_SHARDS - polling: all - uses_after: $USES_AFTER - timeout_ready: '-1' \ No newline at end of file diff --git a/tests/integration/dump/indexer_dbms.yml b/tests/integration/dump/indexer_dbms.yml deleted file mode 100644 index c9bc00bd70cb8..0000000000000 --- a/tests/integration/dump/indexer_dbms.yml +++ /dev/null @@ -1,16 +0,0 @@ -jtype: KeyValueDBMSIndexer -with: - index_filename: keyvalue.bin -metas: - workspace: $DBMS_WORKSPACE - name: keyvalue -requests: - on: - [IndexRequest, UpdateRequest]: - - jtype: DBMSIndexDriver - with: - executor: keyvalue - DeleteRequest: - - jtype: DeleteDriver - with: - executor: keyvalue diff --git a/tests/integration/dump/indexer_query.yml b/tests/integration/dump/indexer_query.yml deleted file mode 100644 index cad1e162ede83..0000000000000 --- a/tests/integration/dump/indexer_query.yml +++ /dev/null @@ -1,28 +0,0 @@ -jtype: CompoundQueryExecutor -components: - - jtype: NumpyQueryIndexer - with: - index_filename: vec.bin - metric: euclidean - metas: - name: vecidx - - jtype: BinaryPbQueryIndexer - with: - index_filename: kv.bin - metas: - name: kvidx -metas: - name: compound_indexer - workspace: $QUERY_WORKSPACE -requests: - on: - SearchRequest: - - jtype: VectorSearchDriver - with: - 
executor: vecidx - fill_embedding: True # to test embeddings - - jtype: KVSearchDriver - with: - executor: kvidx - is_update: True - traversal_paths: ['m'] \ No newline at end of file diff --git a/tests/integration/dump/test_dump_dbms.py b/tests/integration/dump/test_dump_dbms.py deleted file mode 100644 index de6d381582eca..0000000000000 --- a/tests/integration/dump/test_dump_dbms.py +++ /dev/null @@ -1,366 +0,0 @@ -import functools -import os -import time -from pathlib import Path -from threading import Thread -from typing import List - -import numpy as np -import pytest - -from jina import Flow, Document -from jina.drivers.index import DBMSIndexDriver -from jina.executors.indexers.dump import import_vectors, import_metas -from jina.executors.indexers.query import BaseQueryIndexer -from jina.executors.indexers.query.compound import CompoundQueryExecutor -from jina.logging.profile import TimeContext -from jina.peapods import Pod -from tests.distributed.helpers import get_client - - -def get_documents(nr=10, index_start=0, emb_size=7): - for i in range(index_start, nr + index_start): - with Document() as d: - d.id = i - d.text = f'hello world {i}' - d.embedding = np.random.random(emb_size) - d.tags['tag_field'] = f'tag data {i}' - yield d - - -def basic_benchmark(tmpdir, docs, validate_results_nonempty, error_callback, nr_search): - os.environ['BASIC_QUERY_WS'] = os.path.join(tmpdir, 'basic_query') - os.environ['BASIC_INDEX_WS'] = os.path.join(tmpdir, 'basic_index') - with Flow().add(uses='basic/query.yml') as flow: - flow.index(docs) - - with Flow(return_results=True).add(uses='basic/query.yml') as flow: - with TimeContext( - f'### baseline - query time with {nr_search} on {len(docs)} docs' - ): - results = flow.search( - docs[:nr_search], - ) - validate_results_nonempty(results[0]) - - with Flow().add(uses='basic/index.yml') as flow_dbms: - with TimeContext(f'### baseline - indexing: {len(docs)} docs'): - flow_dbms.index(docs) - - -def assert_dump_data(dump_path, 
docs, shards, pea_id): - size_shard = len(docs) // shards - size_shard_modulus = len(docs) % shards - ids_dump, vectors_dump = import_vectors( - dump_path, - str(pea_id), - ) - if pea_id == shards - 1: - docs_expected = docs[ - (pea_id) * size_shard : (pea_id + 1) * size_shard + size_shard_modulus - ] - else: - docs_expected = docs[(pea_id) * size_shard : (pea_id + 1) * size_shard] - print(f'### pea {pea_id} has {len(docs_expected)} docs') - - ids_dump = list(ids_dump) - vectors_dump = list(vectors_dump) - np.testing.assert_equal(ids_dump, [d.id for d in docs_expected]) - np.testing.assert_allclose(vectors_dump, [d.embedding for d in docs_expected]) - - _, metas_dump = import_metas( - dump_path, - str(pea_id), - ) - metas_dump = list(metas_dump) - np.testing.assert_equal( - metas_dump, - [ - DBMSIndexDriver._doc_without_embedding(d).SerializeToString() - for d in docs_expected - ], - ) - - # assert with Indexers - # TODO currently metas are only passed to the parent Compound, not to the inner components - with TimeContext(f'### reloading {len(docs_expected)}'): - # noinspection PyTypeChecker - cp: CompoundQueryExecutor = BaseQueryIndexer.load_config( - 'indexer_query.yml', - pea_id=pea_id, - metas={ - 'workspace': os.path.join(dump_path, 'new_ws'), - 'dump_path': dump_path, - }, - ) - for c in cp.components: - assert c.size == len(docs_expected) - - # test with the inner indexers separate from the Compound - for i, indexer_file in enumerate(['basic/query_np.yml', 'basic/query_kv.yml']): - indexer = BaseQueryIndexer.load_config( - indexer_file, - pea_id=pea_id, - metas={ - 'workspace': os.path.realpath(os.path.join(dump_path, f'new_ws-{i}')), - 'dump_path': dump_path, - }, - ) - assert indexer.size == len(docs_expected) - - -def path_size(dump_path): - dir_size = ( - sum(f.stat().st_size for f in Path(dump_path).glob('**/*') if f.is_file()) / 1e6 - ) - return dir_size - - -def _validate_results_nonempty(nr_search, nr_docs, emb_size, resp): - assert len(resp.docs) == 
nr_search - for d in resp.docs: - if nr_docs < 10: - # using np.testing since pytest+pycharm swallow the stack info on built-in assert - np.testing.assert_equal(len(d.matches), nr_docs) - else: - np.testing.assert_(len(d.matches) > 0, 'no matches') - for m in d.matches: - np.testing.assert_equal(m.embedding.shape[0], emb_size) - assert ( - DBMSIndexDriver._doc_without_embedding(m).SerializeToString() - is not None - ) - assert 'hello world' in m.text - assert f'tag data' in m.tags['tag_field'] - - -def _validate_results_empty(resp): - assert len(resp.docs[0].matches) == 0 - - -def _error_callback(resp): - raise Exception('error callback called') - - -@pytest.mark.parametrize('shards', [5, 3, 1]) -@pytest.mark.parametrize('nr_docs', [7]) -@pytest.mark.parametrize('emb_size', [10]) -def test_dump_dbms( - tmpdir, shards, nr_docs, emb_size, run_basic=False, times_to_index=2 -): - """showcases using replicas + dump + rolling update with independent clients""" - - cb, docs, dump_path, nr_search = _test_dump_prepare( - emb_size, - nr_docs, - run_basic, - shards, - tmpdir, - ) - times_indexed = 0 - full_docs = [] - with Flow.load_config('flow_dbms.yml') as flow_dbms: - with Flow.load_config('flow_query.yml') as flow_query: - while times_indexed < times_to_index: - dump_path = os.path.join(dump_path, f'dump-{str(times_indexed)}') - client_dbms = get_client(flow_dbms.port_expose) - client_query = get_client(flow_query.port_expose) - docs = list( - get_documents( - nr=nr_docs, - index_start=times_indexed * nr_docs, - emb_size=emb_size, - ) - ) - full_docs.extend(docs) - - with TimeContext(f'### indexing {len(docs)} docs'): - # client is used for data requests - client_dbms.index(docs) - - with TimeContext(f'### dumping {len(docs)} docs'): - # flow object is used for ctrl requests - flow_dbms.dump('indexer_dbms', dump_path=dump_path, shards=shards) - - dir_size = path_size(dump_path) - print(f'### dump path size: {dir_size} MBs') - - with TimeContext(f'### rolling update on 
{len(docs)}'): - # flow object is used for ctrl requests - flow_query.rolling_update('indexer_query', dump_path) - - # data request goes to client - result = client_query.search( - docs[:nr_search], - ) - cb(result[0]) - times_indexed += 1 - - # assert data dumped is correct - for pea_id in range(shards): - assert_dump_data(dump_path, full_docs, shards, pea_id) - - -def _test_dump_prepare(emb_size, nr_docs, run_basic, shards, tmpdir): - docs = list(get_documents(nr=nr_docs, index_start=0, emb_size=emb_size)) - assert len(docs) == nr_docs - nr_search = 3 - - os.environ['USES_AFTER'] = '_merge_matches' if shards > 1 else '_pass' - os.environ['QUERY_SHARDS'] = str(shards) - - validation_query = functools.partial( - _validate_results_nonempty, nr_search, nr_docs * 2, emb_size - ) # x 2 because we run it twice - - if run_basic: - basic_benchmark(tmpdir, docs, validation_query, _error_callback, nr_search) - - dump_path = os.path.join(str(tmpdir), 'dump_dir') - os.environ['DBMS_WORKSPACE'] = os.path.join(str(tmpdir), 'index_ws') - os.environ['QUERY_WORKSPACE'] = os.path.join(str(tmpdir), 'query_ws') - - return validation_query, docs, dump_path, nr_search - - -def _assert_order_ops(ops_log, ops: List[str]): - print(ops_log) - assert len(ops_log) > 0 - last_idx_found = -1 - for rec in ops_log: - for i, op in enumerate(ops): - if op in rec: - print(f'found {op} in {rec}') - if i != last_idx_found + 1: - return False - last_idx_found = i - return last_idx_found == len(ops) - 1 - - -# log of the statements in the threading example -# to assert order -operations = [] - - -def _print_and_append_to_ops(statement): - global operations - operations.append(statement) - print(statement, flush=True) - - -@pytest.mark.repeat(5) -@pytest.mark.parametrize('nr_docs', [700]) -@pytest.mark.parametrize('emb_size', [10]) -def test_threading_query_while_reloading(tmpdir, nr_docs, emb_size, mocker, reraise): - global operations - - def update_rolling(flow, pod_name, dump_path): - with reraise: 
- flow.rolling_update(pod_name, dump_path) - - # TODO better way to test async procedure call order - # patch - def _rolling_update(self, dump_path): - _print_and_append_to_ops(f'### calling patched rolling update') - for i in range(len(self.replicas)): - _print_and_append_to_ops(f'### replica {i} -- starting') - replica = self.replicas[i] - replica.close() - _print_and_append_to_ops(f'### replica {i} -- went offline') - time.sleep(3) # wait for query to hit system when one replica is offline - _args = self.replicas_args[i] - _args.noblock_on_start = False - _args.dump_path = dump_path - new_replica = Pod(_args) - self.enter_context(new_replica) - _print_and_append_to_ops(f'### replica {i} - new instance online') - self.replicas[i] = new_replica - time.sleep(5) - - mocker.patch( - 'jina.peapods.pods.compoundpod.CompoundPod.rolling_update', - new_callable=lambda: _rolling_update, - ) - - docs = list(get_documents(nr=nr_docs, index_start=0, emb_size=emb_size)) - assert len(docs) == nr_docs - nr_search = 3 - - dump_path = os.path.join(str(tmpdir), 'dump_dir') - os.environ['DBMS_WORKSPACE'] = os.path.join(str(tmpdir), 'index_ws') - os.environ['QUERY_WORKSPACE'] = os.path.join(str(tmpdir), 'query_ws') - - os.environ['USES_AFTER'] = '_pass' - os.environ['QUERY_SHARDS'] = str(1) - - with Flow.load_config('flow_dbms.yml') as flow_dbms: - with Flow.load_config('flow_query.yml') as flow_query: - client_dbms = get_client(flow_dbms.port_expose) - client_query = get_client(flow_query.port_expose) - - with TimeContext(f'### indexing {len(docs)} docs'): - client_dbms.index(docs) - - with TimeContext(f'### dumping {len(docs)} docs'): - flow_dbms.dump('indexer_dbms', dump_path=dump_path, shards=1) - - dir_size = path_size(dump_path) - print(f'### dump path size: {dir_size} MBs') - - # test with query while reloading async. 
- t = Thread( - target=update_rolling, args=(flow_query, 'indexer_query', dump_path) - ) - - # searching on the still empty replica - t.start() - time.sleep(1) # wait a bit for replica 1 to be offline - _print_and_append_to_ops(f'### querying -- expecting empty') - result = client_query.search( - docs[:nr_search], - ) - _validate_results_empty(result[0]) - - t.join() - - # done with both -- we should have matches now - cb = functools.partial( - _validate_results_nonempty, nr_search, nr_docs, emb_size - ) - - _print_and_append_to_ops(f'### querying -- expecting data') - result = client_query.search( - docs[:nr_search], - ) - cb(result[0]) - - # collect logs and assert order of operations - assert _assert_order_ops( - operations, - [ - '### replica 0 -- went offline', - '### querying -- expecting empty', - '### replica 0 - new instance online', - '### replica 1 -- went offline', - '### replica 1 - new instance online', - '### querying -- expecting data', - ], - ) - operations = [] - - -# benchmark only -@pytest.mark.skipif( - 'GITHUB_WORKFLOW' in os.environ, reason='skip the benchmark test on github workflow' -) -def test_benchmark(tmpdir, mocker): - nr_docs = 100000 - return test_dump_dbms( - tmpdir, - mocker, - shards=1, - nr_docs=nr_docs, - emb_size=128, - run_basic=True, - times_to_index=1, - ) diff --git a/tests/integration/eval_flow/__init__.py b/tests/integration/eval_flow/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/eval_flow/test_flow_eval_pod.py b/tests/integration/eval_flow/test_flow_eval_pod.py deleted file mode 100644 index 430b7e2b37924..0000000000000 --- a/tests/integration/eval_flow/test_flow_eval_pod.py +++ /dev/null @@ -1,100 +0,0 @@ -import os - -import pytest - -from jina.executors.crafters import BaseCrafter -from jina.executors.decorators import single -from jina.flow import Flow -from tests import random_docs, rm_files - - -class DummyEvaluator1(BaseCrafter): - tag = 1 - - @single - def 
craft(self, id, *args, **kwargs): - with open(f'tmp{self.tag}.txt', 'a') as fp: - fp.write(f'{id}') - return {} - - -class DummyEvaluator2(DummyEvaluator1): - tag = 2 - - -class DummyEvaluator3(DummyEvaluator1): - tag = 3 - - -docs = list(random_docs(1)) -params = ['HANG', 'REMOVE', 'COLLECT'] - - -def validate(ids, expect): - assert len(ids) > 0 - for j in ids: - fname = f'tmp{j}.txt' - assert os.path.exists(fname) == expect - if expect: - with open(fname) as fp: - assert fp.read() != '' - rm_files([fname]) - - -@pytest.mark.parametrize('inspect', params) -@pytest.mark.parametrize('restful', [False, True]) -def test_flow1(inspect, restful): - f = Flow(restful=restful, inspect=inspect).add() - - with f: - f.index(docs) - - -@pytest.mark.parametrize('inspect', params) -@pytest.mark.parametrize('restful', [False, True]) -def test_flow2(inspect, restful): - f = Flow(restful=restful, inspect=inspect).add().inspect(uses='DummyEvaluator1') - - with f: - f.index(docs) - - validate([1], expect=f.args.inspect.is_keep) - - -# TODO(Deepankar): Gets stuck when `restful: True` - issues with `needs='gateway'` -@pytest.mark.parametrize('inspect', params) -@pytest.mark.parametrize('restful', [False]) -def test_flow3(inspect, restful): - f = ( - Flow(restful=restful, inspect=inspect) - .add(name='p1') - .inspect(uses='DummyEvaluator1') - .add(name='p2', needs='gateway') - .needs(['p1', 'p2']) - .inspect(uses='DummyEvaluator2') - ) - - with f: - f.index(docs) - - validate([1, 2], expect=f.args.inspect.is_keep) - - -@pytest.mark.parametrize('inspect', params) -@pytest.mark.parametrize('restful', [False, True]) -def test_flow5(inspect, restful): - f = ( - Flow(restful=restful, inspect=inspect) - .add() - .inspect(uses='DummyEvaluator1') - .add() - .inspect(uses='DummyEvaluator2') - .add() - .inspect(uses='DummyEvaluator3') - .plot(build=True) - ) - - with f: - f.index(docs) - - validate([1, 2, 3], expect=f.args.inspect.is_keep) diff --git a/tests/integration/evaluation/__init__.py 
b/tests/integration/evaluation/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/evaluation/flow-evaluate-from-file-parallel.yml b/tests/integration/evaluation/flow-evaluate-from-file-parallel.yml deleted file mode 100644 index 5012ee5b719be..0000000000000 --- a/tests/integration/evaluation/flow-evaluate-from-file-parallel.yml +++ /dev/null @@ -1,14 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: gt_indexer - uses: yaml/index-gt.yml - polling: 'all' - needs: gateway - shards: 2 - - name: evaluator - - uses: yaml/single-evaluator.yml - diff --git a/tests/integration/evaluation/flow-evaluate-from-file.yml b/tests/integration/evaluation/flow-evaluate-from-file.yml deleted file mode 100644 index 3c292f45e3016..0000000000000 --- a/tests/integration/evaluation/flow-evaluate-from-file.yml +++ /dev/null @@ -1,8 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: evaluate_from_file - uses: yaml/evaluate-from-file.yml - read_only: true diff --git a/tests/integration/evaluation/flow-index-gt-parallel.yml b/tests/integration/evaluation/flow-index-gt-parallel.yml deleted file mode 100644 index 2b32b6edf66f6..0000000000000 --- a/tests/integration/evaluation/flow-index-gt-parallel.yml +++ /dev/null @@ -1,10 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: gt_indexer - uses: yaml/index-gt.yml - polling: 'all' - needs: gateway - shards: 2 diff --git a/tests/integration/evaluation/flow-index-gt.yml b/tests/integration/evaluation/flow-index-gt.yml deleted file mode 100644 index c518298f7de3b..0000000000000 --- a/tests/integration/evaluation/flow-index-gt.yml +++ /dev/null @@ -1,8 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: gt_indexer - uses: yaml/index-gt.yml - needs: gateway diff --git a/tests/integration/evaluation/flow-parallel-evaluate-from-file-parallel.yml 
b/tests/integration/evaluation/flow-parallel-evaluate-from-file-parallel.yml deleted file mode 100644 index e8ab15e9cfd04..0000000000000 --- a/tests/integration/evaluation/flow-parallel-evaluate-from-file-parallel.yml +++ /dev/null @@ -1,13 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: gt_indexer - uses: yaml/index-gt.yml - polling: 'all' - needs: gateway - shards: 2 - - name: evaluator - uses: yaml/single-evaluator.yml - parallel: 2 diff --git a/tests/integration/evaluation/rank/__init__.py b/tests/integration/evaluation/rank/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/evaluation/rank/flow-evaluate.yml b/tests/integration/evaluation/rank/flow-evaluate.yml deleted file mode 100644 index 141ac1d5abb84..0000000000000 --- a/tests/integration/evaluation/rank/flow-evaluate.yml +++ /dev/null @@ -1,14 +0,0 @@ -!Flow -version: '1' -pods: - - name: doc_indexer - uses: yaml/index-doc.yml - - name: evaluate_match - uses: yaml/evaluate.yml - read_only: true - - name: ranker - uses: yaml/rank.yml - read_only: true - - name: evaluate_rank - uses: yaml/evaluate.yml - read_only: true diff --git a/tests/integration/evaluation/rank/flow-index.yml b/tests/integration/evaluation/rank/flow-index.yml deleted file mode 100644 index af2aa7c18f15a..0000000000000 --- a/tests/integration/evaluation/rank/flow-index.yml +++ /dev/null @@ -1,6 +0,0 @@ -!Flow -version: '1' -pods: - - name: doc_indexer - uses: yaml/index-doc.yml - needs: gateway diff --git a/tests/integration/evaluation/rank/test_evaluation.py b/tests/integration/evaluation/rank/test_evaluation.py deleted file mode 100644 index f7e75a31e4556..0000000000000 --- a/tests/integration/evaluation/rank/test_evaluation.py +++ /dev/null @@ -1,151 +0,0 @@ -import os - -import numpy as np - -from jina import Document -from jina.flow import Flow - -from tests import validate_callback - - -def test_evaluation(tmpdir, mocker): - 
os.environ['JINA_TEST_RANKING_EVALUATION'] = str(tmpdir) - - def index_documents(): - """Index Documents: - doc: tag__id = 0 - tag__dummy_score = 0 - embedding = 0 - doc: tag__id = 1 - tag__dummy_score = -1 - embedding = 1 - doc: tag__id = 2 - tag__dummy_score = -2 - embedding = 2 - """ - with Document() as doc0: - doc0.tags['id'] = '0' - doc0.tags['dummy_score'] = 0 - doc0.embedding = np.array([0]) - - with Document() as doc1: - doc1.tags['id'] = '1' - doc1.tags['dummy_score'] = -1 - doc1.embedding = np.array([1]) - - with Document() as doc2: - doc2.tags['id'] = '2' - doc2.tags['dummy_score'] = -2 - doc2.embedding = np.array([2]) - - return [doc0, doc1, doc2] - - with Flow.load_config('flow-index.yml') as index_flow: - index_flow.index(inputs=index_documents) - - def validate_evaluation_response(resp): - assert len(resp.docs) == 2 - for doc in resp.docs: - assert ( - len(doc.evaluations) == 8 - ) # 2 evaluation Pods with 4 evaluations each - - doc = resp.docs[0] - assert len(doc.matches) == 2 - assert doc.evaluations[0].op_name == 'PrecisionEvaluator@1' - assert doc.evaluations[0].value == 1.0 - assert doc.evaluations[1].op_name == 'PrecisionEvaluator@2' - assert doc.evaluations[1].value == 0.5 - assert doc.evaluations[2].op_name == 'RecallEvaluator@1' - assert doc.evaluations[2].value == 0.5 - assert doc.evaluations[3].op_name == 'RecallEvaluator@2' - assert doc.evaluations[3].value == 0.5 - - assert doc.evaluations[4].op_name == 'PrecisionEvaluator@1' - assert doc.evaluations[4].value == 1.0 - assert doc.evaluations[5].op_name == 'PrecisionEvaluator@2' - assert doc.evaluations[5].value == 0.5 - assert doc.evaluations[6].op_name == 'RecallEvaluator@1' - assert doc.evaluations[6].value == 0.5 - assert doc.evaluations[7].op_name == 'RecallEvaluator@2' - assert doc.evaluations[7].value == 0.5 - - doc = resp.docs[1] - assert doc.evaluations[0].op_name == 'PrecisionEvaluator@1' - assert doc.evaluations[0].value == 1.0 - assert doc.evaluations[1].op_name == 
'PrecisionEvaluator@2' - assert doc.evaluations[1].value == 1.0 - assert doc.evaluations[2].op_name == 'RecallEvaluator@1' - assert doc.evaluations[2].value == 0.5 - assert doc.evaluations[3].op_name == 'RecallEvaluator@2' - assert doc.evaluations[3].value == 1.0 - - assert doc.evaluations[4].op_name == 'PrecisionEvaluator@1' - assert doc.evaluations[4].value == 1.0 - assert doc.evaluations[5].op_name == 'PrecisionEvaluator@2' - assert doc.evaluations[5].value == 1.0 - assert doc.evaluations[6].op_name == 'RecallEvaluator@1' - assert doc.evaluations[6].value == 0.5 - assert doc.evaluations[7].op_name == 'RecallEvaluator@2' - assert doc.evaluations[7].value == 1.0 - - def doc_groundtruth_evaluation_pairs(): - with Document() as doc0: - doc0.embedding = np.array([0]) - - with Document() as groundtruth0: - m1 = Document(id='1' * 16) - m1.score.value = -1 - match0 = groundtruth0.matches.append(m1) - match0.tags['id'] = '0' - m2 = Document(id='2' * 16) - m2.score.value = -1 - match1 = groundtruth0.matches.append(m2) - match1.tags['id'] = '2' - # top_k is set to 2 for VectorSearchDriver - # expects as matches [0, 2] but given [0, 1] - # Precision@1 = 100% - # Precision@2 = 50% - # Recall@1 = 100% - # Recall@2 = 50% - - # expects as ranked [0, 2] but given [0, 1] - # Precision@1 = 100% - # Precision@2 = 50% - # Recall@1 = 100% - # Recall@2 = 50% - - with Document() as doc1: - doc1.embedding = np.array([2]) - - with Document() as groundtruth1: - m1 = Document(id='1' * 16) - m1.score.value = -1 - match0 = groundtruth1.matches.append(m1) - match0.tags['id'] = '1' - m2 = Document(id='2' * 16) - m2.score.value = -1 - match1 = groundtruth1.matches.append(m2) - match1.tags['id'] = '2' - # expects as matches [1, 2] but given [2, 1] - # Precision@1 = 100% - # Precision@2 = 100% - # Recall@1 = 100% - # Recall@2 = 100% - - # expects as ranked [1, 2] but given [2, 1] - # Precision@1 = 100% - # Precision@2 = 100% - # Recall@1 = 100% - # Recall@2 = 100% - - return [(doc0, 
groundtruth0), (doc1, groundtruth1)] - - response_mock = mocker.Mock() - with Flow.load_config('flow-evaluate.yml') as evaluate_flow: - evaluate_flow.search( - inputs=doc_groundtruth_evaluation_pairs, on_done=response_mock, top_k=2 - ) - - del os.environ['JINA_TEST_RANKING_EVALUATION'] - validate_callback(response_mock, validate_evaluation_response) diff --git a/tests/integration/evaluation/rank/yaml/__init__.py b/tests/integration/evaluation/rank/yaml/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/evaluation/rank/yaml/dummy_ranker.py b/tests/integration/evaluation/rank/yaml/dummy_ranker.py deleted file mode 100644 index 691942d0e0cee..0000000000000 --- a/tests/integration/evaluation/rank/yaml/dummy_ranker.py +++ /dev/null @@ -1,27 +0,0 @@ -from typing import Dict, List - -from jina.executors.rankers import Match2DocRanker -from jina.executors.decorators import batching - - -class DummyRanker(Match2DocRanker): - """ - :class:`LevenshteinRanker` Computes the negative Levenshtein distance - between a query and its matches. The distance is negative, in order to - achieve a bigger=better sorting in the respective driver. 
- """ - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.match_required_keys = ['tags__dummy_score'] - - @batching(slice_nargs=3) - def score( - self, - old_match_scores: List[Dict], - queries_metas: List[Dict], - matches_metas: List[List[Dict]], - ) -> List[List[float]]: - return [ - [m['tags__dummy_score'] for m in match_meta] for match_meta in matches_metas - ] diff --git a/tests/integration/evaluation/rank/yaml/evaluate.yml b/tests/integration/evaluation/rank/yaml/evaluate.yml deleted file mode 100644 index 4816e7619d4a1..0000000000000 --- a/tests/integration/evaluation/rank/yaml/evaluate.yml +++ /dev/null @@ -1,47 +0,0 @@ -!CompoundExecutor -components: - - !PrecisionEvaluator - with: - eval_at: 1 - metas: - name: precision-1 - - !PrecisionEvaluator - with: - eval_at: 2 - metas: - name: precision-2 - - !RecallEvaluator - with: - eval_at: 1 - metas: - name: recall-1 - - !RecallEvaluator - with: - eval_at: 2 - metas: - name: recall-2 -metas: - name: evaluation -requests: - on: - [ SearchRequest ]: - - !RankEvaluateDriver - with: - executor: precision-1 - traversal_paths: [ 'r' ] - fields: ['tags__id'] - - !RankEvaluateDriver - with: - executor: precision-2 - traversal_paths: [ 'r' ] - fields: ['tags__id'] - - !RankEvaluateDriver - with: - executor: recall-1 - traversal_paths: [ 'r' ] - fields: ['tags__id'] - - !RankEvaluateDriver - with: - executor: recall-2 - traversal_paths: [ 'r' ] - fields: ['tags__id'] diff --git a/tests/integration/evaluation/rank/yaml/index-doc.yml b/tests/integration/evaluation/rank/yaml/index-doc.yml deleted file mode 100644 index 04b412de409bf..0000000000000 --- a/tests/integration/evaluation/rank/yaml/index-doc.yml +++ /dev/null @@ -1,37 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx -metas: - name: chunk_indexer - workspace: 
$JINA_TEST_RANKING_EVALUATION -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVIndexDriver - with: - executor: docidx - traversal_paths: ['r'] - [SearchRequest]: - - !VectorSearchDriver - with: - executor: vecidx - top_k: 2 - traversal_paths: ['r'] - - !KVSearchDriver - with: - executor: docidx - traversal_paths: ['m'] \ No newline at end of file diff --git a/tests/integration/evaluation/rank/yaml/rank.yml b/tests/integration/evaluation/rank/yaml/rank.yml deleted file mode 100644 index 967baa4fd5da7..0000000000000 --- a/tests/integration/evaluation/rank/yaml/rank.yml +++ /dev/null @@ -1,9 +0,0 @@ -!DummyRanker -metas: - py_modules: dummy_ranker.py -requests: - on: - SearchRequest: - - !Matches2DocRankDriver - with: - traversal_paths: ['r'] diff --git a/tests/integration/evaluation/test_evaluation_from_file.py b/tests/integration/evaluation/test_evaluation_from_file.py deleted file mode 100644 index a6233d16c9c71..0000000000000 --- a/tests/integration/evaluation/test_evaluation_from_file.py +++ /dev/null @@ -1,106 +0,0 @@ -import os - -import pytest - -from jina.flow import Flow -from jina import Document - -from tests import validate_callback - - -@pytest.fixture -def index_groundtruth(): - """Index Groundtruth: - doc: id = 00 - tag__groundtruth = True - text = aa - doc: id = 01 - tag__groundtruth = True - text = aa - doc: id = 02 - tag__groundtruth = True - text = aa - ... we will not have groundtruth for id 5, 10, 50 - """ - docs = [] - for idx in range(0, 100): - doc = Document() - doc.id = f'{idx:0>16}' - doc.tags['groundtruth'] = True - doc.text = 'aa' - if idx not in (5, 10, 50): - docs.append(doc) - return docs - - -@pytest.fixture -def evaluate_docs(): - """Evaluate Documents: - doc: id = 00 - tag__groundtruth = False - text = aaa - doc: id = 01 - tag__groundtruth = False - text = aaa - doc: id = 02 - tag__groundtruth = False - text = aaa - ... 
- """ - docs = [] - for idx in range(0, 100): - doc = Document() - doc.id = f'{idx:0>16}' - doc.tags['groundtruth'] = False - doc.text = 'aaa' - docs.append(doc) - return docs - - -@pytest.fixture -def random_workspace(tmpdir): - os.environ['JINA_TEST_EVALUATION_FROM_FILE'] = str(tmpdir) - yield - del os.environ['JINA_TEST_EVALUATION_FROM_FILE'] - - -@pytest.mark.parametrize( - 'index_yaml, search_yaml', - [ - ('flow-index-gt.yml', 'flow-evaluate-from-file.yml'), - ('flow-index-gt-parallel.yml', 'flow-evaluate-from-file-parallel.yml'), - ('flow-index-gt-parallel.yml', 'flow-parallel-evaluate-from-file-parallel.yml'), - ], -) -@pytest.mark.parametrize('restful', [False, True]) -def test_evaluation_from_file( - random_workspace, - index_groundtruth, - evaluate_docs, - index_yaml, - search_yaml, - restful, - mocker, - monkeypatch, -): - monkeypatch.setenv("RESTFUL", restful) - - with Flow.load_config(index_yaml) as index_gt_flow: - index_gt_flow.index(inputs=index_groundtruth, request_size=10) - - def validate_evaluation_response(resp): - assert len(resp.docs) == 97 - assert len(resp.groundtruths) == 97 - for doc in resp.docs: - assert len(doc.evaluations) == 1 - assert doc.evaluations[0].value == 1.0 - assert not doc.tags['groundtruth'] - for gt in resp.groundtruths: - assert gt.tags['groundtruth'] - - mock = mocker.Mock() - with Flow.load_config(search_yaml) as evaluate_flow: - evaluate_flow.search(inputs=evaluate_docs, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_evaluation_response) diff --git a/tests/integration/evaluation/yaml/__init__.py b/tests/integration/evaluation/yaml/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/evaluation/yaml/evaluate-from-file.yml b/tests/integration/evaluation/yaml/evaluate-from-file.yml deleted file mode 100644 index 0a756e2ec4ef3..0000000000000 --- a/tests/integration/evaluation/yaml/evaluate-from-file.yml +++ /dev/null @@ -1,24 +0,0 @@ 
-!FileBasedEvaluator -components: - - !BinaryPbIndexer - with: - index_filename: gt.gz - metas: - name: gtidx - workspace: $JINA_TEST_EVALUATION_FROM_FILE - - !TextLengthEvaluator - metas: - name: length-evaluator -metas: - name: file-based-evaluation - workspace: $JINA_TEST_EVALUATION_FROM_FILE -requests: - on: - SearchRequest: - - !LoadGroundTruthDriver - with: - executor: gtidx - - !TextEvaluateDriver - with: - executor: length-evaluator - traversal_paths: [ 'r' ] diff --git a/tests/integration/evaluation/yaml/index-gt.yml b/tests/integration/evaluation/yaml/index-gt.yml deleted file mode 100644 index 4874afac227e6..0000000000000 --- a/tests/integration/evaluation/yaml/index-gt.yml +++ /dev/null @@ -1,12 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: gt.gz -metas: - name: gtidx - workspace: $JINA_TEST_EVALUATION_FROM_FILE -requests: - on: - IndexRequest: - - !KVIndexDriver {} - SearchRequest: - - !LoadGroundTruthDriver {} diff --git a/tests/integration/evaluation/yaml/single-evaluator.yml b/tests/integration/evaluation/yaml/single-evaluator.yml deleted file mode 100644 index daca4c5465713..0000000000000 --- a/tests/integration/evaluation/yaml/single-evaluator.yml +++ /dev/null @@ -1,7 +0,0 @@ -!TextLengthEvaluator -requests: - on: - SearchRequest: - - !TextEvaluateDriver - with: - traversal_paths: [ 'r' ] \ No newline at end of file diff --git a/tests/integration/executors_batching/__init__.py b/tests/integration/executors_batching/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/executors_batching/test_craft_executors_batching.py b/tests/integration/executors_batching/test_craft_executors_batching.py deleted file mode 100644 index abc486f64c5b4..0000000000000 --- a/tests/integration/executors_batching/test_craft_executors_batching.py +++ /dev/null @@ -1,345 +0,0 @@ -import pytest - -import numpy as np - -from jina.executors.crafters import BaseCrafter -from jina.executors.decorators import ( - batching, 
- single, -) -from jina import Document -from jina.flow import Flow -from jina.types.ndarray.generic import NdArray -from jina.types.arrays import DocumentArray -from tests import validate_callback - - -class DummyCrafterTextBatching(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=3) - def craft(self, text, *args, **kwargs): - assert len(text) == 3 - return [{'text': f'{txt}-crafted'} for txt in text] - - -class DummyCrafterTextSingle(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @single - def craft(self, text, *args, **kwargs): - assert isinstance(text, str) - return {'text': f'{text}-crafted'} - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize( - 'crafter', [DummyCrafterTextSingle(), DummyCrafterTextBatching()] -) -def test_batching_text_one_argument(stack, crafter): - docs = DocumentArray([Document(text=f'text-{i}') for i in range(15)]) - texts, _ = docs.extract_docs('text', stack_contents=stack) - - crafted_docs = crafter.craft(texts) - for i, crafted_doc in enumerate(crafted_docs): - assert crafted_doc['text'] == f'text-{i}-crafted' - - -@pytest.mark.parametrize( - 'crafter', ['!DummyCrafterTextSingle', '!DummyCrafterTextBatching'] -) -def test_batching_text_one_argument_flow(crafter, mocker): - NUM_DOCS = 15 - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for i, doc in enumerate(resp.index.docs): - assert doc.text == f'text-{i}-crafted' - - docs = DocumentArray([Document(text=f'text-{i}') for i in range(NUM_DOCS)]) - mock = mocker.Mock() - - with Flow().add(name='crafter', uses=crafter) as f: - f.index(inputs=docs, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) - - -class DummyCrafterTextIdBatching(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=3, slice_nargs=2) - def craft(self, 
text, id, *args, **kwargs): - assert len(text) == 3 - assert len(id) == 3 - return [ - {'text': f'{txt}-crafted', 'id': f'{i}-crafted'} for i, txt in zip(id, text) - ] - - -class DummyCrafterTextIdSingle(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @single(slice_nargs=2) - def craft(self, text, id, *args, **kwargs): - assert isinstance(text, str) - assert isinstance(id, str) - return {'text': f'{text}-crafted', 'id': f'{id}-crafted'} - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize( - 'crafter', [DummyCrafterTextIdSingle(), DummyCrafterTextIdBatching()] -) -def test_batching_text_multi(stack, crafter): - docs = DocumentArray([Document(text=f'text-{i}', id=f'id-{i}') for i in range(15)]) - required_keys = ['text', 'id'] - text_ids, _ = docs.extract_docs(*required_keys, stack_contents=stack) - - crafted_docs = crafter.craft(*text_ids) - - for i, crafted_doc in enumerate(crafted_docs): - assert crafted_doc['text'] == f'text-{i}-crafted' - assert crafted_doc['id'] == f'id-{i}-crafted' - - -@pytest.mark.parametrize( - 'crafter', ['!DummyCrafterTextIdSingle', '!DummyCrafterTextIdBatching'] -) -def test_batching_text_multi_flow(crafter, mocker): - NUM_DOCS = 15 - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for i, doc in enumerate(resp.index.docs): - assert doc.text == f'text-{i}-crafted' - assert doc.id == f'id-{i}-crafted' - - docs = DocumentArray( - [Document(text=f'text-{i}', id=f'id-{i}') for i in range(NUM_DOCS)] - ) - mock = mocker.Mock() - - with Flow().add(name='crafter', uses=crafter) as f: - f.index(inputs=docs, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) - - -class DummyCrafterBlobBatching(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=3) - def craft(self, blob, *args, **kwargs): - assert len(blob) == 3 - return [{'blob': b} for b in blob] - - 
-class DummyCrafterBlobSingle(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @single - def craft(self, blob, *args, **kwargs): - assert isinstance(blob, np.ndarray) - return {'blob': blob} - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize( - 'crafter', [DummyCrafterBlobSingle(), DummyCrafterBlobBatching()] -) -def test_batching_blob_one_argument(stack, crafter): - docs = DocumentArray( - [Document(blob=np.array([[i] * 5, [i] * 5])) for i in range(15)] - ) - texts, _ = docs.extract_docs('blob', stack_contents=stack) - - crafted_docs = crafter.craft(texts) - for i, crafted_doc in enumerate(crafted_docs): - np.testing.assert_equal(crafted_doc['blob'], np.array([[i] * 5, [i] * 5])) - - -@pytest.mark.parametrize( - 'crafter', ['!DummyCrafterBlobSingle', '!DummyCrafterBlobBatching'] -) -def test_batching_blob_one_argument_flow(crafter, mocker): - NUM_DOCS = 15 - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for i, doc in enumerate(resp.index.docs): - np.testing.assert_equal( - NdArray(doc.blob).value, np.array([[i] * 5, [i] * 5]) - ) - - docs = DocumentArray( - [Document(blob=np.array([[i] * 5, [i] * 5])) for i in range(NUM_DOCS)] - ) - mock = mocker.Mock() - - with Flow().add(name='crafter', uses=crafter) as f: - f.index(inputs=docs, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) - - -class DummyCrafterBlobEmbeddingBatching(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=3, slice_nargs=2) - def craft(self, blob, embedding, *args, **kwargs): - assert len(blob) == 3 - assert len(embedding) == 3 - return [{'blob': b, 'embedding': e} for b, e in zip(blob, embedding)] - - -class DummyCrafterBlobEmbeddingSingle(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @single(slice_nargs=2) - def craft(self, blob, embedding, *args, 
**kwargs): - assert isinstance(blob, np.ndarray) - assert isinstance(embedding, np.ndarray) - return {'blob': blob, 'embedding': embedding} - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize( - 'crafter', [DummyCrafterBlobEmbeddingSingle(), DummyCrafterBlobEmbeddingBatching()] -) -def test_batching_blob_multi(stack, crafter): - docs = DocumentArray( - [ - Document( - blob=np.array([[i] * 5, [i] * 5]), - embedding=np.array([i] * 5), - ) - for i in range(15) - ] - ) - required_keys = ['blob', 'embedding'] - text_ids, _ = docs.extract_docs(*required_keys, stack_contents=stack) - - crafted_docs = crafter.craft(*text_ids) - - for i, crafted_doc in enumerate(crafted_docs): - np.testing.assert_equal(crafted_doc['blob'], np.array([[i] * 5, [i] * 5])) - np.testing.assert_equal(crafted_doc['embedding'], np.array([i] * 5)) - - -@pytest.mark.parametrize( - 'crafter', - ['!DummyCrafterBlobEmbeddingSingle', '!DummyCrafterBlobEmbeddingBatching'], -) -def test_batching_blob_multi_flow(crafter, mocker): - NUM_DOCS = 15 - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for i, doc in enumerate(resp.index.docs): - np.testing.assert_equal( - NdArray(doc.blob).value, np.array([[i] * 5, [i] * 5]) - ) - np.testing.assert_equal(NdArray(doc.embedding).value, np.array([i] * 5)) - - docs = DocumentArray( - [ - Document( - blob=np.array([[i] * 5, [i] * 5]), - embedding=np.array([i] * 5), - ) - for i in range(NUM_DOCS) - ] - ) - mock = mocker.Mock() - - with Flow().add(name='crafter', uses=crafter) as f: - f.index(inputs=docs, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) - - -class DummyCrafterTextEmbeddingBatching(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=3, slice_nargs=2) - def craft(self, text, embedding, *args, **kwargs): - assert len(text) == 3 - assert len(embedding) == 3 - return [ - {'text': f'{t}-crafted', 
'embedding': e} for t, e in zip(text, embedding) - ] - - -class DummyCrafterTextEmbeddingSingle(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.required_keys = ['text', 'embedding'] - - @single(slice_nargs=2) - def craft(self, text, embedding, *args, **kwargs): - assert isinstance(text, str) - assert isinstance(embedding, np.ndarray) - return {'text': f'{text}-crafted', 'embedding': embedding} - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize( - 'crafter', [DummyCrafterTextEmbeddingSingle(), DummyCrafterTextEmbeddingBatching()] -) -def test_batching_mix_multi(stack, crafter): - docs = DocumentArray( - [Document(text=f'text-{i}', embedding=np.array([i] * 5)) for i in range(15)] - ) - required_keys = ['text', 'embedding'] - text_ids, _ = docs.extract_docs(*required_keys, stack_contents=stack) - - crafted_docs = crafter.craft(*text_ids) - - for i, crafted_doc in enumerate(crafted_docs): - assert crafted_doc['text'] == f'text-{i}-crafted' - np.testing.assert_equal(crafted_doc['embedding'], np.array([i] * 5)) - - -@pytest.mark.parametrize( - 'crafter', - ['!DummyCrafterTextEmbeddingSingle', '!DummyCrafterTextEmbeddingBatching'], -) -def test_batching_mix_multi_flow(crafter, mocker): - NUM_DOCS = 15 - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for i, doc in enumerate(resp.index.docs): - assert doc.text == f'text-{i}-crafted' - np.testing.assert_equal(NdArray(doc.embedding).value, np.array([i] * 5)) - - docs = DocumentArray( - [ - Document( - text=f'text-{i}', - embedding=np.array([i] * 5), - ) - for i in range(NUM_DOCS) - ] - ) - mock = mocker.Mock() - - with Flow().add(name='crafter', uses=crafter) as f: - f.index(inputs=docs, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) diff --git a/tests/integration/executors_batching/test_encoder_executors_batching.py 
b/tests/integration/executors_batching/test_encoder_executors_batching.py deleted file mode 100644 index 296aa923d24f9..0000000000000 --- a/tests/integration/executors_batching/test_encoder_executors_batching.py +++ /dev/null @@ -1,82 +0,0 @@ -import pytest - -import numpy as np - -from jina.executors.encoders import BaseEncoder -from jina.executors.decorators import batching, single, as_ndarray -from jina import Document -from jina.types.arrays import DocumentArray - -EMBED_SIZE = 10 - - -class DummyEncoderTextBatching(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @as_ndarray - @batching(batch_size=3) - def encode(self, content: 'np.ndarray', *args, **kwargs): - assert isinstance(content, np.ndarray) - assert isinstance(content[0], str) - assert content.shape[0] == 3 - return np.random.random((content.shape[0], EMBED_SIZE)) - - -class DummyEncoderTextSingle(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @as_ndarray - @single - def encode(self, content, *args, **kwargs): - assert isinstance(content, str) - return np.random.random(EMBED_SIZE) - - -@pytest.mark.parametrize( - 'encoder', [DummyEncoderTextSingle(), DummyEncoderTextBatching()] -) -def test_batching_encode_text(encoder): - docs = DocumentArray([Document(text=f'text-{i}') for i in range(15)]) - texts, _ = docs.all_contents - - embeds = encoder.encode(texts) - - assert embeds.shape == (15, 10) - - -class DummyEncoderBlobBatching(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @as_ndarray - @batching(batch_size=3) - def encode(self, content, *args, **kwargs): - assert isinstance(content, np.ndarray) - assert isinstance(content[0], np.ndarray) - assert content.shape[0] == 3 - return np.random.random((content.shape[0], EMBED_SIZE)) - - -class DummyEncoderBlobSingle(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @as_ndarray - 
@single - def encode(self, content, *args, **kwargs): - assert isinstance(content, np.ndarray) - return np.random.random(EMBED_SIZE) - - -@pytest.mark.parametrize( - 'encoder', [DummyEncoderBlobSingle(), DummyEncoderBlobBatching()] -) -def test_batching_encode_blob(encoder): - docs = DocumentArray([Document(blob=np.random.random((10, 20))) for _ in range(15)]) - blob, _ = docs.all_contents - - embeds = encoder.encode(blob) - - assert embeds.shape == (15, 10) diff --git a/tests/integration/executors_batching/test_match2docranker_exectuors_batching.py b/tests/integration/executors_batching/test_match2docranker_exectuors_batching.py deleted file mode 100644 index a539d179e5070..0000000000000 --- a/tests/integration/executors_batching/test_match2docranker_exectuors_batching.py +++ /dev/null @@ -1,104 +0,0 @@ -import pytest - -from typing import List, Dict -from jina.executors.rankers import Match2DocRanker -from jina.executors.decorators import batching, single -from jina import Document -from jina.types.arrays import DocumentArray -from jina.types.score import NamedScore -from jina.flow import Flow -from tests import validate_callback - - -class DummyRankerBatching(Match2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.match_required_keys = ['tags__dummy_score'] - - @batching(batch_size=3, slice_nargs=3) - def score( - self, - old_match_scores: List[Dict], - queries_metas: List[Dict], - matches_metas: List[List[Dict]], - ) -> List[List[float]]: - return [ - [m['tags__dummy_score'] for m in match_meta] for match_meta in matches_metas - ] - - -class DummyRankerSingle(Match2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.match_required_keys = ['tags__dummy_score'] - - @single(slice_nargs=3, flatten_output=False) - def score(self, old_match_scores, query_metas, match_meta) -> List[List[float]]: - return [m['tags__dummy_score'] for m in match_meta] - - 
-@pytest.mark.parametrize('ranker', [DummyRankerSingle(), DummyRankerBatching()]) -def test_match2docranker_batching(ranker): - NUM_DOCS_QUERIES = 15 - NUM_MATCHES = 10 - - old_matches_scores = [] - queries_metas = [] - matches_metas = [] - queries = DocumentArray([]) - for i in range(NUM_DOCS_QUERIES): - old_match_scores = [] - match_metas = [] - query = Document(id=f'query-{i}') - for j in range(NUM_MATCHES): - m = Document(id=f'match-{i}-{j}', tags={'dummy_score': j}) - query.matches.append(m) - old_match_scores.append(0) - match_metas.append(m.get_attrs('tags__dummy_score')) - queries.append(query) - old_matches_scores.append(old_match_scores) - queries_metas.append(None) - matches_metas.append(match_metas) - - queries_scores = ranker.score(old_matches_scores, queries_metas, matches_metas) - assert len(queries_scores) == NUM_DOCS_QUERIES - - for i, (query, matches_scores) in enumerate(zip(queries, queries_scores)): - assert len(matches_scores) == NUM_MATCHES - for j, (match, score) in enumerate(zip(query.matches, matches_scores)): - match.score = NamedScore(value=j) - assert score == j - - query.matches.sort(key=lambda x: x.score.value, reverse=True) - - for j, match in enumerate(query.matches, 1): - assert match.id == f'match-{i}-{NUM_MATCHES - j}' - assert match.score.value == NUM_MATCHES - j - - -@pytest.mark.parametrize('ranker', ['!DummyRankerSingle', '!DummyRankerBatching']) -def test_match2docranker_batching_flow(ranker, mocker): - NUM_DOCS_QUERIES = 15 - NUM_MATCHES = 10 - queries = DocumentArray([]) - for i in range(NUM_DOCS_QUERIES): - query = Document(id=f'query-{i}') - for j in range(NUM_MATCHES): - m = Document(id=f'match-{i}-{j}', tags={'dummy_score': j}) - query.matches.append(m) - queries.append(query) - - def validate_response(resp): - assert len(resp.search.docs) == NUM_DOCS_QUERIES - for i, query in enumerate(resp.search.docs): - for j, match in enumerate(query.matches, 1): - assert match.id == f'match-{i}-{NUM_MATCHES - j}' - assert 
match.score.value == NUM_MATCHES - j - - mock = mocker.Mock() - - with Flow().add(name='ranker', uses=ranker) as f: - f.search(inputs=queries, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) diff --git a/tests/integration/executors_batching/test_segment_executors_batching.py b/tests/integration/executors_batching/test_segment_executors_batching.py deleted file mode 100644 index f3a38a21628f0..0000000000000 --- a/tests/integration/executors_batching/test_segment_executors_batching.py +++ /dev/null @@ -1,71 +0,0 @@ -import pytest - -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import batching, single -from jina import Document -from jina.flow import Flow -from jina.types.arrays import DocumentArray -from tests import validate_callback - -NUM_CHUNKS = 3 - - -class DummySegmenterTextBatching(BaseSegmenter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.counter = 0 - - @batching(batch_size=3) - def segment(self, text, *args, **kwargs): - assert len(text) == 3 - return [ - [{'text': f'{txt}-chunk-{chunk}'} for chunk in range(NUM_CHUNKS)] - for txt in text - ] - - -class DummySegmenterTextSingle(BaseSegmenter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @single - def segment(self, text, *args, **kwargs): - assert isinstance(text, str) - return [{'text': f'{text}-chunk-{chunk}'} for chunk in range(NUM_CHUNKS)] - - -@pytest.mark.parametrize( - 'segmenter', [DummySegmenterTextSingle(), DummySegmenterTextBatching()] -) -def test_batching_text_one_argument(segmenter): - docs = DocumentArray([Document(text=f'text-{i}') for i in range(15)]) - texts, _ = docs.extract_docs('text') - - chunks_sets = segmenter.segment(texts) - for i, chunks in enumerate(chunks_sets): - assert len(chunks) == NUM_CHUNKS - for j, chunk in enumerate(chunks): - assert chunk['text'] == f'text-{i}-chunk-{j}' - - -@pytest.mark.parametrize( - 'segmenter', 
['!DummySegmenterTextSingle', '!DummySegmenterTextBatching'] -) -def test_batching_text_one_argument_flow(segmenter, mocker): - NUM_DOCS = 15 - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for i, doc in enumerate(resp.index.docs): - assert len(doc.chunks) == NUM_CHUNKS - for j, chunk in enumerate(doc.chunks): - assert chunk.text == f'text-{i}-chunk-{j}' - - docs = DocumentArray([Document(text=f'text-{i}') for i in range(NUM_DOCS)]) - mock = mocker.Mock() - - with Flow().add(name='segmenter', uses=segmenter) as f: - f.index(inputs=docs, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) diff --git a/tests/integration/executors_generic/__init__.py b/tests/integration/executors_generic/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/executors_generic/test_generic_executor.py b/tests/integration/executors_generic/test_generic_executor.py deleted file mode 100644 index 0c9dc638529b4..0000000000000 --- a/tests/integration/executors_generic/test_generic_executor.py +++ /dev/null @@ -1,45 +0,0 @@ -import os - -import numpy as np -import pytest - -from jina import Flow, Document -from jina.executors import GenericExecutor -from jina.executors.decorators import requests - - -class MyExecutor(GenericExecutor): - @requests - def foo(self, id): - return [{'embedding': np.array([1, 2, 3])}] * len(id) - - @requests(on='SearchRequest') - def bar(self, id): - return [{'embedding': np.array([4, 5, 6])}] * len(id) - - @requests(on='UpdateRequest') - def bar2(self, id): - return [{'embedding': np.array([10, 11, 12])}] * len(id) - - -@pytest.mark.skipif( - 'GITHUB_WORKFLOW' in os.environ, - reason='locally it works fine, somehow this stuck on Github', -) -@pytest.mark.parametrize( - 'api, result', - [['index', [1, 2, 3]], ['search', [4, 5, 6]], ['update', [10, 11, 12]]], -) -def test_generic_executor_with_routing_default(api, result, mocker): - resp_mock = 
mocker.Mock() - - def validate(req): - resp_mock() - np.testing.assert_equal(req.docs[0].embedding, np.array(result)) - - f = Flow().add(uses=MyExecutor) - - with f: - getattr(f, api)(Document(), on_done=validate) - - resp_mock.assert_called() diff --git a/tests/integration/gateway_non_blocking/flow.yml b/tests/integration/gateway_non_blocking/flow.yml deleted file mode 100644 index 316b326d6eb4f..0000000000000 --- a/tests/integration/gateway_non_blocking/flow.yml +++ /dev/null @@ -1,9 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: pod - uses: yaml/fast_slow.yml - read_only: true - parallel: $JINA_NON_BLOCKING_PARALLEL diff --git a/tests/integration/gateway_non_blocking/test_gateway_non_blocking.py b/tests/integration/gateway_non_blocking/test_gateway_non_blocking.py index 3ea2514886922..3aaf395996781 100644 --- a/tests/integration/gateway_non_blocking/test_gateway_non_blocking.py +++ b/tests/integration/gateway_non_blocking/test_gateway_non_blocking.py @@ -1,29 +1,35 @@ -import os - import pytest -from jina.flow import Flow +from jina import Flow, Executor, requests, DocumentArray, Document + +import time + + +class FastSlowExecutor(Executor): + def __init__(self, *args, **kwargs): + super().__init__(*args, **kwargs) -cur_dir = os.path.dirname(os.path.abspath(__file__)) + @requests(on=['/search']) + def encode(self, docs: DocumentArray, *args, **kwargs): + assert len(docs) == 1 + if docs[0].text == 'slow': + time.sleep(2) @pytest.mark.parametrize( 'parallel, expected_response', [(1, ['slow', 'fast']), (2, ['fast', 'slow'])] ) -@pytest.mark.parametrize('restful', [False]) -def test_non_blocking_gateway(parallel, expected_response, restful, monkeypatch): - monkeypatch.setenv("JINA_NON_BLOCKING_PARALLEL", str(parallel)) - monkeypatch.setenv("RESTFUL", str(restful)) +def test_non_blocking_gateway(parallel, expected_response): response = [] def fill_responses(resp): - assert len(resp.docs) == 1 - response.append(resp.docs[0].text) + assert 
len(resp.data.docs) == 1 + response.append(resp.data.docs[0].text) - data = ['slow', 'fast'] + data = DocumentArray([Document(text='slow'), Document(text='fast')]) - with Flow().load_config(os.path.join(cur_dir, 'flow.yml')) as f: - f.search(inputs=data, on_done=fill_responses, request_size=1) + f = Flow().add(uses=FastSlowExecutor, parallel=parallel) + with f: + f.post(on='/search', inputs=data, request_size=1, on_done=fill_responses) - del os.environ['JINA_NON_BLOCKING_PARALLEL'] assert response == expected_response diff --git a/tests/integration/gateway_non_blocking/yaml/fast_slow.yml b/tests/integration/gateway_non_blocking/yaml/fast_slow.yml deleted file mode 100644 index 805e16a886fef..0000000000000 --- a/tests/integration/gateway_non_blocking/yaml/fast_slow.yml +++ /dev/null @@ -1,7 +0,0 @@ -!BaseCrafter -metas: - py_modules: fast_slow_driver.py -requests: - on: - [IndexRequest, SearchRequest]: - - !FastSlowDriver {} diff --git a/tests/integration/gateway_non_blocking/yaml/fast_slow_driver.py b/tests/integration/gateway_non_blocking/yaml/fast_slow_driver.py deleted file mode 100644 index 617a8cc2a2205..0000000000000 --- a/tests/integration/gateway_non_blocking/yaml/fast_slow_driver.py +++ /dev/null @@ -1,23 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -from typing import Optional - -from jina.drivers import FlatRecursiveMixin, BaseExecutableDriver -import time - -if False: - from jina import DocumentArray - - -class FastSlowDriver(FlatRecursiveMixin, BaseExecutableDriver): - def __init__( - self, executor: Optional[str] = None, method: str = 'craft', *args, **kwargs - ): - super().__init__(executor, method, *args, **kwargs) - - def _apply_all(self, docs: 'DocumentArray', *args, **kwargs): - if docs: - assert len(docs) == 1 - if docs[0].text == 'slow': - time.sleep(2) diff --git a/tests/integration/high_order_matches/__init__.py b/tests/integration/high_order_matches/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/high_order_matches/test-adjacency-integrated.yml b/tests/integration/high_order_matches/test-adjacency-integrated.yml deleted file mode 100644 index 831b110558528..0000000000000 --- a/tests/integration/high_order_matches/test-adjacency-integrated.yml +++ /dev/null @@ -1,30 +0,0 @@ -!NumpyIndexer -with: - index_filename: tmp2 - metric: euclidean -metas: - name: test2 - workspace: $JINA_TEST_HIGH_ORDER_MATCHES -requests: - on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - - !VectorSearchDriver - with: - top_k: 5 - priority: 4 - fill_embedding: true - - !VectorSearchDriver - with: - top_k: 5 - priority: 4 - fill_embedding: true - traversal_paths: ['m'] - IndexRequest: - - !ExcludeQL - with: - fields: - - buffer - - chunks - - !VectorIndexDriver {} diff --git a/tests/integration/high_order_matches/test-adjacency.yml b/tests/integration/high_order_matches/test-adjacency.yml deleted file mode 100644 index af989abaac804..0000000000000 --- a/tests/integration/high_order_matches/test-adjacency.yml +++ /dev/null @@ -1,39 +0,0 @@ -!NumpyIndexer -with: - index_filename: tmp2 - metric: euclidean -metas: - name: test2 - workspace: $JINA_TEST_HIGH_ORDER_MATCHES -requests: - on: - ControlRequest: - - !ControlReqDriver 
{} - SearchRequest: - - !VectorSearchDriver {} # first match on the doc - - !SliceQL - with: - start: 0 - end: 5 - traversal_paths: ['m'] - - !VectorFillDriver - with: - traversal_paths: ['r', 'm'] - - !VectorSearchDriver # then go for the first-degree matches - with: - traversal_paths: ['m'] - - !SliceQL - with: - start: 0 - end: 5 - traversal_paths: ['m', 'mm'] - - !VectorFillDriver - with: - traversal_paths: ['r', 'm', 'mm', 'mmm'] - IndexRequest: - - !ExcludeQL - with: - fields: - - buffer - - chunks - - !VectorIndexDriver {} diff --git a/tests/integration/high_order_matches/test_adjacency.py b/tests/integration/high_order_matches/test_adjacency.py deleted file mode 100644 index 28edb496f9951..0000000000000 --- a/tests/integration/high_order_matches/test_adjacency.py +++ /dev/null @@ -1,56 +0,0 @@ -import os -import shutil - -import pytest - -from jina.flow import Flow -from tests import random_docs, validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -def validate(req): - assert len(req.docs) == 1 - assert len(req.docs[0].matches) == 5 - assert len(req.docs[0].matches) == 5 - assert len(req.docs[0].matches[0].matches) == 5 - assert len(req.docs[0].matches[-1].matches) == 5 - assert len(req.docs[0].matches[0].matches[0].matches) == 0 - - -@pytest.fixture -def config(tmpdir): - os.environ['JINA_TEST_HIGH_ORDER_MATCHES'] = str(tmpdir) - yield - del os.environ['JINA_TEST_HIGH_ORDER_MATCHES'] - - -def test_high_order_matches(mocker, config): - response_mock = mocker.Mock() - - f = Flow().add(uses=os.path.join(cur_dir, 'test-adjacency.yml')) - - with f: - f.index(random_docs(100, chunks_per_doc=0, embed_dim=2)) - - with f: - f.search(random_docs(1, chunks_per_doc=0, embed_dim=2), on_done=response_mock) - validate_callback(response_mock, validate) - - -@pytest.mark.parametrize('restful', [False, True]) -def test_high_order_matches_integrated(mocker, restful, config): - - response_mock = mocker.Mock() - # this is equivalent to the last test 
but with simplified YAML spec. - f = Flow(restful=restful).add( - uses=os.path.join(cur_dir, 'test-adjacency-integrated.yml') - ) - - with f: - f.index(random_docs(100, chunks_per_doc=0, embed_dim=2)) - - with f: - f.search(random_docs(1, chunks_per_doc=0, embed_dim=2), on_done=response_mock) - - validate_callback(response_mock, validate) diff --git a/tests/integration/high_order_matches/test_document.py b/tests/integration/high_order_matches/test_document.py new file mode 100644 index 0000000000000..ae3d03600c642 --- /dev/null +++ b/tests/integration/high_order_matches/test_document.py @@ -0,0 +1,46 @@ +import os +import shutil + +import pytest + +from jina.flow import Flow +from jina import Document, Executor, requests +from tests import validate_callback + + +def validate(req): + assert len(req.docs) == 1 + assert len(req.docs[0].matches) == 5 + assert len(req.docs[0].matches[0].matches) == 5 + assert len(req.docs[0].matches[-1].matches) == 5 + assert len(req.docs[0].matches[0].matches[0].matches) == 0 + + +class MyExecutor(Executor): + def __init__(self, **kwargs): + super().__init__(**kwargs) + self.doc = self.requests + + @requests(on='index') + def index(self, docs, **kwargs): + self.doc = docs[0] + for match in self.doc.matches: + for i in range(5): + match.matches.append(Document()) + + return None + + +def test_high_order_matches(mocker): + response_mock = mocker.Mock() + + f = Flow().add(uses=MyExecutor) + + with f: + f.post( + on='index', + inputs=Document(matches=[Document() for i in range(5)]), + on_done=response_mock, + ) + + validate_callback(response_mock, validate) diff --git a/tests/integration/hub_usage/dummyhub/__init__.py b/tests/integration/hub_usage/dummyhub/__init__.py index ab12991e955b9..2b3b7c9952081 100644 --- a/tests/integration/hub_usage/dummyhub/__init__.py +++ b/tests/integration/hub_usage/dummyhub/__init__.py @@ -1,9 +1,9 @@ -from jina.executors.crafters import BaseCrafter +from jina import Executor from .helper import foo -class 
DummyHubExecutor(BaseCrafter): +class DummyHubExecutor(Executor): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) diff --git a/tests/integration/hub_usage/dummyhub/config.yml b/tests/integration/hub_usage/dummyhub/config.yml index 13ca05eba9ed7..18e9cfc69b201 100644 --- a/tests/integration/hub_usage/dummyhub/config.yml +++ b/tests/integration/hub_usage/dummyhub/config.yml @@ -1,7 +1,6 @@ !DummyHubExecutor -with: - {} metas: py_modules: # - you can put more dependencies here - - __init__.py \ No newline at end of file + - __init__.py + - helper.py \ No newline at end of file diff --git a/tests/integration/hub_usage/dummyhub_abs/__init__.py b/tests/integration/hub_usage/dummyhub_abs/__init__.py index 413cb041ad897..33200b93066b6 100644 --- a/tests/integration/hub_usage/dummyhub_abs/__init__.py +++ b/tests/integration/hub_usage/dummyhub_abs/__init__.py @@ -1,9 +1,8 @@ from helper import foo +from jina import Executor -from jina.executors.crafters import BaseCrafter - -class DummyHubExecutorAbs(BaseCrafter): +class DummyHubExecutorAbs(Executor): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) diff --git a/tests/integration/hub_usage/dummyhub_pretrained/__init__.py b/tests/integration/hub_usage/dummyhub_pretrained/__init__.py index 270ca7be0f78c..7f40faddfdffd 100644 --- a/tests/integration/hub_usage/dummyhub_pretrained/__init__.py +++ b/tests/integration/hub_usage/dummyhub_pretrained/__init__.py @@ -1,14 +1,12 @@ from typing import Dict +from jina import Executor from jina.excepts import ModelCheckpointNotExist -from jina.executors.crafters import BaseCrafter -class DummyPretrainedExecutor(BaseCrafter): +class DummyPretrainedExecutor(Executor): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) - - def post_init(self): raise ModelCheckpointNotExist def craft(self, *args, **kwargs) -> Dict: diff --git a/tests/integration/hub_usage/dummyhub_slow/__init__.py 
b/tests/integration/hub_usage/dummyhub_slow/__init__.py index dc5811dbddc85..06bc6ef8d9565 100644 --- a/tests/integration/hub_usage/dummyhub_slow/__init__.py +++ b/tests/integration/hub_usage/dummyhub_slow/__init__.py @@ -1,10 +1,10 @@ import time -from jina.executors.crafters import BaseCrafter +from jina import Executor from .helper import foo -class DummyHubExecutorSlow(BaseCrafter): +class DummyHubExecutorSlow(Executor): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) time.sleep(15) diff --git a/tests/integration/hub_usage/hub-mwu/mwu_encoder.py b/tests/integration/hub_usage/hub-mwu/mwu_encoder.py index ff74b8d7897c3..7146535ce7ab3 100644 --- a/tests/integration/hub_usage/hub-mwu/mwu_encoder.py +++ b/tests/integration/hub_usage/hub-mwu/mwu_encoder.py @@ -1,16 +1,13 @@ from typing import Any -import numpy as np +from jina import Executor, requests -from jina.executors.encoders import BaseEncoder - -class MWUEncoder(BaseEncoder): +class MWUEncoder(Executor): def __init__(self, greetings: str, *args, **kwargs): super().__init__(*args, **kwargs) self._greetings = greetings - self.logger.success(f'look at me! 
{greetings}') - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - self.logger.info(f'{self._greetings} {content}') - return np.random.random([content.shape[0], 3]) + @requests + def encode(self, **kwargs) -> Any: + pass diff --git a/tests/integration/hub_usage/test_hub_usage.py b/tests/integration/hub_usage/test_hub_usage.py index 7e93212705fc7..9067f74c4fb7b 100644 --- a/tests/integration/hub_usage/test_hub_usage.py +++ b/tests/integration/hub_usage/test_hub_usage.py @@ -279,7 +279,7 @@ def test_hub_build_level_pass(monkeypatch, test_workspace, docker_image): _, failed_levels = HubIO(args)._test_build( docker_image, BuildTestLevel.EXECUTOR, - os.path.join(cur_dir, 'yaml/test-joint.yml'), + os.path.join(cur_dir, 'yaml/test-executor.yml'), 60000, True, JinaLogger('unittest'), @@ -297,7 +297,7 @@ def test_hub_build_level_fail(monkeypatch, test_workspace, docker_image): _, failed_levels = HubIO(args)._test_build( docker_image, BuildTestLevel.FLOW, - os.path.join(cur_dir, 'yaml/test-joint.yml'), + os.path.join(cur_dir, 'yaml/test-executor.yml'), 60000, True, JinaLogger('unittest'), diff --git a/tests/integration/hub_usage/yaml/test-executor.yml b/tests/integration/hub_usage/yaml/test-executor.yml new file mode 100644 index 0000000000000..4e1c0ac74251b --- /dev/null +++ b/tests/integration/hub_usage/yaml/test-executor.yml @@ -0,0 +1,4 @@ +jtype: TestExecutor +metas: + py_modules: + - test_executor.py diff --git a/tests/integration/hub_usage/yaml/test-joint.yml b/tests/integration/hub_usage/yaml/test-joint.yml deleted file mode 100644 index bb993daa9353f..0000000000000 --- a/tests/integration/hub_usage/yaml/test-joint.yml +++ /dev/null @@ -1,16 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - metric: euclidean - index_filename: vec.gz - metas: - name: vecidx # a customized name - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx -metas: - name: chunk_compound_indexer - workspace: $JINA_TEST_JOINT diff --git 
a/tests/integration/hub_usage/yaml/test_executor.py b/tests/integration/hub_usage/yaml/test_executor.py new file mode 100644 index 0000000000000..7d486464892f1 --- /dev/null +++ b/tests/integration/hub_usage/yaml/test_executor.py @@ -0,0 +1,5 @@ +from jina import Executor + + +class TestExecutor(Executor): + pass diff --git a/tests/integration/incremental_indexing/__init__.py b/tests/integration/incremental_indexing/__init__.py deleted file mode 100644 index 4cd4acf9ba20e..0000000000000 --- a/tests/integration/incremental_indexing/__init__.py +++ /dev/null @@ -1,29 +0,0 @@ -import os - -import numpy as np -import pytest - -from jina import Document - - -@pytest.fixture(scope='function') -def random_workspace(tmp_path): - os.environ['JINA_TEST_INCREMENTAL_INDEX_WORKSPACE'] = str(tmp_path) - yield tmp_path - del os.environ['JINA_TEST_INCREMENTAL_INDEX_WORKSPACE'] - - -def get_duplicate_docs(num_docs=10, same_content=False): - result = [] - for idx in range(num_docs): - with Document() as doc: - content = int(idx / 2) - if same_content: - doc.embedding = np.array([0]) - doc.text = f'I am doc' - else: - doc.embedding = np.array([content]) - doc.text = f'I am doc{content}' - result.append(doc) - num_uniques = len(set(d.id for d in result)) - return result, num_uniques diff --git a/tests/integration/incremental_indexing/_unique_doc.yml b/tests/integration/incremental_indexing/_unique_doc.yml deleted file mode 100644 index c6de4a557696f..0000000000000 --- a/tests/integration/incremental_indexing/_unique_doc.yml +++ /dev/null @@ -1,18 +0,0 @@ -!DocCache -with: - index_path: cache.tmp -metas: - name: cache-doc - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE/doc_cache -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, ControlRequest]: - - !RouteDriver {} - IndexRequest: - - !TaggingCacheDriver - with: - tags: - is_indexed: true - - !FilterQL - with: - lookups: {tags__is_indexed__neq: true} \ No newline at end of file diff --git 
a/tests/integration/incremental_indexing/_unique_vec.yml b/tests/integration/incremental_indexing/_unique_vec.yml deleted file mode 100644 index 059932c0717d6..0000000000000 --- a/tests/integration/incremental_indexing/_unique_vec.yml +++ /dev/null @@ -1,18 +0,0 @@ -!DocCache -with: - index_path: cache.tmp -metas: - name: cache-vec - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE/vec_cache -requests: - on: - [SearchRequest, TrainRequest, IndexRequest, ControlRequest]: - - !RouteDriver {} - IndexRequest: - - !TaggingCacheDriver - with: - tags: - is_indexed: true - - !FilterQL - with: - lookups: {tags__is_indexed__neq: true} \ No newline at end of file diff --git a/tests/integration/incremental_indexing/docindexer.yml b/tests/integration/incremental_indexing/docindexer.yml deleted file mode 100644 index 00040aac2f7fc..0000000000000 --- a/tests/integration/incremental_indexing/docindexer.yml +++ /dev/null @@ -1,6 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gz -metas: - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE - name: doc_idx diff --git a/tests/integration/incremental_indexing/test_incremental_indexing.py b/tests/integration/incremental_indexing/test_incremental_indexing.py deleted file mode 100644 index fa8784b5f51fe..0000000000000 --- a/tests/integration/incremental_indexing/test_incremental_indexing.py +++ /dev/null @@ -1,255 +0,0 @@ -import os - -import pytest - -from jina.clients import Client -from jina.executors import BaseExecutor -from jina.executors.indexers.keyvalue import BinaryPbIndexer -from jina.executors.indexers.vector import NumpyIndexer -from jina.flow import Flow -from tests.integration.incremental_indexing import random_workspace, get_duplicate_docs - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - -# don't remove this line, otherwise auto-code-format will remove `random_workspace` -print(random_workspace) - - -@pytest.mark.parametrize('restful', [False, True]) -def 
test_incremental_indexing_sequential_indexers(random_workspace, restful): - total_docs = 20 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - f = ( - Flow(restful=restful) - .add(uses=os.path.join(cur_dir, 'uniq_vectorindexer.yml')) - .add(uses=os.path.join(cur_dir, 'uniq_docindexer.yml')) - ) - - Client.check_input(duplicate_docs[:10]) - Client.check_input(duplicate_docs) - - with f: - f.index(duplicate_docs[:10]) - - with f: - f.index(duplicate_docs) - - print(f' random_workspace {random_workspace}') - - with BaseExecutor.load( - random_workspace / 'inc_vecindexer' / 'vec_idx-0' / 'vec_idx.bin' - ) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - assert vector_indexer._size == num_uniq_docs - - with BaseExecutor.load( - random_workspace / 'inc_docindexer' / 'doc_idx-0' / 'doc_idx.bin' - ) as doc_indexer: - assert isinstance(doc_indexer, BinaryPbIndexer) - assert doc_indexer._size == num_uniq_docs - - -@pytest.mark.parametrize('restful', [False, True]) -def test_incremental_indexing_sequential_indexers_content_hash_same_content( - random_workspace, restful -): - total_docs = 20 - duplicate_docs, _ = get_duplicate_docs(num_docs=total_docs, same_content=True) - # because they all have the same content - num_uniq_docs = 1 - - f = ( - Flow(restful=restful) - .add(uses=os.path.join(cur_dir, 'uniq_vectorindexer_content_hash.yml')) - .add(uses=os.path.join(cur_dir, 'uniq_docindexer_content_hash.yml')) - ) - - Client.check_input(duplicate_docs[:10]) - Client.check_input(duplicate_docs) - - with f: - f.index(duplicate_docs[:10]) - - with f: - f.index(duplicate_docs) - - with BaseExecutor.load( - random_workspace / 'inc_vecindexer' / 'vec_idx-0' / 'vec_idx.bin' - ) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - assert vector_indexer._size == num_uniq_docs - - with BaseExecutor.load( - random_workspace / 'inc_docindexer' / 'doc_idx-0' / 'doc_idx.bin' - ) as doc_indexer: - assert isinstance(doc_indexer, 
BinaryPbIndexer) - assert doc_indexer._size == num_uniq_docs - - -@pytest.mark.parametrize('restful', [False, True]) -def test_incremental_indexing_sequential_indexers_content_hash( - random_workspace, restful -): - total_docs = 20 - duplicate_docs, _ = get_duplicate_docs(num_docs=total_docs, same_content=False) - # because the content is % 2 - num_uniq_docs = 10 - - f = ( - Flow(restful=restful) - .add(uses=os.path.join(cur_dir, 'uniq_vectorindexer_content_hash.yml')) - .add(uses=os.path.join(cur_dir, 'uniq_docindexer_content_hash.yml')) - ) - - Client.check_input(duplicate_docs[:10]) - Client.check_input(duplicate_docs) - - with f: - f.index(duplicate_docs[:10]) - - with f: - f.index(duplicate_docs) - - with BaseExecutor.load( - random_workspace / 'inc_vecindexer' / 'vec_idx-0' / 'vec_idx.bin' - ) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - assert vector_indexer._size == num_uniq_docs - - with BaseExecutor.load( - random_workspace / 'inc_docindexer' / 'doc_idx-0' / 'doc_idx.bin' - ) as doc_indexer: - assert isinstance(doc_indexer, BinaryPbIndexer) - assert doc_indexer._size == num_uniq_docs - - -# TODO(Deepankar): Gets stuck when `restful: True` - issues with `needs='gateway'` -@pytest.mark.parametrize('restful', [False]) -def test_incremental_indexing_parallel_indexers(random_workspace, restful): - total_docs = 1000 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - f = ( - Flow(restful=restful) - .add(uses=os.path.join(cur_dir, 'uniq_vectorindexer.yml'), name='inc_vec') - .add( - uses=os.path.join(cur_dir, 'uniq_docindexer.yml'), - name='inc_doc', - needs=['gateway'], - ) - .add(needs=['inc_vec', 'inc_doc']) - ) - with f: - f.index(duplicate_docs[:500]) - - with f: - f.index(duplicate_docs) - - with BaseExecutor.load( - (random_workspace / 'inc_vecindexer' / 'vec_idx-0' / 'vec_idx.bin') - ) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - assert vector_indexer._size == num_uniq_docs - - 
with BaseExecutor.load( - (random_workspace / 'inc_docindexer' / 'doc_idx-0' / 'doc_idx.bin') - ) as doc_indexer: - assert isinstance(doc_indexer, BinaryPbIndexer) - assert doc_indexer._size == num_uniq_docs - - -@pytest.mark.parametrize('restful', [False, True]) -def test_incremental_indexing_sequential_indexers_with_shards( - random_workspace, restful -): - total_docs = 1000 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - num_shards = 4 - # can't use plain _unique in uses_before because workspace will conflict with other - f = ( - Flow(restful=restful) - .add( - uses=os.path.join(cur_dir, 'vectorindexer.yml'), - uses_before=os.path.join(cur_dir, '_unique_vec.yml'), - shards=num_shards, - ) - .add( - uses=os.path.join(cur_dir, 'docindexer.yml'), - uses_before=os.path.join(cur_dir, '_unique_doc.yml'), - shards=num_shards, - ) - ) - - with f: - f.index(duplicate_docs[:500]) - - with f: - f.index(duplicate_docs) - - vect_idx_size = 0 - for shard_idx in range(num_shards): - save_abspath = random_workspace / f'vec_idx-{shard_idx}' / 'vec_idx.bin' - with BaseExecutor.load(save_abspath) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - vect_idx_size += vector_indexer._size - assert vect_idx_size == num_uniq_docs - - doc_idx_size = 0 - for shard_idx in range(num_shards): - save_abspath = random_workspace / f'doc_idx-{shard_idx}' / 'doc_idx.bin' - with BaseExecutor.load(save_abspath) as doc_indexer: - assert isinstance(doc_indexer, BinaryPbIndexer) - doc_idx_size += doc_indexer._size - assert doc_idx_size == num_uniq_docs - - -# TODO(Deepankar): Gets stuck when `restful: True` - issues with `needs='gateway'` -@pytest.mark.parametrize('restful', [False]) -def test_incremental_indexing_parallel_indexers_with_shards(random_workspace, restful): - total_docs = 1000 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - num_shards = 4 - - # can't use plain _unique in uses_before because workspace will 
conflict with other - f = ( - Flow(restful=restful) - .add( - uses=os.path.join(cur_dir, 'vectorindexer.yml'), - uses_before=os.path.join(cur_dir, '_unique_vec.yml'), - shards=num_shards, - name='inc_vec', - ) - .add( - uses=os.path.join(cur_dir, 'docindexer.yml'), - uses_before=os.path.join(cur_dir, '_unique_doc.yml'), - shards=num_shards, - name='inc_doc', - needs=['gateway'], - ) - .add(needs=['inc_vec', 'inc_doc']) - ) - - with f: - f.index(duplicate_docs[:500]) - - with f: - f.index(duplicate_docs) - - vect_idx_size = 0 - for shard_idx in range(num_shards): - save_abspath = random_workspace / f'vec_idx-{shard_idx}' / 'vec_idx.bin' - with BaseExecutor.load(save_abspath) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - vect_idx_size += vector_indexer._size - assert vect_idx_size == num_uniq_docs - - doc_idx_size = 0 - for shard_idx in range(num_shards): - save_abspath = random_workspace / f'doc_idx-{shard_idx}' / 'doc_idx.bin' - with BaseExecutor.load(save_abspath) as doc_indexer: - assert isinstance(doc_indexer, BinaryPbIndexer) - doc_idx_size += doc_indexer._size - assert doc_idx_size == num_uniq_docs diff --git a/tests/integration/incremental_indexing/test_unique_indexing.py b/tests/integration/incremental_indexing/test_unique_indexing.py deleted file mode 100644 index e11ec0f89ae1f..0000000000000 --- a/tests/integration/incremental_indexing/test_unique_indexing.py +++ /dev/null @@ -1,94 +0,0 @@ -import os - -import pytest - -from jina.executors import BaseExecutor -from jina.executors.indexers.keyvalue import BinaryPbIndexer -from jina.executors.indexers.vector import NumpyIndexer -from jina.flow import Flow -from tests.integration.incremental_indexing import get_duplicate_docs, random_workspace - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - -# don't remove this line, otherwise auto-code-format will remove `random_workspace` -print(random_workspace) - - -@pytest.mark.parametrize('restful', [False, True]) -def 
test_unique_indexing_vecindexers(random_workspace, restful): - total_docs = 10 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - f = Flow(restful=restful).add( - uses=os.path.join(cur_dir, 'uniq_vectorindexer.yml'), name='vec_idx' - ) - - with f: - f.index(duplicate_docs) - - with BaseExecutor.load( - (random_workspace / 'inc_vecindexer' / 'vec_idx-0' / 'vec_idx.bin') - ) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - assert vector_indexer.size == num_uniq_docs - - -@pytest.mark.parametrize('restful', [False, True]) -def test_unique_indexing_docindexers(random_workspace, restful): - total_docs = 10 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - f = Flow(restful=restful).add( - uses=os.path.join(cur_dir, 'uniq_docindexer.yml'), shards=1 - ) - - with f: - f.index(duplicate_docs) - - with BaseExecutor.load( - (random_workspace / 'inc_docindexer' / 'doc_idx-0' / 'doc_idx.bin') - ) as doc_indexer: - assert isinstance(doc_indexer, BinaryPbIndexer) - assert doc_indexer.size == num_uniq_docs - - -@pytest.mark.parametrize('restful', [False, True]) -def test_unique_indexing_vecindexers_before(random_workspace, restful): - total_docs = 10 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - # can't use plain _unique because workspace will conflict with other tests - f = Flow(restful=restful).add( - uses=os.path.join(cur_dir, 'vectorindexer.yml'), - uses_before=os.path.join(cur_dir, '_unique_vec.yml'), - ) - - with f: - f.index(duplicate_docs) - - with BaseExecutor.load( - (random_workspace / 'vec_idx-0' / 'vec_idx.bin') - ) as vector_indexer: - assert isinstance(vector_indexer, NumpyIndexer) - assert vector_indexer.size == num_uniq_docs - - -@pytest.mark.parametrize('restful', [False, True]) -def test_unique_indexing_docindexers_before(random_workspace, restful): - total_docs = 10 - duplicate_docs, num_uniq_docs = get_duplicate_docs(num_docs=total_docs) - - # can't use 
plain _unique because workspace will conflict with other tests - f = Flow(restful=restful).add( - uses=os.path.join(cur_dir, 'docindexer.yml'), - uses_before=os.path.join(cur_dir, '_unique_doc.yml'), - ) - - with f: - f.index(duplicate_docs) - - with BaseExecutor.load( - (random_workspace / 'doc_idx-0' / 'doc_idx.bin') - ) as doc_indexer: - assert isinstance(doc_indexer, BinaryPbIndexer) - assert doc_indexer.size == num_uniq_docs diff --git a/tests/integration/incremental_indexing/uniq_docindexer.yml b/tests/integration/incremental_indexing/uniq_docindexer.yml deleted file mode 100644 index 5fbf32b38e80f..0000000000000 --- a/tests/integration/incremental_indexing/uniq_docindexer.yml +++ /dev/null @@ -1,13 +0,0 @@ -!UniquePbIndexer -components: - - !DocCache - metas: - name: cache - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: doc_idx -metas: - name: inc_docindexer - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE diff --git a/tests/integration/incremental_indexing/uniq_docindexer_content_hash.yml b/tests/integration/incremental_indexing/uniq_docindexer_content_hash.yml deleted file mode 100644 index cb4fc180641c9..0000000000000 --- a/tests/integration/incremental_indexing/uniq_docindexer_content_hash.yml +++ /dev/null @@ -1,15 +0,0 @@ -!UniquePbIndexer -components: - - !DocCache - with: - fields: [content_hash] - metas: - name: cache - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: doc_idx -metas: - name: inc_docindexer - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE diff --git a/tests/integration/incremental_indexing/uniq_vectorindexer.yml b/tests/integration/incremental_indexing/uniq_vectorindexer.yml deleted file mode 100644 index 0a6afc5adf520..0000000000000 --- a/tests/integration/incremental_indexing/uniq_vectorindexer.yml +++ /dev/null @@ -1,14 +0,0 @@ -!UniqueVectorIndexer -components: - - !DocCache - metas: - name: cache - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - 
name: vec_idx -metas: - name: inc_vecindexer - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE diff --git a/tests/integration/incremental_indexing/uniq_vectorindexer_content_hash.yml b/tests/integration/incremental_indexing/uniq_vectorindexer_content_hash.yml deleted file mode 100644 index 978090c82316c..0000000000000 --- a/tests/integration/incremental_indexing/uniq_vectorindexer_content_hash.yml +++ /dev/null @@ -1,16 +0,0 @@ -!UniqueVectorIndexer -components: - - !DocCache - with: - fields: [content_hash] - metas: - name: cache - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vec_idx -metas: - name: inc_vecindexer - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE diff --git a/tests/integration/incremental_indexing/vectorindexer.yml b/tests/integration/incremental_indexing/vectorindexer.yml deleted file mode 100644 index fa03573875886..0000000000000 --- a/tests/integration/incremental_indexing/vectorindexer.yml +++ /dev/null @@ -1,7 +0,0 @@ -!NumpyIndexer -with: - index_filename: vec.gz - metric: euclidean -metas: - workspace: $JINA_TEST_INCREMENTAL_INDEX_WORKSPACE - name: vec_idx diff --git a/jina/flow/mixin/__init__.py b/tests/integration/inspect_pods_flow/__init__.py similarity index 100% rename from jina/flow/mixin/__init__.py rename to tests/integration/inspect_pods_flow/__init__.py diff --git a/tests/integration/inspect_pods_flow/test_inspect_pods_flow.py b/tests/integration/inspect_pods_flow/test_inspect_pods_flow.py new file mode 100644 index 0000000000000..147af2d601c23 --- /dev/null +++ b/tests/integration/inspect_pods_flow/test_inspect_pods_flow.py @@ -0,0 +1,179 @@ +import os + +import pytest + +from jina.types.score import NamedScore +from jina import Flow, Executor, DocumentArray, requests +from tests import random_docs, validate_callback + + +class DummyEvaluator1(Executor): + tag = 1 + + @requests(on=['/index']) + def craft(self, docs, *args, **kwargs): + tmp_dir = os.environ.get('TEST_EVAL_FLOW_TMPDIR') 
+ with open(f'{tmp_dir}/{self.tag}.txt', 'a') as fp: + fp.write(f'{docs[0].id}') + return None + + +class DummyEvaluator2(DummyEvaluator1): + tag = 2 + + +class DummyEvaluator3(DummyEvaluator1): + tag = 3 + + +docs = DocumentArray([x for x in random_docs(1)]) +params = ['HANG', 'REMOVE', 'COLLECT'] + + +def validate(ids, expect): + assert len(ids) > 0 + for j in ids: + tmp_dir = os.environ.get('TEST_EVAL_FLOW_TMPDIR') + fname = f'{tmp_dir}/{j}.txt' + assert os.path.exists(fname) == expect + if expect: + with open(fname) as fp: + assert fp.read() != '' + + +@pytest.fixture +def temp_folder(tmpdir): + os.environ['TEST_EVAL_FLOW_TMPDIR'] = str(tmpdir) + yield + del os.environ['TEST_EVAL_FLOW_TMPDIR'] + + +@pytest.mark.parametrize('inspect', params) +@pytest.mark.parametrize('restful', [False, True]) +def test_flow1(inspect, restful, temp_folder): + f = Flow(restful=restful, inspect=inspect).add( + uses=DummyEvaluator1, + env={'TEST_EVAL_FLOW_TMPDIR': os.environ.get('TEST_EVAL_FLOW_TMPDIR')}, + ) + + with f: + f.post(on='/index', inputs=docs) + + +@pytest.mark.parametrize('inspect', params) +@pytest.mark.parametrize('restful', [False, True]) +def test_flow2(inspect, restful, temp_folder): + f = ( + Flow(restful=restful, inspect=inspect) + .add() + .inspect( + uses=DummyEvaluator1, + env={'TEST_EVAL_FLOW_TMPDIR': os.environ.get('TEST_EVAL_FLOW_TMPDIR')}, + ) + ) + + with f: + f.index(docs) + + validate([1], expect=f.args.inspect.is_keep) + + +@pytest.mark.parametrize('inspect', params) +@pytest.mark.parametrize('restful', [False]) +def test_flow3(inspect, restful, temp_folder): + env = {'TEST_EVAL_FLOW_TMPDIR': os.environ.get('TEST_EVAL_FLOW_TMPDIR')} + + f = ( + Flow(restful=restful, inspect=inspect) + .add(name='p1') + .inspect(uses='DummyEvaluator1', env=env) + .add(name='p2', needs='gateway') + .needs(['p1', 'p2']) + .inspect(uses='DummyEvaluator2', env=env) + ) + + with f: + f.index(docs) + + validate([1, 2], expect=f.args.inspect.is_keep) + + 
+@pytest.mark.parametrize('inspect', params) +@pytest.mark.parametrize('restful', [False, True]) +def test_flow4(inspect, restful, temp_folder): + env = {'TEST_EVAL_FLOW_TMPDIR': os.environ.get('TEST_EVAL_FLOW_TMPDIR')} + + f = ( + Flow(restful=restful, inspect=inspect) + .add() + .inspect(uses='DummyEvaluator1', env=env) + .add() + .inspect(uses='DummyEvaluator2', env=env) + .add() + .inspect(uses='DummyEvaluator3', env=env) + .plot(build=True) + ) + + with f: + f.index(docs) + + validate([1, 2, 3], expect=f.args.inspect.is_keep) + + +class AddEvaluationExecutor(Executor): + @requests + def transform(self, docs, *args, **kwargs): + import time + + time.sleep(0.5) + for doc in docs: + eval = doc.evaluations.add() + eval.value = 10.0 + + +@pytest.mark.repeat(5) +@pytest.mark.parametrize('restful', [False, True]) +def test_flow_returned_collect(restful, mocker): + # TODO(Joan): This test passes because we pass the `SlowExecutor` but I do not know how to make the `COLLECT` pod + # use an specific executor. 
+ + def validate_func(resp): + for doc in resp.data.docs: + assert len(doc.evaluations) == 1 + + f = ( + Flow(restful=restful, inspect='COLLECT') + .add() + .inspect( + uses=AddEvaluationExecutor, + ) + ) + + mock = mocker.Mock() + with f: + f.index(inputs=docs, on_done=mock) + + validate_callback(mock, validate_func) + + +@pytest.mark.repeat(5) +@pytest.mark.parametrize('inspect', ['HANG', 'REMOVE']) +@pytest.mark.parametrize('restful', [False, True]) +def test_flow_not_returned(inspect, restful, mocker): + def validate_func(resp): + for doc in resp.data.docs: + assert len(doc.evaluations) == 0 + + f = ( + Flow(restful=restful, inspect=inspect) + .add() + .inspect( + uses=AddEvaluationExecutor, + ) + ) + + mock = mocker.Mock() + with f: + f.index(inputs=docs, on_done=mock) + + validate_callback(mock, validate_func) diff --git a/tests/integration/issues/github_1072/__init__.py b/tests/integration/issues/github_1072/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_1072/test_queryset.py b/tests/integration/issues/github_1072/test_queryset.py deleted file mode 100644 index cdcdaff015bb0..0000000000000 --- a/tests/integration/issues/github_1072/test_queryset.py +++ /dev/null @@ -1,64 +0,0 @@ -import os - -import numpy as np -import pytest - -from jina import QueryLang -from jina.flow import Flow -from jina.proto import jina_pb2 -from jina.types.ndarray.generic import NdArray - -from tests import validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture(scope='function') -def random_workspace(tmpdir): - os.environ['JINA_TEST_QUERYSET_WORKSPACE'] = str(tmpdir) - yield tmpdir - del os.environ['JINA_TEST_QUERYSET_WORKSPACE'] - - -def test_queryset_with_struct(random_workspace, mocker): - total_docs = 4 - docs = [] - for doc_id in range(total_docs): - doc = jina_pb2.DocumentProto() - doc.text = f'I am doc{doc_id}' - NdArray(doc.embedding).value = np.array([doc_id]) - 
doc.tags['label'] = f'label{doc_id % 2 + 1}' - docs.append(doc) - - f = Flow().add( - uses='- !FilterQL | {lookups: {tags__label__in: [label1, label2]}, traversal_paths: [r]}' - ) - - def validate_all_docs(resp): - assert len(resp.docs) == total_docs - - def validate_label2_docs(resp): - assert len(resp.docs) == total_docs / 2 - - mock1 = mocker.Mock() - mock2 = mocker.Mock() - with f: - # keep all the docs - f.index(docs, on_done=mock1) - # keep only the docs with label2 - qs = QueryLang( - { - 'name': 'FilterQL', - 'priority': 1, - 'parameters': { - 'lookups': {'tags__label': 'label2'}, - 'traversal_paths': ['r'], - }, - } - ) - f.index(docs, queryset=qs, on_done=mock2) - - mock1.assert_called_once() - validate_callback(mock1, validate_all_docs) - mock2.assert_called_once() - validate_callback(mock2, validate_label2_docs) diff --git a/tests/integration/issues/github_1229/__init__.py b/tests/integration/issues/github_1229/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_1229/test_sharding_empty_index.py b/tests/integration/issues/github_1229/test_sharding_empty_index.py deleted file mode 100644 index 39f716403a4a1..0000000000000 --- a/tests/integration/issues/github_1229/test_sharding_empty_index.py +++ /dev/null @@ -1,69 +0,0 @@ -import os -import numpy as np -import pytest - -from jina import Flow, Document - -from tests import validate_callback - -callback_was_called = False - - -def get_index_flow(): - num_shards = 2 - f = Flow().add(uses='vectorindexer.yml', shards=num_shards) - return f - - -def get_search_flow(): - num_shards = 2 - f = Flow(read_only=True).add( - uses='vectorindexer.yml', - shards=num_shards, - uses_after='_merge_matches', - polling='all', - timeout_ready='-1', - ) - return f - - -# required because we don't know the order of the pod returning -# and, when the test failed, we still some time didn't see the error -@pytest.mark.parametrize('execution_number', range(10)) -def 
test_sharding_empty_index(tmpdir, execution_number, mocker): - os.environ['JINA_TEST_1229_WORKSPACE'] = os.path.abspath(tmpdir) - - f = get_index_flow() - - num_docs = 1 - data = [] - for i in range(num_docs): - with Document() as doc: - doc.content = f'data {i}' - doc.embedding = np.array([i]) - data.append(doc) - - with f: - f.index(data) - - f = get_search_flow() - - num_query = 10 - query = [] - for i in range(num_query): - with Document() as doc: - doc.content = f'query {i}' - doc.embedding = np.array([i]) - query.append(doc) - - def callback(result): - assert len(result.docs) == num_query - for d in result.docs: - assert len(list(d.matches)) == num_docs - - mock = mocker.Mock() - with f: - f.search(query, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, callback) diff --git a/tests/integration/issues/github_1229/vectorindexer.yml b/tests/integration/issues/github_1229/vectorindexer.yml deleted file mode 100644 index f32a480d493f4..0000000000000 --- a/tests/integration/issues/github_1229/vectorindexer.yml +++ /dev/null @@ -1,7 +0,0 @@ -!NumpyIndexer -with: - index_filename: vec.gz - metric: euclidean -metas: - workspace: $JINA_TEST_1229_WORKSPACE - name: vec_idx diff --git a/tests/integration/issues/github_1468/__init__.py b/tests/integration/issues/github_1468/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_1468/flow.yml b/tests/integration/issues/github_1468/flow.yml deleted file mode 100644 index bc969d27c190e..0000000000000 --- a/tests/integration/issues/github_1468/flow.yml +++ /dev/null @@ -1,18 +0,0 @@ -!Flow -version: '1' -with: - prefetch: 10 - read_only: true # better add this in the query time -pods: - - name: filter_chunk_1 - uses: pods/filter_chunk_1.yml - read_only: true - - name: filter_chunk_2 - uses: pods/filter_chunk_1.yml - read_only: true - needs: [gateway] - - name: merge_route_process - uses: _forward - parallel: 2 - uses_before: _merge_chunks - 
needs: [filter_chunk_1, filter_chunk_2] diff --git a/tests/integration/issues/github_1468/pods/filter_chunk_1.yml b/tests/integration/issues/github_1468/pods/filter_chunk_1.yml deleted file mode 100644 index c7b678766dc13..0000000000000 --- a/tests/integration/issues/github_1468/pods/filter_chunk_1.yml +++ /dev/null @@ -1,8 +0,0 @@ -!BaseExecutor -requests: - on: - SearchRequest: - - !FilterQL - with: - traversal_paths: ['c'] - lookups: {'modality': '1'} diff --git a/tests/integration/issues/github_1468/pods/filter_chunk_2.yml b/tests/integration/issues/github_1468/pods/filter_chunk_2.yml deleted file mode 100644 index 8d1c74ed396e3..0000000000000 --- a/tests/integration/issues/github_1468/pods/filter_chunk_2.yml +++ /dev/null @@ -1,8 +0,0 @@ -!BaseExecutor -requests: - on: - SearchRequest: - - !FilterQL - with: - traversal_paths: ['c'] - lookups: {'modality': '2'} diff --git a/tests/integration/issues/github_1468/test_reduce_route.py b/tests/integration/issues/github_1468/test_reduce_route.py deleted file mode 100644 index 4a33a7fe590b0..0000000000000 --- a/tests/integration/issues/github_1468/test_reduce_route.py +++ /dev/null @@ -1,19 +0,0 @@ -import pytest - -from jina.flow import Flow -from jina.types.document.multimodal import MultimodalDocument - - -def multimodal_generator(): - for i in range(0, 5): - document = MultimodalDocument( - modality_content_map={'1': f'aaa {i}', '2': f'bbb {i}'} - ) - yield document - - -@pytest.mark.skip(' Failing until issue 1468 is fixed ') -@pytest.mark.timeout(10) -def test_reduce_route(): - with Flow.load_config('flow.yml') as f: - f.search(inputs=multimodal_generator()) diff --git a/tests/integration/issues/github_1545/__init__.py b/tests/integration/issues/github_1545/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_1545/test_executor_metas.py b/tests/integration/issues/github_1545/test_executor_metas.py deleted file mode 100644 index 
67a2cd8d58b9d..0000000000000 --- a/tests/integration/issues/github_1545/test_executor_metas.py +++ /dev/null @@ -1,11 +0,0 @@ -from jina.executors.indexers.vector import NumpyIndexer - - -def test_numpy_indexer_defaults(test_metas): - indexer = NumpyIndexer() - # this values come from v0.8.12 before introducing JAML, add here for regression - assert indexer.pea_id == 0 - assert indexer.workspace is None - assert indexer._workspace == './' - assert indexer.py_modules is None - assert indexer.name.startswith('jina.executors.indexers') diff --git a/tests/integration/issues/github_1546/__init__.py b/tests/integration/issues/github_1546/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_1546/bad1/crafter.yml b/tests/integration/issues/github_1546/bad1/crafter.yml deleted file mode 100644 index 5234fedca6485..0000000000000 --- a/tests/integration/issues/github_1546/bad1/crafter.yml +++ /dev/null @@ -1,3 +0,0 @@ -!CustomCrafter1 -metas: - py_modules: custom_crafter.py # I also tried to add helper.py here \ No newline at end of file diff --git a/tests/integration/issues/github_1546/bad1/custom_crafter.py b/tests/integration/issues/github_1546/bad1/custom_crafter.py deleted file mode 100644 index 1e19299f81e6f..0000000000000 --- a/tests/integration/issues/github_1546/bad1/custom_crafter.py +++ /dev/null @@ -1,8 +0,0 @@ -from jina.executors.crafters import BaseCrafter -from .helper import helper_function - - -class CustomCrafter1(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - print(helper_function) diff --git a/tests/integration/issues/github_1546/bad1/helper.py b/tests/integration/issues/github_1546/bad1/helper.py deleted file mode 100644 index ad02e90ba120f..0000000000000 --- a/tests/integration/issues/github_1546/bad1/helper.py +++ /dev/null @@ -1,2 +0,0 @@ -def helper_function(): - pass diff --git a/tests/integration/issues/github_1546/bad2/__init__.py 
b/tests/integration/issues/github_1546/bad2/__init__.py deleted file mode 100644 index dd3f2ad798190..0000000000000 --- a/tests/integration/issues/github_1546/bad2/__init__.py +++ /dev/null @@ -1,9 +0,0 @@ -from .helper import helper_function - -from jina.executors.crafters import BaseCrafter - - -class CustomCrafter2(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - print(helper_function) diff --git a/tests/integration/issues/github_1546/bad2/crafter.yml b/tests/integration/issues/github_1546/bad2/crafter.yml deleted file mode 100644 index cb21ec021d2c0..0000000000000 --- a/tests/integration/issues/github_1546/bad2/crafter.yml +++ /dev/null @@ -1,5 +0,0 @@ -!CustomCrafter2 -metas: - py_modules: - - __init__.py - - helper.py diff --git a/tests/integration/issues/github_1546/bad2/helper.py b/tests/integration/issues/github_1546/bad2/helper.py deleted file mode 100644 index ad02e90ba120f..0000000000000 --- a/tests/integration/issues/github_1546/bad2/helper.py +++ /dev/null @@ -1,2 +0,0 @@ -def helper_function(): - pass diff --git a/tests/integration/issues/github_1546/good1/crafter.yml b/tests/integration/issues/github_1546/good1/crafter.yml deleted file mode 100644 index 3ba6929c89dbb..0000000000000 --- a/tests/integration/issues/github_1546/good1/crafter.yml +++ /dev/null @@ -1,5 +0,0 @@ -!CustomCrafter3 -metas: - py_modules: - - helper.py - - custom_crafter.py \ No newline at end of file diff --git a/tests/integration/issues/github_1546/good1/custom_crafter.py b/tests/integration/issues/github_1546/good1/custom_crafter.py deleted file mode 100644 index d35c9591ab2d2..0000000000000 --- a/tests/integration/issues/github_1546/good1/custom_crafter.py +++ /dev/null @@ -1,9 +0,0 @@ -from jinahub.helper import helper_function - -from jina.executors.crafters import BaseCrafter - - -class CustomCrafter3(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - print(helper_function) diff --git 
a/tests/integration/issues/github_1546/good1/helper.py b/tests/integration/issues/github_1546/good1/helper.py deleted file mode 100644 index ad02e90ba120f..0000000000000 --- a/tests/integration/issues/github_1546/good1/helper.py +++ /dev/null @@ -1,2 +0,0 @@ -def helper_function(): - pass diff --git a/tests/integration/issues/github_1546/good2/__init__.py b/tests/integration/issues/github_1546/good2/__init__.py deleted file mode 100644 index e63e2133963d2..0000000000000 --- a/tests/integration/issues/github_1546/good2/__init__.py +++ /dev/null @@ -1,9 +0,0 @@ -from .helper import helper_function - -from jina.executors.crafters import BaseCrafter - - -class CustomCrafter4(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - print(helper_function) diff --git a/tests/integration/issues/github_1546/good2/crafter.yml b/tests/integration/issues/github_1546/good2/crafter.yml deleted file mode 100644 index 1e5a04d780958..0000000000000 --- a/tests/integration/issues/github_1546/good2/crafter.yml +++ /dev/null @@ -1,5 +0,0 @@ -!CustomCrafter4 -metas: - py_modules: - - helper.py - - __init__.py \ No newline at end of file diff --git a/tests/integration/issues/github_1546/good2/helper.py b/tests/integration/issues/github_1546/good2/helper.py deleted file mode 100644 index ad02e90ba120f..0000000000000 --- a/tests/integration/issues/github_1546/good2/helper.py +++ /dev/null @@ -1,2 +0,0 @@ -def helper_function(): - pass diff --git a/tests/integration/issues/github_1546/good3/__init__.py b/tests/integration/issues/github_1546/good3/__init__.py deleted file mode 100644 index f3fabcee947c6..0000000000000 --- a/tests/integration/issues/github_1546/good3/__init__.py +++ /dev/null @@ -1,102 +0,0 @@ -from typing import Tuple, Dict, List, Union - -import numpy as np - -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import single - -from .helper import _crop_image, _move_channel_axis, _load_image - - -class 
FiveImageCropper2(BaseSegmenter): - """ - :class:`FiveImageCropper` crops the image into four corners and the central crop. - """ - - def __init__( - self, - target_size: Union[Tuple[int, int], int] = 224, - channel_axis: int = -1, - *args, - **kwargs, - ): - """ - - :param target_size: desired output size. If size is a sequence like (h, w), the output size will be matched to - this. If size is an int, the output will have the same height and width as the `target_size`. - """ - super().__init__(*args, **kwargs) - self.target_size = target_size - self.channel_axis = channel_axis - - @single - def segment(self, blob: 'np.ndarray', *args, **kwargs) -> List[Dict]: - """ - Crop the input image array. - - :param blob: the ndarray of the image with the color channel at the last axis - :return: a list of five chunk dicts with the cropped images - """ - raw_img = _load_image(blob, self.channel_axis) - image_width, image_height = raw_img.size - if isinstance(self.target_size, int): - target_h = target_w = self.target_size - elif isinstance(self.target_size, Tuple) and len(self.target_size) == 2: - target_h, target_w = self.target_size - else: - raise ValueError( - f'target_size should be an integer or a tuple of two integers: {self.target_size}' - ) - _tl, top_tl, left_tl = _crop_image(raw_img, self.target_size, 0, 0) - tl = _move_channel_axis(np.asarray(_tl), -1, self.channel_axis) - _tr, top_tr, left_tr = _crop_image( - raw_img, self.target_size, top=0, left=image_width - target_w - ) - tr = _move_channel_axis(np.asarray(_tr), -1, self.channel_axis) - _bl, top_bl, left_bl = _crop_image( - raw_img, self.target_size, top=image_height - target_h, left=0 - ) - bl = _move_channel_axis(np.asarray(_bl), -1, self.channel_axis) - _br, top_br, left_br = _crop_image( - raw_img, - self.target_size, - top=image_height - target_h, - left=image_width - target_w, - ) - br = _move_channel_axis(np.asarray(_br), -1, self.channel_axis) - _center, top_center, left_center = _crop_image( - 
raw_img, self.target_size, how='center' - ) - center = _move_channel_axis(np.asarray(_center), -1, self.channel_axis) - return [ - dict( - offset=0, - weight=1.0, - blob=tl.astype('float32'), - location=(top_tl, left_tl), - ), - dict( - offset=0, - weight=1.0, - blob=tr.astype('float32'), - location=(top_tr, left_tr), - ), - dict( - offset=0, - weight=1.0, - blob=bl.astype('float32'), - location=(top_bl, left_bl), - ), - dict( - offset=0, - weight=1.0, - blob=br.astype('float32'), - location=(top_br, left_br), - ), - dict( - offset=0, - weight=1.0, - blob=center.astype('float32'), - location=(top_center, left_center), - ), - ] diff --git a/tests/integration/issues/github_1546/good3/config.yml b/tests/integration/issues/github_1546/good3/config.yml deleted file mode 100644 index 8b3458a8b5ab6..0000000000000 --- a/tests/integration/issues/github_1546/good3/config.yml +++ /dev/null @@ -1,8 +0,0 @@ -!FiveImageCropper2 -with: - {} -metas: - py_modules: - # - you can put more dependencies here - - helper.py - - __init__.py \ No newline at end of file diff --git a/tests/integration/issues/github_1546/good3/helper.py b/tests/integration/issues/github_1546/good3/helper.py deleted file mode 100644 index ac8a336198657..0000000000000 --- a/tests/integration/issues/github_1546/good3/helper.py +++ /dev/null @@ -1,118 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -from typing import Tuple, Union, Optional - -import numpy as np - - -def _move_channel_axis( - img: 'np.ndarray', channel_axis_to_move: int, target_channel_axis: int = -1 -) -> 'np.ndarray': - """ - Ensure the color channel axis is the default axis. - """ - if channel_axis_to_move == target_channel_axis: - return img - return np.moveaxis(img, channel_axis_to_move, target_channel_axis) - - -def _load_image(blob: 'np.ndarray', channel_axis: int): - """ - Load an image array and return a `PIL.Image` object. 
- """ - - from PIL import Image - - img = _move_channel_axis(blob, channel_axis) - return Image.fromarray(img.astype('uint8')) - - -def _crop_image( - img, - target_size: Union[Tuple[int, int], int], - top: Optional[int] = None, - left: Optional[int] = None, - how: str = 'precise', -): - """ - Crop the input :py:mod:`PIL` image. - - :param img: :py:mod:`PIL.Image`, the image to be resized - :param target_size: desired output size. If size is a sequence like - (h, w), the output size will be matched to this. If size is an int, - the output will have the same height and width as the `target_size`. - :param top: the vertical coordinate of the top left corner of the crop box. - :param left: the horizontal coordinate of the top left corner of the crop box. - :param how: the way of cropping. Valid values include `center`, `random`, and, `precise`. Default is `precise`. - - `center`: crop the center part of the image - - `random`: crop a random part of the image - - `precise`: crop the part of the image specified by the crop box with the given ``top`` and ``left``. - .. warning:: When `precise` is used, ``top`` and ``left`` must be fed valid value. 
- - """ - import PIL.Image as Image - - assert isinstance(img, Image.Image), 'img must be a PIL.Image' - img_w, img_h = img.size - if isinstance(target_size, int): - target_h = target_w = target_size - elif isinstance(target_size, Tuple) and len(target_size) == 2: - target_h, target_w = target_size - else: - raise ValueError( - f'target_size should be an integer or a tuple of two integers: {target_size}' - ) - w_beg = left - h_beg = top - if how == 'center': - w_beg = int((img_w - target_w) / 2) - h_beg = int((img_h - target_h) / 2) - elif how == 'random': - w_beg = np.random.randint(0, img_w - target_w + 1) - h_beg = np.random.randint(0, img_h - target_h + 1) - elif how == 'precise': - assert w_beg is not None and h_beg is not None - assert ( - 0 <= w_beg <= (img_w - target_w) - ), f'left must be within [0, {img_w - target_w}]: {w_beg}' - assert ( - 0 <= h_beg <= (img_h - target_h) - ), f'top must be within [0, {img_h - target_h}]: {h_beg}' - else: - raise ValueError(f'unknown input how: {how}') - if not isinstance(w_beg, int): - raise ValueError(f'left must be int number between 0 and {img_w}: {left}') - if not isinstance(h_beg, int): - raise ValueError(f'top must be int number between 0 and {img_h}: {top}') - w_end = w_beg + target_w - h_end = h_beg + target_h - img = img.crop((w_beg, h_beg, w_end, h_end)) - return img, h_beg, w_beg - - -def _resize_short(img, target_size: Union[Tuple[int, int], int], how: str = 'LANCZOS'): - """ - Resize the input :py:mod:`PIL` image. - :param img: :py:mod:`PIL.Image`, the image to be resized - :param target_size: desired output size. If size is a sequence like (h, w), the output size will be matched to - this. If size is an int, the smaller edge of the image will be matched to this number maintain the aspect - ratio. - :param how: the interpolation method. Valid values include `NEAREST`, `BILINEAR`, `BICUBIC`, and `LANCZOS`. - Default is `LANCZOS`. Please refer to `PIL.Image` for detaisl. 
- """ - import PIL.Image as Image - - assert isinstance(img, Image.Image), 'img must be a PIL.Image' - if isinstance(target_size, int): - percent = float(target_size) / min(img.size[0], img.size[1]) - target_w = int(round(img.size[0] * percent)) - target_h = int(round(img.size[1] * percent)) - elif isinstance(target_size, Tuple) and len(target_size) == 2: - target_h, target_w = target_size - else: - raise ValueError( - f'target_size should be an integer or a tuple of two integers: {target_size}' - ) - img = img.resize((target_w, target_h), getattr(Image, how)) - return img diff --git a/tests/integration/issues/github_1546/test_pymodules_import.py b/tests/integration/issues/github_1546/test_pymodules_import.py deleted file mode 100644 index 4a8d07705f5d5..0000000000000 --- a/tests/integration/issues/github_1546/test_pymodules_import.py +++ /dev/null @@ -1,85 +0,0 @@ -import pytest - -from jina.executors import BaseExecutor - - -def test_import_with_abs_namespace_should_pass(): - """ - This is a valid structure: - - "my_cust_module" is not a python module (lack of __init__.py under the root) - - to import ``foo.py``, you must to use ``from jinahub.foo import bar`` - - ``jinahub`` is a common namespace for all plugin-modules, not changeable. - - ``helper.py`` needs to be put BEFORE `my_cust.py` in YAML ``py_modules`` - - File structure: - - my_cust_module - |- my_cust.py - |- helper.py - |- config.yml - |- py_modules - |- helper.py - |- my_cust.py - """ - - b = BaseExecutor.load_config('good1/crafter.yml') - assert b.__class__.__name__ == 'CustomCrafter3' - - -def test_import_with_module_structure_should_pass(): - """ - This is a valid structure and it is RECOMMENDED: - - "my_cust_module" is a python module - - all core logic of your customized executor goes to ``__init__.py`` - - to import ``foo.py``, you can use relative import, e.g. 
``from .foo import bar`` - - ``helper.py`` needs to be put BEFORE `__init__.py` in YAML ``py_modules`` - - This is also the structure given by ``jina hub new`` CLI. - - File structure: - - my_cust_module - |- __init__.py - |- helper.py - |- config.yml - |- py_modules - |- helper.py - |- __init__.py - """ - b = BaseExecutor.load_config('good2/crafter.yml') - assert b.__class__.__name__ == 'CustomCrafter4' - - -def test_import_with_hub_structure_should_pass(): - """ - copy paste from hub module structure should work - this structure is copy-paste from: https://github.com/jina-ai/jina-hub/tree/master/crafters/image/FiveImageCropper - - File structure: - my_cust_modul - | - |- __init__.py - |- helper.py - |- config.yml - |- py_modules - |- helper.py - |- __init.py - :return: - """ - b = BaseExecutor.load_config('good3/config.yml') - assert b.__class__.__name__ == 'FiveImageCropper2' - - -def test_import_casual_structure_should_fail(): - # this structure is a copy-paste from - # https://github.com/jina-ai/jina/issues/1546#issuecomment-751481422 - with pytest.raises(ImportError): - BaseExecutor.load_config('bad1/crafter.yml') - - -def test_import_good_structure_but_wrong_import_order_should_fail(): - # this structure is a copy paste of "test_import_with_module_structure_should_pass" but with wrong import order - # if A depends on B, i.e. 
in A.py you write "import B" - # then B.py should be put in front of A.py in py_modules, otherwise it will fail - with pytest.raises(ImportError): - BaseExecutor.load_config('bad2/crafter.yml') diff --git a/tests/integration/issues/github_1684/__init__.py b/tests/integration/issues/github_1684/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_1684/flow.yml b/tests/integration/issues/github_1684/flow.yml deleted file mode 100644 index 6a501ed1de5d4..0000000000000 --- a/tests/integration/issues/github_1684/flow.yml +++ /dev/null @@ -1,5 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: index.yml diff --git a/tests/integration/issues/github_1684/index.yml b/tests/integration/issues/github_1684/index.yml deleted file mode 100644 index ccf2d5e11eb84..0000000000000 --- a/tests/integration/issues/github_1684/index.yml +++ /dev/null @@ -1,6 +0,0 @@ -!NumpyIndexer -with: - index_filename: vec.gz -metas: - workspace: $JINA_TEST_1684_WORKSPACE - diff --git a/tests/integration/issues/github_1684/test_empty_shard.py b/tests/integration/issues/github_1684/test_empty_shard.py deleted file mode 100644 index b1fbf0c44b383..0000000000000 --- a/tests/integration/issues/github_1684/test_empty_shard.py +++ /dev/null @@ -1,39 +0,0 @@ -import os - -import pytest -import numpy as np - -from jina.flow import Flow -from jina import Document - -from tests import validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture -def workdir(tmpdir): - os.environ['JINA_TEST_1684_WORKSPACE'] = str(tmpdir) - yield - del os.environ['JINA_TEST_1684_WORKSPACE'] - - -def test_empty_shard(mocker, workdir): - doc = Document() - doc.text = 'text' - doc.embedding = np.array([1, 1, 1]) - - def validate_response(resp): - assert len(resp.docs) == 1 - assert len(resp.docs[0].matches) == 0 - - mock = mocker.Mock() - error_mock = mocker.Mock() - - with Flow.load_config(os.path.join(cur_dir, 'flow.yml')) 
as f: - f.search([doc], on_done=mock, on_error=error_mock) - - mock.assert_called_once() - validate_callback(mock, validate_response) - - error_mock.assert_not_called() diff --git a/tests/integration/issues/github_1861/__init__.py b/tests/integration/issues/github_1861/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_1861/test_pea_closing.py b/tests/integration/issues/github_1861/test_pea_closing.py deleted file mode 100644 index 03e7c1db43389..0000000000000 --- a/tests/integration/issues/github_1861/test_pea_closing.py +++ /dev/null @@ -1,21 +0,0 @@ -import pytest - -from jina.executors.encoders import BaseEncoder -from jina.flow import Flow - - -class ExceptionExecutor(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.name = 'exception-executor' - - def post_init(self): - raise Exception - - -@pytest.mark.timeout(10) -def test_pea_closing(): - with pytest.raises(Exception): - with Flow().add(uses='!ExceptionExecutor', parallel=3) as f: - pod1 = f._pod_nodes['pod0'] - assert len(pod1.peas) == 0 diff --git a/tests/integration/issues/github_2103/test_search_attributes.py b/tests/integration/issues/github_2103/test_search_attributes.py index 7733eaa466224..8f1f6c81b111d 100644 --- a/tests/integration/issues/github_2103/test_search_attributes.py +++ b/tests/integration/issues/github_2103/test_search_attributes.py @@ -9,7 +9,7 @@ from jina.proto import jina_pb2 from jina import Document from jina import helper -from jina.executors.encoders import BaseEncoder +from jina import Executor, requests from tests import validate_callback cur_dir = os.path.dirname(os.path.abspath(__file__)) @@ -27,11 +27,11 @@ def docs(): def test_no_matches_grpc(mocker, docs): def validate_response(resp): - for doc in resp.search.docs: + for doc in resp.data.docs: assert len(doc.matches) == 0 mock_on_done = mocker.Mock() - with Flow().add(uses='_pass') as f: + with Flow().add() as 
f: f.search(inputs=docs, on_done=mock_on_done) validate_callback(mock_on_done, validate_response) @@ -41,25 +41,16 @@ def query_dict(): return {'top_k': 3, 'mode': 'search', 'data': [f'text:query']} -class MockExecutor(BaseEncoder): - def get_docs(self, req_type): - if req_type == 'ControlRequest': - return [] - driver = self._drivers[req_type][0] - return driver.docs - - def __call__(self, req_type, *args, **kwargs): - if req_type == 'ControlRequest': - for d in self._drivers[req_type]: - d() - else: - for doc in self.get_docs(req_type): - doc.tags['tag'] = 'test' +class MockExecutor(Executor): + @requests + def foo(self, docs, *args, **kwargs): + for doc in docs: + doc.tags['tag'] = 'test' def test_no_matches_rest(query_dict): port = helper.random_port() - with Flow(rest_api=True, port_expose=port).add(uses='!MockExecutor'): + with Flow(rest_api=True, port_expose=port).add(uses=MockExecutor): # temporarily adding sleep time.sleep(0.5) query = json.dumps(query_dict).encode('utf-8') @@ -69,8 +60,8 @@ def test_no_matches_rest(query_dict): headers={'content-type': 'application/json'}, ) resp = request.urlopen(req).read().decode('utf8') - doc = json.loads(resp)['search']['docs'][0] + doc = json.loads(resp)['data']['docs'][0] present_keys = sorted(doc.keys()) for field in _document_fields: - if field not in IGNORED_FIELDS + ['buffer', 'content', 'blob']: + if field not in IGNORED_FIELDS + ['buffer', 'content', 'blob', 'uri']: assert field in present_keys diff --git a/tests/integration/issues/github_2254/__init__.py b/tests/integration/issues/github_2254/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_2254/flow.yml b/tests/integration/issues/github_2254/flow.yml deleted file mode 100644 index 9fea45cdbd932..0000000000000 --- a/tests/integration/issues/github_2254/flow.yml +++ /dev/null @@ -1,18 +0,0 @@ -jtype: Flow -version: 1.0 -pods: - - name: segmenter_0 - uses: - jtype: SimpleSegmenter - with: - 
sep: ',' - - name: segmenter_1 - uses: - jtype: SimpleSegmenter - with: - sep: ' ' - needs: gateway - - name: join - uses: _merge_chunks - needs: [segmenter_0, segmenter_1] - diff --git a/tests/integration/issues/github_2254/test_siblings.py b/tests/integration/issues/github_2254/test_siblings.py deleted file mode 100644 index 34402429dd48a..0000000000000 --- a/tests/integration/issues/github_2254/test_siblings.py +++ /dev/null @@ -1,37 +0,0 @@ -from jina import Flow, Document -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import single - -from tests import validate_callback - - -class SimpleSegmenter(BaseSegmenter): - def __init__(self, sep=','): - super(SimpleSegmenter, self).__init__() - self.sep = sep - - @single - def segment(self, text, *args, **kwargs): - return [{'text': t, 'mime_type': 'text/plain'} for t in text.split(self.sep)] - - -def test_segment_siblings(mocker): - test_text = '1,2,3,4,5 a b c' - - def validate(resp): - assert resp.index.docs[0].chunks[0].siblings == len( - test_text.split(',') + test_text.split(' ') - ) - - f = Flow().load_config('flow.yml') - - mock = mocker.Mock() - with f: - f.index( - [ - Document(text=test_text), - ], - on_done=mock, - ) - mock.assert_called_once() - validate_callback(mock, validate) diff --git a/tests/integration/issues/github_2295/__init__.py b/tests/integration/issues/github_2295/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_2295/flow_index.yml b/tests/integration/issues/github_2295/flow_index.yml deleted file mode 100644 index 725f983a595b6..0000000000000 --- a/tests/integration/issues/github_2295/flow_index.yml +++ /dev/null @@ -1,4 +0,0 @@ -!Flow -pods: - doc_indexer: - uses: index.yml \ No newline at end of file diff --git a/tests/integration/issues/github_2295/flow_query.yml b/tests/integration/issues/github_2295/flow_query.yml deleted file mode 100644 index b6f46bfb4925f..0000000000000 --- 
a/tests/integration/issues/github_2295/flow_query.yml +++ /dev/null @@ -1,4 +0,0 @@ -!Flow -pods: - doc_indexer: - uses: search.yml # documents cannot be found diff --git a/tests/integration/issues/github_2295/index.yml b/tests/integration/issues/github_2295/index.yml deleted file mode 100644 index b1bb9c456dc28..0000000000000 --- a/tests/integration/issues/github_2295/index.yml +++ /dev/null @@ -1,13 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gz - delete_on_dump: true -metas: - name: doc_idx_file # a customized name - workspace: $TEST_2295_WORKSPACE -requests: - on: - IndexRequest: - - !KVIndexDriver - with: - traversal_paths: ['r'] diff --git a/tests/integration/issues/github_2295/search.yml b/tests/integration/issues/github_2295/search.yml deleted file mode 100644 index 6ff651f06aa95..0000000000000 --- a/tests/integration/issues/github_2295/search.yml +++ /dev/null @@ -1,12 +0,0 @@ -!SearchDocIndexer -with: - index_filename: doc.gz -metas: - name: doc_idx_search # a customized name - workspace: $TEST_2295_WORKSPACE -requests: - on: - SearchRequest: - - !SearchDocDriver - with: - traversal_paths: ['r'] \ No newline at end of file diff --git a/tests/integration/issues/github_2295/test_issue.py b/tests/integration/issues/github_2295/test_issue.py deleted file mode 100644 index bb7b6432a0d80..0000000000000 --- a/tests/integration/issues/github_2295/test_issue.py +++ /dev/null @@ -1,63 +0,0 @@ -import os -from typing import Iterable, List - -import pytest - -from jina.drivers.search import KVSearchDriver -from jina.executors.indexers.keyvalue import BinaryPbIndexer -from jina.flow import Flow -from jina import Document, DocumentArray - -from tests import validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -class SearchDocIndexer(BinaryPbIndexer): - def query(self, jina_id: str = None, mongo_ids: List[str] = None): - return super().query([jina_id])[0] # serialized document - - def post_init(self): - super().post_init() - # key to 
have user workaround https://github.com/jina-ai/jina/issues/2295. - # Underlying problem in https://github.com/jina-ai/jina/issues/2299 - self.name = 'doc_idx_file' - - -class SearchDocDriver(KVSearchDriver): - def _apply_all( - self, doc_sequences: Iterable['DocumentArray'], *args, **kwargs - ) -> None: - for docs in doc_sequences: - for idx, doc in enumerate(docs): - serialized_doc = self.exec_fn(jina_id=doc.id) - if serialized_doc: - doc.MergeFrom(Document(serialized_doc)) # merge! - - -@pytest.fixture -def test_workspace(tmpdir): - os.environ['TEST_2295_WORKSPACE'] = str(tmpdir) - yield - del os.environ['TEST_2295_WORKSPACE'] - - -def test_issue_2295(test_workspace, mocker): - # This tests the proposed workaround to user in 2295, once https://github.com/jina-ai/jina/issues/2299 this test - # can be removed - def validate_response(resp): - assert resp.search.docs[0].id == 'id' - assert resp.search.docs[0].text == 'text' - - index_set = DocumentArray([Document(id='id', text='text')]) - query_set = DocumentArray([Document(id='id')]) - - with Flow.load_config(os.path.join(cur_dir, 'flow_index.yml')) as f: - f.index(inputs=index_set) - - mock_on_done = mocker.Mock() - - with Flow.load_config(os.path.join(cur_dir, 'flow_query.yml')) as f: - f.search(inputs=query_set, on_done=mock_on_done) - - validate_callback(mock_on_done, validate_response) diff --git a/tests/integration/issues/github_2320/__init__.py b/tests/integration/issues/github_2320/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_2320/mock_indexer.yml b/tests/integration/issues/github_2320/mock_indexer.yml deleted file mode 100644 index 29f74e190f628..0000000000000 --- a/tests/integration/issues/github_2320/mock_indexer.yml +++ /dev/null @@ -1,3 +0,0 @@ -!MockIndexer -metas: - workspace: $TMP_2230_WORKSPACE \ No newline at end of file diff --git a/tests/integration/issues/github_2320/test_write_handler_close_after_delete.py 
b/tests/integration/issues/github_2320/test_write_handler_close_after_delete.py deleted file mode 100644 index ccfeab6369bee..0000000000000 --- a/tests/integration/issues/github_2320/test_write_handler_close_after_delete.py +++ /dev/null @@ -1,45 +0,0 @@ -import os -import pytest - -from jina.flow import Flow -from jina.executors.indexers.keyvalue import BinaryPbIndexer -from jina import Document - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture() -def tmp_workspace(tmpdir): - os.environ['TMP_2230_WORKSPACE'] = str(tmpdir) - yield - del os.environ['TMP_2230_WORKSPACE'] - - -@pytest.fixture() -def docs(): - return [Document() for _ in range(100)] - - -class MockIndexer(BinaryPbIndexer): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.delete_on_dump = True - - def add(self, *args, **kwargs): - import time - - time.sleep(1) - return super().add(*args, **kwargs) - - -def test_file_handler_not_closed(mocker, docs, tmp_workspace): - mock = mocker.Mock() - error_mock = mocker.Mock() - - with Flow().add( - uses=os.path.join(cur_dir, 'mock_indexer.yml'), dump_interval=1 - ) as f: - f.index(inputs=docs, request_size=50, on_done=mock, on_error=error_mock) - - mock.assert_called() - error_mock.assert_not_called() diff --git a/tests/integration/issues/github_867/__init__.py b/tests/integration/issues/github_867/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_867/test_close_executor.py b/tests/integration/issues/github_867/test_close_executor.py deleted file mode 100644 index 9ec99a68f4ca7..0000000000000 --- a/tests/integration/issues/github_867/test_close_executor.py +++ /dev/null @@ -1,55 +0,0 @@ -import os -import time -from typing import Any - -from jina import Document -from jina.executors import BaseExecutor -from jina.executors.encoders import BaseEncoder -from jina.flow import Flow - -cur_dir = os.path.dirname(os.path.abspath(__file__)) 
-save_abs_path = os.path.join(cur_dir, 'slow-save-executor.bin') - - -class SlowSaveExecutor(BaseEncoder): - """ - Github issue: https://github.com/jina-ai/jina/issues/867 and https://github.com/jina-ai/jina/issues/873 - - Problem that user encountered is that his `indexer` was a little slow to save because `key_bytes` is large. - Then, Pea closing logic was wrong and Flow was killing its Pods before they were able to properly close its resources. - because they were running as `daemon` processes. - - This test tries to be a single proxy to that issue simulating via `sleep` an expensive pickling operation. - Before https://github.com/jina-ai/jina/pull/907 this test would fail because at loading time no pickle object would be properly closed. - This is similar to the case seen by the user where the `index` files are not properly flushed and closed. - """ - - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - self.touch() - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.name = 'slow-save-executor' - - @property - def save_abspath(self) -> str: - return save_abs_path - - def __getstate__(self): - d = super().__getstate__() - time.sleep(2) - d['test'] = 10 - return d - - -def test_close_and_load_executor(): - with Flow().add(uses=os.path.join(cur_dir, 'yaml/slowexecutor.yml')) as f: - f.index(['abc', 'def']) - - exec = BaseExecutor.load(save_abs_path) - - assert isinstance(exec, SlowSaveExecutor) - assert hasattr(exec, 'test') - assert exec.test == 10 - assert exec.save_abspath == save_abs_path - os.remove(save_abs_path) diff --git a/tests/integration/issues/github_867/yaml/slowexecutor.yml b/tests/integration/issues/github_867/yaml/slowexecutor.yml deleted file mode 100644 index 13487c903799e..0000000000000 --- a/tests/integration/issues/github_867/yaml/slowexecutor.yml +++ /dev/null @@ -1,3 +0,0 @@ -!SlowSaveExecutor -metas: - is_updated: True \ No newline at end of file diff --git 
a/tests/integration/issues/github_929/__init__.py b/tests/integration/issues/github_929/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_929/binarypb.yml b/tests/integration/issues/github_929/binarypb.yml deleted file mode 100644 index 7fd5df83d7fa0..0000000000000 --- a/tests/integration/issues/github_929/binarypb.yml +++ /dev/null @@ -1,6 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: test-docs - delete_on_dump: true -metas: - workspace: 'test-workspace' \ No newline at end of file diff --git a/tests/integration/issues/github_929/test_shelfindexer.py b/tests/integration/issues/github_929/test_shelfindexer.py deleted file mode 100644 index 487ff930445a5..0000000000000 --- a/tests/integration/issues/github_929/test_shelfindexer.py +++ /dev/null @@ -1,43 +0,0 @@ -import os -import shutil - -import pytest - -from jina.flow import Flow -from jina.logging.profile import used_memory -from jina.proto import jina_pb2 -from tests import random_docs, validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.mark.parametrize('uses', ['binarypb.yml']) -def test_shelf_in_flow(uses, mocker): - m1 = used_memory() - # shelve does not support embed > 1000?? - # _dbm.error: cannot add item to database - # HASH: Out of overflow pages. 
Increase page size - docs = random_docs(10000, embed_dim=1000) - f = Flow().add(uses=os.path.join(cur_dir, uses)) - - with f: - f.index(docs) - - m2 = used_memory() - d = jina_pb2.DocumentProto() - - def validate(req): - m4 = used_memory() - print( - f'before: {m1}, after index: {m2}, after loading: {m3} after searching {m4}' - ) - - mock = mocker.Mock() - - with f: - m3 = used_memory() - f.search([d], on_done=mock) - - shutil.rmtree('test-workspace', ignore_errors=False, onerror=None) - mock.assert_called_once() - validate_callback(mock, validate) diff --git a/tests/integration/issues/github_969/__init__.py b/tests/integration/issues/github_969/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_969/test_messages_different_types.py b/tests/integration/issues/github_969/test_messages_different_types.py deleted file mode 100644 index 7beef98791d94..0000000000000 --- a/tests/integration/issues/github_969/test_messages_different_types.py +++ /dev/null @@ -1,174 +0,0 @@ -import pytest -import numpy as np - -from jina.flow import Flow -from jina.proto import jina_pb2 -from jina.types.ndarray.generic import NdArray - -from tests import validate_callback - -random_np_array = np.random.randint(10, size=(50, 10)) -buffer = 'text_buffer'.encode() -text = 'text_content' - - -@pytest.fixture -def input_doc_with_chunks(): - doc = jina_pb2.DocumentProto() - doc.tags['id'] = 1 - chunk0 = doc.chunks.add() - chunk0.tags['id'] = 10 - chunk0.text = text - NdArray(chunk0.embedding).value = random_np_array - chunk1 = doc.chunks.add() - chunk1.tags['id'] = 20 - NdArray(chunk1.blob).value = random_np_array - chunk2 = doc.chunks.add() - chunk2.tags['id'] = 30 - chunk2.buffer = buffer - return doc - - -def test_message_docs_different_chunk_types(input_doc_with_chunks, mocker): - def validate_chunks_fn(resp): - assert len(resp.search.docs) == 1 - doc = resp.search.docs[0] - assert int(doc.tags['id']) == 1 - assert 
len(doc.chunks) == 3 - - chunk0 = doc.chunks[0] - assert int(chunk0.tags['id']) == 10 - assert chunk0.text == text - np.testing.assert_almost_equal(random_np_array, NdArray(chunk0.embedding).value) - - chunk1 = doc.chunks[1] - assert int(chunk1.tags['id']) == 20 - np.testing.assert_almost_equal(random_np_array, NdArray(chunk1.blob).value) - - chunk2 = doc.chunks[2] - assert int(chunk2.tags['id']) == 30 - assert chunk2.buffer == buffer - - mock = mocker.Mock() - - with Flow().add() as f: - f.search(inputs=[input_doc_with_chunks], on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate_chunks_fn) - - -@pytest.fixture -def input_doc_with_matches(): - doc = jina_pb2.DocumentProto() - doc.tags['id'] = 1 - match0 = doc.matches.add() - match0.tags['id'] = 10 - match0.text = text - NdArray(match0.embedding).value = random_np_array - match1 = doc.matches.add() - match1.tags['id'] = 20 - NdArray(match1.blob).value = random_np_array - match2 = doc.matches.add() - match2.tags['id'] = 30 - match2.buffer = buffer - return doc - - -def test_message_docs_different_matches_types(input_doc_with_matches, mocker): - def validate_matches_fn(resp): - assert len(resp.search.docs) == 1 - doc = resp.search.docs[0] - assert int(doc.tags['id']) == 1 - assert len(doc.matches) == 3 - - match0 = doc.matches[0] - assert int(match0.tags['id']) == 10 - assert match0.text == text - np.testing.assert_almost_equal(random_np_array, NdArray(match0.embedding).value) - - match1 = doc.matches[1] - assert int(match1.tags['id']) == 20 - np.testing.assert_almost_equal(random_np_array, NdArray(match1.blob).value) - - match2 = doc.matches[2] - assert int(match2.tags['id']) == 30 - assert match2.buffer == buffer - - mock = mocker.Mock() - with Flow().add() as f: - f.search(inputs=[input_doc_with_matches], on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_matches_fn) - - -@pytest.fixture -def input_doc_chunks_and_matches(): - doc = jina_pb2.DocumentProto() - 
doc.tags['id'] = 1 - chunk0 = doc.chunks.add() - chunk0.tags['id'] = 10 - chunk0.text = text - NdArray(chunk0.embedding).value = random_np_array - chunk1 = doc.chunks.add() - chunk1.tags['id'] = 20 - NdArray(chunk1.blob).value = random_np_array - chunk2 = doc.chunks.add() - chunk2.tags['id'] = 30 - chunk2.buffer = buffer - match0 = doc.matches.add() - match0.tags['id'] = 10 - match0.text = text - NdArray(match0.embedding).value = random_np_array - match1 = doc.matches.add() - match1.tags['id'] = 20 - NdArray(match1.blob).value = random_np_array - match2 = doc.matches.add() - match2.tags['id'] = 30 - match2.buffer = buffer - return doc - - -def test_message_docs_different_chunks_and_matches_types( - input_doc_chunks_and_matches, mocker -): - def validate_chunks_and_matches_fn(resp): - assert len(resp.search.docs) == 1 - doc = resp.search.docs[0] - assert int(doc.tags['id']) == 1 - assert len(doc.chunks) == 3 - - chunk0 = doc.chunks[0] - assert int(chunk0.tags['id']) == 10 - assert chunk0.text == text - np.testing.assert_almost_equal(random_np_array, NdArray(chunk0.embedding).value) - - chunk1 = doc.chunks[1] - assert int(chunk1.tags['id']) == 20 - np.testing.assert_almost_equal(random_np_array, NdArray(chunk1.blob).value) - - chunk2 = doc.chunks[2] - assert int(chunk2.tags['id']) == 30 - assert chunk2.buffer == buffer - - assert len(doc.matches) == 3 - - match0 = doc.matches[0] - assert int(match0.tags['id']) == 10 - assert match0.text == text - np.testing.assert_almost_equal(random_np_array, NdArray(match0.embedding).value) - - match1 = doc.matches[1] - assert int(match1.tags['id']) == 20 - np.testing.assert_almost_equal(random_np_array, NdArray(match1.blob).value) - - match2 = doc.matches[2] - assert int(match2.tags['id']) == 30 - assert match2.buffer == buffer - - mock = mocker.Mock() - - with Flow().add() as f: - f.search(inputs=[input_doc_chunks_and_matches], on_done=mock) - - validate_callback(mock, validate_chunks_and_matches_fn) diff --git 
a/tests/integration/issues/github_976/__init__.py b/tests/integration/issues/github_976/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/issues/github_976/flow.yml b/tests/integration/issues/github_976/flow.yml deleted file mode 100644 index 385cbe35db1a9..0000000000000 --- a/tests/integration/issues/github_976/flow.yml +++ /dev/null @@ -1,6 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: yaml/index.yml - needs: gateway diff --git a/tests/integration/issues/github_976/test_topk.py b/tests/integration/issues/github_976/test_topk.py deleted file mode 100644 index c37a1c82c6497..0000000000000 --- a/tests/integration/issues/github_976/test_topk.py +++ /dev/null @@ -1,81 +0,0 @@ -import os - -import numpy as np -import pytest - -from jina import QueryLang -from jina.flow import Flow -from jina.proto import jina_pb2 -from jina.types.ndarray.generic import NdArray - -from tests import validate_callback - - -@pytest.fixture -def config(tmpdir): - os.environ['JINA_TOPK_DIR'] = str(tmpdir) - os.environ['JINA_TOPK'] = '9' - yield - del os.environ['JINA_TOPK_DIR'] - del os.environ['JINA_TOPK'] - - -def random_docs(num_docs, embed_dim=10, jitter=1): - for j in range(num_docs): - d = jina_pb2.DocumentProto() - d.tags['id'] = j - d.text = b'hello' - NdArray(d.embedding).value = np.random.random( - [embed_dim + np.random.randint(0, jitter)] - ) - yield d - - -def test_topk(config, mocker): - NDOCS = 3 - TOPK = int(os.getenv('JINA_TOPK')) - - def validate(resp): - assert len(resp.search.docs) == NDOCS - for doc in resp.search.docs: - assert len(doc.matches) == TOPK - - with Flow.load_config('flow.yml') as index_flow: - index_flow.index(inputs=random_docs(100)) - - mock = mocker.Mock() - with Flow.load_config('flow.yml') as search_flow: - search_flow.search(inputs=random_docs(NDOCS), on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) - - -def test_topk_override(config, mocker): - NDOCS = 
3 - TOPK_OVERRIDE = 11 - - def validate(resp): - assert len(resp.search.docs) == NDOCS - for doc in resp.search.docs: - assert len(doc.matches) == TOPK_OVERRIDE - - # Making queryset - top_k_queryset = QueryLang( - { - 'name': 'VectorSearchDriver', - 'parameters': {'top_k': TOPK_OVERRIDE}, - 'priority': 1, - } - ) - - with Flow.load_config('flow.yml') as index_flow: - index_flow.index(inputs=random_docs(100)) - - mock = mocker.Mock() - with Flow.load_config('flow.yml') as search_flow: - search_flow.search( - inputs=random_docs(NDOCS), on_done=mock, queryset=[top_k_queryset] - ) - mock.assert_called_once() - validate_callback(mock, validate) diff --git a/tests/integration/issues/github_976/test_topk_rest_api.py b/tests/integration/issues/github_976/test_topk_rest_api.py deleted file mode 100644 index 61b061314fb22..0000000000000 --- a/tests/integration/issues/github_976/test_topk_rest_api.py +++ /dev/null @@ -1,32 +0,0 @@ -import json -import time -from urllib import request - -import pytest - -from jina.flow import Flow -from jina import helper - -TOP_K = 2 - - -@pytest.fixture -def query_dict(): - return {'top_k': TOP_K, 'mode': 'search', 'data': [f'text:query']} - - -def test_top_k_with_rest_api(query_dict): - port = helper.random_port() - with Flow(rest_api=True, port_expose=port).add(): - # temporarily adding sleep - time.sleep(0.5) - query = json.dumps(query_dict).encode('utf-8') - req = request.Request( - f'http://0.0.0.0:{port}/search', - data=query, - headers={'content-type': 'application/json'}, - ) - resp = request.urlopen(req).read().decode('utf8') - assert json.loads(resp)['queryset'][0]['name'] == 'SliceQL' - assert json.loads(resp)['queryset'][0]['parameters']['end'] == TOP_K - assert json.loads(resp)['queryset'][0]['priority'] == 1 diff --git a/tests/integration/issues/github_976/yaml/index.yml b/tests/integration/issues/github_976/yaml/index.yml deleted file mode 100644 index 1afad7a362bec..0000000000000 --- 
a/tests/integration/issues/github_976/yaml/index.yml +++ /dev/null @@ -1,37 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: cosine - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx -metas: - name: chunk_indexer - workspace: $JINA_TOPK_DIR -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVIndexDriver - with: - executor: docidx - traversal_paths: ['r'] - [SearchRequest]: - - !VectorSearchDriver - with: - executor: vecidx - top_k: $JINA_TOPK - traversal_paths: ['r'] - - !KVSearchDriver - with: - executor: docidx - traversal_paths: ['m'] diff --git a/tests/integration/level_depth/__init__.py b/tests/integration/level_depth/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/level_depth/flow-index.yml b/tests/integration/level_depth/flow-index.yml deleted file mode 100644 index 8e613862be9df..0000000000000 --- a/tests/integration/level_depth/flow-index.yml +++ /dev/null @@ -1,21 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: segmenter - uses: yaml/segmenter.yml - read_only: true - - name: encoder - uses: yaml/encoder.yml - timeout_ready: 60000 - read_only: true - - name: chunk_indexer - uses: yaml/index-chunk.yml - - name: doc_indexer - uses: yaml/index-doc.yml - needs: gateway - - name: join_all - uses: _pass - needs: [doc_indexer, chunk_indexer] - read_only: true diff --git a/tests/integration/level_depth/flow-query.yml b/tests/integration/level_depth/flow-query.yml deleted file mode 100644 index dc2a1dba608b0..0000000000000 --- a/tests/integration/level_depth/flow-query.yml +++ /dev/null @@ -1,17 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: encoder - uses: yaml/encoder.yml - read_only: true - - name: chunk_indexer - uses: yaml/index-chunk.yml - read_only: true - - name: ranker - uses: yaml/rank.yml 
- read_only: true - - name: doc_indexer - uses: yaml/index-doc.yml - read_only: true diff --git a/tests/integration/level_depth/test_search_different_depths.py b/tests/integration/level_depth/test_search_different_depths.py deleted file mode 100644 index 868267724473a..0000000000000 --- a/tests/integration/level_depth/test_search_different_depths.py +++ /dev/null @@ -1,62 +0,0 @@ -import pytest - -from jina.flow import Flow - -from tests import validate_callback - - -# TODO(Deepankar): Gets stuck when `restful: True` - issues with `needs='gateway'` -@pytest.mark.parametrize('restful', [False]) -def test_index_depth_0_search_depth_1(tmpdir, mocker, monkeypatch, restful): - monkeypatch.setenv("RESTFUL", restful) - monkeypatch.setenv("JINA_TEST_LEVEL_DEPTH_WORKSPACE", str(tmpdir)) - - index_data = [ - 'I am chunk 0 of doc 1, I am chunk 1 of doc 1, I am chunk 2 of doc 1', - 'I am chunk 0 of doc 2, I am chunk 1 of doc 2', - 'I am chunk 0 of doc 3, I am chunk 1 of doc 3, I am chunk 2 of doc 3, I am chunk 3 of doc 3', - ] - - index_flow = Flow.load_config('flow-index.yml') - with index_flow: - index_flow.index(index_data) - - def validate_granularity_1(resp): - assert len(resp.docs) == 3 - for doc in resp.docs: - assert doc.granularity == 0 - assert len(doc.matches) == 3 - assert doc.matches[0].granularity == 0 - - assert resp.docs[0].text == ' I am chunk 1 of doc 1,' - assert ( - resp.docs[0].matches[0].text - == 'I am chunk 0 of doc 1, I am chunk 1 of doc 1, I am chunk 2 of doc 1' - ) - - assert resp.docs[1].text == 'I am chunk 0 of doc 2,' - assert ( - resp.docs[1].matches[0].text - == 'I am chunk 0 of doc 2, I am chunk 1 of doc 2' - ) - - assert resp.docs[2].text == ' I am chunk 3 of doc 3' - assert ( - resp.docs[2].matches[0].text - == 'I am chunk 0 of doc 3, I am chunk 1 of doc 3, I am chunk 2 of doc 3, I am chunk 3 of doc 3' - ) - - search_data = [ - ' I am chunk 1 of doc 1,', - 'I am chunk 0 of doc 2,', - ' I am chunk 3 of doc 3', - ] - - mock = mocker.Mock() - 
with Flow.load_config('flow-query.yml') as search_flow: - search_flow.search( - inputs=search_data, on_done=mock, on_error=lambda r: print(r) - ) - - mock.assert_called_once() - validate_callback(mock, validate_granularity_1) diff --git a/tests/integration/level_depth/yaml/custom.py b/tests/integration/level_depth/yaml/custom.py deleted file mode 100644 index d839798cab4a9..0000000000000 --- a/tests/integration/level_depth/yaml/custom.py +++ /dev/null @@ -1,44 +0,0 @@ -import re -import string -from typing import Dict, List - -from jina.executors.rankers import Chunk2DocRanker -from jina.executors.decorators import single -from jina.executors.segmenters import BaseSegmenter - - -class DummySentencizer(BaseSegmenter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - punct_chars = [','] - self._slit_pat = re.compile( - '\s*([^{0}]+)(? List[Dict]: - """ - Split the text into sentences. - - :param text: the raw text - :return: a list of chunk dicts with the cropped images - """ - results = [] - ret = [ - (m.group(0), m.start(), m.end()) for m in re.finditer(self._slit_pat, text) - ] - if not ret: - ret = [(text, 0, len(text))] - for ci, (r, s, e) in enumerate(ret): - f = ''.join(filter(lambda x: x in string.printable, r)) - f = re.sub('\n+', ' ', f).strip() - f = f[:100] - if len(f) > 3: - results.append(dict(text=f)) - return results - - -class MockMinRanker(Chunk2DocRanker): - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return 1.0 / (1.0 + match_idx[self.COL_SCORE].min()) diff --git a/tests/integration/level_depth/yaml/encoder.yml b/tests/integration/level_depth/yaml/encoder.yml deleted file mode 100644 index 4795eb9db3d74..0000000000000 --- a/tests/integration/level_depth/yaml/encoder.yml +++ /dev/null @@ -1,13 +0,0 @@ -!DummyOneHotTextEncoder -metas: - name: encoder -requests: - on: - IndexRequest: - - !EncodeDriver - with: - traversal_paths: ['c'] - SearchRequest: - - !EncodeDriver - with: - 
traversal_paths: ['r'] diff --git a/tests/integration/level_depth/yaml/index-chunk.yml b/tests/integration/level_depth/yaml/index-chunk.yml deleted file mode 100644 index dfca4bd9c1d71..0000000000000 --- a/tests/integration/level_depth/yaml/index-chunk.yml +++ /dev/null @@ -1,39 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: cosine - metas: - name: vecidx - workspace: $JINA_TEST_LEVEL_DEPTH_WORKSPACE - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx - workspace: $JINA_TEST_LEVEL_DEPTH_WORKSPACE -metas: - name: chunk_indexer - workspace: $JINA_TEST_LEVEL_DEPTH_WORKSPACE -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - executor: vecidx - traversal_paths: ['c'] - - !KVIndexDriver - with: - executor: chunkidx - traversal_paths: ['c'] - SearchRequest: - - !VectorSearchDriver - with: - executor: vecidx - traversal_paths: ['r'] - - !KVSearchDriver - with: - executor: chunkidx - is_update: true - traversal_paths: ['m'] diff --git a/tests/integration/level_depth/yaml/index-doc.yml b/tests/integration/level_depth/yaml/index-doc.yml deleted file mode 100644 index 18fe7d3f13ce7..0000000000000 --- a/tests/integration/level_depth/yaml/index-doc.yml +++ /dev/null @@ -1,19 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: doc.gzip -metas: - name: docIndexer - workspace: $JINA_TEST_LEVEL_DEPTH_WORKSPACE -requests: - on: - IndexRequest: - - !KVIndexDriver - with: - executor: docIndexer - traversal_paths: ['r'] - SearchRequest: - - !KVSearchDriver - with: - executor: docIndexer - is_update: true - traversal_paths: ['m'] diff --git a/tests/integration/level_depth/yaml/rank.yml b/tests/integration/level_depth/yaml/rank.yml deleted file mode 100644 index b79b4cab06941..0000000000000 --- a/tests/integration/level_depth/yaml/rank.yml +++ /dev/null @@ -1,10 +0,0 @@ -!MockMinRanker -metas: - py_modules: - - custom.py -requests: - on: - SearchRequest: - - !AggregateMatches2DocRankDriver - 
with: - traversal_paths: ['r'] diff --git a/tests/integration/level_depth/yaml/segmenter.yml b/tests/integration/level_depth/yaml/segmenter.yml deleted file mode 100644 index d79278ef6edb7..0000000000000 --- a/tests/integration/level_depth/yaml/segmenter.yml +++ /dev/null @@ -1,13 +0,0 @@ -!DummySentencizer -with: - min_sent_len: 3 - max_sent_len: 128 - punct_chars: '.,;!?:' -metas: - name: sentencizer - py_modules: - - custom.py -requests: - on: - [SearchRequest, IndexRequest]: - - !SegmentDriver {} diff --git a/tests/integration/memmap/__init__.py b/tests/integration/memmap/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/memmap/test_mmap.py b/tests/integration/memmap/test_mmap.py deleted file mode 100644 index 29924595d092c..0000000000000 --- a/tests/integration/memmap/test_mmap.py +++ /dev/null @@ -1,57 +0,0 @@ -import json -import os - -import numpy as np -import pytest - -from jina.executors.indexers.vector import NumpyIndexer -from jina.helper import get_readable_size -from jina.logging.profile import used_memory_readable, TimeContext, used_memory - -num_data = 10000 -num_dim = 10000 -queries = np.random.random([100, num_dim]) -vec_idx = np.array( - np.random.randint(0, high=num_data, size=[num_data]), dtype=(np.str_, 16) -) -vec = np.random.random([num_data, num_dim]) -filename = 'a.gz' -summary_file = 'summary.json' - - -@pytest.fixture -def test_standard(test_metas): - test_metas['name'] = 'a' - with NumpyIndexer( - index_filename=filename, compress_level=0, metas=test_metas - ) as ni: - ni.batch_size = 512 - ni.add(vec_idx, vec) - - -@pytest.mark.run(order=4) -@pytest.mark.timeout(360) -def test_standard_query(tmpdir, test_standard): - mem1 = used_memory(1) - print(used_memory_readable()) - with NumpyIndexer.load(os.path.join(tmpdir, 'a-0/a.bin')) as ni: - ni.batch_size = 256 - print(used_memory_readable()) - print(ni._raw_ndarray.shape) - print(used_memory_readable()) - with TimeContext('query topk') 
as ti: - result = ni.query(queries, top_k=10) - mem2 = used_memory(1) - print(used_memory_readable()) - print(result[0].shape) - with open(summary_file, 'a') as fp: - json.dump( - { - 'name': 'naive', - 'memory': mem2 - mem1, - 'readable': get_readable_size(mem2 - mem1), - 'time': ti.duration, - }, - fp, - ) - fp.write('\n') diff --git a/tests/integration/merge_pass/__init__.py b/tests/integration/merge_pass/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/merge_pass/test_merge_pass_messages.py b/tests/integration/merge_pass/test_merge_pass_messages.py deleted file mode 100644 index 4a59c55782ca5..0000000000000 --- a/tests/integration/merge_pass/test_merge_pass_messages.py +++ /dev/null @@ -1,99 +0,0 @@ -import time - -from jina import Flow -from jina import Document -from jina.executors.crafters import BaseCrafter -from jina.executors.decorators import single - -from tests import validate_callback - - -class SlowCrafter(BaseCrafter): - @single - def craft(self, text, *args, **kwargs): - time.sleep(2) - return {'text': text, 'tags': {'id': 'slow'}} - - -def test_flow_pass(mocker): - def validate(resp): - assert len(resp.index.docs) == 1 - # only the second part of the message is passed by _pass - assert resp.index.docs[0].tags['id'] == 'slow' - - f = ( - Flow() - .add(name='pod0', uses='_pass') - .add(name='pod1', uses='!SlowCrafter', needs=['gateway']) - .add(name='pod2', uses='_pass', needs=['pod0', 'pod1']) - ) - doc = Document() - doc.text = 'text' - mock = mocker.Mock() - with f: - f.index([doc], on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) - - -def test_flow_merge(mocker): - def validate(resp): - assert len(resp.index.docs) == 2 - assert resp.index.docs[0].id == resp.index.docs[1].id - - f = ( - Flow() - .add(name='pod0', uses='_pass') - .add(name='pod1', uses='_pass', needs=['gateway']) - .add(name='pod2', uses='_merge', needs=['pod0', 'pod1']) - ) - doc = Document() - 
doc.text = 'text' - mock = mocker.Mock() - with f: - f.index([doc], on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) - - -def test_flow_merge_root(mocker): - def validate(resp): - assert len(resp.index.docs) == 2 - assert resp.index.docs[0].id == resp.index.docs[1].id - - f = ( - Flow() - .add(name='pod0', uses='_pass') - .add(name='pod1', uses='_pass', needs=['gateway']) - .add(name='pod2', uses='_merge_root', needs=['pod0', 'pod1']) - ) - doc = Document() - doc.text = 'text' - mock = mocker.Mock() - with f: - f.index([doc], on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) - - -def test_flow_merge_chunks(mocker): - def validate(resp): - assert len(resp.index.docs) == 1 - - f = ( - Flow() - .add(name='pod0', uses='_pass') - .add(name='pod1', uses='_pass', needs=['gateway']) - .add(name='pod2', uses='_merge_chunks', needs=['pod0', 'pod1']) - ) - doc = Document() - doc.text = 'text' - mock = mocker.Mock() - with f: - f.index([doc], on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) diff --git a/tests/integration/mime/__init__.py b/tests/integration/mime/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/mime/dummy-seg-not-random.yml b/tests/integration/mime/dummy-seg-not-random.yml deleted file mode 100644 index 019d0256dc49b..0000000000000 --- a/tests/integration/mime/dummy-seg-not-random.yml +++ /dev/null @@ -1,9 +0,0 @@ -!DummySegment -requests: - on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, TrainRequest, IndexRequest]: - - !SegmentDriver - with: - method: segment diff --git a/tests/integration/mime/dummy-seg-random.yml b/tests/integration/mime/dummy-seg-random.yml deleted file mode 100644 index 019d0256dc49b..0000000000000 --- a/tests/integration/mime/dummy-seg-random.yml +++ /dev/null @@ -1,9 +0,0 @@ -!DummySegment -requests: - on: - ControlRequest: - - !ControlReqDriver {} - [SearchRequest, 
TrainRequest, IndexRequest]: - - !SegmentDriver - with: - method: segment diff --git a/tests/integration/mime/test_mime.py b/tests/integration/mime/test_mime.py deleted file mode 100644 index 1b4d4046f28db..0000000000000 --- a/tests/integration/mime/test_mime.py +++ /dev/null @@ -1,102 +0,0 @@ -import glob -import os - -import pytest - -from jina.flow import Flow - -num_docs = 100 - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -def input_function(pattern='../../**/*.png'): - idx = 0 - for g in glob.glob(pattern, recursive=True)[:num_docs]: - with open(g, 'rb') as fp: - yield fp.read() - idx += 1 - - -def input_function2(pattern=os.path.join(cur_dir, '../*.*')): - for g in glob.glob(pattern, recursive=True)[:num_docs]: - yield g - - -def input_function3(): - for g in [ - os.path.abspath(__file__), # local file - 'https://github.com/jina-ai/jina/raw/master/.github/1500%D1%85667.gif?raw=true', - '', - 'https://cdn.bulbagarden.net/upload/thumb/2/21/001Bulbasaur.png/240px-001Bulbasaur.png', - ]: - yield g - - -@pytest.mark.parametrize('restful', [False, True]) -def test_dummy_seg(mocker, restful): - response_mock = mocker.Mock() - f = Flow(restful=restful).add(uses='- !Buffer2URI | {mimetype: png}') - with f: - f.index(inputs=input_function, on_done=response_mock) - - response_mock.assert_called() - response_mock_2 = mocker.Mock() - f = Flow(restful=restful).add(uses='- !Buffer2URI | {mimetype: png, base64: true}') - with f: - f.index(inputs=input_function, on_done=response_mock_2) - response_mock_2.assert_called() - - -@pytest.mark.parametrize('restful', [False, True]) -def test_any_file(mocker, restful): - response_mock = mocker.Mock() - f = Flow(restful=restful).add(uses='- !URI2DataURI | {base64: true}') - with f: - f.index(inputs=input_function2, on_done=response_mock) - response_mock.assert_called() - - -@pytest.mark.parametrize('restful', [False, True]) -def test_aba(mocker, restful): - response_mock = mocker.Mock() - f = ( - Flow(restful=restful) - 
.add(uses='- !Buffer2URI | {mimetype: png}') - .add(uses='- !URI2Buffer {}') - .add(uses='- !Buffer2URI | {mimetype: png}') - ) - - with f: - f.index(inputs=input_function, on_done=response_mock) - response_mock.assert_called() - - -@pytest.mark.parametrize('restful', [False, True]) -def test_pathURI2Buffer(mocker, restful): - response_mock = mocker.Mock() - f = Flow(restful=restful).add(uses='- !URI2Buffer {}').add(uses='- !Buffer2URI {}') - - with f: - f.index(inputs=input_function3, on_done=response_mock) - response_mock.assert_called() - - -@pytest.mark.parametrize('restful', [False, True]) -def test_text2datauri(mocker, restful): - response_mock = mocker.Mock() - f = Flow(restful=restful).add(uses='- !Text2URI {}') - - with f: - f.index(['abc', '123', 'hello, world'], on_done=response_mock) - response_mock.assert_called() - - -@pytest.mark.parametrize('restful', [False, True]) -def test_gateway_dataui(mocker, restful): - response_mock = mocker.Mock() - f = Flow(restful=restful).add() - - with f: - f.index(['abc', '123', 'hello, world'], on_done=response_mock) - response_mock.assert_called() diff --git a/tests/integration/mime/test_segmenter.py b/tests/integration/mime/test_segmenter.py deleted file mode 100644 index 41302ed9a2c38..0000000000000 --- a/tests/integration/mime/test_segmenter.py +++ /dev/null @@ -1,54 +0,0 @@ -import os - -import pytest - -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import single -from jina.flow import Flow -from tests import random_docs, validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -class DummySegment(BaseSegmenter): - @single - def segment(self, id, *args, **kwargs): - return [dict(buffer=b'aa'), dict(buffer=b'bb')] - - -def validate(req): - chunk_ids = [c.id for d in req.index.docs for c in d.chunks] - assert len(chunk_ids) == len(set(chunk_ids)) - assert len(chunk_ids) == 20 - - -@pytest.mark.parametrize('restful', [False, True]) -def 
test_dummy_seg(mocker, restful): - mock = mocker.Mock() - f = Flow(restful=restful).add(uses='DummySegment') - with f: - f.index(inputs=random_docs(10, chunks_per_doc=0), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate) - - -@pytest.mark.parametrize('restful', [False, True]) -def test_dummy_seg_random(mocker, restful): - mock = mocker.Mock() - f = Flow(restful=restful).add(uses=os.path.join(cur_dir, 'dummy-seg-random.yml')) - with f: - f.index(inputs=random_docs(10, chunks_per_doc=0), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate) - - -@pytest.mark.parametrize('restful', [False, True]) -def test_dummy_seg_not_random(mocker, restful): - mock = mocker.Mock() - f = Flow(restful=restful).add( - uses=os.path.join(cur_dir, 'dummy-seg-not-random.yml') - ) - with f: - f.index(inputs=random_docs(10, chunks_per_doc=0), on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate) diff --git a/tests/integration/multimodal/__init__.py b/tests/integration/multimodal/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/multimodal/flow-embedding-multimodal-parallel.yml b/tests/integration/multimodal/flow-embedding-multimodal-parallel.yml deleted file mode 100644 index 262198f33976f..0000000000000 --- a/tests/integration/multimodal/flow-embedding-multimodal-parallel.yml +++ /dev/null @@ -1,19 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: encoder_modality_1 - uses_before: '- !FilterQL | {lookups: {modality__in: [modality1]}, traversal_paths: [c]}' - uses: yaml/encoder-modality1.yml - read_only: true - - name: encoder_modality_2 - uses_before: '- !FilterQL | {lookups: {modality__in: [modality2]}, traversal_paths: [c]}' - uses: yaml/encoder-modality2.yml - read_only: true - needs: gateway - - name: multimodal_encoder - uses_before: _merge_chunks - uses: yaml/embedding-multimodal-encoder.yml - read_only: true - needs: 
[encoder_modality_1, encoder_modality_2] diff --git a/tests/integration/multimodal/flow-multimodal-all-types-parallel.yml b/tests/integration/multimodal/flow-multimodal-all-types-parallel.yml deleted file mode 100644 index 12ceba912c7ba..0000000000000 --- a/tests/integration/multimodal/flow-multimodal-all-types-parallel.yml +++ /dev/null @@ -1,25 +0,0 @@ -!Flow -version: '1' -with: - restful: $RESTFUL -pods: - - name: pass_modality_1 - uses: '- !FilterQL | {lookups: {modality__in: [modality1]}, traversal_paths: [c]}' - read_only: true - - name: pass_modality_2 - uses: '- !FilterQL | {lookups: {modality__in: [modality2]}, traversal_paths: [c]}' - read_only: true - needs: gateway - - name: pass_modality_3 - uses: '- !FilterQL | {lookups: {modality__in: [modality3]}, traversal_paths: [c]}' - read_only: true - needs: gateway - - name: pass_modality_4 - uses: '- !FilterQL | {lookups: {modality__in: [modality4]}, traversal_paths: [c]}' - read_only: true - needs: gateway - - name: multimodal_encoder - uses_before: _merge_chunks - uses: yaml/all-types-multimodal-encoder.yml - read_only: true - needs: [pass_modality_1, pass_modality_2, pass_modality_3, pass_modality_4] diff --git a/tests/integration/multimodal/test_multimodal_parallel.py b/tests/integration/multimodal/test_multimodal_parallel.py deleted file mode 100644 index ddba9840ff3f7..0000000000000 --- a/tests/integration/multimodal/test_multimodal_parallel.py +++ /dev/null @@ -1,118 +0,0 @@ -import os - -import numpy as np -import pytest - -from jina.flow import Flow -from jina.proto import jina_pb2 -from jina.types.ndarray.generic import NdArray - -from tests import validate_callback - -NUM_DOCS = 100 -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture -def multimodal_documents(): - docs = [] - for idx in range(0, NUM_DOCS): - """ - doc - idx - | - | - chunk - blob [idx, idx] - modality1 -> The dummy encoder will pass the blob to embedding - | - chunk - blob [idx, idx, idx] - modality2 -> The 
dummy encoder will pass the blob to embedding - Result: - doc - idx - embedding [idx, idx, idx, idx, idx] - """ - doc = jina_pb2.DocumentProto() - doc.text = f'{idx}' - - for modality in ['modality1', 'modality2']: - chunk = doc.chunks.add() - chunk.modality = modality - if modality == 'modality1': - NdArray(chunk.blob).value = np.array([idx, idx]) - else: - NdArray(chunk.blob).value = np.array([idx, idx, idx]) - docs.append(doc) - return docs - - -# TODO(Deepankar): Gets stuck when `restful: True` - issues with `needs='gateway'` -@pytest.mark.parametrize('restful', [False]) -def test_multimodal_embedding_parallel( - multimodal_documents, mocker, monkeypatch, restful -): - monkeypatch.setenv("RESTFUL", restful) - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for idx, doc in enumerate(resp.index.docs): - np.testing.assert_almost_equal( - NdArray(doc.embedding).value, np.array([idx, idx, idx, idx, idx]) - ) - - mock = mocker.Mock() - with Flow.load_config( - os.path.join(cur_dir, 'flow-embedding-multimodal-parallel.yml') - ) as index_gt_flow: - index_gt_flow.index(inputs=multimodal_documents, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_response) - - -@pytest.fixture -def multimodal_all_types_documents(): - docs = [] - for idx in range(0, NUM_DOCS): - """ - doc - idx - | - | - chunk - embedding [idx, idx] - modality1 - | - chunk - blob [idx, idx, idx] - modality2 - | - chunk - text 'modality3' - modality3 -> Inside multimodal encoder will be encoded into [3, 3] - | - chunk - buffer b'modality4' - modality4 -> Inside multimodal encoder will be encoded into [4, 4] - Result: - doc - idx - embedding [idx, idx, idx, idx, idx, 3, 3, 4, 4] - """ - doc = jina_pb2.DocumentProto() - doc.text = f'{idx}' - - for modality in ['modality1', 'modality2', 'modality3', 'modality4']: - chunk = doc.chunks.add() - chunk.modality = modality - if modality == 'modality1': - NdArray(chunk.embedding).value = np.array([idx, idx]) - 
elif modality == 'modality2': - NdArray(chunk.blob).value = np.array([idx, idx, idx]) - elif modality == 'modality3': - chunk.text = 'modality3' - elif modality == 'modality4': - chunk.buffer = 'modality4'.encode() - docs.append(doc) - return docs - - -# TODO(Deepankar): Gets stuck when `restful: True` - issues with `needs='gateway'` -@pytest.mark.parametrize('restful', [False]) -def test_multimodal_all_types_parallel( - multimodal_all_types_documents, mocker, monkeypatch, restful -): - monkeypatch.setenv("RESTFUL", restful) - - def validate_response(resp): - assert len(resp.index.docs) == NUM_DOCS - for idx, doc in enumerate(resp.index.docs): - np.testing.assert_almost_equal( - NdArray(doc.embedding).value, - np.array([idx, idx, idx, idx, idx, 3, 3, 4, 4]), - ) - - mock = mocker.Mock() - with Flow.load_config( - os.path.join(cur_dir, 'flow-multimodal-all-types-parallel.yml') - ) as index_gt_flow: - index_gt_flow.index(inputs=multimodal_all_types_documents, on_done=mock) - mock.assert_called_once() - validate_callback(mock, validate_response) diff --git a/tests/integration/multimodal/yaml/all-types-multimodal-encoder.py b/tests/integration/multimodal/yaml/all-types-multimodal-encoder.py deleted file mode 100644 index 8cf07e552011b..0000000000000 --- a/tests/integration/multimodal/yaml/all-types-multimodal-encoder.py +++ /dev/null @@ -1,46 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." 
-__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.decorators import batching, as_ndarray -from jina.executors.encoders.multimodal import BaseMultiModalEncoder - - -class AllTypesConcatenateMultiModalEncoder(BaseMultiModalEncoder): - batch_size = 10 - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(slice_nargs=4) - @as_ndarray - def encode(self, *data: 'np.ndarray', **kwargs): - assert len(data) == 4 - for d in data: - assert self.batch_size == AllTypesConcatenateMultiModalEncoder.batch_size - assert len(d) == self.batch_size - # modality1 is blob, modality2 is embedding, modality3 is text, modality4 is buffer - modality1 = data[0] - modality2 = data[1] - modality3 = data[2] - modality4 = data[3] - assert len(modality1) == len(modality2) - assert len(modality2) == len(modality4) - assert len(modality3) == len(modality4) - assert isinstance(modality3[0], str) - assert isinstance(modality4[0], bytes) - - embed_modality3 = [] - for _ in modality3: - embed_modality3.append([3, 3]) - embed_modality3 = np.stack(embed_modality3) - - embed_modality4 = [] - for _ in modality4: - embed_modality4.append([4, 4]) - embed_modality4 = np.stack(embed_modality4) - - return np.concatenate( - (modality1, modality2, embed_modality3, embed_modality4), axis=1 - ) diff --git a/tests/integration/multimodal/yaml/all-types-multimodal-encoder.yml b/tests/integration/multimodal/yaml/all-types-multimodal-encoder.yml deleted file mode 100644 index a2a99f73349dc..0000000000000 --- a/tests/integration/multimodal/yaml/all-types-multimodal-encoder.yml +++ /dev/null @@ -1,15 +0,0 @@ -!AllTypesConcatenateMultiModalEncoder -with: - positional_modality: - - modality1 - - modality2 - - modality3 - - modality4 -metas: - py_modules: all-types-multimodal-encoder.py -requests: - on: - [IndexRequest, SearchRequest]: - - !MultiModalDriver - with: - traversal_paths: ['r'] diff --git a/tests/integration/multimodal/yaml/dummy-encoder.py 
b/tests/integration/multimodal/yaml/dummy-encoder.py deleted file mode 100644 index 08089989a4522..0000000000000 --- a/tests/integration/multimodal/yaml/dummy-encoder.py +++ /dev/null @@ -1,14 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.encoders import BaseEncoder - - -class DummyEncoder(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - return content diff --git a/tests/integration/multimodal/yaml/embedding-multimodal-concatenate.py b/tests/integration/multimodal/yaml/embedding-multimodal-concatenate.py deleted file mode 100644 index dc98549c2ca21..0000000000000 --- a/tests/integration/multimodal/yaml/embedding-multimodal-concatenate.py +++ /dev/null @@ -1,27 +0,0 @@ -__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved." -__license__ = "Apache-2.0" - -import numpy as np - -from jina.executors.decorators import batching, as_ndarray -from jina.executors.encoders.multimodal import BaseMultiModalEncoder - - -class ConcatenateMultiModalEncoder(BaseMultiModalEncoder): - batch_size = 10 - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(slice_nargs=2) - @as_ndarray - def encode(self, *data: 'np.ndarray', **kwargs): - assert len(data) == 2 - for d in data: - assert self.batch_size == ConcatenateMultiModalEncoder.batch_size - assert len(d) == self.batch_size - assert isinstance(d, np.ndarray) - modality1 = data[0] - modality2 = data[1] - assert len(modality1) == len(modality2) - return np.concatenate((modality1, modality2), axis=1) diff --git a/tests/integration/multimodal/yaml/embedding-multimodal-encoder.yml b/tests/integration/multimodal/yaml/embedding-multimodal-encoder.yml deleted file mode 100644 index ce676b60a2ff6..0000000000000 --- 
a/tests/integration/multimodal/yaml/embedding-multimodal-encoder.yml +++ /dev/null @@ -1,13 +0,0 @@ -!ConcatenateMultiModalEncoder -with: - positional_modality: - - modality1 - - modality2 -metas: - py_modules: embedding-multimodal-concatenate.py -requests: - on: - [IndexRequest, SearchRequest]: - - !MultiModalDriver - with: - traversal_paths: ['r'] diff --git a/tests/integration/multimodal/yaml/encoder-modality1.yml b/tests/integration/multimodal/yaml/encoder-modality1.yml deleted file mode 100644 index 86e76e0458af5..0000000000000 --- a/tests/integration/multimodal/yaml/encoder-modality1.yml +++ /dev/null @@ -1,11 +0,0 @@ -!DummyEncoder -with: - {} -metas: - py_modules: dummy-encoder.py -requests: - on: - [IndexRequest, SearchRequest]: - - !EncodeDriver - with: - traversal_paths: ['c'] diff --git a/tests/integration/multimodal/yaml/encoder-modality2.yml b/tests/integration/multimodal/yaml/encoder-modality2.yml deleted file mode 100644 index 86e76e0458af5..0000000000000 --- a/tests/integration/multimodal/yaml/encoder-modality2.yml +++ /dev/null @@ -1,11 +0,0 @@ -!DummyEncoder -with: - {} -metas: - py_modules: dummy-encoder.py -requests: - on: - [IndexRequest, SearchRequest]: - - !EncodeDriver - with: - traversal_paths: ['c'] diff --git a/tests/integration/optimizers/__init__.py b/tests/integration/optimizers/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/optimizers/data.jsonlines b/tests/integration/optimizers/data.jsonlines deleted file mode 100644 index a8d93f33820d7..0000000000000 --- a/tests/integration/optimizers/data.jsonlines +++ /dev/null @@ -1,10 +0,0 @@ -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": 
"hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} -{"document": {"mimeType": "text/plain", "text": "hello"}, "groundtruth": {"mimeType": "text/plain", "text": "hello"}} diff --git a/tests/integration/optimizers/flow.yml b/tests/integration/optimizers/flow.yml deleted file mode 100644 index 5d3a2adfeba9e..0000000000000 --- a/tests/integration/optimizers/flow.yml +++ /dev/null @@ -1,9 +0,0 @@ -!Flow -version: '1' -env: - JINA_DUMMYCRAFTER_PARAM1_VAR: ${{JINA_DUMMYCRAFTER_PARAM1}} - JINA_DUMMYCRAFTER_PARAM2_VAR: ${{JINA_DUMMYCRAFTER_PARAM2}} - JINA_DUMMYCRAFTER_PARAM3_VAR: ${{JINA_DUMMYCRAFTER_PARAM3}} -pods: - - uses: pods/craft.yml - - uses: pods/evaluate.yml diff --git a/tests/integration/optimizers/optimizer_conf.yml b/tests/integration/optimizers/optimizer_conf.yml deleted file mode 100644 index 9686e9d66779c..0000000000000 --- a/tests/integration/optimizers/optimizer_conf.yml +++ /dev/null @@ -1,15 +0,0 @@ - !FlowOptimizer - version: '1' - with: - flow_runner: !SingleFlowRunner - with: - flow_yaml: flow.yml - overwrite_workspace: True - documents: ${{JINA_OPTIMIZER_DATA_FILE}} - request_size: 1 - execution_method: 'search_lines' - documents_parameter_name: 'filepath' - evaluation_callback: !MeanEvaluationCallback {} - parameter_yaml: ${{JINA_OPTIMIZER_PARAMETER_FILE}} - workspace_base_dir: 
${{JINA_OPTIMIZER_WORKSPACE_DIR}} - n_trials: 5 diff --git a/tests/integration/optimizers/parameter.yml b/tests/integration/optimizers/parameter.yml deleted file mode 100644 index c9713ea49c6d1..0000000000000 --- a/tests/integration/optimizers/parameter.yml +++ /dev/null @@ -1,18 +0,0 @@ -- !IntegerParameter - jaml_variable: JINA_DUMMYCRAFTER_PARAM1 - high: 1 - low: 0 - step_size: 1 - parameter_name: param1 -- !IntegerParameter - jaml_variable: JINA_DUMMYCRAFTER_PARAM2 - high: 1 - low: 1 - step_size: 1 - parameter_name: param2 -- !IntegerParameter - jaml_variable: JINA_DUMMYCRAFTER_PARAM3 - high: 2 - low: 1 - step_size: 1 - parameter_name: param3 diff --git a/tests/integration/optimizers/pods/craft.yml b/tests/integration/optimizers/pods/craft.yml deleted file mode 100644 index 2425838435756..0000000000000 --- a/tests/integration/optimizers/pods/craft.yml +++ /dev/null @@ -1,7 +0,0 @@ -!DummyCrafter -with: - param1: ${{JINA_DUMMYCRAFTER_PARAM1_VAR}} - param2: ${{JINA_DUMMYCRAFTER_PARAM2_VAR}} - param3: ${{JINA_DUMMYCRAFTER_PARAM3_VAR}} -metas: - py_modules: 'dummy_crafter.py' diff --git a/tests/integration/optimizers/pods/dummy_crafter.py b/tests/integration/optimizers/pods/dummy_crafter.py deleted file mode 100644 index 870184671d3b8..0000000000000 --- a/tests/integration/optimizers/pods/dummy_crafter.py +++ /dev/null @@ -1,57 +0,0 @@ -from typing import Dict - -from jina.executors.crafters import BaseCrafter -from jina.executors.decorators import single -from jina.optimizers.parameters import IntegerParameter - - -class DummyCrafter(BaseCrafter): - DEFAULT_OPTIMIZATION_PARAMETER = [ - IntegerParameter( - executor_name='DummyCrafter', - parameter_name='param1', - low=0, - high=1, - step_size=1, - ), - IntegerParameter( - executor_name='DummyCrafter', - parameter_name='param2', - low=0, - high=1, - step_size=1, - ), - IntegerParameter( - executor_name='DummyCrafter', - parameter_name='param3', - low=0, - high=2, - step_size=1, - ), - ] - - GOOD_PARAM_1 = 0 - 
GOOD_PARAM_2 = 1 - GOOD_PARAM_3 = 1 - - def __init__(self, param1: int, param2: int, param3: int, *args, **kwargs): - super().__init__(*args, **kwargs) - self.param1 = param1 - self.param2 = param2 - self.param3 = param3 - - @property - def good_params(self): - return ( - self.param1 == DummyCrafter.GOOD_PARAM_1 - and self.param2 == DummyCrafter.GOOD_PARAM_2 - and self.param3 == DummyCrafter.GOOD_PARAM_3 - ) - - @single - def craft(self, text, *args, **kwargs) -> Dict: - - if self.good_params: - return {'text': text} - else: - return {'text': ''} diff --git a/tests/integration/optimizers/pods/dummy_evaluate.py b/tests/integration/optimizers/pods/dummy_evaluate.py deleted file mode 100644 index 88773cdd9ebb5..0000000000000 --- a/tests/integration/optimizers/pods/dummy_evaluate.py +++ /dev/null @@ -1,16 +0,0 @@ -from jina.executors.evaluators.text import BaseTextEvaluator - - -class DummyTextEvaluator(BaseTextEvaluator): - @property - def metric(self) -> str: - return 'DummyTextEvaluator' - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def evaluate(self, actual: str, desired: str, *args, **kwargs) -> float: - if actual == desired: - return 1.0 - else: - return 0.0 diff --git a/tests/integration/optimizers/pods/evaluate.yml b/tests/integration/optimizers/pods/evaluate.yml deleted file mode 100644 index 709b34d310df1..0000000000000 --- a/tests/integration/optimizers/pods/evaluate.yml +++ /dev/null @@ -1,9 +0,0 @@ -!DummyTextEvaluator -metas: - py_modules: 'dummy_evaluate.py' -requests: - on: - SearchRequest: - - !TextEvaluateDriver - with: - traversal_paths: [ 'r' ] \ No newline at end of file diff --git a/tests/integration/optimizers/test_discovery.py b/tests/integration/optimizers/test_discovery.py deleted file mode 100644 index b98400e50331f..0000000000000 --- a/tests/integration/optimizers/test_discovery.py +++ /dev/null @@ -1,18 +0,0 @@ -import os -from distutils.dir_util import copy_tree -from shutil import copy2 - -from 
jina.optimizers.discovery import run_parameter_discovery
-
-cur_dir = os.path.dirname(os.path.abspath(__file__))
-
-
-def test_discovery(tmpdir):
-    copy2(os.path.join(cur_dir, 'flow.yml'), tmpdir)
-    pod_dir = os.path.join(tmpdir, 'pods')
-    copy_tree(os.path.join(cur_dir, 'pods'), pod_dir)
-    parameter_result_file = os.path.join(tmpdir, 'parameter.yml')
-    run_parameter_discovery(
-        [os.path.join(tmpdir, 'flow.yml')], parameter_result_file, True
-    )
-    assert os.path.exists(parameter_result_file)
diff --git a/tests/integration/optimizers/test_optimizer.py b/tests/integration/optimizers/test_optimizer.py
deleted file mode 100644
index 500f6da51d6cb..0000000000000
--- a/tests/integration/optimizers/test_optimizer.py
+++ /dev/null
@@ -1,193 +0,0 @@
-import json
-import os
-
-import pytest
-import yaml
-from google.protobuf.json_format import MessageToJson
-
-from jina import Document
-from jina.jaml import JAML
-from jina.optimizers import FlowOptimizer, MeanEvaluationCallback
-from jina.optimizers import run_optimizer_cli
-from jina.optimizers.flow_runner import SingleFlowRunner, MultiFlowRunner
-from jina.parsers.optimizer import set_optimizer_parser
-
-cur_dir = os.path.dirname(os.path.abspath(__file__))
-
-BEST_PARAMETERS = {
-    'JINA_DUMMYCRAFTER_PARAM1': 0,
-    'JINA_DUMMYCRAFTER_PARAM2': 1,
-    'JINA_DUMMYCRAFTER_PARAM3': 1,
-}
-
-
-@pytest.fixture
-def config(tmpdir):
-    os.environ['JINA_OPTIMIZER_WORKSPACE_DIR'] = str(tmpdir)
-    os.environ['JINA_OPTIMIZER_PARAMETER_FILE'] = os.path.join(cur_dir, 'parameter.yml')
-    os.environ['JINA_OPTIMIZER_DATA_FILE'] = os.path.join(cur_dir, 'data.jsonlines')
-    yield
-    del os.environ['JINA_OPTIMIZER_WORKSPACE_DIR']
-    del os.environ['JINA_OPTIMIZER_PARAMETER_FILE']
-    del os.environ['JINA_OPTIMIZER_DATA_FILE']
-
-
-def validate_result(result, tmpdir):
-    result_path = os.path.join(tmpdir, 'out/best_parameters.yml')
-    result.save_parameters(result_path)
-    assert result.best_parameters == BEST_PARAMETERS
-    assert
yaml.load(open(result_path)) == BEST_PARAMETERS - - -def document_generator(num_doc): - for _ in range(num_doc): - doc = Document(content='hello') - groundtruth_doc = Document(content='hello') - yield doc, groundtruth_doc - - -def test_optimizer_single_flow(tmpdir, config): - eval_flow_runner = SingleFlowRunner( - flow_yaml=os.path.join(cur_dir, 'flow.yml'), - documents=document_generator(10), - request_size=1, - execution_method='search', - ) - opt = FlowOptimizer( - flow_runner=eval_flow_runner, - parameter_yaml=os.path.join(cur_dir, 'parameter.yml'), - evaluation_callback=MeanEvaluationCallback(), - workspace_base_dir=str(tmpdir), - n_trials=5, - ) - result = opt.optimize_flow() - validate_result(result, tmpdir) - - -def test_optimizer_multi_flow(tmpdir, config): - multi_flow_runner = MultiFlowRunner( - [ - SingleFlowRunner( - flow_yaml=os.path.join(cur_dir, 'flow.yml'), - documents=document_generator(10), - request_size=1, - execution_method='index', - ), - SingleFlowRunner( - flow_yaml=os.path.join(cur_dir, 'flow.yml'), - documents=document_generator(10), - request_size=1, - execution_method='search', - ), - ] - ) - opt = FlowOptimizer( - flow_runner=multi_flow_runner, - parameter_yaml=os.path.join(cur_dir, 'parameter.yml'), - evaluation_callback=MeanEvaluationCallback(), - workspace_base_dir=str(tmpdir), - n_trials=5, - ) - result = opt.optimize_flow() - validate_result(result, tmpdir) - - -def test_yaml_multi_flow(tmpdir, config): - jsonlines_file = os.path.join(tmpdir, 'docs.jsonlines') - optimizer_yaml = f'''!FlowOptimizer -version: 1 -with: - flow_runner: !MultiFlowRunner - with: - flows: - - !SingleFlowRunner - with: - flow_yaml: '{os.path.join(cur_dir, 'flow.yml')}' - documents: {jsonlines_file} - request_size: 1 - execution_method: 'index_lines' - documents_parameter_name: 'filepath' - - !SingleFlowRunner - with: - flow_yaml: '{os.path.join(cur_dir, 'flow.yml')}' - documents: {jsonlines_file} - request_size: 1 - execution_method: 'search_lines' - 
documents_parameter_name: 'filepath' - evaluation_callback: !MeanEvaluationCallback {{}} - parameter_yaml: '{os.path.join(cur_dir, 'parameter.yml')}' - workspace_base_dir: {tmpdir} - n_trials: 5 -''' - documents = document_generator(10) - with open(jsonlines_file, 'w') as f: - for document, groundtruth_doc in documents: - document.id = "" - groundtruth_doc.id = "" - json.dump( - { - 'document': json.loads(MessageToJson(document).replace('\n', '')), - 'groundtruth': json.loads( - MessageToJson(groundtruth_doc).replace('\n', '') - ), - }, - f, - ) - f.write('\n') - - optimizer = JAML.load(optimizer_yaml) - result = optimizer.optimize_flow() - validate_result(result, tmpdir) - - -def test_yaml_single_flow(tmpdir, config): - jsonlines_file = os.path.join(tmpdir, 'docs.jsonlines') - optimizer_yaml = f'''!FlowOptimizer -version: 1 -with: - flow_runner: !SingleFlowRunner - with: - flow_yaml: '{os.path.join(cur_dir, 'flow.yml')}' - documents: {jsonlines_file} - request_size: 1 - execution_method: 'search_lines' - documents_parameter_name: 'filepath' - evaluation_callback: !MeanEvaluationCallback {{}} - parameter_yaml: '{os.path.join(cur_dir, 'parameter.yml')}' - workspace_base_dir: {tmpdir} - n_trials: 5 -''' - documents = document_generator(10) - with open(jsonlines_file, 'w') as f: - for document, groundtruth_doc in documents: - document.id = "" - groundtruth_doc.id = "" - json.dump( - { - 'document': json.loads(MessageToJson(document).replace('\n', '')), - 'groundtruth': json.loads( - MessageToJson(groundtruth_doc).replace('\n', '') - ), - }, - f, - ) - f.write('\n') - - optimizer = JAML.load(optimizer_yaml) - result = optimizer.optimize_flow() - validate_result(result, tmpdir) - - -@pytest.mark.parametrize('uses_output_dir', (True, False)) -def test_cli(tmpdir, config, uses_output_dir): - print(os.environ['JINA_OPTIMIZER_PARAMETER_FILE']) - args = ['--uses', os.path.join(cur_dir, 'optimizer_conf.yml')] - output_dir = os.path.join(tmpdir, 'out') - if uses_output_dir: - 
args.extend(['--output-dir', output_dir]) - run_optimizer_cli(set_optimizer_parser().parse_args(args)) - if uses_output_dir: - assert ( - yaml.load(open(os.path.join(output_dir, 'best_parameters.yml'))) - == BEST_PARAMETERS - ) diff --git a/tests/integration/ranker_training/__init__.py b/tests/integration/ranker_training/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/ranker_training/offline/__init__.py b/tests/integration/ranker_training/offline/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/ranker_training/offline/flow_offline_search.yml b/tests/integration/ranker_training/offline/flow_offline_search.yml deleted file mode 100644 index 4a27eac266199..0000000000000 --- a/tests/integration/ranker_training/offline/flow_offline_search.yml +++ /dev/null @@ -1,5 +0,0 @@ -jtype: Flow -version: '1' -pods: - - name: search - uses: yaml/search.yml diff --git a/tests/integration/ranker_training/offline/flow_offline_train.yml b/tests/integration/ranker_training/offline/flow_offline_train.yml deleted file mode 100644 index 015cdcb17c9a3..0000000000000 --- a/tests/integration/ranker_training/offline/flow_offline_train.yml +++ /dev/null @@ -1,5 +0,0 @@ -jtype: Flow -version: '1' -pods: - - name: trainer - uses: yaml/train.yml diff --git a/tests/integration/ranker_training/offline/test_offline_training.py b/tests/integration/ranker_training/offline/test_offline_training.py deleted file mode 100644 index 6bf5a64b78a49..0000000000000 --- a/tests/integration/ranker_training/offline/test_offline_training.py +++ /dev/null @@ -1,96 +0,0 @@ -import os -import pickle - -import pytest - -from jina import Document, Flow -from jina.types.sets import DocumentSet - -''' -User -> Train request -> RankTrainer Train -> RankTrainer Dump Weights/Parameters/Model -> -Ranker Load Model -> Re-rank -''' - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture -def 
documents_to_train():
-    """The relevance is correlated to the size feature."""
-    queries = []
-    for q in range(1, 100):
-        query = Document()
-        for i in range(1, 50):
-            match = Document()
-            # large size higher relevance
-            match.tags['price'] = 1
-            match.tags['size'] = i * 2
-            match.tags['relevance'] = i
-            query.matches.add(match)
-        queries.append(query)
-    return DocumentSet(queries)
-
-
-@pytest.fixture
-def doc_to_query():
-    doc = Document()
-    for i in range(1, 5):
-        match = Document()
-        match.tags['price'] = 1
-        match.tags['size'] = i * 2
-        doc.matches.add(match)
-    return doc
-
-
-def test_train_offline(documents_to_train, doc_to_query):
-    """
-    The objective of this test is to ensure ranker trainer works as expected.
-    Our data set consist of 2 features field, `price` and `size`. Label field is named as `relevance`.
-    Before using ranker trainer, we manually train a linear model based on `price` field, use a
-    Jina search flow to find documents and scores with the `doc_to_query`. Since the `price` of the `doc_to_query`
-    has been set to 1, so the pre-trained model will always return the same value and all the scores will be the same.
-    so we assert the length of prediction is 1 in `validate_ranking_by_price`.
-    Afterwords, we fire a ranker trainer, it will dump a new model. The trainiang set of the new model is based on `size`
-    feature, see `docs_to_train`, and the `price` is not going to have any impact on the predictions. When we search the result
-    with `doc_to_query`, we expect the relevance score keep increase since the `size` in `doc_to_query` keeps increase.
-    see `validate_ranking_by_size`.
-    """
-
-    def validate_ranking_by_price(req):
-        pred = set()
-        for match in req.docs[0].matches:
-            pred.add(match.score.value)
-        assert len(pred) == 1  # since price tag never changes, all scores are the same.
-
-    def validate_ranking_by_size(req):
-        pred = []
-        for match in req.docs[0].matches:
-            pred.append(match.score.value)
-        assert (
-            sorted(pred, reverse=True) == pred
-        )  # assure predictions are ordered since size increases
-
-    # Before Ranker Trainer, the feature is completely rely on `price` tag, `size` can be seen as a bias.
-
-    from sklearn.linear_model import LinearRegression
-
-    model = LinearRegression()
-
-    X = [[1, 1], [2, 1], [4, 1], [8, 1], [16, 1]]
-    y = [1, 2, 3, 4, 5]
-    model.fit(X, y)
-    with open('model.pickle', mode='wb') as model_file_name:
-        pickle.dump(model, model_file_name)
-
-    with Flow.load_config(os.path.join(cur_dir, 'flow_offline_search.yml')) as f:
-        f.search(inputs=[doc_to_query], on_done=validate_ranking_by_price)
-
-    # Run Ranker Trainer
-
-    with Flow.load_config(os.path.join(cur_dir, 'flow_offline_train.yml')) as f:
-        f.train(inputs=documents_to_train)
-
-    # After Ranker Trainer, the feature should be completely rely on `size` tag.
-
-    with Flow.load_config(os.path.join(cur_dir, 'flow_offline_search.yml')) as f:
-        f.search(inputs=[doc_to_query], on_done=validate_ranking_by_size)
diff --git a/tests/integration/ranker_training/offline/yaml/ranker.py b/tests/integration/ranker_training/offline/yaml/ranker.py
deleted file mode 100644
index 2852b32111fb1..0000000000000
--- a/tests/integration/ranker_training/offline/yaml/ranker.py
+++ /dev/null
@@ -1,48 +0,0 @@
-import pickle
-from typing import List, Dict
-
-import numpy as np
-
-from jina.executors.decorators import single
-from jina.executors.rankers import Match2DocRanker
-
-
-class SGDRegressorRanker(Match2DocRanker):
-    """The :class:`SGDRegressorRanker` loads an :class:`SGDRegressor` and make use of the trained model
-    to predict relevance score."""
-
-    MODEL_FILENAME = 'model.pickle'
-
-    def __init__(self, model_path: str, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-        self.model = None
-        self.model_path = model_path
-        self.match_required_keys = {'tags__size',
'tags__price'} - self.query_required_keys = None - - def post_init(self): - super().post_init() - with open(self.model_path + '/' + self.MODEL_FILENAME, 'rb') as pickle_file: - self.model = pickle.load(pickle_file) - - @single - def score( - self, - old_match_scores: List[List[float]], - queries_metas: List[Dict], - matches_metas: List[List[Dict]], - ) -> 'np.ndarray': - """ - Scoring function of the ranker. - - This method extract features, i.e. `size` and `price` from `Document` tags, - make prediction based on the loaded model and return the score. - """ - sizes = [] - prices = [] - for match_meta in matches_metas: - for m in match_meta: - sizes.append(m['tags__size']) - prices.append(m['tags__price']) - X = np.column_stack((prices, sizes)) - return self.model.predict(X) diff --git a/tests/integration/ranker_training/offline/yaml/search.yml b/tests/integration/ranker_training/offline/yaml/search.yml deleted file mode 100644 index 9db1012fdaf42..0000000000000 --- a/tests/integration/ranker_training/offline/yaml/search.yml +++ /dev/null @@ -1,11 +0,0 @@ -!SGDRegressorRanker -with: - model_path: . -metas: - py_modules: ranker.py -requests: - on: - SearchRequest: - - jtype: Matches2DocRankDriver - with: - traversal_paths: ['r'] diff --git a/tests/integration/ranker_training/offline/yaml/train.yml b/tests/integration/ranker_training/offline/yaml/train.yml deleted file mode 100644 index 26511e934e5dd..0000000000000 --- a/tests/integration/ranker_training/offline/yaml/train.yml +++ /dev/null @@ -1,9 +0,0 @@ -!SGDRegressorRankerTrainer -with: - model_path: . 
-metas: - py_modules: trainer.py -requests: - on: - TrainRequest: - - jtype: RankerTrainerDriver {} diff --git a/tests/integration/ranker_training/offline/yaml/trainer.py b/tests/integration/ranker_training/offline/yaml/trainer.py deleted file mode 100644 index cd719247521b6..0000000000000 --- a/tests/integration/ranker_training/offline/yaml/trainer.py +++ /dev/null @@ -1,55 +0,0 @@ -import pickle -from pathlib import Path - -import numpy as np - -from jina.executors.rankers.trainer import RankerTrainer - - -class SGDRegressorRankerTrainer(RankerTrainer): - """The :class:`SGDRegressorRankerTrainer` trains an :class:`SGDRegressor` and save the trained model - to the expected directory. To be loaded into :class:`SGDRegressorRanker`.""" - - MODEL_FILENAME = 'model.pickle' - - def __init__(self, model_path: str, *args, **kwargs): - super().__init__(*args, **kwargs) - self.model = None - self.model_path = model_path - self.match_required_keys = {'tags__size', 'tags__price', 'tags__relevance'} - self.query_required_keys = None - - def post_init(self): - from sklearn.linear_model import LinearRegression - - self.model = LinearRegression() - - def train(self, query_metas, matches_metas, *args, **kwargs): - """ - Train the ranker, the core function of trainer. - - This method extract `size` and `price` features from ``Document`` tags, - serve as the features for model training. And extract `relevance` from ``Document``, - serve as the labels for model training. 
- """ - sizes = [] - prices = [] - relevance = [] - for match_meta in matches_metas: - for m in match_meta: - sizes.append(m['tags__size']) - prices.append(m['tags__price']) - relevance.append(m['tags__relevance']) - X = np.column_stack((prices, sizes)) - y = np.asarray(relevance) - self.model.fit(X, y) - - def save(self): - """Save the updated the ranker model.""" - path = Path(self.model_path) - - if not path.exists(): - path.mkdir(parents=True) - - with open(str(path) + '/' + self.MODEL_FILENAME, mode='wb') as model_file_name: - pickle.dump(self.model, model_file_name) diff --git a/tests/integration/ref_indexer/__init__.py b/tests/integration/ref_indexer/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/integration/ref_indexer/compound-index.yml b/tests/integration/ref_indexer/compound-index.yml deleted file mode 100644 index 229db933ee350..0000000000000 --- a/tests/integration/ref_indexer/compound-index.yml +++ /dev/null @@ -1,6 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: compound-indexer.yml - parallel: $JINA_TEST_REF_INDEXER_PARALLEL diff --git a/tests/integration/ref_indexer/compound-indexer-with-ref.yml b/tests/integration/ref_indexer/compound-indexer-with-ref.yml deleted file mode 100644 index e38974e82f93b..0000000000000 --- a/tests/integration/ref_indexer/compound-indexer-with-ref.yml +++ /dev/null @@ -1,31 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - ref_indexer: - !NumpyIndexer - metas: - name: wrapidx - metas: - name: outeridx -metas: - name: compound - workspace: $JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY -requests: - on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - - !VectorSearchDriver - with: - executor: outeridx - [IndexRequest, UpdateRequest]: - - !VectorIndexDriver - with: - executor: outeridx - DeleteRequest: - - !DeleteDriver - with: - executor: outeridx - - diff --git a/tests/integration/ref_indexer/compound-indexer.yml 
b/tests/integration/ref_indexer/compound-indexer.yml deleted file mode 100644 index c35f1ad8b042c..0000000000000 --- a/tests/integration/ref_indexer/compound-indexer.yml +++ /dev/null @@ -1,26 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - metas: - name: wrapidx - with: - index_filename: 'index.gz' -metas: - name: compound - workspace: $JINA_TEST_INDEXER_WITH_REF_INDEXER -requests: - on: - ControlRequest: - - !ControlReqDriver {} - SearchRequest: - - !VectorSearchDriver - with: - executor: wrapidx - [IndexRequest, UpdateRequest]: - - !VectorIndexDriver - with: - executor: wrapidx - DeleteRequest: - - !DeleteDriver - with: - executor: wrapidx \ No newline at end of file diff --git a/tests/integration/ref_indexer/compound-query.yml b/tests/integration/ref_indexer/compound-query.yml deleted file mode 100644 index f75e94bf01042..0000000000000 --- a/tests/integration/ref_indexer/compound-query.yml +++ /dev/null @@ -1,11 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: $JINA_QUERY_USES_COMPOUND - uses_internal: $JINA_QUERY_USES_COMPOUND_INTERNAL - parallel: $JINA_TEST_REF_INDEXER_PARALLEL - volumes: $JINA_VOLUMES - docker_kwargs: - environment: - JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY: $JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY diff --git a/tests/integration/ref_indexer/index.yml b/tests/integration/ref_indexer/index.yml deleted file mode 100644 index 44f500248cee3..0000000000000 --- a/tests/integration/ref_indexer/index.yml +++ /dev/null @@ -1,6 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: indexer.yml - parallel: $JINA_TEST_REF_INDEXER_PARALLEL diff --git a/tests/integration/ref_indexer/indexer.yml b/tests/integration/ref_indexer/indexer.yml deleted file mode 100644 index 232ef2be27759..0000000000000 --- a/tests/integration/ref_indexer/indexer.yml +++ /dev/null @@ -1,6 +0,0 @@ -!NumpyIndexer -with: - index_filename: 'index.gz' -metas: - workspace: $JINA_TEST_INDEXER_WITH_REF_INDEXER - name: wrapidx diff --git 
a/tests/integration/ref_indexer/indexer_with_ref.yml b/tests/integration/ref_indexer/indexer_with_ref.yml deleted file mode 100644 index 799db0ae9df6e..0000000000000 --- a/tests/integration/ref_indexer/indexer_with_ref.yml +++ /dev/null @@ -1,12 +0,0 @@ -!NumpyIndexer -with: - ref_indexer: - !NumpyIndexer - with: - index_filename: 'index.gz' - metas: - workspace: $JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY - name: wrapidx -metas: - name: indexer - workspace: $JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY diff --git a/tests/integration/ref_indexer/numpyhub/Dockerfile b/tests/integration/ref_indexer/numpyhub/Dockerfile deleted file mode 100644 index c2476d23f2f68..0000000000000 --- a/tests/integration/ref_indexer/numpyhub/Dockerfile +++ /dev/null @@ -1,7 +0,0 @@ -FROM jinaai/jina:test-pip - -# setup the workspace -COPY . /workspace/ -WORKDIR /workspace - -ENTRYPOINT ["jina", "pod", "--uses", "config.yml"] diff --git a/tests/integration/ref_indexer/numpyhub/config.yml b/tests/integration/ref_indexer/numpyhub/config.yml deleted file mode 100644 index e7c8ab584b96d..0000000000000 --- a/tests/integration/ref_indexer/numpyhub/config.yml +++ /dev/null @@ -1 +0,0 @@ -!NumpyIndexer \ No newline at end of file diff --git a/tests/integration/ref_indexer/numpyhub/manifest.yml b/tests/integration/ref_indexer/numpyhub/manifest.yml deleted file mode 100644 index 852cf89ed3785..0000000000000 --- a/tests/integration/ref_indexer/numpyhub/manifest.yml +++ /dev/null @@ -1,12 +0,0 @@ -manifest_version: 1 -name: DummyNumpyIndexer -kind: indexer -type: pod -description: testing -author: Jina AI Dev-Team (dev-team@jina.ai) -url: https://jina.ai -vendor: Jina AI Limited -documentation: https://github.com/jina-ai/jina-hub -version: 0.0.0 -license: apache-2.0 -keywords: [Some keywords to describe the executor, separated by commas] diff --git a/tests/integration/ref_indexer/query.yml b/tests/integration/ref_indexer/query.yml deleted file mode 100644 index 81b66aaa462b5..0000000000000 --- 
a/tests/integration/ref_indexer/query.yml +++ /dev/null @@ -1,12 +0,0 @@ -!Flow -version: '1' -pods: - - name: indexer - uses: $JINA_QUERY_USES - uses_internal: $JINA_QUERY_USES_INTERNAL - parallel: $JINA_TEST_REF_INDEXER_PARALLEL - volumes: $JINA_VOLUMES - timeout_ready: -1 - docker_kwargs: - environment: - JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY: $JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY diff --git a/tests/integration/ref_indexer/test_numpy_indexer_with_ref_indexer.py b/tests/integration/ref_indexer/test_numpy_indexer_with_ref_indexer.py deleted file mode 100644 index 19fdaf399ceca..0000000000000 --- a/tests/integration/ref_indexer/test_numpy_indexer_with_ref_indexer.py +++ /dev/null @@ -1,250 +0,0 @@ -import os -import shutil - -import numpy as np -import pytest - -from jina.flow import Flow -from jina import Document - -from tests import validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture -def uses_no_docker(): - os.environ['JINA_QUERY_USES'] = 'indexer_with_ref.yml' - os.environ['JINA_QUERY_USES_INTERNAL'] = '' - os.environ['JINA_QUERY_USES_COMPOUND'] = 'compound-indexer-with-ref.yml' - os.environ['JINA_QUERY_USES_COMPOUND_INTERNAL'] = '' - yield - del os.environ['JINA_QUERY_USES'] - del os.environ['JINA_QUERY_USES_COMPOUND'] - del os.environ['JINA_QUERY_USES_INTERNAL'] - del os.environ['JINA_QUERY_USES_COMPOUND_INTERNAL'] - - -@pytest.fixture -def parallel(request): - os.environ['JINA_TEST_REF_INDEXER_PARALLEL'] = str(request.param) - yield - del os.environ['JINA_TEST_REF_INDEXER_PARALLEL'] - - -@pytest.fixture -def index_docs(): - docs = [] - for idx in range(0, 100): - doc = Document() - doc.id = f'{idx:0>16}' - doc.embedding = doc.embedding = np.array([idx, idx]) - docs.append(doc) - return docs - - -@pytest.fixture -def random_workspace(tmpdir): - os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'] = str(tmpdir) - os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY'] = str(tmpdir) - yield - del 
os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER']
-    del os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY']
-
-
-@pytest.mark.parametrize('parallel', [1, 2], indirect=True)
-def test_indexer_with_ref_indexer(
-    random_workspace, parallel, index_docs, mocker, uses_no_docker
-):
-    top_k = 10
-    with Flow.load_config(os.path.join('index.yml')) as index_flow:
-        index_flow.index(inputs=index_docs, request_size=10)
-
-    mock = mocker.Mock()
-
-    def validate_response(resp):
-        assert len(resp.search.docs) == 1
-        assert len(resp.search.docs[0].matches) == top_k
-
-    query_document = Document()
-    query_document.embedding = np.array([1, 1])
-    with Flow.load_config(os.path.join('query.yml')) as query_flow:
-        query_flow.search(inputs=[query_document], on_done=mock, top_k=top_k)
-
-    mock.assert_called_once()
-    validate_callback(mock, validate_response)
-
-
-@pytest.mark.parametrize('parallel', [1, 2], indirect=True)
-def test_indexer_with_ref_indexer_compound(
-    random_workspace, parallel, index_docs, mocker, uses_no_docker
-):
-    top_k = 10
-    with Flow.load_config(os.path.join(cur_dir, 'compound-index.yml')) as index_flow:
-        index_flow.index(inputs=index_docs, request_size=10)
-
-    mock = mocker.Mock()
-
-    def validate_response(resp):
-        assert len(resp.search.docs) == 1
-        assert len(resp.search.docs[0].matches) == top_k
-
-    query_document = Document()
-    query_document.embedding = np.array([1, 1])
-    with Flow.load_config(os.path.join(cur_dir, 'compound-query.yml')) as query_flow:
-        query_flow.search(inputs=[query_document], on_done=mock, top_k=top_k)
-
-    mock.assert_called_once()
-    validate_callback(mock, validate_response)
-
-
-@pytest.fixture
-def random_workspace_move(tmpdir):
-    os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'] = str(tmpdir) + '/index'
-    os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY'] = str(tmpdir) + '/query'
-    yield
-    del os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER']
-    del os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY']
-
-
-@pytest.mark.parametrize('parallel', [1, 2], indirect=True)
-def test_indexer_with_ref_indexer_move(
-    random_workspace_move, parallel, index_docs, mocker, uses_no_docker
-):
-    top_k = 10
-    with Flow.load_config(os.path.join(cur_dir, 'index.yml')) as index_flow:
-        index_flow.index(inputs=index_docs, request_size=10)
-
-    mock = mocker.Mock()
-
-    shutil.copytree(
-        os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'],
-        os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY'],
-    )
-
-    shutil.rmtree(os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'])
-
-    def validate_response(resp):
-        assert len(resp.search.docs) == 1
-        assert len(resp.search.docs[0].matches) == top_k
-
-    query_document = Document()
-    query_document.embedding = np.array([1, 1])
-    with Flow.load_config(os.path.join(cur_dir, 'query.yml')) as query_flow:
-        query_flow.search(inputs=[query_document], on_done=mock, top_k=top_k)
-
-    mock.assert_called_once()
-    validate_callback(mock, validate_response)
-
-
-@pytest.mark.parametrize('parallel', [1, 2], indirect=True)
-def test_indexer_with_ref_indexer_compound_move(
-    random_workspace_move, parallel, index_docs, mocker, uses_no_docker
-):
-    top_k = 10
-    with Flow.load_config(os.path.join(cur_dir, 'compound-index.yml')) as index_flow:
-        index_flow.index(inputs=index_docs, request_size=10)
-
-    mock = mocker.Mock()
-
-    shutil.copytree(
-        os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'],
-        os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY'],
-    )
-
-    shutil.rmtree(os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'])
-
-    def validate_response(resp):
-        assert len(resp.search.docs) == 1
-        assert len(resp.search.docs[0].matches) == top_k
-
-    query_document = Document()
-    query_document.embedding = np.array([1, 1])
-    with Flow.load_config(os.path.join(cur_dir, 'compound-query.yml')) as query_flow:
-        query_flow.search(inputs=[query_document], on_done=mock, top_k=top_k)
-
-    mock.assert_called_once()
-    validate_callback(mock, validate_response)
-
-
-@pytest.fixture
-def
docker_image(): - from jina.parsers.hub import set_hub_build_parser - from jina.docker.hubio import HubIO - - args = set_hub_build_parser().parse_args([os.path.join(cur_dir, 'numpyhub')]) - HubIO(args).build() - - -@pytest.fixture -def uses_docker(docker_image): - from jina import __version__ as jina_version - - os.environ[ - 'JINA_QUERY_USES' - ] = f'docker://jinahub/pod.indexer.dummynumpyindexer:0.0.0-{jina_version}' - os.environ[ - 'JINA_QUERY_USES_COMPOUND' - ] = f'docker://jinahub/pod.indexer.dummynumpyindexer:0.0.0-{jina_version}' - os.environ['JINA_QUERY_USES_INTERNAL'] = 'indexer_with_ref.yml' - os.environ['JINA_QUERY_USES_COMPOUND_INTERNAL'] = 'compound-indexer-with-ref.yml' - yield - del os.environ['JINA_QUERY_USES'] - del os.environ['JINA_QUERY_USES_COMPOUND'] - - -@pytest.fixture -def random_workspace_in_docker(tmpdir): - os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'] = str(tmpdir) - os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY'] = '/docker-workspace' - os.environ['JINA_VOLUMES'] = f'{str(tmpdir)}:/docker-workspace' - yield - del os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER'] - del os.environ['JINA_TEST_INDEXER_WITH_REF_INDEXER_QUERY'] - del os.environ['JINA_VOLUMES'] - - -@pytest.mark.parametrize('parallel', [1, 2], indirect=True) -def test_indexer_with_ref_indexer_in_docker( - random_workspace_in_docker, parallel, index_docs, mocker, uses_docker -): - top_k = 10 - with Flow.load_config(os.path.join('index.yml')) as index_flow: - index_flow.index(inputs=index_docs, request_size=10) - - mock = mocker.Mock() - - def validate_response(resp): - assert len(resp.search.docs) == 1 - assert len(resp.search.docs[0].matches) == top_k - - query_document = Document() - query_document.embedding = np.array([1, 1]) - with Flow.load_config(os.path.join('query.yml')) as query_flow: - query_flow.search(inputs=[query_document], on_done=mock, top_k=top_k) - - mock.assert_called_once() - validate_callback(mock, validate_response) - - 
-@pytest.mark.parametrize('parallel', [1, 2], indirect=True)
-def test_indexer_with_ref_indexer_compound_in_docker(
-    random_workspace_in_docker, parallel, index_docs, mocker, uses_docker
-):
-    top_k = 10
-    with Flow.load_config(os.path.join(cur_dir, 'compound-index.yml')) as index_flow:
-        index_flow.index(inputs=index_docs, request_size=10)
-
-    mock = mocker.Mock()
-
-    def validate_response(resp):
-        assert len(resp.search.docs) == 1
-        assert len(resp.search.docs[0].matches) == top_k
-
-    query_document = Document()
-    query_document.embedding = np.array([1, 1])
-    with Flow.load_config(os.path.join(cur_dir, 'compound-query.yml')) as query_flow:
-        query_flow.search(inputs=[query_document], on_done=mock, top_k=top_k)
-
-    mock.assert_called_once()
-    validate_callback(mock, validate_response)
diff --git a/tests/integration/rolling_update/test_rolling_update.py b/tests/integration/rolling_update/test_rolling_update.py
index 9e97a2c5efbd6..a68cc6a15d6b8 100644
--- a/tests/integration/rolling_update/test_rolling_update.py
+++ b/tests/integration/rolling_update/test_rolling_update.py
@@ -1,13 +1,11 @@
+import collections
 import os
 import threading
 
-import pytest
 import numpy as np
-import collections
+import pytest
 
-from jina import Document
-from jina.executors.encoders import BaseEncoder
-from jina.flow import Flow
+from jina import Document, Flow, Executor, requests
 
 cur_dir = os.path.dirname(os.path.abspath(__file__))
 
@@ -27,21 +25,21 @@ def docs():
     ]
 
 
-class DummyMarkExecutor(BaseEncoder):
-    def get_docs(self, req_type):
-        if req_type == 'ControlRequest':
-            return []
-        driver = self._drivers[req_type][0]
-        return driver.docs
+class DummyMarkExecutor(Executor):
+    def __init__(self, *args, **kwargs):
+        super().__init__(*args, **kwargs)
+        self.metas.name = 'dummy'
 
-    def __call__(self, req_type, *args, **kwargs):
-        if req_type == 'ControlRequest':
-            for d in self._drivers[req_type]:
-                d()
-        else:
-            for doc in self.get_docs(req_type):
-                doc.tags['replica'] = self.replica_id
-                doc.tags['shard'] = self.pea_id
+    @requests
+    def foo(self, docs, *args, **kwargs):
+        for doc in docs:
+            doc.tags['replica'] = self.runtime_args.replica_id
+            doc.tags['shard'] = self.runtime_args.pea_id
+
+    def close(self) -> None:
+        import os
+
+        os.makedirs(self.workspace, exist_ok=True)
 
 
 def test_normal(docs):
@@ -50,12 +48,12 @@ def test_normal(docs):
     doc_id_path = collections.OrderedDict()
 
     def handle_search_result(resp):
-        for doc in resp.search.docs:
+        for doc in resp.data.docs:
             doc_id_path[int(doc.id)] = (doc.tags['replica'], doc.tags['shard'])
 
     flow = Flow().add(
         name='pod1',
-        uses='!DummyMarkExecutor',
+        uses=DummyMarkExecutor,
         replicas=NUM_REPLICAS,
         parallel=NUM_SHARDS,
     )
@@ -130,7 +128,7 @@ def update_rolling(flow, pod_name):
     error_mock = mocker.Mock()
     with Flow().add(
         name='pod1',
-        uses='!DummyMarkExecutor',
+        uses=DummyMarkExecutor,
         replicas=2,
         parallel=3,
         timeout_ready=5000,
@@ -155,7 +153,8 @@ def update_rolling(flow, pod_name):
 def test_workspace(config, tmpdir, docs):
     with Flow().add(
         name='pod1',
-        uses=os.path.join(cur_dir, 'yaml/simple_index_vector.yml'),
+        uses=DummyMarkExecutor,
+        workspace=str(tmpdir),
         replicas=2,
         parallel=3,
     ) as flow:
@@ -163,17 +162,15 @@ def test_workspace(config, tmpdir, docs):
         for i in range(10):
             flow.index(docs)
 
-        # validate created workspaces
-        dirs = set(os.listdir(tmpdir))
-        expected_dirs = {
-            'vecidx-0-0',
-            'vecidx-0-1',
-            'vecidx-0-2',
-            'vecidx-1-0',
-            'vecidx-1-1',
-            'vecidx-1-2',
+    # validate created workspaces
+    assert set(os.listdir(str(tmpdir))) == {'dummy'}
+    assert set(os.listdir(os.path.join(tmpdir, 'dummy'))) == {'0', '1'}
+    for replica_id in {'0', '1'}:
+        assert set(os.listdir(os.path.join(tmpdir, 'dummy', replica_id))) == {
+            '0',
+            '1',
+            '2',
         }
-        assert dirs == expected_dirs
 
 
 @pytest.mark.parametrize(
diff --git a/tests/integration/sharding/__init__.py b/tests/integration/sharding/__init__.py
deleted file mode 100644
index e69de29bb2d1d..0000000000000
diff --git a/tests/integration/sharding/test_search_non_existent.py b/tests/integration/sharding/test_search_non_existent.py
deleted file mode 100644
index 5c81239eeb301..0000000000000
--- a/tests/integration/sharding/test_search_non_existent.py
+++ /dev/null
@@ -1,59 +0,0 @@
-import os
-import random
-import string
-
-import numpy as np
-import pytest
-
-from jina import Document, Flow
-
-from tests import validate_callback
-
-random.seed(0)
-np.random.seed(0)
-
-cur_dir = os.path.dirname(os.path.abspath(__file__))
-
-
-@pytest.fixture
-def config(tmpdir):
-    os.environ['JINA_SHARDING_DIR'] = str(tmpdir)
-    yield
-    del os.environ['JINA_SHARDING_DIR']
-
-
-def random_docs(start, end, embed_dim=10):
-    for j in range(start, end):
-        d = Document()
-        d.id = f'{j:0>16}'
-        d.tags['id'] = j
-        d.text = ''.join(
-            random.choice(string.ascii_lowercase) for _ in range(10)
-        ).encode('utf8')
-        d.embedding = np.random.random([embed_dim])
-        yield d
-
-
-def test_search_non_existent(config, mocker):
-    yaml_file = 'index_kv_simple.yml'
-
-    def validate_results(resp):
-        assert len(resp.docs) == 3
-
-    with Flow().add(
-        uses=os.path.join(cur_dir, 'yaml', yaml_file),
-        shards=2,
-    ) as index_flow:
-        index_flow.index(inputs=random_docs(0, 3), request_size=1)
-
-    mock = mocker.Mock()
-    with Flow(read_only=True).add(
-        uses=os.path.join(cur_dir, 'yaml', yaml_file),
-        shards=2,
-        uses_after='_merge_root',
-        polling='all',
-    ) as search_flow:
-        search_flow.search(inputs=random_docs(0, 5), on_done=mock, request_size=5)
-
-    mock.assert_called_once()
-    validate_callback(mock, validate_results)
diff --git a/tests/integration/sharding/test_sharding.py b/tests/integration/sharding/test_sharding.py
deleted file mode 100644
index a74fce0d9e237..0000000000000
--- a/tests/integration/sharding/test_sharding.py
+++ /dev/null
@@ -1,272 +0,0 @@
-import os
-import random
-import string
-from pathlib import Path
-
-import numpy as np
-import pytest
-
-from jina import Document
-from jina.executors.indexers import BaseIndexer
-from jina.flow import Flow
-
-from tests import validate_callback
-
-random.seed(0)
-np.random.seed(0)
-
-cur_dir = os.path.dirname(os.path.abspath(__file__))
-
-
-def get_index_flow(yaml_file, num_shards):
-    f = Flow().add(
-        uses=os.path.join(cur_dir, 'yaml', yaml_file),
-        shards=num_shards,
-    )
-    return f
-
-
-def get_delete_flow(yaml_file, num_shards):
-    f = Flow().add(
-        uses=os.path.join(cur_dir, 'yaml', yaml_file),
-        shards=num_shards,
-        polling='all',
-    )
-    return f
-
-
-def get_update_flow(yaml_file, num_shards):
-    f = Flow().add(
-        uses=os.path.join(cur_dir, 'yaml', yaml_file),
-        shards=num_shards,
-        polling='all',
-    )
-    return f
-
-
-def get_search_flow(yaml_file, num_shards, uses_after='_merge_matches_topk'):
-    f = Flow(read_only=True).add(
-        uses=os.path.join(cur_dir, 'yaml', yaml_file),
-        shards=num_shards,
-        uses_after=uses_after,
-        polling='all',
-        timeout_ready='-1',
-    )
-    return f
-
-
-@pytest.fixture
-def config(tmpdir):
-    os.environ['JINA_SHARDING_DIR'] = str(tmpdir)
-    os.environ['JINA_TOPK'] = '10'
-    yield
-    del os.environ['JINA_SHARDING_DIR']
-    del os.environ['JINA_TOPK']
-
-
-def random_docs(start, end, embed_dim=10):
-    for j in range(start, end):
-        d = Document()
-        d.id = f'{j:0>16}'
-        d.tags['id'] = j
-        for i in range(5):
-            c = Document()
-            c.id = f'{j:0>16}'
-            d.text = ''.join(
-                random.choice(string.ascii_lowercase) for _ in range(10)
-            ).encode('utf8')
-            d.embedding = np.random.random([embed_dim])
-            d.chunks.append(c)
-        d.text = ''.join(
-            random.choice(string.ascii_lowercase) for _ in range(10)
-        ).encode('utf8')
-        d.embedding = np.random.random([embed_dim])
-        yield d
-
-
-def validate_index_size(expected_count, index_name):
-    path = Path(os.environ['JINA_SHARDING_DIR'])
-    index_files = list(path.glob(f'{index_name}.bin')) + list(
-        path.glob(f'*/{index_name}.bin')
-    )
-    assert len(index_files) > 0
-    actual_count_list = []
-    assert len(index_files) > 0
-    count_sum = 0
-    for index_file in index_files:
-        index = BaseIndexer.load(str(index_file))
-        count_sum += index.size
-    actual_count_list.sort()
-    assert count_sum == expected_count
-
-
-@pytest.mark.parametrize('num_shards', (1, 2, 3, 10))
-@pytest.mark.parametrize(
-    'index_conf, index_names',
-    [['index.yml', ['kvidx', 'vecidx']], ['index_vector.yml', ['vecidx']]],
-)
-def test_delete_vector(config, mocker, index_conf, index_names, num_shards):
-    def _validate_result_factory(num_matches):
-        def _validate_results(resp):
-            assert len(resp.docs) == 7
-            for doc in resp.docs:
-                assert len(doc.matches) == num_matches
-
-        return _validate_results
-
-    with get_index_flow(index_conf, num_shards) as index_flow:
-        index_flow.index(inputs=random_docs(0, 201), request_size=100)
-
-    for index_name in index_names:
-        validate_index_size(201, index_name)
-
-    with get_delete_flow(index_conf, num_shards) as index_flow:
-        index_flow.delete(ids=[d.id for d in random_docs(0, 30)], request_size=100)
-
-    with get_delete_flow(index_conf, num_shards) as index_flow:
-        index_flow.delete(ids=[d.id for d in random_docs(100, 150)], request_size=100)
-
-    for index_name in index_names:
-        validate_index_size(121, index_name)
-
-    mock = mocker.Mock()
-    with get_search_flow(index_conf, num_shards) as search_flow:
-        search_flow.search(inputs=random_docs(28, 35), on_done=mock, request_size=100)
-    mock.assert_called_once()
-    validate_callback(mock, _validate_result_factory(10))
-
-
-@pytest.mark.parametrize('num_shards', (1, 2, 3, 10))
-def test_delete_kv(config, mocker, num_shards):
-    index_conf = 'index_kv.yml'
-    index_name = 'kvidx'
-
-    def _validate_result_factory(num_matches):
-        def _validate_results(resp):
-            assert len(resp.docs) == num_matches
-
-        return _validate_results
-
-    with get_index_flow(index_conf, num_shards) as index_flow:
-        index_flow.index(inputs=random_docs(0, 201), request_size=100)
-
-    validate_index_size(201, index_name)
-
-    with get_delete_flow(index_conf, num_shards) as delete_flow:
-        delete_flow.delete(ids=[d.id for d in random_docs(0, 30)], request_size=100)
-
-    with get_delete_flow(index_conf, num_shards) as delete_flow:
-        delete_flow.delete(ids=[d.id for d in random_docs(100, 150)], request_size=100)
-
-    validate_index_size(121, index_name)
-
-    mock = mocker.Mock()
-    with get_search_flow(index_conf, num_shards, '_merge_root') as search_flow:
-        search_flow.search(inputs=random_docs(28, 35), on_done=mock, request_size=100)
-    mock.assert_called_once()
-    validate_callback(mock, _validate_result_factory(5))
-
-
-@pytest.mark.parametrize(
-    'num_shards',
-    (1, 2, 3, 10),
-)
-@pytest.mark.parametrize(
-    'index_conf, index_names',
-    [['index.yml', ['kvidx', 'vecidx']], ['index_vector.yml', ['vecidx']]],
-)
-def test_update_vector(config, mocker, index_conf, index_names, num_shards):
-    docs_before = list(random_docs(0, 201))
-    docs_updated = list(random_docs(0, 210))
-    hash_set_before = [hash(d.embedding.tobytes()) for d in docs_before]
-    hash_set_updated = [hash(d.embedding.tobytes()) for d in docs_updated]
-
-    def _validate_result_factory():
-        def _validate_results(resp):
-            assert len(resp.docs) == 1
-            for doc in resp.docs:
-                assert len(doc.matches) == 10
-                for match in doc.matches:
-                    h = hash(match.embedding.tobytes())
-                    assert h not in hash_set_before
-                    assert h in hash_set_updated
-
-        return _validate_results
-
-    with get_index_flow(index_conf, num_shards) as index_flow:
-        index_flow.index(inputs=docs_before, request_size=100)
-
-    for index_name in index_names:
-        validate_index_size(201, index_name)
-
-    with get_update_flow(index_conf, num_shards) as update_flow:
-        update_flow.update(inputs=docs_updated, request_size=100)
-
-    for index_name in index_names:
-        validate_index_size(201, index_name)
-
-    mock = mocker.Mock()
-
-    with get_search_flow(index_conf, num_shards) as search_flow:
-        search_flow.search(inputs=random_docs(0, 1), on_done=mock, request_size=100)
-    mock.assert_called_once()
-    validate_callback(mock, _validate_result_factory())
-
-
-@pytest.mark.parametrize('num_shards', (1, 2, 3, 10))
-def test_update_kv(config, mocker, num_shards):
-    index_conf = 'index_kv.yml'
-    index_name = 'kvidx'
-
-    docs_before = list(random_docs(0, 201))
-    docs_updated = list(random_docs(190, 210))
-    hash_set_before = [hash(d.embedding.tobytes()) for d in docs_before]
-    hash_set_updated = [hash(d.embedding.tobytes()) for d in docs_updated]
-
-    def _validate_results_1(resp):
-        assert len(resp.docs) == 100
-        for i, doc in enumerate(resp.docs):
-            h = hash(doc.embedding.tobytes())
-
-            assert h in hash_set_before
-            assert h not in hash_set_updated
-
-    def _validate_results_2(resp):
-        assert len(resp.docs) == 100
-        for i, doc in enumerate(resp.docs):
-            h = hash(doc.embedding.tobytes())
-            if i < 90:
-                assert h in hash_set_before
-                assert h not in hash_set_updated
-            else:
-                assert h not in hash_set_before
-                assert h in hash_set_updated
-
-    def _validate_results_3(resp):
-        assert len(resp.docs) == 1
-        h = hash(resp.docs[0].embedding.tobytes())
-        assert h not in hash_set_before
-        assert h in hash_set_updated
-
-    with get_index_flow(index_conf, num_shards) as index_flow:
-        index_flow.index(inputs=docs_before, request_size=100)
-
-    validate_index_size(201, index_name)
-
-    with get_update_flow(index_conf, num_shards) as update_flow:
-        update_flow.update(inputs=docs_updated, request_size=100)
-
-    validate_index_size(201, index_name)
-
-    for start, end, validate_results in (
-        (0, 100, _validate_results_1),
-        (100, 200, _validate_results_2),
-        (200, 201, _validate_results_3),
-    ):
-        mock = mocker.Mock()
-        with get_search_flow(index_conf, num_shards, '_merge_root') as search_flow:
-            search_flow.search(
-                inputs=random_docs(start, end), on_done=mock, request_size=100
-            )
-        validate_callback(mock, validate_results)
-        mock.assert_called_once()
diff --git a/tests/integration/sharding/yaml/index.yml b/tests/integration/sharding/yaml/index.yml
deleted file mode 100644
index 04a566656bb28..0000000000000
--- a/tests/integration/sharding/yaml/index.yml
+++ /dev/null
@@ -1,57 +0,0 @@
-!CompoundIndexer
-components:
-  - !NumpyIndexer
-    with:
-      index_filename: vec.gz
-      metric: cosine
-    metas:
-      name: vecidx
-      workspace: $JINA_SHARDING_DIR
-  - !BinaryPbIndexer
-    with:
-      index_filename: doc.gz
-    metas:
-      name: kvidx
-      workspace: $JINA_SHARDING_DIR
-metas:
-  name: chunk_indexer
-  workspace: $JINA_SHARDING_DIR
-requests:
-  on:
-    UpdateRequest:
-      - !VectorIndexDriver
-        with:
-          method: update
-          executor: vecidx
-          traversal_paths: [ 'r' ]
-      - !KVIndexDriver
-        with:
-          method: update
-          executor: kvidx
-          traversal_paths: [ 'r' ]
-    DeleteRequest:
-      - !DeleteDriver
-        with:
-          executor: vecidx
-      - !DeleteDriver
-        with:
-          executor: kvidx
-    IndexRequest:
-      - !VectorIndexDriver
-        with:
-          executor: vecidx
-          traversal_paths: ['r']
-      - !KVIndexDriver
-        with:
-          executor: kvidx
-          traversal_paths: ['r']
-    SearchRequest:
-      - !VectorSearchDriver
-        with:
-          executor: vecidx
-          top_k: $JINA_TOPK
-          traversal_paths: ['r']
-      - !KVSearchDriver
-        with:
-          executor: kvidx
-          traversal_paths: ['m']
diff --git a/tests/integration/sharding/yaml/index_kv.yml b/tests/integration/sharding/yaml/index_kv.yml
deleted file mode 100644
index 47998e9a11858..0000000000000
--- a/tests/integration/sharding/yaml/index_kv.yml
+++ /dev/null
@@ -1,29 +0,0 @@
-!BinaryPbIndexer
-with:
-  index_filename: doc.gzip
-metas:
-  name: kvidx
-  workspace: $JINA_SHARDING_DIR
-
-requests:
-  on:
-    IndexRequest:
-      - !KVIndexDriver
-        with:
-          executor: kvidx
-          traversal_paths: ['r']
-    SearchRequest:
-      - !KVSearchDriver
-        with:
-          executor: kvidx
-          is_update: false
-          traversal_paths: ['r']
-
-    UpdateRequest:
-      - !KVIndexDriver
-        with:
-          method: update
-          executor: kvidx
-          traversal_paths: ['r']
-    DeleteRequest:
-      - !DeleteDriver { }
\ No newline at end of file
diff --git a/tests/integration/sharding/yaml/index_kv_simple.yml b/tests/integration/sharding/yaml/index_kv_simple.yml
deleted file mode 100644
index e3c0348292b91..0000000000000
--- a/tests/integration/sharding/yaml/index_kv_simple.yml
+++ /dev/null
@@ -1,20 +0,0 @@
-!BinaryPbIndexer
-with:
-  index_filename: doc.gzip
-metas:
-  name: kvidx
-  workspace: $JINA_SHARDING_DIR
-
-requests:
-  on:
-    IndexRequest:
-      - !KVIndexDriver
-        with:
-          executor: kvidx
-          traversal_paths: ['r']
-    SearchRequest:
-      - !KVSearchDriver
-        with:
-          executor: kvidx
-          is_update: false
-          traversal_paths: ['r']
\ No newline at end of file
diff --git a/tests/integration/sharding/yaml/index_vector.yml b/tests/integration/sharding/yaml/index_vector.yml
deleted file mode 100644
index 18712ca3f93a5..0000000000000
--- a/tests/integration/sharding/yaml/index_vector.yml
+++ /dev/null
@@ -1,30 +0,0 @@
-!NumpyIndexer
-with:
-  index_filename: vec.gz
-  metric: cosine
-metas:
-  name: vecidx
-  workspace: $JINA_SHARDING_DIR
-
-requests:
-  on:
-    UpdateRequest:
-      - !VectorIndexDriver
-        with:
-          method: update
-          executor: vecidx
-          traversal_paths: [ 'r' ]
-    DeleteRequest:
-      - !DeleteDriver {}
-    IndexRequest:
-      - !VectorIndexDriver
-        with:
-          executor: vecidx
-          traversal_paths: ['r']
-    SearchRequest:
-      - !VectorSearchDriver
-        with:
-          executor: vecidx
-          top_k: $JINA_TOPK
-          fill_embedding: True
-          traversal_paths: ['r']
\ No newline at end of file
diff --git a/tests/integration/sparse_pipeline/__init__.py b/tests/integration/sparse_pipeline/__init__.py
deleted file mode 100644
index e69de29bb2d1d..0000000000000
diff --git a/tests/integration/sparse_pipeline/indexer.yml b/tests/integration/sparse_pipeline/indexer.yml
deleted file mode 100644
index dae5e49881f57..0000000000000
--- a/tests/integration/sparse_pipeline/indexer.yml
+++ /dev/null
@@ -1,11 +0,0 @@
-!DummyCSRSparseIndexer
-requests:
-  on:
-    ControlRequest:
-      - !ControlReqDriver {}
-    SearchRequest:
-      - !VectorSearchDriver
-        with:
-          fill_embedding: True
-    IndexRequest:
-      - !VectorIndexDriver {}
\ No newline at end of file
diff --git a/tests/integration/sparse_pipeline/test_sparse_pipeline.py b/tests/integration/sparse_pipeline/test_sparse_pipeline.py
deleted file mode 100644
index 32d814525c367..0000000000000
--- a/tests/integration/sparse_pipeline/test_sparse_pipeline.py
+++ /dev/null
@@ -1,116 +0,0 @@
-from typing import Any, Iterable
-import os
-
-import pytest
-import numpy as np
-from scipy import sparse
-
-from jina import Flow, Document
-from jina.types.arrays import DocumentArray
-from jina.executors.encoders import BaseEncoder
-from jina.executors.indexers import BaseVectorIndexer
-
-from tests import validate_callback
-
-cur_dir = os.path.dirname(os.path.abspath(__file__))
-
-
-@pytest.fixture(scope='function')
-def num_docs():
-    return 10
-
-
-@pytest.fixture(scope='function')
-def docs_to_index(num_docs):
-    docs = []
-    for idx in range(1, num_docs + 1):
-        doc = Document(id=str(idx), content=np.array([idx * 5]))
-        docs.append(doc)
-    return DocumentArray(docs)
-
-
-class DummySparseEncoder(BaseEncoder):
-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-
-    def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any:
-        embed = sparse.csr_matrix(content)
-        return embed
-
-
-class DummyCSRSparseIndexer(BaseVectorIndexer):
-    embedding_cls_type = 'scipy_csr'
-
-    def __init__(self, *args, **kwargs):
-        super().__init__(*args, **kwargs)
-        self.keys = []
-        self.vectors = {}
-
-    def add(
-        self, keys: Iterable[str], vectors: 'scipy.sparse.csr_matrix', *args, **kwargs
-    ) -> None:
-        assert isinstance(vectors, sparse.csr_matrix)
-        self.keys.extend(keys)
-        for i, key in enumerate(keys):
-            self.vectors[key] = vectors.getrow(i)
-
-    def query(self, vectors: 'scipy.sparse.csr_matrix', top_k: int, *args, **kwargs):
-        assert isinstance(vectors, sparse.csr_matrix)
-        distances = [item for item in range(0, min(top_k, len(self.keys)))]
-        return [self.keys[:top_k]], np.array([distances])
-
-    def query_by_key(self, keys: Iterable[str], *args, **kwargs):
-        from scipy.sparse import vstack
-
-        vectors = []
-        for key in keys:
-            vectors.append(self.vectors[key])
-
-        return vstack(vectors)
-
-    def save(self):
-        # avoid creating dump, do not polute workspace
-        pass
-
-    def close(self):
-        # avoid creating dump, do not polute workspace
-        pass
-
-    def get_create_handler(self):
-        pass
-
-    def get_write_handler(self):
-        pass
-
-    def get_add_handler(self):
-        pass
-
-    def get_query_handler(self):
-        pass
-
-
-def test_sparse_pipeline(mocker, docs_to_index):
-    def validate(response):
-        assert len(response.docs) == 1
-        assert len(response.docs[0].matches) == 10
-        for doc in response.docs:
-            for i, match in enumerate(doc.matches):
-                assert match.id == docs_to_index[i].id
-                assert isinstance(match.embedding, sparse.coo_matrix)
-
-    f = (
-        Flow()
-        .add(uses=DummySparseEncoder)
-        .add(uses=os.path.join(cur_dir, 'indexer.yml'))
-    )
-
-    mock = mocker.Mock()
-    error_mock = mocker.Mock()
-
-    with f:
-        f.index(inputs=docs_to_index)
-        f.search(inputs=docs_to_index[0], on_done=mock, on_error=error_mock)
-
-    mock.assert_called_once()
-    validate_callback(mock, validate)
-    error_mock.assert_not_called()
diff --git a/jina/jaml/parsers/driver/__init__.py b/tests/integration/v2_api/__init__.py
similarity index 100%
rename from jina/jaml/parsers/driver/__init__.py
rename to tests/integration/v2_api/__init__.py
diff --git a/tests/integration/v2_api/test_func_routing.py b/tests/integration/v2_api/test_func_routing.py
new file mode 100644
index 0000000000000..bdeef61c3074e
--- /dev/null
+++ b/tests/integration/v2_api/test_func_routing.py
@@ -0,0 +1,178 @@
+from jina import Flow, Document, Executor, requests, DocumentArray
+
+
+def test_func_simple_routing(mocker):
+    class MyExecutor(Executor):
+        @requests(on='/search')
+        def foo(self, **kwargs):
+            for j in ('docs', 'groundtruths', 'parameters'):
+                assert j in kwargs
+            assert len(kwargs['docs']) == 3
+
+    f = Flow().add(uses=MyExecutor)
+
+    done_mock = mocker.Mock()
+    fail_mock = mocker.Mock()
+
+    with f:
+        f.post(
+            on='/search',
+            inputs=[Document() for _ in range(3)],
+            parameters={'hello': 'world', 'topk': 10},
+            on_done=done_mock,
+            on_error=fail_mock,
+        )
+
+    done_mock.assert_called_once()
+    fail_mock.assert_not_called()
+
+    done_mock = mocker.Mock()
+    fail_mock = mocker.Mock()
+
+    with f:
+        f.post(
+            on='/random',
+            inputs=[Document() for _ in range(3)],
+            parameters={'hello': 'world', 'topk': 10},
+            on_done=done_mock,
+            on_error=fail_mock,
+        )
+
+    fail_mock.assert_not_called()
+    done_mock.assert_called_once()
+
+
+def test_func_default_routing():
+    class MyExecutor(Executor):
+        @requests
+        def foo(self, **kwargs):
+            for j in ('docs', 'groundtruths', 'parameters'):
+                assert j in kwargs
+            assert len(kwargs['docs']) == 3
+
+    f = Flow().add(uses=MyExecutor)
+
+    with f:
+        f.post(
+            on='/some_endpoint',
+            inputs=[Document() for _ in range(3)],
+            parameters={'hello': 'world', 'topk': 10},
+        )
+
+
+def test_func_return_():
+    class MyExecutor(Executor):
+        @requests
+        def foo(self, **kwargs):
+            return DocumentArray([Document(), Document()])
+
+    f = Flow().add(uses=MyExecutor)
+
+    with f:
+        f.post(
+            on='/some_endpoint',
+            inputs=[Document() for _ in range(3)],
+            parameters={'hello': 'world', 'topk': 10},
+            on_done=print,
+        )
+
+
+def test_func_joiner(mocker):
+    class Joiner(Executor):
+        @requests
+        def foo(self, docs, **kwargs):
+            for d in docs:
+                d.text += '!!!'
+            return docs
+
+    class M1(Executor):
+        @requests
+        def foo(self, docs, **kwargs):
+            for idx, d in enumerate(docs):
+                d.text = f'hello {idx}'
+
+    class M2(Executor):
+        @requests
+        def foo(self, docs, **kwargs):
+            for idx, d in enumerate(docs):
+                d.text = f'world {idx}'
+
+    f = (
+        Flow()
+        .add(uses=M1)
+        .add(uses=M2, needs='gateway')
+        .add(uses=Joiner, needs=['pod0', 'pod1'])
+    )
+
+    mock = mocker.Mock()
+
+    def validate(req):
+        texts = {d.text for d in req.docs}
+        assert len(texts) == 6
+        expect = {
+            'hello 0!!!',
+            'hello 1!!!',
+            'hello 2!!!',
+            'world 0!!!',
+            'world 1!!!',
+            'world 2!!!',
+        }
+        assert texts == expect
+        mock()
+
+    with f:
+        f.post(
+            on='/some_endpoint',
+            inputs=[Document() for _ in range(3)],
+            parameters={'hello': 'world', 'topk': 10},
+            on_done=validate,
+        )
+
+    mock.assert_called_once()
+
+
+def test_dealer_routing(mocker):
+    f = Flow().add(parallel=3)
+    mock = mocker.Mock()
+    with f:
+        f.post(
+            on='/some_endpoint',
+            inputs=[Document() for _ in range(100)],
+            request_size=2,
+            on_done=mock,
+        )
+
+    mock.assert_called()
+
+
+def test_target_peapod(mocker):
+    class Foo(Executor):
+        @requests(on='/hello')
+        def foo(self, **kwargs):
+            pass
+
+    class Bar(Executor):
+        @requests(on='/bye')
+        def bar(self, **kwargs):
+            pass
+
+    f = Flow().add(name='p0', uses=Foo).add(name='p1', uses=Bar)
+
+    with f:
+        success_mock = mocker.Mock()
+        fail_mock = mocker.Mock()
+        f.post(
+            '/hello',
+            target_peapod='p0',
+            inputs=Document(),
+            on_done=success_mock,
+            on_error=fail_mock,
+        )
+        success_mock.assert_called()
+        fail_mock.assert_not_called()
+
+        success_mock = mocker.Mock()
+        fail_mock = mocker.Mock()
+        f.post('/hello', inputs=Document(), on_done=success_mock, on_error=fail_mock)
+        success_mock.assert_called()
+        fail_mock.assert_not_called()
diff --git a/tests/integration/v2_api/test_returns.py b/tests/integration/v2_api/test_returns.py
new file mode 100644
index 0000000000000..af7be05a00bff
--- /dev/null
+++ b/tests/integration/v2_api/test_returns.py
@@ -0,0 +1,48 @@
+import pytest
+
+from jina import Document, DocumentArray, Executor, Flow, requests
+from tests import validate_callback
+
+
+@pytest.fixture()
+def test_docs():
+    return DocumentArray([Document(id='1')])
+
+
+def test_different_responses(test_docs, mocker):
+    def assert_response(response):
+        assert len(response.data.docs) == 1
+        assert response.data.docs[0].id == '1'
+
+    class MyExecutor(Executor):
+
+        @requests(on='/return_docs')
+        def return_docs(self, docs, *args, **kwargs):
+            return docs
+
+        @requests(on='/return_none')
+        def return_none(self, docs, *args, **kwargs):
+            return None
+
+        @requests(on='/return_copy')
+        def return_copy(self, docs, *args, **kwargs):
+            import copy
+
+            return copy.copy(docs)
+
+        @requests(on='/return_deepcopy')
+        def return_deep_copy(self, docs, *args, **kwargs):
+            import copy
+
+            return copy.deepcopy(docs)
+
+    mock = mocker.Mock()
+    with Flow().add(uses=MyExecutor) as flow:
+        flow.post(inputs=test_docs, on='/return_docs', on_done=mock)
+        validate_callback(mock, assert_response)
+        flow.post(inputs=test_docs, on='/return_none', on_done=mock)
+        validate_callback(mock, assert_response)
+        flow.post(inputs=test_docs, on='/return_copy', on_done=mock)
+        validate_callback(mock, assert_response)
+        flow.post(inputs=test_docs, on='/return_deepcopy', on_done=mock)
+        validate_callback(mock, assert_response)
diff --git a/tests/integration/v2_api/test_yaml_dump_load.py b/tests/integration/v2_api/test_yaml_dump_load.py
new file mode 100644
index 0000000000000..a720afa78f763
--- /dev/null
+++ b/tests/integration/v2_api/test_yaml_dump_load.py
@@ -0,0 +1,79 @@
+import os
+
+import pytest
+
+from jina import Executor, requests, Flow, Document
+
+
+class MyExec(Executor):
+    def __init__(self, bar: str, bar2: int = 3, **kwargs):
+        super().__init__(**kwargs)
+        self.bar = bar
+        self.bar2 = bar2
+
+    @requests(on=['/foo', '/foo2'])
+    def foo(self, docs, **kwargs):
+        print(f'foo: {kwargs}')
+        print(self.workspace)
+        for d in docs:
+            d.text = '/foo'
+
+    @requests
+    def bar(self, docs, **kwargs):
+        print(f'bar: {kwargs}')
+        for d in docs:
+            d.text = '/bar'
+
+    def random(self, docs, **kwargs):
+        print(f'random: {kwargs}')
+        for d in docs:
+            d.text = '/random'
+
+
+y = """
+jtype: MyExec
+with:
+  bar: hello
+  bar2: 1
+metas:
+  name: my-awesomeness
+  description: this is an awesome executor
+requests:
+  /foo_endpoint: foo
+  /random_endpoint: random
+"""
+
+
+def test_load_save_yml(tmp_path):
+    m = Executor.load_config(y)
+    m.save_config(os.path.join(tmp_path, 'a.yml'))
+
+    assert m.bar == 'hello'
+    assert m.bar2 == 1
+    assert m.metas.name == 'my-awesomeness'
+    for k in ('/foo', '/foo2', '/default', '/foo_endpoint', '/random_endpoint'):
+        assert k in m.requests
+
+
+@pytest.mark.parametrize(
+    'req_endpoint, doc_text',
+    [
+        ('/foo', '/foo'),
+        ('/foo2', '/foo'),
+        ('/foo3', '/bar'),
+        ('/foo_endpoint', '/foo'),
+        ('/random_endpoint', '/random'),
+        ('/bar', '/bar'),
+    ],
+)
+def test_load_yaml_route(mocker, req_endpoint, doc_text):
+    f = Flow().add(uses=y)
+
+    mock = mocker.Mock()
+
+    def validate(req):
+        mock()
+        assert req.docs[0].text == doc_text
+
+    with f:
+        f.post(req_endpoint, Document(), on_done=validate)
diff --git a/tests/jinahub/app.py b/tests/jinahub/app.py
index 36ba35434ed42..fc478945df3f5 100644
--- a/tests/jinahub/app.py
+++ b/tests/jinahub/app.py
@@ -1,6 +1,3 @@
-__copyright__ = "Copyright (c) 2020 Jina AI Limited. All rights reserved."
-__license__ = "Apache-2.0"
-
 from jina.flow import Flow
 
 if __name__ == '__main__':
diff --git a/tests/jinahub/hub_mwu/mwu_encoder.py b/tests/jinahub/hub_mwu/mwu_encoder.py
index ff74b8d7897c3..7146535ce7ab3 100644
--- a/tests/jinahub/hub_mwu/mwu_encoder.py
+++ b/tests/jinahub/hub_mwu/mwu_encoder.py
@@ -1,16 +1,13 @@
 from typing import Any
 
-import numpy as np
+from jina import Executor, requests
 
-from jina.executors.encoders import BaseEncoder
-
 
-class MWUEncoder(BaseEncoder):
+class MWUEncoder(Executor):
     def __init__(self, greetings: str, *args, **kwargs):
         super().__init__(*args, **kwargs)
         self._greetings = greetings
-        self.logger.success(f'look at me! {greetings}')
 
-    def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any:
-        self.logger.info(f'{self._greetings} {content}')
-        return np.random.random([content.shape[0], 3])
+    @requests
+    def encode(self, **kwargs) -> Any:
+        pass
diff --git a/tests/jinahub/test_integration.sh b/tests/jinahub/test_integration.sh
index 928f2b24a91e7..cc03932bf2e48 100755
--- a/tests/jinahub/test_integration.sh
+++ b/tests/jinahub/test_integration.sh
@@ -16,10 +16,12 @@ sleep 10
 
 RESPONSE=$(curl --request POST -d '{"top_k": 10, "data": ["text:hey, dude"]}' -H 'Content-Type: application/json' '0.0.0.0:45678/index')
 
-TEXT_RESPONSE=$(echo $RESPONSE | jq -e ".index.docs[] | .text")
-
 echo "Response is: ${RESPONSE}"
 
+TEXT_RESPONSE=$(echo $RESPONSE | jq -e ".data.docs[] | .text")
+
+echo "Text Response is: ${TEXT_RESPONSE}"
+
 # remove the new pods
 docker ps -a | awk '{ print $1,$2 }' | grep hubpod:test | awk '{print $1 }' | xargs -I {} docker rm -f {}
 docker rm -f $CONTAINER_ID
diff --git a/tests/system/fashion/test_fashion.py b/tests/system/fashion/test_fashion.py
index 4e2eb442d5262..fce10340e734c 100644
--- a/tests/system/fashion/test_fashion.py
+++ b/tests/system/fashion/test_fashion.py
@@ -1,14 +1,11 @@
 import os
 
 import pytest
-import numpy as np
-from pkg_resources import resource_filename
 
-from jina import Document
-from jina.flow import Flow
-from jina.helloworld.fashion import hello_world
+import jina
+from jina.helloworld.fashion.app import hello_world
+from jina.helloworld.fashion.executors import *
 from jina.parsers.helloworld import set_hw_parser
-from tests import validate_callback
 
 
 def check_hello_world_results(html_path: str):
@@ -20,10 +17,10 @@ def check_hello_world_results(html_path: str):
     soup = BeautifulSoup(page)
     table = soup.find('table')
     rows = table.find_all('tr')
-    assert len(rows) == 129
+    assert len(rows) > 1
     for row in rows[1:]:
         cols = row.find_all('img')
-        assert len(cols) == 51  # query + results
+        assert len(cols) > 1  # query + results
 
     evaluation = soup.find_all('h3')[0].text
     assert 'Precision@50' in evaluation
@@ -45,31 +42,12 @@ def query_document():
     return Document(content=np.random.rand(28, 28))
 
 
-def test_fashion(helloworld_args, query_document, mocker, tmpdir):
-    """Regression test for fashion example."""
-
-    def validate_response(resp):
-        assert len(resp.search.docs) == 1
-        for doc in resp.search.docs:
-            assert len(doc.matches) == 10
-
-    hello_world(helloworld_args)
-    check_hello_world_results(os.path.join(str(tmpdir), 'hello-world.html'))
+root_dir = os.path.abspath(os.path.dirname(jina.__file__))
+os.environ['PATH'] += os.pathsep + os.path.join(root_dir, 'helloworld/fashion/')
 
-    flow_query_path = os.path.join(resource_filename('jina', 'resources'), 'fashion')
-    mock_on_done = mocker.Mock()
-    mock_on_fail = mocker.Mock()
-
-    with Flow.load_config(
-        os.path.join(flow_query_path, 'helloworld.flow.query.yml')
-    ) as f:
-        f.search(
-            inputs=[query_document],
-            on_done=mock_on_done,
-            on_fail=mock_on_fail,
-            top_k=10,
-        )
 
+def test_fashion(helloworld_args, query_document, tmpdir):
+    """Regression test for fashion example."""
 
-    mock_on_fail.assert_not_called()
-    validate_callback(mock_on_done, validate_response)
+    hello_world(helloworld_args)
+    check_hello_world_results(os.path.join(str(tmpdir), 'demo.html'))
diff --git a/tests/system/multimodal/test_multimodal.py b/tests/system/multimodal/test_multimodal.py
index 162e266bcc49d..f99512d88c684 100644
--- a/tests/system/multimodal/test_multimodal.py
+++ b/tests/system/multimodal/test_multimodal.py
@@ -1,11 +1,10 @@
 import os
 
 import pytest
-from pkg_resources import resource_filename
 
-from jina import Document
-from jina.flow import Flow
-from jina.helloworld.multimodal import hello_world
+import jina
+from jina import Document, Flow
+from jina.helloworld.multimodal.app import hello_world
 from jina.parsers.helloworld import set_hw_multimodal_parser
 from tests import validate_callback
 
@@ -49,32 +48,39 @@ def text_chunk():
 @pytest.fixture
 def query_document(image_chunk, text_chunk):
     query_document = Document()
-    query_document.chunks.add(image_chunk)
-    query_document.chunks.add(text_chunk)
+    query_document.chunks.append(image_chunk)
+    query_document.chunks.append(text_chunk)
     return query_document
 
 
+root_dir = os.path.abspath(os.path.dirname(jina.__file__))
+os.environ['PATH'] += os.pathsep + os.path.join(root_dir, 'helloworld/multimodal/')
+
+
+def search(query_document, on_done_callback, on_fail_callback, top_k):
+    with Flow.load_config('jina/helloworld/multimodal/flow-search.yml') as f:
+        f.search(
+            inputs=query_document,
+            on_done=on_done_callback,
+            on_fail=on_fail_callback,
+            parameters={'top_k': top_k},
+        )
+
+
 def test_multimodal(helloworld_args, query_document, mocker):
     """Regression test for multimodal example."""
 
     def validate_response(resp):
-        assert len(resp.search.docs) == 1
-        for doc in resp.search.docs:
+        assert len(resp.data.docs) == 1
+        for doc in resp.data.docs:
             assert len(doc.matches) == 10
 
     hello_world(helloworld_args)
-    flow_query_path = os.path.join(resource_filename('jina', 'resources'), 'multimodal')
 
     mock_on_done = mocker.Mock()
     mock_on_fail = mocker.Mock()
 
-    with Flow.load_config(os.path.join(flow_query_path, 'flow-query.yml')) as f:
-        f.search(
-            inputs=[query_document],
-            on_done=mock_on_done,
-
on_fail=mock_on_fail, - top_k=10, - ) + search(query_document, mock_on_done, mock_on_fail, 10) mock_on_fail.assert_not_called() validate_callback(mock_on_done, validate_response) diff --git a/tests/unit/clients/python/test_client.py b/tests/unit/clients/python/test_client.py index 367328cfb724f..02cab0d0c33a0 100644 --- a/tests/unit/clients/python/test_client.py +++ b/tests/unit/clients/python/test_client.py @@ -4,14 +4,14 @@ import pytest import requests +from jina import helper, Document from jina.clients import Client, WebSocketClient -from jina.clients.sugary_io import _input_files from jina.excepts import BadClientInput from jina.flow import Flow -from jina import helper, Document from jina.parsers import set_gateway_parser, set_client_cli_parser from jina.peapods import Pea from jina.proto.jina_pb2 import DocumentProto +from jina import Executor, DocumentArray, requests as req cur_dir = os.path.dirname(os.path.abspath(__file__)) @@ -53,7 +53,7 @@ def test_check_input_fail(inputs): @pytest.mark.parametrize( 'port_expose, route, status_code', - [(helper.random_port(), '/status', 200), (helper.random_port(), '/api/ass', 405)], + [(helper.random_port(), '/status', 200), (helper.random_port(), '/api/ass', 404)], ) def test_gateway_ready(port_expose, route, status_code): p = set_gateway_parser().parse_args( @@ -74,21 +74,27 @@ def test_gateway_index(flow_with_rest_api_enabled, test_img_1, test_img_2): ) assert r.status_code == 200 resp = r.json() - assert 'index' in resp - assert len(resp['index']['docs']) == 2 - assert resp['index']['docs'][0]['uri'] == test_img_1 + assert 'data' in resp + assert len(resp['data']['docs']) == 2 + assert resp['data']['docs'][0]['uri'] == test_img_1 @pytest.mark.parametrize('restful', [False, True]) def test_mime_type(restful): - f = Flow(restful=restful).add(uses='- !URI2Buffer {}') + class MyExec(Executor): + @req + def foo(self, docs: 'DocumentArray', **kwargs): + for d in docs: + d.convert_uri_to_buffer() + + f = 
Flow(restful=restful).add(uses=MyExec) def validate_mime_type(req): - for d in req.index.docs: + for d in req.data.docs: assert d.mime_type == 'text/x-python' with f: - f.index(_input_files('*.py'), validate_mime_type) + f.index(Document.from_files('*.py'), validate_mime_type) @pytest.mark.parametrize('func_name', ['index', 'search']) @@ -98,7 +104,7 @@ def test_client_ndjson(restful, mocker, func_name): os.path.join(cur_dir, 'docs.jsonlines') ) as fp: mock = mocker.Mock() - getattr(f, f'{func_name}_ndjson')(fp, on_done=mock) + getattr(f, f'{func_name}')(Document.from_ndjson(fp), on_done=mock) mock.assert_called_once() @@ -109,7 +115,7 @@ def test_client_csv(restful, mocker, func_name): os.path.join(cur_dir, 'docs.csv') ) as fp: mock = mocker.Mock() - getattr(f, f'{func_name}_csv')(fp, on_done=mock) + getattr(f, f'{func_name}')(Document.from_csv(fp), on_done=mock) mock.assert_called_once() diff --git a/tests/unit/clients/python/test_io.py b/tests/unit/clients/python/test_io.py index 07cefaa8d46a1..0aac4094ec310 100644 --- a/tests/unit/clients/python/test_io.py +++ b/tests/unit/clients/python/test_io.py @@ -5,13 +5,6 @@ from jina import Document from jina.clients import Client -from jina.clients.sugary_io import ( - _input_files, - _input_lines, - _input_ndarray, - _input_csv, -) -from jina.enums import DataInputType from jina.excepts import BadClientInput cur_dir = os.path.dirname(os.path.abspath(__file__)) @@ -26,14 +19,14 @@ def filepath(tmpdir): def test_input_lines_with_filepath(filepath): - result = list(_input_lines(filepath=filepath, size=2)) + result = list(Document.from_lines(filepath=filepath, size=2)) assert len(result) == 2 assert isinstance(result[0], Document) def test_input_csv_from_file(): with open(os.path.join(cur_dir, 'docs.csv')) as fp: - result = list(_input_csv(fp)) + result = list(Document.from_csv(fp)) assert len(result) == 2 assert isinstance(result[0], Document) assert result[0].tags['source'] == 'testsrc' @@ -41,7 +34,7 @@ def 
test_input_csv_from_file(): def test_input_csv_from_lines(): with open(os.path.join(cur_dir, 'docs.csv')) as fp: - result = list(_input_lines(fp, line_format='csv')) + result = list(Document.from_lines(fp, line_format='csv')) assert len(result) == 2 assert isinstance(result[0], Document) assert result[0].tags['source'] == 'testsrc' @@ -50,21 +43,21 @@ def test_input_csv_from_lines(): def test_input_csv_from_lines_field_resolver(): with open(os.path.join(cur_dir, 'docs.csv')) as fp: result = list( - _input_lines( - fp, line_format='csv', field_resolver={'url': 'uri', 'question': 'text'} + Document.from_lines( + fp, line_format='csv', field_resolver={'question': 'text'} ) ) assert len(result) == 2 assert isinstance(result[0], Document) assert result[0].tags['source'] == 'testsrc' - assert result[0].uri + assert not result[0].uri assert result[0].text def test_input_csv_from_strings(): with open(os.path.join(cur_dir, 'docs.csv')) as fp: lines = fp.readlines() - result = list(_input_csv(lines)) + result = list(Document.from_csv(lines)) assert len(result) == 2 assert isinstance(result[0], Document) assert result[0].tags['source'] == 'testsrc' @@ -72,13 +65,13 @@ def test_input_csv_from_strings(): def test_input_lines_with_empty_filepath_and_lines(): with pytest.raises(ValueError): - lines = _input_lines(lines=None, filepath=None) + lines = Document.from_lines(lines=None, filepath=None) for _ in lines: pass def test_input_lines_with_jsonlines_docs(): - result = list(_input_lines(filepath='tests/unit/clients/python/docs.jsonlines')) + result = list(Document.from_lines(filepath='tests/unit/clients/python/docs.jsonlines')) assert len(result) == 2 assert result[0].text == "a" assert result[1].text == "b" @@ -86,7 +79,7 @@ def test_input_lines_with_jsonlines_docs(): def test_input_lines_with_jsonlines_docs_groundtruth(): result = list( - _input_lines(filepath='tests/unit/clients/python/docs_groundtruth.jsonlines') + 
Document.from_lines(filepath='tests/unit/clients/python/docs_groundtruth.jsonlines') ) assert len(result) == 2 assert result[0][0].text == "a" @@ -107,24 +100,23 @@ def test_input_lines_with_jsonlines_docs_groundtruth(): ) def test_input_files(patterns, recursive, size, sampling_rate, read_mode): Client.check_input( - _input_files( + Document.from_files( patterns=patterns, recursive=recursive, size=size, sampling_rate=sampling_rate, read_mode=read_mode, - ), - data_type=DataInputType.CONTENT, + ) ) def test_input_files_with_invalid_read_mode(): with pytest.raises(BadClientInput): - Client.check_input(_input_files(patterns='*.*', read_mode='invalid')) + Client.check_input(Document.from_files(patterns='*.*', read_mode='invalid')) @pytest.mark.parametrize( 'array', [np.random.random([100, 4, 2]), ['asda', 'dsadas asdasd']] ) def test_input_numpy(array): - Client.check_input(_input_ndarray(array)) + Client.check_input(Document.from_ndarray(array)) diff --git a/tests/unit/clients/python/test_on_err.py b/tests/unit/clients/python/test_on_err.py index ea2b3e76277b8..3251556833850 100644 --- a/tests/unit/clients/python/test_on_err.py +++ b/tests/unit/clients/python/test_on_err.py @@ -1,6 +1,7 @@ import pytest import numpy as np +from jina import Document from jina.excepts import BadClientCallback from jina.flow import Flow @@ -27,14 +28,14 @@ def validate(x): with Flow(restful=restful).add() as f: t = 0 try: - f.index_ndarray( - np.random.random([5, 4]), on_done=validate, continue_on_error=False + f.index( + Document.from_ndarray(np.random.random([5, 4])), on_done=validate, continue_on_error=False ) except BadClientCallback: # bad client callback will break the `async for req in stub.Call(req_iter)` t = 1 # now query the gateway again, make sure gateway's channel is still usable - f.index_ndarray( - np.random.random([5, 4]), on_done=validate, continue_on_error=True + f.index( + Document.from_ndarray(np.random.random([5, 4])), on_done=validate, continue_on_error=True ) 
assert t == 1 diff --git a/tests/unit/clients/python/test_request.py b/tests/unit/clients/python/test_request.py index e17aad54dbf21..fe3cfc82f5fb8 100644 --- a/tests/unit/clients/python/test_request.py +++ b/tests/unit/clients/python/test_request.py @@ -83,24 +83,23 @@ def random_lines(num_lines): for j in range(1, num_lines + 1): yield f'i\'m dummy doc {j}' - req = request_generator(data=random_lines(100), request_size=100) + req = request_generator('', data=random_lines(100), request_size=100) request = next(req) - assert len(request.index.docs) == 100 - for index, doc in enumerate(request.index.docs, 1): - assert doc.mime_type == 'text/plain' - assert doc.text == f'i\'m dummy doc {index}' + assert len(request.docs) == 100 + assert request.docs[0].mime_type == 'text/plain' + assert request.docs[0].text == 'i\'m dummy doc 1' def test_request_generate_lines_from_list(): def random_lines(num_lines): return [f'i\'m dummy doc {j}' for j in range(1, num_lines + 1)] - req = request_generator(data=random_lines(100), request_size=100) + req = request_generator('', data=random_lines(100), request_size=100) request = next(req) - assert len(request.index.docs) == 100 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 100 + for index, doc in enumerate(request.docs, 1): assert doc.mime_type == 'text/plain' assert doc.text == f'i\'m dummy doc {index}' @@ -110,11 +109,11 @@ def random_lines(num_lines): for j in range(1, num_lines + 1): yield f'https://github.com i\'m dummy doc {j}' - req = request_generator(data=random_lines(100), request_size=100) + req = request_generator('', data=random_lines(100), request_size=100) request = next(req) - assert len(request.index.docs) == 100 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 100 + for index, doc in enumerate(request.docs, 1): assert doc.mime_type == 'text/plain' assert doc.text == f'https://github.com i\'m dummy doc {index}' @@ -124,11 +123,11 @@ def 
random_lines(num_lines): for j in range(1, num_lines + 1): yield f'i\'m dummy doc {j}' - req = request_generator(data=random_lines(100), request_size=100) + req = request_generator('', data=random_lines(100), request_size=100) request = next(req) - assert len(request.index.docs) == 100 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 100 + for index, doc in enumerate(request.docs, 1): assert doc.text == f'i\'m dummy doc {index}' assert doc.mime_type == 'text/plain' @@ -143,11 +142,11 @@ def random_docs(num_docs): doc.mime_type = 'mime_type' yield doc - req = request_generator(data=random_docs(100), request_size=100) + req = request_generator('', data=random_docs(100), request_size=100) request = next(req) - assert len(request.index.docs) == 100 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 100 + for index, doc in enumerate(request.docs, 1): assert doc.mime_type == 'mime_type' assert doc.text == f'i\'m dummy doc {index}' assert doc.offset == 1000 @@ -167,11 +166,11 @@ def random_docs(num_docs): } yield doc - req = request_generator(data=random_docs(100), request_size=100) + req = request_generator('', data=random_docs(100), request_size=100) request = next(req) - assert len(request.index.docs) == 100 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 100 + for index, doc in enumerate(request.docs, 1): assert doc.text == f'i\'m dummy doc {index}' assert doc.offset == 1000 assert doc.tags['id'] == 1000 @@ -198,11 +197,11 @@ def random_docs(num_docs): } yield json.dumps(doc) - req = request_generator(data=random_docs(100), request_size=100) + req = request_generator('', data=random_docs(100), request_size=100) request = next(req) - assert len(request.index.docs) == 100 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 100 + for index, doc in enumerate(request.docs, 1): assert doc.text == f'i\'m dummy doc {index}' assert doc.offset 
== 1000 assert doc.tags['id'] == 1000 @@ -216,16 +215,16 @@ def random_docs(num_docs): def test_request_generate_numpy_arrays(): input_array = np.random.random([10, 10]) - req = request_generator(data=input_array, request_size=5) + req = request_generator('', data=input_array, request_size=5) request = next(req) - assert len(request.index.docs) == 5 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 5 + for index, doc in enumerate(request.docs, 1): assert NdArray(doc.blob).value.shape == (10,) request = next(req) - assert len(request.index.docs) == 5 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 5 + for index, doc in enumerate(request.docs, 1): assert NdArray(doc.blob).value.shape == (10,) @@ -236,14 +235,14 @@ def generator(): for array in input_array: yield array - req = request_generator(data=generator(), request_size=5) + req = request_generator('', data=generator(), request_size=5) request = next(req) - assert len(request.index.docs) == 5 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 5 + for index, doc in enumerate(request.docs, 1): assert NdArray(doc.blob).value.shape == (10,) request = next(req) - assert len(request.index.docs) == 5 - for index, doc in enumerate(request.index.docs, 1): + assert len(request.docs) == 5 + for index, doc in enumerate(request.docs, 1): assert NdArray(doc.blob).value.shape == (10,) diff --git a/tests/unit/clients/test_asyncio.py b/tests/unit/clients/test_asyncio.py index 521f829d33aac..e039c5181663d 100644 --- a/tests/unit/clients/test_asyncio.py +++ b/tests/unit/clients/test_asyncio.py @@ -1,11 +1,10 @@ -import pytest import asyncio - -from jina.clients.request.asyncio import request_generator -from jina.enums import RequestType from math import ceil +import pytest + from jina import Document +from jina.clients.request.asyncio import request_generator NUM_INPUT_DOCS = 30 REQUEST_SIZE = 10 @@ -18,9 +17,9 @@ async def 
input_function(): for doc in data: yield doc - generator = request_generator( - input_function(), request_size=REQUEST_SIZE, mode=RequestType.INDEX - ) + generator = request_generator('/', + input_function(), request_size=REQUEST_SIZE + ) i = 0 async for req in generator: i += 1 @@ -32,4 +31,4 @@ async def input_function(): def test_asyncio_bad_input_generator(): # exception not handled data = ['text' for _ in range(20)] - request_generator(data, request_size=10, mode='not_index') + request_generator('/', data, request_size=10) diff --git a/tests/unit/docker/hub-mwu/mwu_encoder.py b/tests/unit/docker/hub-mwu/mwu_encoder.py index ff74b8d7897c3..7146535ce7ab3 100644 --- a/tests/unit/docker/hub-mwu/mwu_encoder.py +++ b/tests/unit/docker/hub-mwu/mwu_encoder.py @@ -1,16 +1,13 @@ from typing import Any -import numpy as np +from jina import Executor, requests -from jina.executors.encoders import BaseEncoder - -class MWUEncoder(BaseEncoder): +class MWUEncoder(Executor): def __init__(self, greetings: str, *args, **kwargs): super().__init__(*args, **kwargs) self._greetings = greetings - self.logger.success(f'look at me! 
{greetings}') - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - self.logger.info(f'{self._greetings} {content}') - return np.random.random([content.shape[0], 3]) + @requests + def encode(self, **kwargs) -> Any: + pass diff --git a/tests/unit/docker/test_helper.py b/tests/unit/docker/test_helper.py index 65911a95f1225..63cb6a6b33d4b 100644 --- a/tests/unit/docker/test_helper.py +++ b/tests/unit/docker/test_helper.py @@ -1,6 +1,8 @@ import os from pathlib import PosixPath +import pytest + from jina import __version__ from jina.docker.helper import credentials_file from jina.docker.hubio import HubIO @@ -10,6 +12,7 @@ def test_credentials_file(): assert isinstance(credentials_file(), PosixPath) +@pytest.mark.skip(reason='2.0.0rc does not include jina hub') def test_alias2path_transform(): # bad naming result to itself assert HubIO._alias_to_local_path('abcdefg') == 'abcdefg' @@ -19,6 +22,7 @@ def test_alias2path_transform(): assert os.path.exists(HubIO._alias_to_local_path('MongoDBIndexer')) +@pytest.mark.skip(reason='2.0.0rc does not include jina hub') def test_alias2tag_transform(): # bad naming result to itself assert HubIO._alias_to_docker_image_name('abcdefg') == 'abcdefg' diff --git a/tests/unit/docker/test_hub_list.py b/tests/unit/docker/test_hub_list.py index 99d365a97ecbb..8eba41de61561 100644 --- a/tests/unit/docker/test_hub_list.py +++ b/tests/unit/docker/test_hub_list.py @@ -1,10 +1,12 @@ import json import mock +import pytest from jina.docker.hubio import HubIO from jina.parsers.hub import set_hub_list_parser +@pytest.mark.skip('2.0.0rc disabled the hub submodule') def test_hub_list_local_with_submodule(): args = set_hub_list_parser().parse_args(['--local-only']) response = HubIO(args).list() diff --git a/tests/unit/drivers/__init__.py b/tests/unit/drivers/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/drivers/querylang/__init__.py b/tests/unit/drivers/querylang/__init__.py deleted file 
mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/drivers/querylang/test_querylang_drivers.py b/tests/unit/drivers/querylang/test_querylang_drivers.py deleted file mode 100644 index 8df0f67c86b4a..0000000000000 --- a/tests/unit/drivers/querylang/test_querylang_drivers.py +++ /dev/null @@ -1,240 +0,0 @@ -from google.protobuf import json_format - -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import single -from jina.flow import Flow -from jina import Document - -from tests import validate_callback - - -def random_docs(num_docs): - for j in range(num_docs): - d = Document() - d.text = 'hello world' - d.tags['id'] = j - for m in range(10): - dm = Document() - dm.text = 'match to other world' - dm.tags['id'] = m - d.matches.add(dm) - yield d - - -def random_docs_to_chunk(): - d1 = Document() - d1.tags['id'] = 1 - d1.text = 'chunk1 chunk2' - yield d1 - d2 = Document() - d2.tags['id'] = 1 - d2.text = 'chunk3' - yield d2 - - -def random_docs_with_tags(): - d1 = Document() - d1.tags['id'] = 1 - d1.text = 'a' - d1.tags.update({'id': 1}) - yield d1 - d2 = Document() - d2.tags['id'] = 2 - d2.tags.update({'id': 2}) - d2.text = 'b' - yield d2 - - -class DummySegmenter(BaseSegmenter): - @single - def segment(self, text, *args, **kwargs): - return [{'text': 'adasd' * (j + 1), 'tags': {'id': j}} for j in range(10)] - - -class DummyModeIdSegmenter(BaseSegmenter): - @single - def segment(self, text, *args, **kwargs): - if 'chunk3' not in text: - return [ - {'text': f'chunk{j + 1}', 'modality': f'mode{j + 1}'} for j in range(2) - ] - elif 'chunk3' in text: - return [{'text': f'chunk3', 'modality': 'mode3'}] - - -def test_select_ql(mocker): - def validate(req): - assert req.docs[0].text == '' - assert req.docs[-1].text == '' - assert req.docs[0].matches[0].text == '' - assert req.docs[0].chunks[0].text == '' - - f = ( - Flow() - .add(uses='DummySegmenter') - .add( - uses='- !SelectQL | {fields: [uri, matches, chunks], 
traversal_paths: [r, c, m]}' - ) - ) - - response_mock = mocker.Mock() - - with f: - f.index(random_docs(10), on_done=response_mock) - - f = ( - Flow() - .add(uses='DummySegmenter') - .add(uses='- !ExcludeQL | {fields: [text], traversal_paths: [r, c, m]}') - ) - - validate_callback(response_mock, validate) - - response_mock_2 = mocker.Mock() - - with f: - f.index(random_docs(10), on_done=response_mock_2) - - validate_callback(response_mock_2, validate) - - -def test_sort_ql(mocker): - def validate(req): - # print('---------------------------') - assert req.docs[-1].tags['id'] < req.docs[0].tags['id'] - assert req.docs[0].matches[-1].tags['id'] < req.docs[0].matches[0].tags['id'] - assert req.docs[0].chunks[-1].tags['id'] < req.docs[0].chunks[0].tags['id'] - - response_mock = mocker.Mock() - - f = ( - Flow() - .add(uses='DummySegmenter') - .add( - uses='- !SortQL | {field: tags__id, reverse: true, traversal_paths: [r, c, m]}' - ) - ) - - with f: - f.index(random_docs(10), on_done=response_mock) - - validate_callback(response_mock, validate) - - response_mock_2 = mocker.Mock() - - f = ( - Flow() - .add(uses='DummySegmenter') - .add( - uses='- !SortQL | {field: tags__id, reverse: false, traversal_paths: [r, c, m]}' - ) - .add(uses='- !ReverseQL | {traversal_paths: [r, c, m]}') - ) - - with f: - f.index(random_docs(10), on_done=response_mock_2) - - validate_callback(response_mock_2, validate) - - -def test_filter_ql(mocker): - def validate(req): - assert len(req.docs) == 1 - assert int(req.docs[0].tags['id']) == 2 - assert len(req.docs[0].matches) == 1 - assert int(req.docs[0].matches[0].tags['id']) == 2 - - response_mock = mocker.Mock() - - f = ( - Flow() - .add(uses='DummySegmenter') - .add(uses='- !FilterQL | {lookups: {tags__id: 2}, traversal_paths: [r, c, m]}') - ) - - with f: - f.index(random_docs(10), on_done=response_mock) - - validate_callback(response_mock, validate) - - -def test_filter_ql_in_tags(mocker): - def validate(req): - assert len(req.docs) == 1 - 
assert int(req.docs[0].tags['id']) == 2 - assert json_format.MessageToDict(req.docs[0].tags)['id'] == 2 - - response_mock = mocker.Mock() - - f = Flow().add( - uses='- !FilterQL | {lookups: {tags__id: 2}, traversal_paths: [r, c, m]}' - ) - - with f: - f.index(random_docs_with_tags(), on_done=response_mock) - - validate_callback(response_mock, validate) - - -def test_filter_ql_modality_wrong_depth(mocker): - def validate(req): - # since no doc has modality mode2 they are all erased from the list of docs - assert len(req.docs) == 0 - - response_mock = mocker.Mock() - - f = ( - Flow() - .add(uses='DummyModeIdSegmenter') - .add( - uses='- !FilterQL | {lookups: {modality: mode2}, traversal_paths: [r, c, m]}' - ) - ) - - with f: - f.index(random_docs_to_chunk(), on_done=response_mock) - - validate_callback(response_mock, validate) - - -def test_filter_ql_modality(mocker): - def validate(req): - # docs are not filtered, so 2 docs are returned, but only the chunk at depth1 with modality mode2 is returned - assert len(req.docs) == 2 - assert len(req.docs[0].chunks) == 1 - assert len(req.docs[1].chunks) == 0 - - response_mock = mocker.Mock() - - f = ( - Flow() - .add(uses='DummyModeIdSegmenter') - .add(uses='- !FilterQL | {lookups: {modality: mode2}, traversal_paths: [c]}') - ) - - with f: - f.index(random_docs_to_chunk(), on_done=response_mock) - - validate_callback(response_mock, validate) - - -def test_filter_compose_ql(mocker): - def validate(req): - assert len(req.docs) == 1 - assert int(req.docs[0].tags['id']) == 2 - assert len(req.docs[0].matches) == 0 # matches do not contain "hello" - - response_mock = mocker.Mock() - - f = ( - Flow() - .add(uses='DummySegmenter') - .add( - uses='- !FilterQL | {lookups: {tags__id: 2, text__contains: hello}, traversal_paths: [r, c, m]}' - ) - ) - - with f: - f.index(random_docs(10), on_done=response_mock) - - validate_callback(response_mock, validate) diff --git a/tests/unit/drivers/querylang/test_querylang_reader.py 
b/tests/unit/drivers/querylang/test_querylang_reader.py deleted file mode 100644 index 5336f37522a1b..0000000000000 --- a/tests/unit/drivers/querylang/test_querylang_reader.py +++ /dev/null @@ -1,146 +0,0 @@ -import pytest - -from jina.clients import Client -from jina.drivers import QuerySetReader, BaseDriver -from jina.drivers.querylang.sort import SortQL -from jina.drivers.querylang.slice import SliceQL -from jina.drivers.querylang.select import ExcludeQL -from jina.flow import Flow -from jina import Document -from jina.types.querylang import QueryLang -from jina.types.arrays import QueryLangArray - -from tests import validate_callback - - -def random_docs(num_docs): - for j in range(num_docs): - d = Document() - d.tags['id'] = j - d.text = 'hello world' - for m in range(10): - dm = Document() - dm.text = 'match to hello world' - dm.tags['id'] = m - dm.score.ref_id = d.id - d.matches.add(dm) - for mm in range(10): - dmm = Document() - dmm.text = 'nested match to match' - dmm.tags['id'] = mm - dmm.score.ref_id = dm.id - dm.matches.add(dmm) - yield d - - -class DummyDriver(QuerySetReader, BaseDriver): - def __init__(self, arg1='hello', arg2=456, *args, **kwargs): - super().__init__(*args, **kwargs) - self._arg1 = arg1 - self._arg2 = arg2 - - -def test_querylang_request(): - qs = QueryLang( - {'name': 'SliceQL', 'parameters': {'start': 1, 'end': 4}, 'priority': 1} - ) - Client.check_input(random_docs(10), queryset=qs) - - -def test_read_from_req(mocker): - def validate1(req): - assert len(req.docs) == 5 - - def validate2(req): - assert len(req.docs) == 3 - - response_mock = mocker.Mock() - response_mock_2 = mocker.Mock() - response_mock_3 = mocker.Mock() - - qs = QueryLang( - {'name': 'SliceQL', 'priority': 1, 'parameters': {'start': 1, 'end': 4}} - ) - - f = Flow().add(uses='- !SliceQL | {start: 0, end: 5}') - - # without queryset - with f: - f.index(random_docs(10), on_done=response_mock) - - validate_callback(response_mock, validate1) - # with queryset - with f: 
- f.index(random_docs(10), queryset=qs, on_done=response_mock_2) - - validate_callback(response_mock_2, validate2) - - qs.priority = -1 - # with queryset, but priority is no larger than driver's default - with f: - f.index(random_docs(10), queryset=qs, on_done=response_mock_3) - - validate_callback(response_mock_3, validate1) - - -def test_querlang_driver(): - qld2 = DummyDriver(arg1='world') - assert qld2.arg1 == 'world' - - -def test_as_querylang(): - sortql = SortQL(field='field_value', reverse=False, priority=2) - sort_querylang = sortql.as_querylang - assert sort_querylang.name == 'SortQL' - assert sort_querylang.priority == 2 - assert sort_querylang.parameters['field'] == 'field_value' - assert not sort_querylang.parameters['reverse'] - - sliceql = SliceQL(start=10, end=20) - slice_querylang = sliceql.as_querylang - assert slice_querylang.name == 'SliceQL' - assert slice_querylang.priority == 0 - assert slice_querylang.parameters['start'] == 10 - assert slice_querylang.parameters['end'] == 20 - - excludeql = ExcludeQL(fields=('field1', 'field2')) - exclude_querylang = excludeql.as_querylang - assert exclude_querylang.name == 'ExcludeQL' - assert exclude_querylang.priority == 0 - assert list(exclude_querylang.parameters['fields']) == ['field1', 'field2'] - - excludeql2 = ExcludeQL(fields='field1') - exclude_querylang2 = excludeql2.as_querylang - assert exclude_querylang2.name == 'ExcludeQL' - assert exclude_querylang2.priority == 0 - assert list(exclude_querylang2.parameters['fields']) == ['field1'] - - -class MockExcludeQL(ExcludeQL): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - ql = QueryLang( - { - 'name': 'MockExcludeQL', - 'parameters': {'fields': ['updated_field1', 'updated_field2']}, - 'priority': 3, - } - ) - self.qset = QueryLangArray([ql.proto]) - - @property - def queryset(self): - return self.qset - - -@pytest.mark.parametrize('driver_priority', [0, 4]) -def test_queryset_reader_excludeql(driver_priority): - 
querysetreader = MockExcludeQL( - fields=('local_field1', 'local_field2'), priority=driver_priority - ) - fields = querysetreader._get_parameter('fields', default=None) - - if driver_priority == 0: - assert list(fields) == ['updated_field1', 'updated_field2'] - else: - assert list(fields) == ['local_field1', 'local_field2'] diff --git a/tests/unit/drivers/querylang/test_slice_ql.py b/tests/unit/drivers/querylang/test_slice_ql.py deleted file mode 100644 index 001f13886c435..0000000000000 --- a/tests/unit/drivers/querylang/test_slice_ql.py +++ /dev/null @@ -1,173 +0,0 @@ -from jina import DocumentArray -from jina.drivers.querylang.slice import SliceQL -from jina.proto import jina_pb2 - - -class SimpleSliceQL(SliceQL): - def __init__(self, docs, *args, **kwargs): - super().__init__(*args, **kwargs) - self._docs = docs - - @property - def docs(self): - return self._docs - - -def random_docs_with_chunks(num_docs): - docs = [] - for j in range(num_docs): - d = jina_pb2.DocumentProto() - d.granularity = 0 - d.tags['id'] = j - d.text = 'hello world' - d.uri = 'doc://' - for c in range(10): - dc = d.chunks.add() - dc.text = 'chunk to hello world' - dc.granularity = 1 - dc.uri = 'doc://chunk' - dc.tags['id'] = c - for cc in range(10): - dcc = dc.chunks.add() - dcc.text = 'nested chunk to chunk' - dcc.uri = 'doc://chunk/chunk' - dcc.tags['id'] = cc - dcc.granularity = 2 - docs.append(d) - return DocumentArray(docs) - - -def random_docs_with_chunks_and_matches(num_docs): - # doc |- chunk |- chunk - # | |- chunk - # | |- match | - chunk - # | - chunk - # | |- match - # |- chunk - # |- chunk - # |- match | - chunk - # | - chunk - docs = [] - for j in range(num_docs): - d = jina_pb2.DocumentProto() - d.granularity = 0 - d.tags['id'] = j - d.text = 'hello world' - d.uri = 'doc://' - for c in range(10): - dc = d.chunks.add() - dc.text = 'chunk to hello world' - dc.granularity = d.granularity + 1 - dc.uri = 'doc://chunk' - dc.tags['id'] = c - for cc in range(10): - dcc = 
dc.chunks.add() - dcc.text = 'nested chunk to chunk' - dcc.uri = 'doc://chunk/chunk' - dcc.tags['id'] = cc - dcc.granularity = dc.granularity + 1 - for m in range(10): - cm = dc.matches.add() - cm.text = 'match to chunk to hello-world' - cm.uri = 'doc://chunk/match' - cm.tags['id'] = m - cm.granularity = dc.granularity - for mc in range(10): - cmc = cm.chunks.add() - cmc.text = 'chunk to match to chunk to hello-world' - cmc.uri = 'doc://chunk/match/chunk' - cmc.tags['id'] = mc - cmc.granularity = cm.granularity + 1 - for m in range(10): - dm = d.matches.add() - dm.text = 'match to hello-world' - dm.uri = 'doc://match' - dm.tags['id'] = m - dm.granularity = d.granularity - for c in range(10): - dmc = dm.chunks.add() - dmc.text = 'chunk to match to hello-world' - dmc.uri = 'doc://match/chunk' - dmc.tags['id'] = m - dmc.granularity = dm.granularity + 1 - - docs.append(d) - return DocumentArray(docs) - - -def test_slice_ql_on_chunks(): - docs = random_docs_with_chunks(10) - driver = SimpleSliceQL(docs=docs, start=0, end=2, traversal_paths=('cc', 'c', 'r')) - driver() - assert len(docs) == 2 - assert len(docs[0].chunks) == 2 # slice on level 1 - assert len(docs[0].chunks[0].chunks) == 2 # slice on level 2 for chunks - assert len(docs[0].chunks[-1].chunks) == 2 # slice on level 2 for chunks - assert len(docs[-1].chunks) == 2 # slice on level 1 - assert len(docs[-1].chunks[0].chunks) == 2 # slice on level 2 for chunks - assert len(docs[-1].chunks[-1].chunks) == 2 # slice on level 2 for chunks - - -def test_slice_ql_on_matches_and_chunks(): - docs = random_docs_with_chunks_and_matches(10) - assert len(docs) == 10 - assert len(docs[0].chunks) == 10 - assert len(docs[-1].chunks) == 10 - assert len(docs[0].matches) == 10 - assert len(docs[-1].matches) == 10 - assert len(docs[0].matches[0].chunks) == 10 - assert len(docs[0].matches[-1].chunks) == 10 - assert len(docs[-1].matches[0].chunks) == 10 - assert len(docs[-1].matches[-1].chunks) == 10 - assert 
len(docs[0].chunks[0].chunks) == 10 - assert len(docs[0].chunks[0].matches) == 10 - assert len(docs[0].chunks[0].matches[0].chunks) == 10 - assert len(docs[0].chunks[0].matches[-1].chunks) == 10 - assert len(docs[0].chunks[-1].matches[0].chunks) == 10 - assert len(docs[0].chunks[-1].matches[-1].chunks) == 10 - assert len(docs[0].chunks[-1].chunks) == 10 - assert len(docs[0].chunks[-1].matches) == 10 - assert len(docs[-1].chunks[0].chunks) == 10 - assert len(docs[-1].chunks[0].matches) == 10 - assert len(docs[-1].chunks[-1].chunks) == 10 - assert len(docs[-1].chunks[-1].matches) == 10 - driver = SimpleSliceQL( - docs=docs, start=0, end=2, traversal_paths=('cc', 'c', 'r', 'mm', 'm') - ) - driver() - - assert len(docs) == 2 - - assert len(docs[0].chunks) == 2 # slice on level 1 - assert len(docs[0].matches) == 2 # slice on level 1 - - assert len(docs[0].chunks[0].chunks) == 2 # slice on level 2 for chunks - assert len(docs[0].chunks[-1].chunks) == 2 # slice on level 2 for chunks - - assert len(docs[0].chunks[0].matches) == 10 # traverses directly on matches - assert len(docs[0].chunks[0].matches[0].chunks) == 10 - assert len(docs[0].chunks[0].matches[-1].chunks) == 10 - assert len(docs[0].chunks[-1].matches) == 10 # traverses directly on matches - assert len(docs[0].chunks[-1].matches[0].chunks) == 10 - assert len(docs[0].chunks[-1].matches[-1].chunks) == 10 - - assert len(docs[0].matches[0].chunks) == 10 - assert len(docs[0].matches[-1].chunks) == 10 - - assert len(docs[-1].chunks) == 2 # slice on level 1 of chunks - assert len(docs[-1].matches) == 2 # slice on level 1 of chunks - - assert len(docs[-1].chunks[0].chunks) == 2 # slice on level 2 for matches of chunks - assert ( - len(docs[-1].chunks[-1].chunks) == 2 - ) # slice on level 2 for matches of chunks - - assert len(docs[-1].chunks[0].matches) == 10 # traverses directly on matches - assert len(docs[-1].chunks[0].matches[0].chunks) == 10 - assert len(docs[-1].chunks[0].matches[-1].chunks) == 10 - assert 
len(docs[-1].chunks[-1].matches) == 10 # traverses directly on matches - assert len(docs[-1].chunks[-1].matches[0].chunks) == 10 - assert len(docs[-1].chunks[-1].matches[-1].chunks) == 10 - - assert len(docs[-1].matches[0].chunks) == 10 - assert len(docs[-1].matches[-1].chunks) == 10 diff --git a/tests/unit/drivers/rank/__init__.py b/tests/unit/drivers/rank/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/drivers/rank/aggregate/__init__.py b/tests/unit/drivers/rank/aggregate/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/drivers/rank/aggregate/test_aggregate_matches_rank_driver.py b/tests/unit/drivers/rank/aggregate/test_aggregate_matches_rank_driver.py deleted file mode 100644 index 44ff414d6cfa2..0000000000000 --- a/tests/unit/drivers/rank/aggregate/test_aggregate_matches_rank_driver.py +++ /dev/null @@ -1,161 +0,0 @@ -import pytest - -from jina import Document -from jina.drivers.rank.aggregate import AggregateMatches2DocRankDriver -from jina.executors.rankers import Chunk2DocRanker -from jina.types.score import NamedScore -from jina.types.arrays import DocumentArray - - -class MockMaxRanker(Chunk2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__( - query_required_keys=('length',), - match_required_keys=('length',), - *args, - **kwargs, - ) - - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return match_idx[self.COL_SCORE].max() - - -class MockMinRanker(Chunk2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__( - query_required_keys=('length',), - match_required_keys=('length',), - *args, - **kwargs, - ) - - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return 1.0 / (1.0 + match_idx[self.COL_SCORE].min()) - - -class SimpleCollectMatchesRankDriver(AggregateMatches2DocRankDriver): - def __init__(self, docs, *args, **kwargs): - super().__init__(*args, **kwargs) - 
self._docs = docs - - @property - def exec_fn(self): - return self._exec_fn - - @property - def docs(self): - return self._docs - - -class MockLengthRanker(Chunk2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__( - query_required_keys=('weight',), - match_required_keys=('weight',), - *args, - **kwargs, - ) - - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return match_chunk_meta[match_idx[0][self.COL_DOC_CHUNK_ID]]['weight'] - - -def create_document_to_score_same_depth_level(): - # doc: 1 - # | matches: (id: 2, parent_id: 20, score.value: 30, length: 3), - # | matches: (id: 3, parent_id: 20, score.value: 40, length: 4), - # | matches: (id: 4, parent_id: 30, score.value: 20, length: 2), - # | matches: (id: 5, parent_id: 30, score.value: 10, length: 1), - - doc = Document() - doc.id = 1 - - for match_id, parent_id, match_score, weight in [ - (2, 20, 30, 3), - (3, 20, 40, 4), - (4, 30, 20, 2), - (5, 30, 10, 1), - ]: - match = Document() - match.id = match_id - match.parent_id = parent_id - match.weight = weight - match.score = NamedScore(value=match_score, ref_id=doc.id) - doc.matches.append(match) - return doc - - -def test_collect_matches2doc_ranker_driver_mock_ranker(): - doc = create_document_to_score_same_depth_level() - driver = SimpleCollectMatchesRankDriver(docs=DocumentArray([doc])) - executor = MockLengthRanker() - driver.attach(executor=executor, runtime=None) - driver() - dm = list(doc.matches) - assert len(dm) == 2 - assert dm[0].id == '20' - assert dm[0].score.value == 3.0 - assert dm[1].id == '30' - assert dm[1].score.value == 2.0 - for match in dm: - # match score is computed w.r.t to doc.id - assert match.score.ref_id == doc.id - - -@pytest.mark.parametrize('keep_source_matches_as_chunks', [False, True]) -def test_collect_matches2doc_ranker_driver_min_ranker(keep_source_matches_as_chunks): - doc = create_document_to_score_same_depth_level() - driver = SimpleCollectMatchesRankDriver( - 
docs=DocumentArray([doc]), - keep_source_matches_as_chunks=keep_source_matches_as_chunks, - ) - executor = MockMinRanker() - driver.attach(executor=executor, runtime=None) - import sys - - min_value_30 = sys.maxsize - min_value_20 = sys.maxsize - for match in doc.matches: - if match.parent_id == '30': - if match.score.value < min_value_30: - min_value_30 = match.score.value - if match.parent_id == '20': - if match.score.value < min_value_20: - min_value_20 = match.score.value - - assert min_value_30 < min_value_20 - driver() - dm = list(doc.matches) - assert len(dm) == 2 - assert dm[0].id == '30' - assert dm[0].score.value == pytest.approx((1.0 / (1.0 + min_value_30)), 0.0000001) - assert dm[1].id == '20' - assert dm[1].score.value == pytest.approx((1.0 / (1.0 + min_value_20)), 0.0000001) - for match in dm: - # match score is computed w.r.t to doc.id - assert match.score.ref_id == doc.id - expected_chunk_matches_length = 2 if keep_source_matches_as_chunks else 0 - assert len(match.chunks) == expected_chunk_matches_length - - -@pytest.mark.parametrize('keep_source_matches_as_chunks', [False, True]) -def test_collect_matches2doc_ranker_driver_max_ranker(keep_source_matches_as_chunks): - doc = create_document_to_score_same_depth_level() - driver = SimpleCollectMatchesRankDriver( - docs=DocumentArray([doc]), - keep_source_matches_as_chunks=keep_source_matches_as_chunks, - ) - executor = MockMaxRanker() - driver.attach(executor=executor, runtime=None) - driver() - dm = list(doc.matches) - assert len(dm) == 2 - assert dm[0].id == '20' - assert dm[0].score.value == 40 - assert dm[1].id == '30' - assert dm[1].score.value == 20 - for match in dm: - # match score is computed w.r.t to doc.id - assert match.score.ref_id == doc.id - expected_chunk_matches_length = 2 if keep_source_matches_as_chunks else 0 - assert len(match.chunks) == expected_chunk_matches_length diff --git a/tests/unit/drivers/rank/aggregate/test_chunk2doc_rank_drivers.py 
b/tests/unit/drivers/rank/aggregate/test_chunk2doc_rank_drivers.py deleted file mode 100644 index 5542b620dfc42..0000000000000 --- a/tests/unit/drivers/rank/aggregate/test_chunk2doc_rank_drivers.py +++ /dev/null @@ -1,285 +0,0 @@ -import pytest - -from jina import Document -from jina.drivers.rank.aggregate import Chunk2DocRankDriver -from jina.executors.rankers import Chunk2DocRanker -from jina.types.score import NamedScore -from jina.types.arrays import DocumentArray - -DISCOUNT_VAL = 0.5 - - -class MockMaxRanker(Chunk2DocRanker): - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return match_idx[self.COL_SCORE].max() - - -class MockMinRanker(Chunk2DocRanker): - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return 1.0 / (1.0 + match_idx[self.COL_SCORE].min()) - - -class MockLengthRanker(Chunk2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__( - query_required_keys=['weight'], - match_required_keys=['weight'], - *args, - **kwargs, - ) - - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - return match_chunk_meta[match_idx[0][self.COL_DOC_CHUNK_ID]]['weight'] - - -class MockPriceDiscountRanker(Chunk2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__( - query_required_keys=('tags__price', 'tags__discount'), - match_required_keys=('tags__price', 'tags__discount'), - *args, - **kwargs, - ) - - def score(self, match_idx, query_chunk_meta, match_chunk_meta, *args, **kwargs): - price = match_chunk_meta[match_idx[0][self.COL_DOC_CHUNK_ID]]['tags__price'] - discount = match_chunk_meta[match_idx[0][self.COL_DOC_CHUNK_ID]][ - 'tags__discount' - ] - return price - price * discount - - -class SimpleChunk2DocRankDriver(Chunk2DocRankDriver): - def __init__(self, docs, *args, **kwargs): - super().__init__(*args, **kwargs) - self._docs = docs - - @property - def exec_fn(self): - return self._exec_fn - - @property - def docs(self): - return 
self._docs - - -def create_document_to_score(): - # doc: 1 - # |- chunk: 2 - # | |- matches: (id: 4, parent_id: 40, score.value: 4), - # | |- matches: (id: 5, parent_id: 50, score.value: 5), - # | - # |- chunk: 3 - # |- matches: (id: 6, parent_id: 60, score.value: 6), - # |- matches: (id: 7, parent_id: 70, score.value: 7) - doc = Document() - doc.id = '1' - for c in range(2): - chunk = Document() - chunk_id = str(c + 2) - chunk.id = chunk_id - for m in range(2): - match = Document() - match_id = 2 * int(chunk_id) + m - match.id = str(match_id) - parent_id = 10 * int(match_id) - match.parent_id = str(parent_id) - # to be used by MaxRanker and MinRanker - match.score = NamedScore(value=int(match_id), ref_id=chunk.id) - match.tags['price'] = match.score.value - match.tags['discount'] = DISCOUNT_VAL - match.weight = 2 * int(chunk_id) + m - chunk.matches.append(match) - doc.chunks.append(chunk) - return doc - - -def create_chunk_matches_to_score(): - # doc: (id: 100, granularity=0) - # |- chunks: (id: 10) - # | |- matches: (id: 11, parent_id: 1, score.value: 2), - # | |- matches: (id: 12, parent_id: 1, score.value: 3), - # |- chunks: (id: 20) - # |- matches: (id: 21, parent_id: 2, score.value: 4), - # |- matches: (id: 22, parent_id: 2, score.value: 5) - doc = Document() - doc.id = '1' - doc.granularity = 0 - num_matches = 2 - for parent_id in range(1, 3): - chunk = Document() - chunk_id = parent_id * 10 - chunk.id = str(chunk_id) - chunk.granularity = doc.granularity + 1 - for score_value in range(parent_id * 2, parent_id * 2 + num_matches): - match = Document() - match.granularity = chunk.granularity - match.parent_id = str(parent_id) - match.score = NamedScore(value=score_value, ref_id=chunk.id) - match.id = str(10 * int(parent_id) + score_value) - chunk.matches.append(match) - doc.chunks.append(chunk) - return doc - - -def create_chunk_chunk_matches_to_score(): - # doc: (id: 100, granularity=0) - # |- chunk: (id: 101, granularity=1) - # |- chunks: (id: 10) - # | |- 
matches: (id: 11, parent_id: 1, score.value: 2), - # | |- matches: (id: 12, parent_id: 1, score.value: 3), - # |- chunks: (id: 20) - # |- matches: (id: 21, parent_id: 2, score.value: 4), - # |- matches: (id: 22, parent_id: 2, score.value: 5) - doc = Document() - doc.id = '100' - doc.granularity = 0 - chunk = Document() - chunk.id = '101' - chunk.parent_id = doc.id - chunk.granularity = doc.granularity + 1 - num_matches = 2 - for parent_id in range(1, 3): - chunk_chunk = Document() - chunk_chunk.id = str(parent_id * 10) - chunk_chunk.parent_id = str(parent_id) - chunk_chunk.granularity = chunk.granularity + 1 - for score_value in range(parent_id * 2, parent_id * 2 + num_matches): - match = Document() - match.parent_id = str(parent_id) - match.score = NamedScore(value=score_value, ref_id=chunk_chunk.id) - match.id = str(10 * parent_id + score_value) - chunk_chunk.matches.append(match) - chunk.chunks.append(chunk_chunk) - doc.chunks.append(chunk) - return Document(doc) - - -@pytest.mark.parametrize('keep_source_matches_as_chunks', [False, True]) -def test_chunk2doc_ranker_driver_mock_ranker(keep_source_matches_as_chunks): - doc = create_document_to_score() - driver = SimpleChunk2DocRankDriver( - docs=DocumentArray([doc]), - keep_source_matches_as_chunks=keep_source_matches_as_chunks, - ) - executor = MockLengthRanker() - driver.attach(executor=executor, runtime=None) - driver() - assert len(doc.matches) == 4 - assert doc.matches[0].id == '70' - assert doc.matches[0].score.value == 7 - assert doc.matches[1].id == '60' - assert doc.matches[1].score.value == 6 - assert doc.matches[2].id == '50' - assert doc.matches[2].score.value == 5 - assert doc.matches[3].id == '40' - assert doc.matches[3].score.value == 4 - for match in doc.matches: - # match score is computed w.r.t to doc.id - assert match.score.ref_id == doc.id - expected_chunk_matches_length = 1 if keep_source_matches_as_chunks else 0 - assert len(match.chunks) == expected_chunk_matches_length - - 
-@pytest.mark.parametrize('keep_source_matches_as_chunks', [False, True]) -def test_chunk2doc_ranker_driver_max_ranker(keep_source_matches_as_chunks): - doc = create_document_to_score() - driver = SimpleChunk2DocRankDriver( - docs=DocumentArray([doc]), - keep_source_matches_as_chunks=keep_source_matches_as_chunks, - ) - executor = MockMaxRanker() - driver.attach(executor=executor, runtime=None) - driver() - scale = 1 if not isinstance(executor, MockPriceDiscountRanker) else DISCOUNT_VAL - assert len(doc.matches) == 4 - assert doc.matches[0].id == '70' - assert doc.matches[0].score.value == 7 * scale - assert doc.matches[1].id == '60' - assert doc.matches[1].score.value == 6 * scale - assert doc.matches[2].id == '50' - assert doc.matches[2].score.value == 5 * scale - assert doc.matches[3].id == '40' - assert doc.matches[3].score.value == 4 * scale - for match in doc.matches: - # match score is computed w.r.t to doc.id - assert match.score.ref_id == doc.id - expected_chunk_matches_length = 1 if keep_source_matches_as_chunks else 0 - assert len(match.chunks) == expected_chunk_matches_length - - -@pytest.mark.parametrize('keep_source_matches_as_chunks', [False, True]) -def test_chunk2doc_ranker_driver_min_ranker(keep_source_matches_as_chunks): - doc = create_document_to_score() - driver = SimpleChunk2DocRankDriver( - docs=DocumentArray([doc]), - keep_source_matches_as_chunks=keep_source_matches_as_chunks, - ) - executor = MockMinRanker() - driver.attach(executor=executor, runtime=None) - driver() - assert len(doc.matches) == 4 - assert doc.matches[0].id == '40' - assert doc.matches[0].score.value == pytest.approx(1 / (1 + 4), 0.0001) - assert doc.matches[1].id == '50' - assert doc.matches[1].score.value == pytest.approx(1 / (1 + 5), 0.0001) - assert doc.matches[2].id == '60' - assert doc.matches[2].score.value == pytest.approx(1 / (1 + 6), 0.0001) - assert doc.matches[3].id == '70' - assert doc.matches[3].score.value == pytest.approx(1 / (1 + 7), 0.0001) - for match in 
doc.matches: - # match score is computed w.r.t to doc.id - assert match.score.ref_id == doc.id - expected_chunk_matches_length = 1 if keep_source_matches_as_chunks else 0 - assert len(match.chunks) == expected_chunk_matches_length - - -@pytest.mark.parametrize('keep_source_matches_as_chunks', [False, True]) -def test_chunk2doc_ranker_driver_traverse_apply(keep_source_matches_as_chunks): - docs = [ - create_chunk_matches_to_score(), - ] - driver = SimpleChunk2DocRankDriver( - docs=DocumentArray(docs), - keep_source_matches_as_chunks=keep_source_matches_as_chunks, - ) - executor = MockMinRanker() - driver.attach(executor=executor, runtime=None) - driver() - for doc in docs: - assert len(doc.matches) == 2 - for idx, match in enumerate(doc.matches): - # the score should be 1 / (1 + id * 2) - assert match.score.value == pytest.approx( - 1.0 / (1 + float(match.id[0]) * 2.0), 0.0001 - ) - expected_chunk_matches_length = 2 if keep_source_matches_as_chunks else 0 - assert len(match.chunks) == expected_chunk_matches_length - - -@pytest.mark.skip('TODO: https://github.com/jina-ai/jina/issues/1014') -def test_chunk2doc_ranker_driver_traverse_apply_larger_range(): - docs = [ - create_chunk_chunk_matches_to_score(), - ] - driver = SimpleChunk2DocRankDriver( - traversal_paths=('cc', 'c'), docs=DocumentArray(docs) - ) - executor = MockMinRanker() - driver.attach(executor=executor, runtime=None) - driver() - for doc in docs: - assert len(doc.matches) == 1 - assert len(doc.chunks) == 1 - chunk = doc.chunks[0] - assert len(chunk.matches) == 2 - min_granularity_2 = chunk.matches[0].score.value - for idx, m in enumerate(chunk.matches): - # the score should be 1 / (1 + id * 2) - if m.score.value < min_granularity_2: - min_granularity_2 = m.score.value - assert m.score.value == pytest.approx(1.0 / (1 + float(m.id) * 2.0), 0.0001) - assert m.score.ref_id == 101 - match = doc.matches[0] - assert match.score.ref_id == 100 - assert match.score.value == pytest.approx(1.0 / (1 + 
min_granularity_2), 0.0001) diff --git a/tests/unit/drivers/rank/test_matches2doc_rank_drivers.py b/tests/unit/drivers/rank/test_matches2doc_rank_drivers.py deleted file mode 100644 index 7c025d17b9d43..0000000000000 --- a/tests/unit/drivers/rank/test_matches2doc_rank_drivers.py +++ /dev/null @@ -1,86 +0,0 @@ -from typing import Dict, List - -from jina import Document -from jina.drivers.rank import Matches2DocRankDriver -from jina.executors.rankers import Match2DocRanker -from jina.types.score import NamedScore -from jina.executors.decorators import batching -from jina.types.arrays import DocumentArray - - -class MockMatches2DocRankDriver(Matches2DocRankDriver): - def __init__(self, docs, *args, **kwargs): - super().__init__(*args, **kwargs) - self._docs = docs - - @property - def exec_fn(self): - return self._exec_fn - - @property - def docs(self): - return self._docs - - -class MockAbsoluteLengthRanker(Match2DocRanker): - def __init__(self, *args, **kwargs): - super().__init__( - query_required_keys=('weight',), - match_required_keys=('weight',), - *args, - **kwargs, - ) - - @batching(slice_nargs=3) - def score( - self, - old_match_scores: List[Dict], - queries_metas: List[Dict], - matches_metas: List[List[Dict]], - ) -> List[List[float]]: - return [ - [-abs(m['weight'] - query_meta['weight']) for m in match_meta] - for query_meta, match_meta in zip(queries_metas, matches_metas) - ] - - -def create_document_to_score(): - # doc: 1 - # |- matches: (id: 2, parent_id: 1, score.value: 2), - # |- matches: (id: 3, parent_id: 1, score.value: 3), - # |- matches: (id: 4, parent_id: 1, score.value: 4), - # |- matches: (id: 5, parent_id: 1, score.value: 5), - doc = Document() - doc.id = '1' * 20 - for match_id, match_score, match_length in [ - (2, 3, 16), - (3, 6, 24), - (4, 1, 8), - (5, 8, 16), - ]: - with Document() as match: - match.id = match_id - match.score = NamedScore(value=match_score, ref_id=doc.id) - match.weight = match_length - doc.matches.append(match) - return 
doc - - -def test_chunk2doc_ranker_driver_mock_exec(): - doc = create_document_to_score() - driver = MockMatches2DocRankDriver(DocumentArray([doc])) - executor = MockAbsoluteLengthRanker() - driver.attach(executor=executor, runtime=None) - driver() - assert len(doc.matches) == 4 - assert doc.matches[0].id == '4' - assert doc.matches[0].score.value == -8.0 - assert doc.matches[1].id == '2' - assert doc.matches[1].score.value == -16.0 - assert doc.matches[2].id == '5' - assert doc.matches[2].score.value == -16.0 - assert doc.matches[3].id == '3' - assert doc.matches[3].score.value == -24.0 - for match in doc.matches: - assert match.score.ref_id == doc.id - assert match.score.op_name == 'MockAbsoluteLengthRanker' diff --git a/tests/unit/drivers/test_cache_driver.py b/tests/unit/drivers/test_cache_driver.py deleted file mode 100644 index d3d14bbbed8ac..0000000000000 --- a/tests/unit/drivers/test_cache_driver.py +++ /dev/null @@ -1,321 +0,0 @@ -import os -import pickle -from typing import Any - -import pytest - -from jina import DocumentArray -from jina.drivers.cache import BaseCacheDriver -from jina.drivers.delete import DeleteDriver -from jina.executors import BaseExecutor -from jina.executors.indexers.cache import DocCache, ID_KEY, CONTENT_HASH_KEY -from jina.proto import jina_pb2 -from jina.types.document import Document -from tests import random_docs, get_documents - - -class MockCacheDriver(BaseCacheDriver): - @property - def exec_fn(self): - return self._exec_fn - - def on_hit(self, req_doc: 'jina_pb2.DocumentProto', hit_result: Any) -> None: - raise NotImplementedError - - @property - def docs(self): - return DocumentArray(list(random_docs(10))) - - -def test_cache_driver_twice(tmpdir, test_metas): - docs = DocumentArray(list(random_docs(10))) - driver = MockCacheDriver() - with DocCache(tmpdir, metas=test_metas) as executor: - assert not executor.handler_mutex - driver.attach(executor=executor, runtime=None) - driver._apply_all(docs) - - with 
pytest.raises(NotImplementedError): - # duplicate docs - driver._apply_all(docs) - - # new docs - docs = DocumentArray(list(random_docs(10, start_id=100))) - driver._apply_all(docs) - filename = executor.save_abspath - - # check persistence - assert os.path.exists(filename) - - -def test_cache_driver_tmpfile(tmpdir, test_metas): - docs = DocumentArray(list(random_docs(10, embedding=False))) - driver = MockCacheDriver() - with DocCache(tmpdir, fields=(ID_KEY,), metas=test_metas) as executor: - assert not executor.handler_mutex - driver.attach(executor=executor, runtime=None) - - driver._apply_all(docs) - - with pytest.raises(NotImplementedError): - # duplicate docs - driver._apply_all(docs) - - # new docs - docs = DocumentArray(list(random_docs(10, start_id=100, embedding=False))) - driver._apply_all(docs) - - assert os.path.exists(executor.index_abspath) - - -def test_cache_driver_from_file(tmpdir, test_metas): - filename = 'cache' - test_metas['name'] = filename - folder = os.path.join(test_metas["workspace"]) - folder = os.path.join(folder, 'cache-0') - os.makedirs(folder) - bin_full_path = os.path.join(folder, filename) - docs = DocumentArray(list(random_docs(10, embedding=False))) - pickle.dump( - {doc.id: BaseCacheDriver.hash_doc(doc, ['content_hash']) for doc in docs}, - open(f'{bin_full_path}.bin.ids', 'wb'), - ) - pickle.dump( - {BaseCacheDriver.hash_doc(doc, ['content_hash']): doc.id for doc in docs}, - open(f'{bin_full_path}.bin.cache', 'wb'), - ) - - driver = MockCacheDriver() - with DocCache(metas=test_metas, fields=(CONTENT_HASH_KEY,)) as executor: - assert not executor.handler_mutex - driver.attach(executor=executor, runtime=None) - - with pytest.raises(NotImplementedError): - # duplicate docs - driver._apply_all(docs) - - # new docs - docs = DocumentArray(list(random_docs(10, start_id=100))) - driver._apply_all(docs) - - # check persistence - assert os.path.exists(executor.save_abspath) - - -class MockBaseCacheDriver(BaseCacheDriver): - @property - 
def exec_fn(self): - return self._exec_fn - - def on_hit(self, req_doc: 'jina_pb2.DocumentProto', hit_result: Any) -> None: - raise NotImplementedError - - -class SimpleDeleteDriver(DeleteDriver): - @property - def exec_fn(self): - return self._exec_fn - - -def test_cache_content_driver_same_content(tmpdir, test_metas): - doc1 = Document(id='1') - doc1.text = 'blabla' - doc1.update_content_hash() - docs1 = DocumentArray([doc1]) - - doc2 = Document(id='2') - doc2.text = 'blabla' - doc2.update_content_hash() - docs2 = DocumentArray([doc2]) - assert doc1.content_hash == doc2.content_hash - - driver = MockBaseCacheDriver() - - with DocCache(tmpdir, metas=test_metas, fields=(CONTENT_HASH_KEY,)) as executor: - driver.attach(executor=executor, runtime=None) - driver._apply_all(docs1) - - with pytest.raises(NotImplementedError): - driver._apply_all(docs2) - - assert executor.size == 1 - filename = executor.save_abspath - - # update - old_doc = Document(id=9999) - old_doc.text = 'blabla' - old_doc.update_content_hash() - - new_string = 'blabla-new' - doc1.text = new_string - doc1.update_content_hash() - with BaseExecutor.load(filename) as executor: - executor.update(['1'], [doc1.content_hash]) - - with BaseExecutor.load(filename) as executor: - assert executor.query(doc1.content_hash) is True - assert executor.query(old_doc.content_hash) is False - - # delete - with BaseExecutor.load(filename) as executor: - executor.delete([doc1.id]) - - with BaseExecutor.load(filename) as executor: - assert executor.query(doc1.content_hash) is False - - -def test_cache_content_driver_same_id(tmp_path, test_metas): - filename = os.path.join(tmp_path, 'DocCache.bin') - doc1 = Document(id=1) - doc1.text = 'blabla' - doc1.update_content_hash() - docs1 = DocumentArray([doc1]) - - doc2 = Document(id=1) - doc2.text = 'blabla2' - doc2.update_content_hash() - docs2 = DocumentArray([doc2]) - - driver = MockBaseCacheDriver() - - with DocCache(filename, metas=test_metas, fields=(CONTENT_HASH_KEY,)) 
as executor: - driver.attach(executor=executor, runtime=None) - driver._apply_all(docs1) - driver._apply_all(docs2) - assert executor.size == 2 - - -@pytest.mark.parametrize('field_type', [CONTENT_HASH_KEY, ID_KEY]) -def test_cache_driver_update(tmpdir, test_metas, field_type, mocker): - driver = MockBaseCacheDriver(method='update', traversal_paths=['r']) - - docs = [Document(text=f'doc_{i}') for i in range(5)] - [d.update_content_hash() for d in docs] - - def validate_delete(self, keys, *args, **kwargs): - assert len(keys) == len(docs) - assert all([k == d.id for k, d in zip(keys, docs)]) - - def validate_update(self, keys, values, *args, **kwargs): - assert len(keys) == len(docs) - assert len(values) == len(docs) - assert all([k == d.id for k, d in zip(keys, docs)]) - if self.fields == CONTENT_HASH_KEY: - assert all([v == d.content_hash for v, d in zip(values, docs)]) - elif self.fields == ID_KEY: - assert all([v == d.id for v, d in zip(values, docs)]) - - with DocCache(tmpdir, metas=test_metas, fields=(field_type,)) as e: - mocker.patch.object(DocCache, 'update', validate_update) - mocker.patch.object(DocCache, 'delete', validate_delete) - driver.attach(executor=e, runtime=None) - driver._apply_all(docs) - - -@pytest.mark.parametrize('field_type', [CONTENT_HASH_KEY, ID_KEY]) -def test_cache_driver_delete(tmpdir, test_metas, field_type, mocker): - docs = [Document(text=f'doc_{i}') for i in range(5)] - - driver = SimpleDeleteDriver() - - def validate_delete(self, keys, *args, **kwargs): - assert len(keys) == len(docs) - assert all([k == d.id for k, d in zip(keys, docs)]) - - with DocCache(tmpdir, metas=test_metas, fields=(field_type,)) as e: - mocker.patch.object(DocCache, 'delete', validate_delete) - - driver.attach(executor=e, runtime=None) - mck = mocker.patch.object(driver, 'runtime', autospec=True) - mck.request.ids = [d.id for d in docs] - driver() - - -def test_cache_driver_multiple_fields(test_metas): - docs1 = list( - get_documents(0, same_content=True, 
same_tag_content=False, index_start=0) - ) - docs2 = list( - get_documents(0, same_content=True, same_tag_content=False, index_start=0) - ) - filename = 'cache' - test_metas['name'] = filename - driver = MockBaseCacheDriver() - - with DocCache( - filename, metas=test_metas, fields=(CONTENT_HASH_KEY, 'tags__tag_field') - ) as executor: - driver.attach(executor=executor, runtime=None) - driver._apply_all(docs1) - with pytest.raises(NotImplementedError): - driver._apply_all(docs2) - assert executor.size == len(docs1) - - with BaseExecutor.load(executor.save_abspath) as executor: - driver.attach(executor=executor, runtime=None) - with pytest.raises(NotImplementedError): - driver._apply_all(docs1) - assert executor.size == len(docs1) - - # switching order doesn't matter - with DocCache( - metas=test_metas, - fields=( - 'tags__tag_field', - CONTENT_HASH_KEY, - ), - ) as executor: - driver.attach(executor=executor, runtime=None) - with pytest.raises(NotImplementedError): - driver._apply_all(docs1) - with pytest.raises(AssertionError): - # TODO(cristian): size should be loaded if there is an existing cache? 
- assert executor.size == len(docs1) - - -def test_hash(): - d1 = Document() - d1.tags['a'] = '123' - d1.tags['b'] = '456' - d2 = Document() - d2.tags['a'] = '1' - d2.tags['b'] = '23456' - assert BaseCacheDriver.hash_doc( - d1, ['tags__a', 'tags__b'] - ) == BaseCacheDriver.hash_doc(d1, ['tags__a', 'tags__b']) - assert BaseCacheDriver.hash_doc( - d1, ['tags__a', 'tags__b'] - ) != BaseCacheDriver.hash_doc(d2, ['tags__a', 'tags__b']) - - -def test_cache_legacy_field_type(tmp_path, test_metas): - filename = os.path.join(tmp_path, 'DocCache.bin') - doc1 = Document(id=1) - doc1.text = 'blabla' - doc1.update_content_hash() - docs1 = DocumentArray([doc1]) - - doc2 = Document(id=1) - doc2.text = 'blabla2' - doc2.update_content_hash() - docs2 = DocumentArray([doc2]) - - doc3 = Document(id=12312) - doc3.text = 'blabla' - doc3.update_content_hash() - docs3 = DocumentArray([doc3]) - - driver = MockBaseCacheDriver() - - with DocCache(filename, metas=test_metas, field=CONTENT_HASH_KEY) as executor: - driver.attach(executor=executor, runtime=None) - assert executor.fields == [CONTENT_HASH_KEY] - driver._apply_all(docs1) - driver._apply_all(docs2) - assert executor.size == 2 - - with BaseExecutor.load(executor.save_abspath) as executor: - driver.attach(executor=executor, runtime=None) - assert executor.fields == [CONTENT_HASH_KEY] - with pytest.raises(NotImplementedError): - driver._apply_all(docs3) diff --git a/tests/unit/drivers/test_concat_driver.py b/tests/unit/drivers/test_concat_driver.py deleted file mode 100644 index e938ec0fddf54..0000000000000 --- a/tests/unit/drivers/test_concat_driver.py +++ /dev/null @@ -1,73 +0,0 @@ -import os - -import numpy as np - -from jina import Document -from jina.flow import Flow -from jina.types.ndarray.generic import NdArray - -from tests import validate_callback - -e1 = np.random.random([7]) -e2 = np.random.random([5]) -e3 = np.random.random([3]) -e4 = np.random.random([9]) - - -def input_function(): - with Document() as doc1: - 
doc1.embedding = e1 - with Document() as chunk1: - chunk1.embedding = e2 - chunk1.id = 1 - doc1.chunks.add(chunk1) - with Document() as doc2: - doc2.embedding = e3 - with Document() as chunk2: - chunk2.embedding = e4 - chunk2.id = 2 - doc2.chunks.add(chunk2) - return [doc1, doc2] - - -def test_array2pb(): - # i don't understand why is this set? - # os env should be available to that process-context only - if 'JINA_ARRAY_QUANT' in os.environ: - print(f'quant is on: {os.environ["JINA_ARRAY_QUANT"]}') - del os.environ['JINA_ARRAY_QUANT'] - - d = NdArray() - d.value = e4 - np.testing.assert_almost_equal(d.value, e4) - - -def test_concat_embed_driver(mocker): - if 'JINA_ARRAY_QUANT' in os.environ: - print(f'quant is on: {os.environ["JINA_ARRAY_QUANT"]}') - del os.environ['JINA_ARRAY_QUANT'] - - def validate(req): - assert len(req.docs) == 2 - assert NdArray(req.docs[0].embedding).value.shape == (e1.shape[0] * 2,) - assert NdArray(req.docs[1].embedding).value.shape == (e3.shape[0] * 2,) - np.testing.assert_almost_equal( - NdArray(req.docs[0].embedding).value, - np.concatenate([e1, e1], axis=0), - decimal=4, - ) - - mock = mocker.Mock() - # simulate two encoders - flow = ( - Flow() - .add(name='a') - .add(name='b', needs='gateway') - .join(needs=['a', 'b'], uses='- !ConcatEmbedDriver | {}') - ) - - with flow: - flow.index(inputs=input_function, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) diff --git a/tests/unit/drivers/test_craft_driver.py b/tests/unit/drivers/test_craft_driver.py deleted file mode 100644 index 62368923a1055..0000000000000 --- a/tests/unit/drivers/test_craft_driver.py +++ /dev/null @@ -1,90 +0,0 @@ -from typing import Dict - -import numpy as np -import pytest - -from jina import Document, DocumentArray -from jina.drivers.craft import CraftDriver -from jina.executors.decorators import single -from jina.executors.crafters import BaseCrafter -from jina.types.ndarray.generic import NdArray - - -class 
MockCrafter(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @single - def craft(self, text: str, *args, **kwargs) -> Dict: - if text == 'valid': - return {'blob': np.array([0.0, 0.0, 0.0]), 'weight': 10} - else: - return {'non_existing_key': 1} - - -class MockImageCrafter(BaseCrafter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @single - def craft(self, blob: np.ndarray, *args, **kwargs) -> Dict: - assert len(blob.shape) == 3 - assert blob.shape[0] == 1 - return {'blob': blob} - - -class SimpleCraftDriver(CraftDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture() -def text_craft_executor(): - return MockCrafter() - - -@pytest.fixture() -def image_craft_executor(): - return MockImageCrafter() - - -@pytest.fixture() -def craft_driver(): - driver = SimpleCraftDriver() - executor = MockCrafter() - driver.attach(executor=executor, runtime=None) - return driver - - -def test_valid_document(craft_driver, text_craft_executor): - craft_driver.attach(executor=text_craft_executor, runtime=None) - valid_document = Document(content='valid') - docs = DocumentArray([valid_document]) - craft_driver._apply_all(docs) - np.testing.assert_equal( - NdArray(valid_document.blob).value, np.array([0.0, 0.0, 0.0]) - ) - assert valid_document.weight == 10 - - -def test_invalid_document(craft_driver, text_craft_executor): - craft_driver.attach(executor=text_craft_executor, runtime=None) - invalid_document = Document(content='invalid') - docs = DocumentArray([invalid_document]) - with pytest.raises(AttributeError) as error: - craft_driver._apply_all(docs) - assert error.value.__str__() == '\'non_existing_key\' is not recognized' - - -def test_image_crafting(craft_driver, image_craft_executor): - craft_driver.attach(executor=image_craft_executor, runtime=None) - blob1 = np.random.random((1, 32, 64)) - 
blob2 = np.random.random((1, 64, 32)) - docs = DocumentArray([Document(blob=blob1), Document(blob=blob2)]) - craft_driver._apply_all(docs) - np.testing.assert_equal(docs[0].blob, blob1) - np.testing.assert_equal(docs[1].blob, blob2) diff --git a/tests/unit/drivers/test_craftevaluation_driver.py b/tests/unit/drivers/test_craftevaluation_driver.py deleted file mode 100644 index 8c09ebb73cd03..0000000000000 --- a/tests/unit/drivers/test_craftevaluation_driver.py +++ /dev/null @@ -1,200 +0,0 @@ -from typing import Any - -import numpy as np -import pytest - -from jina import Document, Request -from jina.drivers.evaluate import FieldEvaluateDriver -from jina.executors.evaluators import BaseEvaluator -from jina.proto import jina_pb2 -from jina.types.document.helper import DocGroundtruthPair - - -class MockDiffEvaluator(BaseEvaluator): - def evaluate(self, actual: Any, desired: Any, *args, **kwargs) -> float: - return abs(len(actual) - len(desired)) - - -@pytest.fixture(scope='function', params=['text', 'buffer', 'blob']) -def field_type(request): - return request.param - - -@pytest.fixture(scope='function') -def doc_with_field_type(field_type): - class DocCreator(object): - def create(self): - doc = Document() - if field_type == 'text': - doc.text = 'aaa' - elif field_type == 'buffer': - doc.buffer = b'\x01\x02\x03' - elif field_type == 'blob': - doc.blob = np.array([1, 1, 1]) - return doc - - return DocCreator() - - -@pytest.fixture(scope='function') -def groundtruth_with_field_type(field_type): - class GTCreator(object): - def create(self): - gt = Document() - if field_type == 'text': - gt.text = 'aaaa' - elif field_type == 'buffer': - gt.buffer = b'\x01\x02\x03\04' - elif field_type == 'blob': - gt.blob = np.array([1, 1, 1, 1]) - return gt - - return GTCreator() - - -@pytest.fixture(scope='function') -def doc_groundtruth_pair(doc_with_field_type, groundtruth_with_field_type): - class DocGroundtruthPairFactory(object): - def create(self): - return DocGroundtruthPair( - 
doc=doc_with_field_type.create(), - groundtruth=groundtruth_with_field_type.create(), - ) - - return DocGroundtruthPairFactory() - - -@pytest.fixture(scope='function') -def ground_truth_pairs(doc_groundtruth_pair): - doc_groundtruth_pairs = [] - for _ in range(10): - doc_groundtruth_pairs.append(doc_groundtruth_pair.create()) - return doc_groundtruth_pairs - - -@pytest.fixture -def mock_diff_evaluator(): - return MockDiffEvaluator() - - -class SimpleEvaluateDriver(FieldEvaluateDriver): - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture(scope='function') -def simple_evaluate_driver(field_type): - return SimpleEvaluateDriver(field=field_type) - - -def test_crafter_evaluate_driver( - mock_diff_evaluator, simple_evaluate_driver, ground_truth_pairs -): - simple_evaluate_driver.attach(executor=mock_diff_evaluator, runtime=None) - simple_evaluate_driver._apply_all(ground_truth_pairs) - for pair in ground_truth_pairs: - doc = pair.doc - assert len(doc.evaluations) == 1 - assert doc.evaluations[0].op_name == 'MockDiffEvaluator' - assert doc.evaluations[0].value == 1.0 - - -class SimpleChunkEvaluateDriver(FieldEvaluateDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.eval_request = None - self._traversal_paths = ('c',) - - @property - def exec_fn(self): - return self._exec_fn - - @property - def req(self) -> 'jina_pb2.RequestProto': - """Get the current (typed) request, shortcut to ``self.pea.request``""" - return self.eval_request - - @property - def expect_parts(self) -> int: - return 1 - - -@pytest.fixture(scope='function') -def doc_groundtruth_pair(doc_with_field_type, groundtruth_with_field_type): - class DocGroundtruthPairFactory(object): - def create(self): - return DocGroundtruthPair( - doc=doc_with_field_type.create(), - groundtruth=groundtruth_with_field_type.create(), - ) - - return DocGroundtruthPairFactory() - - -@pytest.fixture(scope='function') -def ground_truth_pairs(doc_groundtruth_pair): - 
doc_groundtruth_pairs = [] - for _ in range(10): - doc_groundtruth_pairs.append(doc_groundtruth_pair.create()) - return doc_groundtruth_pairs - - -@pytest.fixture(scope='function') -def simple_chunk_evaluate_driver(): - def get_evaluate_driver(field_type): - return SimpleChunkEvaluateDriver(field=field_type) - - return get_evaluate_driver - - -@pytest.fixture -def eval_request(): - def request(field_type): - num_docs = 10 - req = jina_pb2.RequestProto() - for idx in range(num_docs): - doc = req.index.docs.add() - gt = req.index.groundtruths.add() - chunk_doc = Document(doc.chunks.add()) - chunk_gt = Document(gt.chunks.add()) - chunk_doc.granularity = 1 - chunk_gt.granularity = 1 - if field_type == 'text': - chunk_doc.text = 'aaa' - chunk_gt.text = 'aaaa' - elif field_type == 'buffer': - chunk_doc.buffer = b'\x01\x02\x03' - chunk_gt.buffer = b'\x01\x02\x03\x04' - elif field_type == 'blob': - chunk_doc.blob = np.array([1, 1, 1]) - chunk_gt.blob = np.array([1, 1, 1, 1]) - return Request(req).as_typed_request('index') - - return request - - -@pytest.mark.parametrize('field_type', ['text', 'buffer', 'blob']) -def test_crafter_evaluate_driver_in_chunks( - field_type, simple_chunk_evaluate_driver, mock_diff_evaluator, eval_request -): - # this test proves that we can evaluate matches at chunk level, - # proving that the driver can traverse in a parallel way docs and groundtruth - req = eval_request(field_type) - driver = simple_chunk_evaluate_driver(field_type) - driver.attach(executor=mock_diff_evaluator, runtime=None) - driver.eval_request = req - driver() - - rd = list(req.docs) - rg = list(req.groundtruths) - assert len(rd) == len(rg) - assert len(rd) == 10 - for doc in rd: - assert len(doc.evaluations) == 0 # evaluation done at chunk level - rdc = list(doc.chunks) - assert len(rdc) == 1 - chunk = rdc[0] - assert len(chunk.evaluations) == 1 # evaluation done at chunk level - assert chunk.evaluations[0].op_name == 'MockDiffEvaluator' - assert chunk.evaluations[0].value 
== 1.0 diff --git a/tests/unit/drivers/test_encoder_driver.py b/tests/unit/drivers/test_encoder_driver.py deleted file mode 100644 index c2d844c22648e..0000000000000 --- a/tests/unit/drivers/test_encoder_driver.py +++ /dev/null @@ -1,108 +0,0 @@ -from typing import Any - -import pytest -import numpy as np -from scipy import sparse - -from jina import Document, DocumentArray -from jina.drivers.encode import EncodeDriver, ScipySparseEncodeDriver -from jina.executors.encoders import BaseEncoder -from jina.executors.decorators import batching - - -@pytest.fixture(scope='function') -def num_docs(): - return 10 - - -@pytest.fixture(scope='function') -def docs_to_encode(num_docs): - docs = [] - for idx in range(1, num_docs + 1): - doc = Document(content=np.array([idx])) - docs.append(doc) - return DocumentArray(docs) - - -def get_encoder(batch_size): - class MockEncoder(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @batching(batch_size=batch_size) - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - if batch_size is not None and batch_size > 0: - assert len(content) <= batch_size - if batch_size == 5: - assert len(content) == 5 - return content - - return MockEncoder() - - -class SimpleEncoderDriver(EncodeDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.mark.parametrize( - 'batch_size', [None, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 15, 20, 100, 10000] -) -def test_encode_driver(batch_size, docs_to_encode, num_docs): - driver = SimpleEncoderDriver() - executor = get_encoder(batch_size) - driver.attach(executor=executor, runtime=None) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert doc.embedding is None - driver._apply_all(docs_to_encode) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert doc.embedding == doc.blob - - -def 
get_sparse_encoder(sparse_type): - class MockEncoder(BaseEncoder): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - # return a sparse vector of the same number of rows as `data` of different types - embed = sparse_type(content) - return embed - - return MockEncoder() - - -class SimpleScipySparseEncoderDriver(ScipySparseEncodeDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture( - params=[sparse.csr_matrix, sparse.coo_matrix, sparse.bsr_matrix, sparse.csc_matrix] -) -def sparse_type(request): - return request.param - - -def test_sparse_encode_driver(sparse_type, docs_to_encode, num_docs): - driver = SimpleScipySparseEncoderDriver() - encoder = get_sparse_encoder(sparse_type) - driver.attach(executor=encoder, runtime=None) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert doc.embedding is None - driver._apply_all(docs_to_encode) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert isinstance(doc.embedding, sparse.coo_matrix) - assert doc.embedding == doc.blob diff --git a/tests/unit/drivers/test_encodingevaluation_driver.py b/tests/unit/drivers/test_encodingevaluation_driver.py deleted file mode 100644 index aac6d03aecf1f..0000000000000 --- a/tests/unit/drivers/test_encodingevaluation_driver.py +++ /dev/null @@ -1,124 +0,0 @@ -import numpy as np -import pytest - -from jina import Document, Request -from jina.drivers.evaluate import NDArrayEvaluateDriver -from jina.executors.evaluators.embedding import BaseEmbeddingEvaluator -from jina.proto import jina_pb2 -from jina.types.document.helper import DocGroundtruthPair - - -class MockDiffEvaluator(BaseEmbeddingEvaluator): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def evaluate( - self, actual: 'np.array', desired: 'np.array', 
*args, **kwargs - ) -> float: - """ " - :param actual: the embedding of the document (resulting from an Encoder) - :param desired: the expected embedding of the document - :return the evaluation metric value for the request document - """ - return abs(sum(actual - desired) / len(actual)) - - -@pytest.fixture -def mock_diff_evaluator(): - return MockDiffEvaluator() - - -class SimpleEvaluateDriver(NDArrayEvaluateDriver): - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture -def simple_evaluate_driver(): - return SimpleEvaluateDriver() - - -@pytest.fixture -def ground_truth_pairs(): - num_docs = 10 - pairs = [] - for idx in range(num_docs): - doc = Document(embedding=np.array([1, 1])) - gt = Document(embedding=np.array([2, 2])) - pairs.append(DocGroundtruthPair(doc=doc, groundtruth=gt)) - return pairs - - -def test_encoding_evaluate_driver( - mock_diff_evaluator, simple_evaluate_driver, ground_truth_pairs -): - simple_evaluate_driver.attach(executor=mock_diff_evaluator, runtime=None) - simple_evaluate_driver._apply_all(ground_truth_pairs) - for pair in ground_truth_pairs: - doc = pair.doc - assert len(doc.evaluations) == 1 - assert doc.evaluations[0].op_name == 'MockDiffEvaluator' - assert doc.evaluations[0].value == 1.0 - - -class SimpleChunkEvaluateDriver(NDArrayEvaluateDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.eval_request = None - self._traversal_paths = ('c',) - - @property - def exec_fn(self): - return self._exec_fn - - @property - def req(self) -> 'jina_pb2.RequestProto': - """Get the current (typed) request, shortcut to ``self.pea.request``""" - return self.eval_request - - @property - def expect_parts(self) -> int: - return 1 - - -@pytest.fixture -def simple_chunk_evaluate_driver(): - return SimpleChunkEvaluateDriver() - - -@pytest.fixture -def eval_request(): - num_docs = 10 - req = jina_pb2.RequestProto() - for idx in range(num_docs): - doc = Document(req.index.docs.add()) - gt = 
Document(req.index.groundtruths.add()) - chunk_doc = doc.chunks.new() - chunk_gt = gt.chunks.new() - chunk_doc.embedding = np.array([1, 1]) - chunk_gt.embedding = np.array([2, 2]) - return Request(req).as_typed_request('index') - - -def test_encoding_evaluate_driver_embedding_in_chunks( - simple_chunk_evaluate_driver, mock_diff_evaluator, eval_request -): - # this test proves that we can evaluate matches at chunk level, - # proving that the driver can traverse in a parallel way docs and groundtruth - simple_chunk_evaluate_driver.attach(executor=mock_diff_evaluator, runtime=None) - simple_chunk_evaluate_driver.eval_request = eval_request - simple_chunk_evaluate_driver() - - ed = list(eval_request.docs) - eg = list(eval_request.groundtruths) - assert len(ed) == len(eg) - assert len(ed) == 10 - for doc in ed: - assert len(doc.evaluations) == 0 # evaluation done at chunk level - dc = list(doc.chunks) - assert len(dc) == 1 - chunk = dc[0] - assert len(chunk.evaluations) == 1 # evaluation done at chunk level - assert chunk.evaluations[0].op_name == 'MockDiffEvaluator' - assert chunk.evaluations[0].value == 1.0 diff --git a/tests/unit/drivers/test_eval_collect_driver.py b/tests/unit/drivers/test_eval_collect_driver.py deleted file mode 100644 index 08d688f34e37b..0000000000000 --- a/tests/unit/drivers/test_eval_collect_driver.py +++ /dev/null @@ -1,38 +0,0 @@ -from jina import Flow -from jina import Document - -from tests import validate_callback - - -def input_function(): - with Document() as doc1: - ev1 = doc1.evaluations.add() - ev1.value = 1 - ev1.op_name = 'op1' - with Document() as doc2: - ev2 = doc2.evaluations.add() - ev2.value = 2 - ev2.op_name = 'op2' - return [doc1, doc2] - - -def test_collect_evals_driver(mocker): - def validate(req): - assert len(req.docs) == 2 - # each doc should now have two evaluations - for d in req.docs: - assert len(d.evaluations) == 2 - - mock = mocker.Mock() - # simulate two encoders - flow = ( - Flow() - .add(name='a') - 
.add(name='b', needs='gateway') - .join(needs=['a', 'b'], uses='- !CollectEvaluationDriver {}') - ) - with flow: - flow.index(inputs=input_function, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) diff --git a/tests/unit/drivers/test_helper.py b/tests/unit/drivers/test_helper.py deleted file mode 100644 index 7caf135a9b8e6..0000000000000 --- a/tests/unit/drivers/test_helper.py +++ /dev/null @@ -1,56 +0,0 @@ -import random - -import numpy as np -import pytest - -from jina import Document, DocumentArray -from jina.proto import jina_pb2 -from jina.types.message import Message -from jina.types.ndarray.generic import NdArray - - -@pytest.mark.parametrize('proto_type', ['float32', 'float64', 'uint8']) -@pytest.mark.repeat(10) -def test_array_protobuf_conversions(proto_type): - random_array = np.random.rand( - random.randrange(0, 50), random.randrange(0, 20) - ).astype(proto_type) - d = NdArray() - d.value = random_array - np.testing.assert_almost_equal(d.value, random_array) - - -@pytest.mark.parametrize( - 'quantize, proto_type', - [('fp16', 'float32'), ('fp16', 'float64'), ('uint8', 'uint8')], -) -@pytest.mark.repeat(10) -def test_array_protobuf_conversions_with_quantize(quantize, proto_type): - random_array = np.random.rand( - random.randrange(0, 50), random.randrange(0, 20) - ).astype(proto_type) - d = NdArray(quantize=quantize) - d.value = random_array - np.testing.assert_almost_equal(d.value, random_array, decimal=2) - - -def test_add_route(): - r = jina_pb2.RequestProto() - r.control.command = jina_pb2.RequestProto.ControlRequestProto.IDLE - msg = Message(None, r, pod_name='test1', identity='sda') - msg.add_route('name', 'identity') - assert len(msg.envelope.routes) == 2 - assert msg.envelope.routes[1].pod == 'name' - assert msg.envelope.routes[1].pod_id == 'identity' - - -def test_extract_docs(): - d = Document() - - contents, docs_pts = DocumentArray([d]).all_embeddings - assert contents is None - - vec = np.random.random([2, 2]) 
- d.embedding = vec - contents, docs_pts = DocumentArray([d]).all_embeddings - np.testing.assert_equal(contents[0], vec) diff --git a/tests/unit/drivers/test_kv_index_driver.py b/tests/unit/drivers/test_kv_index_driver.py deleted file mode 100644 index b2a0ff76255b4..0000000000000 --- a/tests/unit/drivers/test_kv_index_driver.py +++ /dev/null @@ -1,142 +0,0 @@ -from typing import Iterable - -import pytest - -from jina.drivers.index import KVIndexDriver -from jina.executors.indexers import BaseKVIndexer -from jina.types.document import Document - - -class MockGroundTruthIndexer(BaseKVIndexer): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.docs = {} - - def add( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - for key, value in zip(keys, values): - self.docs[key] = value - - def update( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - for key, value in zip(keys, values): - self.docs[key] = value - - def delete(self, keys: Iterable[str], *args, **kwargs) -> None: - for key in keys: - del self.docs[key] - - -class SimpleKVIndexDriver(KVIndexDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture(scope='function') -def simple_kv_indexer_driver_add(): - return SimpleKVIndexDriver() - - -@pytest.fixture(scope='function') -def simple_kv_indexer_driver_update(): - return SimpleKVIndexDriver(method='update') - - -@pytest.fixture(scope='function') -def simple_kv_indexer_driver_delete(): - return SimpleKVIndexDriver(method='delete') - - -@pytest.fixture(scope='function') -def mock_groundtruth_indexer(): - return MockGroundTruthIndexer() - - -@pytest.fixture(scope='function') -def documents(): - docs = [] - for idx in range(5): - with Document(text=f'{idx}') as d: - d.id = f'{idx:0>16}' - docs.append(d) - return docs - - -@pytest.fixture(scope='function') -def 
updated_documents(): - docs = [] - for idx in range(3): - with Document(text='updated_' + f'{idx}') as d: - d.id = f'{idx:0>16}' - docs.append(d) - return docs - - -@pytest.fixture(scope='function') -def deleted_documents(): - docs = [] - for idx in range(3): - with Document() as d: - d.id = f'{idx:0>16}' - docs.append(d) - return docs - - -def test_kv_index_driver_add( - mock_groundtruth_indexer, simple_kv_indexer_driver_add, documents -): - simple_kv_indexer_driver_add.attach(executor=mock_groundtruth_indexer, runtime=None) - simple_kv_indexer_driver_add._apply_all(documents) - - assert len(mock_groundtruth_indexer.docs) == 5 - for idx, doc in enumerate(documents): - assert mock_groundtruth_indexer.docs[doc.id] == doc.SerializeToString() - - -def test_kv_index_driver_update( - mock_groundtruth_indexer, - simple_kv_indexer_driver_add, - simple_kv_indexer_driver_update, - documents, - updated_documents, -): - simple_kv_indexer_driver_add.attach(executor=mock_groundtruth_indexer, runtime=None) - simple_kv_indexer_driver_add._apply_all(documents) - - simple_kv_indexer_driver_update.attach( - executor=mock_groundtruth_indexer, runtime=None - ) - simple_kv_indexer_driver_update._apply_all(updated_documents) - - assert len(mock_groundtruth_indexer.docs) == 5 - for idx, doc in enumerate(updated_documents[:3] + documents[3:5]): - assert mock_groundtruth_indexer.docs[doc.id] == doc.SerializeToString() - - -def test_kv_index_driver_delete( - mock_groundtruth_indexer, - simple_kv_indexer_driver_add, - simple_kv_indexer_driver_delete, - documents, - deleted_documents, -): - simple_kv_indexer_driver_add.attach(executor=mock_groundtruth_indexer, runtime=None) - simple_kv_indexer_driver_add._apply_all(documents) - - simple_kv_indexer_driver_delete.attach( - executor=mock_groundtruth_indexer, runtime=None - ) - simple_kv_indexer_driver_delete._apply_all(deleted_documents) - - assert len(mock_groundtruth_indexer.docs) == 2 - for idx, doc in enumerate(documents[3:5]): - assert 
mock_groundtruth_indexer.docs[doc.id] == doc.SerializeToString() - - for idx, doc in enumerate(deleted_documents[:3]): - assert doc.id not in mock_groundtruth_indexer.docs diff --git a/tests/unit/drivers/test_kv_search_driver.py b/tests/unit/drivers/test_kv_search_driver.py deleted file mode 100644 index f74ff906b3bda..0000000000000 --- a/tests/unit/drivers/test_kv_search_driver.py +++ /dev/null @@ -1,166 +0,0 @@ -from typing import Optional, Iterable - -import numpy as np -import pytest - -from jina import Document, DocumentArray -from jina.drivers.search import KVSearchDriver -from jina.executors.indexers import BaseKVIndexer -from jina.types.ndarray.generic import NdArray - - -class MockIndexer(BaseKVIndexer): - def add( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - pass - - def query(self, keys: Iterable[str]) -> Optional[bytes]: - values = [] - for k in keys: - values.append(self.db.get(k, None)) - return values - - def get_query_handler(self): - pass - - def get_add_handler(self): - pass - - def get_create_handler(self): - pass - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.db = {} - doc_ids = ['1', '2', '3', '4'] - doc_ids = [item * 16 for item in doc_ids] - for doc_id in doc_ids: - with Document() as doc: - doc.id = doc_id - doc.embedding = np.array([doc.id]) - self.db[doc.id] = doc.SerializeToString() - - -class SimpleKVSearchDriver(KVSearchDriver): - def __init__(self, docs=None, traversal_paths=['r'], *args, **kwargs): - super().__init__(traversal_paths=traversal_paths, *args, **kwargs) - self._docs = docs - - @property - def docs(self): - return self._docs - - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture(scope='function') -def document(): - # 1-D embedding - # doc: 0 - # - chunk: 1 - # - chunk: 2 - # - chunk: 3 - # - chunk: 4 - # - chunk: 5 - will be missing from KV indexer - doc = Document() - doc.id = '0' * 16 - for c in range(5): - with 
Document() as chunk: - chunk.id = str(c + 1) * 16 - doc.chunks.add(chunk) - return doc - - -@pytest.fixture(scope='function') -def document_with_matches_on_chunks(): - # 1-D embedding - # doc: 0 - # - chunk: 1 - # - match: 2 - # - match: 3 - # - match: 4 - # - match: 5 - will be missing from KV indexer - # - match: 6 - will be missing from KV indexer - with Document() as doc: - doc.id = '0' * 16 - with Document() as chunk: - chunk.id = '1' * 16 - for m in range(5): - with Document() as match: - match.id = str(m + 2) * 16 - match.score.value = 1.0 - chunk.matches.append(match) - doc.chunks.append(chunk) - return doc - - -def test_vectorsearch_driver_mock_indexer_apply_all(document): - driver = SimpleKVSearchDriver() - - executor = MockIndexer() - driver.attach(executor=executor, runtime=None) - - dcs = list(document.chunks) - assert len(dcs) == 5 - for chunk in dcs: - assert chunk.embedding is None - - driver._apply_all([DocumentArray(document.chunks)]) - - dcs = list(document.chunks) - - # chunk idx: 5 had no matched and is removed as missing idx - assert len(dcs) == 4 - for chunk in dcs: - assert chunk.embedding is not None - embedding_array = chunk.embedding - np.testing.assert_equal(embedding_array, np.array([chunk.id])) - - -def test_vectorsearch_driver_mock_indexer(document): - dcs = list(document.chunks) - assert len(dcs) == 5 - for chunk in dcs: - assert chunk.embedding is None - - driver = SimpleKVSearchDriver( - docs=DocumentArray([document]), traversal_paths=('c',) - ) - - executor = MockIndexer() - driver.attach(executor=executor, runtime=None) - - driver() - - # chunk idx: 5 had no matched and is removed as missing idx - dcs = list(document.chunks) - assert len(dcs) == 4 - for chunk in dcs: - assert chunk.embedding is not None - embedding_array = chunk.embedding - np.testing.assert_equal(embedding_array, np.array([chunk.id])) - - -def test_vectorsearch_driver_mock_indexer_with_matches_on_chunks( - document_with_matches_on_chunks, -): - driver = 
SimpleKVSearchDriver( - docs=DocumentArray([document_with_matches_on_chunks]), traversal_paths=('cm',) - ) - executor = MockIndexer() - driver.attach(executor=executor, runtime=None) - - driver() - - dcs = list(document_with_matches_on_chunks.chunks) - assert len(dcs) == 1 - chunk = dcs[0] - matches = list(chunk.matches) - assert len(matches) == 3 - for match in matches: - assert NdArray(match.embedding).value is not None - embedding_array = NdArray(match.embedding).value - np.testing.assert_equal(embedding_array, np.array([match.id])) diff --git a/tests/unit/drivers/test_load_groundtruth_driver.py b/tests/unit/drivers/test_load_groundtruth_driver.py deleted file mode 100644 index 8ca576ee333d6..0000000000000 --- a/tests/unit/drivers/test_load_groundtruth_driver.py +++ /dev/null @@ -1,117 +0,0 @@ -from typing import Optional, Iterable - -import pytest - -from jina import Request -from jina.drivers.evaluate import LoadGroundTruthDriver -from jina.executors.indexers import BaseKVIndexer -from jina.proto import jina_pb2 -from jina.types.document import Document - - -class MockGroundTruthIndexer(BaseKVIndexer): - def add( - self, keys: Iterable[str], values: Iterable[bytes], *args, **kwargs - ) -> None: - pass - - def query(self, keys: str) -> Optional[bytes]: - retrieved_queries = [] - for key in keys: - if key in self.db.keys(): - retrieved_queries.append(self.db[key]) - else: - retrieved_queries.append(None) - return retrieved_queries - - def get_query_handler(self): - pass - - def get_add_handler(self): - pass - - def get_create_handler(self): - pass - - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - doc1 = Document() - doc1.id = '01' * 8 - doc1.tags['groundtruth'] = True - doc2 = Document() - doc2.id = '02' * 8 - doc2.tags['groundtruth'] = True - doc4 = Document() - doc4.id = '04' * 8 - doc4.tags['groundtruth'] = True - self.db = { - doc1.id: doc1.SerializeToString(), - doc2.id: doc2.SerializeToString(), - doc4.id: 
doc4.SerializeToString(), - } - - -class SimpleLoadGroundTruthDriver(LoadGroundTruthDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.eval_request = None - - @property - def exec_fn(self): - return self._exec_fn - - @property - def req(self) -> 'jina_pb2.RequestProto': - """Get the current (typed) request, shortcut to ``self.pea.request``""" - return self.eval_request - - @property - def expect_parts(self) -> int: - return 1 - - -@pytest.fixture(scope='function') -def simple_load_groundtruth_driver(): - return SimpleLoadGroundTruthDriver() - - -@pytest.fixture(scope='function') -def mock_groundtruth_indexer(): - return MockGroundTruthIndexer() - - -@pytest.fixture(scope='function') -def eval_request(): - req = Request() - req.request_type = 'search' - # doc: 1 - # doc: 2 - # doc: 3 - # doc: 4 - # doc: 5 - will be missing from KV indexer - for idx in range(5): - dp = Document() - dp.id = f'0{str(idx + 1)}' * 8 - req.docs.append(dp) - return req - - -def test_load_groundtruth_driver( - mock_groundtruth_indexer, simple_load_groundtruth_driver, eval_request -): - simple_load_groundtruth_driver.attach( - executor=mock_groundtruth_indexer, runtime=None - ) - simple_load_groundtruth_driver.eval_request = eval_request - simple_load_groundtruth_driver() - - assert len(eval_request.docs) == 3 - assert len(eval_request.groundtruths) == 3 - - for groundtruth in eval_request.groundtruths: - assert groundtruth.tags['groundtruth'] - - assert eval_request.groundtruths[0].id == '01' * 8 - assert eval_request.groundtruths[1].id == '02' * 8 - # index 3 and 5 have no groundtruth in the KVIndexer - assert eval_request.groundtruths[2].id == '04' * 8 diff --git a/tests/unit/drivers/test_multimodal_driver.py b/tests/unit/drivers/test_multimodal_driver.py deleted file mode 100644 index 90efac29cc4b0..0000000000000 --- a/tests/unit/drivers/test_multimodal_driver.py +++ /dev/null @@ -1,148 +0,0 @@ -import numpy as np -import pytest - -from jina import 
Document, DocumentArray -from jina.drivers.multimodal import MultiModalDriver -from jina.executors.encoders.multimodal import BaseMultiModalEncoder -from jina.types.document.multimodal import MultimodalDocument - - -@pytest.fixture(scope='function') -def embedding(): - class EmbeddingFactory(object): - def create(self, dim, seed): - np.random.seed(seed) - return np.random.random([dim]) - - return EmbeddingFactory() - - -@pytest.fixture(scope='function') -def embeddings(embedding): - return [ - embedding.create(dim=16, seed=1), - embedding.create(dim=16, seed=2), - embedding.create(dim=24, seed=3), - ] - - -@pytest.fixture(scope='function') -def doc_with_multimodal_chunks(embeddings): - doc = MultimodalDocument() - chunk1 = Document() - chunk2 = Document() - chunk3 = Document() - chunk1.modality = 'visual1' - chunk2.modality = 'visual2' - chunk3.modality = 'textual' - chunk1.embedding = embeddings[0] - chunk2.embedding = embeddings[1] - chunk3.embedding = embeddings[2] - doc.chunks.append(chunk1) - doc.chunks.append(chunk2) - doc.chunks.append(chunk3) - return doc - - -class MockMultiModalEncoder(BaseMultiModalEncoder): - def __init__(self, positional_modality, *args, **kwargs): - super().__init__(positional_modality=positional_modality, *args, **kwargs) - - def encode(self, *data: 'np.ndarray', **kwargs) -> 'np.ndarray': - visual1 = data[(self.positional_modality.index('visual1'))] - visual2 = data[(self.positional_modality.index('visual2'))] - textual = data[(self.positional_modality.index('textual'))] - return np.concatenate((visual1, visual2, textual), axis=1) - - -@pytest.fixture -def mock_multimodal_encoder(): - return MockMultiModalEncoder(positional_modality=['visual1', 'visual2', 'textual']) - - -class SimpleMultiModalDriver(MultiModalDriver): - def __init__(self, *args, **kwargs): - import logging - - super().__init__(*args, **kwargs) - self.test_logger = logging.getLogger('test multimodal driver') - - @property - def logger(self): - return 
self.test_logger - - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture -def simple_multimodal_driver(): - return SimpleMultiModalDriver() - - -def test_multimodal_driver( - simple_multimodal_driver, mock_multimodal_encoder, doc_with_multimodal_chunks -): - simple_multimodal_driver.attach(executor=mock_multimodal_encoder, runtime=None) - simple_multimodal_driver._apply_all(DocumentArray([doc_with_multimodal_chunks])) - doc = doc_with_multimodal_chunks - assert len(doc.chunks) == 3 - visual1 = doc.chunks[0] - visual2 = doc.chunks[1] - textual = doc.chunks[2] - assert ( - doc.embedding.shape[0] - == visual1.embedding.shape[0] - + visual2.embedding.shape[0] - + textual.embedding.shape[0] - ) - - -@pytest.fixture(scope='function') -def doc_with_multimodal_chunks_wrong(embeddings): - doc = MultimodalDocument() - chunk1 = Document() - chunk2 = Document() - chunk3 = Document() - chunk1.modality = 'visual' - chunk2.modality = 'visual' - chunk3.modality = 'textual' - chunk1.embedding = embeddings[0] - chunk2.embedding = embeddings[1] - chunk3.embedding = embeddings[2] - doc.chunks.append(chunk1) - doc.chunks.append(chunk2) - doc.chunks.append(chunk3) - return doc - - -def test_multimodal_driver_assert_one_chunk_per_modality( - simple_multimodal_driver, mock_multimodal_encoder, doc_with_multimodal_chunks_wrong -): - simple_multimodal_driver.attach(executor=mock_multimodal_encoder, runtime=None) - assert not doc_with_multimodal_chunks_wrong.is_valid - - -@pytest.fixture -def mock_multimodal_encoder_shuffled(): - return MockMultiModalEncoder(positional_modality=['visual2', 'textual', 'visual1']) - - -def test_multimodal_driver_with_shuffled_order( - simple_multimodal_driver, - mock_multimodal_encoder_shuffled, - doc_with_multimodal_chunks, -): - simple_multimodal_driver.attach( - executor=mock_multimodal_encoder_shuffled, runtime=None - ) - simple_multimodal_driver._apply_all(DocumentArray([doc_with_multimodal_chunks])) - doc = 
doc_with_multimodal_chunks - assert len(doc.chunks) == 3 - visual1 = doc.chunks[2] - visual2 = doc.chunks[0] - textual = doc.chunks[1] - control = np.concatenate([visual2.embedding, textual.embedding, visual1.embedding]) - test = doc.embedding - np.testing.assert_array_equal(control, test) diff --git a/tests/unit/drivers/test_predict_driver.py b/tests/unit/drivers/test_predict_driver.py deleted file mode 100644 index 0db959a18920d..0000000000000 --- a/tests/unit/drivers/test_predict_driver.py +++ /dev/null @@ -1,47 +0,0 @@ -import numpy as np - -from jina.drivers.predict import ( - BinaryPredictDriver, - OneHotPredictDriver, - MultiLabelPredictDriver, -) - - -def test_binary_predict_driver(): - driver = BinaryPredictDriver() - - assert driver.prediction2label(np.array([0, 1, 1, 0])) == ['no', 'yes', 'yes', 'no'] - - -def test_one_hot_predict_driver(): - driver = OneHotPredictDriver(labels=['cat', 'dog', 'human']) - - assert driver.prediction2label(np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])) == [ - 'human', - 'cat', - 'dog', - ] - - driver = OneHotPredictDriver(labels=['yes', 'no']) - - assert driver.prediction2label(np.array([[0, 1], [1, 0], [0, 1]])) == [ - 'no', - 'yes', - 'no', - ] - - -def test_multi_label_predict_driver(): - driver = MultiLabelPredictDriver(labels=['cat', 'dog', 'human']) - - assert driver.prediction2label(np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]])) == [ - ['human'], - ['cat'], - ['dog'], - ] - - assert driver.prediction2label(np.array([[0, 1, 1], [1, 1, 0], [1, 1, 1]])) == [ - ['dog', 'human'], - ['cat', 'dog'], - ['cat', 'dog', 'human'], - ] diff --git a/tests/unit/drivers/test_predict_mock_driver.py b/tests/unit/drivers/test_predict_mock_driver.py deleted file mode 100644 index 253890ca5320d..0000000000000 --- a/tests/unit/drivers/test_predict_mock_driver.py +++ /dev/null @@ -1,136 +0,0 @@ -import numpy as np -import pytest -from google.protobuf.struct_pb2 import ListValue - -from jina import Document -from jina.drivers.predict import ( - 
BinaryPredictDriver, - MultiLabelPredictDriver, - OneHotPredictDriver, - Prediction2DocBlobDriver, -) -from jina.executors.classifiers import BaseClassifier -from jina.types.ndarray.generic import NdArray -from jina.types.arrays import DocumentArray -from tests import random_docs - - -@pytest.fixture(scope='function') -def num_docs(): - return 10 - - -@pytest.fixture(scope='function') -def docs_to_encode(num_docs): - docs = [] - for idx in range(num_docs): - doc = Document(content=np.array([idx])) - docs.append(doc) - return DocumentArray(docs) - - -class MockBinaryPredictDriver(BinaryPredictDriver): - def exec_fn(self, embedding: 'np.ndarray'): - random_label = np.random.randint(0, 2, [embedding.shape[0]]) - return random_label.astype(np.int64) - - -class MockOneHotPredictDriver(OneHotPredictDriver): - def exec_fn(self, embedding: 'np.ndarray'): - return np.eye(3)[np.random.choice(3, embedding.shape[0])] - - -class MockMultiLabelPredictDriver(MultiLabelPredictDriver): - def exec_fn(self, embedding: 'np.ndarray'): - return np.eye(3)[np.random.choice(3, embedding.shape[0])] - - -class MockAllLabelPredictDriver(MultiLabelPredictDriver): - def exec_fn(self, embedding: 'np.ndarray'): - return np.ones([embedding.shape[0], 3]) - - -class MockPrediction2DocBlobDriver(Prediction2DocBlobDriver): - def exec_fn(self, embedding: 'np.ndarray'): - return np.eye(3)[np.random.choice(3, embedding.shape[0])] - - -class MockClassifierDriver(BinaryPredictDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - -class MockClassifier(BaseClassifier): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - def predict(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - # predict 0 or 1 based on divisiblity by 2 - return (content % 2 == 0).astype(int) - - -def test_binary_predict_driver(): - docs = DocumentArray(random_docs(2)) - driver = MockBinaryPredictDriver() - 
driver._apply_all(docs) - - for d in docs: - assert d.tags['prediction'] in {'yes', 'no'} - for c in d.chunks: - assert c.tags['prediction'] in {'yes', 'no'} - - -def test_one_hot_predict_driver(): - docs = DocumentArray(random_docs(2)) - driver = MockOneHotPredictDriver(labels=['cat', 'dog', 'human']) - driver._apply_all(docs) - - for d in docs: - assert d.tags['prediction'] in {'cat', 'dog', 'human'} - for c in d.chunks: - assert c.tags['prediction'] in {'cat', 'dog', 'human'} - - -def test_multi_label_predict_driver(): - docs = DocumentArray(random_docs(2)) - driver = MockMultiLabelPredictDriver(labels=['cat', 'dog', 'human']) - driver._apply_all(docs) - - for d in docs: - assert isinstance(d.tags['prediction'], ListValue) - for t in d.tags['prediction']: - assert t in {'cat', 'dog', 'human'} - - docs = DocumentArray(random_docs(2)) - driver = MockAllLabelPredictDriver(labels=['cat', 'dog', 'human']) - driver._apply_all(docs) - - for d in docs: - assert isinstance(d.tags['prediction'], ListValue) - assert list(d.tags['prediction']) == ['cat', 'dog', 'human'] - - -def test_as_blob_driver(): - docs = DocumentArray(random_docs(2)) - driver = MockPrediction2DocBlobDriver() - driver._apply_all(docs) - - for d in docs: - assert NdArray(d.blob).value.shape == (3,) - - -def test_predict_driver_without_embeddings(docs_to_encode, num_docs): - executor = MockClassifier(total_num_docs=num_docs) - driver = MockClassifierDriver(fields='content') # use doc.content to predict tags - driver.attach(executor=executor, runtime=None) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert doc.embedding is None - driver._apply_all(docs_to_encode) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert doc.tags['prediction'] in ['yes', 'no'] diff --git a/tests/unit/drivers/test_rankingevaluation_driver.py b/tests/unit/drivers/test_rankingevaluation_driver.py deleted file mode 100644 index f1b6a42a6f1af..0000000000000 --- 
a/tests/unit/drivers/test_rankingevaluation_driver.py +++ /dev/null @@ -1,225 +0,0 @@ -from typing import Tuple - -import pytest - -from jina.drivers.evaluate import RankEvaluateDriver -from jina.executors.evaluators.rank.precision import PrecisionEvaluator -from jina.proto import jina_pb2 -from jina.types.document.helper import DocGroundtruthPair - -from tests import validate_callback - - -class SimpleRankEvaluateDriver(RankEvaluateDriver): - def __init__(self, fields: Tuple[str], *args, **kwargs): - super().__init__(fields, *args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - @property - def expect_parts(self) -> int: - return 1 - - -class RunningAvgRankEvaluateDriver(RankEvaluateDriver): - def __init__(self, fields: Tuple[str], *args, **kwargs): - super().__init__(fields, runining_avg=True, *args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - @property - def expect_parts(self) -> int: - return 1 - - -@pytest.fixture -def simple_rank_evaluate_driver(fields): - return SimpleRankEvaluateDriver(fields) - - -@pytest.fixture -def runningavg_rank_evaluate_driver(fields): - return RunningAvgRankEvaluateDriver(fields) - - -@pytest.fixture -def ground_truth_pairs(): - num_docs = 10 - - def add_matches(doc: jina_pb2.DocumentProto, num_matches): - for idx in range(num_matches): - match = doc.matches.add() - match.tags['id'] = idx - match.score.value = idx - - pairs = [] - for idx in range(num_docs): - doc = jina_pb2.DocumentProto() - gt = jina_pb2.DocumentProto() - add_matches(doc, num_docs) - add_matches(gt, num_docs) - pairs.append(DocGroundtruthPair(doc=doc, groundtruth=gt)) - return pairs - - -@pytest.mark.parametrize( - 'fields', [('tags__id',), ('score__value',), 'tags__id', 'score__value'] -) -def test_ranking_evaluate_simple_driver( - simple_rank_evaluate_driver, ground_truth_pairs -): - simple_rank_evaluate_driver.attach( - executor=PrecisionEvaluator(eval_at=2), runtime=None - ) - 
simple_rank_evaluate_driver._apply_all(ground_truth_pairs) - for pair in ground_truth_pairs: - doc = pair.doc - assert len(doc.evaluations) == 1 - assert doc.evaluations[0].op_name == 'PrecisionEvaluator@2' - assert doc.evaluations[0].value == 1.0 - - -@pytest.mark.parametrize('fields', [('tags__id', 'score__value')]) -def test_ranking_evaluate_extract_multiple_fields( - simple_rank_evaluate_driver, ground_truth_pairs, mocker -): - - mock = mocker.Mock() - mock.return_value = 1.0 - - def eval_function(actual, desired): - assert isinstance(actual[0], Tuple) - assert isinstance(desired[0], Tuple) - - simple_rank_evaluate_driver._exec_fn = mock - simple_rank_evaluate_driver._apply_all(ground_truth_pairs) - - validate_callback(mock, eval_function) - - -@pytest.mark.parametrize('fields', [('tags__id',), ('score__value',)]) -def test_ranking_evaluate_runningavg_driver( - runningavg_rank_evaluate_driver, ground_truth_pairs -): - runningavg_rank_evaluate_driver.attach( - executor=PrecisionEvaluator(eval_at=2), runtime=None - ) - runningavg_rank_evaluate_driver._apply_all(ground_truth_pairs) - for pair in ground_truth_pairs: - doc = pair.doc - assert len(doc.evaluations) == 1 - assert doc.evaluations[0].op_name == 'PrecisionEvaluator@2' - assert doc.evaluations[0].value == 1.0 - - -class SimpleChunkRankEvaluateDriver(RankEvaluateDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.eval_request = None - self._traversal_paths = ('c',) - - @property - def exec_fn(self): - return self._exec_fn - - @property - def req(self) -> 'jina_pb2.RequestProto': - """Get the current (typed) request, shortcut to ``self.pea.request``""" - return self.eval_request - - @property - def expect_parts(self) -> int: - return 1 - - -@pytest.fixture -def simple_chunk_rank_evaluate_driver(): - return SimpleChunkRankEvaluateDriver() - - -@pytest.fixture -def eval_request(): - num_docs = 10 - num_matches = 1 - - def add_matches(doc: jina_pb2.DocumentProto): - for 
idx in range(num_matches): - match = doc.matches.add() - match.tags['id'] = idx - - req = jina_pb2.RequestProto.IndexRequestProto() - for idx in range(num_docs): - doc = req.docs.add() - gt = req.groundtruths.add() - chunk_doc = doc.chunks.add() - chunk_gt = gt.chunks.add() - chunk_doc.granularity = 1 - chunk_gt.granularity = 1 - add_matches(chunk_doc) - add_matches(chunk_gt) - return req - - -@pytest.mark.parametrize('eval_at', [None, 2]) -def test_ranking_evaluate_driver_matches_in_chunks( - simple_chunk_rank_evaluate_driver, eval_request, eval_at -): - # this test proves that we can evaluate matches at chunk level, - # proving that the driver can traverse in a parallel way docs and groundtruth - simple_chunk_rank_evaluate_driver.attach( - executor=PrecisionEvaluator(eval_at=eval_at), runtime=None - ) - simple_chunk_rank_evaluate_driver.eval_request = eval_request - simple_chunk_rank_evaluate_driver() - - assert len(eval_request.docs) == len(eval_request.groundtruths) - assert len(eval_request.docs) == 10 - for doc in eval_request.docs: - assert len(doc.evaluations) == 0 # evaluation done at chunk level - assert len(doc.chunks) == 1 - chunk = doc.chunks[0] - assert len(chunk.evaluations) == 1 # evaluation done at chunk level - if eval_at: - assert chunk.evaluations[0].op_name == 'PrecisionEvaluator@2' - else: - assert chunk.evaluations[0].op_name == 'PrecisionEvaluator' - assert chunk.evaluations[0].value == 1.0 - - -@pytest.fixture -def eval_request_with_unmatching_struct(): - num_docs = 10 - num_matches = 1 - - def add_matches(doc: jina_pb2.DocumentProto): - for idx in range(num_matches): - match = doc.matches.add() - match.tags['id'] = idx - - req = jina_pb2.RequestProto.SearchRequestProto() - for idx in range(num_docs): - doc = req.docs.add() - gt = req.groundtruths.add() - chunk_doc = doc.chunks.add() - chunk_gt = gt.chunks.add() - chunk_doc.granularity = 1 - chunk_gt.granularity = 1 - add_matches(chunk_doc) - add_matches(chunk_gt) - _ = gt.chunks.add() - 
return req - - -def test_evaluate_assert_doc_groundtruth_structure( - simple_chunk_rank_evaluate_driver, eval_request_with_unmatching_struct -): - simple_chunk_rank_evaluate_driver.attach( - executor=PrecisionEvaluator(eval_at=2), runtime=None - ) - simple_chunk_rank_evaluate_driver.eval_request = eval_request_with_unmatching_struct - with pytest.raises(AssertionError): - simple_chunk_rank_evaluate_driver() diff --git a/tests/unit/drivers/test_recursive_traversal_tree.py b/tests/unit/drivers/test_recursive_traversal_tree.py deleted file mode 100644 index daaa80ccfa2f3..0000000000000 --- a/tests/unit/drivers/test_recursive_traversal_tree.py +++ /dev/null @@ -1,240 +0,0 @@ -from jina.drivers import FlatRecursiveMixin, BaseExecutableDriver -from jina import DocumentArray, Document - -DOCUMENTS_PER_LEVEL = 1 - - -class AppendOneChunkTwoMatchesCrafter(FlatRecursiveMixin, BaseExecutableDriver): - def __init__(self, docs, *args, **kwargs): - super().__init__(*args, **kwargs) - self._docs = docs - - @property - def docs(self): - return self._docs - - def _apply_all(self, docs, *args, **kwargs) -> None: - for doc in docs: - add_chunk(doc) - add_match(doc) - add_match(doc) - - -def add_chunk(doc): - chunk = Document() - chunk.granularity = doc.granularity + 1 - chunk.adjacency = doc.adjacency - return doc.chunks.add(chunk) - - -def add_match(doc): - match = Document() - match.granularity = doc.granularity - match.adjacency = doc.adjacency + 1 - return doc.matches.add(match) - - -def build_docs(): - """ Builds up a complete chunk-match structure, with a depth of 2 in both directions recursively. 
""" - max_granularity = 2 - max_adjacency = 2 - - def iterate_build(document, current_granularity, current_adjacency): - if current_granularity < max_granularity: - for i in range(DOCUMENTS_PER_LEVEL): - chunk = add_chunk(document) - iterate_build(chunk, chunk.granularity, chunk.adjacency) - if current_adjacency < max_adjacency: - for i in range(DOCUMENTS_PER_LEVEL): - match = add_match(document) - iterate_build(match, match.granularity, match.adjacency) - - docs = [] - for base_id in range(DOCUMENTS_PER_LEVEL): - document = Document() - document.granularity = 0 - document.adjacency = 0 - docs.append(document) - iterate_build(document, 0, 0) - return DocumentArray(docs) - - -def apply_traversal_path(traversal_paths): - docs = build_docs() - driver = AppendOneChunkTwoMatchesCrafter(docs=docs, traversal_paths=traversal_paths) - driver() - return docs - - -def test_only_root(): - docs = apply_traversal_path(['r']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 2 - assert len(docs[0].chunks[0].chunks) == 1 - assert len(docs[0].chunks[0].chunks[0].matches) == 1 - assert len(docs[0].chunks[0].matches) == 1 - assert len(docs[0].matches) == 3 - assert len(docs[0].matches[0].chunks) == 1 - - -def test_only_matches(): - docs = apply_traversal_path(['m']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 1 - assert len(docs[0].chunks[0].matches) == 1 - assert len(docs[0].matches) == 1 - assert len(docs[0].matches[0].chunks) == 2 - assert len(docs[0].matches[0].matches) == 3 - assert len(docs[0].matches[0].matches[0].chunks) == 1 - - -def test_only_chunks(): - docs = apply_traversal_path(['c']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 1 - assert len(docs[0].chunks[0].chunks) == 2 - assert len(docs[0].chunks[0].matches) == 3 - assert len(docs[0].matches) == 1 - assert len(docs[0].matches[0].chunks) == 1 - assert len(docs[0].matches[0].matches) == 1 - assert len(docs[0].matches[0].matches[0].chunks) == 1 - - -def test_match_chunk(): - docs = 
apply_traversal_path(['mc']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 1 - assert len(docs[0].chunks[0].matches) == 1 - assert len(docs[0].matches) == 1 - assert len(docs[0].matches[0].chunks) == 1 - assert len(docs[0].matches[0].chunks[0].chunks) == 2 - assert len(docs[0].matches[0].matches) == 1 - assert len(docs[0].matches[0].matches[0].chunks) == 1 - - -def test_chunk_match(): - docs = apply_traversal_path(['cm']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 1 - assert len(docs[0].chunks[0].matches) == 1 - assert len(docs[0].chunks[0].matches[0].chunks) == 2 - assert len(docs[0].matches) == 1 - assert len(docs[0].matches[0].chunks) == 1 - assert len(docs[0].matches[0].matches) == 1 - assert len(docs[0].matches[0].matches[0].chunks) == 1 - - -def test_multi_paths(): - docs = apply_traversal_path(['cc', 'mm']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 1 - assert len(docs[0].chunks[0].matches) == 1 - assert len(docs[0].chunks[0].chunks) == 1 - assert len(docs[0].chunks[0].chunks[0].chunks) == 1 - assert len(docs[0].matches) == 1 - assert len(docs[0].matches[0].chunks) == 1 - assert len(docs[0].matches[0].matches) == 1 - assert len(docs[0].matches[0].matches[0].chunks) == 2 - - -def test_both_from_0(): - docs = apply_traversal_path(['r', 'c', 'm', 'cc', 'mm']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 2 - assert len(docs[0].chunks[0].chunks) == 2 - assert len(docs[0].chunks[0].chunks[0].matches) == 3 - assert len(docs[0].chunks[0].chunks[0].chunks) == 1 # 0 before traversal - assert len(docs[0].chunks[0].matches) == 3 - assert len(docs[0].matches) == 3 - assert len(docs[0].matches[0].chunks) == 2 - assert len(docs[0].matches[0].matches) == 3 - assert len(docs[0].matches[0].matches[0].chunks) == 2 - - -def test_both_from_0_upper_case(): - docs = apply_traversal_path(['R', 'C', 'M', 'CC', 'MM']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 2 - assert len(docs[0].chunks[0].chunks) == 2 - assert 
len(docs[0].chunks[0].chunks[0].matches) == 3 - assert len(docs[0].chunks[0].chunks[0].chunks) == 1 # 0 before traversal - assert len(docs[0].chunks[0].matches) == 3 - assert len(docs[0].matches) == 3 - assert len(docs[0].matches[0].chunks) == 2 - assert len(docs[0].matches[0].matches) == 3 - assert len(docs[0].matches[0].matches[0].chunks) == 2 - - -def test_adjacency0_granularity1(): - docs = apply_traversal_path(['c', 'cc', 'cm', 'cmm']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 1 - assert len(docs[0].chunks[0].chunks) == 2 - assert len(docs[0].chunks[0].chunks[0].matches) == 3 - assert len(docs[0].chunks[0].matches) == 3 - assert len(docs[0].chunks[0].matches[0].chunks) == 2 - assert len(docs[0].chunks[0].matches[0].matches) == 3 - assert len(docs[0].chunks[0].matches[0].matches[0].chunks) == 2 - assert len(docs[0].matches) == 1 - assert len(docs[0].matches[0].chunks) == 1 - assert len(docs[0].matches[0].matches) == 1 - assert len(docs[0].matches[0].matches[0].chunks) == 1 - - -def test_adjacency1_granularity1(): - docs = apply_traversal_path(['cm', 'cmm', 'mcc']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 1 - assert len(docs[0].chunks[0].chunks) == 1 - assert len(docs[0].chunks[0].chunks[0].matches) == 1 - assert len(docs[0].chunks[0].matches) == 1 - assert len(docs[0].chunks[0].matches[0].chunks) == 2 - assert len(docs[0].chunks[0].matches[0].matches) == 3 - assert len(docs[0].chunks[0].matches[0].matches[0].chunks) == 2 - assert len(docs[0].matches) == 1 - assert len(docs[0].matches[0].chunks) == 1 - assert len(docs[0].matches[0].chunks[0].chunks) == 1 - assert len(docs[0].matches[0].chunks[0].chunks[0].matches) == 3 - assert len(docs[0].matches[0].chunks[0].matches) == 1 - assert len(docs[0].matches[0].matches) == 1 - assert len(docs[0].matches[0].matches[0].chunks) == 1 - - -def test_selection(): - docs = apply_traversal_path(['cmm', 'mcm']) - assert docs[0].chunks[0].matches[0].matches[0].granularity == 1 - assert 
docs[0].chunks[0].matches[0].matches[0].adjacency == 2 - assert len(docs[0].chunks[0].matches[0].matches) == 1 - assert docs[0].matches[0].chunks[0].matches[0].granularity == 1 - assert docs[0].matches[0].chunks[0].matches[0].adjacency == 2 - assert len(docs[0].matches[0].chunks[0].matches) == 1 - - -def test_root_chunk(): - docs = apply_traversal_path(['r', 'c']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 2 - assert len(docs[0].chunks[0].chunks) == 2 - assert len(docs[0].chunks[1].chunks) == 1 - - -def test_chunk_root(): - docs = apply_traversal_path(['c', 'r']) - assert len(docs) == 1 - assert len(docs[0].chunks) == 2 - assert len(docs[0].chunks[0].chunks) == 2 - assert len(docs[0].chunks[1].chunks) == 0 - - -def test_call(): - docs = build_docs() - doc = docs[0] - doc.ClearField('chunks') - docs = DocumentArray([doc]) - driver = AppendOneChunkTwoMatchesCrafter(docs=docs, traversal_paths=('mcm',)) - assert docs[0].matches[0].chunks[0].matches[0].granularity == 1 - assert docs[0].matches[0].chunks[0].matches[0].adjacency == 2 - driver() - assert len(docs[0].matches[0].chunks[0].matches) == 1 - assert len(docs[0].matches[0].chunks[0].matches[0].chunks) == 2 - assert len(docs[0].matches[0].chunks[0].matches[0].matches) == 2 diff --git a/tests/unit/drivers/test_reduce_all_driver.py b/tests/unit/drivers/test_reduce_all_driver.py deleted file mode 100644 index 22ab8cd7ea179..0000000000000 --- a/tests/unit/drivers/test_reduce_all_driver.py +++ /dev/null @@ -1,119 +0,0 @@ -import os -from typing import List, Dict - -import numpy as np -import pytest - -from jina import Document -from jina.executors.segmenters import BaseSegmenter -from jina.executors.encoders import BaseEncoder -from jina.executors.decorators import single -from jina.flow import Flow - -from tests import validate_callback - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture(scope='function') -def docs(): - documents = [] - for i in range(1, 4): - with Document() as 
doc: - doc.text = ( - f'title: this is mode1 from doc{i}, body: this is mode2 from doc{i}' - ) - documents.append(doc) - return documents - - -class MockSegmenterReduce(BaseSegmenter): - @single - def segment(self, text: str, *args, **kwargs) -> List[Dict]: - split = text.split(',') - chunks = [ - dict(text=split[0], offset=0, weight=1.0, modality='mode1'), - dict(text=split[1], offset=1, weight=1.0, modality='mode2'), - ] - return chunks - - -class MockEncoderReduce(BaseEncoder): - def encode(self, content: str, *args, **kwargs) -> 'np.ndarray': - output = [] - for r in content: - if 'mode1' in r: - output.append([0.0, 0.0, 0.0]) - elif 'mode2' in r: - output.append([1.0, 1.0, 1.0]) - - return np.array(output) - - -def test_merge_chunks_with_different_modality(mocker, docs): - def input_function(): - return docs - - def validate(req): - assert len(req.index.docs) == 3 - for doc in req.index.docs: - assert len(doc.chunks) == 2 - assert doc.chunks[0].modality in ['mode1', 'mode2'] - assert doc.chunks[1].modality in ['mode1', 'mode2'] - assert doc.chunks[0].modality != doc.chunks[1].modality - - response_mock = mocker.Mock() - - flow = ( - Flow() - .add(name='segmenter', uses='MockSegmenterReduce') - .add(name='encoder1', uses=os.path.join(cur_dir, 'yaml/mockencoder-mode1.yml')) - .add( - name='encoder2', - uses=os.path.join(cur_dir, 'yaml/mockencoder-mode2.yml'), - needs=['segmenter'], - ) - .add( - name='reducer', - uses='- !ReduceAllDriver | {traversal_paths: [c]}', - needs=['encoder1', 'encoder2'], - ) - ) - - with flow: - flow.index(inputs=input_function, on_done=response_mock) - - validate_callback(response_mock, validate) - - -def test_reduce_all_root_chunks(mocker, docs): - def input_fn(): - return docs - - def validate(req): - assert len(req.index.docs) == 6 - for doc in req.index.docs: - assert len(doc.chunks) == 1 - assert doc.chunks[0].modality in ['mode1', 'mode2'] - - response_mock = mocker.Mock() - - flow = ( - Flow() - .add(name='segmenter', 
uses='MockSegmenterReduce') - .add(name='encoder1', uses=os.path.join(cur_dir, 'yaml/mockencoder-mode1.yml')) - .add( - name='encoder2', - uses=os.path.join(cur_dir, 'yaml/mockencoder-mode2.yml'), - needs=['segmenter'], - ) - .add( - name='reducer', - uses='- !ReduceAllDriver | {traversal_paths: [r]}', - needs=['encoder1', 'encoder2'], - ) - ) - - with flow: - flow.index(input_fn=input_fn, on_done=response_mock) - validate_callback(response_mock, validate) diff --git a/tests/unit/drivers/test_segmenter_driver.py b/tests/unit/drivers/test_segmenter_driver.py deleted file mode 100644 index 9c1980871a09e..0000000000000 --- a/tests/unit/drivers/test_segmenter_driver.py +++ /dev/null @@ -1,138 +0,0 @@ -from typing import Dict, List - -import numpy as np -import pytest - -from jina import Document -from jina.drivers.segment import SegmentDriver -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import single -from jina.types.arrays import DocumentArray - - -class MockSegmenter(BaseSegmenter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.required_keys = {'text'} - - @single - def segment(self, text: str, *args, **kwargs) -> List[Dict]: - if text == 'valid': - # length, parent_id and id are protected keys that won't affect the segments - return [ - { - 'blob': np.array([0.0, 0.0, 0.0]), - 'weight': 0.0, - 'mime_type': 'text/plain', - 'tags': {'id': 3}, - }, - {'blob': np.array([1.0, 1.0, 1.0]), 'weight': 1.0, 'tags': {'id': 4}}, - {'blob': np.array([2.0, 2.0, 2.0]), 'weight': 2.0, 'tags': {'id': 5}}, - ] - else: - return [{'non_existing_key': 1}] - - -class MockImageSegmenter(BaseSegmenter): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - self.required_keys = {'text'} - - @single - def segment(self, blob: np.ndarray, *args, **kwargs) -> List[Dict]: - assert len(blob.shape) == 3 - assert blob.shape[0] == 1 - return [{'blob': blob}] - - -class 
SimpleSegmentDriver(SegmentDriver): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture() -def text_segmenter_executor(): - return MockSegmenter() - - -@pytest.fixture() -def image_segmenter_executor(): - return MockImageSegmenter() - - -@pytest.fixture() -def segment_driver(): - return SimpleSegmentDriver() - - -def test_segment_driver(segment_driver, text_segmenter_executor): - valid_doc = Document() - valid_doc.text = 'valid' - valid_doc.mime_type = 'image/png' - - segment_driver.attach(executor=text_segmenter_executor, runtime=None) - segment_driver._apply_all(DocumentArray([valid_doc])) - - assert valid_doc.chunks[0].tags['id'] == 3 - assert valid_doc.chunks[0].parent_id == valid_doc.id - np.testing.assert_equal(valid_doc.chunks[0].blob, np.array([0.0, 0.0, 0.0])) - assert valid_doc.chunks[0].weight == 0.0 - assert valid_doc.chunks[0].mime_type == 'text/plain' - - assert valid_doc.chunks[1].tags['id'] == 4 - assert valid_doc.chunks[1].parent_id == valid_doc.id - np.testing.assert_equal(valid_doc.chunks[1].blob, np.array([1.0, 1.0, 1.0])) - assert valid_doc.chunks[1].weight == 1.0 - assert valid_doc.chunks[1].mime_type == 'image/png' - - assert valid_doc.chunks[2].tags['id'] == 5 - assert valid_doc.chunks[2].parent_id == valid_doc.id - np.testing.assert_equal(valid_doc.chunks[2].blob, np.array([2.0, 2.0, 2.0])) - assert valid_doc.chunks[2].weight == 2.0 - assert valid_doc.chunks[2].mime_type == 'image/png' - - -def test_chunks_exist_already(segment_driver, text_segmenter_executor): - document = Document( - text='valid', chunks=[Document(text='test2'), Document(text='test3')] - ) - # before segmentation - assert len(document.chunks) == 2 - for chunk in document.chunks: - assert chunk.parent_id == document.id - assert chunk.siblings == 2 - segment_driver.attach(executor=text_segmenter_executor, runtime=None) - segment_driver._apply_all(DocumentArray([document])) 
- - # after segmentation - assert len(document.chunks) == 5 - for chunk in document.chunks: - assert chunk.parent_id == document.id - assert chunk.siblings == 5 - - -def test_broken_document(segment_driver, text_segmenter_executor): - segment_driver.attach(executor=text_segmenter_executor, runtime=None) - - invalid_doc = Document() - invalid_doc.id = 1 - invalid_doc.text = 'invalid' - - with pytest.raises(AttributeError): - segment_driver._apply_all([DocumentArray([invalid_doc])]) - - -def test_image_segmenter(segment_driver, image_segmenter_executor): - blob1 = np.random.random((1, 32, 64)) - blob2 = np.random.random((1, 64, 32)) - docs = DocumentArray([Document(blob=blob1), Document(blob=blob2)]) - segment_driver.attach(executor=image_segmenter_executor, runtime=None) - segment_driver._apply_all(docs) - for doc in docs: - assert len(doc.chunks) == 1 - np.testing.assert_equal(docs[0].chunks[0].blob, blob1) - np.testing.assert_equal(docs[1].chunks[0].blob, blob2) diff --git a/tests/unit/drivers/test_vector_fill_driver.py b/tests/unit/drivers/test_vector_fill_driver.py deleted file mode 100644 index 7dff35021d4db..0000000000000 --- a/tests/unit/drivers/test_vector_fill_driver.py +++ /dev/null @@ -1,47 +0,0 @@ -from typing import Any - -import numpy as np -import pytest - -from jina import Document -from jina.drivers.search import VectorFillDriver -from jina.executors.indexers import BaseIndexer - - -@pytest.fixture(scope='function') -def num_docs(): - return 10 - - -@pytest.fixture(scope='function') -def docs_to_encode(num_docs): - docs = [] - for idx in range(num_docs): - doc = Document(content=np.array([idx])) - docs.append(doc) - return docs - - -class MockIndexer(BaseIndexer): - def query_by_key(self, keys: Any, *args, **kwargs) -> 'np.ndarray': - # encodes 10 * keys into the encoder, so return keys - return np.random.random([len(keys), 5]) - - -class SimpleFillDriver(VectorFillDriver): - @property - def exec_fn(self): - return self._exec_fn - - -def 
test_index_driver(docs_to_encode, num_docs): - driver = SimpleFillDriver() - executor = MockIndexer() - driver.attach(executor=executor, runtime=None) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert doc.embedding is None - driver._apply_all(docs_to_encode) - assert len(docs_to_encode) == num_docs - for doc in docs_to_encode: - assert doc.embedding.shape == (5,) diff --git a/tests/unit/drivers/test_vector_index_driver.py b/tests/unit/drivers/test_vector_index_driver.py deleted file mode 100644 index 3ba9bc48b0d5c..0000000000000 --- a/tests/unit/drivers/test_vector_index_driver.py +++ /dev/null @@ -1,267 +0,0 @@ -from copy import deepcopy - -import numpy as np -import scipy -import torch -import tensorflow as tf -import pytest - -from jina import DocumentArray -from jina.drivers.delete import DeleteDriver -from jina.drivers.index import VectorIndexDriver -from jina.executors.indexers import BaseVectorIndexer -from jina.types.document import Document - - -def embedding_cls_type_supported(): - return ['dense', 'scipy_csr', 'scipy_coo', 'torch', 'tf'] - - -class MockGroundTruthVectorIndexer(BaseVectorIndexer): - def __init__(self, embedding_cls_type, *args, **kwargs): - super().__init__(*args, **kwargs) - self.docs = {} - self.embedding_cls_type = embedding_cls_type - - def add(self, keys, vectors, *args, **kwargs): - if self.embedding_cls_type in ['dense', 'torch', 'tf']: - for key, value in zip(keys, vectors): - self.docs[key] = value - elif self.embedding_cls_type.startswith('scipy'): - for i, key in enumerate(keys): - self.docs[key] = vectors.getrow(i) - - def update(self, keys, vectors, *args, **kwargs) -> None: - if self.embedding_cls_type in ['dense', 'torch', 'tf']: - for key, value in zip(keys, vectors): - self.docs[key] = value - elif self.embedding_cls_type.startswith('scipy'): - for i, key in enumerate(keys): - self.docs[key] = vectors.getrow(i) - - def delete(self, keys, *args, **kwargs) -> None: - for key in keys: - del 
self.docs[key] - - -class SimpleVectorIndexDriver(VectorIndexDriver): - @property - def exec_fn(self): - return self._exec_fn - - -class SimpleDeleteDriver(DeleteDriver): - @property - def exec_fn(self): - return self._exec_fn - - -@pytest.fixture(scope='function') -def simple_vector_indexer_driver_add(): - return SimpleVectorIndexDriver() - - -@pytest.fixture(scope='function') -def simple_vector_indexer_driver_update(): - return SimpleVectorIndexDriver(method='update') - - -@pytest.fixture(scope='function') -def simple_vector_indexer_driver_delete(): - return SimpleDeleteDriver() - - -@pytest.fixture(scope='function') -def mock_groundtruth_indexer_factory(): - def indexer(embedding_cls_type): - return MockGroundTruthVectorIndexer(embedding_cls_type) - - return indexer - - -@pytest.fixture(scope='function') -def documents_factory(): - def documents(embedding_cls_type, text_prefix='', num_docs=5): - docs = [] - for idx in range(num_docs): - with Document(text=f'{text_prefix}{idx}') as d: - d.id = f'{idx:0>16}' - dense_embedding = np.random.random([10]) - if embedding_cls_type == 'dense': - d.embedding = dense_embedding - elif embedding_cls_type == 'scipy_csr': - d.embedding = scipy.sparse.csr_matrix(dense_embedding) - elif embedding_cls_type == 'scipy_coo': - d.embedding = scipy.sparse.coo_matrix(dense_embedding) - elif embedding_cls_type == 'torch': - sparse_embedding = scipy.sparse.coo_matrix(dense_embedding) - values = sparse_embedding.data - indices = np.vstack((sparse_embedding.row, sparse_embedding.col)) - d.embedding = torch.sparse_coo_tensor( - indices, - values, - sparse_embedding.shape, - ) - elif embedding_cls_type == 'tf': - sparse_embedding = scipy.sparse.coo_matrix(dense_embedding) - values = sparse_embedding.data - indices = [ - (x, y) - for x, y in zip(sparse_embedding.row, sparse_embedding.col) - ] - d.embedding = tf.SparseTensor( - indices=indices, - values=values, - dense_shape=[1, 10], - ) - docs.append(d) - return DocumentArray(docs) - - return 
documents - - -@pytest.fixture(scope='function') -def deleted_documents(): - docs = [] - for idx in range(3): - with Document() as d: - d.id = f'{idx:0>16}' - docs.append(d) - return DocumentArray(docs) - - -@pytest.fixture(scope='function') -def empty_documents(): - docs = [] - for idx in range(100, 120): - with Document() as d: - d.id = f'{idx:0>16}' - docs.append(d) - return DocumentArray(docs) - - -def assert_embedding(embedding_cls_type, obtained, expected): - if embedding_cls_type == 'dense': - np.testing.assert_equal(obtained, expected.embedding) - elif embedding_cls_type.startswith('scipy'): - np.testing.assert_equal(obtained.todense(), expected.embedding.todense()) - elif embedding_cls_type == 'torch': - from jina.types.ndarray.sparse.pytorch import SparseNdArray - - np.testing.assert_array_equal( - expected.get_sparse_embedding(sparse_ndarray_cls_type=SparseNdArray)[0] - .to_dense() - .numpy(), - obtained.to_dense().numpy(), - ) - elif embedding_cls_type == 'tf': - from jina.types.ndarray.sparse.tensorflow import SparseNdArray - - np.testing.assert_array_equal( - tf.sparse.to_dense( - expected.get_sparse_embedding(sparse_ndarray_cls_type=SparseNdArray) - ).numpy(), - tf.sparse.to_dense(obtained).numpy(), - ) - - -@pytest.mark.parametrize('embedding_cls_type', embedding_cls_type_supported()) -def test_vector_index_driver_add( - simple_vector_indexer_driver_add, - mock_groundtruth_indexer_factory, - documents_factory, - embedding_cls_type, -): - indexer = mock_groundtruth_indexer_factory(embedding_cls_type) - documents = documents_factory(embedding_cls_type) - simple_vector_indexer_driver_add.attach(executor=indexer, runtime=None) - simple_vector_indexer_driver_add._apply_all(documents) - assert len(indexer.docs) == 5 - for idx, doc in enumerate(documents): - assert_embedding(embedding_cls_type, indexer.docs[doc.id], doc) - - -@pytest.mark.parametrize('embedding_cls_type', embedding_cls_type_supported()) -def test_vector_index_driver_add_bad_docs( - mocker, 
- mock_groundtruth_indexer_factory, - simple_vector_indexer_driver_add, - documents_factory, - empty_documents, - embedding_cls_type, -): - indexer = mock_groundtruth_indexer_factory(embedding_cls_type) - documents = documents_factory(embedding_cls_type) - simple_vector_indexer_driver_add.attach(executor=indexer, runtime=None) - logger_mock = mocker.Mock() - pea_mock = mocker.Mock() - pea_mock.logger = logger_mock - simple_vector_indexer_driver_add.runtime = pea_mock - # TODO once https://github.com/jina-ai/jina/pull/1555 is merged union can be declared using '+' - union = deepcopy(documents) - for d in empty_documents: - union.add(d) - simple_vector_indexer_driver_add._apply_all(union) - - # make sure the warning for bad docs is triggered - assert len(indexer.docs) == 5 - for idx, doc in enumerate(documents): - assert_embedding(embedding_cls_type, indexer.docs[doc.id], doc) - for idx, doc in enumerate(empty_documents): - assert doc.id not in indexer.docs - - -@pytest.mark.parametrize('embedding_cls_type', embedding_cls_type_supported()) -def test_vector_index_driver_update( - mock_groundtruth_indexer_factory, - simple_vector_indexer_driver_add, - simple_vector_indexer_driver_update, - documents_factory, - embedding_cls_type, -): - indexer = mock_groundtruth_indexer_factory(embedding_cls_type) - documents = documents_factory(embedding_cls_type) - updated_documents = documents_factory(embedding_cls_type, 'update', 3) - simple_vector_indexer_driver_add.attach(executor=indexer, runtime=None) - simple_vector_indexer_driver_add._apply_all(documents) - - simple_vector_indexer_driver_update.attach(executor=indexer, runtime=None) - simple_vector_indexer_driver_update._apply_all(updated_documents) - - assert len(indexer.docs) == 5 - for idx, doc in enumerate(updated_documents): - assert_embedding(embedding_cls_type, indexer.docs[doc.id], doc) - for idx in range(3, 5): - doc = documents[idx] - assert_embedding(embedding_cls_type, indexer.docs[doc.id], doc) - - 
-@pytest.mark.parametrize('embedding_cls_type', embedding_cls_type_supported()) -def test_vector_index_driver_delete( - mock_groundtruth_indexer_factory, - simple_vector_indexer_driver_add, - simple_vector_indexer_driver_delete, - documents_factory, - deleted_documents, - mocker, - embedding_cls_type, -): - indexer = mock_groundtruth_indexer_factory(embedding_cls_type) - documents = documents_factory(embedding_cls_type) - simple_vector_indexer_driver_add.attach(executor=indexer, runtime=None) - simple_vector_indexer_driver_add._apply_all(documents) - - simple_vector_indexer_driver_delete.attach(executor=indexer, runtime=None) - mock_load = mocker.patch.object( - simple_vector_indexer_driver_delete, 'runtime', autospec=True - ) - mock_load.request.ids = [d.id for d in deleted_documents] - simple_vector_indexer_driver_delete() - - assert len(indexer.docs) == 2 - for idx in range(3, 5): - doc = documents[idx] - assert_embedding(embedding_cls_type, indexer.docs[doc.id], doc) - - for idx, doc in enumerate(deleted_documents): - assert doc.id not in indexer.docs diff --git a/tests/unit/drivers/test_vector_search_driver.py b/tests/unit/drivers/test_vector_search_driver.py deleted file mode 100644 index 7f4cf0c1c3748..0000000000000 --- a/tests/unit/drivers/test_vector_search_driver.py +++ /dev/null @@ -1,136 +0,0 @@ -from typing import Tuple - -import numpy as np -import scipy -import torch -import tensorflow as tf -import pytest - -from jina import Document, QueryLang -from jina.drivers.search import VectorSearchDriver -from jina.executors.indexers import BaseVectorIndexer - - -def embedding_cls_type_supported(): - return ['dense', 'scipy_csr', 'scipy_coo', 'torch', 'tf'] - - -class MockVectorSearchDriver(VectorSearchDriver): - @property - def exec_fn(self): - return self._exec_fn - - -class MockVectorSearchDriverWithQS(VectorSearchDriver): - @property - def queryset(self): - q = QueryLang() - q.name = 'MockVectorSearchDriverWithQS' - q.priority = 1 - q.parameters['top_k'] 
= 4 - return [q] - - -@pytest.fixture(scope='function') -def documents_factory(): - def documents(embedding_cls_type): - doc = Document() - for c in range(10): - chunk = Document() - chunk.id = str(c) * 16 - dense_embedding = np.random.random([10]) - if embedding_cls_type == 'dense': - chunk.embedding = dense_embedding - elif embedding_cls_type == 'scipy_csr': - chunk.embedding = scipy.sparse.csr_matrix(dense_embedding) - elif embedding_cls_type == 'scipy_coo': - chunk.embedding = scipy.sparse.coo_matrix(dense_embedding) - elif embedding_cls_type == 'torch': - sparse_embedding = scipy.sparse.coo_matrix(dense_embedding) - values = sparse_embedding.data - indices = np.vstack((sparse_embedding.row, sparse_embedding.col)) - chunk.embedding = torch.sparse_coo_tensor( - indices, - values, - sparse_embedding.shape, - ) - elif embedding_cls_type == 'tf': - sparse_embedding = scipy.sparse.coo_matrix(dense_embedding) - values = sparse_embedding.data - indices = [ - (x, y) for x, y in zip(sparse_embedding.row, sparse_embedding.col) - ] - chunk.embedding = tf.SparseTensor( - indices=indices, - values=values, - dense_shape=[1, 10], - ) - doc.chunks.append(chunk) - return doc - - return documents - - -def test_vectorsearch_driver_mock_queryset(): - # no queryset - driver = VectorSearchDriver(top_k=3) - assert driver.top_k == 3 - - # with queryset - driver = MockVectorSearchDriverWithQS(top_k=3) - assert driver.top_k == 4 - - -def mock_query(vectors, top_k: int) -> Tuple['np.ndarray', 'np.ndarray']: - idx = np.zeros((vectors.shape[0], top_k), dtype=(np.str_, 16)) - dist = np.zeros((vectors.shape[0], top_k)) - for i, row in enumerate(dist): - for k in range(top_k): - row[k] = float(k) - return idx, dist - - -def mock_query_by_key(keys: 'np.ndarray'): - return np.random.random([len(keys), 7]) - - -@pytest.mark.parametrize('embedding_cls_type', embedding_cls_type_supported()) -def test_vectorsearch_driver_mock_indexer( - monkeypatch, documents_factory, embedding_cls_type -): - 
driver = MockVectorSearchDriver(top_k=2) - index = BaseVectorIndexer() - monkeypatch.setattr(index, 'query_by_key', None) - monkeypatch.setattr(driver, '_exec', index) - monkeypatch.setattr(driver, 'runtime', None) - monkeypatch.setattr(driver, '_exec_fn', mock_query) - doc = documents_factory(embedding_cls_type) - driver._apply_all(doc.chunks) - - for chunk in doc.chunks: - assert len(chunk.matches) == 2 - for match in chunk.matches: - assert match.granularity == chunk.granularity - assert match.score.ref_id == str(chunk.id) - assert match.embedding is None - assert chunk.matches[0].score.value == 0.0 - assert chunk.matches[1].score.value == 1.0 - - -@pytest.mark.parametrize('embedding_cls_type', embedding_cls_type_supported()) -def test_vectorsearch_driver_mock_indexer_with_fill( - monkeypatch, documents_factory, embedding_cls_type -): - driver = MockVectorSearchDriver(top_k=2, fill_embedding=True) - index = BaseVectorIndexer() - monkeypatch.setattr(index, 'query_by_key', mock_query_by_key) - monkeypatch.setattr(driver, '_exec', index) - monkeypatch.setattr(driver, 'runtime', None) - monkeypatch.setattr(driver, '_exec_fn', mock_query) - doc = documents_factory(embedding_cls_type) - driver._apply_all(doc.chunks) - - for chunk in doc.chunks: - assert chunk.matches[0].embedding.shape == (7,) - assert chunk.matches[-1].embedding.shape == (7,) - assert chunk.matches[-1].embedding is not None diff --git a/tests/unit/drivers/yaml/mockencoder-mode1.yml b/tests/unit/drivers/yaml/mockencoder-mode1.yml deleted file mode 100644 index d7f51bd535a13..0000000000000 --- a/tests/unit/drivers/yaml/mockencoder-mode1.yml +++ /dev/null @@ -1,12 +0,0 @@ -!MockEncoderReduce -requests: - on: - IndexRequest: - - !FilterQL - with: - lookups: {modality: mode1} - traversal_paths: ['c'] - - !EncodeDriver - with: - method: encode - traversal_paths: ['c'] diff --git a/tests/unit/drivers/yaml/mockencoder-mode2.yml b/tests/unit/drivers/yaml/mockencoder-mode2.yml deleted file mode 100644 index 
736427a11240e..0000000000000 --- a/tests/unit/drivers/yaml/mockencoder-mode2.yml +++ /dev/null @@ -1,12 +0,0 @@ -!MockEncoderReduce -requests: - on: - IndexRequest: - - !FilterQL - with: - lookups: {modality: mode2} - traversal_paths: ['c'] - - !EncodeDriver - with: - method: encode - traversal_paths: ['c'] diff --git a/tests/unit/executors/dbms/__init__.py b/tests/unit/executors/dbms/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/dbms/dbms.yml b/tests/unit/executors/dbms/dbms.yml deleted file mode 100644 index aaa1de59c0814..0000000000000 --- a/tests/unit/executors/dbms/dbms.yml +++ /dev/null @@ -1,16 +0,0 @@ -jtype: KeyValueDBMSIndexer -with: - index_filename: dbms.bin -metas: - workspace: $DBMS_WORKSPACE - name: dbms -requests: - on: - [IndexRequest, UpdateRequest]: - - jtype: DBMSIndexDriver - with: - executor: dbms - DeleteRequest: - - jtype: DeleteDriver - with: - executor: dbms diff --git a/tests/unit/executors/dbms/test_dbms.py b/tests/unit/executors/dbms/test_dbms.py deleted file mode 100644 index 84ac296e162a5..0000000000000 --- a/tests/unit/executors/dbms/test_dbms.py +++ /dev/null @@ -1,63 +0,0 @@ -import os - -from jina.drivers.index import DBMSIndexDriver -from jina.executors.indexers.dbms import BaseDBMSIndexer -from jina.executors.indexers.dbms.keyvalue import KeyValueDBMSIndexer -from tests import get_documents - - -def _get_ids_vecs_meta(docs): - ids, vecs, metas = zip( - *[ - ( - doc.id, - doc.embedding, - DBMSIndexDriver._doc_without_embedding(doc).SerializeToString(), - ) - for doc in docs - ] - ) - return ids, vecs, metas - - -def test_dbms_keyvalue(tmpdir, test_metas): - docs = list(get_documents(chunks=False, nr=10, same_content=True)) - ids, vecs, metas = _get_ids_vecs_meta(docs) - - save_path = None - with KeyValueDBMSIndexer(index_filename='dbms', metas=test_metas) as indexer: - indexer.add(ids, vecs, metas) - assert indexer.size == len(docs) - save_path = indexer.save_abspath - 
indexer.dump(os.path.join(tmpdir, 'dump1'), 2) - - # we can index and dump again in the same context - docs2 = list( - get_documents(chunks=False, nr=10, same_content=True, index_start=len(docs)) - ) - ids, vecs, metas = _get_ids_vecs_meta(docs2) - indexer.add(ids, vecs, metas) - assert indexer.size == 2 * len(docs) - indexer.dump(os.path.join(tmpdir, 'dump2'), 3) - - new_docs = list(get_documents(chunks=False, nr=10, same_content=False)) - ids, vecs, meta = zip( - *[ - ( - doc.id, - doc.embedding, - DBMSIndexDriver._doc_without_embedding(doc).SerializeToString(), - ) - for doc in new_docs - ] - ) - - # assert contents update - with BaseDBMSIndexer.load(save_path) as indexer: - indexer.update(ids, vecs, meta) - assert indexer.size == 2 * len(docs) - - # assert contents update - with BaseDBMSIndexer.load(save_path) as indexer: - indexer.delete([d.id for d in docs]) - assert indexer.size == len(docs) diff --git a/tests/unit/executors/encoders/__init__.py b/tests/unit/executors/encoders/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/encoders/test_frameworks.py b/tests/unit/executors/encoders/test_frameworks.py deleted file mode 100644 index 24768420087f8..0000000000000 --- a/tests/unit/executors/encoders/test_frameworks.py +++ /dev/null @@ -1,15 +0,0 @@ -import pytest - -from jina.excepts import ModelCheckpointNotExist -from jina.executors.encoders.frameworks import BaseOnnxEncoder, BaseMindsporeEncoder - - -def test_raised_exception(): - with pytest.raises(ModelCheckpointNotExist): - BaseOnnxEncoder() - - with pytest.raises(ModelCheckpointNotExist): - BaseMindsporeEncoder() - - with pytest.raises(AttributeError): - BaseMindsporeEncoder.model() diff --git a/tests/unit/executors/encoders/test_numeric.py b/tests/unit/executors/encoders/test_numeric.py deleted file mode 100644 index 684a14b2fc6ed..0000000000000 --- a/tests/unit/executors/encoders/test_numeric.py +++ /dev/null @@ -1,38 +0,0 @@ -import numpy as np 
-import pickle - -import pytest - -from jina.executors.encoders.numeric import TransformEncoder - -input_dim = 5 -target_output_dim = 5 - - -class SimpleModel: - def fit(self, data): - return data - - def transform(self, data): - return data - - -@pytest.fixture() -def model_path(tmpdir): - model_path = str(tmpdir) + '/model.pkl' - model = SimpleModel() - with open(model_path, 'wb') as output: - pickle.dump(model, output) - return model_path - - -@pytest.fixture() -def encoder(model_path): - return TransformEncoder(model_path=model_path) - - -def test_transform_encoder_test(encoder): - test_data = np.random.rand(10, input_dim) - encoded_data = encoder.encode(test_data) - assert encoded_data.shape == (test_data.shape[0], target_output_dim) - assert type(encoded_data) == np.ndarray diff --git a/tests/unit/executors/evaluators/__init__.py b/tests/unit/executors/evaluators/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/evaluators/craft/__init__.py b/tests/unit/executors/evaluators/craft/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/evaluators/craft/nlp/__init__.py b/tests/unit/executors/evaluators/craft/nlp/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/evaluators/craft/nlp/test_length.py b/tests/unit/executors/evaluators/craft/nlp/test_length.py deleted file mode 100644 index f17629b31b567..0000000000000 --- a/tests/unit/executors/evaluators/craft/nlp/test_length.py +++ /dev/null @@ -1,30 +0,0 @@ -import numpy as np -import pytest - -from jina.executors.evaluators.text.length import TextLengthEvaluator - - -@pytest.mark.parametrize( - 'doc, gt, expected', - [ - ('aaa', 'bbb', 0.0), - ('AbcD', 'fghkl', 1.0), - ], -) -def test_length_evaluator(doc, gt, expected): - evaluator = TextLengthEvaluator() - assert evaluator.evaluate(actual=doc, desired=gt) == expected - assert 
evaluator._running_stats._n == 1 - np.testing.assert_almost_equal(evaluator.mean, expected) - - -def test_cosine_evaluator_average(): - doc_content = ['aaa', 'bbb', 'abc'] - gt_content = ['aaaa', 'ccc', 'ddd'] - - evaluator = TextLengthEvaluator() - assert evaluator.evaluate(actual=doc_content[0], desired=gt_content[0]) == 1.0 - assert evaluator.evaluate(actual=doc_content[1], desired=gt_content[1]) == 0.0 - assert evaluator.evaluate(actual=doc_content[2], desired=gt_content[2]) == 0.0 - assert evaluator._running_stats._n == 3 - np.testing.assert_almost_equal(evaluator.mean, 1.0 / 3) diff --git a/tests/unit/executors/evaluators/embedding/__init__.py b/tests/unit/executors/evaluators/embedding/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/evaluators/embedding/test_embedding.py b/tests/unit/executors/evaluators/embedding/test_embedding.py deleted file mode 100644 index 109c22ce5396a..0000000000000 --- a/tests/unit/executors/evaluators/embedding/test_embedding.py +++ /dev/null @@ -1,33 +0,0 @@ -import numpy as np -import pytest - -from jina.executors.evaluators.embedding.cosine import CosineEvaluator -from jina.executors.evaluators.embedding.euclidean import EuclideanEvaluator - - -@pytest.mark.parametrize( - 'embedding1, embedding2, distance', - [ - ([1, 1, 1], [1, 1, 1], 0), - ([0, 1], [1, 0], 1), - ([1, 2, 4, 7, 3], [5, 4, 3, 8, 9], 0.12985245), - ], -) -def test_euclidean(embedding1, embedding2, distance): - evaluator = CosineEvaluator() - res = evaluator.evaluate(actual=np.array(embedding1), desired=np.array(embedding2)) - np.testing.assert_almost_equal(res, distance) - - -@pytest.mark.parametrize( - 'embedding1, embedding2, distance', - [ - ([1, 1, 1], [1, 1, 1], 0), - ([2, 4], [2, 5], 1), - ([1, 2, 4, 7, 3], [5, 4, 3, 8, 9], 7.61577311), - ], -) -def test_cosine(embedding1, embedding2, distance): - evaluator = EuclideanEvaluator() - res = evaluator.evaluate(actual=np.array(embedding1), 
desired=np.array(embedding2)) - np.testing.assert_almost_equal(res, distance) diff --git a/tests/unit/executors/evaluators/encode/__init__.py b/tests/unit/executors/evaluators/encode/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/evaluators/encode/test_cosine.py b/tests/unit/executors/evaluators/encode/test_cosine.py deleted file mode 100644 index f0928e68f66b1..0000000000000 --- a/tests/unit/executors/evaluators/encode/test_cosine.py +++ /dev/null @@ -1,33 +0,0 @@ -import numpy as np -import pytest - -from jina.executors.evaluators.embedding.cosine import CosineEvaluator - - -@pytest.mark.parametrize( - 'doc_embedding, gt_embedding, expected', - [ - ([0, 1], [0, 1], 0.0), - ([0, 1], [1, 0], 1.0), - ([1, 0], [0, 1], 1.0), - ([1, 0], [1, 0], 0.0), - ([0, -1], [0, 1], 2.0), # https://github.com/scipy/scipy/issues/9322 - ], -) -def test_cosine_evaluator(doc_embedding, gt_embedding, expected): - evaluator = CosineEvaluator() - assert evaluator.evaluate(actual=doc_embedding, desired=gt_embedding) == expected - assert evaluator._running_stats._n == 1 - np.testing.assert_almost_equal(evaluator.mean, expected) - - -def test_cosine_evaluator_average(): - doc_embeddings = [np.array([0, 1]), np.array([1, 0]), np.array([2, 2])] - gt_embeddings = [np.array([1, 0]), np.array([1, 0]), np.array([4, 4])] - - evaluator = CosineEvaluator() - assert evaluator.evaluate(actual=doc_embeddings[0], desired=gt_embeddings[0]) == 1.0 - assert evaluator.evaluate(actual=doc_embeddings[1], desired=gt_embeddings[1]) == 0.0 - assert evaluator.evaluate(actual=doc_embeddings[2], desired=gt_embeddings[2]) == 0.0 - assert evaluator._running_stats._n == 3 - np.testing.assert_almost_equal(evaluator.mean, 1.0 / 3) diff --git a/tests/unit/executors/evaluators/encode/test_euclidean.py b/tests/unit/executors/evaluators/encode/test_euclidean.py deleted file mode 100644 index 4390111c1f3d0..0000000000000 --- 
a/tests/unit/executors/evaluators/encode/test_euclidean.py +++ /dev/null @@ -1,46 +0,0 @@ -from math import sqrt - -import numpy as np -import pytest - -from jina.executors.evaluators.embedding.euclidean import EuclideanEvaluator - - -@pytest.mark.parametrize( - 'doc_embedding, gt_embedding, expected', - [ - ([0, 0], [0, 0], 0.0), - ([0, 0], [0, 1], 1.0), - ([0, 0], [1, 0], 1.0), - ([0, 0], [1, 1], sqrt(2.0)), - ([0, 1], [0, 0], 1.0), - ([0, 1], [0, 1], 0.0), - ([0, 1], [1, 0], sqrt(2.0)), - ([0, 1], [1, 1], 1.0), - ([1, 0], [0, 0], 1.0), - ([1, 0], [0, 1], sqrt(2.0)), - ([1, 0], [1, 0], 0.0), - ([1, 0], [1, 1], 1.0), - ([1, 1], [0, 0], sqrt(2.0)), - ([1, 1], [0, 1], 1.0), - ([1, 1], [1, 0], 1.0), - ([1, 1], [1, 1], 0.0), - ], -) -def test_euclidean_evaluator(doc_embedding, gt_embedding, expected): - evaluator = EuclideanEvaluator() - assert evaluator.evaluate(actual=doc_embedding, desired=gt_embedding) == expected - assert evaluator._running_stats._n == 1 - np.testing.assert_almost_equal(evaluator.mean, expected) - - -def test_euclidean_evaluator_average(): - doc_embeddings = [np.array([0, 1]), np.array([1, 0]), np.array([2, 2])] - gt_embeddings = [np.array([0, 2]), np.array([1, 0]), np.array([2, 4])] - - evaluator = EuclideanEvaluator() - assert evaluator.evaluate(actual=doc_embeddings[0], desired=gt_embeddings[0]) == 1.0 - assert evaluator.evaluate(actual=doc_embeddings[1], desired=gt_embeddings[1]) == 0.0 - assert evaluator.evaluate(actual=doc_embeddings[2], desired=gt_embeddings[2]) == 2.0 - assert evaluator._running_stats._n == 3 - np.testing.assert_almost_equal(evaluator.mean, 3.0 / 3) diff --git a/tests/unit/executors/evaluators/rank/__init__.py b/tests/unit/executors/evaluators/rank/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/evaluators/rank/test_base.py b/tests/unit/executors/evaluators/rank/test_base.py deleted file mode 100644 index a782afe607ff3..0000000000000 --- 
a/tests/unit/executors/evaluators/rank/test_base.py +++ /dev/null @@ -1,30 +0,0 @@ -from jina.executors.evaluators.rank import BaseRankingEvaluator -from jina.drivers.evaluate import RankEvaluateDriver - - -class DummyRankingEvaluator(BaseRankingEvaluator): - def evaluate(self, actual, desired, *args, **kwargs) -> float: - return 1.0 - - -def test_base_ranking_evalutor(): - evaluator = DummyRankingEvaluator() - actual_eval_driver = evaluator._drivers['SearchRequest'][-1] - assert isinstance(actual_eval_driver, RankEvaluateDriver) - default_eval_driver = RankEvaluateDriver() - assert list(default_eval_driver.fields) == actual_eval_driver.fields - # make sure the default value for fields in RankEvaluateDriver is no longer overwritten by `executors.requests.BaseRankingEvaluator.yml` - from jina.jaml import JAML - from pkg_resources import resource_filename - - with open( - resource_filename( - 'jina', - '/'.join(('resources', 'executors.requests.BaseRankingEvaluator.yml')), - ) - ) as fp: - config_from_resources = JAML.load(fp) - assert ( - default_eval_driver.fields - == config_from_resources['on']['SearchRequest']['drivers'][-1].fields - ) diff --git a/tests/unit/executors/evaluators/rank/test_precision.py b/tests/unit/executors/evaluators/rank/test_precision.py deleted file mode 100644 index 74ed67c8455ef..0000000000000 --- a/tests/unit/executors/evaluators/rank/test_precision.py +++ /dev/null @@ -1,49 +0,0 @@ -import numpy as np -import pytest - -from jina.executors.evaluators.rank.precision import PrecisionEvaluator - - -@pytest.mark.parametrize( - 'eval_at, expected', - [(None, 0.4), (0, 0.0), (2, 1.0), (4, 0.5), (5, 0.4), (100, 0.4)], -) -def test_precision_evaluator(eval_at, expected): - matches_ids = [0, 1, 2, 3, 4] - - desired_ids = [1, 0, 20, 30, 40] - - evaluator = PrecisionEvaluator(eval_at=eval_at) - assert evaluator.evaluate(actual=matches_ids, desired=desired_ids) == expected - np.testing.assert_almost_equal(evaluator.mean, expected) - - 
-@pytest.mark.parametrize( - 'eval_at, expected_first', - [(None, 0.4), (0, 0.0), (2, 1.0), (4, 0.5), (5, 0.4), (100, 0.4)], -) -def test_precision_evaluator_average(eval_at, expected_first): - matches_ids = [[0, 1, 2, 3, 4], [-1, -1, -1, -1, -1], [-1, -1, -1, -1, -1]] - - desired_ids = [[1, 0, 20, 30, 40], [1, 0, 20, 30, 40], [1, 0, 20, 30, 40]] - - evaluator = PrecisionEvaluator(eval_at=eval_at) - assert ( - evaluator.evaluate(actual=matches_ids[0], desired=desired_ids[0]) - == expected_first - ) - assert evaluator.evaluate(actual=matches_ids[1], desired=desired_ids[1]) == 0.0 - assert evaluator.evaluate(actual=matches_ids[2], desired=desired_ids[2]) == 0.0 - assert evaluator._running_stats._n == 3 - np.testing.assert_almost_equal(evaluator.mean, expected_first / 3) - - -def test_precision_evaluator_no_groundtruth(): - matches_ids = [0, 1, 2, 3, 4] - - desired_ids = [] - - evaluator = PrecisionEvaluator(eval_at=2) - assert evaluator.evaluate(actual=matches_ids, desired=desired_ids) == 0.0 - assert evaluator._running_stats._n == 1 - np.testing.assert_almost_equal(evaluator.mean, 0.0) diff --git a/tests/unit/executors/evaluators/rank/test_recall.py b/tests/unit/executors/evaluators/rank/test_recall.py deleted file mode 100644 index 3551621140e4c..0000000000000 --- a/tests/unit/executors/evaluators/rank/test_recall.py +++ /dev/null @@ -1,50 +0,0 @@ -import numpy as np -import pytest - -from jina.executors.evaluators.rank.recall import RecallEvaluator - - -@pytest.mark.parametrize( - 'eval_at, expected', - [(None, 0.4), (0, 0.0), (1, 0.2), (2, 0.4), (3, 0.4), (5, 0.4), (100, 0.4)], -) -def test_recall_evaluator(eval_at, expected): - matches_ids = [0, 1, 2, 3, 4] - - desired_ids = [1, 0, 20, 30, 40] - - evaluator = RecallEvaluator(eval_at=eval_at) - assert evaluator.evaluate(actual=matches_ids, desired=desired_ids) == expected - assert evaluator._running_stats._n == 1 - np.testing.assert_almost_equal(evaluator.mean, expected) - - -@pytest.mark.parametrize( - 'eval_at, 
expected_first', - [(None, 0.4), (0, 0.0), (1, 0.2), (2, 0.4), (3, 0.4), (5, 0.4), (100, 0.4)], -) -def test_recall_evaluator_average(eval_at, expected_first): - matches_ids = [[0, 1, 2, 3, 4], [0, 1, 2, 3, 4], [0, 1, 2, 3, 4]] - - desired_ids = [[1, 0, 20, 30, 40], [-1, -1, -1, -1, -1], [-1, -1, -1, -1, -1]] - - evaluator = RecallEvaluator(eval_at=eval_at) - assert ( - evaluator.evaluate(actual=matches_ids[0], desired=desired_ids[0]) - == expected_first - ) - assert evaluator.evaluate(actual=matches_ids[1], desired=desired_ids[1]) == 0.0 - assert evaluator.evaluate(actual=matches_ids[2], desired=desired_ids[2]) == 0.0 - assert evaluator._running_stats._n == 3 - np.testing.assert_almost_equal(evaluator.mean, expected_first / 3) - - -def test_recall_evaluator_no_matches(): - matches_ids = [] - - desired_ids = [1, 0, 20, 30, 40] - - evaluator = RecallEvaluator(eval_at=2) - assert evaluator.evaluate(actual=matches_ids, desired=desired_ids) == 0.0 - assert evaluator._running_stats._n == 1 - np.testing.assert_almost_equal(evaluator.mean, 0.0) diff --git a/tests/unit/executors/evaluators/test_runningstats.py b/tests/unit/executors/evaluators/test_runningstats.py deleted file mode 100644 index 509795a45ddb5..0000000000000 --- a/tests/unit/executors/evaluators/test_runningstats.py +++ /dev/null @@ -1,16 +0,0 @@ -import numpy as np - -from jina.executors.evaluators.running_stats import RunningStats - - -def test_running_stats(): - a = np.random.random([50]) - r = RunningStats() - - for aa in a: - r += aa - - np.testing.assert_almost_equal(a.mean(), r.mean) - np.testing.assert_almost_equal(a.std(), r.std) - - print(str(r)) diff --git a/tests/unit/executors/indexers/__init__.py b/tests/unit/executors/indexers/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/executors/indexers/binarypb.yml b/tests/unit/executors/indexers/binarypb.yml deleted file mode 100644 index bfb04af6ed8a7..0000000000000 --- 
a/tests/unit/executors/indexers/binarypb.yml +++ /dev/null @@ -1,15 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: test-docs -metas: - workspace: $JINA_TEST_WORKSPACE_BINARY_PB - name: test-indexer -requests: - on: - SearchRequest: - - !KVSearchDriver - with: - is_update: true - traversal_paths: [ 'r' ] - IndexRequest: - - !KVIndexDriver {} \ No newline at end of file diff --git a/tests/unit/executors/indexers/test_binary_indexer.py b/tests/unit/executors/indexers/test_binary_indexer.py deleted file mode 100644 index 2784751dc6164..0000000000000 --- a/tests/unit/executors/indexers/test_binary_indexer.py +++ /dev/null @@ -1,204 +0,0 @@ -import copy -import os -import time - -import numpy as np -import pytest -from jina.executors.indexers import BaseIndexer -from jina.executors.indexers.keyvalue import BinaryPbIndexer -from jina.flow import Flow -from tests import random_docs, validate_callback - - -@pytest.mark.parametrize('random_workspace_name', ['JINA_TEST_WORKSPACE_BINARY_PB']) -def test_binarypb_in_flow(test_metas, mocker): - docs = list(random_docs(10)) - - def validate(req): - assert len(docs) == len(req.docs) - for d, d0 in zip(req.docs, docs): - np.testing.assert_almost_equal(d.embedding, d0.embedding) - - f = Flow().add(uses='binarypb.yml') - - with f: - f.index(docs) - - docs_no_embedding = copy.deepcopy(docs) - for d in docs_no_embedding: - d.ClearField('embedding') - - mock = mocker.Mock() - with f: - f.search(docs_no_embedding, on_done=mock) - - mock.assert_called_once() - validate_callback(mock, validate) - - -@pytest.mark.parametrize('delete_on_dump', [True, False]) -def test_binarypb_update1(test_metas, delete_on_dump): - with BinaryPbIndexer(metas=test_metas, delete_on_dump=delete_on_dump) as idxer: - idxer.add(['1', '2', '3'], [b'oldvalue', b'same', b'random']) - idxer.save() - assert idxer.size == 3 - - first_size = os.path.getsize(idxer.index_abspath) - save_abspath = idxer.save_abspath - - with BaseIndexer.load(save_abspath) as idxer: - 
assert idxer.query(['1']) == [b'oldvalue'] - - with BaseIndexer.load(save_abspath) as idxer: - assert idxer.query(['1']) == [b'oldvalue'] - - second_size = os.path.getsize(idxer.index_abspath) - assert second_size == first_size - - with BaseIndexer.load(save_abspath) as idxer: - # some new value - idxer.update(['1', '2'], [b'newvalue', b'same']) - idxer.save() - - third_size = os.path.getsize(idxer.index_abspath) - if delete_on_dump: - assert third_size == first_size - else: - assert third_size > first_size - assert idxer.size == 3 - - with BaseIndexer.load(save_abspath) as idxer: - assert idxer.query(['1']) == [b'newvalue'] - assert idxer.query(['2']) == [b'same'] - assert idxer.query(['3']) == [b'random'] - assert idxer.query(['99']) == [None] - - with BaseIndexer.load(save_abspath) as idxer: - # partial update when missing keys encountered - idxer.update(['1', '2', '99'], [b'abcvalue', b'abcd', b'WILL_BE_IGNORED']) - idxer.save() - assert idxer.size == 3 - - fourth_size = os.path.getsize(idxer.index_abspath) - if delete_on_dump: - assert fourth_size == first_size - else: - assert fourth_size > first_size - assert idxer.size == 3 - - with BaseIndexer.load(save_abspath) as idxer: - assert idxer.query(['1']) == [b'abcvalue'] - assert idxer.query(['2']) == [b'abcd'] - assert idxer.query(['3']) == [b'random'] - assert idxer.query(['99']) == [None] - assert idxer.query(['1', '2']) == [b'abcvalue', b'abcd'] - assert idxer.query(['1', '2', '3']) == [b'abcvalue', b'abcd', b'random'] - - -@pytest.mark.parametrize('delete_on_dump', [True, False]) -def test_binarypb_add_and_update_not_working(test_metas, delete_on_dump): - with BinaryPbIndexer(metas=test_metas, delete_on_dump=delete_on_dump) as idxer: - idxer.add(['11', '12', '13'], [b'eleven', b'twelve', b'thirteen']) - idxer.save() - # FIXME `add` and `update` won't work in the same context - # since `.save` calls `.flush` on a closed handler - # and the handler needs to have been - # closed for us to allow querying in 
the `.update` - with pytest.raises(AttributeError): - idxer.update(['12'], [b'twelve-new']) - idxer.save() - assert idxer.size == 3 - save_abspath = idxer.save_abspath - - with BaseIndexer.load(save_abspath) as idxer: - idxer.update(['12'], [b'twelve-new']) - idxer.save() - - with BaseIndexer.load(save_abspath) as idxer: - assert idxer.query(['11']) == [b'eleven'] - assert idxer.query(['12']) == [b'twelve-new'] - assert idxer.query(['12', '13']) == [b'twelve-new', b'thirteen'] - assert idxer.size == 3 - assert idxer.sample() in (b'eleven', b'twelve-new', b'thirteen') - - -@pytest.mark.parametrize('delete_on_dump', [True, False]) -def test_binarypb_delete(test_metas, delete_on_dump): - with BinaryPbIndexer(metas=test_metas, delete_on_dump=delete_on_dump) as idxer: - idxer.add(['1', '2', '3'], [b'oldvalue', b'same', b'random']) - idxer.save() - assert idxer.size == 3 - save_abspath = idxer.save_abspath - - with BaseIndexer.load(save_abspath) as idxer: - assert idxer.size == 3 - assert idxer.query('1') == [b'oldvalue'] - - with BaseIndexer.load(save_abspath) as idxer: - idxer.delete(iter(['1', '2'])) - idxer.save() - assert idxer.size == 1 - - with BaseIndexer.load(save_abspath) as idxer: - assert idxer.query('1') == [None] - assert idxer.query('2') == [None] - assert idxer.query('3') == [b'random'] - - -@pytest.mark.parametrize('delete_on_dump', [True, False]) -def test_binarypb_update_twice(test_metas, delete_on_dump): - """two updates in a row does work""" - with BinaryPbIndexer(metas=test_metas, delete_on_dump=delete_on_dump) as idxer: - idxer.add(['1', '2', '3'], [b'oldvalue', b'same', b'random']) - idxer.save() - assert idxer.size == 3 - save_abspath = idxer.save_abspath - - with BaseIndexer.load(save_abspath) as idxer: - idxer.update(['1', '2'], [b'newvalue', b'othernewvalue']) - idxer.save() - - with BaseIndexer.load(save_abspath) as idxer: - assert idxer.query(['1']) == [b'newvalue'] - assert idxer.query(['2']) == [b'othernewvalue'] - assert idxer.query(['1', 
'2']) == [b'newvalue', b'othernewvalue'] - - -# benchmark only -@pytest.mark.skipif( - 'GITHUB_WORKFLOW' in os.environ, reason='skip the benchmark test on github workflow' -) -@pytest.mark.parametrize('delete_on_dump', [True, False]) -def test_binarypb_benchmark(test_metas, delete_on_dump): - entries = 100000 - nr_to_update = 10000 - keys = np.arange(entries) - values = np.random.randint(0, 10, size=entries).astype(bytes) - - with BinaryPbIndexer(metas=test_metas, delete_on_dump=delete_on_dump) as idxer: - idxer.add(keys, values) - idxer.save() - assert idxer.size == entries - save_abspath = idxer.save_abspath - - new_values = np.random.randint(0, 10, size=nr_to_update).astype(bytes) - - with BaseIndexer.load(save_abspath) as idxer: - idxer.update(keys[:nr_to_update], new_values) - time_now = time.time() - idxer.save() - - time_end = time.time() - print( - f'delete_on_dump = {delete_on_dump}, entries={entries}. took {time_end - time_now} seconds' - ) - - -def test_kvindexer_iterate(test_metas): - """two updates in a row does work""" - with BinaryPbIndexer(metas=test_metas) as idxer: - idxer.add(['1', '2', '3'], [b'oldvalue', b'same', b'random']) - save_abspath = idxer.save_abspath - - with BaseIndexer.load(save_abspath) as idxer: - assert list(idxer) == [[b'oldvalue'], [b'same'], [b'random']] diff --git a/tests/unit/executors/indexers/test_numpyindexer.py b/tests/unit/executors/indexers/test_numpyindexer.py deleted file mode 100644 index 7856c9798bfc0..0000000000000 --- a/tests/unit/executors/indexers/test_numpyindexer.py +++ /dev/null @@ -1,543 +0,0 @@ -import os - -import numpy as np -import pytest - -from jina.executors.indexers import BaseIndexer -from jina.executors.indexers.vector import NumpyIndexer - -# fix the seed here - -np.random.seed(500) -retr_idx = None -num_data = 100 -num_dim = 64 -num_query = 10 -vec_idx = np.array( - np.random.randint(0, high=num_data, size=[num_data]), dtype=(np.str_, 16) -) -vec = np.random.random([num_data, num_dim]) -query = 
np.array(np.random.random([num_query, num_dim]), dtype=np.float32) - - -@pytest.mark.parametrize( - 'batch_size, compress_level', [(None, 0), (None, 1), (2, 0), (2, 1)] -) -def test_numpy_indexer(batch_size, compress_level, test_metas): - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=compress_level, - metas=test_metas, - ) as indexer: - indexer.batch_size = batch_size - indexer.add(vec_idx, vec) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - if compress_level == 0: - assert isinstance(indexer.query_handler, np.memmap) - idx, dist = indexer.query(query, top_k=4) - assert idx.shape == dist.shape - assert idx.shape == (num_query, 4) - assert indexer.sample().shape == (num_dim,) - - -def test_numpy_indexer_long_ids(test_metas): - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=0, - metas=test_metas, - ) as indexer: - indexer.batch_size = 4 - long_vec_id = np.array(vec_idx, dtype=(np.str_, 20)) - long_vec_id[0] = '1234512345123451234' - indexer.add(long_vec_id, vec) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - idx, dist = indexer.query(query, top_k=4) - assert idx.shape == dist.shape - assert idx.shape == (num_query, 4) - - -def test_numpy_indexer_assert_shape_mismatch(test_metas): - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=0, - metas=test_metas, - ) as indexer: - indexer.batch_size = 4 - vec_short = np.array([[1, 1, 1], [2, 2, 2]]) - vec_keys = np.array([1, 2, 3]) - with pytest.raises(ValueError): - indexer.add(vec_keys, vec_short) - - -@pytest.mark.parametrize( - 'batch_size, compress_level', [(None, 0), (None, 1), (16, 0), (16, 1)] -) 
-def test_numpy_indexer_known(batch_size, compress_level, test_metas): - vectors = np.array([[1, 1, 1], [10, 10, 10], [100, 100, 100], [1000, 1000, 1000]]) - keys = np.array(['4', '5', '6', '7'], dtype=(np.str_, 16)) - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=compress_level, - metas=test_metas, - ) as indexer: - indexer.batch_size = batch_size - indexer.add(keys, vectors) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - queries = np.array([[1, 1, 1], [10, 10, 10], [100, 100, 100], [1000, 1000, 1000]]) - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - if compress_level == 0: - assert isinstance(indexer.query_handler, np.memmap) - idx, dist = indexer.query(queries, top_k=2) - np.testing.assert_equal( - idx, np.array([['4', '5'], ['5', '4'], ['6', '5'], ['7', '6']]) - ) - assert idx.shape == dist.shape - assert idx.shape == (4, 2) - np.testing.assert_equal(indexer.query_by_key(['7', '4']), vectors[[3, 0]]) - - -@pytest.mark.parametrize( - 'batch_size, compress_level', [(None, 0), (None, 1), (16, 0), (16, 1)] -) -def test_scipy_indexer(batch_size, compress_level, test_metas): - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - backend='scipy', - compress_level=compress_level, - metas=test_metas, - ) as indexer: - indexer.batch_size = batch_size - indexer.add(vec_idx, vec) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - if compress_level == 0: - assert isinstance(indexer.query_handler, np.memmap) - idx, dist = indexer.query(query, top_k=4) - assert idx.shape == dist.shape - assert idx.shape == (num_query, 4) - - -@pytest.mark.parametrize( - 'batch_size, compress_level', [(None, 0), (None, 1), (16, 0), (16, 1)] -) -def 
test_numpy_indexer_known_big(batch_size, compress_level, test_metas): - """Let's try to have some real test. We will have an index with 10k vectors of random values between 5 and 10. - We will change tweak some specific vectors that we expect to be retrieved at query time. We will tweak vector - at index [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000], this will also be the query vectors. - Then the keys will be assigned shifted to test the proper usage of `int2ext_id` and `ext2int_id` - """ - vectors = np.random.uniform(low=5.0, high=10.0, size=(10000, 1024)) - - queries = np.empty((10, 1024)) - for idx in range(0, 10000, 1000): - array = idx * np.ones((1, 1024)) - queries[int(idx / 1000)] = array - vectors[idx] = array - - # TODO: PLLEASE DO NOT BUILD FLAKY KEYS LIKE THIS - keys = np.squeeze( - np.array(np.arange(10000, 20000).reshape(-1, 1), dtype=(np.str_, 16)) - ) - - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=compress_level, - metas=test_metas, - ) as indexer: - indexer.add(keys, vectors) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - if compress_level == 0: - assert isinstance(indexer.query_handler, np.memmap) - idx, dist = indexer.query(queries, top_k=1) - np.testing.assert_equal( - idx, - np.array( - [ - ['10000'], - ['11000'], - ['12000'], - ['13000'], - ['14000'], - ['15000'], - ['16000'], - ['17000'], - ['18000'], - ['19000'], - ] - ), - ) - assert idx.shape == dist.shape - assert idx.shape == (10, 1) - np.testing.assert_equal( - indexer.query_by_key(['10000', '15000']), vectors[[0, 5000]] - ) - - -@pytest.mark.parametrize('compress_level', [0, 1, 2, 3, 4, 5]) -def test_scipy_indexer_known_big(compress_level, test_metas): - """Let's try to have some real test. We will have an index with 10k vectors of random values between 5 and 10. 
- We will change tweak some specific vectors that we expect to be retrieved at query time. We will tweak vector - at index [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000], this will also be the query vectors. - Then the keys will be assigned shifted to test the proper usage of `int2ext_id` and `ext2int_id` - """ - vectors = np.random.uniform(low=5.0, high=10.0, size=(10000, 1024)) - - queries = np.empty((10, 1024)) - for idx in range(0, 10000, 1000): - array = idx * np.ones((1, 1024)) - queries[int(idx / 1000)] = array - vectors[idx] = array - - keys = np.squeeze( - np.array(np.arange(10000, 20000).reshape(-1, 1), dtype=(np.str_, 16)) - ) - - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - backend='scipy', - compress_level=compress_level, - metas=test_metas, - ) as indexer: - indexer.add(keys, vectors) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - if compress_level == 0: - assert isinstance(indexer.query_handler, np.memmap) - idx, dist = indexer.query(queries, top_k=1) - np.testing.assert_equal( - idx, - np.array( - [ - ['10000'], - ['11000'], - ['12000'], - ['13000'], - ['14000'], - ['15000'], - ['16000'], - ['17000'], - ['18000'], - ['19000'], - ] - ), - ) - assert idx.shape == dist.shape - assert idx.shape == (10, 1) - np.testing.assert_equal( - indexer.query_by_key(['10000', '15000']), vectors[[0, 5000]] - ) - - -@pytest.mark.parametrize( - 'batch_size, num_docs, top_k', - [(1, 10, 1), (1, 10, 10), (10, 1, 1), (10, 1000, 10), (10, 10, 100)], -) -def test__get_sorted_top_k(batch_size, num_docs, top_k, test_metas): - dist = np.random.uniform(size=(batch_size, num_docs)) - - expected_idx = np.argsort(dist)[:, :top_k] - expected_dist = np.sort(dist)[:, :top_k] - - with NumpyIndexer(metric='euclidean', metas=test_metas) as indexer: - idx, dist = indexer._get_sorted_top_k(dist, 
top_k=top_k) - - np.testing.assert_equal(idx, expected_idx) - np.testing.assert_equal(dist, expected_dist) - - -@pytest.mark.parametrize( - 'batch_size, compress_level', [(None, 0), (None, 1), (2, 0), (2, 1)] -) -def test_numpy_indexer_empty_data(batch_size, compress_level, test_metas): - idx_file_path = os.path.join(test_metas['workspace'], 'np.test.gz') - with NumpyIndexer( - index_filename=str(idx_file_path), - compress_level=compress_level, - metas=test_metas, - ) as indexer: - indexer.batch_size = batch_size - indexer.touch() - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - idx, dist = indexer.query(query, top_k=4) - assert len(idx) == 0 - assert len(dist) == 0 - - -@pytest.mark.parametrize('metric', ['euclidean', 'cosine']) -def test_indexer_one_dimensional(metric, test_metas): - import math - - add_vec_idx = np.array(['0'], dtype=(np.str_, 16)) - add_vec = np.asarray([[1]]) - query_vec = np.asarray([[2]]) - with NumpyIndexer( - metric=metric, index_filename='np.test.gz', metas=test_metas - ) as indexer: - indexer.add(add_vec_idx, add_vec) - - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - assert isinstance(indexer.query_handler, np.memmap) - idx, dist = indexer.query(query_vec, top_k=4) - assert idx.shape == dist.shape - assert idx.shape == (1, 1) - assert not math.isnan(dist[0]) - - -@pytest.mark.parametrize('dimension', [1, 64]) -@pytest.mark.parametrize('metric', ['euclidean', 'cosine']) -def test_indexer_zeros(metric, dimension, test_metas): - import math - - query_vec = np.array(np.zeros([1, dimension]), dtype=np.float32) - add_vec_idx = np.array( - np.random.randint(0, high=num_data, size=[num_data]), dtype=(np.str_, 16) - ) - add_vec = 
np.random.random([num_data, dimension]) - with NumpyIndexer( - metric=metric, index_filename='np.test.gz', metas=test_metas - ) as indexer: - indexer.add(add_vec_idx, add_vec) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - assert isinstance(indexer.query_handler, np.memmap) - idx, dist = indexer.query(query_vec, top_k=4) - - assert idx.shape == dist.shape - assert idx.shape == (1, 4) - if metric == 'cosine': - assert all(math.isnan(x) for x in dist[0]) - else: - assert not any(math.isnan(x) for x in dist[0]) - - -@pytest.mark.parametrize('real_delete', [True, False]) -@pytest.mark.parametrize('compress_level', [0, 1, 2, 3, 4, 5]) -def test_numpy_update_delete_delete_on_dump(compress_level, real_delete, test_metas): - np.random.seed(500) - num_dim = 3 - vec_idx = np.array(['12', '112', '903'], dtype=(np.str_, 16)) - vec = np.random.random([len(vec_idx), num_dim]) - - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=compress_level, - metas=test_metas, - delete_on_dump=real_delete, - ) as indexer: - indexer.add(vec_idx, vec) - indexer.save() - assert indexer.num_dim == num_dim - assert indexer.size == len(vec_idx) - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - query_results = indexer.query_by_key(vec_idx) - assert np.array_equal(vec, query_results) - - # update - key_to_update = vec_idx[0] - data_to_update = np.random.random([1, num_dim]) - # nonexistent key - random_keys = np.array(['999'], dtype=(np.str_, 16)) - random_data = np.random.random([1, num_dim]) - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - # NON-EXISTENT KEYS: this will log warning but not fail - indexer.update(random_keys, 
random_data) - indexer.update([key_to_update], data_to_update) - indexer.save() - - with BaseIndexer.load(save_abspath) as indexer: - # this tests the real delete - if real_delete: - assert len(indexer.valid_indices) == indexer.size - else: - assert len(indexer.valid_indices) != indexer.size - assert isinstance(indexer, NumpyIndexer) - query_results = indexer.query_by_key([key_to_update]) - assert np.array_equal(data_to_update, query_results) - - # delete - keys_to_delete = 1 - vec_idx_to_delete = vec_idx[:keys_to_delete] - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - indexer.delete(vec_idx_to_delete) - indexer.save() - assert indexer.size == len(vec_idx) - keys_to_delete - - assert indexer.size == len(vec_idx) - keys_to_delete - - with BaseIndexer.load(save_abspath) as indexer: - # this tests the real delete - if real_delete: - assert len(indexer.valid_indices) == indexer.size - else: - assert len(indexer.valid_indices) != indexer.size - assert isinstance(indexer, NumpyIndexer) - assert indexer.size == len(vec_idx) - keys_to_delete - # random non-existent key - assert indexer.query_by_key(['123861942']) is None - query_results = indexer.query_by_key(vec_idx[keys_to_delete:]) - expected = vec[keys_to_delete:] - np.testing.assert_allclose(query_results, expected, equal_nan=True) - - -@pytest.mark.parametrize( - 'batch_size, compress_level', [(None, 0), (None, 1), (16, 0), (16, 1)] -) -def test_numpy_indexer_known_and_delete_delete_dump( - batch_size, compress_level, test_metas -): - vectors = np.array([[1, 1, 1], [10, 10, 10], [100, 100, 100]]) - keys = np.array(['4', '5', '6'], dtype=(np.str_, 16)) - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=compress_level, - metas=test_metas, - delete_on_dump=True, - ) as indexer: - indexer.batch_size = batch_size - indexer.add(keys, vectors) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = 
indexer.save_abspath - - top_k = 3 - queries = np.array([[1, 1, 1], [10, 10, 10]]) - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - idx, dist = indexer.query(queries, top_k=top_k) - np.testing.assert_equal(idx, np.array([['4', '5', '6'], ['5', '4', '6']])) - assert idx.shape == dist.shape - assert idx.shape == (len(queries), top_k) - np.testing.assert_equal( - indexer.query_by_key(['5', '4', '6']), vectors[[1, 0, 2]] - ) - - # update and query again - key_to_update = np.array(['4']) - data_to_update = np.array([[1000, 1000, 1000]]) - - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - indexer.update(key_to_update, data_to_update) - indexer.save() - - with BaseIndexer.load(save_abspath) as indexer: - # this tests the real delete - assert len(indexer.valid_indices) == indexer.size - assert isinstance(indexer, NumpyIndexer) - idx, dist = indexer.query(queries, top_k=top_k) - np.testing.assert_equal(idx, np.array([['5', '6', '4'], ['5', '6', '4']])) - assert idx.shape == dist.shape - assert idx.shape == (len(queries), top_k) - - # delete and query again - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - indexer.delete([4]) - indexer.save() - - top_k = 2 - queries = np.array([[100, 100, 100], [10, 10, 10]]) - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - idx, dist = indexer.query(queries, top_k=2) - np.testing.assert_equal(idx, np.array([['6', '5'], ['5', '6']])) - assert idx.shape == dist.shape - assert idx.shape == (len(queries), top_k) - np.testing.assert_equal(indexer.query_by_key(['6', '5']), vectors[[2, 1]]) - - # test query by nonexistent key - with BaseIndexer.load(save_abspath) as indexer: - assert isinstance(indexer, NumpyIndexer) - assert indexer.query_by_key(['91237124']) is None - - -@pytest.mark.parametrize('compress_level', [0, 1, 2, 3]) -def 
test_numpy_indexer_with_ref_indexer(compress_level, test_metas): - vectors = np.array([[1, 1, 1], [10, 10, 10], [100, 100, 100], [1000, 1000, 1000]]) - keys = np.array(['4', '5', '6', '7'], dtype=(np.str_, 16)) - with NumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=compress_level, - metas=test_metas, - ) as indexer: - indexer.add(keys, vectors) - indexer.save() - assert os.path.exists(indexer.index_abspath) - index_filename = indexer.index_filename - - queries = np.array([[1, 1, 1], [10, 10, 10], [100, 100, 100], [1000, 1000, 1000]]) - with NumpyIndexer( - metric='euclidean', ref_indexer=indexer, metas=test_metas - ) as new_indexer: - np.testing.assert_equal(np.array(list(new_indexer)), vectors) - assert new_indexer.compress_level == compress_level - assert new_indexer.index_filename == index_filename - assert isinstance(indexer, NumpyIndexer) - if compress_level == 0: - assert isinstance(new_indexer.query_handler, np.memmap) - idx, dist = new_indexer.query(queries, top_k=2) - np.testing.assert_equal( - idx, np.array([['4', '5'], ['5', '4'], ['6', '5'], ['7', '6']]) - ) - assert idx.shape == dist.shape - assert idx.shape == (4, 2) - np.testing.assert_equal(new_indexer.query_by_key(['7', '4']), vectors[[3, 0]]) diff --git a/tests/unit/executors/indexers/test_numpyindexer_batching.py b/tests/unit/executors/indexers/test_numpyindexer_batching.py deleted file mode 100644 index eb8f214b585b9..0000000000000 --- a/tests/unit/executors/indexers/test_numpyindexer_batching.py +++ /dev/null @@ -1,77 +0,0 @@ -import os - -import pytest -import numpy as np - -from jina.executors.decorators import batching -from jina.executors.indexers import BaseIndexer -from jina.executors.indexers.vector import NumpyIndexer, _ext_B, _euclidean - - -class MockNumpyIndexer(NumpyIndexer): - @batching(merge_over_axis=1, slice_on=2) - def _euclidean(self, cached_A, raw_B): - assert raw_B.shape[0] == self.batch_size - data = _ext_B(raw_B) - return 
_euclidean(cached_A, data) - - -@pytest.mark.parametrize('batch_size', [2, 5, 10, 20, 100, 500]) -def test_numpy_indexer_known_big_batch(batch_size, test_metas): - """Let's try to have some real test. We will have an index with 10k vectors of random values between 5 and 10. - We will change tweak some specific vectors that we expect to be retrieved at query time. We will tweak vector - at index [0, 1000, 2000, 3000, 4000, 5000, 6000, 7000, 8000, 9000], this will also be the query vectors. - Then the keys will be assigned shifted to test the proper usage of `int2ext_id` and `ext2int_id` - """ - vectors = np.random.uniform(low=5.0, high=10.0, size=(10000, 1024)) - - queries = np.empty((10, 1024)) - for idx in range(0, 10000, 1000): - array = idx * np.ones((1, 1024)) - queries[int(idx / 1000)] = array - vectors[idx] = array - - # TODO: PLLEASE DO NOT BUILD FLAKY KEYS LIKE THIS - keys = np.squeeze( - np.array(np.arange(10000, 20000).reshape(-1, 1), dtype=(np.str_, 16)) - ) - - with MockNumpyIndexer( - metric='euclidean', - index_filename='np.test.gz', - compress_level=0, - metas=test_metas, - ) as indexer: - indexer.batch_size = batch_size - indexer.add(keys, vectors) - indexer.save() - assert os.path.exists(indexer.index_abspath) - save_abspath = indexer.save_abspath - - with BaseIndexer.load(save_abspath) as indexer: - indexer.batch_size = batch_size - assert isinstance(indexer, MockNumpyIndexer) - assert isinstance(indexer._raw_ndarray, np.memmap) - idx, dist = indexer.query(queries, top_k=1) - np.testing.assert_equal( - idx, - np.array( - [ - ['10000'], - ['11000'], - ['12000'], - ['13000'], - ['14000'], - ['15000'], - ['16000'], - ['17000'], - ['18000'], - ['19000'], - ] - ), - ) - assert idx.shape == dist.shape - assert idx.shape == (10, 1) - np.testing.assert_equal( - indexer.query_by_key(['10000', '15000']), vectors[[0, 5000]] - ) diff --git a/tests/unit/executors/test_comp_exec.py b/tests/unit/executors/test_comp_exec.py deleted file mode 100644 index 
5117187a84952..0000000000000 --- a/tests/unit/executors/test_comp_exec.py +++ /dev/null @@ -1,47 +0,0 @@ -import os - -import pytest - -from jina.executors import BaseExecutor -from jina.executors.compound import CompoundExecutor -from tests import rm_files - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -class DummyA(BaseExecutor): - def say(self): - return 'a' - - -class DummyB(BaseExecutor): - def say(self): - return 'b' - - -def test_compositional_dump(test_metas): - a = CompoundExecutor(metas=test_metas) - a.components = lambda: [BaseExecutor(), BaseExecutor()] - assert a.name - a.touch() - a.save() - a.save_config() - assert os.path.exists(a.save_abspath) - assert os.path.exists(a.config_abspath) - rm_files([a.save_abspath, a.config_abspath]) - - -@pytest.fixture -def tmp_workspace(tmpdir): - os.environ['JINA_TEST_COMPOUND_FROM_YAML'] = str(tmpdir) - yield - del os.environ['JINA_TEST_COMPOUND_FROM_YAML'] - - -def test_compound_from_yaml(tmp_workspace): - a = BaseExecutor.load_config(os.path.join(cur_dir, 'yaml/npvec.yml')) - assert isinstance(a, CompoundExecutor) - assert callable(getattr(a, 'add')) - assert callable(getattr(a, 'query')) - assert callable(getattr(a, 'meta_add')) - assert callable(getattr(a, 'meta_query')) diff --git a/tests/unit/executors/test_decorators.py b/tests/unit/executors/test_decorators.py index 153914993c2d9..2c8d9c7036a3b 100644 --- a/tests/unit/executors/test_decorators.py +++ b/tests/unit/executors/test_decorators.py @@ -1,46 +1,6 @@ -import os - -import numpy as np import pytest -from jina.executors.decorators import ( - as_update_method, - as_ndarray, - batching, - store_init_kwargs, - single, -) - - -def test_as_update_method(): - class A: - def __init__(self): - self.is_updated = False - @as_update_method - def f(self): - pass - - a = A() - assert not a.is_updated - a.f() - assert a.is_updated - - -def test_as_ndarray(): - class A: - @as_ndarray - def f_list(self, *args, **kwargs): - return [0] - - @as_ndarray - 
def f_int(self, *args, **kwargs): - return 0 - - a = A() - - assert isinstance(a.f_list(), np.ndarray) - with pytest.raises(TypeError): - a.f_int() +from jina.executors.decorators import store_init_kwargs, requests def test_store_init_kwargs(): @@ -61,435 +21,15 @@ def f(self, a, b, *args, **kwargs): instance.f('a', 'b', c='c') -def test_single(): - class A: - def __init__(self): - self.call_nbr = 0 - - @single - def f(self, data): - assert isinstance(data, int) - self.call_nbr += 1 - return data - - instance = A() - result = instance.f([1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - assert instance.call_nbr == 4 - - result = instance.f(1) - assert result == 1 - - -def test_single_kwargs_call(): - class A: - @single - def f(self, data): - assert isinstance(data, int) - return data - - instance = A() - result = instance.f(data=1) - assert result == 1 - - -def test_single_np_ndarray(): - class A: - def __init__(self): - self.call_nbr = 0 - - @single - def f(self, data): - assert isinstance(data, np.ndarray) - assert data.shape == (5,) - self.call_nbr += 1 - return data - - instance = A() - input_np = np.random.random((4, 5)) - result = instance.f(input_np) - np.testing.assert_equal(result, input_np) - assert instance.call_nbr == 4 - - -def test_single_np_ndarray_kwargs_call(): - class A: - @single - def f(self, data): - assert isinstance(data, np.ndarray) - assert data.shape == (5,) - return data - - instance = A() - input_np = np.random.random(5) - result = instance.f(data=input_np) - np.testing.assert_equal(result, input_np) - - -def test_single_string(): - class A: - def __init__(self): - self.call_nbr = 0 - - @single - def f(self, data): - assert isinstance(data, str) - return data - - instance = A() - result = instance.f(['test0', 'test1']) - assert len(result) == 2 - for i, res in enumerate(result): - assert res == f'test{i}' - - result = instance.f('test0') - assert result == 'test0' - - -def test_single_bytes(): - class A: - def __init__(self): - 
self.call_nbr = 0 - - @single - def f(self, data): - assert isinstance(data, bytes) - return data - - instance = A() - result = instance.f([str.encode('test0'), str.encode('test1')]) - assert len(result) == 2 - for i, res in enumerate(result): - assert res == str.encode(f'test{i}') - - result = instance.f(b'test0') - assert result == b'test0' - - -def test_batching(): - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - self.batch_sizes = [] - - @batching - def f(self, data): - self.batch_sizes.append(len(data)) - return data - - instance = A(1) - result = instance.f([1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - assert len(instance.batch_sizes) == 4 - for batch_size in instance.batch_sizes: - assert batch_size == 1 - - instance = A(3) - result = instance.f([1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - assert len(instance.batch_sizes) == 2 - assert instance.batch_sizes[0] == 3 - assert instance.batch_sizes[1] == 1 - - instance = A(5) - result = instance.f([1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - assert len(instance.batch_sizes) == 1 - assert instance.batch_sizes[0] == 4 - - -@pytest.mark.parametrize('input_shape', [(4, 5), (4, 5, 5)]) -def test_batching_np_array(input_shape): - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - self.batch_sizes = [] - - @batching - def f(self, data): - self.batch_sizes.append(len(data)) - return data - - instance = A(1) - input_np = np.random.random(input_shape) - result = instance.f(input_np) - np.testing.assert_equal(result, input_np) - assert len(instance.batch_sizes) == 4 - for batch_size in instance.batch_sizes: - assert batch_size == 1 - - instance = A(3) - result = instance.f(input_np) - np.testing.assert_equal(result, input_np) - assert len(instance.batch_sizes) == 2 - assert instance.batch_sizes[0] == 3 - assert instance.batch_sizes[1] == 1 - - instance = A(5) - result = instance.f(input_np) - np.testing.assert_equal(result, input_np) - assert 
len(instance.batch_sizes) == 1 - assert instance.batch_sizes[0] == 4 - - -def test_batching_slice_on(): - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - self.batch_sizes = [] - - @batching(slice_on=2) - def f(self, key, data): - self.batch_sizes.append(len(data)) - return data - - instance = A(1) - result = instance.f(None, [1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - assert len(instance.batch_sizes) == 4 - for batch_size in instance.batch_sizes: - assert batch_size == 1 - - instance = A(3) - result = instance.f(None, [1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - assert len(instance.batch_sizes) == 2 - assert instance.batch_sizes[0] == 3 - assert instance.batch_sizes[1] == 1 - - instance = A(5) - result = instance.f(None, [1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - assert len(instance.batch_sizes) == 1 - assert instance.batch_sizes[0] == 4 - - -def test_batching_memmap(tmpdir): - path = os.path.join(str(tmpdir), 'vec.gz') - vec = np.random.random([10, 10]) - with open(path, 'wb') as f: - f.write(vec.tobytes()) - - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - - @batching - def f(self, data): - assert data.shape == (2, 10) - return data - - instance = A(2) - result = instance.f( - np.memmap(path, dtype=vec.dtype.name, mode='r', shape=vec.shape) - ) - assert result.shape == (10, 10) - assert isinstance(result, np.ndarray) - - -def test_batching_ordinal_idx_arg(tmpdir): - path = os.path.join(str(tmpdir), 'vec.gz') - vec = np.random.random([10, 10]) - with open(path, 'wb') as f: - f.write(vec.tobytes()) - - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - self.ord_idx = [] - - @batching(ordinal_idx_arg=2) - def f(self, data, ord_idx): - self.ord_idx.append(ord_idx) - return list(range(ord_idx.start, ord_idx.stop)) - - instance = A(2) - result = instance.f( - np.memmap(path, dtype=vec.dtype.name, mode='r', shape=vec.shape), - slice(0, vec.shape[0]), - ) - assert 
len(instance.ord_idx) == 5 - assert instance.ord_idx[0].start == 0 - assert instance.ord_idx[0].stop == 2 - assert instance.ord_idx[1].start == 2 - assert instance.ord_idx[1].stop == 4 - assert instance.ord_idx[2].start == 4 - assert instance.ord_idx[2].stop == 6 - assert instance.ord_idx[3].start == 6 - assert instance.ord_idx[3].stop == 8 - assert instance.ord_idx[4].start == 8 - assert instance.ord_idx[4].stop == 10 - - assert result == [0, 1, 2, 3, 4, 5, 6, 7, 8, 9] - - -@pytest.mark.skip( - reason='Currently wrong implementation of batching with labels, not well considered in batching helper' -) -def test_batching_with_label(): - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - - @batching(label_on=2) - def f(self, data, labels): - return data, labels - - instance = A(2) - data = [1, 1, 2, 2] - labels = ['label1', 'label1', 'label2', 'label2'] - result = instance.f(data, labels) - assert result == [[(1, 'label1'), (1, 'label1')], [(2, 'label2'), (2, 'label2')]] - - -def test_batching_multi(): - slice_nargs = 3 - - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - self.batching = [] - - @batching(slice_nargs=slice_nargs) - def f(self, *datas): - assert len(datas) == slice_nargs - d0, d1, d2 = datas - assert d0.shape == (2, 2) - assert d1.shape == (2, 4) - assert d2.shape == (2, 6) - concat = np.concatenate(datas, axis=1) - self.batching.append(concat) - return concat - - num_docs = 4 - batch_size = 2 - instance = A(batch_size) - data0 = np.random.rand(num_docs, 2) - data1 = np.random.rand(num_docs, 4) - data2 = np.random.rand(num_docs, 6) - data = [data0, data1, data2] - result = instance.f(*data) - from math import ceil - - result_dim = sum([d.shape[1] for d in data]) - assert result.shape == (num_docs, result_dim) - assert len(instance.batching) == ceil(num_docs / batch_size) - for batch in instance.batching: - assert batch.shape == (batch_size, result_dim) - - -def test_single_multi(): - class A: - def 
__init__(self): - self.call_nbr = 0 - - @single(slice_nargs=3) - def f(self, data0, data1, data2): - assert isinstance(data0, int) - assert isinstance(data1, int) - assert isinstance(data2, int) - self.call_nbr += 1 - return data1 - - instance = A() - data0 = [0, 0, 0, 0] - data1 = [1, 1, 1, 1] - data2 = [2, 2, 2, 2] - data = [data0, data1, data2] - result = instance.f(*data) - assert result == [1, 1, 1, 1] - assert instance.call_nbr == 4 - - instance = A() - result = instance.f(0, 1, 2) - assert result == 1 - - -def test_batching_as_ndarray(): - class A: - def __init__(self, batch_size): - self.batch_size = batch_size - self.batch_sizes = [] - - @as_ndarray - @batching - def f(self, data): - self.batch_sizes.append(len(data)) - return data - - instance = A(1) - input_data = [[1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1], [1, 1, 1, 1]] - result = instance.f(input_data) - assert isinstance(result, np.ndarray) - np.testing.assert_equal(result, np.array(input_data)) - assert len(instance.batch_sizes) == 4 - for batch_size in instance.batch_sizes: - assert batch_size == 1 - - instance = A(3) - result = instance.f(input_data) - assert isinstance(result, np.ndarray) - np.testing.assert_equal(result, np.array(input_data)) - assert len(instance.batch_sizes) == 2 - assert instance.batch_sizes[0] == 3 - assert instance.batch_sizes[1] == 1 - - instance = A(5) - result = instance.f(input_data) - assert isinstance(result, np.ndarray) - np.testing.assert_equal(result, np.array(input_data)) - assert len(instance.batch_sizes) == 1 - assert instance.batch_sizes[0] == 4 - - -def test_single_slice_on(): - class A: - @single(slice_on=2) - def f(self, key, data, *args, **kwargs): - assert isinstance(data, int) - return data - - instance = A() - result = instance.f(None, [1, 1, 1, 1]) - assert result == [1, 1, 1, 1] - - -def test_single_multi_input_slice_on(): - class A: - @single(slice_on=1, slice_nargs=2) - def f(self, key, data, *args, **kwargs): - assert isinstance(data, int) - assert 
isinstance(key, str) - return data - - instance = A() - data = instance.f(['a', 'b', 'c', 'd'], [1, 1, 1, 1]) - assert isinstance(data, list) - assert data == [1, 1, 1, 1] - - -@pytest.mark.parametrize('slice_on, num_data', [(1, 3), (2, 2)]) -def test_single_multi_input_slice_on_error(slice_on, num_data): - class A: - @single(slice_on=slice_on, slice_nargs=num_data) - def f(self, key, data, *args, **kwargs): - assert isinstance(data, int) - assert isinstance(key, str) - return data - - instance = A() - with pytest.raises(IndexError): - instance.f(['a', 'b', 'c', 'd'], [1, 1, 1, 1]) +def test_requests(): + with pytest.raises(TypeError): + @requests + def fn(*args): + pass -def test_single_multi_input_kwargs_call(): - class A: - @single - def f(self, key, data, *args, **kwargs): - assert isinstance(data, int) - assert isinstance(key, str) - return data + @requests + def fn_2(*args, **kwargs): + pass - instance = A() - result = instance.f(data=1, key='a') - assert result == 1 + assert hasattr(fn_2, 'fn') diff --git a/tests/unit/executors/test_dump_executor_with_drivers.py b/tests/unit/executors/test_dump_executor_with_drivers.py deleted file mode 100644 index 212ee81ae6bb5..0000000000000 --- a/tests/unit/executors/test_dump_executor_with_drivers.py +++ /dev/null @@ -1,55 +0,0 @@ -import os -import pickle -import pytest - -from jina.drivers.control import RouteDriver -from jina.executors import BaseExecutor - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -def test_dump_driver(tmpdir): - rd = RouteDriver(raise_no_dealer=True) - rd.idle_dealer_ids = ('hello', 'there') - - with open(str(tmpdir / 'a.bin'), 'wb') as fp: - pickle.dump(rd, fp) - - with open(str(tmpdir / 'a.bin'), 'rb') as fp: - p = pickle.load(fp) - - # init args & kwargs values should be save - assert p.raise_no_dealer - - # other stateful values should be reset to init()'s time - assert not p.idle_dealer_ids - - -def test_dump_exegit_scutor_without_drivers(tmpdir): - # Create an executor from a 
yaml file and store it to disc - executor_a = BaseExecutor.load_config(f'{cur_dir}/yaml/route.yml') - executor_a.touch() - executor_a._drivers['ControlRequest'][0].idle_dealer_ids = ('hello', 'there') - executor_a.save(str(tmpdir / 'aux.bin')) - - # load the saved executor_a as executor_b - executor_b = BaseExecutor.load(str(tmpdir / 'aux.bin')) - assert hasattr(executor_b, '_drivers') is False - - -@pytest.fixture -def temp_workspace(tmpdir): - os.environ['JINA_TEST_LOAD_FROM_DUMP_WORKSPACE'] = str(tmpdir) - yield - del os.environ['JINA_TEST_LOAD_FROM_DUMP_WORKSPACE'] - - -def test_drivers_renewed_from_yml_when_loaded_from_dump(temp_workspace): - executor_a = BaseExecutor.load_config(f'{cur_dir}/yaml/example_1.yml') - assert executor_a._drivers['SearchRequest'][0]._is_update is True - - with executor_a: - executor_a.touch() - - executor_b = BaseExecutor.load_config(f'{cur_dir}/yaml/example_2.yml') - assert executor_b._drivers['SearchRequest'][0]._is_update is False diff --git a/tests/unit/executors/test_executor.py b/tests/unit/executors/test_executor.py new file mode 100644 index 0000000000000..8dc0701decda9 --- /dev/null +++ b/tests/unit/executors/test_executor.py @@ -0,0 +1,136 @@ +import os + +import pytest + +from jina import Executor +from jina.executors.metas import get_default_metas + + +@property +def workspace(self) -> str: + """ + Get the path of the current shard. + + :return: returns the workspace of the shard of this Executor. 
+ """ + return os.path.abspath( + self.metas.workspace + or ( + os.path.join(self.runtime_args.workspace, self.metas.name) + if self.metas.replica_id == -1 + else os.path.join( + self.runtime_args.workspace, self.metas.name, self.metas.replica_id + ) + ) + ) + + +@pytest.fixture +def replica_id(request): + return request.param + + +@pytest.fixture +def pea_id(request): + return request.param + + +@pytest.fixture +def test_metas_workspace_simple(tmpdir): + metas = get_default_metas() + metas['workspace'] = str(tmpdir) + metas['name'] = 'test' + return metas + + +@pytest.fixture +def test_bad_metas_workspace(tmpdir): + metas = get_default_metas() + return metas + + +@pytest.fixture +def test_metas_workspace_replica_peas(tmpdir, replica_id, pea_id): + metas = get_default_metas() + metas['workspace'] = str(tmpdir) + metas['name'] = 'test' + metas['replica_id'] = replica_id + metas['pea_id'] = pea_id + return metas + + +def test_executor_workspace_simple(test_metas_workspace_simple): + executor = Executor(metas=test_metas_workspace_simple) + assert executor.workspace == os.path.abspath( + test_metas_workspace_simple['workspace'] + ) + + +@pytest.mark.parametrize('replica_id', [0, 1, 2], indirect=True) +@pytest.mark.parametrize('pea_id', [0, 1, 2], indirect=True) +def test_executor_workspace(test_metas_workspace_replica_peas, replica_id, pea_id): + executor = Executor( + metas={'name': test_metas_workspace_replica_peas['name']}, + runtime_args=test_metas_workspace_replica_peas, + ) + assert executor.workspace == os.path.abspath( + os.path.join( + test_metas_workspace_replica_peas['workspace'], + test_metas_workspace_replica_peas['name'], + str(replica_id), + str(pea_id), + ) + ) + + +@pytest.mark.parametrize('replica_id', [0, 1, 2], indirect=True) +@pytest.mark.parametrize('pea_id', [None, -1], indirect=True) +def test_executor_workspace_parent_replica_nopea( + test_metas_workspace_replica_peas, replica_id, pea_id +): + executor = Executor( + metas={'name': 
test_metas_workspace_replica_peas['name']}, + runtime_args=test_metas_workspace_replica_peas, + ) + assert executor.workspace == os.path.abspath( + os.path.join( + test_metas_workspace_replica_peas['workspace'], + test_metas_workspace_replica_peas['name'], + str(replica_id), + ) + ) + + +@pytest.mark.parametrize('replica_id', [None, -1], indirect=True) +@pytest.mark.parametrize('pea_id', [0, 1, 2], indirect=True) +def test_executor_workspace_parent_noreplica_pea( + test_metas_workspace_replica_peas, replica_id, pea_id +): + executor = Executor( + metas={'name': test_metas_workspace_replica_peas['name']}, + runtime_args=test_metas_workspace_replica_peas, + ) + assert executor.workspace == os.path.abspath( + os.path.join( + test_metas_workspace_replica_peas['workspace'], + test_metas_workspace_replica_peas['name'], + str(pea_id), + ) + ) + + +@pytest.mark.parametrize('replica_id', [None, -1], indirect=True) +@pytest.mark.parametrize('pea_id', [None, -1], indirect=True) +def test_executor_workspace_parent_noreplica_nopea( + test_metas_workspace_replica_peas, replica_id, pea_id +): + executor = Executor( + metas={'name': test_metas_workspace_replica_peas['name']}, + runtime_args=test_metas_workspace_replica_peas, + ) + assert executor.workspace == os.path.abspath( + os.path.join( + test_metas_workspace_replica_peas['workspace'], + test_metas_workspace_replica_peas['name'], + ) + ) diff --git a/tests/unit/executors/test_executor_function.py b/tests/unit/executors/test_executor_function.py deleted file mode 100644 index a55e10c808964..0000000000000 --- a/tests/unit/executors/test_executor_function.py +++ /dev/null @@ -1,201 +0,0 @@ -import numpy as np -import pytest - -from jina import DocumentArray, Document -from jina.drivers.encode import EncodeDriver -from jina.executors.encoders import BaseEncoder -from tests import random_docs - - -def test_extract_multi_fields(mocker): - encode_mock = mocker.Mock() - - class MyExecutor(BaseEncoder): - def encode(self, id, 
embedding): - encode_mock() - assert isinstance(id, list) - assert isinstance(embedding, list) - assert isinstance(id[0], str) - assert isinstance(embedding[0], np.ndarray) - - exec = MyExecutor() - bd = EncodeDriver() - - bd.attach(exec, runtime=None) - docs = list(random_docs(10)) - - ds = DocumentArray(docs) - - bd._apply_all(ds) - encode_mock.assert_called() - - -def test_extract_multi_fields_with_ndarray_type(mocker): - encode_mock = mocker.Mock() - - class MyExecutor(BaseEncoder): - def encode(self, id: 'np.ndarray', embedding: 'np.ndarray'): - encode_mock() - assert isinstance(id, np.ndarray) - assert isinstance(embedding, np.ndarray) - assert isinstance(id[0], str) - assert isinstance(embedding[0], np.ndarray) - - exec = MyExecutor() - bd = EncodeDriver() - - bd.attach(exec, runtime=None) - docs = list(random_docs(10)) - - ds = DocumentArray(docs) - - bd._apply_all(ds) - encode_mock.assert_called() - - -def test_extract_bad_fields(mocker): - encode_mock = mocker.Mock() - - class MyExecutor(BaseEncoder): - def encode(self, data): - encode_mock() - - exec = MyExecutor() - bd = EncodeDriver() - - bd.attach(exec, runtime=None) - docs = list(random_docs(10)) - - ds = DocumentArray(docs) - - with pytest.raises( - AttributeError, match='is now deprecated and not a valid argument' - ): - bd._apply_all(ds) - encode_mock.assert_not_called() - - class MyExecutor(BaseEncoder): - def encode(self, hello): - encode_mock() - - exec = MyExecutor() - bd = EncodeDriver() - bd.attach(exec, runtime=None) - - with pytest.raises(AttributeError, match='are invalid Document attributes'): - bd._apply_all(ds) - encode_mock.assert_not_called() - - class MyExecutor(BaseEncoder): - def encode(self, mimeType): - encode_mock() - - exec = MyExecutor() - bd = EncodeDriver() - bd.attach(exec, runtime=None) - - with pytest.raises(AttributeError, match='you give them in CamelCase'): - bd._apply_all(ds) - encode_mock.assert_not_called() - - -def test_extract_bad_fields_no_strict_args(mocker): - 
encode_mock = mocker.Mock() - - class MyExecutor(BaseEncoder): - def encode(self, hello): - encode_mock() - - exec = MyExecutor() - bd = EncodeDriver(strict_method_args=False) - - bd.attach(exec, runtime=None) - docs = list(random_docs(10)) - - ds = DocumentArray(docs) - - bd._apply_all(ds) - encode_mock.assert_not_called() - - -def test_exec_fn_arbitrary_name(mocker): - encode_mock = mocker.Mock() - - class MyExecutor(BaseEncoder): - def foo(self, id): - assert isinstance(id[0], str) - assert isinstance(id, list) - encode_mock() - - exec = MyExecutor() - bd = EncodeDriver(method='foo') - - bd.attach(exec, runtime=None) - docs = list(random_docs(10)) - - ds = DocumentArray(docs) - - bd._apply_all(ds) - encode_mock.assert_called() - - -def test_exec_fn_return_dict(mocker): - encode_mock = mocker.Mock() - - class MyExecutor(BaseEncoder): - def encode(self, id): - encode_mock() - return [{'id': 'hello'}] * len(id) - - exec = MyExecutor() - bd = EncodeDriver() - - bd.attach(exec, runtime=None) - docs = list(random_docs(10)) - - ds = DocumentArray(docs) - - bd._apply_all(ds) - encode_mock.assert_called() - - for d in ds: - assert d.id == 'hello' - - -def test_exec_fn_return_doc(mocker): - encode_mock = mocker.Mock() - - class MyExecutor(BaseEncoder): - def encode(self, id): - encode_mock() - return [Document(mime_type='image/png')] * len(id) - - exec = MyExecutor() - bd = EncodeDriver() - - bd.attach(exec, runtime=None) - docs = list(random_docs(10)) - - ds = DocumentArray(docs) - - bd._apply_all(ds) - encode_mock.assert_called() - - for d in ds: - assert d.mime_type == 'image/png' - - -def test_exec_fn_annotation(): - class MyExecutor(BaseEncoder): - def foo( - self, a: 'np.ndarray', b: np.ndarray, c: np.float, *args, **kwargs - ) -> 'np.ndarray': - pass - - exec = MyExecutor() - bd = EncodeDriver(method='foo', strict_method_args=False) - - bd.attach(exec, runtime=None) - - assert bd._exec_fn_return_is_ndarray - assert bd._exec_fn_required_keys_is_ndarray == [True, 
True, False] diff --git a/tests/unit/executors/test_executor_logger.py b/tests/unit/executors/test_executor_logger.py deleted file mode 100644 index bc6f12e593d26..0000000000000 --- a/tests/unit/executors/test_executor_logger.py +++ /dev/null @@ -1,38 +0,0 @@ -import pytest - -from jina.executors import BaseExecutor -from jina.executors.metas import get_default_metas -from jina.parsers import set_pea_parser - - -@pytest.fixture(scope='function', autouse=True) -def metas(tmpdir): - metas = get_default_metas() - metas['workspace'] = str(tmpdir) - yield metas - - -def test_executor_logger(metas): - from fluent import asynchandler as fluentasynchandler - - args = set_pea_parser().parse_args([]) - with BaseExecutor(args, metas=metas) as executor: - assert len(executor.logger.logger.handlers) == 2 - has_fluent = False - for h in executor.logger.logger.handlers: - if isinstance(h, fluentasynchandler.FluentHandler): - has_fluent = True - assert has_fluent - executor.logger.info('logging from executor') - executor.touch() - executor.save() - save_abspath = executor.save_abspath - - with BaseExecutor.load(save_abspath) as executor: - assert len(executor.logger.logger.handlers) == 2 - has_fluent = False - for h in executor.logger.logger.handlers: - if isinstance(h, fluentasynchandler.FluentHandler): - has_fluent = True - assert has_fluent - executor.logger.info('logging from executor') diff --git a/tests/unit/executors/test_route_exec.py b/tests/unit/executors/test_route_exec.py deleted file mode 100644 index 7d0e375d966c4..0000000000000 --- a/tests/unit/executors/test_route_exec.py +++ /dev/null @@ -1,28 +0,0 @@ -import os -from pprint import pprint - -import pytest - -from jina.executors import BaseExecutor -from jina.flow import Flow -from tests import random_docs - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -def test_load_driver(): - b = BaseExecutor.load_config(os.path.join(cur_dir, 'yaml/route.yml')) - pprint(b._drivers) - - c = 
BaseExecutor.load_config('_pass') - assert len(b._drivers['ControlRequest']) == len(c._drivers['ControlRequest']) - pprint(c._drivers) - - -@pytest.mark.skip('https://github.com/jina-ai/jina/pull/1070') -def test_route(): - docs = random_docs(num_docs=2, chunks_per_doc=2) - f = Flow().add(uses_before=os.path.join(cur_dir, 'yaml', 'route.yml'), shards=2) - - with f: - f.index(docs) diff --git a/tests/unit/executors/test_set_metas.py b/tests/unit/executors/test_set_metas.py index 075f90d65cf0e..c0fae432109c6 100644 --- a/tests/unit/executors/test_set_metas.py +++ b/tests/unit/executors/test_set_metas.py @@ -1,19 +1,50 @@ +from types import SimpleNamespace + from jina.executors import BaseExecutor -from jina.executors.indexers.vector import NumpyIndexer -from jina.executors.metas import get_default_metas -def test_set_batch_size(): - batch_size = 325 - metas = get_default_metas() - metas['batch_size'] = batch_size - indexer = NumpyIndexer(index_filename=f'test.gz', metas=metas) - assert indexer.batch_size == batch_size +def test_exec_from_python(): + be = BaseExecutor(metas={'name': 'hello', 'random_name': 'random_value'}) + assert be.metas.name == 'hello' + assert be.metas.random_name == 'random_value' + + +def test_runtime_args(): + b = BaseExecutor.load_config( + 'BaseExecutor', metas={'name': 'b123'}, runtime_args={'hello': 'world'} + ) + + assert b.runtime_args.hello == 'world' + assert b.metas.name == 'b123' + + +def test_default_args_from_load_config(): + b = BaseExecutor.load_config('!BaseExecutor {}') + + assert isinstance(b.runtime_args, SimpleNamespace) + assert isinstance(b.metas, SimpleNamespace) + # name is always auto-assigned + assert b.metas.name + + +def test_runtime_args_from_load_config(): + y = ''' +!BaseExecutor +metas: + name: my-mwu-encoder + workspace: ./ + ''' + + b = BaseExecutor.load_config(y) + + assert b.metas.workspace == './' + assert b.metas.name == 'my-mwu-encoder' + +def test_default_args_from_python(): + b = BaseExecutor() -def 
test_set_dummy_meta(): - dummy = 325 - metas = get_default_metas() - metas['dummy'] = dummy - executor = BaseExecutor(metas=metas) - assert executor.dummy == dummy + assert isinstance(b.runtime_args, SimpleNamespace) + assert isinstance(b.metas, SimpleNamespace) + # name is always auto-assigned + assert b.metas.name diff --git a/tests/unit/executors/test_set_requests.py b/tests/unit/executors/test_set_requests.py deleted file mode 100644 index d2b84811c39f8..0000000000000 --- a/tests/unit/executors/test_set_requests.py +++ /dev/null @@ -1,141 +0,0 @@ -from jina.drivers.delete import DeleteDriver -from jina.drivers.encode import EncodeDriver -from jina.drivers.querylang.filter import FilterQL -from jina.executors import BaseExecutor - -y_no_fill = """ -!BaseEncoder -requests: - use_default: false -""" - - -def test_no_fill(): - be = BaseExecutor.load_config(y_no_fill) - assert not be._drivers - - -y_no_fill_with_index_request = """ -!BaseEncoder -requests: - use_default: false - on: - IndexRequest: - - !RouteDriver {} -""" - - -def test_no_fill_with_index_request(): - be = BaseExecutor.load_config(y_no_fill_with_index_request) - assert len(be._drivers) == 2 - assert 'IndexRequest' in be._drivers - assert 'ControlRequest' in be._drivers - - -y_fill_default_with_index_request = """ -!BaseEncoder -requests: - use_default: true - on: - IndexRequest: - - !EncodeDriver {} -""" - - -def test_fill_default_with_index_request(): - be = BaseExecutor.load_config(y_fill_default_with_index_request) - assert len(be._drivers) == 6 - assert isinstance(be._drivers['IndexRequest'][0], EncodeDriver) - print(be._drivers['IndexRequest'][0]._init_kwargs_dict) - - -y_fill_default_with_index_request_no_with = """ -!BaseEncoder -requests: - use_default: true - on: - IndexRequest: - drivers: - - !FilterQL - with: - lookups: - mime_type: image/jpeg - - !EncodeDriver {} -""" - - -def test_with_common_kwargs_on_index_no_with(): - be = 
BaseExecutor.load_config(y_fill_default_with_index_request_no_with) - assert len(be._drivers) == 6 - assert isinstance(be._drivers['IndexRequest'][1], EncodeDriver) - assert isinstance(be._drivers['IndexRequest'][0], FilterQL) - - -y_fill_default_with_index_request_with_common = """ -!BaseEncoder -requests: - use_default: true - on: - IndexRequest: - with: - traversal_paths: ['mmm'] - drivers: - - !FilterQL - with: - lookups: - mime_type: image/jpeg - - !EncodeDriver {} -""" - - -def test_with_common_kwargs_on_index(): - be = BaseExecutor.load_config(y_fill_default_with_index_request_with_common) - assert len(be._drivers) == 6 - assert isinstance(be._drivers['IndexRequest'][1], EncodeDriver) - assert isinstance(be._drivers['IndexRequest'][0], FilterQL) - assert be._drivers['IndexRequest'][0]._traversal_paths == ['mmm'] - assert be._drivers['IndexRequest'][1]._traversal_paths == ['mmm'] - - -y_fill_default_with_two_request_with_common = """ -!BaseEncoder -requests: - use_default: true - on: - [IndexRequest, SearchRequest]: - with: - traversal_paths: ['mmm'] - drivers: - - !FilterQL - with: - lookups: - mime_type: image/jpeg - - !EncodeDriver {} - [DeleteRequest]: - with: - traversal_paths: ['ccc'] - drivers: - - !FilterQL - with: - lookups: - mime_type: image/jpeg - - !DeleteDriver {} -""" - - -def test_with_common_kwargs_on_two_requests(): - be = BaseExecutor.load_config(y_fill_default_with_two_request_with_common) - assert len(be._drivers) == 6 - - for r in ('IndexRequest', 'SearchRequest', 'DeleteRequest'): - if r == 'DeleteRequest': - assert isinstance(be._drivers[r][1], DeleteDriver) - else: - assert isinstance(be._drivers[r][1], EncodeDriver) - assert isinstance(be._drivers[r][0], FilterQL) - if r == 'DeleteRequest': - assert be._drivers[r][0]._traversal_paths == ['ccc'] - assert be._drivers[r][1]._traversal_paths == ['ccc'] - else: - assert be._drivers[r][0]._traversal_paths == ['mmm'] - assert be._drivers[r][1]._traversal_paths == ['mmm'] diff --git 
a/tests/unit/executors/yaml/example_1.yml b/tests/unit/executors/yaml/example_1.yml deleted file mode 100644 index 803b8b5660188..0000000000000 --- a/tests/unit/executors/yaml/example_1.yml +++ /dev/null @@ -1,30 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: images.gz -metas: - name: images_kv_idx # a customized name - workspace: $JINA_TEST_LOAD_FROM_DUMP_WORKSPACE -requests: - on: - IndexRequest: - - !FilterQL - with: - lookups: {'modality': 'image'} - - !Buffer2URI - with: - base64: True - - !ExcludeQL - with: - fields: - - chunks - - embedding - - buffer - - !KVIndexDriver - with: - executor: images_kv_idx - SearchRequest: - - !KVSearchDriver - with: - executor: images_kv_idx - traversal_paths: ['m'] - is_update: True \ No newline at end of file diff --git a/tests/unit/executors/yaml/example_2.yml b/tests/unit/executors/yaml/example_2.yml deleted file mode 100644 index 800c8a1c96ea1..0000000000000 --- a/tests/unit/executors/yaml/example_2.yml +++ /dev/null @@ -1,30 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: images.gz -metas: - name: images_kv_idx # a customized name - workspace: $JINA_TEST_LOAD_FROM_DUMP_WORKSPACE -requests: - on: - IndexRequest: - - !FilterQL - with: - lookups: {'modality': 'image'} - - !Buffer2URI - with: - base64: True - - !ExcludeQL - with: - fields: - - chunks - - embedding - - buffer - - !KVIndexDriver - with: - executor: images_kv_idx - SearchRequest: - - !KVSearchDriver - with: - executor: images_kv_idx - traversal_paths: ['m'] - is_update: False \ No newline at end of file diff --git a/tests/unit/executors/yaml/npvec.yml b/tests/unit/executors/yaml/npvec.yml deleted file mode 100644 index 2612a282d3ce4..0000000000000 --- a/tests/unit/executors/yaml/npvec.yml +++ /dev/null @@ -1,30 +0,0 @@ -!CompoundExecutor -components: - - !NumpyIndexer - with: - metric: euclidean - num_dim: -1 - index_key: HNSW32 - index_filename: vec.idx - metas: - name: my_vec_indexer # a customized name - workspace: $JINA_TEST_COMPOUND_FROM_YAML - - 
!BinaryPbIndexer - with: - index_filename: chunk.gzip - metas: - name: chunk_meta_indexer - workspace: $JINA_TEST_COMPOUND_FROM_YAML -with: - routes: - meta_add: - chunk_meta_indexer: add - meta_query: - chunk_meta_indexer: query - query: - my_vec_indexer: query - add: - my_vec_indexer: add -metas: - name: chunk_compound_indexer - workspace: 'test-workspace' \ No newline at end of file diff --git a/tests/unit/executors/yaml/route.yml b/tests/unit/executors/yaml/route.yml deleted file mode 100644 index 9eb93798a691a..0000000000000 --- a/tests/unit/executors/yaml/route.yml +++ /dev/null @@ -1,17 +0,0 @@ -!BaseExecutor -with: {} -metas: - name: route -requests: - on: -# [SearchRequest, TrainRequest, IndexRequest, ControlRequest]: -# - !RouteDriver {} - ControlRequest: - - !RouteDriver {} - - !ControlReqDriver {} - SearchRequest: - - !RouteDriver {} - IndexRequest: - - !RouteDriver {} - TrainRequest: - - !RouteDriver {} \ No newline at end of file diff --git a/tests/unit/flow/test_asyncflow.py b/tests/unit/flow/test_asyncflow.py index 9d7b9c9e97440..d1bceedb99486 100644 --- a/tests/unit/flow/test_asyncflow.py +++ b/tests/unit/flow/test_asyncflow.py @@ -1,4 +1,5 @@ import asyncio +import time import numpy as np import pytest @@ -7,7 +8,6 @@ from jina.flow.asyncio import AsyncFlow from jina.logging.profile import TimeContext from jina.types.request import Response - from tests import validate_callback num_docs = 5 @@ -31,7 +31,7 @@ def documents(start_index, end_index): with Document() as chunk: chunk.text = 'text in chunk' chunk.tags['id'] = 'id in chunk tags' - doc.chunks.add(chunk) + doc.chunks.append(chunk) yield doc @@ -40,14 +40,16 @@ def documents(start_index, end_index): async def test_run_async_flow(restful, mocker): r_val = mocker.Mock() with AsyncFlow(restful=restful).add() as f: - async for r in f.index_ndarray(np.random.random([num_docs, 4]), on_done=r_val): + async for r in f.index( + Document.from_ndarray(np.random.random([num_docs, 4])), on_done=r_val + 
): assert isinstance(r, Response) validate_callback(r_val, validate) async def async_input_function(): for _ in range(num_docs): - yield np.random.random([4]) + yield Document(content=np.random.random([4])) await asyncio.sleep(0.1) @@ -78,9 +80,17 @@ async def test_run_async_flow_async_input(restful, inputs, mocker): async def run_async_flow_5s(restful): # WaitDriver pause 5s makes total roundtrip ~5s - with AsyncFlow(restful=restful).add(uses='- !WaitDriver {}') as f: - async for r in f.index_ndarray( - np.random.random([num_docs, 4]), on_done=validate + from jina import Executor, requests + + class Wait5s(Executor): + @requests + def foo(self, **kwargs): + print('im called!') + time.sleep(5) + + with AsyncFlow(restful=restful).add(uses=Wait5s) as f: + async for r in f.index( + Document.from_ndarray(np.random.random([num_docs, 4])), on_done=validate ): assert isinstance(r, Response) @@ -127,7 +137,7 @@ async def test_run_async_flow_other_task_concurrent(restful): @pytest.mark.parametrize('restful', [False]) async def test_return_results_async_flow(return_results, restful): with AsyncFlow(restful=restful, return_results=return_results).add() as f: - async for r in f.index_ndarray(np.random.random([10, 2])): + async for r in f.index(Document.from_ndarray(np.random.random([10, 2]))): assert isinstance(r, Response) @@ -139,3 +149,18 @@ async def test_return_results_async_flow_crud(return_results, restful, flow_api) with AsyncFlow(restful=restful, return_results=return_results).add() as f: async for r in getattr(f, flow_api)(documents(0, 10)): assert isinstance(r, Response) + + +@pytest.mark.asyncio +async def test_async_flow_empty_data(): + + from jina import Executor, requests + + class MyExec(Executor): + @requests + def foo(self, parameters, **kwargs): + assert parameters['hello'] == 'world' + + with AsyncFlow().add(uses=MyExec) as f: + async for r in f.post('/hello', parameters={'hello': 'world'}): + assert isinstance(r, Response) diff --git 
a/tests/unit/flow/test_flow.py b/tests/unit/flow/test_flow.py index 710ad57ef775f..b9c843e342fa7 100644 --- a/tests/unit/flow/test_flow.py +++ b/tests/unit/flow/test_flow.py @@ -9,7 +9,6 @@ from jina.executors import BaseExecutor from jina.helper import random_identity from jina.peapods.pods import BasePod -from jina.proto.jina_pb2 import DocumentProto from jina.types.request import Response from tests import random_docs, validate_callback @@ -180,16 +179,6 @@ def docpb_workspace(tmpdir): del os.environ['TEST_DOCSHARD_WORKSPACE'] -def test_shards(docpb_workspace): - f = Flow().add( - name='doc_pb', uses=os.path.join(cur_dir, '../yaml/test-docpb.yml'), parallel=3 - ) - with f: - f.index(inputs=random_docs(1000), random_doc_id=False) - with f: - pass - - def test_py_client(): f = ( Flow() @@ -305,8 +294,8 @@ def test_dry_run_with_two_pathways_diverging_at_non_gateway(): def test_refactor_num_part(): f = ( Flow() - .add(name='r1', uses='_logforward', needs='gateway') - .add(name='r2', uses='_logforward', needs='gateway') + .add(name='r1', needs='gateway') + .add(name='r2', needs='gateway') .join(['r1', 'r2']) ) @@ -331,9 +320,9 @@ def test_refactor_num_part(): def test_refactor_num_part_proxy(): f = ( Flow() - .add(name='r1', uses='_logforward') - .add(name='r2', uses='_logforward', needs='r1') - .add(name='r3', uses='_logforward', needs='r1') + .add(name='r1') + .add(name='r2', needs='r1') + .add(name='r3', needs='r1') .join(['r2', 'r3']) ) @@ -363,31 +352,27 @@ def test_refactor_num_part_proxy(): def test_refactor_num_part_proxy_2(restful): f = ( Flow(restful=restful) - .add(name='r1', uses='_logforward') - .add(name='r2', uses='_logforward', needs='r1', parallel=2) - .add(name='r3', uses='_logforward', needs='r1', parallel=3, polling='ALL') + .add(name='r1') + .add(name='r2', needs='r1', parallel=2) + .add(name='r3', needs='r1', parallel=3, polling='ALL') .needs(['r2', 'r3']) ) with f: - f.index(['abbcs', 'efgh']) + f.index([Document(text='abbcs'), 
Document(text='efgh')]) @pytest.mark.parametrize('restful', [False, True]) def test_refactor_num_part_2(restful): - f = Flow(restful=restful).add( - name='r1', uses='_logforward', needs='gateway', parallel=3, polling='ALL' - ) + f = Flow(restful=restful).add(name='r1', needs='gateway', parallel=3, polling='ALL') with f: - f.index(['abbcs', 'efgh']) + f.index([Document(text='abbcs'), Document(text='efgh')]) - f = Flow(restful=restful).add( - name='r1', uses='_logforward', needs='gateway', parallel=3 - ) + f = Flow(restful=restful).add(name='r1', needs='gateway', parallel=3) with f: - f.index(['abbcs', 'efgh']) + f.index([Document(text='abbcs'), Document(text='efgh')]) @pytest.fixture() @@ -397,28 +382,17 @@ def datauri_workspace(tmpdir): del os.environ['TEST_DATAURIINDEX_WORKSPACE'] -@pytest.mark.parametrize('restful', [False, True]) -def test_index_text_files(mocker, restful, datauri_workspace): - def validate(req): - assert len(req.docs) > 0 - for d in req.docs: - assert d.mime_type == 'text/plain' - - response_mock = mocker.Mock() - - f = Flow(restful=restful, read_only=True).add( - uses=os.path.join(cur_dir, '../yaml/datauriindex.yml'), timeout_ready=-1 - ) - files = os.path.join(cur_dir, 'yaml/*.yml') - with f: - f.index_files(files, on_done=response_mock) - - validate_callback(response_mock, validate) - - # TODO(Deepankar): Gets stuck when `restful: True` - issues with `needs='gateway'` @pytest.mark.parametrize('restful', [False]) def test_flow_with_publish_driver(mocker, restful): + from jina import Executor, requests + + class DummyOneHotTextEncoder(Executor): + @requests + def foo(self, docs, **kwargs): + for d in docs: + d.embedding = np.array([1, 2, 3]) + def validate(req): for d in req.docs: assert d.embedding is not None @@ -427,45 +401,15 @@ def validate(req): f = ( Flow(restful=restful) - .add(name='r2', uses='!DummyOneHotTextEncoder') - .add(name='r3', uses='!DummyOneHotTextEncoder', needs='gateway') + .add(name='r2', uses=DummyOneHotTextEncoder) + 
.add(name='r3', uses=DummyOneHotTextEncoder, needs='gateway') .join(needs=['r2', 'r3']) ) with f: - f.index(['text_1', 'text_2'], on_done=response_mock) - - validate_callback(response_mock, validate) - - -@pytest.mark.parametrize('restful', [False, True]) -def test_flow_with_modalitys_simple(mocker, restful): - def validate(req): - for d in req.index.docs: - assert d.modality in ['mode1', 'mode2'] - - def input_function(): - doc1 = DocumentProto() - doc1.modality = 'mode1' - doc2 = DocumentProto() - doc2.modality = 'mode2' - doc3 = DocumentProto() - doc3.modality = 'mode1' - return [doc1, doc2, doc3] - - response_mock = mocker.Mock() - - flow = ( - Flow(restful=restful) - .add(name='chunk_seg', parallel=3) - .add( - name='encoder12', - parallel=2, - uses='- !FilterQL | {lookups: {modality__in: [mode1, mode2]}, traversal_paths: [c]}', + f.index( + [Document(text='text_1'), Document(text='text_2')], on_done=response_mock ) - ) - with flow: - flow.index(inputs=input_function, on_done=response_mock) validate_callback(response_mock, validate) @@ -494,7 +438,7 @@ def test_flow_arbitrary_needs(restful): ) with f: - f.index(['abc', 'def']) + f.index([Document(text='abbcs'), Document(text='efgh')]) @pytest.mark.parametrize('restful', [False]) @@ -513,7 +457,7 @@ def test_flow_needs_all(restful): assert f._pod_nodes['r2'].needs == {'p3', 'r1'} with f: - f.index_ndarray(np.random.random([10, 10])) + f.index(Document.from_ndarray(np.random.random([10, 10]))) f = ( Flow(restful=restful) @@ -528,7 +472,7 @@ def test_flow_needs_all(restful): assert f._pod_nodes['p4'].needs == {'r2'} with f: - f.index_ndarray(np.random.random([10, 10])) + f.index(Document.from_ndarray(np.random.random([10, 10]))) def test_flow_with_pod_envs(): @@ -564,7 +508,7 @@ def __init__(self, *args, **kwargs): @pytest.mark.parametrize('restful', [False, True]) def test_return_results_sync_flow(return_results, restful): with Flow(restful=restful, return_results=return_results).add() as f: - r = 
f.index_ndarray(np.random.random([10, 2])) + r = f.index(Document.from_ndarray(np.random.random([10, 2]))) if return_results: assert isinstance(r, list) assert isinstance(r[0], Response) @@ -632,8 +576,8 @@ def test_flow_identity_override(): !Flow version: '1.0' pods: - - uses: _pass - - uses: _pass + - name: hello + - name: world parallel: 3 ''' @@ -734,20 +678,6 @@ def test_flow_get_item(): assert isinstance(f1['pod0'], BasePod) -def test_flow_yaml_dump(): - import io - - f = io.StringIO() - f1 = Flow().add() - with f1: - f1.to_swarm_yaml(path=f) - assert 'gateway' in f.getvalue() - assert 'services' in f.getvalue() - assert 'jina pod' in f.getvalue() - - assert '!Flow' in f1.yaml_spec - - def test_flow_add_class(): class CustomizedExecutor(BaseExecutor): pass @@ -759,9 +689,9 @@ class CustomizedExecutor(BaseExecutor): def test_flow_allinone_yaml(): - from jina import Encoder + from jina import Executor - class CustomizedEncoder(Encoder): + class CustomizedEncoder(Executor): pass f = Flow.load_config(os.path.join(cur_dir, 'yaml/flow-allinone.yml')) @@ -771,3 +701,21 @@ class CustomizedEncoder(Encoder): f = Flow.load_config(os.path.join(cur_dir, 'yaml/flow-allinone-oldstyle.yml')) with f: pass + + +def test_flow_empty_data_request(mocker): + from jina import Executor, requests + + class MyExec(Executor): + @requests + def foo(self, parameters, **kwargs): + assert parameters['hello'] == 'world' + + f = Flow().add(uses=MyExec) + + mock = mocker.Mock() + + with f: + f.post('/hello', parameters={'hello': 'world'}, on_done=mock) + + mock.assert_called() diff --git a/tests/unit/flow/test_flow_before_after.py b/tests/unit/flow/test_flow_before_after.py index 4c4d5a2a84eb6..04d8271dde3ca 100644 --- a/tests/unit/flow/test_flow_before_after.py +++ b/tests/unit/flow/test_flow_before_after.py @@ -1,9 +1,16 @@ import pytest +from jina import Executor, requests, __default_executor__ from jina.flow import Flow from tests import random_docs +class MyExec(Executor): + @requests + 
def foo(self, **kwargs): + pass + + @pytest.mark.parametrize('restful', [False, True]) def test_flow(restful): docs = random_docs(10) @@ -19,7 +26,7 @@ def test_flow(restful): @pytest.mark.parametrize('restful', [False, True]) def test_flow_before(restful): docs = random_docs(10) - f = Flow(restful=restful).add(uses_before='_logforward', name='p1') + f = Flow(restful=restful).add(uses_before=MyExec, name='p1') with f: f.index(docs) @@ -31,7 +38,7 @@ def test_flow_before(restful): @pytest.mark.parametrize('restful', [False, True]) def test_flow_after(restful): docs = random_docs(10) - f = Flow(restful=restful).add(uses_after='_logforward', name='p1') + f = Flow(restful=restful).add(uses_after=MyExec, name='p1') with f: f.index(docs) @@ -41,12 +48,24 @@ def test_flow_after(restful): @pytest.mark.parametrize('restful', [False, True]) -def test_flow_before_after(restful): +def test_flow_default_before_after_is_ignored(restful): docs = random_docs(10) f = Flow(restful=restful).add( - uses_before='_logforward', uses_after='_pass', name='p1' + uses_after=__default_executor__, uses_before=__default_executor__, name='p1' ) + with f: + f.index(docs) + assert f.num_pods == 2 + assert f._pod_nodes['p1'].num_peas == 1 + assert f.num_peas == 2 + + +@pytest.mark.parametrize('restful', [False, True]) +def test_flow_before_after(restful): + docs = random_docs(10) + f = Flow(restful=restful).add(uses_before=MyExec, uses_after=MyExec, name='p1') + with f: f.index(docs) assert f.num_pods == 2 diff --git a/tests/unit/flow/test_flow_except.py b/tests/unit/flow/test_flow_except.py index f4d100bb441ce..cd617b6e26c57 100644 --- a/tests/unit/flow/test_flow_except.py +++ b/tests/unit/flow/test_flow_except.py @@ -1,18 +1,14 @@ -import pytest - import numpy as np +import pytest -from jina.executors.decorators import single -from jina.executors.crafters import BaseCrafter -from jina.flow import Flow +from jina import Flow, Executor, requests, Document from jina.proto import jina_pb2 - from 
tests import validate_callback -class DummyCrafterExcept(BaseCrafter): - @single - def craft(self, text, *args, **kwargs): +class DummyCrafterExcept(Executor): + @requests + def craft(self, *args, **kwargs): return 1 / 0 @@ -25,19 +21,27 @@ def validate(req): assert req.status.code == jina_pb2.StatusProto.ERROR assert bad_routes[0].pod == 'r1/ZEDRuntime' + from jina import Executor, requests + + class BadExecutor(Executor): + + @requests + def foo(self, **kwargs): + raise NotImplementedError + f = ( Flow(restful=restful) - .add(name='r1', uses='!BaseCrafter') - .add(name='r2', uses='!BaseEncoder') - .add(name='r3', uses='!BaseEncoder') + .add(name='r1', uses=BadExecutor) + .add(name='r2') + .add(name='r3') ) on_error_mock = mocker.Mock() - # always test two times, make sure the flow still works after it fails on the first + # always test two times, make sure the flow still works after it fails on the first with f: - f.index(['abbcs', 'efgh'], on_error=on_error_mock) - f.index(['abbcs', 'efgh'], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) validate_callback(on_error_mock, validate) @@ -54,9 +58,9 @@ def validate(req): f = ( Flow(restful=restful) - .add(name='r1') - .add(name='r2', uses='!DummyCrafterExcept') - .add(name='r3', uses='!BaseEncoder') + .add(name='r1') + .add(name='r2', uses='!DummyCrafterExcept') + .add(name='r3', uses='!BaseExecutor') ) with f: @@ -66,14 +70,22 @@ def validate(req): # always test two times, make sure the flow still works after it fails on the first with f: - f.index(['abbcs', 'efgh'], on_error=on_error_mock) - f.index(['abbcs', 'efgh'], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) validate_callback(on_error_mock, validate) 
@pytest.mark.parametrize('restful', [False, True]) def test_except_with_parallel(mocker, restful): + from jina import Executor, Flow, requests + + class MyExecutor(Executor): + + @requests + def foo(self, **kwargs): + raise NotImplementedError + def validate(req): assert req.status.code == jina_pb2.StatusProto.ERROR err_routes = [ @@ -81,15 +93,15 @@ def validate(req): ] assert len(err_routes) == 2 assert err_routes[0].exception.executor == 'DummyCrafterExcept' - assert err_routes[1].exception.executor == 'BaseEncoder' + assert err_routes[1].exception.executor == 'MyExecutor' assert err_routes[0].exception.name == 'ZeroDivisionError' assert err_routes[1].exception.name == 'NotImplementedError' f = ( Flow(restful=restful) - .add(name='r1') - .add(name='r2', uses='!DummyCrafterExcept', parallel=3) - .add(name='r3', uses='!BaseEncoder') + .add(name='r1') + .add(name='r2', uses=DummyCrafterExcept, parallel=3) + .add(name='r3', uses=MyExecutor) ) with f: @@ -99,8 +111,8 @@ def validate(req): # always test two times, make sure the flow still works after it fails on the first with f: - f.index(['abbcs', 'efgh'], on_error=on_error_mock) - f.index(['abbcs', 'efgh'], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) validate_callback(on_error_mock, validate) @@ -110,18 +122,24 @@ def test_on_error_callback(mocker, restful): def validate1(): raise NotImplementedError + class MyExecutor(Executor): + + @requests + def foo(self, **kwargs): + raise NotImplementedError + def validate2(x, *args): x = x.routes assert len(x) == 4 # gateway, r1, r3, gateway badones = [r for r in x if r.status.code == jina_pb2.StatusProto.ERROR] assert badones[0].pod == 'r3/ZEDRuntime' - f = Flow(restful=restful).add(name='r1').add(name='r3', uses='!BaseEncoder') + f = Flow(restful=restful).add(name='r1').add(name='r3', uses=MyExecutor) on_error_mock = 
mocker.Mock() with f: - f.index(['abbcs', 'efgh'], on_done=validate1, on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_done=validate1, on_error=on_error_mock) validate_callback(on_error_mock, validate2) @@ -140,7 +158,7 @@ def validate1(x, *args): on_error_mock = mocker.Mock() with f: - f.index(['abbcs', 'efgh'], on_done=response_mock, on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_done=response_mock, on_error=on_error_mock) validate_callback(response_mock, validate1) on_error_mock.assert_not_called() @@ -161,7 +179,7 @@ def f3(*args): hit.append('always') with f: - f.index(np.random.random([10, 10]), on_done=f1, on_error=f2, on_always=f3) + f.index(Document.from_ndarray(np.random.random([10, 10])), on_done=f1, on_error=f2, on_always=f3) assert hit == ['done', 'always'] @@ -170,9 +188,9 @@ def f3(*args): @pytest.mark.parametrize('restful', [False, True]) def test_flow_on_error_callback(restful): - class DummyCrafterNotImplemented(BaseCrafter): - @single - def craft(self, blob, *args, **kwargs): + class DummyCrafterNotImplemented(Executor): + @requests + def craft(self, text, *args, **kwargs): raise NotImplementedError f = Flow(restful=restful).add(uses='!DummyCrafterNotImplemented') @@ -188,7 +206,7 @@ def f3(*args): hit.append('always') with f: - f.index(np.random.random([10, 10]), on_done=f1, on_error=f2, on_always=f3) + f.index(Document.from_ndarray(np.random.random([10, 10])), on_done=f1, on_error=f2, on_always=f3) assert hit == ['error', 'always'] diff --git a/tests/unit/flow/test_flow_interval_dump_load.py b/tests/unit/flow/test_flow_interval_dump_load.py deleted file mode 100644 index 320ea885e84f4..0000000000000 --- a/tests/unit/flow/test_flow_interval_dump_load.py +++ /dev/null @@ -1,66 +0,0 @@ -import multiprocessing -import os -import time - -import numpy as np -import pytest - -from jina import Document, Flow - - -@pytest.fixture -def config(tmpdir): - 
os.environ['JINA_FLOW_DUMP_LOAD_INTERVAL_WORKSPACE'] = str(tmpdir) - yield - del os.environ['JINA_FLOW_DUMP_LOAD_INTERVAL_WORKSPACE'] - - -@pytest.fixture(scope='function') -def flow_with_dump_interval(): - return Flow().add(uses='_index', dump_interval=1) - - -@pytest.fixture(scope='function') -def flow_with_load_interval(): - return Flow().add(uses='_index', load_interval=1) - - -def test_dump_load_interval(config, flow_with_dump_interval, flow_with_load_interval): - """Run index and search in parallel, we should observe number of documents while searching - keep increasing. - We expect while indexing and quering, we should get a new `num_matches` for each run. - """ - num_matches = set() - - def input_fn(): - for idx in range(10): - time.sleep(1) - yield Document(embedding=np.array([1, 2, 3]), tags={'idx': idx}) - - def print_req(req, j): - print(f'{j}-time got {len(req.docs[0].matches)} results') - num_matches.add(len(req.docs[0].matches)) - - def index_flow_with_dump_interval(): - with flow_with_dump_interval as f: - f.index(input_fn, request_size=1) - - def search_flow_with_load_interval(): - with flow_with_load_interval as f: - for j in range(10): - f.search( - Document(embedding=np.array([1, 2, 3])), - request_size=1, - on_done=lambda x: print_req(x, j), - top_k=999, - ) - time.sleep(1) - - # run dump interval flow - t = multiprocessing.Process(target=index_flow_with_dump_interval) - t.start() - time.sleep(1) - # run load interval flow - search_flow_with_load_interval() - # verify num_matches has different values since we're querying while indexing - assert len(num_matches) > 1 diff --git a/tests/unit/flow/test_flow_merge.py b/tests/unit/flow/test_flow_merge.py index f90c6b4a1d07f..9b63e8ab4c431 100644 --- a/tests/unit/flow/test_flow_merge.py +++ b/tests/unit/flow/test_flow_merge.py @@ -2,28 +2,33 @@ import pytest -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import single -from jina.flow import Flow +from jina import 
Flow, Executor, requests, Document from tests import random_docs, validate_callback -class DummySegment(BaseSegmenter): +class DummySegment(Executor): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) self._label = random.random() - @single - def segment(self, id, *args, **kwargs): - return [ - dict(buffer=f'aa{self._label}'.encode()), - dict(buffer=f'bb{self._label}'.encode()), - ] + @requests + def segment(self, docs, *args, **kwargs): + for d in docs: + d.chunks = [ + Document(buffer=f'aa{self._label}'.encode()), + Document(buffer=f'bb{self._label}'.encode()), + ] + + +class Merger(Executor): + + @requests + def merge(self, docs, **kwargs): + return docs def validate(req): - chunk_ids = [c.id for d in req.index.docs for c in d.chunks] - assert len(chunk_ids) == len(set(chunk_ids)) + chunk_ids = [c.id for d in req.docs for c in d.chunks] assert len(chunk_ids) == 80 @@ -35,13 +40,13 @@ def validate(req): def test_this_will_fail(mocker, restful): f = ( Flow(restful=restful) - .add(name='a11', uses='DummySegment') - .add(name='a12', uses='DummySegment', needs='gateway') - .add(name='r1', uses='_merge_chunks', needs=['a11', 'a12']) - .add(name='a21', uses='DummySegment', needs='gateway') - .add(name='a22', uses='DummySegment', needs='gateway') - .add(name='r2', uses='_merge_chunks', needs=['a21', 'a22']) - .add(uses='_merge_chunks', needs=['r1', 'r2']) + .add(name='a11', uses='DummySegment') + .add(name='a12', uses='DummySegment', needs='gateway') + .add(name='r1', needs=['a11', 'a12']) + .add(name='a21', uses='DummySegment', needs='gateway') + .add(name='a22', uses='DummySegment', needs='gateway') + .add(name='r2', needs=['a21', 'a22']) + .add(needs=['r1', 'r2']) ) response_mock = mocker.Mock() @@ -58,15 +63,15 @@ def test_this_will_fail(mocker, restful): def test_this_should_work(mocker, restful): f = ( Flow(restful=restful) - .add(name='a1') - .add(name='a11', uses='DummySegment', needs='a1') - .add(name='a12', uses='DummySegment', needs='a1') 
- .add(name='r1', uses='_merge_chunks', needs=['a11', 'a12']) - .add(name='a2', needs='gateway') - .add(name='a21', uses='DummySegment', needs='a2') - .add(name='a22', uses='DummySegment', needs='a2') - .add(name='r2', uses='_merge_chunks', needs=['a21', 'a22']) - .add(uses='_merge_chunks', needs=['r1', 'r2']) + .add(name='a1') + .add(name='a11', uses='DummySegment', needs='a1') + .add(name='a12', uses='DummySegment', needs='a1') + .add(name='r1', uses=Merger, needs=['a11', 'a12']) + .add(name='a2', needs='gateway') + .add(name='a21', uses='DummySegment', needs='a2') + .add(name='a22', uses='DummySegment', needs='a2') + .add(name='r2', uses=Merger, needs=['a21', 'a22']) + .add(uses=Merger, needs=['r1', 'r2']) ) response_mock = mocker.Mock() diff --git a/tests/unit/flow/test_flow_multimode.py b/tests/unit/flow/test_flow_multimode.py deleted file mode 100644 index ff3154f299555..0000000000000 --- a/tests/unit/flow/test_flow_multimode.py +++ /dev/null @@ -1,114 +0,0 @@ -import os -from typing import List, Dict - -import pytest -import numpy as np - -from jina.executors.segmenters import BaseSegmenter -from jina.executors.encoders import BaseEncoder -from jina.executors.indexers.keyvalue import BinaryPbIndexer -from jina.executors.decorators import single -from jina.flow import Flow -from jina.proto import jina_pb2 - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -class MockSegmenter(BaseSegmenter): - @single - def segment(self, text: str, *args, **kwargs) -> List[Dict]: - split = text.split(',') - chunks = [ - dict(text=split[0], offset=0, weight=1.0, modality='mode1'), - dict(text=split[1], offset=1, weight=1.0, modality='mode2'), - ] - return chunks - - -class MockEncoder(BaseEncoder): - def encode(self, content: 'np.ndarray', *args, **kwargs) -> 'np.ndarray': - output = [] - for r in content: - if "mode1" in r: - output.append([0.0, 0.0, 0.0]) - elif "mode2" in r: - output.append([1.0, 1.0, 1.0]) - - return np.array(output) - - 
-@pytest.mark.parametrize('restful', [False, True]) -def test_flow_with_modalities(tmpdir, restful): - os.environ['JINA_TEST_FLOW_MULTIMODE_WORKSPACE'] = str(tmpdir) - - def input_function(): - doc1 = jina_pb2.DocumentProto() - doc1.text = 'title: this is mode1 from doc1, body: this is mode2 from doc1' - doc1.id = '1' - - doc2 = jina_pb2.DocumentProto() - doc2.text = 'title: this is mode1 from doc2, body: this is mode2 from doc2' - doc2.id = '2' - - doc3 = jina_pb2.DocumentProto() - doc3.text = 'title: this is mode1 from doc3, body: this is mode2 from doc3' - doc3.id = '3' - - return [doc1, doc2, doc3] - - flow = ( - Flow(restful=restful) - .add(name='segmenter', uses='!MockSegmenter') - .add(name='encoder1', uses=os.path.join(cur_dir, 'yaml/mockencoder-mode1.yml')) - .add( - name='indexer1', - uses=os.path.join(cur_dir, 'yaml/numpy-indexer-1.yml'), - needs=['encoder1'], - ) - .add( - name='encoder2', - uses=os.path.join(cur_dir, 'yaml/mockencoder-mode2.yml'), - needs=['segmenter'], - ) - .add(name='indexer2', uses=os.path.join(cur_dir, 'yaml/numpy-indexer-2.yml')) - .join(['indexer1', 'indexer2']) - ) - - with flow: - flow.index(inputs=input_function) - - with open(os.path.join(tmpdir, 'compound', 'vecidx1-0', 'vec1.gz'), 'rb') as fp: - result = np.frombuffer(fp.read(), dtype='float').reshape([-1, 3]) - np.testing.assert_equal( - result, np.array([[0.0, 0.0, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]) - ) - - with open(os.path.join(tmpdir, 'compound', 'vecidx2-0', 'vec2.gz'), 'rb') as fp: - result = np.frombuffer(fp.read(), dtype='float').reshape([-1, 3]) - np.testing.assert_equal( - result, np.array([[1.0, 1.0, 1.0], [1.0, 1.0, 1.0], [1.0, 1.0, 1.0]]) - ) - - chunkIndexer1 = BinaryPbIndexer.load( - os.path.join(tmpdir, 'compound', 'kvidx1-0', 'kvidx1.bin') - ) - assert chunkIndexer1.size == 3 - d_id = list(chunkIndexer1.query_handler.header.keys())[0] - - query_doc = jina_pb2.DocumentProto() - query_doc.ParseFromString(chunkIndexer1.query([d_id])[0]) - assert 
query_doc.text == 'title: this is mode1 from doc1' - assert query_doc.modality == 'mode1' - - chunkIndexer2 = BinaryPbIndexer.load( - os.path.join(tmpdir, 'compound', 'kvidx2-0', 'kvidx2.bin') - ) - assert chunkIndexer2.size == 3 - d_id = list(chunkIndexer2.query_handler.header.keys())[0] - - query_doc = jina_pb2.DocumentProto() - query_doc.ParseFromString(chunkIndexer2.query([d_id])[0]) - assert query_doc.text == ' body: this is mode2 from doc1' - assert query_doc.modality == 'mode2' - - del os.environ['JINA_TEST_FLOW_MULTIMODE_WORKSPACE'] diff --git a/tests/unit/flow/test_flow_reload.py b/tests/unit/flow/test_flow_reload.py deleted file mode 100644 index 287868cc7d75c..0000000000000 --- a/tests/unit/flow/test_flow_reload.py +++ /dev/null @@ -1,90 +0,0 @@ -import os - -import pytest -import requests - -from jina import Flow -from jina.executors import BaseExecutor - - -def test_flow_rest_reload(): - f = Flow().add() - f.use_rest_gateway() - with f: - r = requests.post( - f'http://0.0.0.0:{f.port_expose}/reload', json={'targets': ['pod0']} - ) - assert r.status_code == 200 - - -@pytest.mark.skipif( - 'GITHUB_WORKFLOW' in os.environ, - reason='skip the test on github as it will hang the whole CI, locally is fine', -) -def test_flow_simple_reload(mocker): - mock = mocker.Mock() - - class DummyExecutor(BaseExecutor): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - mock() - - f = Flow().add(name='mypod', uses=DummyExecutor, runtime_backend='thread') - - with f: - pass - - assert mock.call_count == 1 - - mock.reset_mock() - with f: - f.reload(targets='mypod') - - assert mock.call_count == 2 - - -@pytest.mark.skipif( - 'GITHUB_WORKFLOW' in os.environ, - reason='skip the test on github as it will hang the whole CI, locally is fine', -) -def test_flow_topology_multi_reload(mocker): - mock1 = mocker.Mock() - mock2 = mocker.Mock() - - class DummyExecutor1(BaseExecutor): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - 
mock1() - - class DummyExecutor2(BaseExecutor): - def __init__(self, *args, **kwargs): - super().__init__(*args, **kwargs) - mock2() - - f = ( - Flow() - .add(name='mypod', uses=DummyExecutor1, runtime_backend='thread') - .add(runtime_backend='thread') - .add(name='yourpod', uses=DummyExecutor2, runtime_backend='thread') - .needs_all(runtime_backend='thread') - ) - - with f: - pass - - assert mock1.call_count == 1 - assert mock2.call_count == 1 - - mock1.reset_mock() - mock2.reset_mock() - - with f: - f.reload(targets='mypod') - assert mock1.call_count == 2 - assert mock2.call_count == 1 - f.reload(targets='yourpod') - assert mock1.call_count == 2 - assert mock2.call_count == 2 - f.reload(targets='.*pod') - assert mock1.call_count == 3 - assert mock2.call_count == 3 diff --git a/tests/unit/flow/test_flow_rolling_reload.py b/tests/unit/flow/test_flow_rolling_reload.py deleted file mode 100644 index b97fa25fe6376..0000000000000 --- a/tests/unit/flow/test_flow_rolling_reload.py +++ /dev/null @@ -1,67 +0,0 @@ -import multiprocessing -import os -import time - -import numpy as np -import pytest - -from jina import Document, Flow - - -@pytest.fixture -def config(tmpdir): - os.environ['JINA_FLOW_DUMP_LOAD_INTERVAL_WORKSPACE'] = str(tmpdir) - yield - del os.environ['JINA_FLOW_DUMP_LOAD_INTERVAL_WORKSPACE'] - - -@pytest.fixture(scope='function') -def index_flow(): - return Flow().add(uses='_index', shards=3) - - -@pytest.fixture(scope='function') -def search_flow(): - return Flow().add(uses='_index', shards=3, polling='all') - - -def test_rolling_reload(config, index_flow, search_flow): - """Run index and search in parallel, we should observe number of documents while searching - keep increasing. - We expect while indexing and quering, we should get a new `num_matches` for each run. 
- """ - num_matches = set() - - def input_fn(): - for idx in range(10): - time.sleep(1) - yield Document(embedding=np.array([1, 2, 3]), tags={'idx': idx}) - - def print_req(req, j): - print(f'{j}-time got {len(req.docs[0].matches)} results') - num_matches.add(len(req.docs[0].matches)) - - def index_flow_with_shards(): - with index_flow as f: - f.index(input_fn, request_size=1) - - def search_flow_rolling_reload(): - with search_flow as f: - for j in range(10): - f.reload(targets=f'pod0/{j % 3}') - f.search( - Document(embedding=np.array([1, 2, 3])), - request_size=1, - on_done=lambda x: print_req(x, j), - top_k=999, - ) - time.sleep(1) - - # run dump interval flow - t = multiprocessing.Process(target=index_flow_with_shards) - t.start() - time.sleep(1) - # run load interval flow - search_flow_rolling_reload() - # verify num_matches has different values since we're querying while indexing - assert len(num_matches) > 1 diff --git a/tests/unit/flow/test_flow_segmenter.py b/tests/unit/flow/test_flow_segmenter.py index 2896ced04d08c..054b6961cd3b6 100644 --- a/tests/unit/flow/test_flow_segmenter.py +++ b/tests/unit/flow/test_flow_segmenter.py @@ -2,26 +2,25 @@ import pytest -from jina.executors.segmenters import BaseSegmenter -from jina.executors.decorators import single -from jina.flow import Flow +from jina import Document, requests, Executor, Flow from tests import random_docs, validate_callback cur_dir = os.path.dirname(os.path.abspath(__file__)) -class DummySegment(BaseSegmenter): +class DummySegment(Executor): """dummySegment represents a basic segment of two values""" - @single - def segment(self, id, *args, **kwargs): + @requests + def segment(self, docs, *args, **kwargs): """create a dummy segment of two values.""" - return [dict(buffer=b'aa'), dict(buffer=b'bb')] + for d in docs: + d.chunks = [Document(buffer=b'aa'), Document(buffer=b'bb')] def validate(req): """simple check for validating tests.""" - chunk_ids = [c.id for d in req.index.docs for c in d.chunks] 
+ chunk_ids = [c.id for d in req.docs for c in d.chunks] assert len(chunk_ids) == len(set(chunk_ids)) assert len(chunk_ids) == 20 diff --git a/tests/unit/flow/test_flow_skip.py b/tests/unit/flow/test_flow_skip.py index c414299e339cc..4b7f3981b0fa5 100644 --- a/tests/unit/flow/test_flow_skip.py +++ b/tests/unit/flow/test_flow_skip.py @@ -1,17 +1,14 @@ import pytest +from jina import Flow, Executor, requests, Document from jina.enums import OnErrorStrategy -from jina.executors.decorators import single -from jina.executors.crafters import BaseCrafter -from jina.flow import Flow from jina.proto import jina_pb2 from tests import validate_callback -class DummyCrafterSkip(BaseCrafter): - @single - def craft(self, text, *args, **kwargs): - self.logger.warning('crafting division by zero') +class DummyCrafterSkip(Executor): + @requests + def craft(self, *args, **kwargs): return 1 / 0 @@ -40,7 +37,7 @@ def validate(req): # always test two times, make sure the flow still works after it fails on the first with f: - f.index(['abbcs', 'efgh'], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) validate_callback(on_error_mock, validate) @@ -70,62 +67,7 @@ def validate(req): f = ( Flow(restful=restful, on_error_strategy=OnErrorStrategy.SKIP_HANDLE) - .add(name='r1', uses='!DummyCrafterSkip') - .add(name='r2') - .add(name='r3', needs='r1') - .needs(['r3', 'r2']) - ) - - on_error_mock = mocker.Mock() - - # always test two times, make sure the flow still works after it fails on the first - with f: - f.index(['abbcs', 'efgh'], on_error=on_error_mock) - - validate_callback(on_error_mock, validate) - - -@pytest.mark.parametrize('restful', [False, True]) -def test_bad_flow_skip_exec(mocker, restful): - def validate(req): - bad_routes = [ - r for r in req.routes if r.status.code >= jina_pb2.StatusProto.ERROR - ] - assert len(bad_routes) == 1 - assert req.status.code == jina_pb2.StatusProto.ERROR - assert bad_routes[0].pod == 
'r1/ZEDRuntime' - - f = ( - Flow(restful=restful, on_error_strategy=OnErrorStrategy.SKIP_EXECUTOR) - .add(name='r1', uses='!DummyCrafterSkip') - .add(name='r2') - .add(name='r3') - ) - - on_error_mock = mocker.Mock() - - # always test two times, make sure the flow still works after it fails on the first - with f: - f.index(['abbcs', 'efgh'], on_error=on_error_mock) - - validate_callback(on_error_mock, validate) - - -@pytest.mark.parametrize('restful', [False, True]) -def test_bad_flow_skip_exec_join(mocker, restful): - """Make sure the exception wont affect the gather/reduce ops""" - - def validate(req): - bad_routes = [ - r for r in req.routes if r.status.code >= jina_pb2.StatusProto.ERROR - ] - assert len(bad_routes) == 1 - assert req.status.code == jina_pb2.StatusProto.ERROR - assert bad_routes[0].pod == 'r1/ZEDRuntime' - - f = ( - Flow(restful=restful, on_error_strategy=OnErrorStrategy.SKIP_EXECUTOR) - .add(name='r1', uses='!DummyCrafterSkip') + .add(name='r1', uses=DummyCrafterSkip) .add(name='r2') .add(name='r3', needs='r1') .needs(['r3', 'r2']) @@ -135,6 +77,6 @@ def validate(req): # always test two times, make sure the flow still works after it fails on the first with f: - f.index(['abbcs', 'efgh'], on_error=on_error_mock) + f.index([Document(text='abbcs'), Document(text='efgh')], on_error=on_error_mock) validate_callback(on_error_mock, validate) diff --git a/tests/unit/flow/test_flow_visualization.py b/tests/unit/flow/test_flow_visualization.py index f7ac08fb64fdf..bbfa32cd70de2 100644 --- a/tests/unit/flow/test_flow_visualization.py +++ b/tests/unit/flow/test_flow_visualization.py @@ -1,6 +1,7 @@ import os from jina.flow import Flow +from jina import Executor cur_dir = os.path.dirname(os.path.abspath(__file__)) @@ -53,17 +54,18 @@ def test_visualization_plot_in_middle(tmpdir): def test_flow_before_after_plot(tmpdir): - Flow().add(uses_before='_pass', uses_after='_pass', name='p1').plot( + + Flow().add(uses_before=Executor, uses_after=Executor, 
name='p1').plot( os.path.join(tmpdir, 'flow.svg') ) assert os.path.exists(os.path.join(tmpdir, 'flow.svg')) def test_flow_before_plot(tmpdir): - Flow().add(uses_before='_pass', name='p1').plot(os.path.join(tmpdir, 'flow.svg')) + Flow().add(uses_before=Executor, name='p1').plot(os.path.join(tmpdir, 'flow.svg')) assert os.path.exists(os.path.join(tmpdir, 'flow.svg')) def test_flow_after_plot(tmpdir): - Flow().add(uses_after='_pass', name='p1').plot(os.path.join(tmpdir, 'flow.svg')) + Flow().add(uses_after=Executor, name='p1').plot(os.path.join(tmpdir, 'flow.svg')) assert os.path.exists(os.path.join(tmpdir, 'flow.svg')) diff --git a/tests/unit/flow/test_flow_yaml_parser.py b/tests/unit/flow/test_flow_yaml_parser.py index be7b622519d33..c49a94a863b4b 100644 --- a/tests/unit/flow/test_flow_yaml_parser.py +++ b/tests/unit/flow/test_flow_yaml_parser.py @@ -1,12 +1,11 @@ import os - from pathlib import Path + import numpy as np import pytest -from jina import Flow, AsyncFlow +from jina import Flow, AsyncFlow, Executor, Document from jina.excepts import BadFlowYAMLVersion -from jina.executors.encoders import BaseEncoder from jina.flow import BaseFlow from jina.jaml import JAML from jina.jaml.parsers import get_supported_versions @@ -49,12 +48,12 @@ def test_add_needs_inspect(tmpdir): .needs(['pod0', 'pod1']) ) with f1: - f1.index_ndarray(np.random.random([5, 5]), on_done=print) + f1.index(Document.from_ndarray(np.random.random([5, 5])), on_done=print) f2 = Flow.load_config('yaml/flow-v1.0-syntax.yml') with f2: - f2.index_ndarray(np.random.random([5, 5]), on_done=print) + f2.index(Document.from_ndarray(np.random.random([5, 5])), on_done=print) assert f1 == f2 @@ -109,7 +108,7 @@ def test_flow_yaml_from_string(): def test_flow_uses_from_dict(): - class DummyEncoder(BaseEncoder): + class DummyEncoder(Executor): pass d1 = {'jtype': 'DummyEncoder', 'metas': {'name': 'dummy1'}} diff --git a/tests/unit/flow/yaml/dummy-seg-not-random.yml 
b/tests/unit/flow/yaml/dummy-seg-not-random.yml index fcc3bdc275725..873458d2c068f 100644 --- a/tests/unit/flow/yaml/dummy-seg-not-random.yml +++ b/tests/unit/flow/yaml/dummy-seg-not-random.yml @@ -1,3 +1 @@ -jtype: DummySegment -requests: - use_default: true +jtype: DummySegment \ No newline at end of file diff --git a/tests/unit/flow/yaml/flow-allinone-oldstyle.yml b/tests/unit/flow/yaml/flow-allinone-oldstyle.yml index 6266a45086299..8dc97e667875f 100644 --- a/tests/unit/flow/yaml/flow-allinone-oldstyle.yml +++ b/tests/unit/flow/yaml/flow-allinone-oldstyle.yml @@ -2,21 +2,9 @@ jtype: Flow version: '1' pods: - uses: - !CustomizedEncoder - requests: - on: - IndexRequest: - - !FilterQL - with: - lookups: { modality__in: [ mode2 ] } - traversal_paths: [ 'c' ] - - !EncodeDriver - with: - method: encode - traversal_paths: [ 'c' ] - - uses: _index + !CustomizedEncoder {} - uses: - !BaseEncoder + !BaseExecutor metas: name: test_indexer workspace: ./indexed \ No newline at end of file diff --git a/tests/unit/flow/yaml/flow-allinone.yml b/tests/unit/flow/yaml/flow-allinone.yml index dd9be5888ebcf..20453e03f3457 100644 --- a/tests/unit/flow/yaml/flow-allinone.yml +++ b/tests/unit/flow/yaml/flow-allinone.yml @@ -3,20 +3,8 @@ version: '1' pods: - uses: jtype: CustomizedEncoder - requests: - on: - IndexRequest: - - jtype: FilterQL - with: - lookups: { modality__in: [ mode2 ] } - traversal_paths: [ 'c' ] - - jtype: EncodeDriver - with: - method: encode - traversal_paths: [ 'c' ] - - uses: _index - uses: - jtype: BaseEncoder + jtype: BaseExecutor metas: name: test_indexer workspace: ./indexed \ No newline at end of file diff --git a/tests/unit/flow/yaml/flow-legacy-syntax.yml b/tests/unit/flow/yaml/flow-legacy-syntax.yml index be08ce8f892f2..aef1e6182e393 100644 --- a/tests/unit/flow/yaml/flow-legacy-syntax.yml +++ b/tests/unit/flow/yaml/flow-legacy-syntax.yml @@ -2,10 +2,8 @@ version: '1' pods: - name: pod0 - uses: _pass needs: gateway - name: pod1 - uses: _pass needs: gateway 
- name: pod2 uses: _merge diff --git a/tests/unit/flow/yaml/flow-v1-syntax.yml b/tests/unit/flow/yaml/flow-v1-syntax.yml index 83c7e4f123ce3..135a9a795a6b3 100644 --- a/tests/unit/flow/yaml/flow-v1-syntax.yml +++ b/tests/unit/flow/yaml/flow-v1-syntax.yml @@ -3,11 +3,9 @@ version: '1' pods: - name: pod0 # notice the change here, name is now an attribute method: add # by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - name: pod1 # notice the change here, name is now an attribute method: add # by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - method: needs # let's try something new in Flow YAML v1: needs needs: [pod1, pod0] \ No newline at end of file diff --git a/tests/unit/flow/yaml/flow-v1.0-syntax.yml b/tests/unit/flow/yaml/flow-v1.0-syntax.yml index 59929151eeef0..1573e3ae80547 100644 --- a/tests/unit/flow/yaml/flow-v1.0-syntax.yml +++ b/tests/unit/flow/yaml/flow-v1.0-syntax.yml @@ -3,11 +3,9 @@ version: '1.0' pods: - name: pod0 # notice the change here, name is now an attribute method: add # by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - name: pod1 # notice the change here, name is now an attribute method: add # by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - method: inspect # add an inspect node on pod1 - method: needs # let's try something new in Flow YAML v1: needs diff --git a/tests/unit/flow/yaml/flow-v99-syntax.yml b/tests/unit/flow/yaml/flow-v99-syntax.yml index 42d7bc199b073..5364a510ce9a9 100644 --- a/tests/unit/flow/yaml/flow-v99-syntax.yml +++ b/tests/unit/flow/yaml/flow-v99-syntax.yml @@ -3,11 +3,9 @@ version: '99.99' pods: - name: pod0 # notice the change here, name is now an attribute method: add # by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - name: pod1 # notice the change here, name is now an attribute method: add # 
by default method is always add, available: add, needs, inspect - uses: _pass needs: gateway - method: needs # let's try something new in Flow YAML v1: needs needs: [pod1, pod0] \ No newline at end of file diff --git a/tests/unit/flow/yaml/mockencoder-mode1.yml b/tests/unit/flow/yaml/mockencoder-mode1.yml deleted file mode 100644 index 39a1bfdec42f9..0000000000000 --- a/tests/unit/flow/yaml/mockencoder-mode1.yml +++ /dev/null @@ -1,12 +0,0 @@ -!MockEncoder -requests: - on: - IndexRequest: - - !FilterQL - with: - lookups: {modality__in: [mode1]} - traversal_paths: ['c'] - - !EncodeDriver - with: - method: encode - traversal_paths: ['c'] diff --git a/tests/unit/flow/yaml/mockencoder-mode2.yml b/tests/unit/flow/yaml/mockencoder-mode2.yml deleted file mode 100644 index 328a80b395c48..0000000000000 --- a/tests/unit/flow/yaml/mockencoder-mode2.yml +++ /dev/null @@ -1,12 +0,0 @@ -!MockEncoder -requests: - on: - IndexRequest: - - !FilterQL - with: - lookups: {modality__in: [mode2]} - traversal_paths: ['c'] - - !EncodeDriver - with: - method: encode - traversal_paths: ['c'] diff --git a/tests/unit/flow/yaml/numpy-indexer-1.yml b/tests/unit/flow/yaml/numpy-indexer-1.yml deleted file mode 100644 index 07a44ce9c898a..0000000000000 --- a/tests/unit/flow/yaml/numpy-indexer-1.yml +++ /dev/null @@ -1,33 +0,0 @@ -!CompoundIndexer -metas: - name: compound - workspace: $JINA_TEST_FLOW_MULTIMODE_WORKSPACE -components: - - !NumpyIndexer - with: - metric: euclidean - index_filename: vec1.gz - metas: - name: vecidx1 # a customized name - - !BinaryPbIndexer - with: - index_filename: chunk1.gz - metas: - name: kvidx1 # a customized name -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - traversal_paths: ['c'] - executor: vecidx1 - - !ExcludeQL - with: - fields: - - embedding - - !KVIndexDriver - with: - traversal_paths: ['c'] - executor: kvidx1 - ControlRequest: - - !ControlReqDriver {} diff --git a/tests/unit/flow/yaml/numpy-indexer-2.yml 
b/tests/unit/flow/yaml/numpy-indexer-2.yml deleted file mode 100644 index 2a2bf0f164399..0000000000000 --- a/tests/unit/flow/yaml/numpy-indexer-2.yml +++ /dev/null @@ -1,33 +0,0 @@ -!CompoundIndexer -metas: - name: compound - workspace: $JINA_TEST_FLOW_MULTIMODE_WORKSPACE -components: - - !NumpyIndexer - with: - metric: euclidean - index_filename: vec2.gz - metas: - name: vecidx2 # a customized name - - !BinaryPbIndexer - with: - index_filename: chunk2.gz - metas: - name: kvidx2 # a customized name -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - traversal_paths: ['c'] - executor: vecidx2 - - !ExcludeQL - with: - fields: - - embedding - - !KVIndexDriver - with: - traversal_paths: ['c'] - executor: kvidx2 - ControlRequest: - - !ControlReqDriver {} diff --git a/tests/unit/jaml/test_type_parse.py b/tests/unit/jaml/test_type_parse.py index 1e8917c360550..97869206d3ad0 100644 --- a/tests/unit/jaml/test_type_parse.py +++ b/tests/unit/jaml/test_type_parse.py @@ -3,6 +3,7 @@ from jina.enums import SocketType from jina.executors import BaseExecutor from jina.jaml import JAML +from jina import __default_executor__ class MyExecutor(BaseExecutor): @@ -11,7 +12,7 @@ class MyExecutor(BaseExecutor): def test_non_empty_reg_tags(): assert JAML.registered_tags() - assert 'BaseExecutor' in JAML.registered_tags() + assert __default_executor__ in JAML.registered_tags() @pytest.mark.parametrize( @@ -72,123 +73,6 @@ def test_include_unknown(include_unk, expected): jtype: unknown-blah ''', ), - ( - ''' -!CompoundIndexer -components: - - !NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - !BinaryPbIndexer - with: - index_filename: doc.gz - metas: - name: docidx -metas: - name: indexer - workspace: $JINA_WORKSPACE - - ''', - ''' -jtype: CompoundIndexer -components: - - jtype: NumpyIndexer - with: - index_filename: vec.gz - metric: euclidean - metas: - name: vecidx - - jtype: BinaryPbIndexer - with: - index_filename: doc.gz - metas: 
- name: docidx -metas: - name: indexer - workspace: $JINA_WORKSPACE - ''', - ), - ( - ''' -!CompoundIndexer -metas: - workspace: $TMP_WORKSPACE -components: - - !NumpyIndexer - with: - metric: euclidean - index_filename: vec.gz - metas: - name: vecidx # a customized name - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: kvidx # a customized name -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY - - !KVIndexDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY - SearchRequest: - - !VectorSearchDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY - - !KVSearchDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY - - ''', - ''' -jtype: CompoundIndexer -metas: - workspace: $TMP_WORKSPACE -components: - - jtype: NumpyIndexer - with: - metric: euclidean - index_filename: vec.gz - metas: - name: vecidx # a customized name - - jtype: BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: kvidx # a customized name -requests: - on: - IndexRequest: - - jtype: VectorIndexDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY - - jtype: KVIndexDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY - SearchRequest: - - jtype: VectorSearchDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY - - jtype: KVSearchDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY - - ''', - ), ], ) def test_escape(original, escaped): diff --git a/tests/unit/mwu-encoder/mwu_driver.py b/tests/unit/mwu-encoder/mwu_driver.py deleted file mode 100644 index a696a641e6ba7..0000000000000 --- a/tests/unit/mwu-encoder/mwu_driver.py +++ /dev/null @@ -1,7 +0,0 @@ -from jina.drivers.control import ControlReqDriver - - -class MyAwesomeDriver(ControlReqDriver): - def __call__(self, *args, **kwargs): - print('hello from customized drivers') - super().__call__(*args, **kwargs) diff --git a/tests/unit/mwu-encoder/mwu_encoder.py 
b/tests/unit/mwu-encoder/mwu_encoder.py index 896dff1786e7e..40cc5929d4132 100644 --- a/tests/unit/mwu-encoder/mwu_encoder.py +++ b/tests/unit/mwu-encoder/mwu_encoder.py @@ -1,25 +1,33 @@ -from typing import Any +import os -import numpy as np +from typing import Any -from jina.executors.encoders import BaseEncoder +from jina import Executor, requests -class MWUEncoder(BaseEncoder): +class MWUEncoder(Executor): def __init__(self, greetings: str, *args, **kwargs): super().__init__(*args, **kwargs) self._greetings = greetings - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - self.logger.info(f'{self._greetings} {content}') - return np.random.random([content.shape[0], 3]) + @requests + def encode(self, **kwargs) -> Any: + pass -class MWUUpdater(BaseEncoder): +class MWUUpdater(Executor): def __init__(self, greetings: str, *args, **kwargs): super().__init__(*args, **kwargs) self._greetings = greetings - def encode(self, content: 'np.ndarray', *args, **kwargs) -> Any: - self.is_updated = True - return np.random.random([content.shape[0], 3]) + @requests + def encode(self, **kwargs) -> Any: + pass + + def close(self) -> None: + import pickle + + os.makedirs(self.workspace, exist_ok=True) + bin_path = os.path.join(self.workspace, f'{self.metas.name}.bin') + with open(bin_path, 'wb') as f: + pickle.dump(self._greetings, f) diff --git a/tests/unit/mwu-encoder/mwu_encoder_driver.yml b/tests/unit/mwu-encoder/mwu_encoder_driver.yml deleted file mode 100644 index 64f08ae6225d1..0000000000000 --- a/tests/unit/mwu-encoder/mwu_encoder_driver.yml +++ /dev/null @@ -1,13 +0,0 @@ -!MWUEncoder -with: - greetings: hello there! 
-metas: - name: my-mwu-encoder - py_modules: - - mwu_encoder.py - - mwu_driver.py - workspace: ./ -requests: - on: - ControlRequest: - - !MyAwesomeDriver {} \ No newline at end of file diff --git a/tests/unit/mwu-encoder/mwu_encoder_upd.yml b/tests/unit/mwu-encoder/mwu_encoder_upd.yml index f0396dc56eba8..f7816f1b96afe 100644 --- a/tests/unit/mwu-encoder/mwu_encoder_upd.yml +++ b/tests/unit/mwu-encoder/mwu_encoder_upd.yml @@ -3,5 +3,4 @@ with: greetings: hello im from external yaml! metas: name: ext-mwu-encoder - py_modules: mwu_encoder.py - workspace: /abc \ No newline at end of file + py_modules: mwu_encoder.py \ No newline at end of file diff --git a/tests/unit/optimizers/__init__.py b/tests/unit/optimizers/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/optimizers/flow.yml b/tests/unit/optimizers/flow.yml deleted file mode 100644 index 7be3621588536..0000000000000 --- a/tests/unit/optimizers/flow.yml +++ /dev/null @@ -1,6 +0,0 @@ -!Flow -version: '1' -env: - JINA_TEST_FLOW_RUNNER_WORKSPACE_VAR: ${{JINA_TEST_FLOW_RUNNER_WORKSPACE}} -pods: - - uses: index.yml diff --git a/tests/unit/optimizers/index.yml b/tests/unit/optimizers/index.yml deleted file mode 100644 index 604a2c3508a02..0000000000000 --- a/tests/unit/optimizers/index.yml +++ /dev/null @@ -1,12 +0,0 @@ -!NumpyIndexer -with: - index_filename: tmp2 -metas: - name: vecidx - workspace: ${{JINA_TEST_FLOW_RUNNER_WORKSPACE_VAR}} -requests: - on: - SearchRequest: - - !VectorSearchDriver {} - IndexRequest: - - !VectorIndexDriver {} diff --git a/tests/unit/optimizers/parameters.yml b/tests/unit/optimizers/parameters.yml deleted file mode 100644 index 59900982f0158..0000000000000 --- a/tests/unit/optimizers/parameters.yml +++ /dev/null @@ -1,31 +0,0 @@ -- !IntegerParameter - jaml_variable: JINA_EXECUTOR_PARAM1 - high: 1 - low: 0 - step_size: 1 - parameter_name: param1 -- !UniformParameter - jaml_variable: JINA_EXECUTOR_PARAM3 - high: 2 - low: 1 - parameter_name: 
param3 -- !UniformParameter - jaml_variable: JINA_EXECUTOR_PARAM4 - high: 2 - low: 1 - parameter_name: param4 -- !LogUniformParameter - jaml_variable: JINA_EXECUTOR_PARAM5 - high: 2 - low: 1 - parameter_name: param5 -- !CategoricalParameter - jaml_variable: JINA_EXECUTOR_PARAM6 - choices: ['a'] - parameter_name: param6 -- !DiscreteUniformParameter - jaml_variable: JINA_EXECUTOR_PARAM7 - high: 2 - low: 1 - q: 0.1 - parameter_name: param7 diff --git a/tests/unit/optimizers/test_flow_runner.py b/tests/unit/optimizers/test_flow_runner.py deleted file mode 100644 index 417ffb552088a..0000000000000 --- a/tests/unit/optimizers/test_flow_runner.py +++ /dev/null @@ -1,49 +0,0 @@ -import os - -from jina.optimizers.flow_runner import SingleFlowRunner -from tests import random_docs, validate_callback - - -def test_flow_runner(tmpdir, mocker): - def callback(resp): - if len(resp.search.docs): - assert True - else: - assert False - - workspace = os.path.join(tmpdir, 'test_flow_runner') - - flow_runner = SingleFlowRunner( - flow_yaml='flow.yml', - documents=random_docs(5), - request_size=1, - execution_method='index', - overwrite_workspace=True, - ) - - flow_runner.run( - workspace=workspace, - trial_parameters={'JINA_TEST_FLOW_RUNNER_WORKSPACE': workspace}, - ) - # Test overwriting - flow_runner.run( - workspace=workspace, - trial_parameters={'JINA_TEST_FLOW_RUNNER_WORKSPACE': workspace}, - ) - - flow_runner = SingleFlowRunner( - flow_yaml='flow.yml', - documents=random_docs(5), - request_size=1, - execution_method='search', - ) - - mock = mocker.Mock() - flow_runner.run( - workspace=workspace, - trial_parameters={'JINA_TEST_FLOW_RUNNER_WORKSPACE': workspace}, - callback=mock, - ) - - validate_callback(mock, callback) - assert os.path.exists(os.path.join(workspace, 'vecidx-0', 'tmp2')) diff --git a/tests/unit/optimizers/test_optimizers.py b/tests/unit/optimizers/test_optimizers.py deleted file mode 100644 index 03ec30af18b2f..0000000000000 --- 
a/tests/unit/optimizers/test_optimizers.py +++ /dev/null @@ -1,108 +0,0 @@ -import os -import optuna -import pytest -from unittest.mock import Mock -import yaml - -from jina.optimizers import MeanEvaluationCallback, ResultProcessor, FlowOptimizer -from jina.optimizers.parameters import ( - IntegerParameter, - UniformParameter, - LogUniformParameter, - CategoricalParameter, - DiscreteUniformParameter, -) - - -@pytest.fixture -def responses(): - responses = Mock() - - doc1 = Mock() - eval1 = Mock() - eval1.op_name = 'metric1' - eval1.value = 1 - eval2 = Mock() - eval2.op_name = 'metric2' - eval2.value = 0.5 - doc1.evaluations = [eval1, eval2] - - doc2 = Mock() - eval3 = Mock() - eval3.op_name = 'metric1' - eval3.value = 0.5 - eval4 = Mock() - eval4.op_name = 'metric2' - eval4.value = 0.5 - doc2.evaluations = [eval3, eval4] - - responses.search.docs = [doc1, doc2] - return responses - - -def test_evaluation_callback_no_name(responses): - # test with no metric name given to callback - cb = MeanEvaluationCallback() - cb(responses) - cb(responses) - - evaluation = cb.get_final_evaluation() - assert evaluation == 0.75 - - -def test_evaluation_callback_with_name(responses): - # test with metric name given to callback - evaluation_metric = 'metric2' - cb = MeanEvaluationCallback(evaluation_metric) - cb(responses) - cb(responses) - - evaluation = cb.get_final_evaluation() - assert evaluation == 0.5 - - -def test_result_processor(tmpdir): - study = Mock() - study.trials = [1, 2] - study.best_trial.params = {'a': 1} - study.best_trial.duration = 3 - - filepath = os.path.join(tmpdir, 'best_config.yml') - proc = ResultProcessor(study) - proc.save_parameters(filepath) - assert yaml.load(open(filepath), Loader=yaml.Loader) == {'a': 1} - - -def test_suggest(tmpdir): - def _objective(trial): - - value = FlowOptimizer._suggest( - IntegerParameter(0, 3, 1, jaml_variable='IntegerParameter'), trial - ) - assert 0 <= value - assert value <= 3 - value = FlowOptimizer._suggest( - 
UniformParameter(0, 3, jaml_variable='UniformParameter'), trial - ) - assert 0 <= value - assert value <= 3 - value = FlowOptimizer._suggest( - LogUniformParameter(1, 3, jaml_variable='LogUniformParameter'), trial - ) - assert 1 <= value - assert value <= 3 - value = FlowOptimizer._suggest( - CategoricalParameter([0, 1.5, 2, 3], jaml_variable='CategoricalParameter'), - trial, - ) - assert 0 <= value - assert value <= 3 - value = FlowOptimizer._suggest( - DiscreteUniformParameter(0, 3, 1, jaml_variable='DiscreteUniformParameter'), - trial, - ) - assert 0 <= value - assert value <= 3 - - study = optuna.create_study() - study.optimize(_objective, n_trials=1) diff --git a/tests/unit/optimizers/test_parameters.py b/tests/unit/optimizers/test_parameters.py deleted file mode 100644 index 98bcb94164852..0000000000000 --- a/tests/unit/optimizers/test_parameters.py +++ /dev/null @@ -1,26 +0,0 @@ -import os - -from jina.optimizers.parameters import ( - IntegerParameter, - LogUniformParameter, - UniformParameter, - CategoricalParameter, - DiscreteUniformParameter, - load_optimization_parameters, -) - - -def test_parameter_file_loading(): - params_loaded = load_optimization_parameters( - os.path.join(os.path.dirname(__file__), 'parameters.yml') - ) - expected = [ - IntegerParameter, - UniformParameter, - UniformParameter, - LogUniformParameter, - CategoricalParameter, - DiscreteUniformParameter, - ] - for param, param_type in zip(params_loaded, expected): - assert type(param) == param_type diff --git a/tests/unit/peapods/pods/test_compound_pods.py b/tests/unit/peapods/pods/test_compound_pods.py index 147f59b9f145e..5e5540eb90a3c 100644 --- a/tests/unit/peapods/pods/test_compound_pods.py +++ b/tests/unit/peapods/pods/test_compound_pods.py @@ -2,9 +2,9 @@ from jina.enums import SchedulerType, SocketType, PollingType from jina.parsers import set_pod_parser +from jina import __default_executor__ from jina.peapods import CompoundPod, Pod - @pytest.fixture(scope='function') def 
pod_args(): args = [ @@ -26,7 +26,7 @@ def pod_args_singleton(): '--name', 'test2', '--uses-before', - '_pass', + __default_executor__, '--parallel', '1', '--replicas', diff --git a/tests/unit/peapods/pods/test_pod_factory.py b/tests/unit/peapods/pods/test_pod_factory.py index 7e902a993d2ca..49bfca1915ee2 100644 --- a/tests/unit/peapods/pods/test_pod_factory.py +++ b/tests/unit/peapods/pods/test_pod_factory.py @@ -1,7 +1,7 @@ -from jina.peapods.pods.factory import PodFactory -from jina.peapods.pods import Pod -from jina.peapods.pods.compoundpod import CompoundPod from jina.parsers import set_pod_parser +from jina.peapods.pods import Pod +from jina.peapods.pods.compound import CompoundPod +from jina.peapods.pods.factory import PodFactory def test_pod_factory_pod(): diff --git a/tests/unit/peapods/pods/test_pods.py b/tests/unit/peapods/pods/test_pods.py index 30adbc004cf28..ad9669eadfdd4 100644 --- a/tests/unit/peapods/pods/test_pods.py +++ b/tests/unit/peapods/pods/test_pods.py @@ -6,6 +6,7 @@ from jina.parsers import set_gateway_parser from jina.parsers import set_pod_parser from jina.peapods import Pod +from jina import __default_executor__ @pytest.fixture(scope='function') @@ -27,7 +28,7 @@ def pod_args_singleton(): '--name', 'test2', '--uses-before', - '_pass', + __default_executor__, '--parallel', '1', '--host', @@ -183,13 +184,20 @@ def test_pod_args_remove_uses_ba(): assert p.num_peas == 1 args = set_pod_parser().parse_args( - ['--uses-before', '_pass', '--uses-after', '_pass'] + ['--uses-before', __default_executor__, '--uses-after', __default_executor__] ) with Pod(args) as p: assert p.num_peas == 1 args = set_pod_parser().parse_args( - ['--uses-before', '_pass', '--uses-after', '_pass', '--parallel', '2'] + [ + '--uses-before', + __default_executor__, + '--uses-after', + __default_executor__, + '--parallel', + '2', + ] ) with Pod(args) as p: assert p.num_peas == 4 diff --git a/tests/unit/peapods/runtimes/asyncio/rest/test_models.py 
b/tests/unit/peapods/runtimes/asyncio/rest/test_models.py index 240580c359b53..c42e237939995 100644 --- a/tests/unit/peapods/runtimes/asyncio/rest/test_models.py +++ b/tests/unit/peapods/runtimes/asyncio/rest/test_models.py @@ -46,7 +46,7 @@ def test_enum_definitions(): command_enum_definition = PROTO_TO_PYDANTIC_MODELS.RequestProto().schema()[ 'definitions' ]['Command'] - assert command_enum_definition['enum'] == [0, 1, 3, 4, 5, 6, 7, 8, 9] + assert command_enum_definition['enum'] == [0, 1, 2, 3, 4, 5, 6] def test_all_fields_in_document_proto(): @@ -59,7 +59,6 @@ def test_all_fields_in_document_proto(): 'content_hash', 'granularity', 'adjacency', - 'level_name', 'parent_id', 'chunks', 'weight', @@ -79,7 +78,7 @@ def test_all_fields_in_document_proto(): document_proto_properties_alias = PROTO_TO_PYDANTIC_MODELS.DocumentProto().schema()[ 'definitions' ]['DocumentProto']['properties'] - for i in ['contentHash', 'levelName', 'parentId', 'mimeType']: + for i in ['contentHash', 'parentId', 'mimeType']: assert i in document_proto_properties_alias @@ -117,13 +116,13 @@ def test_oneof_validation_error(): with pytest.raises(pydantic.error_wrappers.ValidationError) as error: doc = PROTO_TO_PYDANTIC_MODELS.DocumentProto(text='abc', buffer=b'abc') - assert "only one field among ['buffer', 'blob', 'text']" in str(error.value) + assert "only one field among ['buffer', 'blob', 'text', 'uri']" in str(error.value) with pytest.raises(pydantic.error_wrappers.ValidationError) as error: doc = PROTO_TO_PYDANTIC_MODELS.DocumentProto( text='abc', buffer=b'abc', blob=PROTO_TO_PYDANTIC_MODELS.NdArrayProto() ) - assert "only one field among ['buffer', 'blob', 'text']" in str(error.value) + assert "only one field among ['buffer', 'blob', 'text', 'uri']" in str(error.value) def test_tags_document(): @@ -239,20 +238,8 @@ def test_pydatic_document_to_jina_document(): @pytest.mark.parametrize('top_k', [5, 10]) def test_model_with_top_k(top_k): - m = JinaRequestModel(data=['abc'], top_k=top_k) - 
assert m.queryset[0].name == 'SliceQL' - assert m.queryset[0].parameters['end'] == top_k - assert m.queryset[1].name == 'VectorSearchDriver' - assert m.queryset[1].parameters['top_k'] == top_k - - -def test_model_with_queryset(): - m = JinaRequestModel( - data=['abc'], - queryset=[ - {'name': 'CustomQuerySet', 'parameters': {'top_k': 10}, 'priority': 1} - ], - ) - assert m.queryset[0].name == 'CustomQuerySet' - assert m.queryset[0].parameters['top_k'] == 10 - assert m.queryset[0].priority == 1 + m = JinaRequestModel(data=['abc'], parameters={'top_k': top_k}) + assert m.parameters['top_k'] == top_k + + m = JinaRequestModel(parameters={'top_k': top_k}) + assert m.parameters['top_k'] == top_k diff --git a/tests/unit/peapods/runtimes/container/test_container_runtime.py b/tests/unit/peapods/runtimes/container/test_container_runtime.py index e917c738ba554..d1be98dd3a3bf 100644 --- a/tests/unit/peapods/runtimes/container/test_container_runtime.py +++ b/tests/unit/peapods/runtimes/container/test_container_runtime.py @@ -5,6 +5,8 @@ import pytest from jina.checker import NetworkChecker +from jina.executors import BaseExecutor +from jina.executors.decorators import requests from jina.flow import Flow from jina.helper import random_name from jina.parsers import set_pea_parser @@ -25,6 +27,16 @@ ) +@pytest.fixture +def _logforward(): + class _LogForward(BaseExecutor): + @requests + def foo(self, **kwargs): + pass + + return _LogForward + + @pytest.fixture(scope='module') def docker_image_built(): import docker @@ -96,25 +108,25 @@ def test_flow_with_replica_container_ext_yaml(docker_image_built): f.index(inputs=random_docs(10)) -def test_flow_topo1(docker_image_built): +def test_flow_topo1(docker_image_built, _logforward): f = ( Flow() .add( name='d0', uses='docker://jinaai/jina:test-pip', - uses_internal='_logforward', + uses_internal=_logforward, entrypoint='jina pod', ) .add( name='d1', uses='docker://jinaai/jina:test-pip', - uses_internal='_logforward', + 
uses_internal=_logforward, entrypoint='jina pod', ) .add( name='d2', uses='docker://jinaai/jina:test-pip', - uses_internal='_logforward', + uses_internal=_logforward, needs='d0', entrypoint='jina pod', ) @@ -125,20 +137,20 @@ def test_flow_topo1(docker_image_built): f.index(inputs=random_docs(10)) -def test_flow_topo_mixed(docker_image_built): +def test_flow_topo_mixed(docker_image_built, _logforward): f = ( Flow() .add( name='d4', uses='docker://jinaai/jina:test-pip', - uses_internal='_logforward', + uses_internal=_logforward, entrypoint='jina pod', ) - .add(name='d5', uses='_logforward') + .add(name='d5', uses=_logforward) .add( name='d6', uses='docker://jinaai/jina:test-pip', - uses_internal='_logforward', + uses_internal=_logforward, needs='d4', entrypoint='jina pod', ) @@ -149,14 +161,14 @@ def test_flow_topo_mixed(docker_image_built): f.index(inputs=random_docs(10)) -def test_flow_topo_parallel(docker_image_built): +def test_flow_topo_parallel(docker_image_built, _logforward): f = ( Flow() .add( name='d7', uses='docker://jinaai/jina:test-pip', entrypoint='jina pod', - uses_internal='_pass', + uses_internal=_logforward, parallel=3, ) .add(name='d8', parallel=3) @@ -164,7 +176,7 @@ def test_flow_topo_parallel(docker_image_built): name='d9', uses='docker://jinaai/jina:test-pip', entrypoint='jina pod', - uses_internal='_pass', + uses_internal=_logforward, needs='d7', ) .join(['d9', 'd8']) @@ -174,7 +186,7 @@ def test_flow_topo_parallel(docker_image_built): f.index(inputs=random_docs(10)) -def test_flow_topo_ldl_parallel(docker_image_built): +def test_flow_topo_ldl_parallel(docker_image_built, _logforward): f = ( Flow() .add(name='d10') @@ -182,7 +194,7 @@ def test_flow_topo_ldl_parallel(docker_image_built): name='d11', uses='docker://jinaai/jina:test-pip', entrypoint='jina pod', - uses_internal='_pass', + uses_internal=_logforward, parallel=3, ) .add(name='d12') @@ -198,6 +210,7 @@ def test_container_volume(docker_image_built, tmpdir): name=random_name(), 
uses=f'docker://{img_name}', volumes=abc_path, + workspace='/abc', uses_internal=os.path.join(cur_dir, '../../../mwu-encoder/mwu_encoder_upd.yml'), ) @@ -205,7 +218,7 @@ def test_container_volume(docker_image_built, tmpdir): f.index(random_docs(10)) assert os.path.exists( - os.path.join(abc_path, 'ext-mwu-encoder-0', 'ext-mwu-encoder.bin') + os.path.join(abc_path, 'ext-mwu-encoder', '0', 'ext-mwu-encoder.bin') ) @@ -218,13 +231,14 @@ def test_container_volume_arbitrary(docker_image_built, tmpdir): uses_internal=os.path.join( cur_dir, '../../../mwu-encoder/mwu_encoder_volume_change.yml' ), + workspace='/mapped/here/abc', ) with f: f.index(random_docs(10)) assert os.path.exists( - os.path.join(abc_path, 'ext-mwu-encoder-0', 'ext-mwu-encoder.bin') + os.path.join(abc_path, 'ext-mwu-encoder', '0', 'ext-mwu-encoder.bin') ) @@ -242,14 +256,14 @@ def test_container_ping(docker_image_built): assert cm.value.code == 0 -def test_tail_host_docker2local_parallel(docker_image_built): +def test_tail_host_docker2local_parallel(docker_image_built, _logforward): f = ( Flow() .add( name='d10', uses='docker://jinaai/jina:test-pip', entrypoint='jina pod', - uses_internal='_pass', + uses_internal=_logforward, parallel=3, ) .add(name='d11') @@ -258,14 +272,14 @@ def test_tail_host_docker2local_parallel(docker_image_built): assert getattr(f._pod_nodes['d10'].peas_args['tail'], 'host_out') == defaulthost -def test_tail_host_docker2local(docker_image_built): +def test_tail_host_docker2local(docker_image_built, _logforward): f = ( Flow() .add( name='d12', uses='docker://jinaai/jina:test-pip', entrypoint='jina pod', - uses_internal='_pass', + uses_internal=_logforward, ) .add(name='d13') ) diff --git a/tests/unit/peapods/runtimes/remote/jinad/test_remote.py b/tests/unit/peapods/runtimes/remote/jinad/test_remote.py index ed49c1ffdea85..a13e18430149f 100644 --- a/tests/unit/peapods/runtimes/remote/jinad/test_remote.py +++ b/tests/unit/peapods/runtimes/remote/jinad/test_remote.py @@ -2,6 +2,7 @@ 
from jina.enums import PollingType from jina.helper import ArgNamespace +from jina import __default_executor__ def test_namespace_to_dict(): @@ -14,7 +15,7 @@ def test_namespace_to_dict(): polling=PollingType.ANY, port_ctrl=39011, port_expose=8000, - uses_after='_pass', + uses_after=__default_executor__, uses_before=None, ), 'tail': Namespace( @@ -25,7 +26,7 @@ def test_namespace_to_dict(): polling=PollingType.ANY, port_ctrl=46937, port_expose=8000, - uses_after='_pass', + uses_after=__default_executor__, uses_before=None, ), 'peas': [ @@ -38,7 +39,7 @@ def test_namespace_to_dict(): port_ctrl=44747, port_expose=8000, uses='helloworld.encoder.yml', - uses_after='_pass', + uses_after=__default_executor__, uses_before=None, ), Namespace( @@ -50,7 +51,7 @@ def test_namespace_to_dict(): port_ctrl=48957, port_expose=8000, uses='helloworld.encoder.yml', - uses_after='_pass', + uses_after=__default_executor__, uses_before=None, ), ], diff --git a/tests/unit/peapods/runtimes/remote/jinad/yamls/flow-index.yaml b/tests/unit/peapods/runtimes/remote/jinad/yamls/flow-index.yaml index 7a4c6679cc8d8..2024de1878d35 100644 --- a/tests/unit/peapods/runtimes/remote/jinad/yamls/flow-index.yaml +++ b/tests/unit/peapods/runtimes/remote/jinad/yamls/flow-index.yaml @@ -1,7 +1,5 @@ !Flow version: '1' -with: - compress_hwm: 1024 pods: - name: encode uses: yamls/encoder.yml diff --git a/tests/unit/peapods/zmq/test_zmq_addr.py b/tests/unit/peapods/zmq/test_zmq_addr.py index 28a80214a4c75..8d9af8cf31e68 100644 --- a/tests/unit/peapods/zmq/test_zmq_addr.py +++ b/tests/unit/peapods/zmq/test_zmq_addr.py @@ -6,6 +6,7 @@ from jina.types.message import Message from jina.clients.request import request_generator from tests import random_docs +from jina import __default_executor__ @pytest.fixture @@ -14,7 +15,7 @@ def zmq_args_argparse(): '--name', 'test2', '--uses-before', - '_pass', + __default_executor__, '--parallel', '1', '--host', @@ -39,7 +40,9 @@ def runtime(zmq_args_argparse): @pytest.fixture 
def ctrl_messages(): - return [Message(None, r, 'test', '123') for r in request_generator(random_docs(10))] + return [ + Message(None, r, 'test', '123') for r in request_generator('/', random_docs(10)) + ] @pytest.fixture(params=['zmq_args_dict', 'zmq_args_argparse']) diff --git a/tests/unit/proto/test_pb_struct_tags.py b/tests/unit/proto/test_pb_struct_tags.py index 9842057e86716..5ef5b5ccf6944 100644 --- a/tests/unit/proto/test_pb_struct_tags.py +++ b/tests/unit/proto/test_pb_struct_tags.py @@ -26,3 +26,10 @@ def test_tags(document): # can be used as a dict for _, _ in d2.tags['nested'].items(): continue + + +def test_tags_assign(): + d = DocumentProto() + d.tags.update({'int': 1, 'float': 0.1234}) + with pytest.raises(AttributeError): + d.tags = {'int': 1, 'float': 0.1234} diff --git a/tests/unit/test_cli.py b/tests/unit/test_cli.py index 745761a088cfc..83ab40becc074 100644 --- a/tests/unit/test_cli.py +++ b/tests/unit/test_cli.py @@ -1,4 +1,5 @@ import json +import os import subprocess import pytest @@ -58,3 +59,14 @@ def test_ping(): NetworkChecker(a3) assert cm.value.code == 1 + + +@pytest.mark.parametrize('project', ['fashion', 'chatbot', 'multimodal']) +def test_fork(tmpdir, project): + subprocess.check_call(['jina', 'hello', 'fork', project, f'{tmpdir}/tmp']) + + assert os.path.exists(f'{tmpdir}/tmp/app.py') + assert os.path.exists(f'{tmpdir}/tmp/executors.py') + if project == 'multimodal': + assert os.path.exists(f'{tmpdir}/tmp/flow-index.yml') + assert os.path.exists(f'{tmpdir}/tmp/flow-search.yml') diff --git a/tests/unit/test_cust_driver.py b/tests/unit/test_cust_driver.py deleted file mode 100644 index 9c4006427b0f1..0000000000000 --- a/tests/unit/test_cust_driver.py +++ /dev/null @@ -1,40 +0,0 @@ -import os -from jina import Flow, Document -from jina.executors import BaseExecutor -from jina.parsers import set_pea_parser -from jina.peapods.peas import BasePea - -from tests import validate_callback - -cur_dir = 
os.path.dirname(os.path.abspath(__file__)) - - -def test_load_executor_with_custom_driver(): - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-executor-with-custom-driver.yml') - ) as be: - assert be._drivers['IndexRequest'][0].__class__.__name__ == 'DummyEncodeDriver' - - -def test_load_pod_with_custom_driver(): - args = set_pea_parser().parse_args( - ['--uses', os.path.join(cur_dir, 'yaml/test-executor-with-custom-driver.yml')] - ) - with BasePea(args): - # load success with no error - pass - - -def test_load_flow_with_custom_driver(mocker): - mock = mocker.Mock() - - def validate(req): - assert len(req.docs) == 1 - assert req.docs[0].text == 'hello from DummyEncodeDriver' - - with Flow().add( - uses=os.path.join(cur_dir, 'yaml/test-executor-with-custom-driver.yml') - ) as f: - f.index([Document()], on_done=mock) - - validate_callback(mock, validate) diff --git a/tests/unit/test_driver_yaml.py b/tests/unit/test_driver_yaml.py deleted file mode 100644 index 08d5e117b640d..0000000000000 --- a/tests/unit/test_driver_yaml.py +++ /dev/null @@ -1,136 +0,0 @@ -import os - -import pytest -from pkg_resources import resource_filename - -from jina.drivers import BaseDriver -from jina.drivers.control import ControlReqDriver -from jina.drivers.search import KVSearchDriver -from jina.executors import BaseExecutor -from jina.jaml import JAML -from jina.parsers import set_pod_parser -from jina.peapods import Pod - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -def test_load_yaml1(tmpdir): - with open(os.path.join(cur_dir, 'yaml/test-driver.yml'), encoding='utf8') as fp: - a = JAML.load(fp) - - assert isinstance(a[0], KVSearchDriver) - assert isinstance(a[1], ControlReqDriver) - assert isinstance(a[2], BaseDriver) - - with open(os.path.join(tmpdir, 'test_driver.yml'), 'w', encoding='utf8') as fp: - JAML.dump(a[0], fp) - - with open(os.path.join(tmpdir, 'test_driver.yml'), encoding='utf8') as fp: - b = JAML.load(fp) - - assert isinstance(b, 
KVSearchDriver) - assert b._executor_name == a[0]._executor_name - - -def test_load_cust_with_driver(): - a = BaseExecutor.load_config( - os.path.join(cur_dir, 'mwu-encoder/mwu_encoder_driver.yml') - ) - assert a._drivers['ControlRequest'][0].__class__.__name__ == 'MyAwesomeDriver' - p = set_pod_parser().parse_args( - ['--uses', os.path.join(cur_dir, 'mwu-encoder/mwu_encoder_driver.yml')] - ) - with Pod(p): - # will print a cust task_name from the driver when terminate - pass - - -def test_pod_new_api_from_kwargs(): - a = BaseExecutor.load_config( - os.path.join(cur_dir, 'mwu-encoder/mwu_encoder_driver.yml') - ) - assert a._drivers['ControlRequest'][0].__class__.__name__ == 'MyAwesomeDriver' - - args = set_pod_parser().parse_args( - ['--uses', os.path.join(cur_dir, 'mwu-encoder/mwu_encoder_driver.yml')] - ) - with Pod(args): - # will print a cust task_name from the driver when terminate - pass - - -@pytest.mark.parametrize('random_workspace_name', ['JINA_TEST_EXEC_WITH_DRIVER']) -def test_load_yaml2(test_metas): - a = BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-exec-with-driver.yml') - ) - assert len(a._drivers) == 2 - # should be able to auto fill in ControlRequest - assert 'ControlRequest' in a._drivers - a.save_config() - p = a.config_abspath - b = BaseExecutor.load_config(p) - assert a._drivers == b._drivers - a.touch() - a.save() - c = BaseExecutor.load(a.save_abspath) - assert not hasattr(c, '_drivers') - - -@pytest.mark.parametrize( - 'yaml_path, name, expected', - [ - ('executors._clear.yml', 'clear', 6), - ('executors._concat.yml', 'concat', 6), - ('executors._eval_pr.yml', 'eval_pr', 2), - ('executors._logforward.yml', 'logforward', 6), - ('executors._merge.yml', 'merge', 6), - ('executors._merge_chunks.yml', 'merge_chunks', 6), - ('executors._merge_eval.yml', 'merge_eval', 6), - ('executors._merge_matches.yml', 'merge_matches', 6), - ('executors._pass.yml', 'forward', 6), - ], -) -def test_resource_executor(yaml_path, name, expected): - 
a = BaseExecutor.load_config( - resource_filename('jina', '/'.join(('resources', yaml_path))) - ) - assert a.name == name - assert len(a._drivers) == expected - - -def test_multiple_executor(): - from jina.executors.encoders import BaseEncoder - from jina.executors.indexers import BaseIndexer - from jina.executors.rankers import Chunk2DocRanker - from jina.executors.crafters import BaseCrafter - - class D1(BaseEncoder): - pass - - d1 = D1() - assert len(d1._drivers) == 6 - - class D2(BaseIndexer): - pass - - d2 = D2('dummy.bin') - assert len(d2._drivers) == 1 - - class D3(Chunk2DocRanker): - pass - - d3 = D3() - assert len(d3._drivers) == 2 - - class D4(BaseCrafter): - pass - - d4 = D4() - assert len(d4._drivers) == 6 - - class D5(BaseCrafter): - pass - - d5 = D5() - assert len(d5._drivers) == 6 diff --git a/tests/unit/test_echostream.py b/tests/unit/test_echostream.py index 561e4e24ef132..86c313f61397d 100644 --- a/tests/unit/test_echostream.py +++ b/tests/unit/test_echostream.py @@ -40,8 +40,6 @@ def test_simple_zmqlet(): 'PULL_BIND', '--socket-out', 'PUSH_BIND', - '--uses', - '_logforward', '--timeout-ctrl', '-1', ] @@ -51,7 +49,7 @@ def test_simple_zmqlet(): with BasePea(args2), Zmqlet(args, logger) as z: req = jina_pb2.RequestProto() req.request_id = random_identity() - d = req.index.docs.add() + d = req.data.docs.add() d.tags['id'] = 2 msg = Message(None, req, 'tmp', '') z.send_message(msg) diff --git a/tests/unit/test_exectype.py b/tests/unit/test_exectype.py deleted file mode 100644 index 43b3081a7202d..0000000000000 --- a/tests/unit/test_exectype.py +++ /dev/null @@ -1,62 +0,0 @@ -import os - -import pytest - -from jina.executors import BaseExecutor - -# BaseIndexer is already registered -from jina.jaml import JAML - - -@pytest.fixture -def unregister(): - from jina.executors.indexers import BaseIndexer - - if 'tests.unit.test_exectype.BaseIndexer' in BaseIndexer._registered_class: - 
BaseIndexer._registered_class.remove('tests.unit.test_exectype.BaseIndexer') - yield - if 'tests.unit.test_exectype.BaseIndexer' in BaseIndexer._registered_class: - BaseIndexer._registered_class.remove('tests.unit.test_exectype.BaseIndexer') - - -@pytest.mark.parametrize('f_register', [True, False]) -def test_exec_type(tmpdir, f_register, unregister): - from jina.executors.indexers import BaseIndexer - - assert 'jina.executors.indexers.BaseIndexer' in BaseExecutor._registered_class - - # init from YAML should be okay as well - BaseExecutor.load_config('BaseIndexer') - - BaseIndexer().save_config(os.path.join(tmpdir, 'tmp.yml')) - with open(os.path.join(tmpdir, 'tmp.yml')) as fp: - _ = JAML.load(fp) - - def assert_bi(): - b = BaseIndexer(1) - - b.save_config(os.path.join(tmpdir, 'tmp.yml')) - with open(os.path.join(tmpdir, 'tmp.yml')) as fp: - b = JAML.load(fp) - assert b.a == 1 - - # By this point, BaseIndexer has not registered in reg_cls_set yet and store_init_kwargs will be executed - class BaseIndexer(BaseExecutor): - def __init__(self, a=0, *args, **kwargs): - super().__init__(*args, **kwargs) - self.a = a - - assert_bi() - - class BaseIndexer(BaseExecutor): - force_register = f_register - - def __init__(self, a=0, *args, **kwargs): - super().__init__(*args, **kwargs) - self.a = a - - if f_register: - assert_bi() - else: - with pytest.raises(AssertionError): - assert_bi() diff --git a/tests/unit/test_gateway.py b/tests/unit/test_gateway.py index 747aff3ec7a2b..5c5fc202900d7 100644 --- a/tests/unit/test_gateway.py +++ b/tests/unit/test_gateway.py @@ -6,31 +6,21 @@ import pytest from jina import Document -from jina.drivers.control import BaseControlDriver from jina.enums import CompressAlgo -from jina.executors.encoders import BaseEncoder from jina.flow import Flow from tests import random_docs -class DummyEncoder(BaseEncoder): - def encode(self, content, *args, **kwargs): - pass - - @pytest.mark.parametrize('compress_algo', list(CompressAlgo)) def 
test_compression(compress_algo, mocker): - class CompressCheckDriver(BaseControlDriver): - def __call__(self, *args, **kwargs): - assert self.req._envelope.compression.algorithm == str(compress_algo) response_mock = mocker.Mock() f = ( Flow(compress=str(compress_algo)) - .add(uses='- !CompressCheckDriver {}') - .add(name='DummyEncoder', parallel=2) - .add(uses='- !CompressCheckDriver {}') + .add() + .add(name='DummyEncoder', parallel=2) + .add() ) with f: diff --git a/tests/unit/test_helper.py b/tests/unit/test_helper.py index ece4d88705c2c..0a441971b6637 100644 --- a/tests/unit/test_helper.py +++ b/tests/unit/test_helper.py @@ -6,11 +6,10 @@ import pytest from cli import _is_latest_version -from jina import Executor -from jina import NdArray, Request +from jina import Executor, __default_endpoint__ from jina.clients.helper import _safe_callback, pprint_routes from jina.excepts import BadClientCallback, NotSupportedError, NoAvailablePortError -from jina.executors.decorators import requests, batching +from jina.executors.decorators import requests from jina.helper import ( cached_property, convert_tuple_to_list, @@ -19,12 +18,14 @@ touch_dir, random_port, find_request_binding, + dunder_get, ) from jina.jaml.helper import complete_path from jina.logging import default_logger from jina.logging.profile import TimeContext from jina.proto import jina_pb2 -from jina.types.querylang.queryset.dunderkey import dunder_get +from jina.types.ndarray.generic import NdArray +from jina.types.request import Request from tests import random_docs @@ -86,10 +87,10 @@ def test_check_update(): def test_wrap_func(): - from jina.executors.encoders import BaseEncoder + from jina import Executor - class DummyEncoder(BaseEncoder): - def encode(self): + class DummyEncoder(Executor): + def __init__(self): pass class MockEnc(DummyEncoder): @@ -99,7 +100,7 @@ class MockMockEnc(MockEnc): pass class MockMockMockEnc(MockEnc): - def encode(self): + def __init__(self): pass def check_override(cls, 
method): @@ -110,11 +111,10 @@ def check_override(cls, method): is_override = not is_inherit and is_parent_method return is_override - assert not check_override(BaseEncoder, 'encode') - assert check_override(DummyEncoder, 'encode') - assert not check_override(MockEnc, 'encode') - assert not check_override(MockMockEnc, 'encode') - assert check_override(MockMockMockEnc, 'encode') + assert check_override(DummyEncoder, '__init__') + assert not check_override(MockEnc, '__init__') + assert not check_override(MockMockEnc, '__init__') + assert check_override(MockMockMockEnc, '__init__') def test_pprint_routes(capfd): @@ -294,27 +294,25 @@ def test_random_port_max_failures_for_tests_only(config_few_ports): class MyDummyExecutor(Executor): - @batching @requests - def foo(self): + def foo(self, **kwargs): pass @requests(on='index') - def bar(self): + def bar(self, **kwargs): pass @requests(on='search') - def bar2(self): + def bar2(self, **kwargs): pass - @batching def foo2(self): pass def test_find_request_binding(): r = find_request_binding(MyDummyExecutor) - assert r['default'] == 'foo' + assert r[__default_endpoint__] == 'foo' assert r['index'] == 'bar' assert r['search'] == 'bar2' assert 'foo2' not in r.values() diff --git a/tests/unit/test_importer.py b/tests/unit/test_importer.py index 11edeaee66a54..53d04f5d60983 100644 --- a/tests/unit/test_importer.py +++ b/tests/unit/test_importer.py @@ -46,27 +46,14 @@ def test_no_suppress_other_exception(): raise Exception -@pytest.mark.parametrize('import_once', [True, False]) -@pytest.mark.parametrize('ns', ['jina.executors', 'jina.hub', 'jina.drivers']) -def test_import_classes_import_once(ns, import_once): - depend_tree = import_classes(namespace=ns, import_once=import_once) - assert (not depend_tree) == import_once - - -@pytest.mark.parametrize('import_once', [True, False]) -def test_import_classes_import_once_exception(import_once): - with pytest.raises(TypeError): - _ = import_classes(namespace='dummy', 
import_once=import_once) - - -@pytest.mark.parametrize('ns', ['jina.executors', 'jina.hub', 'jina.drivers']) +@pytest.mark.parametrize('ns', ['jina.executors', 'jina.hub']) def test_import_classes_failed_find_package(ns, mocker): mocker.patch('pkgutil.get_loader', return_value=None) depend_tree = import_classes(namespace=ns) assert len(depend_tree) == 0 -@pytest.mark.parametrize('ns', ['jina.executors', 'jina.hub', 'jina.drivers']) +@pytest.mark.parametrize('ns', ['jina.executors']) def test_import_classes_failed_import_module(ns, mocker, recwarn): import importlib @@ -76,25 +63,4 @@ def test_import_classes_failed_import_module(ns, mocker, recwarn): depend_tree = import_classes(namespace=ns) assert len(depend_tree) == 0 assert len(recwarn) == 1 - assert ( - 'You can use `jina check` to list all executors and drivers' - in recwarn[0].message.args[0] - ) - - -@pytest.mark.parametrize('print_table', [True, False]) -@pytest.mark.parametrize('ns', ['jina.executors', 'jina.hub']) -def test_import_classes_failed_load_default_exc_config( - ns, print_table, mocker, recwarn, capsys -): - mocker.patch('pkg_resources.resource_stream', side_effect=Exception('mocked error')) - _ = import_classes(namespace=ns, show_import_table=print_table) - if print_table: - captured = capsys.readouterr() - assert '✗' in captured.out - else: - assert len(recwarn) == 1 - assert ( - 'You can use `jina check` to list all executors and drivers' - in recwarn[0].message.args[0] - ) + assert 'You can use `jina check`' in recwarn[0].message.args[0] diff --git a/tests/unit/test_index.py b/tests/unit/test_index.py deleted file mode 100644 index 485f2f6299edb..0000000000000 --- a/tests/unit/test_index.py +++ /dev/null @@ -1,135 +0,0 @@ -import os - -import pytest -import numpy as np - -from jina.executors.indexers.vector import NumpyIndexer -from jina.flow import Flow -from jina.proto import jina_pb2 -from jina import Document -from tests import random_docs, validate_callback - -cur_dir = 
os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture(scope='function') -def test_workspace_index(tmpdir): - os.environ['JINA_TEST_INDEX'] = str(tmpdir) - workspace_path = os.environ['JINA_TEST_INDEX'] - yield workspace_path - del os.environ['JINA_TEST_INDEX'] - - -@pytest.fixture(scope='function') -def test_workspace_joint(tmpdir): - os.environ['JINA_TEST_JOINT'] = str(tmpdir) - workspace_path = os.environ['JINA_TEST_JOINT'] - yield workspace_path - del os.environ['JINA_TEST_JOINT'] - - -def get_result(resp): - n = [] - for d in resp.search.docs: - n.append([k.id for k in d.matches]) - n = np.array(n) - # each doc should return a list of top-100 - np.testing.assert_equal(n.shape[0], 2) - np.testing.assert_equal(n.shape[1], 50) - - -class DummyIndexer(NumpyIndexer): - # the add() function is simply copied from NumpyIndexer - def add(self, *args, **kwargs): - pass - - -class DummyIndexer2(NumpyIndexer): - # the add() function is simply copied from NumpyIndexer - def add(self, keys: 'np.ndarray', vectors: 'np.ndarray', *args, **kwargs): - if len(vectors.shape) != 2: - raise ValueError( - f'vectors shape {vectors.shape} is not valid, expecting "vectors" to have rank of 2' - ) - - if not self.num_dim: - self.num_dim = vectors.shape[1] - self.dtype = vectors.dtype.name - elif self.num_dim != vectors.shape[1]: - raise ValueError( - "vectors' shape [%d, %d] does not match with indexers's dim: %d" - % (vectors.shape[0], vectors.shape[1], self.num_dim) - ) - elif self.dtype != vectors.dtype.name: - raise TypeError( - f"vectors' dtype {vectors.dtype.name} does not match with indexers's dtype: {self.dtype}" - ) - elif keys.shape[0] != vectors.shape[0]: - raise ValueError( - 'number of key %d not equal to number of vectors %d' - % (keys.shape[0], vectors.shape[0]) - ) - elif self.key_dtype != keys.dtype.name: - raise TypeError( - f"keys' dtype {keys.dtype.name} does not match with indexers keys's dtype: {self.key_dtype}" - ) - - 
self.write_handler.write(vectors.tobytes()) - self.key_bytes += keys.tobytes() - self.key_dtype = keys.dtype.name - self._size += keys.shape[0] - - -def test_doc_iters(): - docs = random_docs(3, 5) - for doc in docs: - assert isinstance(doc, Document) - - -def test_simple_route(): - f = Flow().add() - with f: - f.index(inputs=random_docs(10)) - - -def test_update_method(test_metas): - with DummyIndexer(index_filename='testa.bin', metas=test_metas) as indexer: - indexer.save() - assert not os.path.exists(indexer.save_abspath) - assert not os.path.exists(indexer.index_abspath) - indexer.add() - indexer.save() - assert os.path.exists(indexer.save_abspath) - assert os.path.exists(indexer.index_abspath) - - with DummyIndexer2(index_filename='testb.bin', metas=test_metas) as indexer: - indexer.save() - assert not os.path.exists(indexer.save_abspath) - assert not os.path.exists(indexer.index_abspath) - indexer.add(np.array([1, 2, 3]), np.array([[1, 1, 1], [2, 2, 2]])) - indexer.save() - assert os.path.exists(indexer.save_abspath) - assert os.path.exists(indexer.index_abspath) - - -def test_index(test_workspace_index): - f = Flow().add(uses=os.path.join(cur_dir, 'yaml/test-index.yml'), parallel=3) - with f: - f.index(inputs=random_docs(50)) - for j in range(3): - assert os.path.exists(os.path.join(test_workspace_index, f'test2-{j}', f'tmp2')) - - -def test_compound_idx(test_workspace_joint, mocker): - def validate(req): - assert req.status.code < jina_pb2.StatusProto.ERROR - assert req.search.docs[0].matches[0].score.op_name == 'NumpyIndexer' - - with Flow().add(uses=os.path.join(cur_dir, 'yaml/test-joint.yml')) as f: - f.index(random_docs(100, chunks_per_doc=0)) - - response_mock = mocker.Mock() - with Flow().add(uses=os.path.join(cur_dir, 'yaml/test-joint.yml')) as g: - g.search(random_docs(10, chunks_per_doc=0), on_done=response_mock) - - validate_callback(response_mock, validate) diff --git a/tests/unit/test_index_remote.py b/tests/unit/test_index_remote.py deleted 
file mode 100644 index a6de037ca57e0..0000000000000 --- a/tests/unit/test_index_remote.py +++ /dev/null @@ -1,60 +0,0 @@ -import os - -import numpy as np - -from jina.executors.indexers.vector import NumpyIndexer - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -def get_result(resp): - n = [] - for d in resp.search.docs: - for c in d.chunks: - n.append([k.id for k in c.matches]) - n = np.array(n) - # each chunk should return a list of top-100 - np.testing.assert_equal(n.shape[0], 5) - np.testing.assert_equal(n.shape[1], 100) - - -class DummyIndexer(NumpyIndexer): - # the add() function is simply copied from NumpyIndexer - def add(self, *args, **kwargs): - pass - - -class DummyIndexer2(NumpyIndexer): - # the add() function is simply copied from NumpyIndexer - def add(self, keys: 'np.ndarray', vectors: 'np.ndarray', *args, **kwargs): - if len(vectors.shape) != 2: - raise ValueError( - f'vectors shape {vectors.shape} is not valid, expecting "vectors" to have rank of 2' - ) - - if not self.num_dim: - self.num_dim = vectors.shape[1] - self.dtype = vectors.dtype.name - elif self.num_dim != vectors.shape[1]: - raise ValueError( - "vectors' shape [%d, %d] does not match with indexers's dim: %d" - % (vectors.shape[0], vectors.shape[1], self.num_dim) - ) - elif self.dtype != vectors.dtype.name: - raise TypeError( - f"vectors' dtype {vectors.dtype.name} does not match with indexers's dtype: {self.dtype}" - ) - elif keys.shape[0] != vectors.shape[0]: - raise ValueError( - 'number of key %d not equal to number of vectors %d' - % (keys.shape[0], vectors.shape[0]) - ) - elif self.key_dtype != keys.dtype.name: - raise TypeError( - f"keys' dtype {keys.dtype.name} does not match with indexers keys's dtype: {self.key_dtype}" - ) - - self.write_handler.write(vectors.tobytes()) - self.key_bytes += keys.tobytes() - self.key_dtype = keys.dtype.name - self._size += keys.shape[0] diff --git a/tests/unit/test_is_read_message.py b/tests/unit/test_is_read_message.py index 
1e719d24e9f1a..7680efd91ce34 100644 --- a/tests/unit/test_is_read_message.py +++ b/tests/unit/test_is_read_message.py @@ -1,10 +1,12 @@ import time + +from jina import __default_executor__ +from jina.helper import random_identity from jina.logging import default_logger from jina.parsers import set_pea_parser from jina.peapods.peas import BasePea from jina.peapods.zmq import Zmqlet from jina.types.message import Message -from jina.helper import random_identity from jina.types.request import Request from tests import validate_callback @@ -50,8 +52,6 @@ def _post_hook(self, msg: 'Message') -> 'BasePea': 'PULL_BIND', '--socket-out', 'PUSH_BIND', - '--uses', - '_clear', # will trigger use '--timeout-ctrl', '-1', ] @@ -72,7 +72,7 @@ def _post_hook(self, msg: 'Message') -> 'BasePea': '--socket-out', 'PUSH_BIND', '--uses', - '_pass', # will NOT trigger use + __default_executor__, # will NOT trigger use '--timeout-ctrl', '-1', ] @@ -83,7 +83,7 @@ def test_read_zmqlet(): with MockBasePeaRead(args2), Zmqlet(args1, default_logger) as z: req = Request() req.request_id = random_identity() - d = req.index.docs.add() + d = req.data.docs.add() d.tags['id'] = 2 msg = Message(None, req, 'tmp', '') z.send_message(msg) @@ -93,7 +93,7 @@ def test_not_read_zmqlet(): with MockBasePeaNotRead(args3), Zmqlet(args1, default_logger) as z: req = Request() req.request_id = random_identity() - d = req.index.docs.add() + d = req.data.docs.add() d.tags['id'] = 2 msg = Message(None, req, 'tmp', '') z.send_message(msg) @@ -104,12 +104,12 @@ def test_recv_message_zmqlet(mocker): zmqlet2 = Zmqlet(args2, default_logger) req = Request() req.request_id = random_identity() - doc = req.index.docs.add() + doc = req.data.docs.add() doc.tags['id'] = 2 msg = Message(None, req, 'tmp', '') def callback(msg_): - assert msg_.request.docs[0].tags['id'] == msg.request.index.docs[0].tags['id'] + assert msg_.request.docs[0].tags['id'] == msg.request.data.docs[0].tags['id'] mock = mocker.Mock() zmqlet1.send_message(msg) 
diff --git a/tests/unit/test_loadbalance.py b/tests/unit/test_loadbalance.py index a47bda45b9a37..082e57020c7a8 100644 --- a/tests/unit/test_loadbalance.py +++ b/tests/unit/test_loadbalance.py @@ -1,28 +1,23 @@ import os import time +from jina import Flow, Executor, requests from jina.enums import SchedulerType -from jina.executors.decorators import single -from jina.executors.crafters import BaseCrafter -from jina.flow import Flow from tests import random_docs os.environ['JINA_LOG_LEVEL'] = 'DEBUG' -class SlowWorker(BaseCrafter): +class SlowWorker(Executor): def __init__(self, *args, **kwargs): super().__init__(*args, **kwargs) # half of worker is slow self.is_slow = os.getpid() % 2 != 0 - self.logger.warning('im a slow worker') - @single - def craft(self, id, *args, **kwargs): + @requests + def craft(self, **kwargs): if self.is_slow: - self.logger.warning('slowly doing') time.sleep(1) - return {'id': id} def test_lb(): diff --git a/tests/unit/test_workspace.py b/tests/unit/test_workspace.py deleted file mode 100644 index 1017dee4211e4..0000000000000 --- a/tests/unit/test_workspace.py +++ /dev/null @@ -1,323 +0,0 @@ -import os -import shutil - -import numpy as np -import pytest - -from jina.executors import BaseExecutor - -cur_dir = os.path.dirname(os.path.abspath(__file__)) - - -@pytest.fixture() -def test_workspace(tmpdir): - os.environ['JINA_TEST_WORKSPACE'] = str(tmpdir) - os.environ['JINA_TEST_WORKSPACE_COMP1'] = os.path.join(str(tmpdir), 'component-1') - os.environ['JINA_TEST_WORKSPACE_COMP2'] = os.path.join(str(tmpdir), 'component-2') - yield - del os.environ['JINA_TEST_WORKSPACE'] - del os.environ['JINA_TEST_WORKSPACE_COMP1'] - del os.environ['JINA_TEST_WORKSPACE_COMP2'] - - -@pytest.mark.parametrize('pea_id', [-1, 0, 1, 2, 3]) -def test_shard_workspace(test_workspace, pea_id): - tmpdir = os.environ['JINA_TEST_WORKSPACE'] - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-workspace.yml'), pea_id=pea_id - ) as executor: - 
executor.index_filename = 'index_filename' - executor.touch() - if pea_id > 0: - assert os.path.exists( - os.path.join( - tmpdir, f'{executor.name}-{executor.pea_id}', f'{executor.name}.bin' - ) - ) - else: - assert os.path.exists( - os.path.join(tmpdir, f'test1-{pea_id}', f'{executor.name}.bin') - ) - - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-workspace.yml'), pea_id=pea_id - ) as executor: - assert executor.index_filename == 'index_filename' - - -@pytest.mark.parametrize('dump_compound', [True, False]) -@pytest.mark.parametrize('pea_id', [-1, 0, 1, 2, 3]) -def test_compound_indexer_no_workspace_in_components( - test_workspace, pea_id, dump_compound -): - tmpdir = os.environ['JINA_TEST_WORKSPACE'] - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-compound-indexer.yml'), pea_id=pea_id - ) as executor: - assert executor.pea_id == pea_id - assert len(executor.components) == 2 - for i, component in enumerate(executor): - assert component.pea_id == executor.pea_id - component.index_filename = f'index_filename-component-{i}' - component.touch() - executor._attached_pea = 'hey' - if dump_compound: - executor.touch() - - compound_bin_expected = os.path.join( - tmpdir, f'{executor.name}-{executor.pea_id}', f'{executor.name}.bin' - ) - - if dump_compound: - assert os.path.exists(compound_bin_expected) - else: - assert not os.path.exists(compound_bin_expected) - - for component in executor: - if pea_id > 0: - assert os.path.exists( - os.path.join( - tmpdir, - f'{executor.name}-{executor.pea_id}', - f'{component.name}-{component.pea_id}', - f'{component.name}.bin', - ) - ) - else: - assert os.path.exists( - os.path.join( - tmpdir, - f'{executor.name}', - f'{component.name}-{component.pea_id}', - f'{component.name}.bin', - ) - ) - - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-compound-indexer.yml'), pea_id=pea_id - ) as executor: - assert len(executor.components) == 2 - for i, component in enumerate(executor): - 
assert component.index_filename == f'index_filename-component-{i}' - if dump_compound: - assert executor._attached_pea == 'hey' - - -@pytest.mark.parametrize('dump_compound', [True, False]) -@pytest.mark.parametrize('pea_id', [-1, 0]) -def test_compound_indexer_with_workspace_in_components( - test_workspace, pea_id, dump_compound -): - # the workspace in components will be ignored in compound - tmpdir = os.environ['JINA_TEST_WORKSPACE'] - comp1_dir = os.environ['JINA_TEST_WORKSPACE_COMP1'] - comp2_dir = os.environ['JINA_TEST_WORKSPACE_COMP2'] - with BaseExecutor.load_config( - os.path.join( - cur_dir, 'yaml/test-compound-indexer-components-with-workspace.yml' - ), - pea_id=pea_id, - ) as executor: - assert len(executor.components) == 2 - assert executor.pea_id == pea_id - for i, component in enumerate(executor): - assert component.pea_id == executor.pea_id - component.index_filename = f'index_filename-component-{i}' - component.touch() - executor._attached_pea = 'hey' - if dump_compound: - executor.touch() - - compound_bin_expected = None - if pea_id > 0: - compound_bin_expected = os.path.join( - tmpdir, f'{executor.name}-{executor.pea_id}', f'{executor.name}.bin' - ) - else: - compound_bin_expected = os.path.join( - tmpdir, f'{executor.name}-{executor.pea_id}', f'{executor.name}.bin' - ) - - if dump_compound: - assert os.path.exists(compound_bin_expected) - else: - assert not os.path.exists(compound_bin_expected) - - for i, component in enumerate(executor): - if pea_id > 0: - assert os.path.exists( - os.path.join( - comp1_dir if i == 0 else comp2_dir, - f'{component.name}-{component.pea_id}', - f'{component.name}.bin', - ) - ) - else: - assert os.path.exists( - os.path.join( - comp1_dir if i == 0 else comp2_dir, - f'{component.name}-{component.pea_id}', - f'{component.name}.bin', - ) - ) - - with BaseExecutor.load_config( - os.path.join( - cur_dir, 'yaml/test-compound-indexer-components-with-workspace.yml' - ), - pea_id=pea_id, - ) as executor: - assert 
len(executor.components) == 2 - for i, component in enumerate(executor): - assert component.index_filename == f'index_filename-component-{i}' - if dump_compound: - assert executor._attached_pea == 'hey' - - -@pytest.mark.parametrize('pea_id', [-1, 0, 1, 2, 3]) -def test_indexer_ref_indexer(test_workspace, pea_id): - tmpdir = os.environ['JINA_TEST_WORKSPACE'] - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-indexer-workspace.yml'), pea_id=pea_id - ) as ref_indexer: - ref_indexer.num_dim = 512 - ref_indexer.touch() - - if pea_id > 0: - assert os.path.exists( - os.path.join( - tmpdir, - f'{ref_indexer.name}-{ref_indexer.pea_id}', - f'{ref_indexer.name}.bin', - ) - ) - else: - assert os.path.exists( - os.path.join( - tmpdir, - f'{ref_indexer.name}-{ref_indexer.pea_id}', - f'{ref_indexer.name}.bin', - ) - ) - - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-refindexer-workspace.yml'), pea_id=pea_id - ) as indexer: - assert indexer.num_dim == 512 - - -@pytest.mark.parametrize('pea_id', [-1, 0, 1, 2, 3]) -def test_compound_indexer_ref_indexer(test_workspace, pea_id): - tmpdir = os.environ['JINA_TEST_WORKSPACE'] - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-compound-indexer2.yml'), pea_id=pea_id - ) as compound_indexer: - ref_indexer = compound_indexer[1] - ref_indexer.num_dim = 512 - ref_indexer.touch() - - if pea_id > 0: - assert os.path.exists( - os.path.join( - tmpdir, - f'{compound_indexer.name}-{compound_indexer.pea_id}', - f'{ref_indexer.name}-{ref_indexer.pea_id}', - f'{ref_indexer.name}.bin', - ) - ) - else: - assert os.path.exists( - os.path.join( - tmpdir, - f'{compound_indexer.name}', - f'{ref_indexer.name}-{ref_indexer.pea_id}', - f'{ref_indexer.name}.bin', - ) - ) - - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test_compount_indexer2-ref-indexer.yml'), - pea_id=pea_id, - ) as compound_indexer: - indexer = compound_indexer[1] - assert indexer.num_dim == 512 - - -@pytest.fixture() -def 
test_workspace_move(tmpdir): - os.environ['JINA_TEST_WORKSPACE'] = os.path.join(str(tmpdir), 'host') - yield - del os.environ['JINA_TEST_WORKSPACE'] - - -# This test tries to simulate the situation where an executor workspace is mapped to a docker container, and therefore -# its workspace has changed. -@pytest.mark.parametrize('pea_id', [-1, 0, 1, 2, 3]) -def test_simple_indexer_workspace_move_to_docker(test_workspace_move, tmpdir, pea_id): - keys = ['0', '1'] - content = [b'a', b'b'] - old_tmpdir = os.environ['JINA_TEST_WORKSPACE'] - docker_tmpdir = os.path.join(tmpdir, 'docker') - - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-kvindexer-workspace.yml'), pea_id=pea_id - ) as indexer: - indexer.add(keys, content) - - assert os.path.exists( - os.path.join( - old_tmpdir, f'{indexer.name}-{indexer.pea_id}', f'{indexer.name}.bin' - ) - ) - - shutil.copytree(os.environ['JINA_TEST_WORKSPACE'], docker_tmpdir) - - shutil.rmtree(os.environ['JINA_TEST_WORKSPACE']) - - os.environ['JINA_TEST_WORKSPACE'] = str(docker_tmpdir) - - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-kvindexer-workspace.yml'), pea_id=pea_id - ) as indexer: - assert indexer.query(keys[0]) == [content[0]] - assert indexer.query(keys[1]) == [content[1]] - assert indexer.workspace == docker_tmpdir - - -def test_compound_indexer_rw(test_workspace): - all_vecs = np.random.random([6, 5]) - for j in range(3): - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-compound-indexer2.yml'), - separated_workspace=True, - pea_id=j, - ) as indexer: - assert indexer[0] == indexer['test_meta'] - assert not indexer[0].is_updated - assert not indexer.is_updated - indexer[0].add( - [str(j), str(j * 2), str(j * 3)], [bytes(j), bytes(j * 2), bytes(j * 3)] - ) - assert indexer[0].is_updated - assert indexer.is_updated - assert not indexer[1].is_updated - indexer[1].add([str(j), str(j * 2)], all_vecs[(j * 2, j * 2 + 1), :]) - assert indexer[1].is_updated - indexer.save() - 
# the compound executor itself is not modified, therefore should not generate a save - assert not os.path.exists(indexer.save_abspath) - assert os.path.exists(indexer[0].save_abspath) - assert os.path.exists(indexer[0].index_abspath) - assert os.path.exists(indexer[1].save_abspath) - assert os.path.exists(indexer[1].index_abspath) - - recovered_vecs = [] - for j in range(3): - with BaseExecutor.load_config( - os.path.join(cur_dir, 'yaml/test-compound-indexer2.yml'), - separated_workspace=True, - pea_id=j, - ) as indexer: - recovered_vecs.append(indexer[1].query_handler) - - np.testing.assert_almost_equal(all_vecs, np.concatenate(recovered_vecs)) diff --git a/tests/unit/test_yamlparser.py b/tests/unit/test_yamlparser.py index a1f12b6fc1d5b..c2d2609eba592 100644 --- a/tests/unit/test_yamlparser.py +++ b/tests/unit/test_yamlparser.py @@ -2,17 +2,13 @@ import pytest import yaml -from pkg_resources import resource_filename +from jina import __default_executor__ from jina.enums import SocketType from jina.executors import BaseExecutor -from jina.executors.compound import CompoundExecutor -from jina.executors.metas import fill_metas_with_defaults from jina.helper import expand_dict from jina.helper import expand_env_var from jina.jaml import JAML -from jina.parsers import set_pea_parser -from jina.peapods.peas import BasePea cur_dir = os.path.dirname(os.path.abspath(__file__)) @@ -90,12 +86,6 @@ class AttrDict: assert isinstance(a.components, list) -def test_yaml_fill(): - with open(os.path.join(cur_dir, 'yaml/test-expand2.yml')) as fp: - a = JAML.load(fp) - print(fill_metas_with_defaults(a)) - - def test_class_yaml(): class DummyClass: pass @@ -106,33 +96,6 @@ class DummyClass: assert type(a) == DummyClass -def test_class_yaml2(): - with open( - resource_filename( - 'jina', '/'.join(('resources', 'executors.requests.BaseExecutor.yml')) - ) - ) as fp: - JAML.load(fp) - - -def test_class_yaml3(): - args = set_pea_parser().parse_args([]) - - with BasePea(args): - pass - - 
from jina.executors.requests import _defaults - - assert _defaults is not None - - -def test_joint_indexer(test_workspace): - b = BaseExecutor.load_config(os.path.join(cur_dir, 'yaml/test-joint.yml')) - b.attach(runtime=None) - assert b._drivers['SearchRequest'][0]._exec == b[0] - assert b._drivers['SearchRequest'][-1]._exec == b[1] - - def test_load_external_fail(): with pytest.raises(yaml.constructor.ConstructorError): BaseExecutor.load_config('yaml/dummy_ext_exec.yml') @@ -153,23 +116,22 @@ def test_enum_yaml(): def test_encoder_name_env_replace(): os.environ['BE_TEST_NAME'] = 'hello123' - os.environ['BATCH_SIZE'] = '256' with BaseExecutor.load_config('yaml/test-encoder-env.yml') as be: - assert be.name == 'hello123' - assert be.batch_size == 256 + assert be.metas.name == 'hello123' def test_encoder_name_dict_replace(): - d = {'BE_TEST_NAME': 'hello123', 'BATCH_SIZE': 256} + d = {'BE_TEST_NAME': 'hello123'} with BaseExecutor.load_config('yaml/test-encoder-env.yml', context=d) as be: - assert be.name == 'hello123' - assert be.batch_size == 256 - assert be.workspace == 'hello123-256' + assert be.metas.name == 'hello123' + assert be.metas.workspace == 'hello123' def test_encoder_inject_config_via_kwargs(): - with BaseExecutor.load_config('yaml/test-encoder-env.yml', pea_id=345) as be: - assert be.pea_id == 345 + with BaseExecutor.load_config( + 'yaml/test-encoder-env.yml', metas={'pea_id': 345} + ) as be: + assert be.metas.pea_id == 345 def test_load_from_dict(): @@ -181,12 +143,10 @@ def test_load_from_dict(): # workspace: ${{this.name}}-${{this.batch_size}} d1 = { - 'jtype': 'BaseEncoder', + 'jtype': __default_executor__, 'metas': { 'name': '${{BE_TEST_NAME}}', - 'batch_size': '${{BATCH_SIZE}}', - 'pea_id': '${{pea_id}}', - 'workspace': '${{this.name}} -${{this.batch_size}}', + 'workspace': '${{this.name}}', }, } @@ -204,26 +164,8 @@ def test_load_from_dict(): # name: test2 # metas: # name: compound1 - - d2 = { - 'jtype': 'CompoundExecutor', - 'components': [ - { 
- 'jtype': 'BinaryPbIndexer', - 'with': {'index_filename': 'tmp1'}, - 'metas': {'name': 'test1'}, - }, - { - 'jtype': 'BinaryPbIndexer', - 'with': {'index_filename': 'tmp2'}, - 'metas': {'name': 'test2'}, - }, - ], - } - d = {'BE_TEST_NAME': 'hello123', 'BATCH_SIZE': 256} + d = {'BE_TEST_NAME': 'hello123'} b1 = BaseExecutor.load_config(d1, context=d) - b2 = BaseExecutor.load_config(d2, context=d) assert isinstance(b1, BaseExecutor) - assert isinstance(b2, CompoundExecutor) - assert b1.batch_size == 256 - assert b1.name == 'hello123' + + assert b1.metas.name == 'hello123' diff --git a/tests/unit/types/arrays/test_chunkarray.py b/tests/unit/types/arrays/test_chunkarray.py index 6a765506ae54f..8041fbbbb5e29 100644 --- a/tests/unit/types/arrays/test_chunkarray.py +++ b/tests/unit/types/arrays/test_chunkarray.py @@ -1,8 +1,8 @@ import pytest -from jina import Request -from jina.types.document import Document from jina.types.arrays.chunk import ChunkArray +from jina.types.document import Document +from jina.types.request import Request @pytest.fixture(scope='function') @@ -26,7 +26,7 @@ def reference_doc(document_factory): @pytest.fixture def chunks(document_factory): req = Request() - req.request_type = 'index' + req.request_type = 'data' req.docs.extend( [ document_factory.create(1, 'test 1'), @@ -34,7 +34,7 @@ def chunks(document_factory): document_factory.create(3, 'test 3'), ] ) - return req.proto.index.docs + return req.proto.data.docs @pytest.fixture diff --git a/tests/unit/types/arrays/test_documentarray.py b/tests/unit/types/arrays/test_documentarray.py index 6da9520b760e9..d1e8f10223708 100644 --- a/tests/unit/types/arrays/test_documentarray.py +++ b/tests/unit/types/arrays/test_documentarray.py @@ -2,13 +2,10 @@ import pytest import numpy as np -from scipy.sparse import coo_matrix, csr_matrix -import torch -import tensorflow as tf +from scipy.sparse import coo_matrix from jina import Document from jina.types.arrays import DocumentArray -from jina.enums import 
EmbeddingClsType DOCUMENTS_PER_LEVEL = 1 @@ -63,17 +60,11 @@ def test_append(docarray, document_factory): assert docarray[-1].id == doc.id -def test_add(docarray, document_factory): - doc = document_factory.create(4, 'test 4') - docarray.add(doc) - assert docarray[-1].id == doc.id - - def test_union(docarray, document_factory): additional_docarray = DocumentArray([]) for idx in range(4, 10): doc = document_factory.create(idx, f'test {idx}') - additional_docarray.add(doc) + additional_docarray.append(doc) union = docarray + additional_docarray for idx in range(0, 3): assert union[idx].id == docarray[idx].id @@ -85,7 +76,7 @@ def test_union_inplace(docarray, document_factory): additional_docarray = DocumentArray([]) for idx in range(4, 10): doc = document_factory.create(idx, f'test {idx}') - additional_docarray.add(doc) + additional_docarray.append(doc) union = deepcopy(docarray) union += additional_docarray for idx in range(0, 3): @@ -115,12 +106,7 @@ def test_delete(docarray, document_factory): assert docarray == docarray -def test_build(docarray): - docarray.build() - - def test_array_get_success(docarray, document_factory): - docarray.build() doc = document_factory.create(4, 'test 4') doc_id = 2 docarray[doc_id] = doc @@ -144,7 +130,6 @@ def test_array_get_from_slice_success(docs, document_factory): def test_array_get_fail(docarray, document_factory): - docarray.build() with pytest.raises(IndexError): docarray[0.1] = 1 # Set fail, not a supported type with pytest.raises(IndexError): @@ -179,13 +164,17 @@ def test_match_chunk_array(): with Document() as d: d.content = 'hello world' - m = d.matches.new() + m = Document() + d.matches.append(m) assert m.granularity == d.granularity - assert m.adjacency == d.adjacency + 1 + assert m.adjacency == 0 + assert d.matches[0].adjacency == d.adjacency + 1 assert len(d.matches) == 1 - c = d.chunks.new() - assert c.granularity == d.granularity + 1 + c = Document() + d.chunks.append(c) + assert c.granularity == 0 + assert 
d.chunks[0].granularity == d.granularity + 1 assert c.adjacency == d.adjacency assert len(d.chunks) == 1 @@ -202,249 +191,5 @@ def add_match(doc): with Document() as match: match.granularity = doc.granularity match.adjacency = doc.adjacency + 1 - doc.matches.add(match) + doc.matches.append(match) return match - - -@pytest.fixture -def documentarray(): - """ Builds up a complete chunk-match structure, with a depth of 2 in both directions recursively. """ - max_granularity = 2 - max_adjacency = 2 - - def iterate_build(document, current_granularity, current_adjacency): - if current_granularity < max_granularity: - for i in range(DOCUMENTS_PER_LEVEL): - chunk = add_chunk(document) - iterate_build(chunk, chunk.granularity, chunk.adjacency) - if current_adjacency < max_adjacency: - for i in range(DOCUMENTS_PER_LEVEL): - match = add_match(document) - iterate_build(match, match.granularity, match.adjacency) - - docs = [] - for base_id in range(DOCUMENTS_PER_LEVEL): - with Document() as d: - d.granularity = 0 - d.adjacency = 0 - docs.append(d) - iterate_build(d, 0, 0) - return documentarray(docs) - - -def callback_fn(docs, *args, **kwargs) -> None: - for doc in docs: - add_chunk(doc) - add_match(doc) - add_match(doc) - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize('num_rows', [1, 2, 3]) -@pytest.mark.parametrize('field', ['content', 'blob', 'embedding']) -def test_get_content(stack, num_rows, field): - batch_size = 10 - embed_size = 20 - - kwargs = {field: np.random.random((num_rows, embed_size))} - - docs = DocumentArray([Document(**kwargs) for _ in range(batch_size)]) - docs.append(Document()) - - contents, pts = docs.extract_docs(field, stack_contents=stack) - if stack: - assert isinstance(contents, np.ndarray) - assert contents.shape == (batch_size, num_rows, embed_size) - else: - assert len(contents) == batch_size - for content in contents: - assert content.shape == (num_rows, embed_size) - - -@pytest.mark.parametrize('stack', [False, 
True]) -@pytest.mark.parametrize('field', ['id', 'text']) -def test_get_content_text_fields(stack, field): - batch_size = 10 - - kwargs = {field: 'text'} - - docs = DocumentArray([Document(**kwargs) for _ in range(batch_size)]) - - contents, pts = docs.extract_docs(field, stack_contents=stack) - if stack: - assert isinstance(contents, np.ndarray) - assert contents.shape == (batch_size,) - assert len(contents) == batch_size - for content in contents: - assert content == 'text' - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize('bytes_input', [b'bytes', np.array([0, 0, 0]).tobytes()]) -@pytest.mark.parametrize('field', ['content', 'buffer']) -def test_get_content_bytes_fields(stack, bytes_input, field): - batch_size = 10 - - kwargs = {field: bytes_input} - - docs = DocumentArray([Document(**kwargs) for _ in range(batch_size)]) - - contents, pts = docs.extract_docs(field, stack_contents=stack) - - assert len(contents) == batch_size - assert isinstance(contents, list) - for content in contents: - assert isinstance(content, bytes) - assert content == bytes_input - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize('fields', [['id', 'text'], ['content_hash', 'modality']]) -def test_get_content_multiple_fields_text(stack, fields): - batch_size = 10 - - kwargs = {field: f'text-{field}' for field in fields} - - docs = DocumentArray([Document(**kwargs) for _ in range(batch_size)]) - - contents, pts = docs.extract_docs(*fields, stack_contents=stack) - - assert len(contents) == len(fields) - assert isinstance(contents, list) - if stack: - assert isinstance(contents[0], np.ndarray) - assert isinstance(contents[1], np.ndarray) - - for content in contents: - assert len(content) == batch_size - if stack: - assert content.shape == (batch_size,) - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize('bytes_input', [b'bytes', np.array([0, 0, 0]).tobytes()]) -def 
test_get_content_multiple_fields_text_buffer(stack, bytes_input): - batch_size = 10 - fields = ['id', 'buffer'] - kwargs = {'id': 'text', 'buffer': bytes_input} - - docs = DocumentArray([Document(**kwargs) for _ in range(batch_size)]) - - contents, pts = docs.extract_docs(*fields, stack_contents=stack) - - assert len(contents) == len(fields) - assert isinstance(contents, list) - assert len(contents[0]) == batch_size - if stack: - assert isinstance(contents[0], np.ndarray) - assert contents[0].shape == (batch_size,) - assert isinstance(contents[1], list) - assert isinstance(contents[1][0], bytes) - - for content in contents: - assert len(content) == batch_size - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize('num_rows', [1, 2, 3]) -def test_get_content_multiple_fields_arrays(stack, num_rows): - fields = ['blob', 'embedding'] - - batch_size = 10 - embed_size = 20 - - kwargs = {field: np.random.random((num_rows, embed_size)) for field in fields} - docs = DocumentArray([Document(**kwargs) for _ in range(batch_size)]) - - contents, pts = docs.extract_docs(*fields, stack_contents=stack) - - assert len(contents) == len(fields) - assert isinstance(contents, list) - if stack: - assert isinstance(contents[0], np.ndarray) - assert isinstance(contents[1], np.ndarray) - - for content in contents: - assert len(content) == batch_size - if stack: - assert content.shape == (batch_size, num_rows, embed_size) - else: - for c in content: - assert c.shape == (num_rows, embed_size) - - -@pytest.mark.parametrize('stack', [False, True]) -@pytest.mark.parametrize('num_rows', [1, 2, 3]) -def test_get_content_multiple_fields_merge(stack, num_rows): - fields = ['embedding', 'text'] - - batch_size = 10 - embed_size = 20 - - kwargs = { - field: np.random.random((num_rows, embed_size)) - if field == 'embedding' - else 'text' - for field in fields - } - docs = DocumentArray([Document(**kwargs) for _ in range(batch_size)]) - - contents, pts = 
docs.extract_docs(*fields, stack_contents=stack) - - assert len(contents) == len(fields) - assert isinstance(contents, list) - if stack: - assert isinstance(contents[0], np.ndarray) - assert isinstance(contents[1], np.ndarray) - - for content in contents: - assert len(content) == batch_size - - if stack: - assert contents[0].shape == (batch_size, num_rows, embed_size) - assert contents[1].shape == (batch_size,) - else: - assert len(contents[0]) == batch_size - assert len(contents[1]) == batch_size - for c in contents[0]: - assert c.shape == (num_rows, embed_size) - - -@pytest.mark.parametrize( - 'embedding_cls_type, return_expected_type', - [ - (EmbeddingClsType.SCIPY_COO, coo_matrix), - (EmbeddingClsType.SCIPY_CSR, csr_matrix), - (EmbeddingClsType.TORCH, torch.Tensor), - (EmbeddingClsType.TF, tf.SparseTensor), - ], -) -def test_all_sparse_embeddings( - docarray_with_scipy_sparse_embedding, - embedding_cls_type, - return_expected_type, -): - ( - all_embeddings, - doc_pts, - ) = docarray_with_scipy_sparse_embedding.get_all_sparse_embeddings( - embedding_cls_type=embedding_cls_type, - ) - assert all_embeddings is not None - assert doc_pts is not None - assert len(doc_pts) == 3 - - if embedding_cls_type.is_scipy: - assert isinstance(all_embeddings, return_expected_type) - assert all_embeddings.shape == (3, 10) - if embedding_cls_type.is_torch: - assert isinstance(all_embeddings, return_expected_type) - assert all_embeddings.is_sparse - assert all_embeddings.shape[0] == 3 - assert all_embeddings.shape[1] == 10 - if embedding_cls_type.is_tf: - assert isinstance(all_embeddings, list) - assert isinstance(all_embeddings[0], return_expected_type) - assert len(all_embeddings) == 3 - assert all_embeddings[0].shape[0] == 1 - assert all_embeddings[0].shape[1] == 10 diff --git a/tests/unit/types/arrays/test_matcharray.py b/tests/unit/types/arrays/test_matcharray.py index 135ea2b192e06..46e0f1f2f53f1 100644 --- a/tests/unit/types/arrays/test_matcharray.py +++ 
b/tests/unit/types/arrays/test_matcharray.py @@ -1,8 +1,8 @@ import pytest -from jina import Request -from jina.types.document import Document from jina.types.arrays.match import MatchArray +from jina.types.document import Document +from jina.types.request import Request @pytest.fixture(scope='function') @@ -26,7 +26,7 @@ def reference_doc(document_factory): @pytest.fixture def matches(document_factory): req = Request() - req.request_type = 'index' + req.request_type = 'data' req.docs.extend( [ document_factory.create(1, 'test 1'), @@ -34,7 +34,7 @@ def matches(document_factory): document_factory.create(3, 'test 3'), ] ) - return req.proto.index.docs + return req.proto.data.docs @pytest.fixture diff --git a/tests/unit/types/arrays/test_querylangarray.py b/tests/unit/types/arrays/test_querylangarray.py deleted file mode 100644 index 3fbfe839f5fbc..0000000000000 --- a/tests/unit/types/arrays/test_querylangarray.py +++ /dev/null @@ -1,142 +0,0 @@ -import pytest - -from jina.proto.jina_pb2 import RequestProto, QueryLangProto -from jina.types.querylang import QueryLang -from jina.types.arrays.querylang import QueryLangArray - - -@pytest.fixture(scope='function') -def querylang_protos(): - """:return:: A :class:`RepeatedCompositeContainer` consist list of :class:`QueryLangProto`.""" - req = RequestProto() - for _ in range(3): - req.queryset.extend([QueryLangProto()]) - return req.queryset - - -@pytest.fixture(scope='function') -def querylang_array(querylang_protos): - """:return:: A :class:`RepeatedCompositeContainer` consist list of :class:`QueryLangProto`.""" - return QueryLangArray(querylang_protos=querylang_protos) - - -@pytest.fixture(scope='function') -def querylang_instance(): - """:return:: An instance of :class:`QueryLang`.""" - query_lang = QueryLang() - query_lang.name = 'test' - query_lang.priority = 5 - return query_lang - - -def test_init(querylang_protos): - """The test function to initialize the :class:`QueryLangArray`""" - assert 
QueryLangArray(querylang_protos=querylang_protos) - - -def test_insert(querylang_array, querylang_instance): - """Test insert :attr:`ql` to :class:`QueryLangArray` at :attr:`index`.""" - querylang_array.insert(index=0, ql=querylang_instance) - assert len(querylang_array) == 4 - assert querylang_array[0].name == 'test' - assert querylang_array[0].priority == 5 - - -def test_get_array_success(querylang_array, querylang_instance): - """ - Test :meth:`__listitem__` and :meth:`__getitem__` in :class`QueryLangArray`. - :attr:`key` might blongs to type `int` or `str`. - """ - querylang_array[0] = querylang_instance - assert querylang_array[0].name == 'test' - assert querylang_array[0].priority == 5 - querylang_array.build() - querylang_array['test'] = querylang_instance - assert querylang_array['test'].name == 'test' - assert querylang_array['test'].priority == 5 - - -def test_get_array_fail(querylang_array): - """Test :meth:`__listitem__` and :meth:`__getitem__` in :class`QueryLangArray`. - - .. note:: - Please assert pytest.rases `IndexError` - """ - with pytest.raises(IndexError): - querylang_array[10] - querylang_array['not_exist'] - - -def test_delete(querylang_array): - """Test :meth:`__del__`, should remove value from :class:`QueryLangArray` given an index.""" - del querylang_array[0] - assert len(querylang_array) == 2 - - -def test_length(querylang_array): - """Test :meth:`__len__`, should return the length of :class:`QueryLangArray`.""" - assert len(querylang_array) == 3 - - -def test_iter(querylang_array): - """Test :meth:`__iter__`, should yield an instance of :class:`QueryLang`.""" - for querylang in querylang_array: - assert isinstance(querylang, QueryLang) - - -@pytest.mark.parametrize( - 'querylang_item', - [QueryLangProto(), QueryLang(), {'name': 'Driver', 'parameters': {'key': 'value'}}], -) -def test_append_success_proto(querylang_array, querylang_item): - """Test :meth:`append`. Expect test three cases depends on the type of :attr:`value`. 
- Such as :class:`BaseDriver`, :class:`QueryLangProto` and :class:`QueryLang`. - - .. note:: - Please parameterize this test with pytest.mark.parameterize. - """ - querylang_array.append(querylang_item) - assert len(querylang_array) == 4 - - -def test_append_fail(querylang_array): - """Test :meth:`append` with an invalid input. - - .. note:: - Please assert pytest.rases `TypeError` - """ - with pytest.raises(TypeError): - querylang_array.append('invalid type') - - -def test_extend(querylang_array, querylang_instance): - """Test :meth:`extend`, extend an iterable to :class:`QueryLangArray`.""" - querylang_array.extend([querylang_instance]) - assert len(querylang_array) == 4 - assert querylang_array[3].name == querylang_instance.name - - -def test_clear(querylang_array): - """Test :meth:`clear`, ensure length of :attr:`_querylangs_proto` is 0 after clear.""" - querylang_array.clear() - assert len(querylang_array) == 0 - - -def test_reverse(querylang_array, querylang_instance): - """Test :meth:`reverse`, reverse the items in :class:`QueryLangArray`. - - .. note:: - reverse the same :class:`QueryLangArray` twice and assert they're identical. - """ - querylang_array.append(querylang_instance) - querylang_array.reverse() - assert querylang_array[0].name == querylang_instance.name - querylang_array.reverse() - assert querylang_array[3].name == querylang_instance.name - - -def test_build(querylang_array): - """Test :meth:`build`. - Ensure the built result :attr:`_docs_map` is `dict` and the values are correct. 
- """ - querylang_array.build() diff --git a/tests/unit/types/document/test_converters.py b/tests/unit/types/document/test_converters.py index 0128a3f697404..9af8aeff5a15a 100644 --- a/tests/unit/types/document/test_converters.py +++ b/tests/unit/types/document/test_converters.py @@ -2,6 +2,7 @@ import numpy as np import pytest + from jina import Document cur_dir = os.path.dirname(os.path.abspath(__file__)) @@ -9,15 +10,15 @@ def test_uri_to_blob(): doc = Document(uri=os.path.join(cur_dir, 'test.png')) - doc.convert_uri_to_blob() + doc.convert_image_uri_to_blob() assert isinstance(doc.blob, np.ndarray) assert doc.blob.shape == (85, 152, 3) # h,w,c def test_datauri_to_blob(): doc = Document(uri=os.path.join(cur_dir, 'test.png')) - doc.convert_uri_to_data_uri() - doc.convert_data_uri_to_blob() + doc.convert_uri_to_datauri() + doc.convert_image_datauri_to_blob() assert isinstance(doc.blob, np.ndarray) assert doc.blob.shape == (85, 152, 3) # h,w,c @@ -25,7 +26,7 @@ def test_datauri_to_blob(): def test_buffer_to_blob(): doc = Document(uri=os.path.join(cur_dir, 'test.png')) doc.convert_uri_to_buffer() - doc.convert_buffer_image_to_blob() + doc.convert_image_buffer_to_blob() assert isinstance(doc.blob, np.ndarray) assert doc.blob.shape == (85, 152, 3) # h,w,c @@ -51,7 +52,7 @@ def test_convert_blob_to_uri(arr_size, mode): doc = Document(content=np.random.randint(0, 255, arr_size)) assert doc.blob.any() assert not doc.uri - doc.convert_blob_to_uri(32, 28) + doc.convert_image_blob_to_uri(32, 28) assert doc.uri.startswith('data:image/png;base64,') @@ -132,13 +133,6 @@ def test_convert_content_to_uri(): ) def test_convert_uri_to_data_uri(uri, mimetype): doc = Document(uri=uri, mime_type=mimetype) - intialiazed_buffer = doc.buffer - intialiazed_uri = doc.uri - doc.convert_uri_to_data_uri() - converted_buffer = doc.buffer - converted_uri = doc.uri - print(doc.content_type) + doc.convert_uri_to_datauri() assert doc.uri.startswith(f'data:{mimetype}') - assert intialiazed_uri != 
converted_uri - assert converted_buffer != intialiazed_buffer assert doc.mime_type == mimetype diff --git a/tests/unit/types/document/test_document.py b/tests/unit/types/document/test_document.py index 6ff2537bc737f..6ba900a51ee1f 100644 --- a/tests/unit/types/document/test_document.py +++ b/tests/unit/types/document/test_document.py @@ -8,9 +8,10 @@ from google.protobuf.json_format import MessageToDict from scipy.sparse import coo_matrix, bsr_matrix, csr_matrix, csc_matrix -from jina import NdArray, Request from jina.proto.jina_pb2 import DocumentProto from jina.types.document import Document +from jina.types.ndarray.generic import NdArray +from jina.types.request import Request from jina.types.score import NamedScore from tests import random_docs @@ -80,7 +81,7 @@ def test_doc_update_fields(): d = [12, 34, 56] e = 'text-mod' w = 2.0 - a.set_attrs(embedding=b, tags=c, location=d, modality=e, weight=w) + a.set_attributes(embedding=b, tags=c, location=d, modality=e, weight=w) np.testing.assert_equal(a.embedding, b) assert list(a.location) == d assert a.modality == e @@ -99,9 +100,11 @@ def test_uri_get_set(): a.uri = 'https://abc.com/a.jpg' assert a.uri == 'https://abc.com/a.jpg' assert a.mime_type == 'image/jpeg' - - with pytest.raises(ValueError): - a.uri = 'abcdefg' + a.uri = 'abcdefg' + assert a.uri == 'abcdefg' + a.content = 'abcdefg' + assert a.text == 'abcdefg' + assert not a.uri def test_set_get_mime(): @@ -166,7 +169,7 @@ def test_doc_content(): def test_request_docs_mutable_iterator(): """To test the weak reference work in docs""" r = Request() - r.request_type = 'index' + r.request_type = 'data' for d in random_docs(10): r.docs.append(d) @@ -184,7 +187,7 @@ def test_request_docs_mutable_iterator(): # pb-lize it should see the change rpb = r.proto - for idx, d in enumerate(rpb.index.docs): + for idx, d in enumerate(rpb.data.docs): assert isinstance(d, DocumentProto) assert d.text == f'look I changed it! 
{idx}' @@ -193,7 +196,7 @@ def test_request_docs_mutable_iterator(): d.text = 'now i change it back' # iterate it again should see the change - for idx, d in enumerate(rpb.index.docs): + for idx, d in enumerate(rpb.data.docs): assert isinstance(d, DocumentProto) assert d.text == 'now i change it back' @@ -201,7 +204,7 @@ def test_request_docs_mutable_iterator(): def test_request_docs_chunks_mutable_iterator(): """Test if weak reference work in nested docs""" r = Request() - r.request_type = 'index' + r.request_type = 'data' for d in random_docs(10): r.docs.append(d) @@ -222,7 +225,7 @@ def test_request_docs_chunks_mutable_iterator(): # pb-lize it should see the change rpb = r.proto - for d in rpb.index.docs: + for d in rpb.data.docs: assert isinstance(d, DocumentProto) for idx, c in enumerate(d.chunks): assert isinstance(c, DocumentProto) @@ -233,7 +236,7 @@ def test_request_docs_chunks_mutable_iterator(): d.text = 'now i change it back' # iterate it again should see the change - for d in rpb.index.docs: + for d in rpb.data.docs: assert isinstance(d, DocumentProto) for c in d.chunks: assert c.text == 'now i change it back' @@ -491,25 +494,21 @@ def expected_doc_fields(): @pytest.fixture def ignored_doc_fields(): - return ['embedding', 'score', 'blob', 'buffer', 'text', 'tags'] + return ['embedding', 'score', 'blob', 'buffer', 'text', 'tags', 'uri'] def test_document_to_json(expected_doc_fields, ignored_doc_fields): doc = Document() doc_dict = json.loads(doc.json()) present_keys = sorted(doc_dict.keys()) - for field in expected_doc_fields: - if field not in ignored_doc_fields: - assert field in present_keys + assert present_keys == ['id'] def test_document_to_dict(expected_doc_fields, ignored_doc_fields): doc = Document() doc_dict = doc.dict() present_keys = sorted(doc_dict.keys()) - for field in expected_doc_fields: - if field not in ignored_doc_fields: - assert field in present_keys + assert present_keys == ['id'] def test_update_include_field(): @@ -620,57 +619,6 
@@ def test_update_exclude_field(): assert d.chunks[0].id == '🐢' -def test_get_attr(): - d = Document( - { - 'id': '123', - 'text': 'document', - 'feature1': 121, - 'name': 'name', - 'tags': {'id': 'identity', 'a': 'b', 'c': 'd'}, - } - ) - d.score = NamedScore(value=42) - - required_keys = [ - 'id', - 'text', - 'tags__name', - 'tags__feature1', - 'score__value', - 'tags__c', - 'tags__id', - 'tags__inexistant', - 'inexistant', - ] - res = d.get_attrs(*required_keys) - - assert len(res.keys()) == len(required_keys) - assert res['id'] == '123' - assert res['tags__feature1'] == 121 - assert res['tags__name'] == 'name' - assert res['text'] == 'document' - assert res['tags__c'] == 'd' - assert res['tags__id'] == 'identity' - assert res['score__value'] == 42 - assert res['tags__inexistant'] is None - assert res['inexistant'] is None - - res2 = d.get_attrs(*['tags', 'text']) - assert len(res2.keys()) == 2 - assert res2['text'] == 'document' - assert res2['tags'] == d.tags - - d = Document({'id': '123', 'tags': {'outterkey': {'innerkey': 'real_value'}}}) - res3 = d.get_attrs(*['tags__outterkey__innerkey']) - assert len(res3.keys()) == 1 - assert res3['tags__outterkey__innerkey'] == 'real_value' - - d = Document(content=np.array([1, 2, 3])) - res4 = d.get_attrs(*['blob']) - np.testing.assert_equal(res4['blob'], np.array([1, 2, 3])) - - def test_get_attr_values(): d = Document( { @@ -694,7 +642,7 @@ def test_get_attr_values(): 'tags__inexistant', 'inexistant', ] - res = d.get_attrs_values(*required_keys) + res = d.get_attributes(*required_keys) assert len(res) == len(required_keys) assert res[required_keys.index('id')] == '123' @@ -708,41 +656,19 @@ def test_get_attr_values(): assert res[required_keys.index('inexistant')] is None required_keys_2 = ['tags', 'text'] - res2 = d.get_attrs_values(*required_keys_2) + res2 = d.get_attributes(*required_keys_2) assert len(res2) == 2 assert res2[required_keys_2.index('text')] == 'document' assert res2[required_keys_2.index('tags')] == 
d.tags d = Document({'id': '123', 'tags': {'outterkey': {'innerkey': 'real_value'}}}) required_keys_3 = ['tags__outterkey__innerkey'] - res3 = d.get_attrs_values(*required_keys_3) - assert len(res3) == 1 - assert res3[required_keys_3.index('tags__outterkey__innerkey')] == 'real_value' + res3 = d.get_attributes(*required_keys_3) + assert res3 == 'real_value' d = Document(content=np.array([1, 2, 3])) - res4 = d.get_attrs(*['blob']) - np.testing.assert_equal(res4['blob'], np.array([1, 2, 3])) - - -def test_pb_obj2dict(): - document = Document() - with document: - document.text = 'this is text' - document.tags['id'] = 'id in tags' - document.tags['inner_dict'] = {'id': 'id in inner_dict'} - with Document() as chunk: - chunk.text = 'text in chunk' - chunk.tags['id'] = 'id in chunk tags' - document.chunks.add(chunk) - res = document.get_attrs('text', 'tags', 'chunks') - assert res['text'] == 'this is text' - assert res['tags']['id'] == 'id in tags' - assert res['tags']['inner_dict']['id'] == 'id in inner_dict' - rcs = list(res['chunks']) - assert len(rcs) == 1 - assert isinstance(rcs[0], Document) - assert rcs[0].text == 'text in chunk' - assert rcs[0].tags['id'] == 'id in chunk tags' + res4 = np.stack(d.get_attributes(*['blob'])) + np.testing.assert_equal(res4, np.array([1, 2, 3])) def test_document_sparse_attributes_scipy(scipy_sparse_matrix): @@ -856,3 +782,12 @@ def test_siblings_needs_to_be_set_manually(): document.chunks.append(chunk) for i in range(3): assert document.chunks[i].siblings == 3 + + +def test_evaluations(): + document = Document() + score = document.evaluations.add() + score.op_name = 'operation' + score.value = 10.0 + assert document.evaluations[0].value == 10.0 + assert document.evaluations[0].op_name == 'operation' diff --git a/tests/unit/types/document/test_document_traverse.py b/tests/unit/types/document/test_document_traverse.py index fe7a890c61044..5cc935e8751f2 100644 --- a/tests/unit/types/document/test_document_traverse.py +++ 
b/tests/unit/types/document/test_document_traverse.py @@ -1,11 +1,10 @@ -from collections import Iterator +import types +import numpy as np import pytest -import types from jina import Document, DocumentArray from jina.clients.request import request_generator -from jina.executors.decorators import batching from tests import random_docs # some random prime number for sanity check @@ -22,11 +21,11 @@ def doc_req(): # add some random matches for d in ds: for _ in range(num_matches_per_doc): - d.matches.add(Document(content='hello')) + d.matches.append(Document(content='hello')) for c in d.chunks: for _ in range(num_matches_per_chunk): - c.matches.add(Document(content='world')) - req = list(request_generator(ds))[0] + c.matches.append(Document(content='world')) + req = list(request_generator('/', ds))[0] yield req @@ -85,51 +84,39 @@ def test_traverse_root_match_chunk(doc_req): assert len(ds) == 1 + num_docs + num_docs + num_docs * num_chunks_per_doc -def test_batching_traverse(doc_req): - @batching(batch_size=num_docs, slice_on=0) - def foo(docs): - print(f'batch_size:{len(docs)}') - assert len(docs) == num_docs - - ds = list(doc_req.docs.traverse(['c', 'm', 'cm'])) - # under this contruction, num_doc is the common denominator - - foo(ds) - - def test_traverse_flatten_embedding(doc_req): - flattened_results = doc_req.docs.traverse_flatten(['r', 'c']) - ds = flattened_results.all_embeddings - assert ds[0].shape == (num_docs + num_chunks_per_doc * num_docs, 10) + flattened_results = doc_req.docs.traverse_flat(['r', 'c']) + ds = np.stack(flattened_results.get_attributes('embedding')) + assert ds.shape == (num_docs + num_chunks_per_doc * num_docs, 10) def test_traverse_flatten_root(doc_req): - ds = list(doc_req.docs.traverse_flatten(['r'])) + ds = list(doc_req.docs.traverse_flat(['r'])) assert len(ds) == num_docs def test_traverse_flatten_chunk(doc_req): - ds = list(doc_req.docs.traverse_flatten(['c'])) + ds = list(doc_req.docs.traverse_flat(['c'])) assert len(ds) == 
num_docs * num_chunks_per_doc def test_traverse_flatten_root_plus_chunk(doc_req): - ds = list(doc_req.docs.traverse_flatten(['c', 'r'])) + ds = list(doc_req.docs.traverse_flat(['c', 'r'])) assert len(ds) == num_docs + num_docs * num_chunks_per_doc def test_traverse_flatten_match(doc_req): - ds = list(doc_req.docs.traverse_flatten(['m'])) + ds = list(doc_req.docs.traverse_flat(['m'])) assert len(ds) == num_docs * num_matches_per_doc def test_traverse_flatten_match_chunk(doc_req): - ds = list(doc_req.docs.traverse_flatten(['cm'])) + ds = list(doc_req.docs.traverse_flat(['cm'])) assert len(ds) == num_docs * num_chunks_per_doc * num_matches_per_chunk def test_traverse_flatten_root_match_chunk(doc_req): - ds = list(doc_req.docs.traverse_flatten(['r', 'c', 'm', 'cm'])) + ds = list(doc_req.docs.traverse_flat(['r', 'c', 'm', 'cm'])) assert ( len(ds) == num_docs @@ -139,54 +126,43 @@ def test_traverse_flatten_root_match_chunk(doc_req): ) -def test_batching_flatten_traverse(doc_req): - @batching(batch_size=num_docs, slice_on=0) - def foo(docs): - print(f'batch_size:{len(docs)}') - assert len(docs) == num_docs - - ds = list(doc_req.docs.traverse_flatten(['r', 'c', 'm', 'cm'])) - # under this contruction, num_doc is the common denominator - foo(ds) - - def test_traverse_flattened_per_path_embedding(doc_req): - flattened_results = list(doc_req.docs.traverse_flattened_per_path(['r', 'c'])) - ds = flattened_results[0].all_embeddings - assert ds[0].shape == (num_docs, 10) + flattened_results = list(doc_req.docs.traverse_flat_per_path(['r', 'c'])) + ds = np.stack(flattened_results[0].get_attributes('embedding')) + assert ds.shape == (num_docs, 10) - ds = flattened_results[1].all_embeddings - assert ds[0].shape == (num_docs * num_chunks_per_doc, 10) + ds = np.stack(flattened_results[1].get_attributes('embedding')) + assert ds.shape == (num_docs * num_chunks_per_doc, 10) def test_traverse_flattened_per_path_root(doc_req): - ds = list(doc_req.docs.traverse_flattened_per_path(['r'])) + 
ds = list(doc_req.docs.traverse_flat_per_path(['r'])) assert len(ds[0]) == num_docs def test_traverse_flattened_per_path_chunk(doc_req): - ds = list(doc_req.docs.traverse_flattened_per_path(['c'])) + ds = list(doc_req.docs.traverse_flat_per_path(['c'])) assert len(ds[0]) == num_docs * num_chunks_per_doc def test_traverse_flattened_per_path_root_plus_chunk(doc_req): - ds = list(doc_req.docs.traverse_flattened_per_path(['c', 'r'])) + ds = list(doc_req.docs.traverse_flat_per_path(['c', 'r'])) assert len(ds[0]) == num_docs * num_chunks_per_doc assert len(ds[1]) == num_docs def test_traverse_flattened_per_path_match(doc_req): - ds = list(doc_req.docs.traverse_flattened_per_path(['m'])) + ds = list(doc_req.docs.traverse_flat_per_path(['m'])) assert len(ds[0]) == num_docs * num_matches_per_doc def test_traverse_flattened_per_path_match_chunk(doc_req): - ds = list(doc_req.docs.traverse_flattened_per_path(['cm'])) + ds = list(doc_req.docs.traverse_flat_per_path(['cm'])) assert len(ds[0]) == num_docs * num_chunks_per_doc * num_matches_per_chunk def test_traverse_flattened_per_path_root_match_chunk(doc_req): - ds = list(doc_req.docs.traverse_flattened_per_path(['r', 'c', 'm', 'cm'])) + ds = list(doc_req.docs.traverse_flat_per_path(['r', 'c', 'm', 'cm'])) assert len(ds[0]) == num_docs assert len(ds[1]) == num_chunks_per_doc * num_docs assert len(ds[2]) == num_matches_per_doc * num_docs @@ -194,7 +170,7 @@ def test_traverse_flattened_per_path_root_match_chunk(doc_req): def test_docuset_traverse_over_iterator_HACKY(): - # HACKY USAGE DO NOT RECOMMEND: can also traverse over "runtime"-DocumentArray + # HACKY USAGE DO NOT RECOMMEND: can also traverse over "runtime"-documentarray ds = DocumentArray(random_docs(num_docs, num_chunks_per_doc)).traverse(['r']) assert len(list(list(ds)[0])) == num_docs @@ -213,3 +189,16 @@ def test_docuset_traverse_over_iterator_CAVEAT(): ds = DocumentArray(random_docs(num_docs, num_chunks_per_doc)).traverse(['c', 'r']) assert len(list(ds)) == num_docs 
+ 1 + + +def test_doc_iter_method(): + ds = list(random_docs(10)) + + for d in DocumentArray(ds): + assert d.text == 'hello world' + + for d in DocumentArray(ds).traverse_flat(['c', 'r']): + d.text = 'modified' + + for d in DocumentArray(ds): + assert d.text == 'modified' diff --git a/tests/unit/types/document/test_multimodal_document.py b/tests/unit/types/document/test_multimodal_document.py index bb346de1e6fb6..67d23e8d32648 100644 --- a/tests/unit/types/document/test_multimodal_document.py +++ b/tests/unit/types/document/test_multimodal_document.py @@ -1,5 +1,6 @@ import numpy as np import pytest + from jina import Document from jina.excepts import BadDocType from jina.types.document.multimodal import MultimodalDocument @@ -82,8 +83,8 @@ def multimodal_document(chunk_1, chunk_2): md.tags['id'] = 1 md.text = b'hello world' md.embedding = np.random.random([10 + np.random.randint(0, 1)]) - md.chunks.add(chunk_1) # attach a document with embedding and without content - md.chunks.add(chunk_2) # attach a document with content and without embedding + md.chunks.append(chunk_1) # attach a document with embedding and without content + md.chunks.append(chunk_2) # attach a document with content and without embedding yield md @@ -122,7 +123,7 @@ def test_multimodal_document_fail_bad_doctype(visual_embedding): def test_multimodal_document_fail_length_mismatch(multimodal_document, chunk_3): # the multimodal document has 3 chunks, while 2 types of modalities. 
- multimodal_document.chunks.add(chunk_3) + multimodal_document.chunks.append(chunk_3) assert not multimodal_document.is_valid diff --git a/tests/unit/types/message/test_common.py b/tests/unit/types/message/test_common.py index e46a7ea4bb1dc..92f6bf9ee5164 100644 --- a/tests/unit/types/message/test_common.py +++ b/tests/unit/types/message/test_common.py @@ -1,13 +1,10 @@ import pytest -from jina.clients.request import request_generator -from jina.enums import RequestType from jina.proto import jina_pb2 from jina.types.message.common import ControlMessage -from jina.types.request.control import ControlRequest -@pytest.mark.parametrize('command', ['IDLE', 'CANCEL', 'TERMINATE', 'STATUS', 'RELOAD']) +@pytest.mark.parametrize('command', ['IDLE', 'CANCEL', 'TERMINATE', 'STATUS']) def test_control_msg(command): msg = ControlMessage(command) assert msg.proto.envelope.request_type == 'ControlRequest' @@ -20,12 +17,3 @@ def test_control_msg(command): def test_bad_control_command(): with pytest.raises(ValueError): ControlMessage('hello world') - - -def test_control_reload(): - for r in request_generator( - None, mode=RequestType.CONTROL, command='RELOAD', targets=['pod0'] - ): - assert isinstance(r, ControlRequest) - assert r.command == 'RELOAD' - assert r.targets == ['pod0'] diff --git a/tests/unit/types/message/test_compression.py b/tests/unit/types/message/test_compression.py index 21dde5622aeba..370875788d656 100644 --- a/tests/unit/types/message/test_compression.py +++ b/tests/unit/types/message/test_compression.py @@ -1,9 +1,9 @@ import pytest -from jina import Message from jina.clients.request import request_generator from jina.enums import CompressAlgo from jina.logging.profile import TimeContext +from jina.types.message import Message from tests import random_docs @@ -22,7 +22,7 @@ def test_compression(compress_algo, low_bytes, high_ratio): ) with TimeContext(f'no compress'): - for r in request_generator(docs): + for r in request_generator('/', docs): m = 
Message(None, r, compress=CompressAlgo.NONE, **kwargs) m.dump() no_comp_sizes.append(m.size) @@ -34,7 +34,7 @@ def test_compression(compress_algo, low_bytes, high_ratio): compress_min_ratio=10 if high_ratio else 1, ) with TimeContext(f'compressing with {str(compress_algo)}') as tc: - for r in request_generator(docs): + for r in request_generator('/', docs): m = Message(None, r, compress=compress_algo, **kwargs) m.dump() sizes.append(m.size) diff --git a/tests/unit/types/message/test_message.py b/tests/unit/types/message/test_message.py index a59db5f8e9495..086934db530d7 100644 --- a/tests/unit/types/message/test_message.py +++ b/tests/unit/types/message/test_message.py @@ -1,14 +1,13 @@ import sys -from typing import Sequence import pytest -from jina import Request, QueryLang, Document +from jina import Document from jina.clients.request import request_generator from jina.proto import jina_pb2 from jina.proto.jina_pb2 import EnvelopeProto from jina.types.message import Message -from jina.types.request import _trigger_fields +from jina.types.request import _trigger_fields, Request from tests import random_docs @@ -19,7 +18,7 @@ def test_lazy_access(field): reqs = ( Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ) for r in reqs: assert not r.is_used @@ -34,7 +33,7 @@ def test_lazy_access(field): def test_multiple_access(): reqs = [ Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ] for r in reqs: assert not r.is_used @@ -43,14 +42,14 @@ def test_multiple_access(): for r in reqs: assert not r.is_used - assert r.index + assert r.data assert r.is_used def test_lazy_nest_access(): reqs = ( Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ) for r in reqs: assert not r.is_used @@ -58,13 
+57,13 @@ def test_lazy_nest_access(): r.docs[0].id = '1' * 16 # now it is read assert r.is_used - assert r.index.docs[0].id == '1' * 16 + assert r.data.docs[0].id == '1' * 16 def test_lazy_change_message_type(): reqs = ( Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ) for r in reqs: assert not r.is_used @@ -72,17 +71,17 @@ def test_lazy_change_message_type(): r.control.command = jina_pb2.RequestProto.ControlRequestProto.IDLE # now it is read assert r.is_used - assert len(r.index.docs) == 0 + assert len(r.data.docs) == 0 def test_lazy_append_access(): reqs = ( Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ) for r in reqs: assert not r.is_used - r.request_type = 'index' + r.request_type = 'data' # write access r.train r.docs.append(Document()) # now it is read @@ -92,12 +91,12 @@ def test_lazy_append_access(): def test_lazy_clear_access(): reqs = ( Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ) for r in reqs: assert not r.is_used # write access r.train - r.ClearField('index') + r.ClearField('data') # now it is read assert r.is_used @@ -105,12 +104,12 @@ def test_lazy_clear_access(): def test_lazy_nested_clear_access(): reqs = ( Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ) for r in reqs: assert not r.is_used # write access r.train - r.index.ClearField('docs') + r.data.ClearField('docs') # now it is read assert r.is_used @@ -123,12 +122,12 @@ def test_lazy_msg_access(): 'test', '123', request_id='123', - request_type='IndexRequest', + request_type='DataRequest', ) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ] for r in reqs: 
- assert not r.request.is_used + r.request.is_used = False assert r.envelope assert len(r.dump()) == 3 assert not r.request.is_used @@ -141,13 +140,15 @@ def test_lazy_msg_access(): for r in reqs: assert not r.request.is_used - assert r.request.index.docs + assert r.request.data.docs assert len(r.dump()) == 3 assert r.request.is_used def test_message_size(): - reqs = [Message(None, r, 'test', '123') for r in request_generator(random_docs(10))] + reqs = [ + Message(None, r, 'test', '123') for r in request_generator('/', random_docs(10)) + ] for r in reqs: assert r.size == 0 assert sys.getsizeof(r.envelope.SerializeToString()) @@ -161,57 +162,16 @@ def test_message_size(): def test_lazy_request_fields(): reqs = ( Request(r.SerializeToString(), EnvelopeProto()) - for r in request_generator(random_docs(10)) + for r in request_generator('/', random_docs(10)) ) for r in reqs: assert list(r.DESCRIPTOR.fields_by_name.keys()) -def test_request_extend_queryset(): - q1 = {'name': 'SliceQL', 'parameters': {'start': 3, 'end': 4}} - q2 = QueryLang( - {'name': 'SliceQL', 'parameters': {'start': 3, 'end': 4}, 'priority': 1} - ) - q3 = jina_pb2.QueryLangProto() - q3.name = 'SliceQL' - q3.parameters['start'] = 3 - q3.parameters['end'] = 4 - q3.priority = 2 - r = Request() - r.queryset.extend([q1, q2, q3]) - assert isinstance(r.queryset, Sequence) - assert len(r.queryset) == 3 - for idx, q in enumerate(r.queryset): - assert q.priority == idx - assert q.parameters['start'] == 3 - assert q.parameters['end'] == 4 - - # q1 and q2 refer to the same - assert len({id(q) for q in r.queryset}) == 2 - - r2 = Request() - r2.queryset.extend(r.queryset) - assert len({id(q) for q in r2.queryset}) == 2 - - r = Request() - r.queryset.append(q1) - r.queryset.append(q2) - r.queryset.append(q3) - for idx, q in enumerate(r.queryset): - assert q.priority == idx - assert q.parameters['start'] == 3 - assert q.parameters['end'] == 4 - - with pytest.raises(TypeError): - r.queryset.extend(1) - - 
@pytest.mark.parametrize( 'typ,pb_typ', [ - ('train', jina_pb2.RequestProto.TrainRequestProto), - ('index', jina_pb2.RequestProto.IndexRequestProto), - ('search', jina_pb2.RequestProto.SearchRequestProto), + ('data', jina_pb2.RequestProto.DataRequestProto), ('control', jina_pb2.RequestProto.ControlRequestProto), ], ) @@ -229,8 +189,7 @@ def test_empty_request_type(typ, pb_typ): @pytest.mark.parametrize( 'typ,pb_typ', [ - ('index', jina_pb2.RequestProto.IndexRequestProto), - ('search', jina_pb2.RequestProto.SearchRequestProto), + ('data', jina_pb2.RequestProto.DataRequestProto), ], ) def test_add_doc_to_type(typ, pb_typ): diff --git a/tests/unit/types/querylang/__init__.py b/tests/unit/types/querylang/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/types/querylang/queryset/__init__.py b/tests/unit/types/querylang/queryset/__init__.py deleted file mode 100644 index e69de29bb2d1d..0000000000000 diff --git a/tests/unit/types/querylang/queryset/test_dunderkeys.py b/tests/unit/types/querylang/queryset/test_dunderkeys.py deleted file mode 100644 index 3dec7be0ffd27..0000000000000 --- a/tests/unit/types/querylang/queryset/test_dunderkeys.py +++ /dev/null @@ -1,80 +0,0 @@ -import pytest - -from jina import Document -from jina.types.querylang.queryset.dunderkey import ( - dunderkey, - dunder_init, - dunder_get, - dunder_partition, - undunder_keys, - dunder_truncate, -) - - -def test_dunderkey(): - assert dunderkey('a', 'b', 'c') == 'a__b__c' - - -def test_dunder_init(): - assert dunder_init('a__b__c') == 'a__b' - - -def test_dunder_get(): - assert dunder_get({'a': {'b': 5}}, 'a__b') == 5 - assert dunder_get({'a': {'b': 8, 'c': {'d': 8}}}, 'a__c__d') == 8 - assert dunder_get([1, 2, 3, [4, 5, [6]]], '3__1') == 5 - - class B: - c = 5 - - class A: - b = B - - assert dunder_get(A, 'b__c') == 5 - - with Document() as d: - d.tags['a'] = 'hello' - assert dunder_get(d, 'tags__a') == 'hello' - - # Error on invalid key - - assert 
dunder_get({'a': {'b': 5}}, 'a__c') is None - # Error if key is too nested - with pytest.raises(Exception): - dunder_get({'a': {'b': 5}, 'c': 8}, 'a__b__c') - # Error using str keys on list - with pytest.raises(Exception): - dunder_get([[1, 2], [3, 4]], 'a') - - -def test_dunder_partition(): - assert dunder_partition('a') == ('a', None) - assert dunder_partition('a__b') == ('a', 'b') - assert dunder_partition('a__b__c') == ('a__b', 'c') - - -def test_undunder_keys(): - assert undunder_keys({'a__b': 5, 'a__c': 6, 'x': 7}) == { - 'a': {'b': 5, 'c': 6}, - 'x': 7, - } - assert undunder_keys({'a__b__c__d': 5}) == {'a': {'b': {'c': {'d': 5}}}} - - # Error when value should be both dict and int - with pytest.raises(Exception): - undunder_keys({'a__b__c': 5, 'a__b': 4}) - with pytest.raises(Exception): - undunder_keys({'a__b': 5, 'a__b__c': 4}) - - -def test_dunder_truncate(): - """ - test with unique keys - test with nonunique keys - test with nonunique keys that are nested lvl 3 - """ - assert dunder_truncate({'a__b': 5, 'a__c': 6}) == {'b': 5, 'c': 6} - assert dunder_truncate({'a__b': 5, 'c__b': 6}) == {'a__b': 5, 'c__b': 6} - - # Does not partially truncate keys - assert dunder_truncate({'a__b__c': 5, 'a__d__c': 6}) == {'a__b__c': 5, 'a__d__c': 6} diff --git a/tests/unit/types/querylang/queryset/test_lookup.py b/tests/unit/types/querylang/queryset/test_lookup.py deleted file mode 100644 index ba85cb2616f5d..0000000000000 --- a/tests/unit/types/querylang/queryset/test_lookup.py +++ /dev/null @@ -1,243 +0,0 @@ -import pytest - -from jina.types.querylang.queryset.lookup import LookupNode, LookupLeaf, Q, QuerySet -from jina.types.document import Document -from tests import random_docs - - -class MockId: - def __init__(self, identity): - self.id = identity - - -class MockStr: - def __init__(self, string): - self.str = string - - -class MockIter: - def __init__(self, iterable): - self.iter = iterable - - -@pytest.fixture(scope='function') -def docs(): - return 
random_docs(num_docs=10) - - -def test_lookup_leaf_exact(): - leaf = LookupLeaf(id__exact=1) - mock1 = MockId(1) - assert leaf.evaluate(mock1) - mock2 = MockId(2) - assert not leaf.evaluate(mock2) - - -def test_lookup_leaf_exact_document_tags(): - with Document() as doc: - doc.tags['label'] = 'jina' - leaf = LookupLeaf(tags__label='jina') - assert leaf.evaluate(doc) - leaf = LookupLeaf(tags__label='not_jina') - assert not leaf.evaluate(doc) - - -def test_lookup_leaf_exact_document_tags_complex(): - with Document() as doc: - doc.tags['key1'] = {'key2': 'jina'} - leaf = LookupLeaf(tags__key1__key2='jina') - assert leaf.evaluate(doc) - leaf = LookupLeaf(tags__key1__key2='not_jina') - assert not leaf.evaluate(doc) - - -def test_lookup_leaf_neq(): - leaf = LookupLeaf(id__neq=1) - mock1 = MockId(1) - assert not leaf.evaluate(mock1) - mock2 = MockId(2) - assert leaf.evaluate(mock2) - - -def test_lookup_leaf_gt(): - leaf = LookupLeaf(id__gt=1) - mock0 = MockId(0) - assert not leaf.evaluate(mock0) - mock1 = MockId(1) - assert not leaf.evaluate(mock1) - mock2 = MockId(2) - assert leaf.evaluate(mock2) - - -def test_lookup_leaf_gte(): - leaf = LookupLeaf(id__gte=1) - mock0 = MockId(0) - assert not leaf.evaluate(mock0) - mock1 = MockId(1) - assert leaf.evaluate(mock1) - mock2 = MockId(2) - assert leaf.evaluate(mock2) - - -def test_lookup_leaf_lt(): - leaf = LookupLeaf(id__lt=1) - mock0 = MockId(0) - assert leaf.evaluate(mock0) - mock1 = MockId(1) - assert not leaf.evaluate(mock1) - mock2 = MockId(2) - assert not leaf.evaluate(mock2) - - -def test_lookup_leaf_lte(): - leaf = LookupLeaf(id__lte=1) - mock0 = MockId(0) - assert leaf.evaluate(mock0) - mock1 = MockId(1) - assert leaf.evaluate(mock1) - mock2 = MockId(2) - assert not leaf.evaluate(mock2) - - -def test_lookup_leaf_contains(): - leaf = LookupLeaf(str__contains='jina') - mock0 = MockStr('hey jina how are you') - assert leaf.evaluate(mock0) - mock1 = MockStr('not here') - assert not leaf.evaluate(mock1) - mock2 = 
MockStr('hey jInA how are you') - assert not leaf.evaluate(mock2) - - -def test_lookup_leaf_icontains(): - leaf = LookupLeaf(str__icontains='jina') - mock0 = MockStr('hey jInA how are you') - assert leaf.evaluate(mock0) - mock1 = MockStr('not here') - assert not leaf.evaluate(mock1) - - -def test_lookup_leaf_startswith(): - leaf = LookupLeaf(str__startswith='jina') - mock0 = MockStr('jina is the neural search solution') - assert leaf.evaluate(mock0) - mock1 = MockStr('hey, jina is the neural search solution') - assert not leaf.evaluate(mock1) - mock2 = MockStr('JiNa is the neural search solution') - assert not leaf.evaluate(mock2) - - -def test_lookup_leaf_istartswith(): - leaf = LookupLeaf(str__istartswith='jina') - mock0 = MockStr('jina is the neural search solution') - assert leaf.evaluate(mock0) - mock1 = MockStr('hey, jina is the neural search solution') - assert not leaf.evaluate(mock1) - mock2 = MockStr('JiNa is the neural search solution') - assert leaf.evaluate(mock2) - - -def test_lookup_leaf_endswith(): - leaf = LookupLeaf(str__endswith='jina') - mock0 = MockStr('how is jina') - assert leaf.evaluate(mock0) - mock1 = MockStr('hey, jina is the neural search solution') - assert not leaf.evaluate(mock1) - mock2 = MockStr('how is JiNa') - assert not leaf.evaluate(mock2) - - -def test_lookup_leaf_iendswith(): - leaf = LookupLeaf(str__iendswith='jina') - mock0 = MockStr('how is jina') - assert leaf.evaluate(mock0) - mock1 = MockStr('hey, jina is the neural search solution') - assert not leaf.evaluate(mock1) - mock2 = MockStr('how is JiNa') - assert leaf.evaluate(mock2) - - -def test_lookup_leaf_regex(): - leaf = LookupLeaf(str__regex='j*na') - mock0 = MockStr('hey, juna is good') - assert leaf.evaluate(mock0) - mock1 = MockStr('hey, Oinja is the neural search solution') - assert not leaf.evaluate(mock1) - mock2 = MockStr('how is JiNa') - assert not leaf.evaluate(mock2) - - -def test_lookup_leaf_in(): - leaf = LookupLeaf(id__in=[0, 1, 2, 3]) - mock0 = MockId(3) 
- assert leaf.evaluate(mock0) - mock1 = MockId(4) - assert not leaf.evaluate(mock1) - - -def test_lookup_leaf_None(): - leaf = LookupLeaf(id=3) - mock0 = MockId(3) - assert leaf.evaluate(mock0) - mock1 = MockId(4) - assert not leaf.evaluate(mock1) - - -def test_docs_filter(docs): - filtered_docs = QuerySet(docs).filter(tags__id__lt=5, tags__id__gt=3) - filtered_docs = list(filtered_docs) - assert len(filtered_docs) == 1 - for d in filtered_docs: - assert 3 < d.tags['id'] < 5 - - -def test_docs_filter_equal(docs): - filtered_docs = QuerySet(docs).filter(tags__id=4) - filtered_docs = list(filtered_docs) - assert len(filtered_docs) == 1 - for d in filtered_docs: - assert int(d.tags['id']) == 4 - assert len(d.chunks) == 5 - - -def test_nested_chunks_filter(docs): - filtered_docs = QuerySet(docs).filter( - Q(chunks__filter=Q(tags__id__lt=35, tags__id__gt=33)) - ) - filtered_docs = list(filtered_docs) - assert len(filtered_docs) == 1 - for d in filtered_docs: - assert len(d.chunks) == 5 - - -def test_lookup_node_in(): - node = LookupNode() - leaf1 = LookupLeaf(id__in=[0, 1]) - leaf2 = LookupLeaf(id__in=[1, 2]) - node.add_child(leaf1) - node.add_child(leaf2) - assert len(node.children) == 2 - - mock0 = MockId(0) - mock1 = MockId(1) - mock2 = MockId(2) - assert node.op == 'and' - assert not node.evaluate(mock0) - assert node.evaluate(mock1) - assert not node.evaluate(mock2) - - assert not node.negate - assert (~node).evaluate(mock0) - assert not (~node).evaluate(mock1) - assert (~node).evaluate(mock2) - - node.op = 'or' - assert node.op == 'or' - assert node.evaluate(mock0) - assert node.evaluate(mock1) - assert node.evaluate(mock2) - - assert (~node).negate - assert (~node).evaluate(mock0) - assert not (~node).evaluate(mock1) - assert (~node).evaluate(mock2) diff --git a/tests/unit/types/querylang/test_querylang.py b/tests/unit/types/querylang/test_querylang.py deleted file mode 100644 index 1bbd148cedf06..0000000000000 --- a/tests/unit/types/querylang/test_querylang.py 
+++ /dev/null @@ -1,58 +0,0 @@ -import pytest -from google.protobuf.json_format import MessageToDict, MessageToJson - -from jina.types.querylang import QueryLang - - -def test_ql_constructors_from_driver_info(): - q = QueryLang( - {'name': 'SliceQL', 'parameters': {'start': 3, 'end': 5}, 'priority': 999} - ) - qb = q.proto - assert q.name == 'SliceQL' - assert q.parameters['start'] == 3 - assert q.parameters['end'] == 5 - assert q.priority == 999 - - assert qb.name == 'SliceQL' - assert qb.parameters['start'] == 3 - assert qb.parameters['end'] == 5 - assert qb.priority == 999 - - -@pytest.mark.parametrize( - 'source', - [ - lambda x: x.SerializeToString(), - lambda x: MessageToDict(x), - lambda x: MessageToJson(x), - lambda x: x, - ], -) -def test_ql_constructors_from_proto(source): - q = QueryLang( - {'name': 'SliceQL', 'parameters': {'start': 3, 'end': 5}, 'priority': 999} - ).proto - - qlr = QueryLang(source(q)) - assert qlr.name == 'SliceQL' - assert qlr.parameters['start'] == 3 - assert qlr.parameters['end'] == 5 - assert qlr.priority == 999 - - -def test_ql_priority(): - qs = QueryLang( - {'name': 'SliceQL', 'parameters': {'start': 1, 'end': 4}, 'priority': 1} - ) - assert qs.priority == 1 - qs._pb_body.priority = -1 - assert qs._pb_body.priority == -1 - assert qs.priority == -1 - - qs.priority = -2 - assert qs._pb_body.priority == -2 - assert qs.priority == -2 - - qs2 = QueryLang({'name': 'SliceQL', 'parameters': {'start': 1, 'end': 4}}) - assert qs2.priority == 0 diff --git a/tests/unit/types/request/test_common.py b/tests/unit/types/request/test_common.py deleted file mode 100644 index 91421d22de8db..0000000000000 --- a/tests/unit/types/request/test_common.py +++ /dev/null @@ -1,23 +0,0 @@ -import pytest - -from jina.types.request.common import ( - DryRunRequest, - IndexDryRunRequest, - TrainDryRunRequest, - ControlDryRunRequest, - SearchDryRunRequest, -) - - -@pytest.mark.parametrize( - 'ReqCls', - [ - DryRunRequest, - IndexDryRunRequest, - 
TrainDryRunRequest, - ControlDryRunRequest, - SearchDryRunRequest, - ], -) -def test_init(ReqCls): - assert ReqCls() diff --git a/tests/unit/types/request/test_request.py b/tests/unit/types/request/test_request.py index d52449d44caca..62be3b8bea0c6 100644 --- a/tests/unit/types/request/test_request.py +++ b/tests/unit/types/request/test_request.py @@ -1,20 +1,18 @@ import pytest from google.protobuf.json_format import MessageToDict, MessageToJson -from jina.enums import RequestType from jina.excepts import BadRequestType from jina.helper import random_identity from jina.proto import jina_pb2 -from jina.types.request import Request from jina.types.arrays.document import DocumentArray -from jina.types.arrays.querylang import QueryLangArray +from jina.types.request import Request @pytest.fixture(scope='function') def req(): r = jina_pb2.RequestProto() r.request_id = random_identity() - r.index.docs.add() + r.data.docs.add() return r @@ -31,23 +29,18 @@ def test_init_fail(): Request(request=5) -@pytest.mark.parametrize('req_type', ['index', 'search', 'train']) -def test_docs(req, req_type): +def test_docs(req): request = Request(request=req, copy=False) - request.request_type = req_type + request.request_type = 'data' docs = request.docs assert request.is_used assert isinstance(docs, DocumentArray) - if req_type == 'index': - assert len(docs) == 1 - else: - assert len(docs) == 0 + assert len(docs) == 1 -@pytest.mark.parametrize('req_type', ['index', 'search', 'train']) -def test_groundtruth(req, req_type): +def test_groundtruth(req): request = Request(request=req, copy=False) - request.request_type = req_type + request.request_type = 'data' groundtruths = request.groundtruths assert request.is_used assert isinstance(groundtruths, DocumentArray) @@ -56,8 +49,8 @@ def test_groundtruth(req, req_type): def test_request_type_set_get(req): request = Request(request=req, copy=False) - request.request_type = 'search' - assert request.request_type == 'SearchRequestProto' + 
request.request_type = 'data' + assert request.request_type == 'DataRequestProto' def test_request_type_set_get_fail(req): @@ -66,13 +59,6 @@ def test_request_type_set_get_fail(req): request.request_type = 'random' -def test_queryset(req): - request = Request(request=req, copy=False) - queryset = request.queryset - assert request.is_used - assert isinstance(queryset, QueryLangArray) - - def test_command(req): request = Request(request=req, copy=False) request.request_type = 'control' @@ -98,8 +84,8 @@ def test_as_json_str(req): assert isinstance(request.json(), str) -def test_delete_request(): - req = Request() - req.request_type = str(RequestType.DELETE) - req.ids.extend(['123', '456']) - assert req.dict()['delete']['ids'] == ['123', '456'] +def test_access_header(req): + request = Request(request=req) + assert not request.is_used + request.header + assert request.is_used diff --git a/tests/unit/types/test_repr_str.py b/tests/unit/types/test_repr_str.py index d4968a53e2e80..d56e0d3b57f6e 100644 --- a/tests/unit/types/test_repr_str.py +++ b/tests/unit/types/test_repr_str.py @@ -1,10 +1,12 @@ import numpy as np import pytest -from jina import Document, Request, QueryLang, NdArray -from jina.types.score import NamedScore -from jina.types.arrays import ChunkArray +from jina import Document +from jina.types.arrays.chunk import ChunkArray from jina.types.arrays.match import MatchArray +from jina.types.ndarray.generic import NdArray +from jina.types.request import Request +from jina.types.score import NamedScore @pytest.mark.parametrize( @@ -12,7 +14,6 @@ [ Document(), Request(), - QueryLang(), NamedScore(), NdArray(), MatchArray([Document()], Document()), @@ -28,16 +29,6 @@ def test_builtin_str_repr_no_content(obj): 'obj', [ Document(content='123', chunks=[Document(content='abc')]), - QueryLang( - { - 'name': 'FilterQL', - 'priority': 1, - 'parameters': { - 'lookups': {'tags__label': 'label2'}, - 'traversal_paths': ['r'], - }, - } - ), NamedScore( op_name='operation', 
value=10.0, diff --git a/tests/unit/yaml/dummy_encode.py b/tests/unit/yaml/dummy_encode.py index 7a13beef98a60..aac1d5ce7f2ce 100644 --- a/tests/unit/yaml/dummy_encode.py +++ b/tests/unit/yaml/dummy_encode.py @@ -1,5 +1,5 @@ -from jina.executors.encoders import BaseEncoder +from jina import Executor -class DummyEncoder(BaseEncoder): +class DummyEncoder(Executor): pass diff --git a/tests/unit/yaml/dummy_encode_driver.py b/tests/unit/yaml/dummy_encode_driver.py deleted file mode 100644 index 78bd4057cfda6..0000000000000 --- a/tests/unit/yaml/dummy_encode_driver.py +++ /dev/null @@ -1,14 +0,0 @@ -from jina import DocumentArray -from jina.drivers import FlatRecursiveMixin -from jina.drivers.encode import BaseEncodeDriver - - -class DummyEncodeDriver(FlatRecursiveMixin, BaseEncodeDriver): - def _apply_all( - self, - docs: 'DocumentArray', - *args, - **kwargs, - ) -> None: - for doc in docs: - doc.text = 'hello from DummyEncodeDriver' diff --git a/tests/unit/yaml/examples/faces/flow-index.yml b/tests/unit/yaml/examples/faces/flow-index.yml index 79607036ddf5d..15b8f82c13f62 100644 --- a/tests/unit/yaml/examples/faces/flow-index.yml +++ b/tests/unit/yaml/examples/faces/flow-index.yml @@ -21,6 +21,5 @@ pods: - name: doc_indexer needs: loader - name: join_all - uses: _pass needs: [doc_indexer, chunk_indexer] read_only: true diff --git a/tests/unit/yaml/examples/faces/flow-query.yml b/tests/unit/yaml/examples/faces/flow-query.yml index b75bd23fbbf42..fe3011bf97412 100644 --- a/tests/unit/yaml/examples/faces/flow-query.yml +++ b/tests/unit/yaml/examples/faces/flow-query.yml @@ -21,6 +21,5 @@ pods: polling: all parallel: 1 - name: ranker - uses: _pass - name: doc_indexer parallel: 1 diff --git a/tests/unit/yaml/examples/faiss/flow-index.yml b/tests/unit/yaml/examples/faiss/flow-index.yml index b746755856cfb..7c64b57e93fae 100644 --- a/tests/unit/yaml/examples/faiss/flow-index.yml +++ b/tests/unit/yaml/examples/faiss/flow-index.yml @@ -12,6 +12,5 @@ pods: - name: doc_indexer 
needs: crafter - name: join_all - uses: _pass needs: [doc_indexer, faiss_indexer] read_only: true diff --git a/tests/unit/yaml/examples/faiss/flow-query.yml b/tests/unit/yaml/examples/faiss/flow-query.yml index 5245cf81cec01..a555654825971 100644 --- a/tests/unit/yaml/examples/faiss/flow-query.yml +++ b/tests/unit/yaml/examples/faiss/flow-query.yml @@ -12,6 +12,5 @@ pods: timeout_ready: 10000 volumes: './workspace' - name: ranker - uses: _pass - name: doc_indexer parallel: 1 diff --git a/tests/unit/yaml/test-compound-indexer-components-with-workspace.yml b/tests/unit/yaml/test-compound-indexer-components-with-workspace.yml deleted file mode 100644 index f381db70876a7..0000000000000 --- a/tests/unit/yaml/test-compound-indexer-components-with-workspace.yml +++ /dev/null @@ -1,17 +0,0 @@ -!CompoundExecutor -components: - - !BinaryPbIndexer - with: - index_filename: tmp1 - metas: - name: test1 - workspace: $JINA_TEST_WORKSPACE_COMP1 - - !BinaryPbIndexer - with: - index_filename: tmp2 - metas: - name: test2 - workspace: $JINA_TEST_WORKSPACE_COMP2 -metas: - name: compound1 - workspace: $JINA_TEST_WORKSPACE \ No newline at end of file diff --git a/tests/unit/yaml/test-compound-indexer.yml b/tests/unit/yaml/test-compound-indexer.yml deleted file mode 100644 index 32e9f73075b15..0000000000000 --- a/tests/unit/yaml/test-compound-indexer.yml +++ /dev/null @@ -1,15 +0,0 @@ -!CompoundExecutor -components: - - !BinaryPbIndexer - with: - index_filename: tmp1 - metas: - name: test1 - - !BinaryPbIndexer - with: - index_filename: tmp2 - metas: - name: test2 -metas: - name: compound1 - workspace: $JINA_TEST_WORKSPACE \ No newline at end of file diff --git a/tests/unit/yaml/test-compound-indexer2.yml b/tests/unit/yaml/test-compound-indexer2.yml deleted file mode 100644 index d011e87d42e94..0000000000000 --- a/tests/unit/yaml/test-compound-indexer2.yml +++ /dev/null @@ -1,16 +0,0 @@ -!CompoundExecutor -components: - - !BinaryPbIndexer - with: - index_filename: metaproto - metas: - 
name: test_meta - - !NumpyIndexer - with: - metric: euclidean - index_filename: npidx - metas: - name: test_numpy -metas: - name: real-compound - workspace: $JINA_TEST_WORKSPACE \ No newline at end of file diff --git a/tests/unit/yaml/test-compound-workspace.yml b/tests/unit/yaml/test-compound-workspace.yml deleted file mode 100644 index d16070aecdfd2..0000000000000 --- a/tests/unit/yaml/test-compound-workspace.yml +++ /dev/null @@ -1,10 +0,0 @@ -!CompoundExecutor -components: - - !BaseExecutor - metas: - name: test1 - - !BaseExecutor - metas: - name: test2 -metas: - name: compound0 \ No newline at end of file diff --git a/tests/unit/yaml/test-docpb.yml b/tests/unit/yaml/test-docpb.yml deleted file mode 100644 index 41612c7dc48a8..0000000000000 --- a/tests/unit/yaml/test-docpb.yml +++ /dev/null @@ -1,6 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: test-doc-shard.gz -metas: - name: docpb2 - workspace: $TEST_DOCSHARD_WORKSPACE/test-docshard-tmp \ No newline at end of file diff --git a/tests/unit/yaml/test-driver.yml b/tests/unit/yaml/test-driver.yml deleted file mode 100644 index 5df6bb8275303..0000000000000 --- a/tests/unit/yaml/test-driver.yml +++ /dev/null @@ -1,6 +0,0 @@ -- !KVSearchDriver - with: - executor: blah - method: goto -- !ControlReqDriver {} -- !BaseDriver {} \ No newline at end of file diff --git a/tests/unit/yaml/test-encoder-env.yml b/tests/unit/yaml/test-encoder-env.yml index b7c16cce1e966..7d42d39a5a3fb 100644 --- a/tests/unit/yaml/test-encoder-env.yml +++ b/tests/unit/yaml/test-encoder-env.yml @@ -1,6 +1,4 @@ -!BaseEncoder +!BaseExecutor metas: name: ${{BE_TEST_NAME}} - batch_size: ${{BATCH_SIZE}} - pea_id: ${{pea_id}} - workspace: ${{this.name}}-${{this.batch_size}} \ No newline at end of file + workspace: ${{this.name}} \ No newline at end of file diff --git a/tests/unit/yaml/test-exec-with-driver.yml b/tests/unit/yaml/test-exec-with-driver.yml deleted file mode 100644 index 79a18ff8c5f9b..0000000000000 --- 
a/tests/unit/yaml/test-exec-with-driver.yml +++ /dev/null @@ -1,15 +0,0 @@ -!BaseEncoder -with: {} -metas: - name: blah - workspace: $JINA_TEST_EXEC_WITH_DRIVER -requests: - on: - [SearchRequest]: - - !EncodeDriver - with: - method: encode - - !ExcludeQL - with: - fields: - - embedding diff --git a/tests/unit/yaml/test-executor-with-custom-driver.yml b/tests/unit/yaml/test-executor-with-custom-driver.yml deleted file mode 100644 index 90ee5951a86bb..0000000000000 --- a/tests/unit/yaml/test-executor-with-custom-driver.yml +++ /dev/null @@ -1,13 +0,0 @@ -!DummyEncoder -metas: - py_modules: - - dummy_encode.py - - dummy_encode_driver.py # driver file -requests: - on: - IndexRequest: - - !DummyEncodeDriver - with: - traversal_paths: ['r'] - SearchRequest: - - !EncodeDriver {} diff --git a/tests/unit/yaml/test-flow-v1.yml b/tests/unit/yaml/test-flow-v1.yml index 6a4eec3539ce4..54f60ff7aaced 100644 --- a/tests/unit/yaml/test-flow-v1.yml +++ b/tests/unit/yaml/test-flow-v1.yml @@ -12,5 +12,4 @@ pods: parallel: 2 needs: chunk_seg - name: joiner - uses: _pass needs: [wqncode1, encode2] diff --git a/tests/unit/yaml/test-flow.yml b/tests/unit/yaml/test-flow.yml index 6a4eec3539ce4..54f60ff7aaced 100644 --- a/tests/unit/yaml/test-flow.yml +++ b/tests/unit/yaml/test-flow.yml @@ -12,5 +12,4 @@ pods: parallel: 2 needs: chunk_seg - name: joiner - uses: _pass needs: [wqncode1, encode2] diff --git a/tests/unit/yaml/test-if-encode1.yml b/tests/unit/yaml/test-if-encode1.yml deleted file mode 100644 index 0280ca839c825..0000000000000 --- a/tests/unit/yaml/test-if-encode1.yml +++ /dev/null @@ -1,6 +0,0 @@ -!Encode1 -requests: - on: - IndexRequest: - - !EncodeDriver - if: doc.mime_type.startswith('text') diff --git a/tests/unit/yaml/test-if-encode2.yml b/tests/unit/yaml/test-if-encode2.yml deleted file mode 100644 index d16248bc816e7..0000000000000 --- a/tests/unit/yaml/test-if-encode2.yml +++ /dev/null @@ -1,6 +0,0 @@ -!Encode2 -requests: - on: - IndexRequest: - - !EncodeDriver - if: 
doc.mime_type.startswith('image') diff --git a/tests/unit/yaml/test-ifdriver1.yml b/tests/unit/yaml/test-ifdriver1.yml deleted file mode 100644 index e99ce354d39e4..0000000000000 --- a/tests/unit/yaml/test-ifdriver1.yml +++ /dev/null @@ -1,6 +0,0 @@ -!BaseExecutor -requests: - on: - IndexRequest: - - !BaseDriver - if: '2 > 1' diff --git a/tests/unit/yaml/test-ifdriver2.yml b/tests/unit/yaml/test-ifdriver2.yml deleted file mode 100644 index 49232b0909b38..0000000000000 --- a/tests/unit/yaml/test-ifdriver2.yml +++ /dev/null @@ -1,5 +0,0 @@ -!BaseExecutor -requests: - on: - IndexRequest: - - !FilterDriver {} \ No newline at end of file diff --git a/tests/unit/yaml/test-index-remote.yml b/tests/unit/yaml/test-index-remote.yml deleted file mode 100644 index 89aa5265eb92e..0000000000000 --- a/tests/unit/yaml/test-index-remote.yml +++ /dev/null @@ -1,19 +0,0 @@ -!NumpyIndexer -with: - metric: euclidean - index_filename: tmp2 -metas: - name: test2 - workspace: $JINA_TEST_INDEX_REMOTE -requests: - on: - SearchRequest: - - !VectorSearchDriver - with: - method: query - IndexRequest: - - !VectorIndexDriver - with: - method: add - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/tests/unit/yaml/test-index.yml b/tests/unit/yaml/test-index.yml deleted file mode 100644 index a352999d6b0fb..0000000000000 --- a/tests/unit/yaml/test-index.yml +++ /dev/null @@ -1,19 +0,0 @@ -!NumpyIndexer -with: - metric: euclidean - index_filename: tmp2 -metas: - name: test2 - workspace: $JINA_TEST_INDEX -requests: - on: - SearchRequest: - - !VectorSearchDriver - with: - method: query - IndexRequest: - - !VectorIndexDriver - with: - method: add - ControlRequest: - - !ControlReqDriver {} \ No newline at end of file diff --git a/tests/unit/yaml/test-indexer-workspace.yml b/tests/unit/yaml/test-indexer-workspace.yml deleted file mode 100644 index 9f3bfcd767f94..0000000000000 --- a/tests/unit/yaml/test-indexer-workspace.yml +++ /dev/null @@ -1,6 +0,0 @@ -!NumpyIndexer 
-with: - index_filename: 'inner_indexer.gz' -metas: - name: ref_indexer - workspace: $JINA_TEST_WORKSPACE \ No newline at end of file diff --git a/tests/unit/yaml/test-joint-wrap.yml b/tests/unit/yaml/test-joint-wrap.yml deleted file mode 100644 index 72f3e7df34236..0000000000000 --- a/tests/unit/yaml/test-joint-wrap.yml +++ /dev/null @@ -1,21 +0,0 @@ -!CompoundIndexer -components: - - !AnnoyIndexer - with: - ref_indexer: - !NumpyIndexer - with: - metric: euclidean - index_filename: vec.gz - metas: - name: vecidx # a customized name - workspace: $TEST_WORKDIR - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx - workspace: $TEST_WORKDIR -metas: - name: chunk_compound_indexer - workspace: $TEST_WORKDIR diff --git a/tests/unit/yaml/test-joint.yml b/tests/unit/yaml/test-joint.yml deleted file mode 100644 index bb993daa9353f..0000000000000 --- a/tests/unit/yaml/test-joint.yml +++ /dev/null @@ -1,16 +0,0 @@ -!CompoundIndexer -components: - - !NumpyIndexer - with: - metric: euclidean - index_filename: vec.gz - metas: - name: vecidx # a customized name - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: chunkidx -metas: - name: chunk_compound_indexer - workspace: $JINA_TEST_JOINT diff --git a/tests/unit/yaml/test-kvindexer-workspace.yml b/tests/unit/yaml/test-kvindexer-workspace.yml deleted file mode 100644 index 735966a5bc5c3..0000000000000 --- a/tests/unit/yaml/test-kvindexer-workspace.yml +++ /dev/null @@ -1,6 +0,0 @@ -!BinaryPbIndexer -with: - index_filename: 'inner_indexer.gz' -metas: - name: kv_indexer - workspace: $JINA_TEST_WORKSPACE \ No newline at end of file diff --git a/tests/unit/yaml/test-multifield-encoder.yml b/tests/unit/yaml/test-multifield-encoder.yml deleted file mode 100644 index 54a8fffa4be4f..0000000000000 --- a/tests/unit/yaml/test-multifield-encoder.yml +++ /dev/null @@ -1,7 +0,0 @@ -!DummyOneHotTextEncoder -requests: - on: - [SearchRequest, IndexRequest]: - - !EncodeDriver - with: - filter_by: 
$FILTER_BY diff --git a/tests/unit/yaml/test-multifield-indexer-2.yml b/tests/unit/yaml/test-multifield-indexer-2.yml deleted file mode 100644 index d6fc80b41f75b..0000000000000 --- a/tests/unit/yaml/test-multifield-indexer-2.yml +++ /dev/null @@ -1,47 +0,0 @@ -!CompoundIndexer -metas: - workspace: $TMP_WORKSPACE_2 -components: - - !NumpyIndexer - with: - metric: euclidean - index_filename: vec.gz - metas: - name: vecidx # a customized name - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: kvidx # a customized name -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY_2 - - !PruneDriver - with: - pruned: - - embedding - - blob - - text - - !KVIndexDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY_2 - SearchRequest: - - !VectorSearchDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY_2 - - !PruneDriver - with: - pruned: - - embedding - - blob - - text - - !KVSearchDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY_2 diff --git a/tests/unit/yaml/test-multifield-indexer.yml b/tests/unit/yaml/test-multifield-indexer.yml deleted file mode 100644 index 8c4b147bc9945..0000000000000 --- a/tests/unit/yaml/test-multifield-indexer.yml +++ /dev/null @@ -1,47 +0,0 @@ -!CompoundIndexer -metas: - workspace: $TMP_WORKSPACE -components: - - !NumpyIndexer - with: - metric: euclidean - index_filename: vec.gz - metas: - name: vecidx # a customized name - - !BinaryPbIndexer - with: - index_filename: chunk.gz - metas: - name: kvidx # a customized name -requests: - on: - IndexRequest: - - !VectorIndexDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY - - !PruneDriver - with: - pruned: - - embedding - - blob - - text - - !KVIndexDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY - SearchRequest: - - !VectorSearchDriver - with: - executor: NumpyIndexer - filter_by: $FILTER_BY - - !PruneDriver - with: - pruned: - - embedding - - blob - - 
text - - !KVSearchDriver - with: - executor: BinaryPbIndexer - filter_by: $FILTER_BY diff --git a/tests/unit/yaml/test-query.yml b/tests/unit/yaml/test-query.yml deleted file mode 100644 index d021b154f7f64..0000000000000 --- a/tests/unit/yaml/test-query.yml +++ /dev/null @@ -1,7 +0,0 @@ -!NumpyIndexer -with: - metric: euclidean - index_filename: tmp2 -metas: - name: test2 - workspace: test-index-file \ No newline at end of file diff --git a/tests/unit/yaml/test-refindexer-workspace.yml b/tests/unit/yaml/test-refindexer-workspace.yml deleted file mode 100644 index c7ce584cc3775..0000000000000 --- a/tests/unit/yaml/test-refindexer-workspace.yml +++ /dev/null @@ -1,10 +0,0 @@ -!NumpyIndexer -with: - ref_indexer: - !NumpyIndexer - metas: - name: ref_indexer - workspace: $JINA_TEST_WORKSPACE -metas: - name: outer_indexer - workspace: $JINA_TEST_WORKSPACE diff --git a/tests/unit/yaml/test_compount_indexer2-ref-indexer.yml b/tests/unit/yaml/test_compount_indexer2-ref-indexer.yml deleted file mode 100644 index ff72d41bb5e78..0000000000000 --- a/tests/unit/yaml/test_compount_indexer2-ref-indexer.yml +++ /dev/null @@ -1,19 +0,0 @@ -!CompoundExecutor -components: - - !BinaryPbIndexer - with: - index_filename: metaproto - metas: - name: test_meta - - !NumpyIndexer - with: - ref_indexer: - !NumpyIndexer - metas: - name: test_numpy - workspace: $JINA_TEST_WORKSPACE - metas: - name: outer_numpy -metas: - name: real-compound - workspace: $JINA_TEST_WORKSPACE diff --git a/tests/unit/yaml/test_flow_visualization.yml b/tests/unit/yaml/test_flow_visualization.yml index d05e147debafa..6bf926692139b 100644 --- a/tests/unit/yaml/test_flow_visualization.yml +++ b/tests/unit/yaml/test_flow_visualization.yml @@ -20,6 +20,5 @@ pods: - name: text_indexer read_only: true - name: join_all - uses: _pass needs: [image_vector_indexer, image_kv_indexer, text_indexer] read_only: true \ No newline at end of file diff --git a/tests/unit/yaml/unarycrafter.yml b/tests/unit/yaml/unarycrafter.yml deleted 
file mode 100644 index e69de29bb2d1d..0000000000000