Releases: jina-ai/serve
🎉 v0.6.0
Jina v0.6.0
We are excited to release Jina 0.6.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include:
- Improve the memory footprint for the `Indexer`.
- Add an example for building a cross-modal search system with Jina.
- Add support for indexing `.pdf` files.
Release 0.6.0
⬆️ Major Features and Improvements
Scalability
- Improve the memory footprint for the `Indexer`. Instead of using the in-memory index during the query mode, both the `NumpyIndexer` and the `BinaryPbIndexer` use memory mapping to better support scaling out for large datasets. To further improve the memory footprint for the vector index, `ZarrIndexer`, based on Zarr, has been added to Jina Hub. #950, #984
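To illustrate why memory mapping helps at query time, here is a minimal sketch using NumPy's `np.memmap`. It is not the `NumpyIndexer` implementation; the file layout (raw float32 rows) and the helper names `write_vectors`, `open_vectors`, and `query` are assumptions made for this example.

```python
import os
import tempfile

import numpy as np


def write_vectors(path: str, vectors: np.ndarray) -> None:
    """Persist vectors as raw float32 bytes, one row after another."""
    vectors.astype(np.float32).tofile(path)


def open_vectors(path: str, dim: int) -> np.memmap:
    """Map the file read-only; pages are loaded from disk only when touched."""
    n_rows = os.path.getsize(path) // (dim * np.dtype(np.float32).itemsize)
    return np.memmap(path, dtype=np.float32, mode="r", shape=(n_rows, dim))


def query(index: np.ndarray, q: np.ndarray, top_k: int) -> np.ndarray:
    """Return row ids of the top_k nearest vectors by Euclidean distance.

    This toy version computes all distances at once; a real indexer
    would likely scan the mapped file in chunks.
    """
    dists = np.linalg.norm(index - q, axis=1)
    return np.argsort(dists)[:top_k]


# Hypothetical file name inside a temporary directory.
path = os.path.join(tempfile.mkdtemp(), "vectors.bin")
data = np.arange(20, dtype=np.float32).reshape(5, 4)
write_vectors(path, data)

index = open_vectors(path, dim=4)
nearest = query(index, data[3], top_k=2)
```

Because the array is backed by the file rather than loaded eagerly, several query processes can open the same index without each holding a full copy in RAM.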
Universal
- Add an example for building a cross-modal search system with Jina. #978
- Add support for indexing `.pdf` files. `PdfExtractor` has been added to Jina Hub. #981
⚠️ Breaking Changes
For details of all breaking changes, please refer to #885
- Improve the way of traversing recursive document structure. #944, #933, #923, #893, #889
- Rename `--yaml-path` to `--uses` in Flow CLI. #925, #922
- Rename `--uses-reducing` to `--uses-after` and add `--uses-before`. This change lets you customize executor behavior before messages are sent to, and after they are received from, all parallels/shards. #925
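The idea behind `--uses-before` and `--uses-after` can be sketched in plain Python as a pre-processing step that runs once before the request fans out to the shards, and a post-processing step that runs once on the collected results. This is a conceptual sketch only; `run_pod`, `make_shard`, and `merge` are hypothetical names and do not reflect Jina's internals.

```python
from typing import Callable, List, Optional


def make_shard(words: List[str]) -> Callable[[str], List[str]]:
    """Build a toy shard that returns its words containing the query."""
    return lambda query: [w for w in words if query in w]


def run_pod(
    request: str,
    shards: List[Callable[[str], List[str]]],
    before: Optional[Callable[[str], str]] = None,
    after: Optional[Callable[[List[List[str]]], List[str]]] = None,
):
    """Apply `before` once, fan out to every shard, then apply `after` once."""
    if before is not None:
        request = before(request)
    partials = [shard(request) for shard in shards]
    return after(partials) if after is not None else partials


def merge(partials: List[List[str]]) -> List[str]:
    """Flatten the per-shard results and sort them."""
    return sorted(word for part in partials for word in part)


shards = [make_shard(["cat", "dog"]), make_shard(["wildcat", "bird"])]
result = run_pod("CAT", shards, before=str.lower, after=merge)
```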
🐞 Bug Fixes and Other Changes
Flow
- Improve context management of Flow and Pod with ExitStack. #901
- Improve shut-down logic for log server #935, #958
- Fix shut-down logic for Peas and Pods #907, #956
- Refactor de-/serialization logic #988, #991
Executors
- Add a meta variable `force_register` for executors in order to force Jina to use the local version of an executor. #883
- Fix a bug in reducing functions for encoders. #900
- Fix default behavior of `CompoundIndexer`. #939
- Fix bug in overwriting metas using Python client. #980
Drivers
- Add `CollectMatches2DocRankDriver` for calculating `matches` with `granularity=k-1` from `Matches` at `granularity=k`. #851
- Add `Matches2DocRankDriver` for calculating new scores of `matches` from original scores. #919
- Add `VectorFillDriver` for filling embeddings of Documents. #909, #913
- Add support for using `tags` with `QueryLangDrivers`. #938
- Add support for traversing recursive Documents via explicit tree path definition. #983, #979, #994, #993
- Enable `BaseSegmenter` to change `mime_type`. #981
- Add `NdArray2PngURI` and `Blob2PngURI` for converting NumPy arrays into data URIs. #982
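As a rough illustration of what converting a NumPy array into a PNG data URI involves, the sketch below encodes a 2-D `uint8` array as an 8-bit grayscale PNG using only the standard library plus NumPy. It is not the code behind `NdArray2PngURI`; the function name `ndarray_to_png_uri` and the grayscale-only restriction are assumptions made for this example.

```python
import base64
import struct
import zlib

import numpy as np


def _chunk(tag: bytes, payload: bytes) -> bytes:
    """Build one PNG chunk: length, tag, payload, CRC over tag + payload."""
    crc = zlib.crc32(tag + payload) & 0xFFFFFFFF
    return struct.pack(">I", len(payload)) + tag + payload + struct.pack(">I", crc)


def ndarray_to_png_uri(arr: np.ndarray) -> str:
    """Encode a 2-D uint8 array as an 8-bit grayscale PNG data URI."""
    if arr.ndim != 2 or arr.dtype != np.uint8:
        raise ValueError("expected a 2-D uint8 array")
    height, width = arr.shape
    # IHDR: width, height, bit depth 8, color type 0 (grayscale),
    # then compression, filter, and interlace methods, all 0.
    ihdr = struct.pack(">IIBBBBB", width, height, 8, 0, 0, 0, 0)
    # Every scanline is prefixed with filter type 0 (no filtering).
    raw = b"".join(b"\x00" + row.tobytes() for row in arr)
    png = (
        b"\x89PNG\r\n\x1a\n"
        + _chunk(b"IHDR", ihdr)
        + _chunk(b"IDAT", zlib.compress(raw))
        + _chunk(b"IEND", b"")
    )
    return "data:image/png;base64," + base64.b64encode(png).decode("ascii")


uri = ndarray_to_png_uri(np.arange(16, dtype=np.uint8).reshape(4, 4))
```

The resulting string can be placed directly in an HTML `<img src="...">` attribute, which is what makes data URIs convenient for previewing results in a browser.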
CLI
- Add `--test-uses` option for `jina hub build` CLI for skipping failed-start peas when building Docker file. #902, #965
- Add `is_build_success` field for checking results of `jina hub build`. #903
- Add `--type app` option for `jina hub new` CLI for creating a new Jina app. #917
- Add `--push` option for `jina hub build` CLI for building and pushing local executors to Jina Hub. #937
- Improve `jina hub list` CLI. #985
- Improve speed of CLI autocompletion. #992
Tests
- Add more unit tests for reducing functions. #898
- Move dependencies for unit tests into `extra-requirements.txt`. #906
- Add unit tests for sleeping executors. #918
- Add more unit tests for checking Peas. #921
- Add more unit tests for decorators of executors. #930
- Add more unit tests for overriding Flow arguments. #926, #927
- Fix name conflicts in tests when running unit tests on GitHub. #961
- Add more unit tests for support of Documents with chunks of different `mime_type`. #968
Documentation
- Improve documentation for drivers #886, #888, #990
- Improve README #894
- Fix typos in documentation. #904, #912, #940, #978
Others
- Improve helper functions. #948, #972, #974
- Improve type annotations. #962, #966, #967
- Improve module importing logic for classes from `jina-hub`. #967
- Improve authentication for `jina hub`. #977
- Jina ❤️ Hacktoberfest. #986
🙏 Thanks to our Contributors
This release contains contributions from Alasdair Tran, Alex C-G, David Sanwald, Deepankar Mahapatro, Han Xiao, JamesTang-jinaai, Joan Fontanals Martinez, Maximilian Werk, Nan Wang, Rutuja Surve, Sreerag-ibtl, Susana Guzman, Yue Liu, pswu11, rameshwara
🙏 Thanks to our Community
And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.
🤝 Work with Jina
Want to work with Jina full-time? Check out our openings on our website.
🎉 v0.5.0
Jina 0.5.0 Release
We are excited to release Jina 0.5.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include:
- Recursive Document structure
- Native data querying capabilities
- Migration of Executors to Jina Hub
- Support for Mindspore
⬆️ Major Features and Improvements
Completeness
- Introduce recursive Document structure. In short, the protobuf definitions of `Document` and `Chunk` are unified. In this new representation, `Document` has a recursive structure and the deprecated `Chunk` is now a nested `Document` one level deeper. This new proto enables cleaner driver design, yields more consistent low-level APIs, and provides great extensibility for future features. #652, #684, #700, #709, #729, #726

  This is a breaking change. If you started using Jina before `0.4.1`, we highly suggest you read our migration guide.
- Add native data querying capabilities. With the new family of Drivers based on `BaseQueryLangDriver`, you can perform standard query operations on the `Document`. Here is a list of the new drivers:
| Name | Description | Counterpart in other query languages |
|---|---|---|
| `FilterQL` | Filter the Document/Chunk by its attributes | `filter`/`where` |
| `SelectQL`, `SelectRegQL`, `ExcludeQL`, `ExcludeRegQL` | Select attributes | `select`/`exclude` |
| `SliceQL` | Take the first k doc/chunk | `limit`/`take`/`slicing` |
| `SortQL` | Sort a list of `Document`s | `sort`/`order_by` |
| `ReverseQL` | Reverse the list of collections | `reverse` |
Check more details at New Query Language Driver.
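To make the semantics in the table concrete, the sketch below mimics the same operations on a plain list of dictionaries. It does not use the Jina drivers; the function names (`filter_ql`, `select_ql`, and so on) are hypothetical stand-ins chosen to mirror the driver names, and the regex-based variants are left out for brevity.

```python
from typing import Any, Dict, List

Doc = Dict[str, Any]


def filter_ql(docs: List[Doc], **conditions: Any) -> List[Doc]:
    """Keep documents whose attributes equal all given values."""
    return [d for d in docs if all(d.get(k) == v for k, v in conditions.items())]


def select_ql(docs: List[Doc], fields: List[str]) -> List[Doc]:
    """Keep only the named attributes."""
    return [{k: d[k] for k in fields if k in d} for d in docs]


def exclude_ql(docs: List[Doc], fields: List[str]) -> List[Doc]:
    """Drop the named attributes."""
    return [{k: v for k, v in d.items() if k not in fields} for d in docs]


def slice_ql(docs: List[Doc], start: int, end: int) -> List[Doc]:
    """Take documents in the half-open range [start, end)."""
    return docs[start:end]


def sort_ql(docs: List[Doc], field: str, reverse: bool = False) -> List[Doc]:
    """Sort documents by one attribute."""
    return sorted(docs, key=lambda d: d[field], reverse=reverse)


def reverse_ql(docs: List[Doc]) -> List[Doc]:
    """Reverse the order of the list."""
    return docs[::-1]


docs = [
    {"id": 1, "lang": "en", "score": 0.2},
    {"id": 2, "lang": "de", "score": 0.9},
    {"id": 3, "lang": "en", "score": 0.7},
    {"id": 4, "lang": "en", "score": 0.5},
]

# Chain the operations: filter, sort descending, take two, project.
english = filter_ql(docs, lang="en")
ranked = sort_ql(english, "score", reverse=True)
result = select_ql(slice_ql(ranked, 0, 2), ["id", "score"])
```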
Usability
- Migrate executors to Jina Hub. Jina Hub is an open registry for hosting Jina executors via container images. It enables users to ship and exchange reusable components across various Jina search applications. Jina Hub is referenced as a Git submodule in Jina. The Jina team will maintain the executors on Jina Hub. You can build your own executors as well. #852, #842, #848, #855, #857, #861, #860, #871, #872, #879, #880, #854
Check more details at Jina Hub.
Universal
⚠️ Breaking Changes
- Unify `yaml_file` and `image` with `uses`. You can use: a YAML file path, a supported Executor's class name, the content of a YAML config, or a Docker image. Check more details by running `jina pod --help` or in the Jina docs. #684

  v0.4.0:

      f = (Flow()
           .add(name='from_class', yaml_file='_pass')
           .add(name='from_yaml', yaml_file='mwu.yml')
           .add(name='from_str', yaml_file='!OneHotTextEncoder')
           .add(name='from_docker', image='jinaai/hub.examples.mwu_encoder'))

  v0.5.0:

      f = (Flow()
           .add(name='from_class', uses='_pass')
           .add(name='from_yaml', uses='mwu.yml')
           .add(name='from_str', uses='!OneHotTextEncoder')
           .add(name='from_docker', uses='jinaai/hub.examples.mwu_encoder'))

- Replace the `replicas` argument with `parallel` to avoid misunderstanding. `parallel` indicates how many Peas are running in parallel. #700

  v0.4.0:

      !Flow
      pods:
        encode:
          uses: helloworld.encoder.yml
          replicas: 2

  v0.5.0:

      !Flow
      pods:
        encode:
          uses: helloworld.encoder.yml
          parallel: 2

- Replace `join` with `needs` to improve readability. #762

  v0.4.0:

      f = (Flow()
           .add(name='p1', uses='_pass')
           .add(name='p2', uses='_pass', needs='p1')
           .add(name='p3', uses='_pass', needs='p1')
           .join(needs=['p2', 'p3']))

  v0.5.0:

      f = (Flow()
           .add(name='p1', uses='_pass')
           .add(name='p2', uses='_pass', needs='p1')
           .add(name='p3', uses='_pass', needs='p1')
           .needs(['p2', 'p3']))

- Introduce recursive Document structure. This affects a wide range of drivers and executors. Please refer to the full list at #702
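The unified, recursive structure can be pictured with a small dataclass in which a chunk is simply another document one level deeper. This is an illustrative sketch only; the `Doc` class, its `text` and `chunks` fields, and the `traverse` helper are hypothetical and are not the protobuf definition.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Doc:
    """A document whose chunks are themselves documents."""
    text: str
    chunks: List["Doc"] = field(default_factory=list)


def traverse(doc: Doc, depth: int = 0) -> Iterator[Tuple[int, Doc]]:
    """Yield (depth, document) pairs in depth-first order."""
    yield depth, doc
    for chunk in doc.chunks:
        yield from traverse(chunk, depth + 1)


root = Doc(
    "article",
    chunks=[
        Doc("paragraph 1", chunks=[Doc("sentence 1"), Doc("sentence 2")]),
        Doc("paragraph 2"),
    ],
)
visited = [(depth, d.text) for depth, d in traverse(root)]
```

With a single node type, code that walks the tree no longer needs separate branches for documents and chunks, which is the kind of simplification the release notes attribute to the new proto.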
🐞 Bug Fixes and Other Changes
Flow
- Refactor and improve the code for building the Flow. #685
- Fix `export_api`. #695
- Fix the Pea name. #698
- Fix the bug of two `join` operations in the same Flow. #730
- Add an alias `_pass` for `_forward`; add an argument, `name`, for `Flow.join()` so that one can customize the name of the Pods; add an argument, `uses`, for `Flow.join()`, which unifies the usage of `yaml_path` and `images`. #748
- Improve URL regex pattern matching. #780
Executors
- Add `FeatureAgglomeration`, `TSNEEncoder`, `RandomSparseEncoder`, `RandomGaussianEncoder` in the numeric encoders. #567, #838
- Fix multiple bugs in `MilvusIndexer`. #677 #679
- Support full range of models from 🤗Transformers. #701
- Fix the type bug in `NgtIndexer`. #742
- Refactor the image crafter. #759
- Refactor the framework-based executors to make it easier to build executors from various DL frameworks. #771, #800
- Add `ImageFlipper`. #777
- Fix `cached_property`. #785
- Add `TorchObjectDetectionSegmenter` in the crafters for object detection. #770, #784, #788
- Fix the bug in cropping the image. #769
- Add a `query_by_id` function for BaseVectorIndexer so that we can query by Document id. #827
- Refactor `FaissIndexer`. #825
- Fix a bug in serialization of the indexer. #874
Drivers
- Fix the slicing bug in the `QueryLang` and improve the documentation. #696, #714, #822
- Add `ConcateEmbedDriver` for concatenating vectors. #748
- Fix the default value issue of the `level_depth`. #817
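As a small illustration of vector concatenation, the sketch below joins several embeddings end to end with NumPy. Which vectors `ConcateEmbedDriver` actually concatenates is not stated here, so treating them as per-chunk embeddings is an assumption, and `concat_embeddings` is a hypothetical helper name.

```python
from typing import Sequence

import numpy as np


def concat_embeddings(embeddings: Sequence[np.ndarray]) -> np.ndarray:
    """Flatten each embedding and join them into one 1-D vector."""
    return np.concatenate([np.ravel(e) for e in embeddings])


# Assumed example: two chunk embeddings of different sizes.
text_embedding = np.array([0.1, 0.2, 0.3])
image_embedding = np.array([1.0, 2.0])
combined = concat_embeddings([text_embedding, image_embedding])
```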
Documentation
- Add a shortcut for search in the docs. You can start searching by hitting the `/` key. #683
- Add section on common practices. #812
- Add a wall of contributors. For our awesome contributors, we've now put your profiles on our README. Thanks to all of you! #832, #835
- Add more explanations for commit messages to make it easier to contribute. #826
- Rephrase and fix typos. #722, #731, #740, #768, #818, #820, #821, #837, #849
- Improve visualization and fix cluttered TOC. #801
Protos
- Refactor `tags` from `map` to `Struct`. #719
Tests
- Add unit test for `QueryLang`. #710
- Add tests for `VectorSearchDriver` and `KVSearchDriver`. #733
- Add tests for `EncodeDriver`. #734
- Add tests for `CraftDriver`. #737
- Add tests for `SegmentDriver`. #738
- Add tests for `SliceQL`. #782
- Add tests for `Chunk2DocRankDriver`. #813
- Improve the unit tests for indexers and add type checking. #838, #844
Others
- Add tests and coverage report in CI. Jina's current test coverage is 76.52% #713 #682
- Add typing to Jina. #761
- Fix the broken labeling action. #787
- Support ignoring packages on the dependency list. #859
- Add missing `Pillow` dependency. #858
🙏 Thanks to our Contributors
This release contains contributions from Alex C-G, Andrey Vasnetsov, Anish Pawar, BingHo1013, Emmanuel Adesile, Eric Shen, Han Xiao, JamesTang616, Joan Fontanals Martinez, Kavan72, Maanav Shah, Morry Wang, Nan Wang, Rohan Chaudhari, Shivam Ra, Shivam Raj, Yue Liu, Zenahr Barzani, coolmian, dima, fhaase2, hanxiao, joanna350, roccia, shivam-raj.
🙏 Thanks to our Community
And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.
🤝 Work with Jina
Want to work with Jina full-time? Check out our openings on our website.
🎉 v0.4.0
Jina 0.4.0
We are excited to release Jina 0.4.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include fallbacks if GPU is unavailable, FaissIndexer on GPU, and switching indexers during querying.
Release 0.4.0
⬆️ Major Features and Improvements
Usability
- Add a new value for the `on_gpu` field. Setting `on_gpu: auto` in the YAML config first checks whether a GPU device is available and falls back to CPU when no GPU is found. #617
- Improve the accessibility of `jina helloworld`. We added a CLI argument to enable downloading via a proxy. If you are using a proxy to speed up your internet, try `jina helloworld --download-proxy http://127.0.0.1:1087`. Just replace the IP and port with your proxy settings. #595
- Support switching between different `Indexers` during querying. A new argument, `ref_indexer`, is added for this purpose. With the following YAML config of `Indexer`, `NumpyIndexer` is used for indexing and `AnnoyIndexer` is used for querying. The supported `Indexer`s include `FaissIndexer`, `AnnoyIndexer`, `NGTIndexer`, `NmslibIndexer`, `SptagIndexer`, and `NumpyIndexer`.

      !AnnoyIndexer
      with:
        ref_indexer:
          !NumpyIndexer
          with:
            index_filename: wrap-npidx

- Add a new parameter `skip-on-error` for the Pods. This argument sets the level at which you want Jina to skip errors. Check out more details in the Jina docs. #570

      !ImageReader
      with:
        skip-on-error: 'EXECUTOR'

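The `on_gpu: auto` behavior described in the list above boils down to a probe followed by a fallback. The sketch below shows that decision in isolation; it is not Jina's code, and both the `resolve_on_gpu` helper and the injected `gpu_available` probe are hypothetical names introduced for this example.

```python
from typing import Callable, Union


def resolve_on_gpu(on_gpu: Union[bool, str], gpu_available: Callable[[], bool]) -> bool:
    """Turn the configured value into a concrete use-GPU decision.

    'auto' defers to the probe; explicit booleans are respected as given.
    """
    if isinstance(on_gpu, bool):
        return on_gpu
    if on_gpu == "auto":
        return gpu_available()
    raise ValueError(f"unsupported on_gpu value: {on_gpu!r}")


# Assumed probe for illustration: pretend no GPU is present on this machine.
use_gpu = resolve_on_gpu("auto", gpu_available=lambda: False)
```

Passing the probe in as a function keeps the decision logic independent of any particular deep learning framework.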
Scalability
- Multiple improvements have been made to speed up performance.
  - Improve the performance of `NumpyIndexer`. The `argsort` function is replaced by `argpartition`, which avoids the unnecessary sorting procedure and speeds up the querying process. #641
  - Switch to `zmqstream` for the default message handler, which improves the performance of networking. #618
  - Use `uvloop` from `tornado` to improve the event handling speed in the Pods. #615
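The `argsort` to `argpartition` change can be illustrated with NumPy directly. The sketch below is not the `NumpyIndexer` code; it only shows the general technique: partition to find the k smallest distances, then sort just those k entries instead of the whole array. The function names are chosen for this example.

```python
import numpy as np


def top_k_argsort(dist: np.ndarray, k: int) -> np.ndarray:
    """Baseline: fully sort all distances, then keep the first k indices."""
    return np.argsort(dist)[:k]


def top_k_argpartition(dist: np.ndarray, k: int) -> np.ndarray:
    """Partition so the k smallest come first, then sort only those k."""
    # After argpartition, the first k indices point at the k smallest
    # values, but in no particular order among themselves.
    idx = np.argpartition(dist, k - 1)[:k]
    return idx[np.argsort(dist[idx])]


rng = np.random.default_rng(0)
# A permutation guarantees distinct values, so the ranking is unambiguous.
dist = rng.permutation(1000).astype(np.float64)
fast = top_k_argpartition(dist, 10)
slow = top_k_argsort(dist, 10)
```

Partitioning runs in linear time on average, so for small k relative to the index size most of the sorting work is skipped.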
New Executors
- Add `NGTIndexer`. NGT provides high-speed ANN searches for a large volume of data in high dimensional vector data space. #533

      !NGTIndexer
      with:
        index_filename: index.gz
        num_threads: 2
        metric: 'l2'
        epsilon: 0.1

- Add support for running `FaissIndexer` on GPU and a new argument `nprobe` for `FaissIndexer`. Check out more details of the usage in our examples. #636 #638

      !FaissIndexer
      with:
        index_filename: index.gz
        index_key: 'IVF10,PQ4'
        train_filepath: train.gz
        distance: 'l2'
        nprobe: 1

- Add support for Milvus as a new `Indexer`. Now you can do indexing and querying with `MilvusIndexer`. [W.I.P] #651
- Add `CustomKerasImageEncoder` so that you can use your customized Keras model to encode images in Jina. The following YAML config loads the model from `path/to/your/model` and uses the output of the layer named `awesome/encoding/layer` as the embedding. #563

      !CustomKerasImageEncoder
      with:
        model_path: path/to/your/model
        layer_name: awesome/encoding/layer

- Add an argument `search_k` for `AnnoyIndexer`. #642

      !AnnoyIndexer
      with:
        index_filename: index.gz
        metric: 'euclidean'
        n_trees: 10
        search_k: -1

- Add `FastICAEncoder` for encoding. #590

      !FastICAEncoder
      with:
        output_dim: 32
        num_features: 128
        whiten: False

Documentation
- Welcome our evangelist @alexcg1 from New Zealand! He has been working hard on improving document readability, Jina 101, contribution guidelines and README retouches. A new document has been added to guide new contributors. #566
Unit tests
⚠️ Breaking Changes
- Rename `port_grpc` to `port_expose`. Now that we support both gRPC and RESTful APIs, `port_grpc` no longer lives up to its name. `port_grpc` will be deprecated in a future version. #598
- Refactor `ImageReader` to inherit from `BaseDocCrafter` rather than `DocSegmenter`. If you are using `ImageReader`, check out our examples for more details. #627
- Refactor `Ranker`. The `TopKFilterDriver` is now used to filter out the chunks that do not belong to the top k documents. This driver is attached to `Ranker` by default. For `DocPbIndexer` and `DataURIPbIndexer`, `TopKFilterDriver` is removed from the default attachment. With k shards, this leads to n * k results returned from the indexer when querying. #574
- Remove the `password_stdin` argument for the `jina hub` CLI. #569
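The n * k effect mentioned for sharded indexers can be seen in a toy example: if each shard returns its own n best matches and nothing filters afterwards, the caller receives n results per shard. The sketch below is a simplification that filters matches by score rather than by parent document, and `query_shards` and `top_k_filter` are hypothetical helpers, not Jina drivers.

```python
import heapq
from typing import List, Tuple

Match = Tuple[str, float]  # (match id, score); a higher score is better


def query_shards(shards: List[List[Match]], n: int) -> List[Match]:
    """Collect the n best matches from every shard without global filtering."""
    merged: List[Match] = []
    for shard in shards:
        merged.extend(heapq.nlargest(n, shard, key=lambda m: m[1]))
    return merged


def top_k_filter(matches: List[Match], k: int) -> List[Match]:
    """Keep only the k best matches overall."""
    return heapq.nlargest(k, matches, key=lambda m: m[1])


shards = [
    [("a", 0.9), ("b", 0.1), ("c", 0.5)],
    [("d", 0.8), ("e", 0.7), ("f", 0.2)],
]
unfiltered = query_shards(shards, n=2)
filtered = top_k_filter(unfiltered, k=2)
```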
🐞 Bug Fixes and Other Changes
Flow
- Fix the `search_lines` API for the `Flow`. #606
Executors
- Add a new argument `truncation_strategy` in `BaseTransformerEncoder` to adapt to the latest Huggingface Transformers v3.0.0. #623

      !TransformerTorchEncoder
      with:
        pooling_strategy: cls
        model_name: distilbert-base-cased
        max_length: 96
        truncation_strategy: longest_first

- Add `size` property for the indexers. #581
Drivers
- Add a new driver `UnaryEncoderDriver` dedicated to testing and debugging. #635
- Fix the problem of `PublishDriver`. `PublishDriver` is used to modify `num_parts` when the Pod is connected to another via a PUB-SUB connection. However, `PublishDriver` overwrites the original driver of the Pod. #569
- Remove the `if` clauses from the `Drivers`. #646
Protos
- Add `tags` field in the Chunk and Document proto. The `tags` field is a map of strings and is designed to store the values of other fields that will be used for filtering. #574
- Add `location` field for the Chunks. `location` is a list of integers. It can be used to mark the position of a string, the coordinates of an image, or the timestamp of an audio clip. #578
Tests
🙏 Thanks to our Contributors
This release contains contributions from hanxiao, JoanFM, nan-wang, fhaase2, anish2197, alexcg1, BingHo1013, shivam-raj, Morriaty-The-Murderer, festeh, generall, emmaadesile, coolmian, JamesTang616, and YueLiu-jina
🙏 Thanks to our Community
And thanks to all of you out there as well! Without you Jina couldn't do what we do. Your support means a lot to us.
🤝 Work with Jina
Want to work with Jina full-time? Check out our openings on our website.