
Releases: jina-ai/serve

🎉 v0.6.0

04 Oct 18:14

Jina v0.6.0

We are excited to release Jina 0.6.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include:

  • Improve the memory footprint for the Indexer.
  • Add an example for building a cross-modal search system with Jina.
  • Add support for indexing .pdf files.

Release 0.6.0

⬆️ Major Features and Improvements

Scalability

  • Improve the memory footprint of the Indexer. Instead of holding the index in memory in query mode, both NumpyIndexer and BinaryPbIndexer now use memory mapping, which better supports scaling out to large datasets. To reduce the memory footprint of the vector index further, a ZarrIndexer based on Zarr has been added to Jina Hub. #950, #984
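Memory mapping is a general OS feature, and the idea can be shown with plain NumPy. The sketch below is not NumpyIndexer's actual code. It assumes the vectors were saved with np.save, and the file name and sizes are made up. Opening the file with mmap_mode='r' maps it instead of reading it, so the OS pages rows in on demand and can evict them under memory pressure.

```python
import os
import tempfile

import numpy as np

# Hypothetical index: 1,000 stored vectors of dimension 64, written once at index time.
rng = np.random.default_rng(0)
vectors = rng.standard_normal((1000, 64)).astype(np.float32)

index_path = os.path.join(tempfile.mkdtemp(), "vecs.npy")
np.save(index_path, vectors)  # index time: persist the full matrix to disk

# Query time: open the file as a read-only memory map instead of loading it.
index = np.load(index_path, mmap_mode="r")

query = vectors[42]
# Squared L2 distance from the query to every stored vector.
dists = ((index - query) ** 2).sum(axis=1)
nearest = int(np.argmin(dists))
print(type(index).__name__, nearest)  # memmap 42
```

A full scan like this still touches every page; the gain is that the OS, not the Python process, decides which pages stay resident.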

Universal

⚠️ Breaking Changes

For details on all breaking changes, please refer to #885.

  • Improve how recursive Document structures are traversed. #944, #933, #923, #893, #889
  • Rename --yaml-path to --uses in the Flow CLI. #925, #922
  • Rename --uses-reducing to --uses-after and add --uses-before. This lets you customize executor behavior before requests are sent to, and after results are received from, all parallels/shards. #925
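Traversing a recursive Document structure along an explicit path can be pictured with a small pure-Python sketch. Everything here is an assumption for illustration, not Jina's API: the dict-based Document, the hypothetical traverse helper, and the path syntax ('r' for the root, each 'c' steps into chunks, each 'm' steps into matches, so 'cc' means chunks of chunks).

```python
from typing import Dict, Iterator, List, Optional

# Hypothetical, minimal Document model: an id plus nested chunks and matches.
Doc = Dict[str, object]


def make_doc(doc_id: str, chunks: Optional[List[Doc]] = None,
             matches: Optional[List[Doc]] = None) -> Doc:
    return {"id": doc_id, "chunks": chunks or [], "matches": matches or []}


def traverse(docs: List[Doc], path: str) -> Iterator[Doc]:
    """Yield the Documents reached by following `path` from each root (assumed syntax)."""
    for doc in docs:
        if path == "r":
            yield doc
            continue
        level = [doc]
        for step in path:
            key = {"c": "chunks", "m": "matches"}[step]  # unknown steps raise KeyError
            level = [child for d in level for child in d[key]]
        yield from level


def traverse_all(docs: List[Doc], paths: List[str]) -> List[str]:
    """Collect the ids reached by each path, in order."""
    return [d["id"] for p in paths for d in traverse(docs, p)]


root = make_doc("d1",
                chunks=[make_doc("d1-c1", chunks=[make_doc("d1-c1-c1")]),
                        make_doc("d1-c2")],
                matches=[make_doc("m1")])

print(traverse_all([root], ["r", "c"]))  # ['d1', 'd1-c1', 'd1-c2']
print(traverse_all([root], ["cc"]))      # ['d1-c1-c1']
print(traverse_all([root], ["m"]))       # ['m1']
```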

🐞 Bug Fixes and Other Changes

Flow

  • Improve context management of Flow and Pod with ExitStack. #901
  • Improve shut-down logic for log server #935, #958
  • Fix shut-down logic for Peas and Pods #907, #956
  • Refactor de-/serialization logic #988, #991

Executors

  • Add a meta variable, force_register, for executors to force Jina to use the local version of an executor. #883
  • Fix a bug in reducing functions for encoders. #900
  • Fix default behavior of CompoundIndexer #939
  • Fix bug in overwriting metas using Python client. #980

Drivers

  • Add CollectMatches2DocRankDriver for calculating matches with granularity=k-1 from Matches at granularity=k. #851
  • Add Matches2DocRankDriver for calculating new scores of matches from original scores #919
  • Add VectorFillDriver for filling embeddings of Document 2 #909, #913
  • Add support for using tags with QueryLangDrivers #938
  • Add support for traversing recursive Documents via explicit tree path definition. #983, #979, #994, #993
  • Enable BaseSegmenter to change mime_type. #981
  • Add NdArray2PngURI and Blob2PngURI for converting NumPy arrays into PNG data URIs. #982
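CollectMatches2DocRankDriver lifts matches from granularity k (chunks) to granularity k-1 (their parent Documents). The pure-Python sketch below shows that idea only. The tuple layout, the collect_doc_matches helper, and the choice of "best chunk score wins" as the aggregation are assumptions; in Jina the actual scoring is delegated to a ranker executor.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical input: matches found for the chunks of one query Document.
# Each entry is (matched_chunk_id, parent_doc_id, score); higher score = better.
ChunkMatch = Tuple[str, str, float]


def collect_doc_matches(chunk_matches: List[ChunkMatch], top_k: int) -> List[Tuple[str, float]]:
    """Lift chunk-level matches (granularity k) to document-level matches (granularity k-1).

    Assumption: a parent Document's score is the best score among its matched chunks.
    """
    best: Dict[str, float] = defaultdict(lambda: float("-inf"))
    for _chunk_id, parent_id, score in chunk_matches:
        best[parent_id] = max(best[parent_id], score)
    ranked = sorted(best.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]


matches = [("a1", "A", 0.9), ("b1", "B", 0.95), ("a2", "A", 0.7), ("c1", "C", 0.4)]
print(collect_doc_matches(matches, top_k=2))  # [('B', 0.95), ('A', 0.9)]
```

Max-pooling is only one reasonable choice; summing or averaging chunk scores would favor Documents with many matching chunks instead.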

CLI

  • Add --test-uses option for jina hub build CLI for skipping failed-start peas when building Docker file. #902, #965
  • Add is_build_success field for checking results of jina hub build. #903
  • Add --type app option for jina hub new CLI for creating a new Jina app. #917
  • Add --push option for jina hub build CLI for building and pushing local executors to Jina Hub. #937
  • Improve jina hub list CLI. #985
  • Improve speed of CLI autocompletion. #992

Tests

  • Add more unit tests for reducing functions 1 #898
  • Move dependencies for unit tests into extra-requirements.txt #906
  • Add unit tests for sleeping executors #918
  • Add more unit tests for checking Peas #921
  • Add more unit tests for decorators of executors. #930
  • Add more unit tests for overriding Flow arguments. #926, #927
  • Fix name conflicts in tests when running unit tests on GitHub. #961
  • Add more unit tests for Documents with chunks of different mime_type. #968

Documentation

Others

🙏 Thanks to our Contributors

This release contains contributions from Alasdair Tran, Alex C-G, David Sanwald, Deepankar Mahapatro, Han Xiao, JamesTang-jinaai, Joan Fontanals Martinez, Maximilian Werk, Nan Wang, Rutuja Surve, Sreerag-ibtl, Susana Guzman, Yue Liu, pswu11, rameshwara

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.

🎉 v0.5.0

30 Aug 13:12

Jina 0.5.0 Release

We are excited to release Jina 0.5.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include:

  • Recursive Document structure
  • Native data querying capabilities
  • Migration of Executors to Jina Hub
  • Support for Mindspore

⬆️ Major Features and Improvements

Completeness

  • Introduce recursive Document structure. In short, the protobuf definitions of Document and Chunk are unified. In this new representation, Document has a recursive structure, and the deprecated Chunk is now a nested Document one level deeper. This new proto enables cleaner driver design, yields more consistent low-level APIs, and makes future features easier to add. #652, #684, #700, #709, #729, #726

This is a breaking change. If you started using Jina before 0.4.1, we highly suggest you read our migration guide.

  • Add native data querying capabilities. With the new family of Drivers based on BaseQueryLangDriver, you can perform standard query operations on the Document. Here is a list of the new drivers:
| Name | Description | Counterpart in other query languages |
|---|---|---|
| FilterQL | Filter the Document/Chunk by its attributes | filter/where |
| SelectQL, SelectRegQL, ExcludeQL, ExcludeRegQL | Select attributes | select/exclude |
| SliceQL | Take the first k Documents/Chunks | limit/take/slicing |
| SortQL | Sort a list of Documents | sort/order_by |
| ReverseQL | Reverse the list of collections | reverse |

See New Query Language Driver for more details.
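To make the "counterpart" column concrete, here is a plain-Python analogy for each operation in the table, applied to a hypothetical list of Document-like dicts. This is not the driver API; it only mirrors what each QL driver does to a collection.

```python
# Hypothetical Documents: only the fields needed for the illustration.
docs = [
    {"id": "a", "modality": "text", "score": 0.7},
    {"id": "b", "modality": "image", "score": 0.9},
    {"id": "c", "modality": "text", "score": 0.8},
    {"id": "d", "modality": "text", "score": 0.1},
]

# FilterQL ~ filter/where: keep Documents whose attribute matches a condition.
filtered = [d for d in docs if d["modality"] == "text"]

# SortQL ~ sort/order_by: order by an attribute (descending score here).
sorted_docs = sorted(filtered, key=lambda d: d["score"], reverse=True)

# SliceQL ~ limit/take: keep the first k Documents.
top2 = sorted_docs[:2]

# SelectQL ~ select: keep only the chosen attributes.
selected = [{k: d[k] for k in ("id", "score")} for d in top2]

# ExcludeQL ~ exclude: drop the named attributes.
excluded = [{k: v for k, v in d.items() if k != "modality"} for d in top2]

# ReverseQL ~ reverse: reverse the order of the collection.
reversed_docs = list(reversed(selected))

print(selected)       # [{'id': 'c', 'score': 0.8}, {'id': 'a', 'score': 0.7}]
print(reversed_docs)  # [{'id': 'a', 'score': 0.7}, {'id': 'c', 'score': 0.8}]
```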

Usability

  • Migrate executors to Jina Hub. Jina Hub is an open registry for hosting Jina executors via container images. It enables users to ship and exchange reusable components across various Jina search applications. Jina Hub is included in Jina as a Git submodule. The Jina team will maintain the executors on Jina Hub, and you can build your own executors as well. #852, #842, #848, #855, #857, #861, #860, #871, #872, #879, #880, #854

Check more details at Jina Hub.

Universal

⚠️ Breaking Changes

  • Unify yaml_file and image into uses. uses accepts a YAML file path, the class name of a supported Executor, the content of a YAML config, or a Docker image. For details, run jina pod --help or see the Jina docs. #684
    v0.4.0:

        from jina.flow import Flow

        f = (Flow()
             .add(name='from_class', yaml_file='_pass')
             .add(name='from_yaml', yaml_file='mwu.yml')
             .add(name='from_str', yaml_file='!OneHotTextEncoder')
             .add(name='from_docker', image='jinaai/hub.examples.mwu_encoder'))

    v0.5.0:

        from jina.flow import Flow

        f = (Flow()
             .add(name='from_class', uses='_pass')
             .add(name='from_yaml', uses='mwu.yml')
             .add(name='from_str', uses='!OneHotTextEncoder')
             .add(name='from_docker', uses='jinaai/hub.examples.mwu_encoder'))
  • Replace the replicas argument with parallel to avoid misunderstanding. parallel indicates how many Peas are running in parallel. #700
    v0.4.0:

        !Flow
        pods:
          encode:
            uses: helloworld.encoder.yml
            replicas: 2

    v0.5.0:

        !Flow
        pods:
          encode:
            uses: helloworld.encoder.yml
            parallel: 2
  • Replace join with needs to improve readability. #762
    v0.4.0:

        f = (Flow()
             .add(name='p1', uses='_pass')
             .add(name='p2', uses='_pass', needs='p1')
             .add(name='p3', uses='_pass', needs='p1')
             .join(needs=['p2', 'p3']))

    v0.5.0:

        f = (Flow()
             .add(name='p1', uses='_pass')
             .add(name='p2', uses='_pass', needs='p1')
             .add(name='p3', uses='_pass', needs='p1')
             .needs(['p2', 'p3']))
  • Introduce recursive Document structure. This affects a wide range of drivers and executors. Please refer to the full list at #702

🐞 Bug Fixes and Other Changes

Flow

  • Refactor and improve the code for building the Flow. #685
  • Fix export_api. #695
  • Fix the Pea name. #698
  • Fix the bug of two join operations in the same Flow. #730
  • Add an alias _pass for _forward; add an argument, name, for Flow.join() so that one can customize the name of the Pods; add an argument, uses, for Flow.join(), which unifies the usage of yaml_path and images. #748
  • Improve URL regex pattern matching #780

Executors

  • Add FeatureAgglomeration, TSNEEncoder, RandomSparseEncoder, RandomGaussianEncoder in the numeric encoders. #567, #838
  • Fix multiple bugs in MilvusIndexer #677 #679
  • Support full range of models from 🤗Transformers. #701
  • Fix the type bug in NgtIndexer. #742
  • Refactor the image crafter. #759
  • Refactor the framework-based executors to make it easier to build executors from various DL frameworks. #771, #800
  • Add ImageFlipper. #777
  • Fix cached_property. #785
  • Add TorchObjectDetectionSegmenter in the crafters for object detection. #770, #784, #788
  • Fix the bug in cropping the image. #769
  • Add a query_by_id function for BaseVectorIndexer so that we can query by Document id. #827
  • Refactor FaissIndexer #825
  • Fix a bug in serialization of the indexer. #874

Drivers

  • Fix the slicing bug in the QueryLang and improve the documents. #696, #714, #822
  • Add ConcateEmbedDriver for concatenating embedding vectors. #748
  • Fix the default value issue of the level_depth. #817
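Concatenating embeddings, as ConcateEmbedDriver does, means joining each Document's vectors from several encoders along the feature axis. The NumPy sketch below assumes each Document already has one embedding per upstream encoder; the arrays and their sizes are hypothetical, and this is not the driver's code.

```python
import numpy as np

# Hypothetical per-encoder embeddings for the same 3 Documents (row i = Document i).
text_emb = np.ones((3, 4), dtype=np.float32)    # e.g. from a text encoder
image_emb = np.zeros((3, 2), dtype=np.float32)  # e.g. from an image encoder

# Concatenate along the feature axis so each row stays aligned with its Document;
# the resulting dimension is the sum of the input dimensions.
combined = np.concatenate([text_emb, image_emb], axis=1)
print(combined.shape)  # (3, 6)
```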

Documentation

Protos

  • Refactor tags from map to Struct. #719

Tests

  • Add unit test for QueryLang. #710
  • Add tests for VectorSearchDriver and KVSearchDriver. #733
  • Add tests for EncodeDriver. #734
  • Add tests for CraftDriver. #737
  • Add tests for SegmentDriver. #738
  • Add tests for SliceQL. #782
  • Add tests for Chunk2DocRankDriver. #813
  • Improve the unit tests for indexers and add type checking. #838, #844

Others

  • Add tests and coverage report in CI. Jina's current test coverage is 76.52% #713 #682
  • Add typing to Jina. #761
  • Fix the broken labeling action. #787
  • Support ignoring packages on the dependency list. #859
  • Add missing Pillow dependency. #858

🙏 Thanks to our Contributors

This release contains contributions from Alex C-G, Andrey Vasnetsov, Anish Pawar, BingHo1013, Emmanuel Adesile, Eric Shen, Han Xiao, JamesTang616, Joan Fontanals Martinez, Kavan72, Maanav Shah, Morry Wang, Nan Wang, Rohan Chaudhari, Shivam Ra, Shivam Raj, Yue Liu, Zenahr Barzani, coolmian, dima, fhaase2, hanxiao, joanna350, roccia, shivam-raj.

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.

🎉 v0.4.0

30 Jul 02:43

Jina 0.4.0

We are excited to release Jina 0.4.0. Jina is the easier way to do neural search on the cloud. Highlights of this release include CPU fallback when no GPU is available, FaissIndexer on GPU, and switching Indexers at query time.

Release 0.4.0

⬆️ Major Features and Improvements

Usability

  • Add a new value for the on_gpu field. Setting on_gpu: auto in the YAML config first checks whether a GPU device is available and falls back to CPU if none is found. #617

  • Improve the accessibility of jina helloworld. A new CLI argument enables downloading through a proxy. If you use a proxy to access the internet, try jina helloworld --download-proxy http://127.0.0.1:1087, replacing the IP and port with your proxy settings. #595

  • Support switching between Indexers at query time. A new argument, ref_indexer, is added for this purpose. With the following YAML config, NumpyIndexer is used for indexing and AnnoyIndexer for querying. Supported Indexers include FaissIndexer, AnnoyIndexer, NGTIndexer, NmslibIndexer, SptagIndexer, and NumpyIndexer.

    !AnnoyIndexer
    with:
        ref_indexer:
            !NumpyIndexer
            with:
                index_filename: wrap-npidx
    

    #599 #589

  • Add a new parameter, skip-on-error, for Pods. It sets the level at which Jina skips errors. See the Jina docs for more details. #570

     !ImageReader
     with:
         skip-on-error: 'EXECUTOR'
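The on_gpu: auto fallback described above boils down to a small decision rule. The helper below is hypothetical and only mirrors that described behavior; how Jina actually detects a GPU (and what happens when on_gpu is True but no GPU exists) is not covered here.

```python
from typing import Union


def resolve_on_gpu(on_gpu: Union[bool, str], gpu_available: bool) -> bool:
    """Hypothetical helper: decide whether to run on the GPU.

    'auto' uses the GPU if one is found and otherwise falls back to the CPU;
    explicit True/False values are passed through unchanged.
    """
    if on_gpu == "auto":
        return gpu_available
    if isinstance(on_gpu, bool):
        return on_gpu
    raise ValueError(f"unsupported on_gpu value: {on_gpu!r}")


print(resolve_on_gpu("auto", gpu_available=False))  # False -> CPU fallback
print(resolve_on_gpu("auto", gpu_available=True))   # True  -> GPU
```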
    

Scalability

  • Multiple improvements have been made to speed up the performance.
    • Improve the performance of NumpyIndexer. argsort is replaced by argpartition, which avoids an unnecessary full sort and speeds up querying. #641

    • Switch to zmqstream for the default message handler, which improves the performance of networking. #618

    • Use uvloop as the event loop under tornado to speed up event handling in the Pods. #615
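The argsort-to-argpartition change can be seen directly in NumPy. The sketch below uses hypothetical random distances rather than NumpyIndexer's code: argsort orders all n distances, while argpartition only guarantees that the k smallest land in the first k slots (average O(n)), after which just those k need sorting.

```python
import numpy as np

rng = np.random.default_rng(0)
dists = rng.random(10_000)  # hypothetical distances from one query to every indexed vector
k = 5

# argsort: sorts all n distances even though only k are needed.
topk_sorted = np.argsort(dists)[:k]

# argpartition: the k smallest end up in the first k slots, in no particular order.
part = np.argpartition(dists, k)[:k]
# Sort only those k candidates to get a ranked result.
topk_part = part[np.argsort(dists[part])]

print(np.array_equal(topk_sorted, topk_part))  # True
```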

New Executors

  • Add NGTIndexer. NGT provides high-speed approximate nearest neighbor (ANN) search over large volumes of high-dimensional vector data. #533

     !NGTIndexer
     with:
         index_filename: index.gz
         num_threads: 2
         metric: 'l2'
         epsilon: 0.1
    
  • Add support for running FaissIndexer on GPU, and a new argument, nprobe, for FaissIndexer. See our examples for usage details. #636 #638

    !FaissIndexer
    with:
        index_filename: index.gz
        index_key: 'IVF10,PQ4'
        train_filepath: train.gz
        distance: 'l2'
        nprobe: 1
    
  • Add support for Milvus as a new Indexer. Now you can do indexing and querying with MilvusIndexer. [W.I.P] #651

  • Add CustomKerasImageEncoder so that you can use your own Keras model to encode images in Jina. The following YAML config loads the model from path/to/your/model and uses the output of the layer named awesome/encoding/layer as the embedding. #563

    !CustomKerasImageEncoder
    with:
        model_path: path/to/your/model
        layer_name: awesome/encoding/layer
    
  • Add an argument search_k for AnnoyIndexer. #642

    !AnnoyIndexer
    with:
        index_filename: index.gz
        metric: 'euclidean'
        n_trees: 10
        search_k: -1
    
  • Add FastICAEncoder for encoding. #590

    !FastICAEncoder
    with:
        output_dim: 32
        num_features: 128
        whiten: False
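FaissIndexer's nprobe controls how many inverted lists an IVF index (such as the 'IVF10,PQ4' key above) visits per query. The toy NumPy index below illustrates that trade-off only; it is not Faiss. Assumptions: the hypothetical build_ivf picks random vectors as coarse centroids instead of running k-means, and the PQ compression step is omitted.

```python
import numpy as np


def build_ivf(vectors: np.ndarray, n_lists: int, seed: int = 0):
    """Toy inverted-file index: random vectors serve as coarse centroids
    (a stand-in for k-means training); each vector goes to its nearest centroid's list."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)]
    d = ((vectors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    assign = np.argmin(d, axis=1)
    lists = [np.flatnonzero(assign == i) for i in range(n_lists)]
    return centroids, lists


def search_ivf(vectors, centroids, lists, query, k: int, nprobe: int):
    """Scan only the nprobe lists whose centroids are closest to the query."""
    probe = np.argsort(((centroids - query) ** 2).sum(-1))[:nprobe]
    candidates = np.concatenate([lists[i] for i in probe])
    d = ((vectors[candidates] - query) ** 2).sum(-1)
    return candidates[np.argsort(d)[:k]]


rng = np.random.default_rng(1)
data = rng.standard_normal((500, 8))
centroids, lists = build_ivf(data, n_lists=10)
query = rng.standard_normal(8)

exact = np.argsort(((data - query) ** 2).sum(-1))[:5]
approx = search_ivf(data, centroids, lists, query, k=5, nprobe=2)   # fast, may miss neighbors
full = search_ivf(data, centroids, lists, query, k=5, nprobe=10)    # visits every list
print(np.array_equal(full, exact))  # True: probing every list is a brute-force scan
```

A small nprobe scans fewer candidates at the cost of recall; nprobe equal to the number of lists degenerates to exhaustive search.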
    

Documentation

  • Welcome our evangelist @alexcg1 from New Zealand! He has been working hard on improving documentation readability, Jina 101, the contribution guidelines, and the README. A new document has been added to guide new contributors. #566

    #564
    #558
    #545

Unit tests

  • Add the coverage testing. Proudly, Jina's current test coverage is 73.04%. #659

⚠️ Breaking Changes

  • Rename port_grpc to port_expose. Jina now supports both gRPC and RESTful APIs, so the name port_grpc no longer fits. port_grpc will be deprecated in a future version. #598

  • Refactor ImageReader to inherit from BaseDocCrafter rather than DocSegmenter. If you use ImageReader, see our examples for more details. #627

  • Refactor Ranker. TopKFilterDriver is now used to filter out the chunks that do not belong to the top k Documents, and it is attached to Ranker by default. It is no longer attached by default to DocPbIndexer and DataURIPbIndexer, so with k shards the indexer returns n * k results when querying, n from each shard. #574

  • Remove the password_stdin argument for the jina hub CLI. #569

🐞 Bug Fixes and Other Changes

Flow

  • Fix the search_lines API for the Flow #606

Executors

  • Add a new argument, truncation_strategy, to BaseTransformerEncoder to adapt to Hugging Face Transformers v3.0.0. #623
    !TransformerTorchEncoder
    with:
        pooling_strategy: cls
        model_name: distilbert-base-cased
        max_length: 96
        truncation_strategy: longest_first
    
  • Add a size property to the indexers. #581

Drivers

  • Add a new driver, UnaryEncoderDriver, dedicated to testing and debugging. #635

  • Fix a problem with PublishDriver. PublishDriver is used to modify num_parts when a Pod is connected to another via a PUB-SUB connection. However, PublishDriver overwrote the Pod's original driver. #569

  • Remove the if clauses from the Drivers. #646

Protos

  • Add a tags field to the Chunk and Document protos. tags is a map of strings designed to store values of other fields that are used for filtering. #574

  • Add a location field to Chunks. location is a list of integers that can mark a position in a string, the coordinates in an image, or a timestamp in an audio clip. #578

Tests

🙏 Thanks to our Contributors

This release contains contributions from hanxiao, JoanFM, nan-wang, fhaase2, anish2197, alexcg1, BingHo1013, shivam-raj, Morriaty-The-Murderer, festeh, generall, emmaadesile, coolmian, JamesTang616, and YueLiu-jina

🙏 Thanks to our Community

And thanks to all of you out there as well! Without you, Jina couldn't do what we do. Your support means a lot to us.

🤝 Work with Jina

Want to work with Jina full-time? Check out our openings on our website.