Deployment on Google Cloud Platform #130

falk-stefan · 2022-07-25T10:50:49Z

Hi!

I am currently taking a look at jina-ai. The plan is to get a simple text-based document search going and so far I've managed to make a simple demo locally which uses the PQLiteIndexer (based on AnnLite).

from jina import Flow

# TfIdfEncoder, tfidf_fp and dim are defined elsewhere in the demo.
flow = (
    Flow(port=5050)
    .add(uses=TfIdfEncoder, uses_with=dict(tfidf_fp=tfidf_fp))
    .add(uses='jinahub://PQLiteIndexer/latest', install_requirements=True, uses_with=dict(dim=dim))
)

The next step would be for me to see how I can deploy a prototype to Google Cloud Platform (GCP) and, if possible, use Cloud Run in order to keep costs to a minimum.

However, since AnnLite requires access to a local file system, I am not sure if that's possible. I intended to use Cloud Storage, but it seems AnnLite does not support this.

What options do I have here?

JoanFM · 2022-07-25T11:17:43Z

You could try using the dockerized version if GCP allows.

JoanFM · 2022-07-25T11:18:24Z

Also, I guess GCP should have access to some temporary file systems, so if you pass those paths it should work.
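A minimal sketch of the suggestion above, assuming the index location can be configured through a path: pick a writable directory (an explicitly configured one if present, otherwise a temporary directory) and check its free space before indexing. The environment variable `INDEX_DATA_PATH`, the directory name `annlite_index`, and both helper functions are hypothetical names for illustration; they are not part of Jina or AnnLite.

```python
import os
import shutil
import tempfile


def resolve_data_path(env_var: str = "INDEX_DATA_PATH") -> str:
    """Return a directory for index files, creating it if needed.

    Uses the (hypothetical) environment variable when set, otherwise
    falls back to a folder inside the system temporary directory.
    """
    default = os.path.join(tempfile.gettempdir(), "annlite_index")
    path = os.environ.get(env_var) or default
    os.makedirs(path, exist_ok=True)
    return path


def has_free_space(path: str, required_bytes: int) -> bool:
    """Check whether the file system holding `path` has enough free bytes."""
    return shutil.disk_usage(path).free >= required_bytes


data_path = resolve_data_path()
# Ask for 1 GiB as an example threshold before starting to index.
print(data_path, has_free_space(data_path, 1024**3))
```

The resulting path would then be handed to the indexer through whichever parameter it exposes for its storage location; check the Executor's documentation for the exact name.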

falk-stefan · 2022-07-25T11:24:50Z

Hi and thanks for the quick response!

The problem with Docker here is cost. I want to keep cost down if possible. I think the cheapest solution would be using Cloud Run plus Google Storage.

The temporary file system is limited to 8GB, so unfortunately that's not an option either.

So, maybe that's a feature request? Make AnnLite flexible enough to run with GCP or AWS buckets?

JoanFM · 2022-07-25T14:31:56Z

We are trying to make some optimizations in terms of space, but I'm not sure they will be enough. How many documents do you expect to index? How much data do you have? Maybe you can use another type of Indexer that keeps them in memory?

falk-stefan · 2022-07-25T19:56:31Z

I don't know for sure yet, but it's going to be in the tens of millions. Keeping it in memory is probably not feasible in this case. However, I figured that I'll probably have to go with a dockerized + volume mount approach. Cloud Run is stateless, so it's probably not what I want after all.
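A rough back-of-the-envelope sketch of why tens of millions of documents are hard to hold in memory or in a small temporary file system. The embedding dimension of 512 and the float32 storage (4 bytes per value) are assumptions for illustration only; the thread does not state the actual dimension, and the estimate ignores index overhead and metadata.

```python
def raw_vector_bytes(num_docs: int, dim: int, bytes_per_value: int = 4) -> int:
    """Bytes needed to store the raw dense vectors alone."""
    return num_docs * dim * bytes_per_value


def to_gib(num_bytes: int) -> float:
    """Convert bytes to GiB."""
    return num_bytes / 1024**3


# Assumed sizes: 10M and 50M documents with 512-dimensional float32 vectors.
for num_docs in (10_000_000, 50_000_000):
    size = raw_vector_bytes(num_docs, dim=512)
    print(f"{num_docs:,} docs x 512 dims: {to_gib(size):.1f} GiB")
```

Under these assumptions, 10 million vectors already take about 19 GiB, well above the 8GB temporary file system limit mentioned earlier.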

Speaking of the Indexer: would you say that PQLiteIndexer is the weapon of choice here? It looks neat to me because I am going to have metadata, which should allow me to filter before running the vector-based search.
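To make "filter before running the vector-based search" concrete, here is a small brute-force sketch in plain Python. It only illustrates the idea; it is not how AnnLite implements filtering, and the document layout (`id`, `vector`, `meta`) and function names are made up for this example.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def filtered_search(docs, query, predicate, top_k=3):
    """Apply the metadata predicate first, then rank the survivors by similarity."""
    candidates = [d for d in docs if predicate(d["meta"])]
    ranked = sorted(candidates, key=lambda d: cosine(d["vector"], query), reverse=True)
    return ranked[:top_k]


docs = [
    {"id": "a", "vector": [1.0, 0.0], "meta": {"lang": "en"}},
    {"id": "b", "vector": [0.9, 0.1], "meta": {"lang": "de"}},
    {"id": "c", "vector": [0.0, 1.0], "meta": {"lang": "en"}},
]

# Only English documents are scored, so "b" is never compared to the query.
hits = filtered_search(docs, [1.0, 0.0], lambda meta: meta["lang"] == "en", top_k=2)
print([d["id"] for d in hits])
```

Filtering first shrinks the candidate set, so the similarity computation only runs on documents that can actually be returned.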

JoanFM · 2022-07-25T21:14:43Z

Yes, AnnLiteIndexer is a good weapon of choice. (Please note that PQLiteIndexer was renamed to AnnLiteIndexer, which is the Executor that is being actively updated.) The good thing is that many of these indexers can be swapped easily in a plug-and-play fashion.

NicholasDunham · 2022-07-26T02:22:42Z

@falk-stefan Hi, Nicholas from Jina AI here. I'd love to set up a chat with you to learn more about your use case and how we can help. Are you in our community Slack channel? Or is there a more convenient way I can get in touch with you?
