✨ Fast Insertion #138
-
i was reading about pupyl and noticed it says: "Pupyl is a really fast image search library which you can index your own (millions of) images and find similar images in milliseconds." i have around 160 million 32x32 images and was wondering if it's possible to insert them in a very fast way. i tried the normal method provided in the docs, but it was very slow: around 3 images/s.
-
Hey @Kencho5, how's it going? First of all, thanks for this discussion. I converted it from an issue to a discussion because there's a lot to understand (and actually discuss) about this.
Since the indexation step uses convolutional neural networks to extract features, it performs billions of floating-point operations over tensors (across all images), which is very demanding. There's a trade-off between speed and precision in this step, and we focus on precision. We could use other classic algorithms, like histograms, SIFT, or SURF, which are much faster (and less resource-demanding) than CNNs, but we would end up with decreased precision. Unfortunately, this feature is not exposed on the main class yet (on the latest version, …). The lower-level modules can be used directly instead:

```python
from pupyl.indexer.facets import Index
from pupyl.storage.database import ImageDatabase
from pupyl.embeddings.features import Characteristics, Extractors

# Image database under data_dir; import_images=True keeps the images themselves.
image_database = ImageDatabase(import_images=True, data_dir='/tmp/pupyl/')

# Pick a lighter network than the default to speed up feature extraction.
with Extractors(Characteristics.LIGHTWEIGHT_REGULAR_PRECISION) as extractor:
    with Index(extractor.output_shape, data_dir='/tmp/pupyl') as indexer:
        for ident, uri in enumerate(extractor.scan_images('/tmp/images')):
            image_database.insert(ident, uri)
            indexer.append(extractor.extract(uri))
```

Therefore, you can tweak the network characteristics; the example above picks a faster option than the default one (more about this on features).

Another way to speed up indexing without changing code is to use GPUs. Since you described processing ~3 images/s, there's a high likelihood that you're using your CPU for this. If you have a dedicated GPU, … In some rare cases, we can see a speed of approximately 20 images/s. The better the GPU, the faster the indexation.
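A quick way to check whether a GPU is actually being picked up is sketched below. It assumes pupyl's extractors run on TensorFlow, so TensorFlow's device list is used as a proxy; the helper name `visible_gpu_count` is made up for this example and is not part of pupyl.

```python
def visible_gpu_count():
    """Return how many GPUs TensorFlow can see, or None if TensorFlow is missing.

    Assumption: pupyl's feature extraction is backed by TensorFlow.
    """
    try:
        import tensorflow as tf
    except ImportError:
        return None
    return len(tf.config.list_physical_devices('GPU'))


count = visible_gpu_count()
if count is None:
    print('TensorFlow is not installed in this environment.')
elif count == 0:
    print('No GPU visible: feature extraction will run on the CPU.')
else:
    print(f'{count} GPU(s) visible to TensorFlow.')
```

If this reports no GPU on a machine that has one, the likely culprit is the driver or CUDA setup rather than pupyl itself.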
-
i see, but given the scale (170 million 32x32 images), 20 images/s will take many days. does it support multiprocessing?
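To put the scale in perspective, here is a back-of-the-envelope calculation using only the figures quoted in this thread (170 million images, 3 and 20 images/s). The function name `indexing_days` is just illustrative, and the estimate assumes a constant throughput on a single process.

```python
SECONDS_PER_DAY = 86_400


def indexing_days(n_images, images_per_second):
    """Estimate how many days indexing takes at a constant throughput."""
    return n_images / images_per_second / SECONDS_PER_DAY


print(round(indexing_days(170_000_000, 20), 1))  # 98.4 days at 20 images/s
print(round(indexing_days(170_000_000, 3), 1))   # 655.9 days at 3 images/s
```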
-
Hey @Kencho5.
pupyl search is several orders of magnitude faster than indexing, because most of the workload happens on the latter. Hence, everything discussed here from now on is about indexing, not searching.