Pekko Cluster Sharding - Race condition #1507

Susmit07 · 2024-09-29T06:09:48Z

Hello Developers,

We are planning to use Pekko Connectors to observe a file directory in distributed storage such as HDFS and pull Parquet files whenever they become available. We considered multiple approaches to cluster deployment, each with its own pros and cons, one of them being Cluster Sharding.

With Cluster Sharding, how likely is it that a file is processed only once at a given time across all the pods in the cluster?

As far as the documentation mentions, each entity in a sharded cluster is uniquely identified by its entityId.

I have 2 doubts:

1. For each file within an HDFS directory, if we provide a unique entity ID by hashing the file path, will that ensure the file is processed by exactly one actor (node) in the cluster at a given time? Or do we need a locking mechanism, or to implement the file-processing logic so that it is idempotent?
2. If there are too many files in the source HDFS directory, a large number of actors will be created at a given time. Will that become a performance bottleneck?

Considering the above requirements, is Cluster Sharding an appropriate technique for distributed file download? A Cluster Singleton deployment would ensure that each file is pulled exactly once, which suits our use case, but it won't scale as directories and files increase, and idle pods in the cluster are another drawback, so we thought of exploring Cluster Sharding.
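
To make the first doubt concrete, here is a minimal sketch of the ID derivation we have in mind, assuming the entity ID is the SHA-256 hex digest of the file path. The names `FileEntityIds`, `entityIdFor`, and `shardIdFor`, the example path, and the shard count of 100 are all hypothetical. The shard mapping is a plain hash-modulo written for illustration and does not call any Pekko API. The sketch only shows that the same path always yields the same ID; it does not by itself settle whether processing is exactly-once.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

// Hypothetical helper, not part of Pekko: derives stable IDs from file paths.
final class FileEntityIds {
    private FileEntityIds() {}

    // Returns the SHA-256 hex digest of the path, so the same path
    // always maps to the same entity ID on every node.
    static String entityIdFor(String filePath) {
        try {
            MessageDigest digest = MessageDigest.getInstance("SHA-256");
            byte[] hash = digest.digest(filePath.getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder(hash.length * 2);
            for (byte b : hash) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // SHA-256 is required on every Java platform, so this is unexpected.
            throw new IllegalStateException("SHA-256 not available", e);
        }
    }

    // Assumed hash-modulo mapping from entity ID to shard ID;
    // floorMod keeps the result in [0, numberOfShards).
    static String shardIdFor(String entityId, int numberOfShards) {
        return String.valueOf(Math.floorMod(entityId.hashCode(), numberOfShards));
    }

    public static void main(String[] args) {
        String id = entityIdFor("/data/incoming/part-0001.parquet");
        System.out.println(id + " -> shard " + shardIdFor(id, 100));
    }
}
```

Because the ID depends only on the path string, two pods that discover the same file independently would address the same entity. Whether the processing logic still needs to be idempotent is the open question above.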

Hoping contributors can provide some insight. Grateful!

pjfanning (Collaborator) · 2024-09-29T06:12:12Z

Please don't spam the community by opening multiple discussions about the same thing. You already have apache/pekko-connectors#835
