Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

MINOR: Cache topic resolution in TopicIds set #17285

Merged

Conversation

squah-confluent
Copy link
Contributor

@squah-confluent squah-confluent commented Sep 26, 2024

Looking up topics in a TopicsImage is relatively slow. Cache the results
in TopicIds to improve assignor performance. In benchmarks, we see a
noticeable improvement in performance in the heterogeneous case.

Before

Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt    Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS          1000  avgt    5   36.400 ± 3.004  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10       HETEROGENEOUS          1000  avgt    5  158.340 ± 0.825  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5    1.329 ± 0.041  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10       HETEROGENEOUS          1000  avgt    5  382.901 ± 6.203  ms/op

After

Benchmark                                       (assignmentType)  (assignorType)  (isRackAware)  (memberCount)  (partitionsToMemberRatio)  (subscriptionType)  (topicCount)  Mode  Cnt    Score   Error  Units
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10         HOMOGENEOUS          1000  avgt    5   36.465 ± 1.954  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL           RANGE          false          10000                         10       HETEROGENEOUS          1000  avgt    5  114.043 ± 1.424  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10         HOMOGENEOUS          1000  avgt    5    1.454 ± 0.019  ms/op
ServerSideAssignorBenchmark.doAssignment             INCREMENTAL         UNIFORM          false          10000                         10       HETEROGENEOUS          1000  avgt    5  342.840 ± 2.744  ms/op

Based heavily on #16527.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

Looking up topics in a TopicsImage is relatively slow. Cache the results
in TopicIds to improve assignor performance.
@dajac dajac added the KIP-848 The Next Generation of the Consumer Rebalance Protocol label Sep 26, 2024
@dajac dajac self-requested a review September 26, 2024 13:53
Copy link
Contributor

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@squah-confluent Thanks for the patch. I left a few nits.

*
* Provides an implementation of equals and hashCode based on the underlying TopicsImage.
*/
public abstract static class BaseTopicResolver implements TopicResolver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should we keep it private?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it doesn't need to be public.

return result;
}

@Override
public String toString() {
return "TopicIds(topicNames=" + topicNames +
", image=" + image +
", resolver=" + resolver +
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: If we toString it here, should we add toString methods to the resolvers too?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add a toString implementation that returns <class name>(image=<image>).

FrankYang0529 and others added 24 commits September 30, 2024 17:19
…ly for new consumer with poll(0) (apache#16982)

Reviewers: Lianet Magrans <[email protected]>, TaiJuWu <[email protected]>, Kirk True <[email protected]>, TengYao Chi <[email protected]>
…he#16719)

Implement server side changes for epoch bump but keep EndTxn as an unstable API until the client side changes are implemented. EndTxnResponse will return the producer ID and epoch for the transaction. Introduces new tagged fields to the TransactionLogValue to persist the clientTransactionVersion, previousProducerId, and nextProducerId to the log so that the state can be reloaded. See KIP-890 for more details.

Small updates to naming of lastProducerId -> PreviousProducerId. Also cleans up the many TransactionMetadata constructors.

Reviewers: Artem Livshits <[email protected]>, David Jacot <[email protected]>
…active controller is removed (apache#17146)

This change fixes a few issues.

KAFKA-17608; KRaft controller crashes when active controller is removed
When a control batch is committed, the quorum controller currently increases the last stable offset but fails to create a snapshot for that offset. This causes an issue if the quorum controller renounces and needs to revert to that offset (which has no snapshot present). Since the control batches are no-ops for the quorum controller, it does not need to update its offsets for control records. We skip handle commit logic for control batches.

KAFKA-17604; Describe quorum output missing added voters endpoints
Describe quorum output will miss endpoints of voters which were added via AddRaftVoter. This is due to a bug in LeaderState's updateVoterAndObserverStates which will pull replica state from observer states map (which does not include endpoints). The fix is to populate endpoints from the lastVoterSet passed into the method.

Reviewers: José Armando García Sancio <[email protected]>, Colin P. McCabe <[email protected]>, Chia-Ping Tsai <[email protected]>
…pache#17288)

While running large scale performance tests, we noticed that the logging on the ConsumerGroupHeartbeat path took a significant amount of CPU. It is mainly due to the very large data structures that we print out. I made a pass on those logs and I switched some of them to debug.

Reviewers: Lianet Magrans <[email protected]>
…the feature command tool (apache#17093)

This patch belongs to the ongoing efforts of implementing KIP-1022.

Added feature-dependencies command to look up dependencies for a given feature version supplied by --feature flag. If the feature is not known or the version not yet defined, we throw an error.

Examples :
bin/kafka-feature feature-dependencies --feature transaction.version=2
transaction.version=2 requires:
metadata.version=4 (3.3-IV0) (listing any other version dependencies)

bin/kafka-feature feature-dependencies --feature metadata.version=17
metadata.version=17 (3.7-IV2) has no dependencies

Reviewers: Justine Olshan <[email protected]>, Artem Livshits <[email protected]>
…pache#17182)

This PR simply StreamsMetricsImpl to avoid passing in the unused "metric version" parameter.

Reviewers: Matthias J. Sax <[email protected]>
…17259)

With EOSv1 removed, we don't need to create a producer per task, and thus can simplify the code by removing KafkaClientSupplier from the deeply nested StreamsProducer, to simplify the code.

Reviewers: Bill Bejeck <[email protected]>
This patch allows workflow runs to be controlled by the ci-approved label on Pull Requests. Rather than manually approving each workflow run explicitly, committers can now add the appropriate label and the new "CI Requested" and "PR Labeled" workflows will auto-approve the requested run.

Reviewers: Chia-Ping Tsai <[email protected]>
…pache#17295)

This is a small patch to change the default value of group.consumer.migration.policy to BIDIRECTIONAL.

Reviewers: David Jacot <[email protected]>
… tests from EmbeddedZookeeper to KRaft (apache#17016)

Migrate the EmbeddedKafkaCluster from the EmbeddedZookeeper to KRaft

Reviewers Bill Bejeck <[email protected]>
Fixes a regression introduced by apache#16669 which inadvertently stopped processing SCRAM arguments from kafka-storage.sh

Reviewers: Colin P. McCabe <[email protected]>, Federico Valeri <[email protected]>
@github-actions github-actions bot added storage Pull requests that target the storage module KIP-932 Queues for Kafka build Gradle build or GitHub Actions transactions Transactions and EOS clients labels Sep 30, 2024
This reverts commit 1f9c7ae.
This reverts commit 0b97fa6.
This reverts commit d9254d5.
This reverts commit b905a9f.
This reverts commit f438bb4.
This reverts commit f0a2d43.
This reverts commit 945023d.
This reverts commit 2df3ff0.
This reverts commit 3b555b8.
This reverts commit 2059889.
This reverts commit fe5056e.
This reverts commit 0589e93.
This reverts commit 99e9f81.
This reverts commit 813d79c.
This reverts commit 5aaf623.
This reverts commit 0d8485a.
This reverts commit 25b048e.
This reverts commit 536dfbb.
This reverts commit 655cddd.
This reverts commit 48c022c.
This reverts commit 03b4cdc.
This reverts commit a59e8c5.
@mumrah mumrah removed streams core Kafka Broker consumer tools storage Pull requests that target the storage module KIP-932 Queues for Kafka build Gradle build or GitHub Actions transactions Transactions and EOS clients labels Oct 1, 2024
Copy link
Contributor

@mumrah mumrah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@squah-confluent, Seeing that the cache is not actually used outside of tests and benchmarks, I'm guessing this is still WIP.

Does this come down to performance differences between HashMap and PCollectionsImmutableMap?

If we decide we really need faster topic ID to name lookups, I would consider adding it to TopicsImage. Managing a cache outside of the image will be a bit difficult.

Another thing to consider is the lifetime of the cache. Do we really need the ID + name mappings kept in memory forever?

* Provides an implementation of equals, hashCode and toString based on the underlying
* TopicsImage.
*/
private abstract static class BaseTopicResolver implements TopicResolver {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need this abstract class? Can we just have the interface and concrete class?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can remove it.

@dajac
Copy link
Contributor

dajac commented Oct 2, 2024

@mumrah

Another thing to consider is the lifetime of the cache. Do we really need the ID + name mappings kept in memory forever?

The lifetime of the cache is bound to the call. It is not kept forever.

Does this come down to performance differences between HashMap and PCollectionsImmutableMap?

Yes.

If we decide we really need faster topic ID to name lookups, I would consider adding it to TopicsImage. Managing a cache outside of the image will be a bit difficult.

We could consider this separately. At the moment, we don't really have the time to do it. The current strategy seems to be a good tradeoff at the moment given that it is only bound to the call and not kept forever.

@mumrah
Copy link
Contributor

mumrah commented Oct 2, 2024

@dajac thanks for the explanation, makes sense. Can we include a javadoc on the class describing the expected lifetime of this class?

@squah-confluent
Copy link
Contributor Author

squah-confluent commented Oct 2, 2024

@mumrah

Seeing that the cache is not actually used outside of tests and benchmarks, I'm guessing this is still WIP.

It's used in TargetAssignmentBuilder.build(), which is used by the new group coordinator.

I've updated the javadoc to describe the lifetime of the cache.

Copy link
Contributor

@dajac dajac left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks.

@dajac
Copy link
Contributor

dajac commented Oct 3, 2024

@mumrah I will merge it. If you have further comments, @squah-confluent can address separately.

@dajac dajac merged commit 99e1d8f into apache:trunk Oct 3, 2024
9 checks passed
@dajac dajac deleted the squah-cache-topicids-topic-resolution branch October 3, 2024 07:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
ci-approved KIP-848 The Next Generation of the Consumer Rebalance Protocol kraft performance
Projects
None yet
Development

Successfully merging this pull request may close these issues.