feat: add dependency processor using Apache Beam #6560

Open: 5 commits to be merged into main

yunmaoQu

Which problem is this PR solving?

Resolves #5911

Description of the changes

  • add dependency processor using Apache Beam

How was this change tested?

  • e2e tests

Checklist

@yunmaoQu yunmaoQu requested a review from a team as a code owner January 17, 2025 17:37
@yunmaoQu yunmaoQu requested a review from joe-elliott January 17, 2025 17:37
@yunmaoQu yunmaoQu force-pushed the add-dependency-processor branch from af5f794 to 60fb334 Compare January 17, 2025 17:42
Member

@yurishkuro yurishkuro left a comment


  • where is it hooked up to anything?
  • what would be the e2e testing for this component?

@yunmaoQu
Author

yunmaoQu commented Jan 18, 2025

  • where is it hooked up to anything?
  • what would be the e2e testing for this component?

@yurishkuro I have fixed it

Member

@yurishkuro yurishkuro left a comment


@mahadzaryab1 interesting direction here

Signed-off-by: yunmaoQu <[email protected]>
@yunmaoQu
Author

yunmaoQu commented Jan 20, 2025

@yurishkuro Except for this, I have updated everything based on your review.

config *Config
aggregator *dependencyAggregator // Define the aggregator below.
telset component.TelemetrySettings
dependencyWriter *memory.Store
Member


As I mentioned, you cannot have a concrete store dependency here. The processor needs to work with any storage supported by Jaeger, as long as it implements WriteDependencies.

Example:

f, err := jaegerstorage.GetStorageFactory(storageName, host)
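The point about depending on a capability rather than a concrete store can be sketched as follows. This is a minimal, self-contained illustration, not the Jaeger API: `dependencyLink`, `dependencyWriter`, `memoryWriter`, and `flush` are hypothetical names, and the `WriteDependencies` signature shown here is an assumption made for the example.

```go
package main

import (
	"fmt"
	"time"
)

// dependencyLink is a hypothetical stand-in for a service-to-service edge.
type dependencyLink struct {
	Parent, Child string
	CallCount     uint64
}

// dependencyWriter is a hypothetical narrow interface. The processor only
// needs something that can persist links, not a specific storage backend.
type dependencyWriter interface {
	WriteDependencies(ts time.Time, links []dependencyLink) error
}

// memoryWriter is one possible backend; any other type with the same method
// satisfies the interface without changes to the processor.
type memoryWriter struct {
	links []dependencyLink
}

func (m *memoryWriter) WriteDependencies(_ time.Time, links []dependencyLink) error {
	m.links = append(m.links, links...)
	return nil
}

// dependencyProcessor holds the interface, not *memoryWriter.
type dependencyProcessor struct {
	writer dependencyWriter
}

// flush hands the aggregated links to whichever backend was injected.
func (p *dependencyProcessor) flush(links []dependencyLink) error {
	return p.writer.WriteDependencies(time.Now(), links)
}

func main() {
	store := &memoryWriter{}
	p := &dependencyProcessor{writer: store}
	links := []dependencyLink{{Parent: "frontend", Child: "cart", CallCount: 3}}
	if err := p.flush(links); err != nil {
		fmt.Println("write failed:", err)
		return
	}
	fmt.Println("links stored:", len(store.links))
}
```

With this shape, the writer can be resolved from the configured storage at startup and injected, and the processor code does not change when the backend does.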

func (tp *dependencyProcessor) Shutdown(ctx context.Context) error {
	close(tp.closeChan)
	if tp.aggregator != nil {
		if err := tp.aggregator.Close(); err != nil {
Member


if aggregator has a Close() function why does it need to be passed closeChan?

Member


?

// is considered complete and ready for dependency aggregation.
// Default trace completion timeout: 2 seconds of inactivity
InactivityTimeout time.Duration `yaml:"inactivity_timeout"`
}
Member


please add Validate method and use valid: notations in the field tags.

Author


ok

Signed-off-by: yunmaoQu <[email protected]>
@yunmaoQu
Author

yunmaoQu commented Jan 30, 2025

@yurishkuro I have fixed it

Signed-off-by: yunmaoQu <[email protected]>
Comment on lines +57 to +58
func (agg *dependencyAggregator) Start(closeChan chan struct{}) {
agg.closeChan = closeChan
Member


Suggested change
- func (agg *dependencyAggregator) Start(closeChan chan struct{}) {
- 	agg.closeChan = closeChan
+ func (agg *dependencyAggregator) Start() {
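One way to read this suggestion, together with the earlier question about Close(), is that the aggregator can own its shutdown signal instead of receiving it from the processor. The sketch below assumes that reading. The field names follow the diff, but the `string` payload, the constructor, and the `seen` counter are hypothetical simplifications.

```go
package main

import (
	"fmt"
	"sync"
)

// dependencyAggregator owns its own shutdown channel; callers never see it.
type dependencyAggregator struct {
	inputChan chan string   // simplified payload, an assumption for this sketch
	closeChan chan struct{} // created and closed only by the aggregator
	wg        sync.WaitGroup
	seen      int // written by the worker, read only after Close returns
}

func newDependencyAggregator() *dependencyAggregator {
	return &dependencyAggregator{
		inputChan: make(chan string),
		closeChan: make(chan struct{}),
	}
}

// Start launches the worker loop and takes no channel argument.
func (agg *dependencyAggregator) Start() {
	agg.wg.Add(1)
	go func() {
		defer agg.wg.Done()
		for {
			select {
			case <-agg.inputChan:
				agg.seen++
			case <-agg.closeChan:
				return
			}
		}
	}()
}

// Close signals the worker and waits for it to exit.
// It must be called at most once; a second call would panic on close.
func (agg *dependencyAggregator) Close() error {
	close(agg.closeChan)
	agg.wg.Wait()
	return nil
}

func main() {
	agg := newDependencyAggregator()
	agg.Start()
	for _, s := range []string{"a", "b", "c"} {
		agg.inputChan <- s
	}
	if err := agg.Close(); err != nil {
		fmt.Println("close failed:", err)
	}
	fmt.Println("events processed:", agg.seen)
}
```

With this shape the processor's Shutdown only needs to call Close() on the aggregator, and there is a single owner for the channel's lifetime.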

	eventTime: time.Now(),
}
select {
case agg.inputChan <- event:
Member


What is the motivation for having this done in the background instead of in the caller goroutine? Are the operations on Beam pipeline threadsafe or is this the reason for separation?

Author


The motivation for processing spans in the background (via a separate goroutine) rather than in the caller goroutine is primarily related to performance optimization, decoupling of concerns, and ensuring thread safety when interacting with the Apache Beam pipeline.

config: &cfg,
telset: telset,
dependencyWriter: dependencyWriter,
inputChan: make(chan spanEvent, 1000),
Member


what's the motivation for making this a bounded queue?

Author


The motivation for making the inputChan a bounded queue (a buffered channel with a fixed size, e.g., 1000) is primarily to manage backpressure, control resource usage, and ensure system stability in high-throughput scenarios

Member


The motivation for making the inputChan a bounded queue (a buffered channel with a fixed size, e.g., 1000) is primarily to manage backpressure, control resource usage, and ensure system stability in high-throughput scenarios

When a channel is unbuffered, it cannot be written to unless there is a reader waiting to consume from it, so it provides natural back pressure: the caller goroutine is blocked and holds the remote caller. It also does not allow the queue to grow and accumulate unprocessed data while making it look like the processing was immediately successful.
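The difference can be seen in a small standalone sketch. The `trySend` helper is hypothetical and uses a non-blocking select only so the demo does not deadlock; a plain send on the unbuffered channel would block the caller until a reader arrives, which is the back pressure described above.

```go
package main

import "fmt"

// trySend attempts a non-blocking send and reports whether the channel
// accepted the value. A plain `ch <- v` would block instead of returning false.
func trySend(ch chan int, v int) bool {
	select {
	case ch <- v:
		return true
	default:
		return false
	}
}

func main() {
	unbuffered := make(chan int)
	buffered := make(chan int, 2)

	// No goroutine is receiving from either channel.
	fmt.Println("unbuffered accepts:", trySend(unbuffered, 1))  // false: no reader is waiting
	fmt.Println("buffered accepts 1st:", trySend(buffered, 1))  // true: value is queued
	fmt.Println("buffered accepts 2nd:", trySend(buffered, 2))  // true: value is queued
	fmt.Println("buffered accepts 3rd:", trySend(buffered, 3))  // false: buffer is full
	fmt.Println("queued but unprocessed:", len(buffered))       // 2
}
```

The two sends that succeed on the buffered channel look successful to the caller even though nothing has consumed them, which is the accumulation of unprocessed data the comment warns about.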

Author

@yunmaoQu yunmaoQu Feb 2, 2025


@yurishkuro OK, I will fix it. Could this part of the code be merged?

@yunmaoQu
Author

yunmaoQu commented Feb 4, 2025

Hi @mahadzaryab1 , would you mind taking a look at my PR when you have a moment? I'd appreciate your feedback. Thanks!

@yunmaoQu
Author

yunmaoQu commented Feb 4, 2025

@yurishkuro I will fix the part you mentioned. Could this part of the code be merged?

@yunmaoQu
Author

yunmaoQu commented Feb 4, 2025

Hey @mahadzaryab1 , I know you're familiar with this part of the code. Could you give my PR a look and share your thoughts? Thank you!

Successfully merging this pull request may close these issues.

Implement in-memory Service Dependency Graph using Apache Beam
3 participants