
DevSecOps : Identify Current Max Batch Concurrency #17002

Open
emvaldes opened this issue Jan 7, 2025 · 6 comments
Labels
DevSecOps Team Aq DevSecOps work label documentation Tickets that add documentation on existing features and services platform-current Platform - Current Capabilities reportstream

Comments


emvaldes commented Jan 7, 2025

Objective:

Determine the maximum number of concurrent batches (senders) the system can handle without performance degradation, given different batch sizes.


Deliverables

  1. Concurrency Test Results: Metrics for each batch size and concurrency level.
  2. Visualizations: Charts showing concurrency levels vs. latency, error rates, and resource usage.
  3. Concurrency Thresholds: Documented maximum concurrency levels for different batch sizes.
  4. Pilot Report: Insights from testing recommended concurrency levels in production.
@emvaldes emvaldes added DevSecOps Team Aq DevSecOps work label platform-current Platform - Current Capabilities reportstream labels Jan 7, 2025
@emvaldes emvaldes added this to the todo milestone Jan 7, 2025
@emvaldes emvaldes changed the title from "DevSecOps : Reproducing Production Setup" to "DevSecOps : Identify Current Max Batch Concurrency" Jan 7, 2025

emvaldes commented Jan 7, 2025

Analyze Current Concurrency Limits

Goal: Understand the system's existing concurrency behavior and identify potential bottlenecks.


Tasks:

  1. Review Historical Logs and Metrics

    • Sub-Tasks:
      1. Query Azure Log Analytics for historical concurrency metrics.
      2. Identify peak concurrency scenarios and their associated performance impacts.
      3. Analyze error rates during high-concurrency periods.
  2. Identify Concurrency Constraints

    • Sub-Tasks:
      1. Document system components that impact concurrency, such as:
        • Database connection limits.
        • Thread pool sizes in APIs or batch processors.
        • Queue throughput (e.g., Azure Service Bus or Event Hub).
      2. Verify any existing concurrency throttling mechanisms.
  3. Baseline Performance Metrics

    • Sub-Tasks:
      1. Define acceptable thresholds for concurrency testing:
        • Maximum latency (e.g., <500ms per batch).
        • Error rates (e.g., <1% failures).
        • Resource utilization (e.g., CPU/memory usage <80%).
      2. Collect baseline metrics for the current concurrency setup.
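The historical review in task 1 could start from a query along these lines. This is a sketch only: it assumes batch completions are logged as "BatchProcessed" custom events carrying a "senderId" custom dimension, the same convention used in the monitoring queries further down in this thread.

```kql
// Sketch: estimate historical concurrency by counting distinct active
// senders per 1-minute window, then summarize peak and P95 concurrency.
// Assumes a "BatchProcessed" customEvent with a "senderId" dimension.
customEvents
| where timestamp > ago(90d)
| where name == "BatchProcessed"
| summarize ConcurrentSenders = dcount(tostring(customDimensions['senderId']))
  by bin(timestamp, 1m)
| summarize
    PeakConcurrency = max(ConcurrentSenders),
    P95Concurrency = percentile(ConcurrentSenders, 95)
```

The P95 figure is a useful complement to the peak, since a single anomalous spike should not define the baseline.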


emvaldes commented Jan 7, 2025

Design and Execute Concurrency Tests

Goal: Measure system performance while increasing the number of concurrent senders for different batch sizes.


Tasks:

  1. Prepare the Test Environment

    • Sub-Tasks:
      1. Ensure a high-fidelity staging environment that mirrors production.
      2. Configure monitoring tools (Azure Monitor, Application Insights) to capture concurrency metrics.
  2. Define Test Scenarios

    • Sub-Tasks:
      1. Test concurrency levels with batch sizes of 500, 1000, 2500, and 5000.
      2. Increment concurrency levels gradually:
        • Example: Start with 1 sender and increase to 5, 10, 20, 50, and 100 senders.
      3. Include failure scenarios (e.g., one sender failing mid-batch).
  3. Execute Concurrency Tests

    • Sub-Tasks:
      1. Use tools like K6, Locust, or JMeter to simulate multiple senders.
      2. Monitor performance metrics for each concurrency level:
        • Latency (average, 95th percentile).
        • Error rates (e.g., failed batches).
        • Resource usage (CPU, memory, disk I/O).
      3. Record system behavior at each concurrency level to identify thresholds.
  4. Monitor During Testing

    • Sub-Tasks:
      1. Use Azure Monitor to track resource utilization (CPU, memory, disk IOPS).
      2. Use Application Insights to monitor latency and errors in real time.


emvaldes commented Jan 7, 2025

Analyze Results and Determine Concurrency Thresholds

Goal: Evaluate the performance metrics to identify the maximum concurrency levels for different batch sizes.


Tasks:

  1. Aggregate Test Results

    • Sub-Tasks:
      1. Consolidate metrics for each batch size and concurrency level.
      2. Plot concurrency levels vs. latency, error rates, and resource usage.
  2. Identify Concurrency Limits

    • Sub-Tasks:
      1. Determine the concurrency level at which:
        • Latency begins to spike.
        • Error rates exceed acceptable thresholds (<1%).
        • Resource utilization exceeds safe levels (>80% CPU or memory usage).
      2. Document the maximum concurrency for each batch size tested.
  3. Document Bottlenecks

    • Sub-Tasks:
      1. Identify components that limit concurrency (e.g., database, queue, API thread pools).
      2. Provide recommendations for addressing these bottlenecks (e.g., increasing database connection limits, optimizing queue throughput).
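Identifying the breaking point in task 2 can be semi-automated by flagging the windows where the proposed thresholds are breached. This is a sketch assuming the "BatchProcessed"/"BatchError" event names used in the monitoring queries below:

```kql
// Sketch: flag 1-minute windows breaching the proposed thresholds
// (<500ms P95 latency, <1% error rate).
let processed = customEvents | where name == "BatchProcessed";
let failed = customEvents | where name == "BatchError";
processed
| summarize
    Batches = count(),
    P95LatencyMs = percentile(todouble(customDimensions['duration']), 95)
  by Window = bin(timestamp, 1m)
| join kind=leftouter (
    failed | summarize Errors = count() by Window = bin(timestamp, 1m)
  ) on Window
| extend ErrorRatePct = 100.0 * coalesce(Errors, 0) / Batches
| where P95LatencyMs > 500 or ErrorRatePct > 1.0
```

Cross-referencing the flagged windows with the concurrency level being driven at that time yields the documented maximum per batch size.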


emvaldes commented Jan 7, 2025

Validate Findings in Production

Goal: Test the identified concurrency thresholds in a controlled production environment.


Tasks:

  1. Pilot Concurrency Levels

    • Sub-Tasks:
      1. Select a subset of production workloads to test concurrency thresholds.
      2. Monitor performance metrics closely during pilot runs.
  2. Compare with Historical Metrics

    • Sub-Tasks:
      1. Analyze how the system performs compared to historical high-concurrency events.
      2. Document any deviations or unexpected behaviors.
  3. Finalize Concurrency Recommendations

    • Sub-Tasks:
      1. Prepare a detailed report summarizing findings and the recommended concurrency limits.
      2. Present results to stakeholders for validation and approval.
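The comparison with historical metrics in task 2 could be framed as a side-by-side summary; the sketch below uses placeholder date ranges for the pilot window and assumes the same "BatchProcessed" event schema as elsewhere in this thread.

```kql
// Sketch: compare pilot-window latency against a historical baseline.
// The datetime literals are placeholders for the actual pilot window.
let pilot = customEvents
    | where name == "BatchProcessed"
    | where timestamp between (datetime(2025-01-20) .. datetime(2025-01-21));
let baseline = customEvents
    | where name == "BatchProcessed"
    | where timestamp between (ago(90d) .. ago(1d));
union
    (pilot | extend Period = "Pilot"),
    (baseline | extend Period = "Baseline")
| summarize P95LatencyMs = percentile(todouble(customDimensions['duration']), 95)
  by Period
```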


emvaldes commented Jan 7, 2025

Monitoring Batch Concurrency Metrics

1. Metrics to Monitor

  1. Batch Processing Metrics:
    • Number of concurrent batches processed.
    • Average processing time per batch.
    • Error rates during concurrent processing.
  2. Infrastructure Metrics:
    • CPU, memory, and disk IOPS utilization during concurrency tests.
  3. Queue Metrics (if applicable):
    • Queue depth (number of unprocessed messages).
    • Message processing latency.

2. KQL Queries for Monitoring Concurrency

  1. Batch Processing Latency (Grouped by Sender):

    customEvents
    | where name == "BatchProcessed"
    | summarize
        AvgLatency = avg(todouble(customDimensions['duration'])),
        P95Latency = percentile(todouble(customDimensions['duration']), 95)
      by SenderId = tostring(customDimensions['senderId']), bin(timestamp, 1m)
  2. Error Rate by Sender:

    customEvents
    | where name == "BatchError"
    | summarize ErrorCount = count()
      by SenderId = tostring(customDimensions['senderId']), bin(timestamp, 1m)
  3. Resource Utilization During Concurrency Testing:

    Perf
    | where ObjectName == "Processor" and CounterName == "% Processor Time"
    | summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 1m)
  4. Throughput (Batches Processed Per Second):

    customEvents
    | where name == "BatchProcessed"
    | summarize Throughput = count() by bin(timestamp, 1s)
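The queue metrics listed in section 1 have no corresponding query above. A sketch for Azure Service Bus, assuming diagnostic metrics are routed to the Log Analytics AzureMetrics table, might look like:

```kql
// Sketch: Service Bus queue depth (active/unprocessed messages),
// assuming diagnostics are exported to the AzureMetrics table.
AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICEBUS"
| where MetricName == "ActiveMessages"
| summarize
    AvgQueueDepth = avg(Average),
    MaxQueueDepth = max(Maximum)
  by bin(TimeGenerated, 1m)
```

A steadily growing queue depth during a concurrency level is an early signal that the system has passed its sustainable throughput, even before latency or error thresholds are breached.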


emvaldes commented Jan 7, 2025

Visualizing Concurrency Metrics

  1. Line Charts:
    • Plot concurrency levels vs. latency and error rates.
    • Example: A chart showing how latency increases as concurrency increases.
  2. Bar Charts:
    • Show the number of errors per sender at different concurrency levels.
  3. Heatmaps:
    • Visualize resource utilization (CPU, memory) at different concurrency levels.
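For quick iteration, the line charts above can be produced directly in Log Analytics with the render operator, before building polished dashboards. A sketch, reusing the event names assumed in the monitoring queries:

```kql
// Sketch: per-minute processed vs. errored batch counts as a timechart.
customEvents
| where name in ("BatchProcessed", "BatchError")
| summarize Count = count() by name, bin(timestamp, 1m)
| render timechart
```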

@emvaldes emvaldes added the documentation Tickets that add documentation on existing features and services label Jan 7, 2025