
DevSecOps : Identify Current Max Batch Concurrency #17002

Open
emvaldes opened this issue Jan 7, 2025 · 6 comments
Labels
DevSecOps Team Aq DevSecOps work label documentation Tickets that add documentation on existing features and services platform-current Platform - Current Capabilities reportstream

Comments


emvaldes commented Jan 7, 2025

Objective:

Determine the maximum number of concurrent batches (senders) the system can handle without performance degradation, given different batch sizes.


Deliverables

  1. Concurrency Test Results: Metrics for each batch size and concurrency level.
  2. Visualizations: Charts showing concurrency levels vs. latency, error rates, and resource usage.
  3. Concurrency Thresholds: Documented maximum concurrency levels for different batch sizes.
  4. Pilot Report: Insights from testing recommended concurrency levels in production.
@emvaldes emvaldes added DevSecOps Team Aq DevSecOps work label platform-current Platform - Current Capabilities reportstream labels Jan 7, 2025
@emvaldes emvaldes added this to the todo milestone Jan 7, 2025
@emvaldes emvaldes changed the title from "DevSecOps : Reproducing Production Setup" to "DevSecOps : Identify Current Max Batch Concurrency" Jan 7, 2025

emvaldes commented Jan 7, 2025

Analyze Current Concurrency Limits

Goal: Understand the system's existing concurrency behavior and identify potential bottlenecks.


Tasks:

  1. Review Historical Logs and Metrics

    • Sub-Tasks:
      1. Query Azure Log Analytics for historical concurrency metrics.
      2. Identify peak concurrency scenarios and their associated performance impacts.
      3. Analyze error rates during high-concurrency periods.
  2. Identify Concurrency Constraints

    • Sub-Tasks:
      1. Document system components that impact concurrency, such as:
        • Database connection limits.
        • Thread pool sizes in APIs or batch processors.
        • Queue throughput (e.g., Azure Service Bus or Event Hub).
      2. Verify any existing concurrency throttling mechanisms.
  3. Baseline Performance Metrics

    • Sub-Tasks:
      1. Define acceptable thresholds for concurrency testing:
        • Maximum latency (e.g., <500ms per batch).
        • Error rates (e.g., <1% failures).
        • Resource utilization (e.g., CPU/memory usage <80%).
      2. Collect baseline metrics for the current concurrency setup.
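The historical review in task 1 could start from a query along these lines. This is a sketch only: it assumes batch completions are logged as "BatchProcessed" custom events carrying a "senderId" custom dimension, the same convention used in the monitoring queries further down in this thread.

```kql
// Sketch: estimate historical concurrency by counting distinct active
// senders per 1-minute window, then summarize peak and P95 concurrency.
// Assumes a "BatchProcessed" customEvent with a "senderId" dimension.
customEvents
| where timestamp > ago(90d)
| where name == "BatchProcessed"
| summarize ConcurrentSenders = dcount(tostring(customDimensions['senderId']))
  by bin(timestamp, 1m)
| summarize
    PeakConcurrency = max(ConcurrentSenders),
    P95Concurrency = percentile(ConcurrentSenders, 95)
```

The P95 figure is a useful complement to the peak, since a single anomalous spike should not define the baseline.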


emvaldes commented Jan 7, 2025

Design and Execute Concurrency Tests

Goal: Measure system performance while increasing the number of concurrent senders for different batch sizes.


Tasks:

  1. Prepare the Test Environment

    • Sub-Tasks:
      1. Ensure a high-fidelity staging environment that mirrors production.
      2. Configure monitoring tools (Azure Monitor, Application Insights) to capture concurrency metrics.
  2. Define Test Scenarios

    • Sub-Tasks:
      1. Test concurrency levels with batch sizes of 500, 1000, 2500, and 5000.
      2. Increment concurrency levels gradually:
        • Example: Start with 1 sender and increase to 5, 10, 20, 50, and 100 senders.
      3. Include failure scenarios (e.g., one sender failing mid-batch).
  3. Execute Concurrency Tests

    • Sub-Tasks:
      1. Use tools like K6, Locust, or JMeter to simulate multiple senders.
      2. Monitor performance metrics for each concurrency level:
        • Latency (average, 95th percentile).
        • Error rates (e.g., failed batches).
        • Resource usage (CPU, memory, disk I/O).
      3. Record system behavior at each concurrency level to identify thresholds.
  4. Monitor During Testing

    • Sub-Tasks:
      1. Use Azure Monitor to track resource utilization (CPU, memory, disk IOPS).
      2. Use Application Insights to monitor latency and errors in real time.


emvaldes commented Jan 7, 2025

Analyze Results and Determine Concurrency Thresholds

Goal: Evaluate the performance metrics to identify the maximum concurrency levels for different batch sizes.


Tasks:

  1. Aggregate Test Results

    • Sub-Tasks:
      1. Consolidate metrics for each batch size and concurrency level.
      2. Plot concurrency levels vs. latency, error rates, and resource usage.
  2. Identify Concurrency Limits

    • Sub-Tasks:
      1. Determine the concurrency level at which:
        • Latency begins to spike.
        • Error rates exceed acceptable thresholds (<1%).
        • Resource utilization exceeds safe levels (>80% CPU or memory usage).
      2. Document the maximum concurrency for each batch size tested.
  3. Document Bottlenecks

    • Sub-Tasks:
      1. Identify components that limit concurrency (e.g., database, queue, API thread pools).
      2. Provide recommendations for addressing these bottlenecks (e.g., increasing database connection limits, optimizing queue throughput).
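Identifying the breaking point in task 2 can be semi-automated by flagging the windows where the proposed thresholds are breached. This is a sketch assuming the "BatchProcessed"/"BatchError" event names used in the monitoring queries below:

```kql
// Sketch: flag 1-minute windows breaching the proposed thresholds
// (<500ms P95 latency, <1% error rate).
let processed = customEvents | where name == "BatchProcessed";
let failed = customEvents | where name == "BatchError";
processed
| summarize
    Batches = count(),
    P95LatencyMs = percentile(todouble(customDimensions['duration']), 95)
  by Window = bin(timestamp, 1m)
| join kind=leftouter (
    failed | summarize Errors = count() by Window = bin(timestamp, 1m)
  ) on Window
| extend ErrorRatePct = 100.0 * coalesce(Errors, 0) / Batches
| where P95LatencyMs > 500 or ErrorRatePct > 1.0
```

Cross-referencing the flagged windows with the concurrency level being driven at that time yields the documented maximum per batch size.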


emvaldes commented Jan 7, 2025

Validate Findings in Production

Goal: Test the identified concurrency thresholds in a controlled production environment.


Tasks:

  1. Pilot Concurrency Levels

    • Sub-Tasks:
      1. Select a subset of production workloads to test concurrency thresholds.
      2. Monitor performance metrics closely during pilot runs.
  2. Compare with Historical Metrics

    • Sub-Tasks:
      1. Analyze how the system performs compared to historical high-concurrency events.
      2. Document any deviations or unexpected behaviors.
  3. Finalize Concurrency Recommendations

    • Sub-Tasks:
      1. Prepare a detailed report summarizing findings and the recommended concurrency limits.
      2. Present results to stakeholders for validation and approval.
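The comparison with historical metrics in task 2 could be framed as a side-by-side summary; the sketch below uses placeholder date ranges for the pilot window and assumes the same "BatchProcessed" event schema as elsewhere in this thread.

```kql
// Sketch: compare pilot-window latency against a historical baseline.
// The datetime literals are placeholders for the actual pilot window.
let pilot = customEvents
    | where name == "BatchProcessed"
    | where timestamp between (datetime(2025-01-20) .. datetime(2025-01-21));
let baseline = customEvents
    | where name == "BatchProcessed"
    | where timestamp between (ago(90d) .. ago(1d));
union
    (pilot | extend Period = "Pilot"),
    (baseline | extend Period = "Baseline")
| summarize P95LatencyMs = percentile(todouble(customDimensions['duration']), 95)
  by Period
```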


emvaldes commented Jan 7, 2025

Monitoring Batch Concurrency Metrics

1. Metrics to Monitor

  1. Batch Processing Metrics:
    • Number of concurrent batches processed.
    • Average processing time per batch.
    • Error rates during concurrent processing.
  2. Infrastructure Metrics:
    • CPU, memory, and disk IOPS utilization during concurrency tests.
  3. Queue Metrics (if applicable):
    • Queue depth (number of unprocessed messages).
    • Message processing latency.

2. KQL Queries for Monitoring Concurrency

  1. Batch Processing Latency (Grouped by Sender):

    customEvents
    | where name == "BatchProcessed"
    | summarize
        AvgLatency = avg(todouble(customDimensions['duration'])),
        P95Latency = percentile(todouble(customDimensions['duration']), 95)
      by SenderId = tostring(customDimensions['senderId']), bin(timestamp, 1m)
  2. Error Rate by Sender:

    customEvents
    | where name == "BatchError"
    | summarize ErrorCount = count()
      by SenderId = tostring(customDimensions['senderId']), bin(timestamp, 1m)
  3. Resource Utilization During Concurrency Testing:

    Perf
    | where ObjectName == "Processor" and CounterName == "% Processor Time"
    | summarize AvgCPU = avg(CounterValue) by bin(TimeGenerated, 1m)
  4. Throughput (Batches Processed Per Second):

    customEvents
    | where name == "BatchProcessed"
    | summarize Throughput = count() by bin(timestamp, 1s)
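The queue metrics listed in section 1 have no corresponding query above. A sketch for Azure Service Bus, assuming diagnostic metrics are routed to the Log Analytics AzureMetrics table, might look like:

```kql
// Sketch: Service Bus queue depth (active/unprocessed messages),
// assuming diagnostics are exported to the AzureMetrics table.
AzureMetrics
| where ResourceProvider == "MICROSOFT.SERVICEBUS"
| where MetricName == "ActiveMessages"
| summarize
    AvgQueueDepth = avg(Average),
    MaxQueueDepth = max(Maximum)
  by bin(TimeGenerated, 1m)
```

A steadily growing queue depth during a concurrency level is an early signal that the system has passed its sustainable throughput, even before latency or error thresholds are breached.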


emvaldes commented Jan 7, 2025

Visualizing Concurrency Metrics

  1. Line Charts:
    • Plot concurrency levels vs. latency and error rates.
    • Example: A chart showing how latency increases as concurrency increases.
  2. Bar Charts:
    • Show the number of errors per sender at different concurrency levels.
  3. Heatmaps:
    • Visualize resource utilization (CPU, memory) at different concurrency levels.
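For quick iteration, the line charts above can be produced directly in Log Analytics with the render operator, before building polished dashboards. A sketch, reusing the event names assumed in the monitoring queries:

```kql
// Sketch: per-minute processed vs. errored batch counts as a timechart.
customEvents
| where name in ("BatchProcessed", "BatchError")
| summarize Count = count() by name, bin(timestamp, 1m)
| render timechart
```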

@emvaldes emvaldes added the documentation Tickets that add documentation on existing features and services label Jan 7, 2025