diff --git a/tools/scxtop/README.md b/tools/scxtop/README.md
index d772d3c8d3..2526f76d49 100644
--- a/tools/scxtop/README.md
+++ b/tools/scxtop/README.md
@@ -185,6 +185,44 @@ vtime delta should remain rather stable as DSQs are consumed. If a scheduler is
 scheduling this field may be blank.
 image
 
+### Performance Counters (Perf Stat View)
+
+The Perf Stat view (press **C**) provides real-time hardware and software performance counters,
+similar to `perf stat`. This view shows:
+
+**Counter Collection:**
+- Hardware: cycles, instructions, branches, cache references/misses, pipeline stalls
+- Software: context-switches, cpu-migrations, page-faults
+- All counters update in real time at the configured tick rate
+
+**Derived Metrics (Automatically Calculated):**
+- **IPC** (Instructions Per Cycle): Indicates CPU efficiency
+- **Cache Miss Rate**: Shows cache effectiveness
+- **Branch Miss Rate**: Indicates branch prediction performance
+- **Frontend/Backend Stalls**: Shows pipeline bottlenecks
+
+**Aggregation Levels** (toggle with 'a' key):
+- **LLC**: Per-cache-domain view (default) - Best for understanding cache locality
+- **NUMA**: Per-NUMA-node view - Best for memory locality analysis
+- **System**: System-wide totals - Best for overall performance overview
+
+**Visualization Modes** (toggle with 'v' key):
+- **Table Mode**: Detailed counter values, rates/sec, and contextual notes
+- **Chart Mode**: 6 sparkline charts showing historical trends with min/max/avg statistics
+
+**Process Filtering:**
+- Press 'p' to filter by currently selected process
+- Press 'c' to clear filter and return to system-wide view
+
+**Key Features:**
+- Color-coded metrics (green/yellow/red) based on performance quality
+- Full-width sparklines that adapt to terminal size
+- Per-LLC and per-NUMA views show a grid of panels for quick comparison
+- Comprehensive statistics: Current, Average, Min, Max values
+
+This view is invaluable for top-down performance analysis, identifying whether workloads
+are CPU-bound, memory-bound, or experiencing cache/branch prediction issues.
+
 ## MCP Mode - AI-Assisted Scheduler Analysis
 
 `scxtop` includes a Model Context Protocol (MCP) server that exposes scheduler observability
@@ -280,9 +318,12 @@ claude --mcp scxtop "Summarize my system's scheduler"
 - `get_topology` - Get hardware topology with configurable detail level
 - `list_event_subsystems` - List available tracing event subsystems
 - `list_events` - List specific kprobe or perf events with pagination
-- `start_perf_profiling` - Start CPU profiling with stack traces
+- `start_perf_profiling` - Start CPU profiling with stack traces (sampling mode)
 - `stop_perf_profiling` - Stop profiling and prepare results
 - `get_perf_results` - Get symbolized flamegraph data
+- `start_perf_stat` - Start performance counter collection (counting mode)
+- `stop_perf_stat` - Stop counter collection
+- `get_perf_stat_results` - Get counter statistics with derived metrics (IPC, cache miss rates, etc.)
 - `control_event_tracking` - Enable/disable BPF event collection
 - `control_stats_collection` - Control BPF statistics sampling
 - `control_analyzers` - Start/stop event analyzers
@@ -332,6 +373,15 @@ claude --mcp scxtop "Summarize my system's scheduler"
 
 "What kernel functions are consuming the most CPU?"
→ Claude starts profiling, collects samples, and analyzes results + +"Show me performance counters aggregated by LLC" +→ Claude uses start_perf_stat with llc aggregation and retrieves metrics + +"What's the IPC and cache miss rate for each NUMA node?" +→ Claude starts perf stat collection and queries with node aggregation + +"Is this workload CPU-bound or memory-bound?" +→ Claude analyzes perf stat metrics (IPC, cache miss rate, stalls) ``` **Claude Code CLI** - Direct command line usage: @@ -382,13 +432,16 @@ This enables AI assistants to perform continuous monitoring and proactive analys See [CLAUDE_INTEGRATION.md](CLAUDE_INTEGRATION.md) for detailed examples and usage patterns. +See [MCP_PERF_STAT.md](MCP_PERF_STAT.md) for complete performance counter API reference. + ## Documentation ### User Guides - **[docs/PERFETTO_TRACE_ANALYSIS.md](docs/PERFETTO_TRACE_ANALYSIS.md)** - Complete perfetto trace analysis guide -- **[docs/TASK_THREAD_DEBUGGING_GUIDE.md](docs/TASK_THREAD_DEBUGGING_GUIDE.md)** - Task/thread debugging workflows +- **[docs/TASK_THREAD_DEBUGGING_GUIDE.md](docs/TASK_THREAD_DEBUGGING_GUIDE.md)** - Task/thread debugging workflows - **[docs/PROTOBUF_LOADING_VERIFIED.md](docs/PROTOBUF_LOADING_VERIFIED.md)** - Protobuf loading verification - **[docs/README.md](docs/README.md)** - Documentation index +- **[MCP_PERF_STAT.md](MCP_PERF_STAT.md)** - Performance counter collection MCP API ### Implementation Documentation - **[COMPLETE_IMPLEMENTATION_SUMMARY.md](COMPLETE_IMPLEMENTATION_SUMMARY.md)** - Full implementation overview diff --git a/tools/scxtop/docs/README.md b/tools/scxtop/docs/README.md index 2306957c12..d06cd38630 100644 --- a/tools/scxtop/docs/README.md +++ b/tools/scxtop/docs/README.md @@ -6,6 +6,7 @@ - **[PERFETTO_TRACE_ANALYSIS.md](PERFETTO_TRACE_ANALYSIS.md)** - Complete guide to perfetto trace analysis - **[TASK_THREAD_DEBUGGING_GUIDE.md](TASK_THREAD_DEBUGGING_GUIDE.md)** - Task/thread debugging workflows - **[PROTOBUF_LOADING_VERIFIED.md](PROTOBUF_LOADING_VERIFIED.md)** - Protobuf file loading verification +- **[../MCP_PERF_STAT.md](../MCP_PERF_STAT.md)** - Performance counter collection API and usage ### Main README - **[../README.md](../README.md)** - scxtop main documentation with MCP integration @@ -19,6 +20,7 @@ Located in `tools/scxtop/` root: ### MCP Integration - **[CLAUDE_INTEGRATION.md](../CLAUDE_INTEGRATION.md)** - Setting up Claude with scxtop MCP - **[MCP_INTEGRATIONS.md](../MCP_INTEGRATIONS.md)** - MCP protocol details +- **[MCP_PERF_STAT.md](../MCP_PERF_STAT.md)** - Performance counter collection MCP API ## Documentation Structure @@ -32,7 +34,8 @@ tools/scxtop/ │ └── PROTOBUF_LOADING_VERIFIED.md # Protobuf verification ├── examples/ │ └── perfetto_trace_analysis_examples.json -└── CLAUDE_INTEGRATION.md # Claude setup +├── CLAUDE_INTEGRATION.md # Claude setup +└── MCP_PERF_STAT.md # Performance counter API ``` @@ -40,7 +43,8 @@ tools/scxtop/ 1. **New to perfetto analysis?** → Read [PERFETTO_TRACE_ANALYSIS.md](PERFETTO_TRACE_ANALYSIS.md) 2. **Need to debug a task?** → Read [TASK_THREAD_DEBUGGING_GUIDE.md](TASK_THREAD_DEBUGGING_GUIDE.md) -3. **Verify protobuf loading?** → Read [PROTOBUF_LOADING_VERIFIED.md](PROTOBUF_LOADING_VERIFIED.md) +3. **Want performance counters?** → Read [MCP_PERF_STAT.md](../MCP_PERF_STAT.md) +4. **Verify protobuf loading?** → Read [PROTOBUF_LOADING_VERIFIED.md](PROTOBUF_LOADING_VERIFIED.md) ## Features Summary @@ -60,6 +64,13 @@ tools/scxtop/ 13. Wakeup→Schedule Correlation ### MCP Tools + +#### Performance Counter Tools +1. 
start_perf_stat - **Start counter collection** +2. stop_perf_stat - **Stop collection** +3. get_perf_stat_results - **Get metrics with LLC/NUMA/CPU/System aggregation** + +#### Perfetto Trace Analysis Tools 1. load_perfetto_trace - **Load protobuf files** 2. query_trace_events 3. analyze_trace_scheduling @@ -68,3 +79,8 @@ tools/scxtop/ 6. find_scheduling_bottlenecks 7. correlate_wakeup_to_schedule 8. export_trace_analysis + +#### Live Profiling Tools +1. start_perf_profiling - **Start sampling profiler** +2. stop_perf_profiling - **Stop profiler** +3. get_perf_results - **Get symbolized stack traces** diff --git a/tools/scxtop/src/app.rs b/tools/scxtop/src/app.rs index d7058cc35e..fd9391e2f3 100644 --- a/tools/scxtop/src/app.rs +++ b/tools/scxtop/src/app.rs @@ -214,6 +214,12 @@ pub struct App<'a> { perf_top_filtered_symbols: Vec<(String, crate::symbol_data::SymbolSample)>, has_perf_cap: bool, + // perf stat related + perf_stat_collector: crate::PerfStatCollector, + perf_stat_view_mode: crate::PerfStatViewMode, + perf_stat_aggregation: crate::PerfStatAggregationLevel, + perf_stat_filter_pid: Option, + // capability warnings for non-root users capability_warnings: Vec, } @@ -444,6 +450,10 @@ impl<'a> App<'a> { current_sampling_event: None, perf_top_table_state: TableState::default(), perf_top_filtered_symbols: Vec::new(), + perf_stat_collector: crate::PerfStatCollector::new(), + perf_stat_view_mode: crate::PerfStatViewMode::Table, + perf_stat_aggregation: crate::PerfStatAggregationLevel::Llc, + perf_stat_filter_pid: None, capability_warnings: Vec::new(), }; @@ -690,6 +700,10 @@ impl<'a> App<'a> { current_sampling_event: None, perf_top_table_state: TableState::default(), perf_top_filtered_symbols: Vec::new(), + perf_stat_collector: crate::PerfStatCollector::new(), + perf_stat_view_mode: crate::PerfStatViewMode::Table, + perf_stat_aggregation: crate::PerfStatAggregationLevel::Llc, + perf_stat_filter_pid: None, capability_warnings: Vec::new(), }; @@ -743,6 +757,16 @@ impl<'a> App<'a> { self.detach_perf_sampling(); } } + (prev, AppState::PerfStat) if prev != AppState::PerfStat => { + // Entering PerfStat view - start counter collection + if let Err(e) = self.start_perf_stat_collection() { + log::error!("Failed to start perf stat collection: {}", e); + } + } + (AppState::PerfStat, new) if new != AppState::PerfStat => { + // Leaving PerfStat view - stop counter collection + self.stop_perf_stat_collection(); + } _ => {} } @@ -1084,6 +1108,7 @@ impl<'a> App<'a> { AppState::Node => self.on_tick_node(), AppState::PerfEvent | AppState::KprobeEvent => self.on_tick_events(), AppState::PerfTop => self.on_tick_perf_top(), + AppState::PerfStat => self.on_tick_perf_stat(), AppState::Power => self.on_tick_power(), AppState::Process => self.on_tick_process(), AppState::Scheduler => self.on_tick_scheduler(), @@ -2510,6 +2535,7 @@ impl<'a> App<'a> { AppState::Node => self.render_node(frame), AppState::Llc => self.render_llc(frame), AppState::PerfTop => self.render_perf_top(frame), + AppState::PerfStat => self.render_perf_stat(frame), AppState::Power => self.render_power(frame), AppState::Scheduler => { if self.has_capability_warnings() { @@ -2860,6 +2886,31 @@ impl<'a> App<'a> { ), Style::default(), )), + Line::from(Span::styled( + format!( + "{}: display perf stat view (performance counters)", + self.config + .active_keymap + .action_keys_string(Action::SetState(AppState::PerfStat)) + ), + Style::default(), + )), + Line::from(Span::styled( + " v: toggle table/chart view (in Perf Stat)", + Style::default(), + 
)), + Line::from(Span::styled( + " a: toggle aggregation (System/LLC/NUMA) (in Perf Stat)", + Style::default(), + )), + Line::from(Span::styled( + " p: filter by selected process (in Perf Stat)", + Style::default(), + )), + Line::from(Span::styled( + " c: clear process filter, return to system-wide (in Perf Stat)", + Style::default(), + )), Line::from(Span::styled( format!( "{}: display power monitoring view", @@ -3792,6 +3843,25 @@ impl<'a> App<'a> { ) } + /// Renders the perf stat view with performance counters. + fn render_perf_stat(&mut self, frame: &mut Frame) -> Result<()> { + use crate::render::perf_stat::{PerfStatRenderer, PerfStatViewParams}; + + let params = PerfStatViewParams { + collector: &self.perf_stat_collector, + view_mode: &self.perf_stat_view_mode, + aggregation: &self.perf_stat_aggregation, + filter_pid: self.perf_stat_filter_pid, + proc_data: &self.proc_data, + tick_rate_ms: self.config.tick_rate_ms(), + localize: self.localize, + locale: &self.locale, + theme: self.theme(), + }; + + PerfStatRenderer::render_perf_stat_view(frame, frame.area(), ¶ms) + } + /// Renders the network application state. fn render_network(&mut self, frame: &mut Frame) -> Result<()> { let theme = self.theme(); @@ -5463,7 +5533,13 @@ impl<'a> App<'a> { // XXX handle error } } - Action::NextViewState => self.next_view_state(), + Action::NextViewState => { + if self.state == AppState::PerfStat { + self.toggle_perf_stat_view_mode(); + } else { + self.next_view_state(); + } + } Action::SchedReg => { self.on_scheduler_load()?; } @@ -5708,6 +5784,35 @@ impl<'a> App<'a> { Action::Esc => { self.on_escape()?; } + Action::TogglePerfStatViewMode => { + if self.state == AppState::PerfStat { + self.toggle_perf_stat_view_mode(); + } + } + Action::TogglePerfStatAggregation => { + if self.state == AppState::PerfStat { + self.toggle_perf_stat_aggregation(); + } + } + Action::SetPerfStatFilter(pid) => { + if self.state == AppState::PerfStat { + self.set_perf_stat_filter(*pid)?; + } + } + Action::ApplyPerfStatProcessFilter => { + if self.state == AppState::PerfStat { + if let Err(e) = self.apply_process_filter_to_perf_stat() { + log::error!("Failed to apply process filter: {}", e); + } + } + } + Action::ClearPerfStatFilter => { + if self.state == AppState::PerfStat { + if let Err(e) = self.clear_perf_stat_filter() { + log::error!("Failed to clear process filter: {}", e); + } + } + } _ => {} }; Ok(()) @@ -7433,6 +7538,120 @@ impl<'a> App<'a> { Ok(()) } + /// Perf Stat view: update performance counters + fn on_tick_perf_stat(&mut self) -> Result<()> { + if !self.perf_stat_collector.is_active() { + return Ok(()); + } + + let now = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .unwrap() + .as_millis(); + + let duration = self.config.tick_rate_ms() as u128; + + self.perf_stat_collector.update(now, duration)?; + + Ok(()) + } + + /// Start perf stat collection (called when entering PerfStat view) + pub fn start_perf_stat_collection(&mut self) -> Result<()> { + if self.perf_stat_collector.is_active() { + return Ok(()); + } + + // Build CPU to LLC/Node mappings from topology + let mut cpu_to_llc = BTreeMap::new(); + let mut cpu_to_node = BTreeMap::new(); + for (cpu_id, cpu) in &self.topo.all_cpus { + cpu_to_llc.insert(*cpu_id, cpu.llc_id); + cpu_to_node.insert(*cpu_id, cpu.node_id); + } + self.perf_stat_collector + .set_topology(cpu_to_llc, cpu_to_node); + + let num_cpus = self.cpu_data.len(); + + // Use terminal width for history sizing (like scheduler view does) + let history_size = 
self.terminal_width.saturating_sub(2) as usize; + self.perf_stat_collector + .init_system_wide_with_history_size(num_cpus, history_size)?; + + // If a process is selected, also collect for it + if let Some(pid) = self.perf_stat_filter_pid { + self.perf_stat_collector.init_process(pid)?; + } + + Ok(()) + } + + /// Stop perf stat collection (called when exiting PerfStat view) + pub fn stop_perf_stat_collection(&mut self) { + self.perf_stat_collector.cleanup(); + } + + /// Set process filter for perf stat + pub fn set_perf_stat_filter(&mut self, pid: Option) -> Result<()> { + self.perf_stat_filter_pid = pid; + + if self.perf_stat_collector.is_active() { + // Reinitialize with new filter + self.stop_perf_stat_collection(); + self.start_perf_stat_collection()?; + } + + Ok(()) + } + + /// Toggle perf stat view mode (table <-> chart) + pub fn toggle_perf_stat_view_mode(&mut self) { + self.perf_stat_view_mode = self.perf_stat_view_mode.next(); + } + + /// Toggle perf stat aggregation level (System -> LLC -> NUMA -> System) + pub fn toggle_perf_stat_aggregation(&mut self) { + self.perf_stat_aggregation = self.perf_stat_aggregation.next(); + } + + /// Get currently selected process PID (from Process view or current selection) + pub fn get_selected_process_pid(&self) -> Option { + // First try to get from selected_process field + if let Some(pid) = self.selected_process { + return Some(pid); + } + + // Fall back to filtered state + if let Ok(filtered) = self.filtered_state.lock() { + if filtered.selected < filtered.list.len() { + if let Some(pid) = filtered.list[filtered.selected].as_int() { + return Some(pid); + } + } + } + + None + } + + /// Apply current process selection as perf stat filter + pub fn apply_process_filter_to_perf_stat(&mut self) -> Result<()> { + if let Some(pid) = self.get_selected_process_pid() { + log::info!("Applying perf stat filter to PID {}", pid); + self.set_perf_stat_filter(Some(pid))?; + } else { + log::warn!("No process selected to filter by"); + } + Ok(()) + } + + /// Clear perf stat filter (return to system-wide) + pub fn clear_perf_stat_filter(&mut self) -> Result<()> { + log::info!("Clearing perf stat filter - returning to system-wide view"); + self.set_perf_stat_filter(None)?; + Ok(()) + } + /// MangoApp view: minimal system data fn on_tick_mango_app(&mut self) -> Result<()> { if let Some(ref mut skel) = self.skel { diff --git a/tools/scxtop/src/keymap.rs b/tools/scxtop/src/keymap.rs index a0f8c25eb2..3e213112c0 100644 --- a/tools/scxtop/src/keymap.rs +++ b/tools/scxtop/src/keymap.rs @@ -59,6 +59,7 @@ impl Default for KeyMap { bindings.insert(Key::Char('w'), Action::SetState(AppState::Power)); bindings.insert(Key::Char('s'), Action::SetState(AppState::Scheduler)); bindings.insert(Key::Char('S'), Action::SaveConfig); + bindings.insert(Key::Char('C'), Action::SetState(AppState::PerfStat)); bindings.insert(Key::Char('a'), Action::RequestTrace); bindings.insert(Key::Char('x'), Action::ClearEvent); bindings.insert(Key::Char('j'), Action::PrevEvent); diff --git a/tools/scxtop/src/lib.rs b/tools/scxtop/src/lib.rs index e5b4afd78f..f1259aa4ff 100644 --- a/tools/scxtop/src/lib.rs +++ b/tools/scxtop/src/lib.rs @@ -23,6 +23,7 @@ pub mod mcp; mod mem_stats; pub mod network_stats; mod node_data; +mod perf_stat_data; mod perfetto_trace; mod power_data; mod proc_data; @@ -51,6 +52,9 @@ pub use llc_data::LlcData; pub use mem_stats::MemStatSnapshot; pub use network_stats::NetworkStatSnapshot; pub use node_data::NodeData; +pub use perf_stat_data::{ + DerivedMetrics, 
PerfStatCollector, PerfStatCounters, PerfStatHistory, SharedPerfStatCollector, +}; pub use perfetto_trace::PerfettoTraceManager; pub use power_data::{ CStateInfo, CorePowerData, PowerDataCollector, PowerSnapshot, SystemPowerData, @@ -113,6 +117,8 @@ pub enum AppState { PerfEvent, /// Application is in the perf top view state. PerfTop, + /// Application is in the perf stat view state. + PerfStat, /// Application is in the Power state. Power, /// Application is in the Process state. @@ -182,6 +188,57 @@ impl std::fmt::Display for ComponentViewState { } } +#[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)] +pub enum PerfStatViewMode { + Table, + Chart, +} + +impl PerfStatViewMode { + pub fn next(&self) -> Self { + match self { + PerfStatViewMode::Table => PerfStatViewMode::Chart, + PerfStatViewMode::Chart => PerfStatViewMode::Table, + } + } +} + +impl std::fmt::Display for PerfStatViewMode { + fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { + match self { + PerfStatViewMode::Table => write!(f, "table"), + PerfStatViewMode::Chart => write!(f, "chart"), + } + } +} + +#[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)] +pub enum PerfStatAggregationLevel { + System, + Llc, + Node, +} + +impl PerfStatAggregationLevel { + pub fn next(&self) -> Self { + match self { + PerfStatAggregationLevel::System => PerfStatAggregationLevel::Llc, + PerfStatAggregationLevel::Llc => PerfStatAggregationLevel::Node, + PerfStatAggregationLevel::Node => PerfStatAggregationLevel::System, + } + } +} + +impl std::fmt::Display for PerfStatAggregationLevel { + fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { + match self { + PerfStatAggregationLevel::System => write!(f, "system"), + PerfStatAggregationLevel::Llc => write!(f, "LLC"), + PerfStatAggregationLevel::Node => write!(f, "NUMA"), + } + } +} + #[derive(Debug, Clone, PartialEq)] pub enum FilterItem { String(String), @@ -504,6 +561,12 @@ pub enum Action { Up, UpdateColVisibility(UpdateColVisibilityAction), Wait(WaitAction), + // Perf Stat actions + TogglePerfStatViewMode, + TogglePerfStatAggregation, + SetPerfStatFilter(Option), + ApplyPerfStatProcessFilter, + ClearPerfStatFilter, None, } diff --git a/tools/scxtop/src/main.rs b/tools/scxtop/src/main.rs index 7879e597e0..267778fbd1 100644 --- a/tools/scxtop/src/main.rs +++ b/tools/scxtop/src/main.rs @@ -81,6 +81,12 @@ fn handle_key_event(app: &App, keymap: &KeyMap, key: KeyEvent) -> Action { match (app.state(), c) { // In BPF program detail view, 'p' toggles perf sampling (AppState::BpfProgramDetail, 'p') => Action::ToggleBpfPerfSampling, + // In PerfStat view, 'p' applies process filter + (AppState::PerfStat, 'p') => Action::ApplyPerfStatProcessFilter, + // In PerfStat view, 'c' clears process filter + (AppState::PerfStat, 'c') => Action::ClearPerfStatFilter, + // In PerfStat view, 'a' toggles aggregation level (System/LLC/NUMA) + (AppState::PerfStat, 'a') => Action::TogglePerfStatAggregation, // Fall back to global keymap for all other cases _ => keymap.action(&Key::Char(c)), } diff --git a/tools/scxtop/src/mcp/events.rs b/tools/scxtop/src/mcp/events.rs index 3038a1789c..792fb70446 100644 --- a/tools/scxtop/src/mcp/events.rs +++ b/tools/scxtop/src/mcp/events.rs @@ -282,6 +282,11 @@ pub fn action_to_mcp_event(action: &Action) -> Option { | Action::ToggleLocalization | Action::ToggleHwPressure | Action::ToggleUncoreFreq + | Action::TogglePerfStatViewMode + | Action::TogglePerfStatAggregation + | Action::SetPerfStatFilter(_) + | Action::ApplyPerfStatProcessFilter + | 
Action::ClearPerfStatFilter | Action::Up => None, } } diff --git a/tools/scxtop/src/mcp/server.rs b/tools/scxtop/src/mcp/server.rs index bc3f3ad9c3..10a7556ea9 100644 --- a/tools/scxtop/src/mcp/server.rs +++ b/tools/scxtop/src/mcp/server.rs @@ -181,6 +181,14 @@ impl McpServer { self.bpf_stats = Some(bpf_stats); self.perf_profiler = Some(perf_profiler); + + // Create perf stat collector + let perf_stat_collector = crate::SharedPerfStatCollector::new(); + + // Set perf stat collector in tools + self.tools + .set_perf_stat_collector(perf_stat_collector.clone()); + self } diff --git a/tools/scxtop/src/mcp/tools.rs b/tools/scxtop/src/mcp/tools.rs index fe3fab3c6d..9b0b577b86 100644 --- a/tools/scxtop/src/mcp/tools.rs +++ b/tools/scxtop/src/mcp/tools.rs @@ -9,6 +9,7 @@ use super::SharedAnalyzerControl; use anyhow::{anyhow, Result}; use perfetto_protos::ftrace_event::ftrace_event; use serde_json::{json, Value}; +use std::collections::BTreeMap; use std::sync::Arc; type TraceCache = @@ -17,6 +18,7 @@ type TraceCache = pub struct McpTools { topo: Option>, perf_profiler: Option, + perf_stat_collector: Option, event_control: Option, analyzer_control: Option, trace_cache: Option, @@ -33,6 +35,7 @@ impl McpTools { Self { topo: None, perf_profiler: None, + perf_stat_collector: None, event_control: None, analyzer_control: None, trace_cache: None, @@ -54,6 +57,10 @@ impl McpTools { self.perf_profiler = Some(profiler); } + pub fn set_perf_stat_collector(&mut self, collector: crate::SharedPerfStatCollector) { + self.perf_stat_collector = Some(collector); + } + pub fn set_event_control(&mut self, control: super::SharedEventControl) { self.event_control = Some(control); } @@ -225,6 +232,61 @@ impl McpTools { } }), }, + McpTool { + name: "start_perf_stat".to_string(), + description: "Start performance counter collection (counting mode, not sampling)" + .to_string(), + input_schema: json!({ + "type": "object", + "properties": { + "aggregation": { + "type": "string", + "enum": ["system", "llc", "node"], + "description": "Aggregation level: 'system' (total), 'llc' (per-LLC), or 'node' (per-NUMA node)", + "default": "llc" + }, + "pid": { + "type": "integer", + "description": "Process ID to monitor (-1 for system-wide)", + "default": -1 + }, + "history_size": { + "type": "integer", + "description": "Number of samples to keep in history for trend analysis", + "default": 100 + } + } + }), + }, + McpTool { + name: "stop_perf_stat".to_string(), + description: "Stop performance counter collection".to_string(), + input_schema: json!({ + "type": "object", + "properties": {} + }), + }, + McpTool { + name: "get_perf_stat_results".to_string(), + description: "Get performance counter statistics with derived metrics (IPC, cache miss rates, etc.)" + .to_string(), + input_schema: json!({ + "type": "object", + "properties": { + "aggregation": { + "type": "string", + "enum": ["system", "cpu", "llc", "node", "process"], + "description": "Aggregation level to retrieve: 'system', 'cpu', 'llc', 'node', or 'process'", + "default": "system" + }, + "include_history": { + "type": "boolean", + "description": "Include historical data for trend analysis", + "default": false + } + } + }), + }, McpTool { name: "control_event_tracking".to_string(), description: @@ -585,6 +647,9 @@ impl McpTools { "start_perf_profiling" => self.tool_start_perf_profiling(arguments), "stop_perf_profiling" => self.tool_stop_perf_profiling(arguments), "get_perf_results" => self.tool_get_perf_results(arguments), + "start_perf_stat" => 
self.tool_start_perf_stat(arguments), + "stop_perf_stat" => self.tool_stop_perf_stat(arguments), + "get_perf_stat_results" => self.tool_get_perf_stat_results(arguments), "control_event_tracking" => self.tool_control_event_tracking(arguments), "control_stats_collection" => self.tool_control_stats_collection(arguments), "control_analyzers" => self.tool_control_analyzers(arguments), @@ -1092,6 +1157,246 @@ impl McpTools { })) } + fn tool_start_perf_stat(&self, args: &Value) -> Result { + let collector = self + .perf_stat_collector + .as_ref() + .ok_or_else(|| anyhow!("Perf stat collector not available"))?; + + let topo = self + .topo + .as_ref() + .ok_or_else(|| anyhow!("Topology not available"))?; + + let pid = args.get("pid").and_then(|v| v.as_i64()).unwrap_or(-1) as i32; + let history_size = args + .get("history_size") + .and_then(|v| v.as_u64()) + .unwrap_or(100) as usize; + + // Build topology mappings + let mut cpu_to_llc = BTreeMap::new(); + let mut cpu_to_node = BTreeMap::new(); + for (cpu_id, cpu) in &topo.all_cpus { + cpu_to_llc.insert(*cpu_id, cpu.llc_id); + cpu_to_node.insert(*cpu_id, cpu.node_id); + } + collector.set_topology(cpu_to_llc, cpu_to_node); + + // Initialize collection + let num_cpus = topo.all_cpus.len(); + + if pid > 0 { + // Per-process monitoring: only initialize process events to avoid exhausting hardware PMU counters + collector.init_process(pid)?; + } else { + // System-wide monitoring: initialize per-CPU events + collector.init_system_wide_with_history_size(num_cpus, history_size)?; + } + + Ok(json!({ + "content": [{ + "type": "text", + "text": format!( + "Performance counter collection started:\n\n• CPUs: {}\n• LLCs: {}\n• NUMA nodes: {}\n• Process filter: {}\n• History size: {}\n\nCollecting: cycles, instructions, branches, cache, context-switches, migrations, page-faults, stalls", + num_cpus, + topo.all_llcs.len(), + topo.nodes.len(), + if pid > 0 { format!("PID {}", pid) } else { "None (system-wide)".to_string() }, + history_size + ) + }] + })) + } + + fn tool_stop_perf_stat(&self, _args: &Value) -> Result { + let collector = self + .perf_stat_collector + .as_ref() + .ok_or_else(|| anyhow!("Perf stat collector not available"))?; + + if !collector.is_active() { + return Ok(json!({ + "content": [{ + "type": "text", + "text": "Performance counter collection is not active." + }] + })); + } + + collector.cleanup(); + + Ok(json!({ + "content": [{ + "type": "text", + "text": "Performance counter collection stopped.\n\nUse get_perf_stat_results to retrieve final statistics." + }] + })) + } + + fn tool_get_perf_stat_results(&self, args: &Value) -> Result { + let collector = self + .perf_stat_collector + .as_ref() + .ok_or_else(|| anyhow!("Perf stat collector not available"))?; + + if !collector.is_active() { + return Ok(json!({ + "content": [{ + "type": "text", + "text": "Performance counter collection is not active. Use start_perf_stat to begin collection." 
+ }] + })); + } + + // Update counters to get current values + let now = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .unwrap() + .as_millis(); + let duration_ms = 100; // Default polling interval + collector.update(now, duration_ms)?; + + let aggregation = args + .get("aggregation") + .and_then(|v| v.as_str()) + .unwrap_or("system"); + let include_history = args + .get("include_history") + .and_then(|v| v.as_bool()) + .unwrap_or(false); + + let result; + + match aggregation { + "system" => { + let counters = collector.get_system_counters(); + let metrics = counters.derived_metrics(); + result = + Self::format_counters_json(&counters, &metrics, "System-Wide", include_history); + } + "cpu" => { + let per_cpu = collector.get_per_cpu_counters(); + let mut cpu_results = Vec::new(); + for (cpu_id, counters) in per_cpu { + let metrics = counters.derived_metrics(); + cpu_results.push(json!({ + "cpu_id": cpu_id, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + })); + } + result = json!({ + "per_cpu": cpu_results, + "total_cpus": cpu_results.len(), + }); + } + "llc" => { + let per_llc = collector.get_per_llc_counters(); + let mut llc_results = Vec::new(); + for (llc_id, counters) in per_llc { + let metrics = counters.derived_metrics(); + llc_results.push(json!({ + "llc_id": llc_id, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + })); + } + result = json!({ + "per_llc": llc_results, + "total_llcs": llc_results.len(), + }); + } + "node" => { + let per_node = collector.get_per_node_counters(); + let mut node_results = Vec::new(); + for (node_id, counters) in per_node { + let metrics = counters.derived_metrics(); + node_results.push(json!({ + "node_id": node_id, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + })); + } + result = json!({ + "per_node": node_results, + "total_nodes": node_results.len(), + }); + } + "process" => { + if let Some(counters) = collector.get_process_counters() { + let metrics = counters.derived_metrics(); + let pid = collector.filter_pid().unwrap_or(-1); + result = json!({ + "pid": pid, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + }); + } else { + return Ok(json!({ + "content": [{ + "type": "text", + "text": "No process filter is active. Use start_perf_stat with pid parameter to filter by process." + }] + })); + } + } + _ => { + return Err(anyhow!( + "Invalid aggregation level: {}. 
Use 'system', 'cpu', 'llc', 'node', or 'process'", + aggregation + )); + } + } + + Ok(json!({ + "content": [{ + "type": "text", + "text": serde_json::to_string_pretty(&result) + .unwrap_or_else(|_| "Failed to serialize results".to_string()) + }] + })) + } + + fn counters_to_json(counters: &crate::PerfStatCounters) -> Value { + json!({ + "cycles": counters.cycles_delta, + "instructions": counters.instructions_delta, + "branches": counters.branches_delta, + "branch_misses": counters.branch_misses_delta, + "cache_references": counters.cache_references_delta, + "cache_misses": counters.cache_misses_delta, + "stalled_cycles_frontend": counters.stalled_cycles_frontend_delta, + "stalled_cycles_backend": counters.stalled_cycles_backend_delta, + "context_switches": counters.context_switches_delta, + "cpu_migrations": counters.cpu_migrations_delta, + "page_faults": counters.page_faults_delta, + }) + } + + fn metrics_to_json(metrics: &crate::DerivedMetrics) -> Value { + json!({ + "ipc": format!("{:.3}", metrics.ipc), + "cache_miss_rate": format!("{:.2}%", metrics.cache_miss_rate), + "branch_miss_rate": format!("{:.2}%", metrics.branch_miss_rate), + "stalled_frontend_pct": format!("{:.2}%", metrics.stalled_frontend_pct), + "stalled_backend_pct": format!("{:.2}%", metrics.stalled_backend_pct), + }) + } + + fn format_counters_json( + counters: &crate::PerfStatCounters, + metrics: &crate::DerivedMetrics, + label: &str, + _include_history: bool, + ) -> Value { + json!({ + "label": label, + "counters": Self::counters_to_json(counters), + "metrics": Self::metrics_to_json(metrics), + }) + } + fn tool_control_event_tracking(&self, args: &Value) -> Result { let control = self .event_control diff --git a/tools/scxtop/src/perf_stat_data.rs b/tools/scxtop/src/perf_stat_data.rs new file mode 100644 index 0000000000..67ccc31359 --- /dev/null +++ b/tools/scxtop/src/perf_stat_data.rs @@ -0,0 +1,693 @@ +// Copyright (c) Meta Platforms, Inc. and affiliates. +// +// This software may be used and distributed according to the terms of the +// GNU General Public License version 2. 
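For reference, the three MCP tools registered above take JSON arguments that mirror their `input_schema` definitions. Below is a minimal sketch (an editorial illustration, not part of the patch) of the argument shapes a client might send for system-wide, per-LLC collection; only fields present in the schemas are used, and the surrounding MCP transport is assumed.

```rust
use serde_json::{json, Value};

fn main() {
    // Arguments for "start_perf_stat": per-LLC aggregation, system-wide
    // collection (pid = -1), keeping 100 samples of history for trends.
    let start_args: Value = json!({
        "aggregation": "llc",
        "pid": -1,
        "history_size": 100
    });

    // Arguments for "get_perf_stat_results": per-LLC counters plus derived
    // metrics (IPC, cache/branch miss rates, stall percentages).
    let results_args: Value = json!({
        "aggregation": "llc",
        "include_history": false
    });

    println!("{start_args}\n{results_args}");
}
```

The result payload returned by `get_perf_stat_results` is the pretty-printed JSON built from `counters_to_json` and `metrics_to_json` above, wrapped in a single text content block.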
+ +use crate::PerfEvent; +use anyhow::Result; +use std::collections::BTreeMap; + +/// Stores counter values for a specific scope (system-wide or per-process) +#[derive(Clone, Debug, Default)] +pub struct PerfStatCounters { + // Raw counter values (latest read) + pub cycles: u64, + pub instructions: u64, + pub branches: u64, + pub branch_misses: u64, + pub cache_references: u64, + pub cache_misses: u64, + pub stalled_cycles_frontend: u64, + pub stalled_cycles_backend: u64, + pub context_switches: u64, + pub cpu_migrations: u64, + pub page_faults: u64, + + // Delta values (difference since last read) + pub cycles_delta: u64, + pub instructions_delta: u64, + pub branches_delta: u64, + pub branch_misses_delta: u64, + pub cache_references_delta: u64, + pub cache_misses_delta: u64, + pub stalled_cycles_frontend_delta: u64, + pub stalled_cycles_backend_delta: u64, + pub context_switches_delta: u64, + pub cpu_migrations_delta: u64, + pub page_faults_delta: u64, + + // Timestamp of last update + pub last_update_ms: u128, +} + +impl PerfStatCounters { + /// Update counter values and calculate deltas + pub fn update(&mut self, new_values: &PerfStatCounters, timestamp_ms: u128) { + self.cycles_delta = new_values.cycles.saturating_sub(self.cycles); + self.instructions_delta = new_values.instructions.saturating_sub(self.instructions); + self.branches_delta = new_values.branches.saturating_sub(self.branches); + self.branch_misses_delta = new_values.branch_misses.saturating_sub(self.branch_misses); + self.cache_references_delta = new_values + .cache_references + .saturating_sub(self.cache_references); + self.cache_misses_delta = new_values.cache_misses.saturating_sub(self.cache_misses); + self.stalled_cycles_frontend_delta = new_values + .stalled_cycles_frontend + .saturating_sub(self.stalled_cycles_frontend); + self.stalled_cycles_backend_delta = new_values + .stalled_cycles_backend + .saturating_sub(self.stalled_cycles_backend); + self.context_switches_delta = new_values + .context_switches + .saturating_sub(self.context_switches); + self.cpu_migrations_delta = new_values + .cpu_migrations + .saturating_sub(self.cpu_migrations); + self.page_faults_delta = new_values.page_faults.saturating_sub(self.page_faults); + + // Update absolute values + self.cycles = new_values.cycles; + self.instructions = new_values.instructions; + self.branches = new_values.branches; + self.branch_misses = new_values.branch_misses; + self.cache_references = new_values.cache_references; + self.cache_misses = new_values.cache_misses; + self.stalled_cycles_frontend = new_values.stalled_cycles_frontend; + self.stalled_cycles_backend = new_values.stalled_cycles_backend; + self.context_switches = new_values.context_switches; + self.cpu_migrations = new_values.cpu_migrations; + self.page_faults = new_values.page_faults; + + self.last_update_ms = timestamp_ms; + } + + /// Calculate derived metrics + pub fn derived_metrics(&self) -> DerivedMetrics { + DerivedMetrics::calculate(self) + } +} + +/// Derived performance metrics calculated from raw counters +#[derive(Clone, Debug, Default)] +pub struct DerivedMetrics { + pub ipc: f64, // Instructions per cycle + pub cache_miss_rate: f64, // Cache misses / cache references + pub branch_miss_rate: f64, // Branch misses / branches + pub stalled_frontend_pct: f64, // Frontend stalls / cycles + pub stalled_backend_pct: f64, // Backend stalls / cycles +} + +impl DerivedMetrics { + pub fn calculate(counters: &PerfStatCounters) -> Self { + let ipc = if counters.cycles_delta > 0 { + 
counters.instructions_delta as f64 / counters.cycles_delta as f64 + } else { + 0.0 + }; + + let cache_miss_rate = if counters.cache_references_delta > 0 { + (counters.cache_misses_delta as f64 / counters.cache_references_delta as f64) * 100.0 + } else { + 0.0 + }; + + let branch_miss_rate = if counters.branches_delta > 0 { + (counters.branch_misses_delta as f64 / counters.branches_delta as f64) * 100.0 + } else { + 0.0 + }; + + let stalled_frontend_pct = if counters.cycles_delta > 0 { + (counters.stalled_cycles_frontend_delta as f64 / counters.cycles_delta as f64) * 100.0 + } else { + 0.0 + }; + + let stalled_backend_pct = if counters.cycles_delta > 0 { + (counters.stalled_cycles_backend_delta as f64 / counters.cycles_delta as f64) * 100.0 + } else { + 0.0 + }; + + Self { + ipc, + cache_miss_rate, + branch_miss_rate, + stalled_frontend_pct, + stalled_backend_pct, + } + } +} + +/// Stores historical delta values for chart visualization +#[derive(Clone, Debug)] +pub struct PerfStatHistory { + max_size: usize, + pub ipc_history: Vec, + pub cache_miss_rate_history: Vec, + pub branch_miss_rate_history: Vec, + pub stalled_frontend_pct_history: Vec, + pub stalled_backend_pct_history: Vec, + pub instructions_per_sec: Vec, + pub cycles_per_sec: Vec, +} + +impl PerfStatHistory { + pub fn new(max_size: usize) -> Self { + Self { + max_size, + ipc_history: Vec::new(), + cache_miss_rate_history: Vec::new(), + branch_miss_rate_history: Vec::new(), + stalled_frontend_pct_history: Vec::new(), + stalled_backend_pct_history: Vec::new(), + instructions_per_sec: Vec::new(), + cycles_per_sec: Vec::new(), + } + } + + pub fn push( + &mut self, + metrics: &DerivedMetrics, + counters: &PerfStatCounters, + duration_ms: u128, + ) { + let duration_secs = duration_ms as f64 / 1000.0; + let instructions_rate = if duration_secs > 0.0 { + (counters.instructions_delta as f64 / duration_secs) as u64 + } else { + 0 + }; + let cycles_rate = if duration_secs > 0.0 { + (counters.cycles_delta as f64 / duration_secs) as u64 + } else { + 0 + }; + + self.ipc_history.push(metrics.ipc); + self.cache_miss_rate_history.push(metrics.cache_miss_rate); + self.branch_miss_rate_history.push(metrics.branch_miss_rate); + self.stalled_frontend_pct_history + .push(metrics.stalled_frontend_pct); + self.stalled_backend_pct_history + .push(metrics.stalled_backend_pct); + self.instructions_per_sec.push(instructions_rate); + self.cycles_per_sec.push(cycles_rate); + + // Trim to max size + if self.ipc_history.len() > self.max_size { + self.ipc_history.remove(0); + self.cache_miss_rate_history.remove(0); + self.branch_miss_rate_history.remove(0); + self.stalled_frontend_pct_history.remove(0); + self.stalled_backend_pct_history.remove(0); + self.instructions_per_sec.remove(0); + self.cycles_per_sec.remove(0); + } + } +} + +/// Manages perf stat counter collection +pub struct PerfStatCollector { + // Per-CPU perf events (system-wide mode) + cpu_events: BTreeMap>, + + // Per-process perf events (filtered mode) + process_events: Option>, + + // Collected counter data + pub system_counters: PerfStatCounters, + pub per_cpu_counters: BTreeMap, + pub per_llc_counters: BTreeMap, + pub per_node_counters: BTreeMap, + pub process_counters: Option, + + // Historical data for charts (store last N deltas) + pub system_history: PerfStatHistory, + pub per_cpu_history: BTreeMap, + pub per_llc_history: BTreeMap, + pub per_node_history: BTreeMap, + + // CPU to LLC/Node mapping + cpu_to_llc: BTreeMap, + cpu_to_node: BTreeMap, + + // Configuration + filter_pid: Option, + 
is_active: bool, +} + +impl PerfStatCollector { + pub fn new() -> Self { + Self { + cpu_events: BTreeMap::new(), + process_events: None, + system_counters: PerfStatCounters::default(), + per_cpu_counters: BTreeMap::new(), + per_llc_counters: BTreeMap::new(), + per_node_counters: BTreeMap::new(), + process_counters: None, + system_history: PerfStatHistory::new(100), + per_cpu_history: BTreeMap::new(), + per_llc_history: BTreeMap::new(), + per_node_history: BTreeMap::new(), + cpu_to_llc: BTreeMap::new(), + cpu_to_node: BTreeMap::new(), + filter_pid: None, + is_active: false, + } + } + + /// Set CPU to LLC/Node topology mappings + pub fn set_topology( + &mut self, + cpu_to_llc: BTreeMap, + cpu_to_node: BTreeMap, + ) { + self.cpu_to_llc = cpu_to_llc; + self.cpu_to_node = cpu_to_node; + } + + /// Initialize perf events for system-wide collection with custom history size + pub fn init_system_wide_with_history_size( + &mut self, + num_cpus: usize, + history_size: usize, + ) -> Result<()> { + self.cleanup(); + + // Update history sizes + let history_size = history_size.max(10); + self.system_history = PerfStatHistory::new(history_size); + + // Initialize LLC and NUMA node histories based on topology + for &llc_id in self.cpu_to_llc.values() { + self.per_llc_history + .entry(llc_id) + .or_insert_with(|| PerfStatHistory::new(history_size)); + self.per_llc_counters.entry(llc_id).or_default(); + } + + for &node_id in self.cpu_to_node.values() { + self.per_node_history + .entry(node_id) + .or_insert_with(|| PerfStatHistory::new(history_size)); + self.per_node_counters.entry(node_id).or_default(); + } + + for cpu in 0..num_cpus { + let mut events = Vec::new(); + + // Create perf events for all counters + let event_specs = vec![ + ("hw", "cycles"), + ("hw", "instructions"), + ("hw", "branches"), + ("hw", "branch-misses"), + ("hw", "cache-references"), + ("hw", "cache-misses"), + ("hw", "stalled-cycles-frontend"), + ("hw", "stalled-cycles-backend"), + ("sw", "context-switches"), + ("sw", "cpu-migrations"), + ("sw", "page-faults"), + ]; + + for (subsystem, event_name) in event_specs { + let mut event = PerfEvent::new(subsystem.to_string(), event_name.to_string(), cpu as i32); + // Attach in counting mode (no sampling) + if let Err(e) = event.attach(-1) { + log::warn!("Failed to attach {} for CPU {}: {}", event_name, cpu, e); + continue; + } + events.push(event); + } + + if !events.is_empty() { + self.cpu_events.insert(cpu, events); + self.per_cpu_counters + .insert(cpu, PerfStatCounters::default()); + self.per_cpu_history + .insert(cpu, PerfStatHistory::new(history_size)); + } + } + + self.is_active = true; + Ok(()) + } + + /// Initialize perf events for system-wide collection (uses default history size) + pub fn init_system_wide(&mut self, num_cpus: usize) -> Result<()> { + self.init_system_wide_with_history_size(num_cpus, 100) + } + + /// Initialize perf events for per-process collection + pub fn init_process(&mut self, pid: i32) -> Result<()> { + let mut events = Vec::new(); + let mut failed_events = Vec::new(); + + let event_specs = vec![ + ("hw", "cycles"), + ("hw", "instructions"), + ("hw", "branches"), + ("hw", "branch-misses"), + ("hw", "cache-references"), + ("hw", "cache-misses"), + ("hw", "stalled-cycles-frontend"), + ("hw", "stalled-cycles-backend"), + ("sw", "context-switches"), + ("sw", "cpu-migrations"), + ("sw", "page-faults"), + ]; + + let total_events = event_specs.len(); + + for (subsystem, event_name) in &event_specs { + // CPU -1 means monitor on all CPUs + let mut event = 
PerfEvent::new(subsystem.to_string(), event_name.to_string(), -1); + if let Err(e) = event.attach(pid) { + log::warn!("Failed to attach {} for PID {}: {}", event_name, pid, e); + failed_events.push(*event_name); + continue; + } + events.push(event); + } + + if events.is_empty() { + return Err(anyhow::anyhow!( + "Failed to attach any perf events for PID {} (process may have terminated or insufficient permissions)", + pid + )); + } + + if !failed_events.is_empty() { + log::info!( + "Successfully attached {} of {} events for PID {} (failed: {})", + events.len(), + total_events, + pid, + failed_events.join(", ") + ); + } + + self.process_events = Some(events); + self.process_counters = Some(PerfStatCounters::default()); + self.filter_pid = Some(pid); + self.is_active = true; + + Ok(()) + } + + /// Read all counters and update data structures + pub fn update(&mut self, timestamp_ms: u128, duration_ms: u128) -> Result<()> { + if !self.is_active { + return Ok(()); + } + + // Track if any counters succeeded + let mut any_success = false; + + // Read process counters first if filtering by PID + if let Some(events) = &mut self.process_events { + let mut proc_counters = PerfStatCounters::default(); + let mut proc_success = false; + + for event in events { + if let Ok(value) = event.value(false) { + proc_success = true; + match event.event_name() { + "cycles" | "cpu-cycles" => proc_counters.cycles = value, + "instructions" => proc_counters.instructions = value, + "branches" => proc_counters.branches = value, + "branch-misses" => proc_counters.branch_misses = value, + "cache-references" => proc_counters.cache_references = value, + "cache-misses" => proc_counters.cache_misses = value, + "stalled-cycles-frontend" => proc_counters.stalled_cycles_frontend = value, + "stalled-cycles-backend" => proc_counters.stalled_cycles_backend = value, + "context-switches" => proc_counters.context_switches = value, + "cpu-migrations" => proc_counters.cpu_migrations = value, + "page-faults" => proc_counters.page_faults = value, + _ => {} + } + } + } + + if proc_success { + any_success = true; + if let Some(prev) = &mut self.process_counters { + prev.update(&proc_counters, timestamp_ms); + } + } + } + + // Read system-wide counters + let mut system_total = PerfStatCounters::default(); + + for (cpu, events) in &mut self.cpu_events { + let mut cpu_counters = PerfStatCounters::default(); + let mut cpu_success = false; + + for event in events { + match event.value(false) { + Ok(value) => { + cpu_success = true; + match event.event_name() { + "cycles" | "cpu-cycles" => cpu_counters.cycles = value, + "instructions" => cpu_counters.instructions = value, + "branches" => cpu_counters.branches = value, + "branch-misses" => cpu_counters.branch_misses = value, + "cache-references" => cpu_counters.cache_references = value, + "cache-misses" => cpu_counters.cache_misses = value, + "stalled-cycles-frontend" => { + cpu_counters.stalled_cycles_frontend = value + } + "stalled-cycles-backend" => cpu_counters.stalled_cycles_backend = value, + "context-switches" => cpu_counters.context_switches = value, + "cpu-migrations" => cpu_counters.cpu_migrations = value, + "page-faults" => cpu_counters.page_faults = value, + _ => {} + } + } + Err(e) => { + log::debug!( + "Failed to read {} for CPU {}: {}", + event.event_name(), + cpu, + e + ); + } + } + } + + if cpu_success { + any_success = true; + + // Update per-CPU data with deltas + if let Some(prev) = self.per_cpu_counters.get_mut(cpu) { + prev.update(&cpu_counters, timestamp_ms); + + // Update 
history + if let Some(history) = self.per_cpu_history.get_mut(cpu) { + let metrics = prev.derived_metrics(); + history.push(&metrics, prev, duration_ms); + } + } + + // Aggregate to system total + system_total.cycles += cpu_counters.cycles; + system_total.instructions += cpu_counters.instructions; + system_total.branches += cpu_counters.branches; + system_total.branch_misses += cpu_counters.branch_misses; + system_total.cache_references += cpu_counters.cache_references; + system_total.cache_misses += cpu_counters.cache_misses; + system_total.stalled_cycles_frontend += cpu_counters.stalled_cycles_frontend; + system_total.stalled_cycles_backend += cpu_counters.stalled_cycles_backend; + system_total.context_switches += cpu_counters.context_switches; + system_total.cpu_migrations += cpu_counters.cpu_migrations; + system_total.page_faults += cpu_counters.page_faults; + } + } + + // Aggregate per-CPU counters into LLC and NUMA node totals + let mut llc_totals: BTreeMap = BTreeMap::new(); + let mut node_totals: BTreeMap = BTreeMap::new(); + + for (cpu, counters) in &self.per_cpu_counters { + // Aggregate by LLC + if let Some(&llc_id) = self.cpu_to_llc.get(cpu) { + let llc_counter = llc_totals.entry(llc_id).or_default(); + llc_counter.cycles += counters.cycles; + llc_counter.instructions += counters.instructions; + llc_counter.branches += counters.branches; + llc_counter.branch_misses += counters.branch_misses; + llc_counter.cache_references += counters.cache_references; + llc_counter.cache_misses += counters.cache_misses; + llc_counter.stalled_cycles_frontend += counters.stalled_cycles_frontend; + llc_counter.stalled_cycles_backend += counters.stalled_cycles_backend; + llc_counter.context_switches += counters.context_switches; + llc_counter.cpu_migrations += counters.cpu_migrations; + llc_counter.page_faults += counters.page_faults; + } + + // Aggregate by NUMA node + if let Some(&node_id) = self.cpu_to_node.get(cpu) { + let node_counter = node_totals.entry(node_id).or_default(); + node_counter.cycles += counters.cycles; + node_counter.instructions += counters.instructions; + node_counter.branches += counters.branches; + node_counter.branch_misses += counters.branch_misses; + node_counter.cache_references += counters.cache_references; + node_counter.cache_misses += counters.cache_misses; + node_counter.stalled_cycles_frontend += counters.stalled_cycles_frontend; + node_counter.stalled_cycles_backend += counters.stalled_cycles_backend; + node_counter.context_switches += counters.context_switches; + node_counter.cpu_migrations += counters.cpu_migrations; + node_counter.page_faults += counters.page_faults; + } + } + + // Update LLC counters with deltas and history + for (llc_id, llc_total) in llc_totals { + if let Some(prev) = self.per_llc_counters.get_mut(&llc_id) { + prev.update(&llc_total, timestamp_ms); + if let Some(history) = self.per_llc_history.get_mut(&llc_id) { + let metrics = prev.derived_metrics(); + history.push(&metrics, prev, duration_ms); + } + } + } + + // Update NUMA node counters with deltas and history + for (node_id, node_total) in node_totals { + if let Some(prev) = self.per_node_counters.get_mut(&node_id) { + prev.update(&node_total, timestamp_ms); + if let Some(history) = self.per_node_history.get_mut(&node_id) { + let metrics = prev.derived_metrics(); + history.push(&metrics, prev, duration_ms); + } + } + } + + if !any_success { + log::warn!("Failed to read any perf counters this cycle"); + return Ok(()); + } + + // Update system counters with deltas + 
self.system_counters.update(&system_total, timestamp_ms); + let metrics = self.system_counters.derived_metrics(); + self.system_history + .push(&metrics, &self.system_counters, duration_ms); + + Ok(()) + } + + /// Cleanup all perf events + pub fn cleanup(&mut self) { + self.cpu_events.clear(); + self.process_events = None; + self.is_active = false; + } + + pub fn is_active(&self) -> bool { + self.is_active + } + + pub fn filter_pid(&self) -> Option { + self.filter_pid + } + + pub fn has_process_counters(&self) -> bool { + self.process_counters.is_some() + } +} + +impl Default for PerfStatCollector { + fn default() -> Self { + Self::new() + } +} + +/// Thread-safe wrapper for PerfStatCollector +#[derive(Clone)] +pub struct SharedPerfStatCollector { + inner: Arc>, +} + +impl SharedPerfStatCollector { + pub fn new() -> Self { + Self { + inner: Arc::new(Mutex::new(PerfStatCollector::new())), + } + } + + pub fn set_topology( + &self, + cpu_to_llc: BTreeMap, + cpu_to_node: BTreeMap, + ) { + self.inner + .lock() + .unwrap() + .set_topology(cpu_to_llc, cpu_to_node); + } + + pub fn init_system_wide_with_history_size( + &self, + num_cpus: usize, + history_size: usize, + ) -> Result<()> { + self.inner + .lock() + .unwrap() + .init_system_wide_with_history_size(num_cpus, history_size) + } + + pub fn init_process(&self, pid: i32) -> Result<()> { + self.inner.lock().unwrap().init_process(pid) + } + + pub fn update(&self, timestamp_ms: u128, duration_ms: u128) -> Result<()> { + self.inner.lock().unwrap().update(timestamp_ms, duration_ms) + } + + pub fn cleanup(&self) { + self.inner.lock().unwrap().cleanup(); + } + + pub fn is_active(&self) -> bool { + self.inner.lock().unwrap().is_active() + } + + pub fn get_system_counters(&self) -> PerfStatCounters { + self.inner.lock().unwrap().system_counters.clone() + } + + pub fn get_per_cpu_counters(&self) -> BTreeMap { + self.inner.lock().unwrap().per_cpu_counters.clone() + } + + pub fn get_per_llc_counters(&self) -> BTreeMap { + self.inner.lock().unwrap().per_llc_counters.clone() + } + + pub fn get_per_node_counters(&self) -> BTreeMap { + self.inner.lock().unwrap().per_node_counters.clone() + } + + pub fn get_process_counters(&self) -> Option { + self.inner.lock().unwrap().process_counters.clone() + } + + pub fn filter_pid(&self) -> Option { + self.inner.lock().unwrap().filter_pid() + } +} + +impl Default for SharedPerfStatCollector { + fn default() -> Self { + Self::new() + } +} + +use std::sync::{Arc, Mutex}; diff --git a/tools/scxtop/src/profiling_events/mod.rs b/tools/scxtop/src/profiling_events/mod.rs index 5e768f6c37..8bd8e0ca85 100644 --- a/tools/scxtop/src/profiling_events/mod.rs +++ b/tools/scxtop/src/profiling_events/mod.rs @@ -38,7 +38,7 @@ impl ProfilingEvent { match self { ProfilingEvent::Perf(p) => { let mut p = p.clone(); - p.cpu = cpu; + p.cpu = cpu as i32; p.attach(process)?; Ok(ProfilingEvent::Perf(p)) } diff --git a/tools/scxtop/src/profiling_events/perf.rs b/tools/scxtop/src/profiling_events/perf.rs index a9e48c9acf..2bb07fc28b 100644 --- a/tools/scxtop/src/profiling_events/perf.rs +++ b/tools/scxtop/src/profiling_events/perf.rs @@ -50,7 +50,8 @@ pub fn perf_event_config(subsystem: &str, event: &str) -> Result { pub struct PerfEvent { pub subsystem: String, pub event: String, - pub cpu: usize, + /// CPU to monitor: -1 for all CPUs, or a specific CPU number + pub cpu: i32, pub alias: String, pub use_config: bool, pub event_type: u32, @@ -72,7 +73,12 @@ impl Drop for PerfEvent { impl PerfEvent { /// Creates a PerfEvent. 
- pub fn new(subsystem: String, event: String, cpu: usize) -> Self { + /// + /// # Arguments + /// * `subsystem` - The perf subsystem (e.g., "hw", "sw", "tracepoint") + /// * `event` - The event name (e.g., "cycles", "instructions") + /// * `cpu` - CPU to monitor: -1 for all CPUs, or a specific CPU number + pub fn new(subsystem: String, event: String, cpu: i32) -> Self { Self { subsystem, event, @@ -114,7 +120,11 @@ impl PerfEvent { } /// Returns a perf event from a string. - pub fn from_str_args(event: &str, cpu: usize) -> Result { + /// + /// # Arguments + /// * `event` - Event string in format "subsystem:event" (e.g., "hw:cycles") + /// * `cpu` - CPU to monitor: -1 for all CPUs, or a specific CPU number + pub fn from_str_args(event: &str, cpu: i32) -> Result { let event_parts: Vec<&str> = event.split(':').collect(); if event_parts.len() != 2 { anyhow::bail!("Invalid perf event: {}", event); @@ -270,11 +280,12 @@ impl PerfEvent { attrs.set_disabled(0); attrs.set_exclude_kernel(0); attrs.set_exclude_hv(0); - attrs.set_inherit(if process_id == -1 { 1 } else { 0 }); + // Enable inherit for per-process monitoring to track all threads + attrs.set_inherit(if process_id > 0 { 1 } else { 0 }); attrs.set_pinned(1); let result = - unsafe { perf::perf_event_open(&mut attrs, process_id, self.cpu as i32, -1, 0) }; + unsafe { perf::perf_event_open(&mut attrs, process_id, self.cpu, -1, 0) }; if result < 0 { return Err(anyhow!( diff --git a/tools/scxtop/src/render/mod.rs b/tools/scxtop/src/render/mod.rs index 2d4008d605..2d37d08225 100644 --- a/tools/scxtop/src/render/mod.rs +++ b/tools/scxtop/src/render/mod.rs @@ -10,6 +10,8 @@ pub mod memory; // Network rendering pub mod network; +// Perf stat rendering +pub mod perf_stat; // Scheduler rendering pub mod scheduler; // BPF program rendering @@ -18,5 +20,6 @@ pub mod bpf_programs; pub use bpf_programs::BpfProgramRenderer; pub use memory::MemoryRenderer; pub use network::NetworkRenderer; +pub use perf_stat::PerfStatRenderer; pub use process::ProcessRenderer; pub use scheduler::SchedulerRenderer; diff --git a/tools/scxtop/src/render/perf_stat.rs b/tools/scxtop/src/render/perf_stat.rs new file mode 100644 index 0000000000..d39784a3a4 --- /dev/null +++ b/tools/scxtop/src/render/perf_stat.rs @@ -0,0 +1,1235 @@ +// Copyright (c) Meta Platforms, Inc. and affiliates. +// +// This software may be used and distributed according to the terms of the +// GNU General Public License version 2. 
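To illustrate the `cpu: usize` → `cpu: i32` change above: the same `PerfEvent` now covers both counting setups used by `PerfStatCollector`. This is a minimal sketch under two assumptions — that `PerfEvent` is reachable as `scxtop::PerfEvent` (as `use crate::PerfEvent` elsewhere in this patch suggests) and that the PID used is an arbitrary example; it is an illustration, not part of the patch.

```rust
use anyhow::Result;
use scxtop::PerfEvent; // assumed re-export at the crate root

fn main() -> Result<()> {
    // System-wide counting: one event per (counter, CPU); pid = -1 in attach
    // means "all processes" on that CPU.
    let mut cycles_cpu0 = PerfEvent::new("hw".to_string(), "cycles".to_string(), 0);
    cycles_cpu0.attach(-1)?;

    // Per-process counting: cpu = -1 follows the task on every CPU, and
    // attach() now sets inherit for pid > 0 so its threads are included.
    let pid = 1234; // hypothetical PID
    let mut insns_proc = PerfEvent::new("hw".to_string(), "instructions".to_string(), -1);
    insns_proc.attach(pid)?;

    // Counters are polled with value(), using the same flag the collector
    // passes in its update() loop above.
    let _cycles = cycles_cpu0.value(false)?;
    let _insns = insns_proc.value(false)?;
    Ok(())
}
```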
+ +use crate::{AppTheme, PerfStatCollector, PerfStatCounters, PerfStatViewMode, ProcData}; +use anyhow::Result; +use num_format::{SystemLocale, ToFormattedString}; +use ratatui::layout::{Alignment, Constraint, Layout, Rect}; +use ratatui::style::{Color, Style}; +use ratatui::text::Line; +use ratatui::widgets::{Block, BorderType, Cell, Padding, Paragraph, Row, Sparkline, Table}; +use ratatui::Frame; +use std::collections::BTreeMap; + +/// Parameters for rendering perf stat view +pub struct PerfStatViewParams<'a> { + pub collector: &'a PerfStatCollector, + pub view_mode: &'a PerfStatViewMode, + pub aggregation: &'a crate::PerfStatAggregationLevel, + pub filter_pid: Option, + pub proc_data: &'a BTreeMap, + pub tick_rate_ms: usize, + pub localize: bool, + pub locale: &'a SystemLocale, + pub theme: &'a AppTheme, +} + +/// Renderer for perf stat view +pub struct PerfStatRenderer; + +impl PerfStatRenderer { + /// Main render entry point + pub fn render_perf_stat_view( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + if !params.collector.is_active() { + Self::render_inactive_message(frame, area, params.theme); + return Ok(()); + } + + // Dispatch based on aggregation level + use crate::PerfStatAggregationLevel; + match params.aggregation { + PerfStatAggregationLevel::System => { + // Render single system-wide view + match params.view_mode { + PerfStatViewMode::Table => Self::render_table_view(frame, area, params), + PerfStatViewMode::Chart => Self::render_chart_view(frame, area, params), + } + } + PerfStatAggregationLevel::Llc => { + // Render grid of LLC panels + Self::render_llc_grid(frame, area, params) + } + PerfStatAggregationLevel::Node => { + // Render grid of NUMA node panels + Self::render_node_grid(frame, area, params) + } + } + } + + /// Render table view showing all counters and derived metrics + fn render_table_view(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + // Check if we're filtering by process but don't have counters + if params.filter_pid.is_some() && !params.collector.has_process_counters() { + Self::render_process_filter_error( + frame, + area, + params.filter_pid.unwrap(), + params.theme, + ); + return Ok(()); + } + + // Split into sections: header, system counters, derived metrics, per-CPU summary + let [header_area, counters_area, metrics_area, percpu_area] = Layout::vertical([ + Constraint::Length(3), + Constraint::Percentage(40), + Constraint::Percentage(30), + Constraint::Percentage(30), + ]) + .areas(area); + + // Render header with title and filter info + Self::render_header(frame, header_area, params)?; + + // Render main counter table + Self::render_counter_table(frame, counters_area, params)?; + + // Render derived metrics table + Self::render_derived_metrics_table(frame, metrics_area, params)?; + + // Render per-CPU summary + Self::render_per_cpu_summary(frame, percpu_area, params)?; + + Ok(()) + } + + /// Render header section + fn render_header(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let title = if let Some(pid) = params.filter_pid { + let proc_name = params + .proc_data + .get(&pid) + .map(|p| p.process_name.as_str()) + .unwrap_or("unknown"); + format!( + "Performance Counter Statistics - {} (PID: {})", + proc_name, pid + ) + } else { + "Performance Counter Statistics (System-Wide)".to_string() + }; + + let num_cpus = params.collector.per_cpu_counters.len(); + let status_text = format!("● Active - {} CPUs monitored", num_cpus); + + let mode_text = if 
params.filter_pid.is_some() { + format!( + "View: {} | Aggregation: {} | 'v' view | 'a' agg | 'c' clear filter", + match params.view_mode { + PerfStatViewMode::Table => "Table", + PerfStatViewMode::Chart => "Chart", + }, + params.aggregation + ) + } else { + format!( + "View: {} | Aggregation: {} | 'v' view | 'a' agg | 'p' filter", + match params.view_mode { + PerfStatViewMode::Table => "Table", + PerfStatViewMode::Chart => "Chart", + }, + params.aggregation + ) + }; + + let block = Block::bordered() + .title_top( + Line::from(title) + .style(params.theme.title_style()) + .centered(), + ) + .title_top( + Line::from(status_text) + .style(params.theme.text_important_color()) + .right_aligned(), + ) + .title_bottom( + Line::from(mode_text) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()); + + frame.render_widget(block, area); + Ok(()) + } + + /// Render main counter table (similar to perf stat output) + fn render_counter_table( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + // Use process counters if filtering, otherwise system counters + let counters = if params.filter_pid.is_some() { + params + .collector + .process_counters + .as_ref() + .unwrap_or(¶ms.collector.system_counters) + } else { + ¶ms.collector.system_counters + }; + let duration_ms = params.tick_rate_ms as f64 / 1000.0; + + // Table columns: Counter Name | Value | Rate | Notes + let header = Row::new(vec![ + Cell::from("Counter").style(params.theme.title_style()), + Cell::from("Value").style(params.theme.title_style()), + Cell::from("Rate/sec").style(params.theme.title_style()), + Cell::from("Notes").style(params.theme.title_style()), + ]); + + let rows = vec![ + Self::create_counter_row( + "cpu-clock", + counters.cycles_delta, + duration_ms, + format!( + "{:.3} CPUs utilized", + counters.cycles_delta as f64 / duration_ms / 1_000_000_000.0 + ), + params, + ), + Self::create_counter_row( + "context-switches", + counters.context_switches_delta, + duration_ms, + format!( + "{:.3} K/sec", + counters.context_switches_delta as f64 / duration_ms / 1000.0 + ), + params, + ), + Self::create_counter_row( + "cpu-migrations", + counters.cpu_migrations_delta, + duration_ms, + format!( + "{:.3} K/sec", + counters.cpu_migrations_delta as f64 / duration_ms / 1000.0 + ), + params, + ), + Self::create_counter_row( + "page-faults", + counters.page_faults_delta, + duration_ms, + format!( + "{:.3} K/sec", + counters.page_faults_delta as f64 / duration_ms / 1000.0 + ), + params, + ), + Self::create_counter_row( + "cycles", + counters.cycles_delta, + duration_ms, + format!( + "{:.3} GHz", + counters.cycles_delta as f64 / duration_ms / 1_000_000_000.0 + ), + params, + ), + Self::create_counter_row( + "instructions", + counters.instructions_delta, + duration_ms, + Self::format_ipc_note(counters), + params, + ), + Self::create_counter_row( + "branches", + counters.branches_delta, + duration_ms, + format!( + "{:.3} M/sec", + counters.branches_delta as f64 / duration_ms / 1_000_000.0 + ), + params, + ), + Self::create_counter_row( + "branch-misses", + counters.branch_misses_delta, + duration_ms, + Self::format_branch_miss_note(counters), + params, + ), + Self::create_counter_row( + "cache-references", + counters.cache_references_delta, + duration_ms, + format!( + "{:.3} M/sec", + counters.cache_references_delta as f64 / duration_ms / 1_000_000.0 + ), + params, + ), + Self::create_counter_row( + "cache-misses", + 
counters.cache_misses_delta, + duration_ms, + Self::format_cache_miss_note(counters), + params, + ), + Self::create_counter_row( + "stalled-cycles-frontend", + counters.stalled_cycles_frontend_delta, + duration_ms, + Self::format_stalled_frontend_note(counters), + params, + ), + Self::create_counter_row( + "stalled-cycles-backend", + counters.stalled_cycles_backend_delta, + duration_ms, + Self::format_stalled_backend_note(counters), + params, + ), + ]; + + let table = Table::new( + rows, + vec![ + Constraint::Percentage(30), // Counter name + Constraint::Percentage(20), // Value + Constraint::Percentage(20), // Rate + Constraint::Percentage(30), // Notes + ], + ) + .header(header) + .block( + Block::bordered() + .title_top( + Line::from("Performance Counters") + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(table, area); + Ok(()) + } + + /// Create a row for the counter table + fn create_counter_row( + name: &str, + value: u64, + duration_secs: f64, + notes: String, + params: &PerfStatViewParams, + ) -> Row<'static> { + let rate = if duration_secs > 0.0 { + (value as f64 / duration_secs) as u64 + } else { + 0 + }; + + let value_str = if params.localize { + value.to_formatted_string(params.locale) + } else { + value.to_string() + }; + + let rate_str = if params.localize { + rate.to_formatted_string(params.locale) + } else { + rate.to_string() + }; + + Row::new(vec![ + Cell::from(name.to_string()).style(params.theme.text_color()), + Cell::from(value_str).style(params.theme.text_important_color()), + Cell::from(rate_str).style(params.theme.text_color()), + Cell::from(notes).style(params.theme.text_color()), + ]) + } + + /// Format IPC note (instructions per cycle) + fn format_ipc_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cycles_delta > 0 { + format!("{:.2} insn per cycle", metrics.ipc) + } else { + "".to_string() + } + } + + /// Format branch miss note + fn format_branch_miss_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.branches_delta > 0 { + format!("{:.2}% of all branches", metrics.branch_miss_rate) + } else { + "".to_string() + } + } + + /// Format cache miss note + fn format_cache_miss_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cache_references_delta > 0 { + format!("{:.2}% of all cache accesses", metrics.cache_miss_rate) + } else { + "".to_string() + } + } + + /// Format stalled frontend note + fn format_stalled_frontend_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cycles_delta > 0 { + format!("{:.2}% frontend cycles idle", metrics.stalled_frontend_pct) + } else { + "".to_string() + } + } + + /// Format stalled backend note + fn format_stalled_backend_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cycles_delta > 0 { + format!("{:.2}% backend cycles idle", metrics.stalled_backend_pct) + } else { + "".to_string() + } + } + + /// Render derived metrics table (IPC, stalls, etc.) 
+ fn render_derived_metrics_table( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + // Use process counters if filtering, otherwise system counters + let counters = if params.filter_pid.is_some() { + params + .collector + .process_counters + .as_ref() + .unwrap_or(¶ms.collector.system_counters) + } else { + ¶ms.collector.system_counters + }; + let metrics = counters.derived_metrics(); + + let header = Row::new(vec![ + Cell::from("Metric").style(params.theme.title_style()), + Cell::from("Value").style(params.theme.title_style()), + Cell::from("Status").style(params.theme.title_style()), + ]); + + let rows = vec![ + Self::create_metric_row( + "IPC (Instructions per Cycle)", + format!("{:.3}", metrics.ipc), + Self::get_ipc_status_color(metrics.ipc, params.theme), + params, + ), + Self::create_metric_row( + "Cache Miss Rate", + format!("{:.2}%", metrics.cache_miss_rate), + Self::get_miss_rate_status_color(metrics.cache_miss_rate, params.theme), + params, + ), + Self::create_metric_row( + "Branch Miss Rate", + format!("{:.2}%", metrics.branch_miss_rate), + Self::get_miss_rate_status_color(metrics.branch_miss_rate, params.theme), + params, + ), + Self::create_metric_row( + "Frontend Stalls", + format!("{:.2}%", metrics.stalled_frontend_pct), + Self::get_stall_status_color(metrics.stalled_frontend_pct, params.theme), + params, + ), + Self::create_metric_row( + "Backend Stalls", + format!("{:.2}%", metrics.stalled_backend_pct), + Self::get_stall_status_color(metrics.stalled_backend_pct, params.theme), + params, + ), + ]; + + let table = Table::new( + rows, + vec![ + Constraint::Percentage(50), // Metric name + Constraint::Percentage(25), // Value + Constraint::Percentage(25), // Status + ], + ) + .header(header) + .block( + Block::bordered() + .title_top( + Line::from("Derived Metrics") + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(table, area); + Ok(()) + } + + /// Create a row for the derived metrics table + fn create_metric_row( + name: &str, + value: String, + status_color: Color, + params: &PerfStatViewParams, + ) -> Row<'static> { + let status = if status_color == params.theme.positive_value_color() { + "Good" + } else if status_color == params.theme.negative_value_color() { + "Poor" + } else { + "OK" + }; + + Row::new(vec![ + Cell::from(name.to_string()).style(params.theme.text_color()), + Cell::from(value).style(params.theme.text_important_color()), + Cell::from(status.to_string()).style(Style::default().fg(status_color)), + ]) + } + + /// Get status color for IPC (higher is better) + fn get_ipc_status_color(ipc: f64, theme: &AppTheme) -> Color { + if ipc >= 1.0 { + theme.positive_value_color() + } else if ipc >= 0.5 { + theme.text_important_color() + } else { + theme.negative_value_color() + } + } + + /// Get status color for miss rates (lower is better) + fn get_miss_rate_status_color(rate: f64, theme: &AppTheme) -> Color { + if rate < 3.0 { + theme.positive_value_color() + } else if rate < 10.0 { + theme.text_important_color() + } else { + theme.negative_value_color() + } + } + + /// Get status color for stall percentages (lower is better) + fn get_stall_status_color(pct: f64, theme: &AppTheme) -> Color { + if pct < 20.0 { + theme.positive_value_color() + } else if pct < 40.0 { + theme.text_important_color() + } else { + theme.negative_value_color() + } + } + + /// Render per-CPU summary (top CPUs by activity) + fn 
render_per_cpu_summary( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + // Get top 5 CPUs by instructions + let mut cpu_activity: Vec<(usize, u64)> = params + .collector + .per_cpu_counters + .iter() + .map(|(cpu, counters)| (*cpu, counters.instructions_delta)) + .collect(); + cpu_activity.sort_by(|a, b| b.1.cmp(&a.1)); + cpu_activity.truncate(5); + + let header = Row::new(vec![ + Cell::from("CPU").style(params.theme.title_style()), + Cell::from("Instructions").style(params.theme.title_style()), + Cell::from("IPC").style(params.theme.title_style()), + Cell::from("Cache Miss %").style(params.theme.title_style()), + ]); + + let rows: Vec = cpu_activity + .iter() + .filter_map(|(cpu, _)| { + params.collector.per_cpu_counters.get(cpu).map(|counters| { + let metrics = counters.derived_metrics(); + let instructions_str = if params.localize { + counters + .instructions_delta + .to_formatted_string(params.locale) + } else { + counters.instructions_delta.to_string() + }; + + Row::new(vec![ + Cell::from(format!("CPU {}", cpu)).style(params.theme.text_color()), + Cell::from(instructions_str).style(params.theme.text_important_color()), + Cell::from(format!("{:.2}", metrics.ipc)).style(params.theme.text_color()), + Cell::from(format!("{:.2}%", metrics.cache_miss_rate)) + .style(params.theme.text_color()), + ]) + }) + }) + .collect(); + + let table = Table::new( + rows, + vec![ + Constraint::Percentage(25), + Constraint::Percentage(35), + Constraint::Percentage(20), + Constraint::Percentage(20), + ], + ) + .header(header) + .block( + Block::bordered() + .title_top( + Line::from("Top CPUs by Activity") + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(table, area); + Ok(()) + } + + /// Render chart view (placeholder for Phase 4) + fn render_chart_view(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let history = ¶ms.collector.system_history; + + // Check if we have any data + if history.ipc_history.is_empty() { + Self::render_empty_chart(frame, area, "Collecting data...", params.theme); + return Ok(()); + } + + // Split into grid: 2x3 for different metrics + let [top_row, middle_row, bottom_row] = Layout::vertical([ + Constraint::Percentage(33), + Constraint::Percentage(33), + Constraint::Percentage(34), + ]) + .areas(area); + + let [ipc_area, cache_area] = + Layout::horizontal([Constraint::Percentage(50), Constraint::Percentage(50)]) + .areas(top_row); + + let [branch_area, stalls_area] = + Layout::horizontal([Constraint::Percentage(50), Constraint::Percentage(50)]) + .areas(middle_row); + + let [cycles_area, instructions_area] = + Layout::horizontal([Constraint::Percentage(50), Constraint::Percentage(50)]) + .areas(bottom_row); + + // Render individual charts + Self::render_ipc_chart(frame, ipc_area, params)?; + Self::render_cache_miss_chart(frame, cache_area, params)?; + Self::render_branch_miss_chart(frame, branch_area, params)?; + Self::render_stalls_chart(frame, stalls_area, params)?; + Self::render_cycles_chart(frame, cycles_area, params)?; + Self::render_instructions_chart(frame, instructions_area, params)?; + + Ok(()) + } + + /// Render IPC trend chart + fn render_ipc_chart(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let history = ¶ms.collector.system_history.ipc_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "IPC", params.theme); + return Ok(()); + } + + 
// Convert to u64 for Sparkline (scale by 1000 to preserve precision) + let mut data: Vec = history.iter().map(|&v| (v * 1000.0) as u64).collect(); + + // Adjust data to fill available width (accounting for border) + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_ipc_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("IPC (Instructions per Cycle)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.3} | Avg: {:.3} | Min: {:.3} | Max: {:.3}", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render cache miss rate chart + fn render_cache_miss_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.cache_miss_rate_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Cache Miss Rate", params.theme); + return Ok(()); + } + + // Scale percentages by 100 for display + let mut data: Vec = history.iter().map(|&v| (v * 100.0) as u64).collect(); + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_miss_rate_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("Cache Miss Rate (%)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2}% | Avg: {:.2}% | Min: {:.2}% | Max: {:.2}%", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render branch miss rate chart + fn render_branch_miss_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.branch_miss_rate_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Branch Miss Rate", params.theme); + return Ok(()); + } + + let mut data: Vec = history.iter().map(|&v| (v * 100.0) as u64).collect(); + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = 
history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_miss_rate_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("Branch Miss Rate (%)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2}% | Avg: {:.2}% | Min: {:.2}% | Max: {:.2}%", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render frontend/backend stalls chart + fn render_stalls_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.stalled_frontend_pct_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Pipeline Stalls", params.theme); + return Ok(()); + } + + let mut data: Vec = history.iter().map(|&v| (v * 100.0) as u64).collect(); + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_stall_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("Frontend Stalls (%)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2}% | Avg: {:.2}% | Min: {:.2}% | Max: {:.2}%", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render cycles per second chart + fn render_cycles_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.cycles_per_sec; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Cycles/sec", params.theme); + return Ok(()); + } + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + let data = Self::adjust_data_for_width(history.clone(), target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0); + let avg = history.iter().sum::() / history.len() as u64; + let min = history.iter().copied().min().unwrap_or(0); + let max = history.iter().copied().max().unwrap_or(0); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(params.theme.sparkline_style()) + .block( + Block::bordered() + .title_top( + Line::from("Cycles/sec") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( 
+ Line::from(format!( + "Cur: {:.2} | Avg: {:.2} | Min: {:.2} | Max: {:.2} GHz", + current as f64 / 1_000_000_000.0, + avg as f64 / 1_000_000_000.0, + min as f64 / 1_000_000_000.0, + max as f64 / 1_000_000_000.0 + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render instructions per second chart + fn render_instructions_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.instructions_per_sec; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Instructions/sec", params.theme); + return Ok(()); + } + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + let data = Self::adjust_data_for_width(history.clone(), target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0); + let avg = history.iter().sum::() / history.len() as u64; + let min = history.iter().copied().min().unwrap_or(0); + let max = history.iter().copied().max().unwrap_or(0); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(params.theme.sparkline_style()) + .block( + Block::bordered() + .title_top( + Line::from("Instructions/sec") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2} | Avg: {:.2} | Min: {:.2} | Max: {:.2} B/s", + current as f64 / 1_000_000_000.0, + avg as f64 / 1_000_000_000.0, + min as f64 / 1_000_000_000.0, + max as f64 / 1_000_000_000.0 + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render grid of LLC panels + fn render_llc_grid(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let llc_ids: Vec = params.collector.per_llc_counters.keys().copied().collect(); + + if llc_ids.is_empty() { + Self::render_empty_chart(frame, area, "No LLC data available", params.theme); + return Ok(()); + } + + Self::render_domain_grid(frame, area, params, &llc_ids, "LLC") + } + + /// Render grid of NUMA node panels + fn render_node_grid(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let node_ids: Vec = params.collector.per_node_counters.keys().copied().collect(); + + if node_ids.is_empty() { + Self::render_empty_chart(frame, area, "No NUMA node data available", params.theme); + return Ok(()); + } + + Self::render_domain_grid(frame, area, params, &node_ids, "Node") + } + + /// Render grid of domain panels (generic for LLC or NUMA) + fn render_domain_grid( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + domain_ids: &[usize], + domain_type: &str, + ) -> Result<()> { + // Header area + let [header_area, grid_area] = + Layout::vertical([Constraint::Length(3), Constraint::Min(1)]).areas(area); + + // Render header + let title = format!( + "Performance Counters by {} - {} domains", + domain_type, + domain_ids.len() + ); + let mode_text = format!( + "Aggregation: {} | 'a' toggle | 'v' view mode ({})", + domain_type, params.view_mode + ); + + let block = Block::bordered() + .title_top( + Line::from(title) + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(mode_text) 
+ .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()); + + frame.render_widget(block, header_area); + + // Calculate grid layout + let num_domains = domain_ids.len(); + let cols = if num_domains >= 4 { + 3 + } else if num_domains >= 2 { + 2 + } else { + 1 + }; + let rows = num_domains.div_ceil(cols); + + // Create grid + let row_constraints: Vec = (0..rows) + .map(|_| Constraint::Ratio(1, rows as u32)) + .collect(); + let row_areas = Layout::vertical(row_constraints).split(grid_area); + + let mut domain_idx = 0; + for &row_area in row_areas.iter() { + let col_constraints: Vec = (0..cols) + .map(|_| Constraint::Ratio(1, cols as u32)) + .collect(); + let col_areas = Layout::horizontal(col_constraints).split(row_area); + + for &col_area in col_areas.iter() { + if domain_idx < num_domains { + let domain_id = domain_ids[domain_idx]; + Self::render_domain_panel(frame, col_area, params, domain_id, domain_type)?; + domain_idx += 1; + } + } + } + + Ok(()) + } + + /// Render a compact panel for one LLC or NUMA node + fn render_domain_panel( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + domain_id: usize, + domain_type: &str, + ) -> Result<()> { + use crate::PerfStatAggregationLevel; + + // Get counters for this domain + let counters = match params.aggregation { + PerfStatAggregationLevel::Llc => params.collector.per_llc_counters.get(&domain_id), + PerfStatAggregationLevel::Node => params.collector.per_node_counters.get(&domain_id), + PerfStatAggregationLevel::System => unreachable!(), + }; + + if counters.is_none() { + return Ok(()); + } + let counters = counters.unwrap(); + let metrics = counters.derived_metrics(); + + // Format metrics + let lines = vec![ + Line::from(format!("IPC: {:.3}", metrics.ipc)) + .style(Style::default().fg(Self::get_ipc_status_color(metrics.ipc, params.theme))), + Line::from(format!("Cache Miss: {:.2}%", metrics.cache_miss_rate)).style( + Style::default().fg(Self::get_miss_rate_status_color( + metrics.cache_miss_rate, + params.theme, + )), + ), + Line::from(format!("Branch Miss: {:.2}%", metrics.branch_miss_rate)).style( + Style::default().fg(Self::get_miss_rate_status_color( + metrics.branch_miss_rate, + params.theme, + )), + ), + Line::from(format!( + "Frontend Stall: {:.1}%", + metrics.stalled_frontend_pct + )) + .style(Style::default().fg(Self::get_stall_status_color( + metrics.stalled_frontend_pct, + params.theme, + ))), + Line::from(format!( + "Cycles: {:.2} GHz", + counters.cycles_delta as f64 + / (params.tick_rate_ms as f64 / 1000.0) + / 1_000_000_000.0 + )), + ]; + + let block = Block::bordered() + .title_top( + Line::from(format!("{} {}", domain_type, domain_id)) + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()); + + let paragraph = Paragraph::new(lines).block(block); + frame.render_widget(paragraph, area); + + Ok(()) + } + + /// Adjust data vector to match target width (truncate or pad with zeros on left) + fn adjust_data_for_width(mut data: Vec, target_width: usize) -> Vec { + if target_width == 0 { + return vec![0]; + } + + if data.len() > target_width { + // Take the most recent samples (from the right) + data = data.split_off(data.len() - target_width); + } else if data.len() < target_width { + // Pad with zeros on the left + let padding_needed = target_width - data.len(); + let mut padded = vec![0; padding_needed]; + padded.extend(data); + data = padded; + } + + data + } + + /// 
Render empty chart placeholder + fn render_empty_chart(frame: &mut Frame, area: Rect, title: &str, theme: &AppTheme) { + let text = vec![Line::from(""), Line::from("Collecting data...")]; + + let paragraph = Paragraph::new(text).alignment(Alignment::Center).block( + Block::bordered() + .title_top(Line::from(title).style(theme.title_style()).centered()) + .border_type(BorderType::Rounded) + .style(theme.border_style()), + ); + + frame.render_widget(paragraph, area); + } + + /// Render message when collection is inactive + fn render_inactive_message(frame: &mut Frame, area: Rect, theme: &AppTheme) { + let text = vec![ + Line::from(""), + Line::from("Performance counter collection is not active."), + Line::from(""), + Line::from("This view will populate automatically when activated."), + ]; + + let paragraph = Paragraph::new(text).alignment(Alignment::Center).block( + Block::bordered() + .title_top( + Line::from("Perf Stat View") + .style(theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(theme.border_style()) + .padding(Padding::uniform(2)), + ); + + frame.render_widget(paragraph, area); + } + + /// Render error message when process filtering fails + fn render_process_filter_error(frame: &mut Frame, area: Rect, pid: i32, _theme: &AppTheme) { + let text = vec![ + Line::from(""), + Line::from(format!( + "Failed to collect performance counters for PID {}", + pid + )), + Line::from(""), + Line::from("Possible reasons:"), + Line::from(" • Process has terminated"), + Line::from(" • Insufficient permissions"), + Line::from(" • Hardware doesn't support all counters"), + Line::from(""), + Line::from("Press 'c' to clear filter and return to system-wide view"), + ]; + + let paragraph = Paragraph::new(text) + .alignment(Alignment::Center) + .style(Style::default().fg(Color::Yellow)) + .block( + Block::bordered() + .title_top( + Line::from("Process Filter Error") + .style(Style::default().fg(Color::Red)) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(Style::default().fg(Color::Red)) + .padding(Padding::uniform(2)), + ); + + frame.render_widget(paragraph, area); + } +}