diff --git a/tools/scxtop/README.md b/tools/scxtop/README.md
index d772d3c8d3..2526f76d49 100644
--- a/tools/scxtop/README.md
+++ b/tools/scxtop/README.md
@@ -185,6 +185,44 @@ vtime delta should remain rather stable as DSQs are consumed. If a scheduler is
 scheduling this field may be blank.
 image
 
+### Performance Counters (Perf Stat View)
+
+The Perf Stat view (press **C**) provides real-time hardware and software performance counters,
+similar to `perf stat`. This view shows:
+
+**Counter Collection:**
+- Hardware: cycles, instructions, branches, cache references/misses, pipeline stalls
+- Software: context-switches, cpu-migrations, page-faults
+- All counters update in real time at the configured tick rate
+
+**Derived Metrics (Automatically Calculated):**
+- **IPC** (Instructions Per Cycle): Indicates CPU efficiency
+- **Cache Miss Rate**: Shows cache effectiveness
+- **Branch Miss Rate**: Indicates branch prediction performance
+- **Frontend/Backend Stalls**: Shows pipeline bottlenecks
+
+**Aggregation Levels** (toggle with 'a' key):
+- **LLC**: Per-cache-domain view (default) - Best for understanding cache locality
+- **NUMA**: Per-NUMA-node view - Best for memory locality analysis
+- **System**: System-wide totals - Best for overall performance overview
+
+**Visualization Modes** (toggle with 'v' key):
+- **Table Mode**: Detailed counter values, rates/sec, and contextual notes
+- **Chart Mode**: 6 sparkline charts showing historical trends with min/max/avg statistics
+
+**Process Filtering:**
+- Press 'p' to filter by currently selected process
+- Press 'c' to clear filter and return to system-wide view
+
+**Key Features:**
+- Color-coded metrics (green/yellow/red) based on performance quality
+- Full-width sparklines that adapt to terminal size
+- Per-LLC and per-NUMA views show a grid of panels for quick comparison
+- Comprehensive statistics: Current, Average, Min, Max values
+
+This view is invaluable for top-down performance analysis, identifying whether workloads
+are CPU-bound, memory-bound, or experiencing cache/branch prediction issues.
+
 ## MCP Mode - AI-Assisted Scheduler Analysis
 
 `scxtop` includes a Model Context Protocol (MCP) server that exposes scheduler observability
@@ -280,9 +318,12 @@ claude --mcp scxtop "Summarize my system's scheduler"
 - `get_topology` - Get hardware topology with configurable detail level
 - `list_event_subsystems` - List available tracing event subsystems
 - `list_events` - List specific kprobe or perf events with pagination
-- `start_perf_profiling` - Start CPU profiling with stack traces
+- `start_perf_profiling` - Start CPU profiling with stack traces (sampling mode)
 - `stop_perf_profiling` - Stop profiling and prepare results
 - `get_perf_results` - Get symbolized flamegraph data
+- `start_perf_stat` - Start performance counter collection (counting mode)
+- `stop_perf_stat` - Stop counter collection
+- `get_perf_stat_results` - Get counter statistics with derived metrics (IPC, cache miss rates, etc.)
 - `control_event_tracking` - Enable/disable BPF event collection
 - `control_stats_collection` - Control BPF statistics sampling
 - `control_analyzers` - Start/stop event analyzers
@@ -332,6 +373,15 @@ claude --mcp scxtop "Summarize my system's scheduler"
 
 "What kernel functions are consuming the most CPU?"
→ Claude starts profiling, collects samples, and analyzes results + +"Show me performance counters aggregated by LLC" +→ Claude uses start_perf_stat with llc aggregation and retrieves metrics + +"What's the IPC and cache miss rate for each NUMA node?" +→ Claude starts perf stat collection and queries with node aggregation + +"Is this workload CPU-bound or memory-bound?" +→ Claude analyzes perf stat metrics (IPC, cache miss rate, stalls) ``` **Claude Code CLI** - Direct command line usage: @@ -382,13 +432,16 @@ This enables AI assistants to perform continuous monitoring and proactive analys See [CLAUDE_INTEGRATION.md](CLAUDE_INTEGRATION.md) for detailed examples and usage patterns. +See [MCP_PERF_STAT.md](MCP_PERF_STAT.md) for complete performance counter API reference. + ## Documentation ### User Guides - **[docs/PERFETTO_TRACE_ANALYSIS.md](docs/PERFETTO_TRACE_ANALYSIS.md)** - Complete perfetto trace analysis guide -- **[docs/TASK_THREAD_DEBUGGING_GUIDE.md](docs/TASK_THREAD_DEBUGGING_GUIDE.md)** - Task/thread debugging workflows +- **[docs/TASK_THREAD_DEBUGGING_GUIDE.md](docs/TASK_THREAD_DEBUGGING_GUIDE.md)** - Task/thread debugging workflows - **[docs/PROTOBUF_LOADING_VERIFIED.md](docs/PROTOBUF_LOADING_VERIFIED.md)** - Protobuf loading verification - **[docs/README.md](docs/README.md)** - Documentation index +- **[MCP_PERF_STAT.md](MCP_PERF_STAT.md)** - Performance counter collection MCP API ### Implementation Documentation - **[COMPLETE_IMPLEMENTATION_SUMMARY.md](COMPLETE_IMPLEMENTATION_SUMMARY.md)** - Full implementation overview diff --git a/tools/scxtop/docs/README.md b/tools/scxtop/docs/README.md index 2306957c12..d06cd38630 100644 --- a/tools/scxtop/docs/README.md +++ b/tools/scxtop/docs/README.md @@ -6,6 +6,7 @@ - **[PERFETTO_TRACE_ANALYSIS.md](PERFETTO_TRACE_ANALYSIS.md)** - Complete guide to perfetto trace analysis - **[TASK_THREAD_DEBUGGING_GUIDE.md](TASK_THREAD_DEBUGGING_GUIDE.md)** - Task/thread debugging workflows - **[PROTOBUF_LOADING_VERIFIED.md](PROTOBUF_LOADING_VERIFIED.md)** - Protobuf file loading verification +- **[../MCP_PERF_STAT.md](../MCP_PERF_STAT.md)** - Performance counter collection API and usage ### Main README - **[../README.md](../README.md)** - scxtop main documentation with MCP integration @@ -19,6 +20,7 @@ Located in `tools/scxtop/` root: ### MCP Integration - **[CLAUDE_INTEGRATION.md](../CLAUDE_INTEGRATION.md)** - Setting up Claude with scxtop MCP - **[MCP_INTEGRATIONS.md](../MCP_INTEGRATIONS.md)** - MCP protocol details +- **[MCP_PERF_STAT.md](../MCP_PERF_STAT.md)** - Performance counter collection MCP API ## Documentation Structure @@ -32,7 +34,8 @@ tools/scxtop/ │ └── PROTOBUF_LOADING_VERIFIED.md # Protobuf verification ├── examples/ │ └── perfetto_trace_analysis_examples.json -└── CLAUDE_INTEGRATION.md # Claude setup +├── CLAUDE_INTEGRATION.md # Claude setup +└── MCP_PERF_STAT.md # Performance counter API ``` @@ -40,7 +43,8 @@ tools/scxtop/ 1. **New to perfetto analysis?** → Read [PERFETTO_TRACE_ANALYSIS.md](PERFETTO_TRACE_ANALYSIS.md) 2. **Need to debug a task?** → Read [TASK_THREAD_DEBUGGING_GUIDE.md](TASK_THREAD_DEBUGGING_GUIDE.md) -3. **Verify protobuf loading?** → Read [PROTOBUF_LOADING_VERIFIED.md](PROTOBUF_LOADING_VERIFIED.md) +3. **Want performance counters?** → Read [MCP_PERF_STAT.md](../MCP_PERF_STAT.md) +4. **Verify protobuf loading?** → Read [PROTOBUF_LOADING_VERIFIED.md](PROTOBUF_LOADING_VERIFIED.md) ## Features Summary @@ -60,6 +64,13 @@ tools/scxtop/ 13. Wakeup→Schedule Correlation ### MCP Tools + +#### Performance Counter Tools +1. 
start_perf_stat - **Start counter collection** +2. stop_perf_stat - **Stop collection** +3. get_perf_stat_results - **Get metrics with LLC/NUMA/CPU/System aggregation** + +#### Perfetto Trace Analysis Tools 1. load_perfetto_trace - **Load protobuf files** 2. query_trace_events 3. analyze_trace_scheduling @@ -68,3 +79,8 @@ tools/scxtop/ 6. find_scheduling_bottlenecks 7. correlate_wakeup_to_schedule 8. export_trace_analysis + +#### Live Profiling Tools +1. start_perf_profiling - **Start sampling profiler** +2. stop_perf_profiling - **Stop profiler** +3. get_perf_results - **Get symbolized stack traces** diff --git a/tools/scxtop/src/app.rs b/tools/scxtop/src/app.rs index d7058cc35e..fd9391e2f3 100644 --- a/tools/scxtop/src/app.rs +++ b/tools/scxtop/src/app.rs @@ -214,6 +214,12 @@ pub struct App<'a> { perf_top_filtered_symbols: Vec<(String, crate::symbol_data::SymbolSample)>, has_perf_cap: bool, + // perf stat related + perf_stat_collector: crate::PerfStatCollector, + perf_stat_view_mode: crate::PerfStatViewMode, + perf_stat_aggregation: crate::PerfStatAggregationLevel, + perf_stat_filter_pid: Option, + // capability warnings for non-root users capability_warnings: Vec, } @@ -444,6 +450,10 @@ impl<'a> App<'a> { current_sampling_event: None, perf_top_table_state: TableState::default(), perf_top_filtered_symbols: Vec::new(), + perf_stat_collector: crate::PerfStatCollector::new(), + perf_stat_view_mode: crate::PerfStatViewMode::Table, + perf_stat_aggregation: crate::PerfStatAggregationLevel::Llc, + perf_stat_filter_pid: None, capability_warnings: Vec::new(), }; @@ -690,6 +700,10 @@ impl<'a> App<'a> { current_sampling_event: None, perf_top_table_state: TableState::default(), perf_top_filtered_symbols: Vec::new(), + perf_stat_collector: crate::PerfStatCollector::new(), + perf_stat_view_mode: crate::PerfStatViewMode::Table, + perf_stat_aggregation: crate::PerfStatAggregationLevel::Llc, + perf_stat_filter_pid: None, capability_warnings: Vec::new(), }; @@ -743,6 +757,16 @@ impl<'a> App<'a> { self.detach_perf_sampling(); } } + (prev, AppState::PerfStat) if prev != AppState::PerfStat => { + // Entering PerfStat view - start counter collection + if let Err(e) = self.start_perf_stat_collection() { + log::error!("Failed to start perf stat collection: {}", e); + } + } + (AppState::PerfStat, new) if new != AppState::PerfStat => { + // Leaving PerfStat view - stop counter collection + self.stop_perf_stat_collection(); + } _ => {} } @@ -1084,6 +1108,7 @@ impl<'a> App<'a> { AppState::Node => self.on_tick_node(), AppState::PerfEvent | AppState::KprobeEvent => self.on_tick_events(), AppState::PerfTop => self.on_tick_perf_top(), + AppState::PerfStat => self.on_tick_perf_stat(), AppState::Power => self.on_tick_power(), AppState::Process => self.on_tick_process(), AppState::Scheduler => self.on_tick_scheduler(), @@ -2510,6 +2535,7 @@ impl<'a> App<'a> { AppState::Node => self.render_node(frame), AppState::Llc => self.render_llc(frame), AppState::PerfTop => self.render_perf_top(frame), + AppState::PerfStat => self.render_perf_stat(frame), AppState::Power => self.render_power(frame), AppState::Scheduler => { if self.has_capability_warnings() { @@ -2860,6 +2886,31 @@ impl<'a> App<'a> { ), Style::default(), )), + Line::from(Span::styled( + format!( + "{}: display perf stat view (performance counters)", + self.config + .active_keymap + .action_keys_string(Action::SetState(AppState::PerfStat)) + ), + Style::default(), + )), + Line::from(Span::styled( + " v: toggle table/chart view (in Perf Stat)", + Style::default(), + 
)), + Line::from(Span::styled( + " a: toggle aggregation (System/LLC/NUMA) (in Perf Stat)", + Style::default(), + )), + Line::from(Span::styled( + " p: filter by selected process (in Perf Stat)", + Style::default(), + )), + Line::from(Span::styled( + " c: clear process filter, return to system-wide (in Perf Stat)", + Style::default(), + )), Line::from(Span::styled( format!( "{}: display power monitoring view", @@ -3792,6 +3843,25 @@ impl<'a> App<'a> { ) } + /// Renders the perf stat view with performance counters. + fn render_perf_stat(&mut self, frame: &mut Frame) -> Result<()> { + use crate::render::perf_stat::{PerfStatRenderer, PerfStatViewParams}; + + let params = PerfStatViewParams { + collector: &self.perf_stat_collector, + view_mode: &self.perf_stat_view_mode, + aggregation: &self.perf_stat_aggregation, + filter_pid: self.perf_stat_filter_pid, + proc_data: &self.proc_data, + tick_rate_ms: self.config.tick_rate_ms(), + localize: self.localize, + locale: &self.locale, + theme: self.theme(), + }; + + PerfStatRenderer::render_perf_stat_view(frame, frame.area(), ¶ms) + } + /// Renders the network application state. fn render_network(&mut self, frame: &mut Frame) -> Result<()> { let theme = self.theme(); @@ -5463,7 +5533,13 @@ impl<'a> App<'a> { // XXX handle error } } - Action::NextViewState => self.next_view_state(), + Action::NextViewState => { + if self.state == AppState::PerfStat { + self.toggle_perf_stat_view_mode(); + } else { + self.next_view_state(); + } + } Action::SchedReg => { self.on_scheduler_load()?; } @@ -5708,6 +5784,35 @@ impl<'a> App<'a> { Action::Esc => { self.on_escape()?; } + Action::TogglePerfStatViewMode => { + if self.state == AppState::PerfStat { + self.toggle_perf_stat_view_mode(); + } + } + Action::TogglePerfStatAggregation => { + if self.state == AppState::PerfStat { + self.toggle_perf_stat_aggregation(); + } + } + Action::SetPerfStatFilter(pid) => { + if self.state == AppState::PerfStat { + self.set_perf_stat_filter(*pid)?; + } + } + Action::ApplyPerfStatProcessFilter => { + if self.state == AppState::PerfStat { + if let Err(e) = self.apply_process_filter_to_perf_stat() { + log::error!("Failed to apply process filter: {}", e); + } + } + } + Action::ClearPerfStatFilter => { + if self.state == AppState::PerfStat { + if let Err(e) = self.clear_perf_stat_filter() { + log::error!("Failed to clear process filter: {}", e); + } + } + } _ => {} }; Ok(()) @@ -7433,6 +7538,120 @@ impl<'a> App<'a> { Ok(()) } + /// Perf Stat view: update performance counters + fn on_tick_perf_stat(&mut self) -> Result<()> { + if !self.perf_stat_collector.is_active() { + return Ok(()); + } + + let now = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .unwrap() + .as_millis(); + + let duration = self.config.tick_rate_ms() as u128; + + self.perf_stat_collector.update(now, duration)?; + + Ok(()) + } + + /// Start perf stat collection (called when entering PerfStat view) + pub fn start_perf_stat_collection(&mut self) -> Result<()> { + if self.perf_stat_collector.is_active() { + return Ok(()); + } + + // Build CPU to LLC/Node mappings from topology + let mut cpu_to_llc = BTreeMap::new(); + let mut cpu_to_node = BTreeMap::new(); + for (cpu_id, cpu) in &self.topo.all_cpus { + cpu_to_llc.insert(*cpu_id, cpu.llc_id); + cpu_to_node.insert(*cpu_id, cpu.node_id); + } + self.perf_stat_collector + .set_topology(cpu_to_llc, cpu_to_node); + + let num_cpus = self.cpu_data.len(); + + // Use terminal width for history sizing (like scheduler view does) + let history_size = 
self.terminal_width.saturating_sub(2) as usize; + self.perf_stat_collector + .init_system_wide_with_history_size(num_cpus, history_size)?; + + // If a process is selected, also collect for it + if let Some(pid) = self.perf_stat_filter_pid { + self.perf_stat_collector.init_process(pid)?; + } + + Ok(()) + } + + /// Stop perf stat collection (called when exiting PerfStat view) + pub fn stop_perf_stat_collection(&mut self) { + self.perf_stat_collector.cleanup(); + } + + /// Set process filter for perf stat + pub fn set_perf_stat_filter(&mut self, pid: Option) -> Result<()> { + self.perf_stat_filter_pid = pid; + + if self.perf_stat_collector.is_active() { + // Reinitialize with new filter + self.stop_perf_stat_collection(); + self.start_perf_stat_collection()?; + } + + Ok(()) + } + + /// Toggle perf stat view mode (table <-> chart) + pub fn toggle_perf_stat_view_mode(&mut self) { + self.perf_stat_view_mode = self.perf_stat_view_mode.next(); + } + + /// Toggle perf stat aggregation level (System -> LLC -> NUMA -> System) + pub fn toggle_perf_stat_aggregation(&mut self) { + self.perf_stat_aggregation = self.perf_stat_aggregation.next(); + } + + /// Get currently selected process PID (from Process view or current selection) + pub fn get_selected_process_pid(&self) -> Option { + // First try to get from selected_process field + if let Some(pid) = self.selected_process { + return Some(pid); + } + + // Fall back to filtered state + if let Ok(filtered) = self.filtered_state.lock() { + if filtered.selected < filtered.list.len() { + if let Some(pid) = filtered.list[filtered.selected].as_int() { + return Some(pid); + } + } + } + + None + } + + /// Apply current process selection as perf stat filter + pub fn apply_process_filter_to_perf_stat(&mut self) -> Result<()> { + if let Some(pid) = self.get_selected_process_pid() { + log::info!("Applying perf stat filter to PID {}", pid); + self.set_perf_stat_filter(Some(pid))?; + } else { + log::warn!("No process selected to filter by"); + } + Ok(()) + } + + /// Clear perf stat filter (return to system-wide) + pub fn clear_perf_stat_filter(&mut self) -> Result<()> { + log::info!("Clearing perf stat filter - returning to system-wide view"); + self.set_perf_stat_filter(None)?; + Ok(()) + } + /// MangoApp view: minimal system data fn on_tick_mango_app(&mut self) -> Result<()> { if let Some(ref mut skel) = self.skel { diff --git a/tools/scxtop/src/keymap.rs b/tools/scxtop/src/keymap.rs index a0f8c25eb2..3e213112c0 100644 --- a/tools/scxtop/src/keymap.rs +++ b/tools/scxtop/src/keymap.rs @@ -59,6 +59,7 @@ impl Default for KeyMap { bindings.insert(Key::Char('w'), Action::SetState(AppState::Power)); bindings.insert(Key::Char('s'), Action::SetState(AppState::Scheduler)); bindings.insert(Key::Char('S'), Action::SaveConfig); + bindings.insert(Key::Char('C'), Action::SetState(AppState::PerfStat)); bindings.insert(Key::Char('a'), Action::RequestTrace); bindings.insert(Key::Char('x'), Action::ClearEvent); bindings.insert(Key::Char('j'), Action::PrevEvent); diff --git a/tools/scxtop/src/lib.rs b/tools/scxtop/src/lib.rs index e5b4afd78f..f1259aa4ff 100644 --- a/tools/scxtop/src/lib.rs +++ b/tools/scxtop/src/lib.rs @@ -23,6 +23,7 @@ pub mod mcp; mod mem_stats; pub mod network_stats; mod node_data; +mod perf_stat_data; mod perfetto_trace; mod power_data; mod proc_data; @@ -51,6 +52,9 @@ pub use llc_data::LlcData; pub use mem_stats::MemStatSnapshot; pub use network_stats::NetworkStatSnapshot; pub use node_data::NodeData; +pub use perf_stat_data::{ + DerivedMetrics, 
PerfStatCollector, PerfStatCounters, PerfStatHistory, SharedPerfStatCollector, +}; pub use perfetto_trace::PerfettoTraceManager; pub use power_data::{ CStateInfo, CorePowerData, PowerDataCollector, PowerSnapshot, SystemPowerData, @@ -113,6 +117,8 @@ pub enum AppState { PerfEvent, /// Application is in the perf top view state. PerfTop, + /// Application is in the perf stat view state. + PerfStat, /// Application is in the Power state. Power, /// Application is in the Process state. @@ -182,6 +188,57 @@ impl std::fmt::Display for ComponentViewState { } } +#[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)] +pub enum PerfStatViewMode { + Table, + Chart, +} + +impl PerfStatViewMode { + pub fn next(&self) -> Self { + match self { + PerfStatViewMode::Table => PerfStatViewMode::Chart, + PerfStatViewMode::Chart => PerfStatViewMode::Table, + } + } +} + +impl std::fmt::Display for PerfStatViewMode { + fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { + match self { + PerfStatViewMode::Table => write!(f, "table"), + PerfStatViewMode::Chart => write!(f, "chart"), + } + } +} + +#[derive(Clone, Debug, Eq, Hash, Ord, PartialEq, PartialOrd)] +pub enum PerfStatAggregationLevel { + System, + Llc, + Node, +} + +impl PerfStatAggregationLevel { + pub fn next(&self) -> Self { + match self { + PerfStatAggregationLevel::System => PerfStatAggregationLevel::Llc, + PerfStatAggregationLevel::Llc => PerfStatAggregationLevel::Node, + PerfStatAggregationLevel::Node => PerfStatAggregationLevel::System, + } + } +} + +impl std::fmt::Display for PerfStatAggregationLevel { + fn fmt(&self, f: &mut std::fmt::Formatter) -> std::fmt::Result { + match self { + PerfStatAggregationLevel::System => write!(f, "system"), + PerfStatAggregationLevel::Llc => write!(f, "LLC"), + PerfStatAggregationLevel::Node => write!(f, "NUMA"), + } + } +} + #[derive(Debug, Clone, PartialEq)] pub enum FilterItem { String(String), @@ -504,6 +561,12 @@ pub enum Action { Up, UpdateColVisibility(UpdateColVisibilityAction), Wait(WaitAction), + // Perf Stat actions + TogglePerfStatViewMode, + TogglePerfStatAggregation, + SetPerfStatFilter(Option), + ApplyPerfStatProcessFilter, + ClearPerfStatFilter, None, } diff --git a/tools/scxtop/src/main.rs b/tools/scxtop/src/main.rs index 7879e597e0..267778fbd1 100644 --- a/tools/scxtop/src/main.rs +++ b/tools/scxtop/src/main.rs @@ -81,6 +81,12 @@ fn handle_key_event(app: &App, keymap: &KeyMap, key: KeyEvent) -> Action { match (app.state(), c) { // In BPF program detail view, 'p' toggles perf sampling (AppState::BpfProgramDetail, 'p') => Action::ToggleBpfPerfSampling, + // In PerfStat view, 'p' applies process filter + (AppState::PerfStat, 'p') => Action::ApplyPerfStatProcessFilter, + // In PerfStat view, 'c' clears process filter + (AppState::PerfStat, 'c') => Action::ClearPerfStatFilter, + // In PerfStat view, 'a' toggles aggregation level (System/LLC/NUMA) + (AppState::PerfStat, 'a') => Action::TogglePerfStatAggregation, // Fall back to global keymap for all other cases _ => keymap.action(&Key::Char(c)), } diff --git a/tools/scxtop/src/mcp/events.rs b/tools/scxtop/src/mcp/events.rs index 3038a1789c..792fb70446 100644 --- a/tools/scxtop/src/mcp/events.rs +++ b/tools/scxtop/src/mcp/events.rs @@ -282,6 +282,11 @@ pub fn action_to_mcp_event(action: &Action) -> Option { | Action::ToggleLocalization | Action::ToggleHwPressure | Action::ToggleUncoreFreq + | Action::TogglePerfStatViewMode + | Action::TogglePerfStatAggregation + | Action::SetPerfStatFilter(_) + | Action::ApplyPerfStatProcessFilter + | 
Action::ClearPerfStatFilter | Action::Up => None, } } diff --git a/tools/scxtop/src/mcp/server.rs b/tools/scxtop/src/mcp/server.rs index bc3f3ad9c3..10a7556ea9 100644 --- a/tools/scxtop/src/mcp/server.rs +++ b/tools/scxtop/src/mcp/server.rs @@ -181,6 +181,14 @@ impl McpServer { self.bpf_stats = Some(bpf_stats); self.perf_profiler = Some(perf_profiler); + + // Create perf stat collector + let perf_stat_collector = crate::SharedPerfStatCollector::new(); + + // Set perf stat collector in tools + self.tools + .set_perf_stat_collector(perf_stat_collector.clone()); + self } diff --git a/tools/scxtop/src/mcp/tools.rs b/tools/scxtop/src/mcp/tools.rs index fe3fab3c6d..9b0b577b86 100644 --- a/tools/scxtop/src/mcp/tools.rs +++ b/tools/scxtop/src/mcp/tools.rs @@ -9,6 +9,7 @@ use super::SharedAnalyzerControl; use anyhow::{anyhow, Result}; use perfetto_protos::ftrace_event::ftrace_event; use serde_json::{json, Value}; +use std::collections::BTreeMap; use std::sync::Arc; type TraceCache = @@ -17,6 +18,7 @@ type TraceCache = pub struct McpTools { topo: Option>, perf_profiler: Option, + perf_stat_collector: Option, event_control: Option, analyzer_control: Option, trace_cache: Option, @@ -33,6 +35,7 @@ impl McpTools { Self { topo: None, perf_profiler: None, + perf_stat_collector: None, event_control: None, analyzer_control: None, trace_cache: None, @@ -54,6 +57,10 @@ impl McpTools { self.perf_profiler = Some(profiler); } + pub fn set_perf_stat_collector(&mut self, collector: crate::SharedPerfStatCollector) { + self.perf_stat_collector = Some(collector); + } + pub fn set_event_control(&mut self, control: super::SharedEventControl) { self.event_control = Some(control); } @@ -225,6 +232,61 @@ impl McpTools { } }), }, + McpTool { + name: "start_perf_stat".to_string(), + description: "Start performance counter collection (counting mode, not sampling)" + .to_string(), + input_schema: json!({ + "type": "object", + "properties": { + "aggregation": { + "type": "string", + "enum": ["system", "llc", "node"], + "description": "Aggregation level: 'system' (total), 'llc' (per-LLC), or 'node' (per-NUMA node)", + "default": "llc" + }, + "pid": { + "type": "integer", + "description": "Process ID to monitor (-1 for system-wide)", + "default": -1 + }, + "history_size": { + "type": "integer", + "description": "Number of samples to keep in history for trend analysis", + "default": 100 + } + } + }), + }, + McpTool { + name: "stop_perf_stat".to_string(), + description: "Stop performance counter collection".to_string(), + input_schema: json!({ + "type": "object", + "properties": {} + }), + }, + McpTool { + name: "get_perf_stat_results".to_string(), + description: "Get performance counter statistics with derived metrics (IPC, cache miss rates, etc.)" + .to_string(), + input_schema: json!({ + "type": "object", + "properties": { + "aggregation": { + "type": "string", + "enum": ["system", "cpu", "llc", "node", "process"], + "description": "Aggregation level to retrieve: 'system', 'cpu', 'llc', 'node', or 'process'", + "default": "system" + }, + "include_history": { + "type": "boolean", + "description": "Include historical data for trend analysis", + "default": false + } + } + }), + }, McpTool { name: "control_event_tracking".to_string(), description: @@ -585,6 +647,9 @@ impl McpTools { "start_perf_profiling" => self.tool_start_perf_profiling(arguments), "stop_perf_profiling" => self.tool_stop_perf_profiling(arguments), "get_perf_results" => self.tool_get_perf_results(arguments), + "start_perf_stat" => 
self.tool_start_perf_stat(arguments), + "stop_perf_stat" => self.tool_stop_perf_stat(arguments), + "get_perf_stat_results" => self.tool_get_perf_stat_results(arguments), "control_event_tracking" => self.tool_control_event_tracking(arguments), "control_stats_collection" => self.tool_control_stats_collection(arguments), "control_analyzers" => self.tool_control_analyzers(arguments), @@ -1092,6 +1157,246 @@ impl McpTools { })) } + fn tool_start_perf_stat(&self, args: &Value) -> Result { + let collector = self + .perf_stat_collector + .as_ref() + .ok_or_else(|| anyhow!("Perf stat collector not available"))?; + + let topo = self + .topo + .as_ref() + .ok_or_else(|| anyhow!("Topology not available"))?; + + let pid = args.get("pid").and_then(|v| v.as_i64()).unwrap_or(-1) as i32; + let history_size = args + .get("history_size") + .and_then(|v| v.as_u64()) + .unwrap_or(100) as usize; + + // Build topology mappings + let mut cpu_to_llc = BTreeMap::new(); + let mut cpu_to_node = BTreeMap::new(); + for (cpu_id, cpu) in &topo.all_cpus { + cpu_to_llc.insert(*cpu_id, cpu.llc_id); + cpu_to_node.insert(*cpu_id, cpu.node_id); + } + collector.set_topology(cpu_to_llc, cpu_to_node); + + // Initialize collection + let num_cpus = topo.all_cpus.len(); + + if pid > 0 { + // Per-process monitoring: only initialize process events to avoid exhausting hardware PMU counters + collector.init_process(pid)?; + } else { + // System-wide monitoring: initialize per-CPU events + collector.init_system_wide_with_history_size(num_cpus, history_size)?; + } + + Ok(json!({ + "content": [{ + "type": "text", + "text": format!( + "Performance counter collection started:\n\n• CPUs: {}\n• LLCs: {}\n• NUMA nodes: {}\n• Process filter: {}\n• History size: {}\n\nCollecting: cycles, instructions, branches, cache, context-switches, migrations, page-faults, stalls", + num_cpus, + topo.all_llcs.len(), + topo.nodes.len(), + if pid > 0 { format!("PID {}", pid) } else { "None (system-wide)".to_string() }, + history_size + ) + }] + })) + } + + fn tool_stop_perf_stat(&self, _args: &Value) -> Result { + let collector = self + .perf_stat_collector + .as_ref() + .ok_or_else(|| anyhow!("Perf stat collector not available"))?; + + if !collector.is_active() { + return Ok(json!({ + "content": [{ + "type": "text", + "text": "Performance counter collection is not active." + }] + })); + } + + collector.cleanup(); + + Ok(json!({ + "content": [{ + "type": "text", + "text": "Performance counter collection stopped.\n\nUse get_perf_stat_results to retrieve final statistics." + }] + })) + } + + fn tool_get_perf_stat_results(&self, args: &Value) -> Result { + let collector = self + .perf_stat_collector + .as_ref() + .ok_or_else(|| anyhow!("Perf stat collector not available"))?; + + if !collector.is_active() { + return Ok(json!({ + "content": [{ + "type": "text", + "text": "Performance counter collection is not active. Use start_perf_stat to begin collection." 
+ }] + })); + } + + // Update counters to get current values + let now = std::time::SystemTime::now() + .duration_since(std::time::UNIX_EPOCH) + .unwrap() + .as_millis(); + let duration_ms = 100; // Default polling interval + collector.update(now, duration_ms)?; + + let aggregation = args + .get("aggregation") + .and_then(|v| v.as_str()) + .unwrap_or("system"); + let include_history = args + .get("include_history") + .and_then(|v| v.as_bool()) + .unwrap_or(false); + + let result; + + match aggregation { + "system" => { + let counters = collector.get_system_counters(); + let metrics = counters.derived_metrics(); + result = + Self::format_counters_json(&counters, &metrics, "System-Wide", include_history); + } + "cpu" => { + let per_cpu = collector.get_per_cpu_counters(); + let mut cpu_results = Vec::new(); + for (cpu_id, counters) in per_cpu { + let metrics = counters.derived_metrics(); + cpu_results.push(json!({ + "cpu_id": cpu_id, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + })); + } + result = json!({ + "per_cpu": cpu_results, + "total_cpus": cpu_results.len(), + }); + } + "llc" => { + let per_llc = collector.get_per_llc_counters(); + let mut llc_results = Vec::new(); + for (llc_id, counters) in per_llc { + let metrics = counters.derived_metrics(); + llc_results.push(json!({ + "llc_id": llc_id, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + })); + } + result = json!({ + "per_llc": llc_results, + "total_llcs": llc_results.len(), + }); + } + "node" => { + let per_node = collector.get_per_node_counters(); + let mut node_results = Vec::new(); + for (node_id, counters) in per_node { + let metrics = counters.derived_metrics(); + node_results.push(json!({ + "node_id": node_id, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + })); + } + result = json!({ + "per_node": node_results, + "total_nodes": node_results.len(), + }); + } + "process" => { + if let Some(counters) = collector.get_process_counters() { + let metrics = counters.derived_metrics(); + let pid = collector.filter_pid().unwrap_or(-1); + result = json!({ + "pid": pid, + "counters": Self::counters_to_json(&counters), + "metrics": Self::metrics_to_json(&metrics), + }); + } else { + return Ok(json!({ + "content": [{ + "type": "text", + "text": "No process filter is active. Use start_perf_stat with pid parameter to filter by process." + }] + })); + } + } + _ => { + return Err(anyhow!( + "Invalid aggregation level: {}. 
Use 'system', 'cpu', 'llc', 'node', or 'process'", + aggregation + )); + } + } + + Ok(json!({ + "content": [{ + "type": "text", + "text": serde_json::to_string_pretty(&result) + .unwrap_or_else(|_| "Failed to serialize results".to_string()) + }] + })) + } + + fn counters_to_json(counters: &crate::PerfStatCounters) -> Value { + json!({ + "cycles": counters.cycles_delta, + "instructions": counters.instructions_delta, + "branches": counters.branches_delta, + "branch_misses": counters.branch_misses_delta, + "cache_references": counters.cache_references_delta, + "cache_misses": counters.cache_misses_delta, + "stalled_cycles_frontend": counters.stalled_cycles_frontend_delta, + "stalled_cycles_backend": counters.stalled_cycles_backend_delta, + "context_switches": counters.context_switches_delta, + "cpu_migrations": counters.cpu_migrations_delta, + "page_faults": counters.page_faults_delta, + }) + } + + fn metrics_to_json(metrics: &crate::DerivedMetrics) -> Value { + json!({ + "ipc": format!("{:.3}", metrics.ipc), + "cache_miss_rate": format!("{:.2}%", metrics.cache_miss_rate), + "branch_miss_rate": format!("{:.2}%", metrics.branch_miss_rate), + "stalled_frontend_pct": format!("{:.2}%", metrics.stalled_frontend_pct), + "stalled_backend_pct": format!("{:.2}%", metrics.stalled_backend_pct), + }) + } + + fn format_counters_json( + counters: &crate::PerfStatCounters, + metrics: &crate::DerivedMetrics, + label: &str, + _include_history: bool, + ) -> Value { + json!({ + "label": label, + "counters": Self::counters_to_json(counters), + "metrics": Self::metrics_to_json(metrics), + }) + } + fn tool_control_event_tracking(&self, args: &Value) -> Result { let control = self .event_control diff --git a/tools/scxtop/src/perf_stat_data.rs b/tools/scxtop/src/perf_stat_data.rs new file mode 100644 index 0000000000..67ccc31359 --- /dev/null +++ b/tools/scxtop/src/perf_stat_data.rs @@ -0,0 +1,693 @@ +// Copyright (c) Meta Platforms, Inc. and affiliates. +// +// This software may be used and distributed according to the terms of the +// GNU General Public License version 2. 
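For reference, the three MCP tools registered above take JSON arguments that mirror their `input_schema` definitions. Below is a minimal sketch (an editorial illustration, not part of the patch) of the argument shapes a client might send for system-wide, per-LLC collection; only fields present in the schemas are used, and the surrounding MCP transport is assumed.

```rust
use serde_json::{json, Value};

fn main() {
    // Arguments for "start_perf_stat": per-LLC aggregation, system-wide
    // collection (pid = -1), keeping 100 samples of history for trends.
    let start_args: Value = json!({
        "aggregation": "llc",
        "pid": -1,
        "history_size": 100
    });

    // Arguments for "get_perf_stat_results": per-LLC counters plus derived
    // metrics (IPC, cache/branch miss rates, stall percentages).
    let results_args: Value = json!({
        "aggregation": "llc",
        "include_history": false
    });

    println!("{start_args}\n{results_args}");
}
```

The result payload returned by `get_perf_stat_results` is the pretty-printed JSON built from `counters_to_json` and `metrics_to_json` above, wrapped in a single text content block.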
+ +use crate::PerfEvent; +use anyhow::Result; +use std::collections::BTreeMap; + +/// Stores counter values for a specific scope (system-wide or per-process) +#[derive(Clone, Debug, Default)] +pub struct PerfStatCounters { + // Raw counter values (latest read) + pub cycles: u64, + pub instructions: u64, + pub branches: u64, + pub branch_misses: u64, + pub cache_references: u64, + pub cache_misses: u64, + pub stalled_cycles_frontend: u64, + pub stalled_cycles_backend: u64, + pub context_switches: u64, + pub cpu_migrations: u64, + pub page_faults: u64, + + // Delta values (difference since last read) + pub cycles_delta: u64, + pub instructions_delta: u64, + pub branches_delta: u64, + pub branch_misses_delta: u64, + pub cache_references_delta: u64, + pub cache_misses_delta: u64, + pub stalled_cycles_frontend_delta: u64, + pub stalled_cycles_backend_delta: u64, + pub context_switches_delta: u64, + pub cpu_migrations_delta: u64, + pub page_faults_delta: u64, + + // Timestamp of last update + pub last_update_ms: u128, +} + +impl PerfStatCounters { + /// Update counter values and calculate deltas + pub fn update(&mut self, new_values: &PerfStatCounters, timestamp_ms: u128) { + self.cycles_delta = new_values.cycles.saturating_sub(self.cycles); + self.instructions_delta = new_values.instructions.saturating_sub(self.instructions); + self.branches_delta = new_values.branches.saturating_sub(self.branches); + self.branch_misses_delta = new_values.branch_misses.saturating_sub(self.branch_misses); + self.cache_references_delta = new_values + .cache_references + .saturating_sub(self.cache_references); + self.cache_misses_delta = new_values.cache_misses.saturating_sub(self.cache_misses); + self.stalled_cycles_frontend_delta = new_values + .stalled_cycles_frontend + .saturating_sub(self.stalled_cycles_frontend); + self.stalled_cycles_backend_delta = new_values + .stalled_cycles_backend + .saturating_sub(self.stalled_cycles_backend); + self.context_switches_delta = new_values + .context_switches + .saturating_sub(self.context_switches); + self.cpu_migrations_delta = new_values + .cpu_migrations + .saturating_sub(self.cpu_migrations); + self.page_faults_delta = new_values.page_faults.saturating_sub(self.page_faults); + + // Update absolute values + self.cycles = new_values.cycles; + self.instructions = new_values.instructions; + self.branches = new_values.branches; + self.branch_misses = new_values.branch_misses; + self.cache_references = new_values.cache_references; + self.cache_misses = new_values.cache_misses; + self.stalled_cycles_frontend = new_values.stalled_cycles_frontend; + self.stalled_cycles_backend = new_values.stalled_cycles_backend; + self.context_switches = new_values.context_switches; + self.cpu_migrations = new_values.cpu_migrations; + self.page_faults = new_values.page_faults; + + self.last_update_ms = timestamp_ms; + } + + /// Calculate derived metrics + pub fn derived_metrics(&self) -> DerivedMetrics { + DerivedMetrics::calculate(self) + } +} + +/// Derived performance metrics calculated from raw counters +#[derive(Clone, Debug, Default)] +pub struct DerivedMetrics { + pub ipc: f64, // Instructions per cycle + pub cache_miss_rate: f64, // Cache misses / cache references + pub branch_miss_rate: f64, // Branch misses / branches + pub stalled_frontend_pct: f64, // Frontend stalls / cycles + pub stalled_backend_pct: f64, // Backend stalls / cycles +} + +impl DerivedMetrics { + pub fn calculate(counters: &PerfStatCounters) -> Self { + let ipc = if counters.cycles_delta > 0 { + 
counters.instructions_delta as f64 / counters.cycles_delta as f64 + } else { + 0.0 + }; + + let cache_miss_rate = if counters.cache_references_delta > 0 { + (counters.cache_misses_delta as f64 / counters.cache_references_delta as f64) * 100.0 + } else { + 0.0 + }; + + let branch_miss_rate = if counters.branches_delta > 0 { + (counters.branch_misses_delta as f64 / counters.branches_delta as f64) * 100.0 + } else { + 0.0 + }; + + let stalled_frontend_pct = if counters.cycles_delta > 0 { + (counters.stalled_cycles_frontend_delta as f64 / counters.cycles_delta as f64) * 100.0 + } else { + 0.0 + }; + + let stalled_backend_pct = if counters.cycles_delta > 0 { + (counters.stalled_cycles_backend_delta as f64 / counters.cycles_delta as f64) * 100.0 + } else { + 0.0 + }; + + Self { + ipc, + cache_miss_rate, + branch_miss_rate, + stalled_frontend_pct, + stalled_backend_pct, + } + } +} + +/// Stores historical delta values for chart visualization +#[derive(Clone, Debug)] +pub struct PerfStatHistory { + max_size: usize, + pub ipc_history: Vec, + pub cache_miss_rate_history: Vec, + pub branch_miss_rate_history: Vec, + pub stalled_frontend_pct_history: Vec, + pub stalled_backend_pct_history: Vec, + pub instructions_per_sec: Vec, + pub cycles_per_sec: Vec, +} + +impl PerfStatHistory { + pub fn new(max_size: usize) -> Self { + Self { + max_size, + ipc_history: Vec::new(), + cache_miss_rate_history: Vec::new(), + branch_miss_rate_history: Vec::new(), + stalled_frontend_pct_history: Vec::new(), + stalled_backend_pct_history: Vec::new(), + instructions_per_sec: Vec::new(), + cycles_per_sec: Vec::new(), + } + } + + pub fn push( + &mut self, + metrics: &DerivedMetrics, + counters: &PerfStatCounters, + duration_ms: u128, + ) { + let duration_secs = duration_ms as f64 / 1000.0; + let instructions_rate = if duration_secs > 0.0 { + (counters.instructions_delta as f64 / duration_secs) as u64 + } else { + 0 + }; + let cycles_rate = if duration_secs > 0.0 { + (counters.cycles_delta as f64 / duration_secs) as u64 + } else { + 0 + }; + + self.ipc_history.push(metrics.ipc); + self.cache_miss_rate_history.push(metrics.cache_miss_rate); + self.branch_miss_rate_history.push(metrics.branch_miss_rate); + self.stalled_frontend_pct_history + .push(metrics.stalled_frontend_pct); + self.stalled_backend_pct_history + .push(metrics.stalled_backend_pct); + self.instructions_per_sec.push(instructions_rate); + self.cycles_per_sec.push(cycles_rate); + + // Trim to max size + if self.ipc_history.len() > self.max_size { + self.ipc_history.remove(0); + self.cache_miss_rate_history.remove(0); + self.branch_miss_rate_history.remove(0); + self.stalled_frontend_pct_history.remove(0); + self.stalled_backend_pct_history.remove(0); + self.instructions_per_sec.remove(0); + self.cycles_per_sec.remove(0); + } + } +} + +/// Manages perf stat counter collection +pub struct PerfStatCollector { + // Per-CPU perf events (system-wide mode) + cpu_events: BTreeMap>, + + // Per-process perf events (filtered mode) + process_events: Option>, + + // Collected counter data + pub system_counters: PerfStatCounters, + pub per_cpu_counters: BTreeMap, + pub per_llc_counters: BTreeMap, + pub per_node_counters: BTreeMap, + pub process_counters: Option, + + // Historical data for charts (store last N deltas) + pub system_history: PerfStatHistory, + pub per_cpu_history: BTreeMap, + pub per_llc_history: BTreeMap, + pub per_node_history: BTreeMap, + + // CPU to LLC/Node mapping + cpu_to_llc: BTreeMap, + cpu_to_node: BTreeMap, + + // Configuration + filter_pid: Option, + 
is_active: bool, +} + +impl PerfStatCollector { + pub fn new() -> Self { + Self { + cpu_events: BTreeMap::new(), + process_events: None, + system_counters: PerfStatCounters::default(), + per_cpu_counters: BTreeMap::new(), + per_llc_counters: BTreeMap::new(), + per_node_counters: BTreeMap::new(), + process_counters: None, + system_history: PerfStatHistory::new(100), + per_cpu_history: BTreeMap::new(), + per_llc_history: BTreeMap::new(), + per_node_history: BTreeMap::new(), + cpu_to_llc: BTreeMap::new(), + cpu_to_node: BTreeMap::new(), + filter_pid: None, + is_active: false, + } + } + + /// Set CPU to LLC/Node topology mappings + pub fn set_topology( + &mut self, + cpu_to_llc: BTreeMap, + cpu_to_node: BTreeMap, + ) { + self.cpu_to_llc = cpu_to_llc; + self.cpu_to_node = cpu_to_node; + } + + /// Initialize perf events for system-wide collection with custom history size + pub fn init_system_wide_with_history_size( + &mut self, + num_cpus: usize, + history_size: usize, + ) -> Result<()> { + self.cleanup(); + + // Update history sizes + let history_size = history_size.max(10); + self.system_history = PerfStatHistory::new(history_size); + + // Initialize LLC and NUMA node histories based on topology + for &llc_id in self.cpu_to_llc.values() { + self.per_llc_history + .entry(llc_id) + .or_insert_with(|| PerfStatHistory::new(history_size)); + self.per_llc_counters.entry(llc_id).or_default(); + } + + for &node_id in self.cpu_to_node.values() { + self.per_node_history + .entry(node_id) + .or_insert_with(|| PerfStatHistory::new(history_size)); + self.per_node_counters.entry(node_id).or_default(); + } + + for cpu in 0..num_cpus { + let mut events = Vec::new(); + + // Create perf events for all counters + let event_specs = vec![ + ("hw", "cycles"), + ("hw", "instructions"), + ("hw", "branches"), + ("hw", "branch-misses"), + ("hw", "cache-references"), + ("hw", "cache-misses"), + ("hw", "stalled-cycles-frontend"), + ("hw", "stalled-cycles-backend"), + ("sw", "context-switches"), + ("sw", "cpu-migrations"), + ("sw", "page-faults"), + ]; + + for (subsystem, event_name) in event_specs { + let mut event = PerfEvent::new(subsystem.to_string(), event_name.to_string(), cpu as i32); + // Attach in counting mode (no sampling) + if let Err(e) = event.attach(-1) { + log::warn!("Failed to attach {} for CPU {}: {}", event_name, cpu, e); + continue; + } + events.push(event); + } + + if !events.is_empty() { + self.cpu_events.insert(cpu, events); + self.per_cpu_counters + .insert(cpu, PerfStatCounters::default()); + self.per_cpu_history + .insert(cpu, PerfStatHistory::new(history_size)); + } + } + + self.is_active = true; + Ok(()) + } + + /// Initialize perf events for system-wide collection (uses default history size) + pub fn init_system_wide(&mut self, num_cpus: usize) -> Result<()> { + self.init_system_wide_with_history_size(num_cpus, 100) + } + + /// Initialize perf events for per-process collection + pub fn init_process(&mut self, pid: i32) -> Result<()> { + let mut events = Vec::new(); + let mut failed_events = Vec::new(); + + let event_specs = vec![ + ("hw", "cycles"), + ("hw", "instructions"), + ("hw", "branches"), + ("hw", "branch-misses"), + ("hw", "cache-references"), + ("hw", "cache-misses"), + ("hw", "stalled-cycles-frontend"), + ("hw", "stalled-cycles-backend"), + ("sw", "context-switches"), + ("sw", "cpu-migrations"), + ("sw", "page-faults"), + ]; + + let total_events = event_specs.len(); + + for (subsystem, event_name) in &event_specs { + // CPU -1 means monitor on all CPUs + let mut event = 
PerfEvent::new(subsystem.to_string(), event_name.to_string(), -1); + if let Err(e) = event.attach(pid) { + log::warn!("Failed to attach {} for PID {}: {}", event_name, pid, e); + failed_events.push(*event_name); + continue; + } + events.push(event); + } + + if events.is_empty() { + return Err(anyhow::anyhow!( + "Failed to attach any perf events for PID {} (process may have terminated or insufficient permissions)", + pid + )); + } + + if !failed_events.is_empty() { + log::info!( + "Successfully attached {} of {} events for PID {} (failed: {})", + events.len(), + total_events, + pid, + failed_events.join(", ") + ); + } + + self.process_events = Some(events); + self.process_counters = Some(PerfStatCounters::default()); + self.filter_pid = Some(pid); + self.is_active = true; + + Ok(()) + } + + /// Read all counters and update data structures + pub fn update(&mut self, timestamp_ms: u128, duration_ms: u128) -> Result<()> { + if !self.is_active { + return Ok(()); + } + + // Track if any counters succeeded + let mut any_success = false; + + // Read process counters first if filtering by PID + if let Some(events) = &mut self.process_events { + let mut proc_counters = PerfStatCounters::default(); + let mut proc_success = false; + + for event in events { + if let Ok(value) = event.value(false) { + proc_success = true; + match event.event_name() { + "cycles" | "cpu-cycles" => proc_counters.cycles = value, + "instructions" => proc_counters.instructions = value, + "branches" => proc_counters.branches = value, + "branch-misses" => proc_counters.branch_misses = value, + "cache-references" => proc_counters.cache_references = value, + "cache-misses" => proc_counters.cache_misses = value, + "stalled-cycles-frontend" => proc_counters.stalled_cycles_frontend = value, + "stalled-cycles-backend" => proc_counters.stalled_cycles_backend = value, + "context-switches" => proc_counters.context_switches = value, + "cpu-migrations" => proc_counters.cpu_migrations = value, + "page-faults" => proc_counters.page_faults = value, + _ => {} + } + } + } + + if proc_success { + any_success = true; + if let Some(prev) = &mut self.process_counters { + prev.update(&proc_counters, timestamp_ms); + } + } + } + + // Read system-wide counters + let mut system_total = PerfStatCounters::default(); + + for (cpu, events) in &mut self.cpu_events { + let mut cpu_counters = PerfStatCounters::default(); + let mut cpu_success = false; + + for event in events { + match event.value(false) { + Ok(value) => { + cpu_success = true; + match event.event_name() { + "cycles" | "cpu-cycles" => cpu_counters.cycles = value, + "instructions" => cpu_counters.instructions = value, + "branches" => cpu_counters.branches = value, + "branch-misses" => cpu_counters.branch_misses = value, + "cache-references" => cpu_counters.cache_references = value, + "cache-misses" => cpu_counters.cache_misses = value, + "stalled-cycles-frontend" => { + cpu_counters.stalled_cycles_frontend = value + } + "stalled-cycles-backend" => cpu_counters.stalled_cycles_backend = value, + "context-switches" => cpu_counters.context_switches = value, + "cpu-migrations" => cpu_counters.cpu_migrations = value, + "page-faults" => cpu_counters.page_faults = value, + _ => {} + } + } + Err(e) => { + log::debug!( + "Failed to read {} for CPU {}: {}", + event.event_name(), + cpu, + e + ); + } + } + } + + if cpu_success { + any_success = true; + + // Update per-CPU data with deltas + if let Some(prev) = self.per_cpu_counters.get_mut(cpu) { + prev.update(&cpu_counters, timestamp_ms); + + // Update 
history + if let Some(history) = self.per_cpu_history.get_mut(cpu) { + let metrics = prev.derived_metrics(); + history.push(&metrics, prev, duration_ms); + } + } + + // Aggregate to system total + system_total.cycles += cpu_counters.cycles; + system_total.instructions += cpu_counters.instructions; + system_total.branches += cpu_counters.branches; + system_total.branch_misses += cpu_counters.branch_misses; + system_total.cache_references += cpu_counters.cache_references; + system_total.cache_misses += cpu_counters.cache_misses; + system_total.stalled_cycles_frontend += cpu_counters.stalled_cycles_frontend; + system_total.stalled_cycles_backend += cpu_counters.stalled_cycles_backend; + system_total.context_switches += cpu_counters.context_switches; + system_total.cpu_migrations += cpu_counters.cpu_migrations; + system_total.page_faults += cpu_counters.page_faults; + } + } + + // Aggregate per-CPU counters into LLC and NUMA node totals + let mut llc_totals: BTreeMap = BTreeMap::new(); + let mut node_totals: BTreeMap = BTreeMap::new(); + + for (cpu, counters) in &self.per_cpu_counters { + // Aggregate by LLC + if let Some(&llc_id) = self.cpu_to_llc.get(cpu) { + let llc_counter = llc_totals.entry(llc_id).or_default(); + llc_counter.cycles += counters.cycles; + llc_counter.instructions += counters.instructions; + llc_counter.branches += counters.branches; + llc_counter.branch_misses += counters.branch_misses; + llc_counter.cache_references += counters.cache_references; + llc_counter.cache_misses += counters.cache_misses; + llc_counter.stalled_cycles_frontend += counters.stalled_cycles_frontend; + llc_counter.stalled_cycles_backend += counters.stalled_cycles_backend; + llc_counter.context_switches += counters.context_switches; + llc_counter.cpu_migrations += counters.cpu_migrations; + llc_counter.page_faults += counters.page_faults; + } + + // Aggregate by NUMA node + if let Some(&node_id) = self.cpu_to_node.get(cpu) { + let node_counter = node_totals.entry(node_id).or_default(); + node_counter.cycles += counters.cycles; + node_counter.instructions += counters.instructions; + node_counter.branches += counters.branches; + node_counter.branch_misses += counters.branch_misses; + node_counter.cache_references += counters.cache_references; + node_counter.cache_misses += counters.cache_misses; + node_counter.stalled_cycles_frontend += counters.stalled_cycles_frontend; + node_counter.stalled_cycles_backend += counters.stalled_cycles_backend; + node_counter.context_switches += counters.context_switches; + node_counter.cpu_migrations += counters.cpu_migrations; + node_counter.page_faults += counters.page_faults; + } + } + + // Update LLC counters with deltas and history + for (llc_id, llc_total) in llc_totals { + if let Some(prev) = self.per_llc_counters.get_mut(&llc_id) { + prev.update(&llc_total, timestamp_ms); + if let Some(history) = self.per_llc_history.get_mut(&llc_id) { + let metrics = prev.derived_metrics(); + history.push(&metrics, prev, duration_ms); + } + } + } + + // Update NUMA node counters with deltas and history + for (node_id, node_total) in node_totals { + if let Some(prev) = self.per_node_counters.get_mut(&node_id) { + prev.update(&node_total, timestamp_ms); + if let Some(history) = self.per_node_history.get_mut(&node_id) { + let metrics = prev.derived_metrics(); + history.push(&metrics, prev, duration_ms); + } + } + } + + if !any_success { + log::warn!("Failed to read any perf counters this cycle"); + return Ok(()); + } + + // Update system counters with deltas + 
self.system_counters.update(&system_total, timestamp_ms); + let metrics = self.system_counters.derived_metrics(); + self.system_history + .push(&metrics, &self.system_counters, duration_ms); + + Ok(()) + } + + /// Cleanup all perf events + pub fn cleanup(&mut self) { + self.cpu_events.clear(); + self.process_events = None; + self.is_active = false; + } + + pub fn is_active(&self) -> bool { + self.is_active + } + + pub fn filter_pid(&self) -> Option { + self.filter_pid + } + + pub fn has_process_counters(&self) -> bool { + self.process_counters.is_some() + } +} + +impl Default for PerfStatCollector { + fn default() -> Self { + Self::new() + } +} + +/// Thread-safe wrapper for PerfStatCollector +#[derive(Clone)] +pub struct SharedPerfStatCollector { + inner: Arc>, +} + +impl SharedPerfStatCollector { + pub fn new() -> Self { + Self { + inner: Arc::new(Mutex::new(PerfStatCollector::new())), + } + } + + pub fn set_topology( + &self, + cpu_to_llc: BTreeMap, + cpu_to_node: BTreeMap, + ) { + self.inner + .lock() + .unwrap() + .set_topology(cpu_to_llc, cpu_to_node); + } + + pub fn init_system_wide_with_history_size( + &self, + num_cpus: usize, + history_size: usize, + ) -> Result<()> { + self.inner + .lock() + .unwrap() + .init_system_wide_with_history_size(num_cpus, history_size) + } + + pub fn init_process(&self, pid: i32) -> Result<()> { + self.inner.lock().unwrap().init_process(pid) + } + + pub fn update(&self, timestamp_ms: u128, duration_ms: u128) -> Result<()> { + self.inner.lock().unwrap().update(timestamp_ms, duration_ms) + } + + pub fn cleanup(&self) { + self.inner.lock().unwrap().cleanup(); + } + + pub fn is_active(&self) -> bool { + self.inner.lock().unwrap().is_active() + } + + pub fn get_system_counters(&self) -> PerfStatCounters { + self.inner.lock().unwrap().system_counters.clone() + } + + pub fn get_per_cpu_counters(&self) -> BTreeMap { + self.inner.lock().unwrap().per_cpu_counters.clone() + } + + pub fn get_per_llc_counters(&self) -> BTreeMap { + self.inner.lock().unwrap().per_llc_counters.clone() + } + + pub fn get_per_node_counters(&self) -> BTreeMap { + self.inner.lock().unwrap().per_node_counters.clone() + } + + pub fn get_process_counters(&self) -> Option { + self.inner.lock().unwrap().process_counters.clone() + } + + pub fn filter_pid(&self) -> Option { + self.inner.lock().unwrap().filter_pid() + } +} + +impl Default for SharedPerfStatCollector { + fn default() -> Self { + Self::new() + } +} + +use std::sync::{Arc, Mutex}; diff --git a/tools/scxtop/src/profiling_events/mod.rs b/tools/scxtop/src/profiling_events/mod.rs index 5e768f6c37..8bd8e0ca85 100644 --- a/tools/scxtop/src/profiling_events/mod.rs +++ b/tools/scxtop/src/profiling_events/mod.rs @@ -38,7 +38,7 @@ impl ProfilingEvent { match self { ProfilingEvent::Perf(p) => { let mut p = p.clone(); - p.cpu = cpu; + p.cpu = cpu as i32; p.attach(process)?; Ok(ProfilingEvent::Perf(p)) } diff --git a/tools/scxtop/src/profiling_events/perf.rs b/tools/scxtop/src/profiling_events/perf.rs index a9e48c9acf..2bb07fc28b 100644 --- a/tools/scxtop/src/profiling_events/perf.rs +++ b/tools/scxtop/src/profiling_events/perf.rs @@ -50,7 +50,8 @@ pub fn perf_event_config(subsystem: &str, event: &str) -> Result { pub struct PerfEvent { pub subsystem: String, pub event: String, - pub cpu: usize, + /// CPU to monitor: -1 for all CPUs, or a specific CPU number + pub cpu: i32, pub alias: String, pub use_config: bool, pub event_type: u32, @@ -72,7 +73,12 @@ impl Drop for PerfEvent { impl PerfEvent { /// Creates a PerfEvent. 
- pub fn new(subsystem: String, event: String, cpu: usize) -> Self { + /// + /// # Arguments + /// * `subsystem` - The perf subsystem (e.g., "hw", "sw", "tracepoint") + /// * `event` - The event name (e.g., "cycles", "instructions") + /// * `cpu` - CPU to monitor: -1 for all CPUs, or a specific CPU number + pub fn new(subsystem: String, event: String, cpu: i32) -> Self { Self { subsystem, event, @@ -114,7 +120,11 @@ impl PerfEvent { } /// Returns a perf event from a string. - pub fn from_str_args(event: &str, cpu: usize) -> Result { + /// + /// # Arguments + /// * `event` - Event string in format "subsystem:event" (e.g., "hw:cycles") + /// * `cpu` - CPU to monitor: -1 for all CPUs, or a specific CPU number + pub fn from_str_args(event: &str, cpu: i32) -> Result { let event_parts: Vec<&str> = event.split(':').collect(); if event_parts.len() != 2 { anyhow::bail!("Invalid perf event: {}", event); @@ -270,11 +280,12 @@ impl PerfEvent { attrs.set_disabled(0); attrs.set_exclude_kernel(0); attrs.set_exclude_hv(0); - attrs.set_inherit(if process_id == -1 { 1 } else { 0 }); + // Enable inherit for per-process monitoring to track all threads + attrs.set_inherit(if process_id > 0 { 1 } else { 0 }); attrs.set_pinned(1); let result = - unsafe { perf::perf_event_open(&mut attrs, process_id, self.cpu as i32, -1, 0) }; + unsafe { perf::perf_event_open(&mut attrs, process_id, self.cpu, -1, 0) }; if result < 0 { return Err(anyhow!( diff --git a/tools/scxtop/src/render/mod.rs b/tools/scxtop/src/render/mod.rs index 2d4008d605..2d37d08225 100644 --- a/tools/scxtop/src/render/mod.rs +++ b/tools/scxtop/src/render/mod.rs @@ -10,6 +10,8 @@ pub mod memory; // Network rendering pub mod network; +// Perf stat rendering +pub mod perf_stat; // Scheduler rendering pub mod scheduler; // BPF program rendering @@ -18,5 +20,6 @@ pub mod bpf_programs; pub use bpf_programs::BpfProgramRenderer; pub use memory::MemoryRenderer; pub use network::NetworkRenderer; +pub use perf_stat::PerfStatRenderer; pub use process::ProcessRenderer; pub use scheduler::SchedulerRenderer; diff --git a/tools/scxtop/src/render/perf_stat.rs b/tools/scxtop/src/render/perf_stat.rs new file mode 100644 index 0000000000..d39784a3a4 --- /dev/null +++ b/tools/scxtop/src/render/perf_stat.rs @@ -0,0 +1,1235 @@ +// Copyright (c) Meta Platforms, Inc. and affiliates. +// +// This software may be used and distributed according to the terms of the +// GNU General Public License version 2. 
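To illustrate the `cpu: usize` → `cpu: i32` change above: the same `PerfEvent` now covers both counting setups used by `PerfStatCollector`. This is a minimal sketch under two assumptions — that `PerfEvent` is reachable as `scxtop::PerfEvent` (as `use crate::PerfEvent` elsewhere in this patch suggests) and that the PID used is an arbitrary example; it is an illustration, not part of the patch.

```rust
use anyhow::Result;
use scxtop::PerfEvent; // assumed re-export at the crate root

fn main() -> Result<()> {
    // System-wide counting: one event per (counter, CPU); pid = -1 in attach
    // means "all processes" on that CPU.
    let mut cycles_cpu0 = PerfEvent::new("hw".to_string(), "cycles".to_string(), 0);
    cycles_cpu0.attach(-1)?;

    // Per-process counting: cpu = -1 follows the task on every CPU, and
    // attach() now sets inherit for pid > 0 so its threads are included.
    let pid = 1234; // hypothetical PID
    let mut insns_proc = PerfEvent::new("hw".to_string(), "instructions".to_string(), -1);
    insns_proc.attach(pid)?;

    // Counters are polled with value(), using the same flag the collector
    // passes in its update() loop above.
    let _cycles = cycles_cpu0.value(false)?;
    let _insns = insns_proc.value(false)?;
    Ok(())
}
```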
+ +use crate::{AppTheme, PerfStatCollector, PerfStatCounters, PerfStatViewMode, ProcData}; +use anyhow::Result; +use num_format::{SystemLocale, ToFormattedString}; +use ratatui::layout::{Alignment, Constraint, Layout, Rect}; +use ratatui::style::{Color, Style}; +use ratatui::text::Line; +use ratatui::widgets::{Block, BorderType, Cell, Padding, Paragraph, Row, Sparkline, Table}; +use ratatui::Frame; +use std::collections::BTreeMap; + +/// Parameters for rendering perf stat view +pub struct PerfStatViewParams<'a> { + pub collector: &'a PerfStatCollector, + pub view_mode: &'a PerfStatViewMode, + pub aggregation: &'a crate::PerfStatAggregationLevel, + pub filter_pid: Option, + pub proc_data: &'a BTreeMap, + pub tick_rate_ms: usize, + pub localize: bool, + pub locale: &'a SystemLocale, + pub theme: &'a AppTheme, +} + +/// Renderer for perf stat view +pub struct PerfStatRenderer; + +impl PerfStatRenderer { + /// Main render entry point + pub fn render_perf_stat_view( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + if !params.collector.is_active() { + Self::render_inactive_message(frame, area, params.theme); + return Ok(()); + } + + // Dispatch based on aggregation level + use crate::PerfStatAggregationLevel; + match params.aggregation { + PerfStatAggregationLevel::System => { + // Render single system-wide view + match params.view_mode { + PerfStatViewMode::Table => Self::render_table_view(frame, area, params), + PerfStatViewMode::Chart => Self::render_chart_view(frame, area, params), + } + } + PerfStatAggregationLevel::Llc => { + // Render grid of LLC panels + Self::render_llc_grid(frame, area, params) + } + PerfStatAggregationLevel::Node => { + // Render grid of NUMA node panels + Self::render_node_grid(frame, area, params) + } + } + } + + /// Render table view showing all counters and derived metrics + fn render_table_view(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + // Check if we're filtering by process but don't have counters + if params.filter_pid.is_some() && !params.collector.has_process_counters() { + Self::render_process_filter_error( + frame, + area, + params.filter_pid.unwrap(), + params.theme, + ); + return Ok(()); + } + + // Split into sections: header, system counters, derived metrics, per-CPU summary + let [header_area, counters_area, metrics_area, percpu_area] = Layout::vertical([ + Constraint::Length(3), + Constraint::Percentage(40), + Constraint::Percentage(30), + Constraint::Percentage(30), + ]) + .areas(area); + + // Render header with title and filter info + Self::render_header(frame, header_area, params)?; + + // Render main counter table + Self::render_counter_table(frame, counters_area, params)?; + + // Render derived metrics table + Self::render_derived_metrics_table(frame, metrics_area, params)?; + + // Render per-CPU summary + Self::render_per_cpu_summary(frame, percpu_area, params)?; + + Ok(()) + } + + /// Render header section + fn render_header(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let title = if let Some(pid) = params.filter_pid { + let proc_name = params + .proc_data + .get(&pid) + .map(|p| p.process_name.as_str()) + .unwrap_or("unknown"); + format!( + "Performance Counter Statistics - {} (PID: {})", + proc_name, pid + ) + } else { + "Performance Counter Statistics (System-Wide)".to_string() + }; + + let num_cpus = params.collector.per_cpu_counters.len(); + let status_text = format!("● Active - {} CPUs monitored", num_cpus); + + let mode_text = if 
params.filter_pid.is_some() { + format!( + "View: {} | Aggregation: {} | 'v' view | 'a' agg | 'c' clear filter", + match params.view_mode { + PerfStatViewMode::Table => "Table", + PerfStatViewMode::Chart => "Chart", + }, + params.aggregation + ) + } else { + format!( + "View: {} | Aggregation: {} | 'v' view | 'a' agg | 'p' filter", + match params.view_mode { + PerfStatViewMode::Table => "Table", + PerfStatViewMode::Chart => "Chart", + }, + params.aggregation + ) + }; + + let block = Block::bordered() + .title_top( + Line::from(title) + .style(params.theme.title_style()) + .centered(), + ) + .title_top( + Line::from(status_text) + .style(params.theme.text_important_color()) + .right_aligned(), + ) + .title_bottom( + Line::from(mode_text) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()); + + frame.render_widget(block, area); + Ok(()) + } + + /// Render main counter table (similar to perf stat output) + fn render_counter_table( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + // Use process counters if filtering, otherwise system counters + let counters = if params.filter_pid.is_some() { + params + .collector + .process_counters + .as_ref() + .unwrap_or(¶ms.collector.system_counters) + } else { + ¶ms.collector.system_counters + }; + let duration_ms = params.tick_rate_ms as f64 / 1000.0; + + // Table columns: Counter Name | Value | Rate | Notes + let header = Row::new(vec![ + Cell::from("Counter").style(params.theme.title_style()), + Cell::from("Value").style(params.theme.title_style()), + Cell::from("Rate/sec").style(params.theme.title_style()), + Cell::from("Notes").style(params.theme.title_style()), + ]); + + let rows = vec![ + Self::create_counter_row( + "cpu-clock", + counters.cycles_delta, + duration_ms, + format!( + "{:.3} CPUs utilized", + counters.cycles_delta as f64 / duration_ms / 1_000_000_000.0 + ), + params, + ), + Self::create_counter_row( + "context-switches", + counters.context_switches_delta, + duration_ms, + format!( + "{:.3} K/sec", + counters.context_switches_delta as f64 / duration_ms / 1000.0 + ), + params, + ), + Self::create_counter_row( + "cpu-migrations", + counters.cpu_migrations_delta, + duration_ms, + format!( + "{:.3} K/sec", + counters.cpu_migrations_delta as f64 / duration_ms / 1000.0 + ), + params, + ), + Self::create_counter_row( + "page-faults", + counters.page_faults_delta, + duration_ms, + format!( + "{:.3} K/sec", + counters.page_faults_delta as f64 / duration_ms / 1000.0 + ), + params, + ), + Self::create_counter_row( + "cycles", + counters.cycles_delta, + duration_ms, + format!( + "{:.3} GHz", + counters.cycles_delta as f64 / duration_ms / 1_000_000_000.0 + ), + params, + ), + Self::create_counter_row( + "instructions", + counters.instructions_delta, + duration_ms, + Self::format_ipc_note(counters), + params, + ), + Self::create_counter_row( + "branches", + counters.branches_delta, + duration_ms, + format!( + "{:.3} M/sec", + counters.branches_delta as f64 / duration_ms / 1_000_000.0 + ), + params, + ), + Self::create_counter_row( + "branch-misses", + counters.branch_misses_delta, + duration_ms, + Self::format_branch_miss_note(counters), + params, + ), + Self::create_counter_row( + "cache-references", + counters.cache_references_delta, + duration_ms, + format!( + "{:.3} M/sec", + counters.cache_references_delta as f64 / duration_ms / 1_000_000.0 + ), + params, + ), + Self::create_counter_row( + "cache-misses", + 
counters.cache_misses_delta, + duration_ms, + Self::format_cache_miss_note(counters), + params, + ), + Self::create_counter_row( + "stalled-cycles-frontend", + counters.stalled_cycles_frontend_delta, + duration_ms, + Self::format_stalled_frontend_note(counters), + params, + ), + Self::create_counter_row( + "stalled-cycles-backend", + counters.stalled_cycles_backend_delta, + duration_ms, + Self::format_stalled_backend_note(counters), + params, + ), + ]; + + let table = Table::new( + rows, + vec![ + Constraint::Percentage(30), // Counter name + Constraint::Percentage(20), // Value + Constraint::Percentage(20), // Rate + Constraint::Percentage(30), // Notes + ], + ) + .header(header) + .block( + Block::bordered() + .title_top( + Line::from("Performance Counters") + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(table, area); + Ok(()) + } + + /// Create a row for the counter table + fn create_counter_row( + name: &str, + value: u64, + duration_secs: f64, + notes: String, + params: &PerfStatViewParams, + ) -> Row<'static> { + let rate = if duration_secs > 0.0 { + (value as f64 / duration_secs) as u64 + } else { + 0 + }; + + let value_str = if params.localize { + value.to_formatted_string(params.locale) + } else { + value.to_string() + }; + + let rate_str = if params.localize { + rate.to_formatted_string(params.locale) + } else { + rate.to_string() + }; + + Row::new(vec![ + Cell::from(name.to_string()).style(params.theme.text_color()), + Cell::from(value_str).style(params.theme.text_important_color()), + Cell::from(rate_str).style(params.theme.text_color()), + Cell::from(notes).style(params.theme.text_color()), + ]) + } + + /// Format IPC note (instructions per cycle) + fn format_ipc_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cycles_delta > 0 { + format!("{:.2} insn per cycle", metrics.ipc) + } else { + "".to_string() + } + } + + /// Format branch miss note + fn format_branch_miss_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.branches_delta > 0 { + format!("{:.2}% of all branches", metrics.branch_miss_rate) + } else { + "".to_string() + } + } + + /// Format cache miss note + fn format_cache_miss_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cache_references_delta > 0 { + format!("{:.2}% of all cache accesses", metrics.cache_miss_rate) + } else { + "".to_string() + } + } + + /// Format stalled frontend note + fn format_stalled_frontend_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cycles_delta > 0 { + format!("{:.2}% frontend cycles idle", metrics.stalled_frontend_pct) + } else { + "".to_string() + } + } + + /// Format stalled backend note + fn format_stalled_backend_note(counters: &PerfStatCounters) -> String { + let metrics = counters.derived_metrics(); + if counters.cycles_delta > 0 { + format!("{:.2}% backend cycles idle", metrics.stalled_backend_pct) + } else { + "".to_string() + } + } + + /// Render derived metrics table (IPC, stalls, etc.) 
+ fn render_derived_metrics_table( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + // Use process counters if filtering, otherwise system counters + let counters = if params.filter_pid.is_some() { + params + .collector + .process_counters + .as_ref() + .unwrap_or(¶ms.collector.system_counters) + } else { + ¶ms.collector.system_counters + }; + let metrics = counters.derived_metrics(); + + let header = Row::new(vec![ + Cell::from("Metric").style(params.theme.title_style()), + Cell::from("Value").style(params.theme.title_style()), + Cell::from("Status").style(params.theme.title_style()), + ]); + + let rows = vec![ + Self::create_metric_row( + "IPC (Instructions per Cycle)", + format!("{:.3}", metrics.ipc), + Self::get_ipc_status_color(metrics.ipc, params.theme), + params, + ), + Self::create_metric_row( + "Cache Miss Rate", + format!("{:.2}%", metrics.cache_miss_rate), + Self::get_miss_rate_status_color(metrics.cache_miss_rate, params.theme), + params, + ), + Self::create_metric_row( + "Branch Miss Rate", + format!("{:.2}%", metrics.branch_miss_rate), + Self::get_miss_rate_status_color(metrics.branch_miss_rate, params.theme), + params, + ), + Self::create_metric_row( + "Frontend Stalls", + format!("{:.2}%", metrics.stalled_frontend_pct), + Self::get_stall_status_color(metrics.stalled_frontend_pct, params.theme), + params, + ), + Self::create_metric_row( + "Backend Stalls", + format!("{:.2}%", metrics.stalled_backend_pct), + Self::get_stall_status_color(metrics.stalled_backend_pct, params.theme), + params, + ), + ]; + + let table = Table::new( + rows, + vec![ + Constraint::Percentage(50), // Metric name + Constraint::Percentage(25), // Value + Constraint::Percentage(25), // Status + ], + ) + .header(header) + .block( + Block::bordered() + .title_top( + Line::from("Derived Metrics") + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(table, area); + Ok(()) + } + + /// Create a row for the derived metrics table + fn create_metric_row( + name: &str, + value: String, + status_color: Color, + params: &PerfStatViewParams, + ) -> Row<'static> { + let status = if status_color == params.theme.positive_value_color() { + "Good" + } else if status_color == params.theme.negative_value_color() { + "Poor" + } else { + "OK" + }; + + Row::new(vec![ + Cell::from(name.to_string()).style(params.theme.text_color()), + Cell::from(value).style(params.theme.text_important_color()), + Cell::from(status.to_string()).style(Style::default().fg(status_color)), + ]) + } + + /// Get status color for IPC (higher is better) + fn get_ipc_status_color(ipc: f64, theme: &AppTheme) -> Color { + if ipc >= 1.0 { + theme.positive_value_color() + } else if ipc >= 0.5 { + theme.text_important_color() + } else { + theme.negative_value_color() + } + } + + /// Get status color for miss rates (lower is better) + fn get_miss_rate_status_color(rate: f64, theme: &AppTheme) -> Color { + if rate < 3.0 { + theme.positive_value_color() + } else if rate < 10.0 { + theme.text_important_color() + } else { + theme.negative_value_color() + } + } + + /// Get status color for stall percentages (lower is better) + fn get_stall_status_color(pct: f64, theme: &AppTheme) -> Color { + if pct < 20.0 { + theme.positive_value_color() + } else if pct < 40.0 { + theme.text_important_color() + } else { + theme.negative_value_color() + } + } + + /// Render per-CPU summary (top CPUs by activity) + fn 
render_per_cpu_summary( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + // Get top 5 CPUs by instructions + let mut cpu_activity: Vec<(usize, u64)> = params + .collector + .per_cpu_counters + .iter() + .map(|(cpu, counters)| (*cpu, counters.instructions_delta)) + .collect(); + cpu_activity.sort_by(|a, b| b.1.cmp(&a.1)); + cpu_activity.truncate(5); + + let header = Row::new(vec![ + Cell::from("CPU").style(params.theme.title_style()), + Cell::from("Instructions").style(params.theme.title_style()), + Cell::from("IPC").style(params.theme.title_style()), + Cell::from("Cache Miss %").style(params.theme.title_style()), + ]); + + let rows: Vec = cpu_activity + .iter() + .filter_map(|(cpu, _)| { + params.collector.per_cpu_counters.get(cpu).map(|counters| { + let metrics = counters.derived_metrics(); + let instructions_str = if params.localize { + counters + .instructions_delta + .to_formatted_string(params.locale) + } else { + counters.instructions_delta.to_string() + }; + + Row::new(vec![ + Cell::from(format!("CPU {}", cpu)).style(params.theme.text_color()), + Cell::from(instructions_str).style(params.theme.text_important_color()), + Cell::from(format!("{:.2}", metrics.ipc)).style(params.theme.text_color()), + Cell::from(format!("{:.2}%", metrics.cache_miss_rate)) + .style(params.theme.text_color()), + ]) + }) + }) + .collect(); + + let table = Table::new( + rows, + vec![ + Constraint::Percentage(25), + Constraint::Percentage(35), + Constraint::Percentage(20), + Constraint::Percentage(20), + ], + ) + .header(header) + .block( + Block::bordered() + .title_top( + Line::from("Top CPUs by Activity") + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(table, area); + Ok(()) + } + + /// Render chart view (placeholder for Phase 4) + fn render_chart_view(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let history = ¶ms.collector.system_history; + + // Check if we have any data + if history.ipc_history.is_empty() { + Self::render_empty_chart(frame, area, "Collecting data...", params.theme); + return Ok(()); + } + + // Split into grid: 2x3 for different metrics + let [top_row, middle_row, bottom_row] = Layout::vertical([ + Constraint::Percentage(33), + Constraint::Percentage(33), + Constraint::Percentage(34), + ]) + .areas(area); + + let [ipc_area, cache_area] = + Layout::horizontal([Constraint::Percentage(50), Constraint::Percentage(50)]) + .areas(top_row); + + let [branch_area, stalls_area] = + Layout::horizontal([Constraint::Percentage(50), Constraint::Percentage(50)]) + .areas(middle_row); + + let [cycles_area, instructions_area] = + Layout::horizontal([Constraint::Percentage(50), Constraint::Percentage(50)]) + .areas(bottom_row); + + // Render individual charts + Self::render_ipc_chart(frame, ipc_area, params)?; + Self::render_cache_miss_chart(frame, cache_area, params)?; + Self::render_branch_miss_chart(frame, branch_area, params)?; + Self::render_stalls_chart(frame, stalls_area, params)?; + Self::render_cycles_chart(frame, cycles_area, params)?; + Self::render_instructions_chart(frame, instructions_area, params)?; + + Ok(()) + } + + /// Render IPC trend chart + fn render_ipc_chart(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let history = ¶ms.collector.system_history.ipc_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "IPC", params.theme); + return Ok(()); + } + + 
// Convert to u64 for Sparkline (scale by 1000 to preserve precision) + let mut data: Vec = history.iter().map(|&v| (v * 1000.0) as u64).collect(); + + // Adjust data to fill available width (accounting for border) + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_ipc_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("IPC (Instructions per Cycle)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.3} | Avg: {:.3} | Min: {:.3} | Max: {:.3}", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render cache miss rate chart + fn render_cache_miss_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.cache_miss_rate_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Cache Miss Rate", params.theme); + return Ok(()); + } + + // Scale percentages by 100 for display + let mut data: Vec = history.iter().map(|&v| (v * 100.0) as u64).collect(); + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_miss_rate_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("Cache Miss Rate (%)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2}% | Avg: {:.2}% | Min: {:.2}% | Max: {:.2}%", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render branch miss rate chart + fn render_branch_miss_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.branch_miss_rate_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Branch Miss Rate", params.theme); + return Ok(()); + } + + let mut data: Vec = history.iter().map(|&v| (v * 100.0) as u64).collect(); + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = 
history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_miss_rate_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("Branch Miss Rate (%)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2}% | Avg: {:.2}% | Min: {:.2}% | Max: {:.2}%", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render frontend/backend stalls chart + fn render_stalls_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.stalled_frontend_pct_history; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Pipeline Stalls", params.theme); + return Ok(()); + } + + let mut data: Vec = history.iter().map(|&v| (v * 100.0) as u64).collect(); + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + data = Self::adjust_data_for_width(data, target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0.0); + let avg = history.iter().sum::() / history.len() as f64; + let min = history.iter().copied().fold(f64::INFINITY, f64::min); + let max = history.iter().copied().fold(f64::NEG_INFINITY, f64::max); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(Self::get_stall_status_color(current, params.theme)) + .block( + Block::bordered() + .title_top( + Line::from("Frontend Stalls (%)") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2}% | Avg: {:.2}% | Min: {:.2}% | Max: {:.2}%", + current, avg, min, max + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render cycles per second chart + fn render_cycles_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.cycles_per_sec; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Cycles/sec", params.theme); + return Ok(()); + } + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + let data = Self::adjust_data_for_width(history.clone(), target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0); + let avg = history.iter().sum::() / history.len() as u64; + let min = history.iter().copied().min().unwrap_or(0); + let max = history.iter().copied().max().unwrap_or(0); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(params.theme.sparkline_style()) + .block( + Block::bordered() + .title_top( + Line::from("Cycles/sec") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( 
+ Line::from(format!( + "Cur: {:.2} | Avg: {:.2} | Min: {:.2} | Max: {:.2} GHz", + current as f64 / 1_000_000_000.0, + avg as f64 / 1_000_000_000.0, + min as f64 / 1_000_000_000.0, + max as f64 / 1_000_000_000.0 + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render instructions per second chart + fn render_instructions_chart( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + ) -> Result<()> { + let history = ¶ms.collector.system_history.instructions_per_sec; + + if history.is_empty() { + Self::render_empty_chart(frame, area, "Instructions/sec", params.theme); + return Ok(()); + } + + // Adjust data to fill available width + let target_width = area.width.saturating_sub(2) as usize; + let data = Self::adjust_data_for_width(history.clone(), target_width); + + let max_val = data.iter().copied().max().unwrap_or(1); + let current = history.last().copied().unwrap_or(0); + let avg = history.iter().sum::() / history.len() as u64; + let min = history.iter().copied().min().unwrap_or(0); + let max = history.iter().copied().max().unwrap_or(0); + + let sparkline = Sparkline::default() + .data(&data) + .max(max_val) + .direction(ratatui::widgets::RenderDirection::RightToLeft) + .style(params.theme.sparkline_style()) + .block( + Block::bordered() + .title_top( + Line::from("Instructions/sec") + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(format!( + "Cur: {:.2} | Avg: {:.2} | Min: {:.2} | Max: {:.2} B/s", + current as f64 / 1_000_000_000.0, + avg as f64 / 1_000_000_000.0, + min as f64 / 1_000_000_000.0, + max as f64 / 1_000_000_000.0 + )) + .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()), + ); + + frame.render_widget(sparkline, area); + Ok(()) + } + + /// Render grid of LLC panels + fn render_llc_grid(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let llc_ids: Vec = params.collector.per_llc_counters.keys().copied().collect(); + + if llc_ids.is_empty() { + Self::render_empty_chart(frame, area, "No LLC data available", params.theme); + return Ok(()); + } + + Self::render_domain_grid(frame, area, params, &llc_ids, "LLC") + } + + /// Render grid of NUMA node panels + fn render_node_grid(frame: &mut Frame, area: Rect, params: &PerfStatViewParams) -> Result<()> { + let node_ids: Vec = params.collector.per_node_counters.keys().copied().collect(); + + if node_ids.is_empty() { + Self::render_empty_chart(frame, area, "No NUMA node data available", params.theme); + return Ok(()); + } + + Self::render_domain_grid(frame, area, params, &node_ids, "Node") + } + + /// Render grid of domain panels (generic for LLC or NUMA) + fn render_domain_grid( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + domain_ids: &[usize], + domain_type: &str, + ) -> Result<()> { + // Header area + let [header_area, grid_area] = + Layout::vertical([Constraint::Length(3), Constraint::Min(1)]).areas(area); + + // Render header + let title = format!( + "Performance Counters by {} - {} domains", + domain_type, + domain_ids.len() + ); + let mode_text = format!( + "Aggregation: {} | 'a' toggle | 'v' view mode ({})", + domain_type, params.view_mode + ); + + let block = Block::bordered() + .title_top( + Line::from(title) + .style(params.theme.title_style()) + .centered(), + ) + .title_bottom( + Line::from(mode_text) 
+ .style(params.theme.text_color()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()); + + frame.render_widget(block, header_area); + + // Calculate grid layout + let num_domains = domain_ids.len(); + let cols = if num_domains >= 4 { + 3 + } else if num_domains >= 2 { + 2 + } else { + 1 + }; + let rows = num_domains.div_ceil(cols); + + // Create grid + let row_constraints: Vec = (0..rows) + .map(|_| Constraint::Ratio(1, rows as u32)) + .collect(); + let row_areas = Layout::vertical(row_constraints).split(grid_area); + + let mut domain_idx = 0; + for &row_area in row_areas.iter() { + let col_constraints: Vec = (0..cols) + .map(|_| Constraint::Ratio(1, cols as u32)) + .collect(); + let col_areas = Layout::horizontal(col_constraints).split(row_area); + + for &col_area in col_areas.iter() { + if domain_idx < num_domains { + let domain_id = domain_ids[domain_idx]; + Self::render_domain_panel(frame, col_area, params, domain_id, domain_type)?; + domain_idx += 1; + } + } + } + + Ok(()) + } + + /// Render a compact panel for one LLC or NUMA node + fn render_domain_panel( + frame: &mut Frame, + area: Rect, + params: &PerfStatViewParams, + domain_id: usize, + domain_type: &str, + ) -> Result<()> { + use crate::PerfStatAggregationLevel; + + // Get counters for this domain + let counters = match params.aggregation { + PerfStatAggregationLevel::Llc => params.collector.per_llc_counters.get(&domain_id), + PerfStatAggregationLevel::Node => params.collector.per_node_counters.get(&domain_id), + PerfStatAggregationLevel::System => unreachable!(), + }; + + if counters.is_none() { + return Ok(()); + } + let counters = counters.unwrap(); + let metrics = counters.derived_metrics(); + + // Format metrics + let lines = vec![ + Line::from(format!("IPC: {:.3}", metrics.ipc)) + .style(Style::default().fg(Self::get_ipc_status_color(metrics.ipc, params.theme))), + Line::from(format!("Cache Miss: {:.2}%", metrics.cache_miss_rate)).style( + Style::default().fg(Self::get_miss_rate_status_color( + metrics.cache_miss_rate, + params.theme, + )), + ), + Line::from(format!("Branch Miss: {:.2}%", metrics.branch_miss_rate)).style( + Style::default().fg(Self::get_miss_rate_status_color( + metrics.branch_miss_rate, + params.theme, + )), + ), + Line::from(format!( + "Frontend Stall: {:.1}%", + metrics.stalled_frontend_pct + )) + .style(Style::default().fg(Self::get_stall_status_color( + metrics.stalled_frontend_pct, + params.theme, + ))), + Line::from(format!( + "Cycles: {:.2} GHz", + counters.cycles_delta as f64 + / (params.tick_rate_ms as f64 / 1000.0) + / 1_000_000_000.0 + )), + ]; + + let block = Block::bordered() + .title_top( + Line::from(format!("{} {}", domain_type, domain_id)) + .style(params.theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(params.theme.border_style()); + + let paragraph = Paragraph::new(lines).block(block); + frame.render_widget(paragraph, area); + + Ok(()) + } + + /// Adjust data vector to match target width (truncate or pad with zeros on left) + fn adjust_data_for_width(mut data: Vec, target_width: usize) -> Vec { + if target_width == 0 { + return vec![0]; + } + + if data.len() > target_width { + // Take the most recent samples (from the right) + data = data.split_off(data.len() - target_width); + } else if data.len() < target_width { + // Pad with zeros on the left + let padding_needed = target_width - data.len(); + let mut padded = vec![0; padding_needed]; + padded.extend(data); + data = padded; + } + + data + } + + /// 
Render empty chart placeholder + fn render_empty_chart(frame: &mut Frame, area: Rect, title: &str, theme: &AppTheme) { + let text = vec![Line::from(""), Line::from("Collecting data...")]; + + let paragraph = Paragraph::new(text).alignment(Alignment::Center).block( + Block::bordered() + .title_top(Line::from(title).style(theme.title_style()).centered()) + .border_type(BorderType::Rounded) + .style(theme.border_style()), + ); + + frame.render_widget(paragraph, area); + } + + /// Render message when collection is inactive + fn render_inactive_message(frame: &mut Frame, area: Rect, theme: &AppTheme) { + let text = vec![ + Line::from(""), + Line::from("Performance counter collection is not active."), + Line::from(""), + Line::from("This view will populate automatically when activated."), + ]; + + let paragraph = Paragraph::new(text).alignment(Alignment::Center).block( + Block::bordered() + .title_top( + Line::from("Perf Stat View") + .style(theme.title_style()) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(theme.border_style()) + .padding(Padding::uniform(2)), + ); + + frame.render_widget(paragraph, area); + } + + /// Render error message when process filtering fails + fn render_process_filter_error(frame: &mut Frame, area: Rect, pid: i32, _theme: &AppTheme) { + let text = vec![ + Line::from(""), + Line::from(format!( + "Failed to collect performance counters for PID {}", + pid + )), + Line::from(""), + Line::from("Possible reasons:"), + Line::from(" • Process has terminated"), + Line::from(" • Insufficient permissions"), + Line::from(" • Hardware doesn't support all counters"), + Line::from(""), + Line::from("Press 'c' to clear filter and return to system-wide view"), + ]; + + let paragraph = Paragraph::new(text) + .alignment(Alignment::Center) + .style(Style::default().fg(Color::Yellow)) + .block( + Block::bordered() + .title_top( + Line::from("Process Filter Error") + .style(Style::default().fg(Color::Red)) + .centered(), + ) + .border_type(BorderType::Rounded) + .style(Style::default().fg(Color::Red)) + .padding(Padding::uniform(2)), + ); + + frame.render_widget(paragraph, area); + } +}