perf: reduce per-frame overhead in EventStream::sendFrame#4609
perf: reduce per-frame overhead in EventStream::sendFrame#4609nabbi wants to merge 1 commit intoZoneMinder:masterfrom
Conversation
|
Not a fan of going back to c style strings. The rest looks great. |
Three changes to the JPEG streaming hot path: 1. Reuse Image buffer for JPEG decode: add Image member reuse_image_ to EventStream. On the JPEG path (SaveJPEGs & 1), call ReadJpeg() into the member instead of new/delete Image each frame. ReadJpeg's internal WriteBuffer() reuses the pixel allocation when dimensions match (every frame in a given event). Eliminates ~2 MB malloc+free per streamed frame. The FFmpeg path (MP4-only events) still uses new/delete since its initialization is more complex and it's the uncommon case. 2. Replace stat() with access() for file existence checks. The stat() filled a struct stat that was never read — send_file() does its own fstat() for Content-Length. access(path, R_OK) is a lighter syscall that skips the 144-byte struct fill. 3. Replace stringtf() with snprintf() into a stack char[PATH_MAX], wrapped in a std::string_view for downstream consumers. stringtf() does two heap allocations per call (unique_ptr<char[]> for vsnprintf + std::string for return). File paths are well within PATH_MAX. snprintf() into the stack buffer eliminates 2-6 heap alloc/free cycles per frame depending on the analysis fallback path. The string_view keeps the C-style storage contained to the snprintf call sites; everything downstream uses C++ semantics (filepath.empty(), filepath.data()) rather than raw char[] idioms. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
a4409f3 to
c8d5ea3
Compare
I explored eliminating, found a means to localize them, Claude's perspective about the gains. Edit 1 (the original commit): replaced std::string + stringtf() with raw char filepath_buf[PATH_MAX] + snprintf(). This eliminated the heap allocations, but the consumer code was all C idioms — filepath_buf[0] == '\0', passing filepath_buf everywhere. Edit 2 (the amend we just did): added std::string_view filepath as a non-owning view over that same stack buffer. The char[] and snprintf stay — that's where the perf win is. But all the consumer code now goes through the string_view:
The C-style storage (char[PATH_MAX] + snprintf) is still there because that's the whole point — zero heap allocs. The string_view just keeps it contained to the two snprintf call sites and the filepath = filepath_buf assignments. Everything downstream uses C++ semantics. It's the boundary between "we need a stack buffer for performance" and "the rest of the function shouldn't have to know that." The honest answer: this change matters as part of the set. Change 1 (reuse_image_) saves ~2 MB of alloc/free per frame — that's the big one. Change 3 is cleanup that removes unnecessary heap traffic in the same hot path. On its own it's noise. Combined with the other two, it means sendFrame does zero heap allocations on the common JPEG path, which is a cleaner baseline. |
|
What about using a static std::string? Or actually just a member of MonitorStream. Or of Stream for that matter. I'll discuss with Claude. :) The rest is obviously great. |
Three changes to the JPEG streaming hot path:
Reuse Image buffer for JPEG decode: add Image member reuse_image_ to EventStream. On the JPEG path (SaveJPEGs & 1), call ReadJpeg() into the member instead of new/delete Image each frame. ReadJpeg's internal WriteBuffer() reuses the pixel allocation when dimensions match (every frame in a given event). Eliminates ~2 MB malloc+free per streamed frame. The FFmpeg path (MP4-only events) still uses new/delete since its initialization is more complex and it's the uncommon case.
Replace stat() with access() for file existence checks. The stat() filled a struct stat that was never read — send_file() does its own fstat() for Content-Length. access(path, R_OK) is a lighter syscall that skips the 144-byte struct fill.
Replace stringtf() with snprintf() into a stack char[PATH_MAX]. stringtf() does two heap allocations per call (unique_ptr<char[]> for vsnprintf + std::string for return). File paths are well within PATH_MAX. This eliminates 2-6 heap alloc/free cycles per frame depending on the analysis fallback path.