-
Notifications
You must be signed in to change notification settings - Fork 74
Log Parsing #2991
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Log Parsing #2991
Conversation
|
Claude finished @codabrink's task —— View job Code Review - Log Parsing PRReviewing this PR for code quality, bugs, performance, security, and test coverage issues. Todo:
Review Feedback🔴 Critical Issues1. Iterator consumption bug in The code calls let mut line_inner = line.into_inner();
let Some(event) = line_inner.find(|e| matches!(e.as_rule(), Rule::event)) else { ... };
let Some(object) = line_inner.find(|p| matches!(p.as_rule(), Rule::object)) else { ... };Fix: Collect the iterator first or use let line_inner: Vec<_> = line.into_inner().collect();
let Some(event) = line_inner.iter().find(...) else { ... };
let Some(object) = line_inner.into_iter().find(...) else { ... };2. The code uses let (_, inbox) = context
.extract_if(.., |(k, _)| k == "inst") // ⚠️ unstable
.collect::<Vec<_>>()
.pop()Fix: Use stable APIs like 3. Type mismatch in tests - In let group_id = event.context.get("group_id"); // ❌ Vec doesn't have get(&str)🟡 Moderate Issues4. Missing In 5. Recursive parsing causes stack growth in The "intermediate lines" logic recursively calls while Self::from(lines).is_err() {
// consume line
}Each recursive call creates a full parser attempt with error context. For logs with many consecutive unparseable lines, this could cause excessive stack growth. Fix: Check for parse success more cheaply (e.g., peek and try basic pattern match) rather than full recursive parsing. 🟢 Minor Issues6. Test code coverage concern Codecov reports 62.5% patch coverage with 72 lines missing coverage in 7. Potential panic in Uses Summary: This is a good start on structured log parsing, but there are several compilation-blocking issues ( Review completed: 2026-01-22 22:13 UTC |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #2991 +/- ##
==========================================
- Coverage 74.38% 74.33% -0.05%
==========================================
Files 413 415 +2
Lines 53249 53435 +186
==========================================
+ Hits 39610 39723 +113
- Misses 13639 13712 +73 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
crates/xmtp_macro/src/logging.rs
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The generated inbox truncation assumes #inbox is &str, which will fail for String or other types. Consider using (#inbox).as_ref() to accept any AsRef<str>.
No longer relevant as of a9c0728
log_parser/src/state.rs
Outdated
| fn as_int(&self) -> Result<i64> { | ||
| match self { | ||
| Self::Int(int) => Ok(*int), | ||
| _ => bail!("{self:?} is not string"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as_int() returns the wrong error text. It says "is not string" but should say "is not int". Consider updating the message to match the type so failures are clearer.
No longer relevant as of a9c0728
| let ctx = |key: &str| { | ||
| event | ||
| .context | ||
| .get(key) | ||
| .with_context(|| format!("Missing context field {key}")) | ||
| }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
event.context is a Vec<(String, Value)>, not a HashMap, so .get(key) won't compile. Consider using event.context.iter().find(|(k, _)| k == key).map(|(_, v)| v) instead.
| let ctx = |key: &str| { | |
| event | |
| .context | |
| .get(key) | |
| .with_context(|| format!("Missing context field {key}")) | |
| }; | |
| let ctx = |key: &str| { | |
| event | |
| .context | |
| .iter() | |
| .find(|(k, _)| k == key) | |
| .map(|(_, v)| v) | |
| .with_context(|| format!("Missing context field {key}")) | |
| }; |
🚀 Want me to fix this? Reply ex: "fix it for me".
| .groups | ||
| .entry(group_id.to_string()) | ||
| .or_insert_with(|| GroupState::new()) | ||
| .update(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GroupState has no update() method defined. Consider removing this call or implementing the method.
| .update(); | |
| ; |
🚀 Want me to fix this? Reply ex: "fix it for me".
|
|
||
| // Collect up the intermediate lines that don't parse. | ||
| let mut intermediate = String::new(); | ||
| while Self::from(lines).is_err() { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Parsing via recursive Self::from(lines) advances the iterator during probes, causing skipped lines and dropped successful parses. Consider peeking without consuming or refactoring so look-aheads don’t advance and successful events are preserved.
🚀 Want me to fix this? Reply ex: "fix it for me".
| } | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
build is missing a return statement. Consider adding state at the end of the function.
| } | |
| } | |
| } | |
| state | |
| } |
🚀 Want me to fix this? Reply ex: "fix it for me".
| pub fn inbox(&self) -> &str { | ||
| &self.installation | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This method is named inbox() but returns installation. Consider renaming to installation() to match the field and data source ("inst" key).
| pub fn inbox(&self) -> &str { | |
| &self.installation | |
| } | |
| pub fn installation(&self) -> &str { | |
| &self.installation | |
| } |
🚀 Want me to fix this? Reply ex: "fix it for me".
| open_log(handle, &log_file); | ||
| } | ||
|
|
||
| pub fn open_log(handle: Weak<AppWindow>, log_file: &str) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
log_file is declared as &str but used as a struct with a .streams field on line 115. Consider parsing the string into a log structure before accessing .streams, or change the parameter type to accept an already-parsed struct.
🚀 Want me to fix this? Reply ex: "fix it for me".
Introduce
log_parsercrate and updatexmtp_macro::build_logging_metadatato emit➣-prefixed logs withtimestampandEventdiscriminantsAdd a new
log_parsercrate with apestgrammar for log lines, implementstate::LogEvent::fromto map lines toEventand typed context, and addevent: EventtoEventMetadata. Update macro output to include➣andtimestampin log objects.📍Where to Start
Start with
LogEvent::fromin log_parser/src/state.rs and the grammar in log_parser/src/parser/defs/log.pest.📊 Macroscope summarized b545658. 17 files reviewed, 17 issues evaluated, 10 issues filtered, 6 comments posted
🗂️ Filtered Issues
crates/xmtp_common/src/fmt.rs — 0 comments posted, 1 evaluated, 1 filtered
short_hexfunction now returns the FIRSTSHORT_LENbytes instead of the LASTSHORT_LENbytes as the old trait implementation did. The oldShortHeximpl for&[u8]used&self[self.len().saturating_sub(SHORT_LEN)..](taking from the end), while the new function uses&bytes[..(SHORT_LEN.min(bytes.len()))](taking from the beginning). Any existing callers that depended on getting the trailing bytes of an identifier or hash will now receive different (incorrect) results, which could cause debugging issues or incorrect display output. [ Already posted ]crates/xmtp_macro/src/logging.rs — 0 comments posted, 1 evaluated, 1 filtered
installation_id_fieldfrom the parsedinstallation_idexpression has been removed. Theinstallation_idis still parsed and stored inLogEventInput::installation_id(line 44), but no correspondingFieldis created and pushed tofields. This means theinstallation_idpassed to thelog_event!macro (e.g.,self.context.installation_id()in the consumer code) will be parsed but never included in the log output, causing logs to silently lose theinstallation_idfield. [ Already posted ]crates/xmtp_mls/src/tasks.rs — 0 comments posted, 1 evaluated, 1 filtered
expectedandgotvariable assignments are swapped in the hash mismatch warning message.expectedis assigned the computed hash (data_hash) whilegotis assigned the stored hash (task.data_hash). This is backwards from the intuitive meaning whereexpectedshould represent the stored/reference value andgotshould represent the computed/received value. The error message will mislead developers during debugging. [ Low confidence ]log_parser/src/state/event.rs — 2 comments posted, 5 evaluated, 2 filtered
inbox.as_str()?uses error propagation to return early, but this happens after the line has already been consumed from the iterator at line 55. If this fails, the caller's iterator will be left in an inconsistent state - the line was consumed but noLogEventwas returned, and the caller cannot recover that line. [ Low confidence ]Self::from(lines)fails after line 55 has executed (e.g., at line 97 due to missinginstfield), the line has already been consumed bylines.next()on line 55. However, the while body then callslines.next()again on line 103, consuming the NEXT line instead of the line that failed to parse. This causes lines to be incorrectly attributed tointermediateor skipped entirely. [ Already posted ]log_parser/src/state/mod.rs — 3 comments posted, 7 evaluated, 4 filtered
client.groupsisHashMap<String, Arc<RwLock<GroupState>>>butor_insert_with(|| GroupState::new())inserts a plainGroupStateinstead ofArc<RwLock<GroupState>>. This should beor_insert_with(|| Arc::new(RwLock::new(GroupState::new()))). [ Low confidence ]ingestmethod pushes events toself.streambut never populates thegroupsHashMap. SinceLogEventhas agroup_id()method andClientStatehas agroups: HashMap<String, Arc<RwLock<GroupState>>>field, this may be an incomplete implementation where group-related events should be tracked in thegroupsmap but currently are not. [ Low confidence ]updatemethod creates unbounded memory growth by storing a reference to the previous state inprevon every call. Each invocation ofupdate()creates a newGroupStatewithprev: Some(self.clone()), forming an ever-growing linked list of historical states. Since each new state holds anArcreference to all previous states, memory is never reclaimed, potentially leading to a memory leak over time ifupdate()is called frequently. [ Low confidence ]updatemethod usesunwrap()onself.read()(line 116) andself.write()(line 119), which will panic if theRwLockis poisoned (i.e., if a thread panicked while holding the lock). Depending on the concurrency model of the application, this could cause cascading panics. [ Low confidence ]log_parser/src/state/value.rs — 0 comments posted, 1 evaluated, 1 filtered
fmt_indentedforValue::Array, the nested elements are formatted usingv.to_string()which callsfmt_indented(f, 0)with indent=0. This means nested objects inside arrays will not respect the parent indentation level, producing incorrectly indented output when arrays contain nested objects. [ Already posted ]