Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add xsyslog_ev function and related #5186

Open
wants to merge 12 commits into
base: master
Choose a base branch
from

Conversation

mjdominus-cyrus
Copy link

Per https://linear.app/fastmail/issue/CYR-1407/produce-new-xsyslog-ev-function

Also:

  • Adds a new test suite, util.testc
  • Tests pass and valgrind is clean
  • Includes some unrelated minor documentation fixes and other corrections

@mjdominus-cyrus mjdominus-cyrus force-pushed the cyr-1407-xsyslog_ev-1223 branch from c471a74 to e5a4647 Compare December 23, 2024 21:10
@rsto rsto self-requested a review December 24, 2024 06:04
Copy link
Member

@rsto rsto left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like improving the log format! It's also great we got proper unit tests for this.

Apart from the few items I found in the code, I wonder:
Is there a canonical specification for logfmt somewhere? I see our tests don't assert how empty strings in the key or value are handled, would that be the empty string or ""?

I'm a bit wary of the magic printf formatter guessing, but it looks well thought out so let's see how it turns out in practice.

One other thing that stands out to me are the commit message: in Cyrus, we typically prefix the commit message with the filename that mostly got changed in this commit (e.g. util.c). Also, our commit messages almost always start with a verb.

lib/util.c Outdated Show resolved Hide resolved
lib/util.c Outdated Show resolved Hide resolved
@rjbs
Copy link
Collaborator

rjbs commented Dec 24, 2024

@rsto logfmt is, in my experience, insufficiently documented in the wild. The most cited source is https://brandur.org/logfmt which I find useful but … well, it wouldn't make a good I-D.

I thought we'd improved the clarity of this in the Fastmail internal spec, but I'm wrong. We just use Perl's Log::Fmt, which I wrote. In that, the value must have length, so x="" and not x=. (It serializes Perl's undef value as ~missing~.)

mjdominus and others added 12 commits December 24, 2024 15:05
Also start test suite for util.c functions
Also, more comments

The history of this commit and the previous is complicated:

* MJD wrote an initial draft commit
* Ken revised it to use struct buf instead of C strings
* MJD came back and restored his original comments and replaced
  the printf hackery with smaller printf hackery

NB: The diff for this commit is a mess ---  *unless* you use Git's --patience option.
@mjdominus-cyrus
Copy link
Author

mjdominus-cyrus commented Dec 24, 2024

The new version has exactly the same changes as before, except:

  • I redid all the commit messages
  • There are three new commits on the end to address the case -1, the goto, and the escaped version of the empty string

va_end(ap);

if (!s) {
syslog(priority, "COULD NOT FORMAT SYSLOG STRING FOR EVENT '%s'", event_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Priority might be a LOG_INFO, we probably want to elevate this to LOG_ERR

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- here, the function is reporting its own errors, so it shouldn't be using the caller's error level. I might even go so far as LOG_CRITICAL (or whatever it's called), since I think this is a programmer error, not a runtime error.

default:
// This seems extreme at first but the alternative is to leave ap pointing to the wrong argument,
// and that is already undefined behavior, likely a segfault, so this is really the best we can do
abort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think instead you want fatal("some descriptive error message", EX_TEMPFAIL) which each program that runs cyrus code defines to do the right thing with the message, and then abort or shut down cleanly based on config and other things.

Alternatively, syslog some useful message before dying?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fatal() is the thing to do here.

*
* xsyslog_ev_fmt("earthquake",
* "location", "%s", "Dubuque",
* "richter", "%5.1f", 7.9)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't document the final necessary NULL, we should in all of the examples, etc...

* char *s = xsyslog_ev_fmt(event_name,
* key1, fmt1, val1,
* key2, fmt2, val2,
* ...);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe ..., NULL);

*
* WARNING:
* the function only supports a subset of sprintf-style escapes.
* in particular, each format string MUST contain EXACTLY ONE %-escape.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This says MUST but that restriction isn't enforced. It might be a good idea to fatal() if the formats have more than one %... specification

Copy link
Contributor

@elliefm elliefm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My review is mostly a handful of style nits and advicey comments.

Bigger picture, I don't like the design; but given the design, this seems like a solid implementation of it. If we're considering this a proof of concept, I'd say it proves I don't like the design. :)

free(result);
}

// tests for weird __sep__ feature
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: these could/should be separate test functions, we're not constrained to one-test-function-per-api-function

Comment on lines +18 to +21
result = xsyslog_ev_fmt(
"event-name",
"firstname", "%s", "Fred",
NULL);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: it would be more typical in this codebase to format these calls something like:

Suggested change
result = xsyslog_ev_fmt(
"event-name",
"firstname", "%s", "Fred",
NULL);
result = xsyslog_ev_fmt("event-name",
"firstname", "%s", "Fred",
NULL);

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Further into the review, I saw you formatted this sort of thing elsewhere in the way I suggest here, but I forgot to come back and update this comment.

Comment on lines +87 to +99
const char *test[] = {
/* input, expected output */
"foo", "foo",
"foo bar", "\"foo bar\"",
"call me \"fred\"", "\"call me \\\"fred\\\"\"",
"\n", "\"\\n\"",
"\r", "\"\\r\"",
"\r\n", "\"\\r\\n\"",
"-12.345e67", "-12.345e67",
"", "\"\"",
NULL,
};

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like to do this sort of thing with a const array of anonymous structs, like:

const struct {
    const char *input;
    const char *expected;
} tests[] = {
    { "foo",        "foo" },
    { "foo bar",    "\"foo bar\"" },
    // etc
    // n.b. does not need a NULL element at the end,
    // because n_tests tells us how many there are
};
const size_t n_tests = sizeof(tests) / sizeof(tests[0]);
unsigned i;

for (i = 0; i < n_tests; i++) {
    // now we can refer to tests[i].input and tests[i].expected
    // instead of fiddling with indexes
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see little benefit either way.

va_end(ap);

if (!s) {
syslog(priority, "COULD NOT FORMAT SYSLOG STRING FOR EVENT '%s'", event_name);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah -- here, the function is reporting its own errors, so it shouldn't be using the caller's error level. I might even go so far as LOG_CRITICAL (or whatever it's called), since I think this is a programmer error, not a runtime error.

Comment on lines +2314 to +2317
// special case: __sep__ means reconfigure the separator
if (!strcmp(key, "__sep__")) {
sep = va_arg(ap, const char *);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this actually useful? Surely if we're going to all this effort it's because we want our logs to be in a consistent format

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably not. I only threw it in as a possibly interesting suggestion.

YAGNI says to eliminate it and its associated tests.

default:
// This seems extreme at first but the alternative is to leave ap pointing to the wrong argument,
// and that is already undefined behavior, likely a segfault, so this is really the best we can do
abort();
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, fatal() is the thing to do here.

}
}

/* Quick-and-dirty parser for SINGLE, COMMON printf-type escape strings
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If we have to have our own parser for printf escape sequences, have we not lost all value we were going to gain by using them in the first place?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah. I thought @rjbs had suggested that rather than using a printf-style format string, we just use an enum like [CHAR, STRING, INT, FLOAT] and then convert those into format strings

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rik and I discussed that. i think I persuaded him that this is better, because:

  1. You can write "%5.2f" and "%-20s" and other possibly useful formats
  2. Why use and maintain a set of enum constants that mean "%s" and "%d" when you can just use self-documenting "%s" and "%d"

* p if the expected argument is a (void *)
* -1 if it's something else
*/
static char xsyslog_ev_guess_printf_escape(const char *s) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably return int rather than char. It's fine for it to be logically a single character still, but it's usual in C to use the int type for single characters, for reasons I don't quite remember, but which might be related to the implicit type promotion rules and/or the fact that the signedness of char is platform-dependent.

See for example the int c argument to char *strchr(const char *s, int c);

@rsto
Copy link
Member

rsto commented Dec 27, 2024

We just use Perl's Log::Fmt, which I wrote. In that, the value must have length, so x="" and not x=. (It serializes Perl's undef value as ~missing~.)

Requiring a non-zero length value sounds sane to me, we should do the same in the Cyrus implementation.

@mjdominus-cyrus
Copy link
Author

Today is my last day in this job. Rather than make further revisions to this code, I will leave it to you folks to decide what to do with it, if anything.

@rsto rsto removed their request for review January 8, 2025 07:42
@wolfsage wolfsage self-assigned this Jan 9, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants