Hi all.
We're using NEL with the Reporting API across our two main websites, www.bbc.co.uk and www.bbc.com (plus their apexes), as well as most of our asset domains. @chrisn and I have been working on some feedback based on our experiences; we hope it's useful and constructive.
Whilst NEL is clearly powerful and extremely useful, the biggest issue we have with it is that we don’t really know how to make use of the data it generates. From conversations within the BBC, and some with people outside it, this seems to be a common sentiment.
Ideally, we’d be able to slot NEL events into one of (probably) two buckets:
- Unrecoverable, critical events which we’d alert on
- Recoverable/informational events which would be available for improvement and triage work
In my opinion, the reason we can't do this today is that NEL events carry no discriminator stating whether they’re:
- Recovered-from (or not)
- error or info severity
Recovered-from (or not)
By way of illustration, consider the dns event class and, for example, dns.unreachable, which is described as “DNS server is unreachable”. Typically, multiple DNS nameservers are listed in NS records, so does this event mean that, say, one of four nameservers was unreachable, or that all four were? If the former, that’s interesting but almost certainly didn’t impact the user much: a website operator would want the information available, but it’s not something to jump on and fix immediately. The latter would be much more serious and should fire an alert so it can be fixed ASAP.
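To make the distinction concrete, here is a minimal sketch that assumes, hypothetically, that the per-nameserver outcome of a lookup were visible to whatever generates the report. Browsers don't expose this today, and `dnsOutcome`, `reachable` and the example.net nameserver names are illustrative only. The point is that a single dns.unreachable type currently covers both of the very different outcomes below.

```javascript
// Hypothetical model: one entry per nameserver tried during a lookup.
// NEL reports do not carry this; they only report a single type such as
// "dns.unreachable". `dnsOutcome` and `reachable` are illustrative names.
function dnsOutcome(attempts) {
  const failed = attempts.filter((a) => !a.reachable).length;
  if (attempts.length === 0 || failed === attempts.length) {
    // No nameserver answered: the user could not resolve the name.
    return 'unrecoverable';
  }
  if (failed > 0) {
    // Some nameservers failed but another answered: the lookup recovered.
    return 'recovered';
  }
  return 'ok';
}

// One of four nameservers down vs. all four down.
const oneDown = [
  { ns: 'ns1.example.net', reachable: false },
  { ns: 'ns2.example.net', reachable: true },
  { ns: 'ns3.example.net', reachable: true },
  { ns: 'ns4.example.net', reachable: true },
];
const allDown = oneDown.map((a) => ({ ...a, reachable: false }));

console.log(dnsOutcome(oneDown)); // recovered
console.log(dnsOutcome(allDown)); // unrecoverable
```

Only the second case is one an operator would want to be paged for; the first belongs in the triage bucket.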
The same sort of issue applies to most of the event types in the dns and tcp classes, and to some in the http class.
error or info severity
As described above, the severity of some NEL events depends on whether or not they’re recoverable, while others are either always high-severity (e.g. tls events) or always low-severity (e.g. unknown and abandoned, since they’re non-deterministic).
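The static part of that split could be expressed as a lookup on the prefix of the report's type. This is a minimal sketch: the function and set names are hypothetical, and it treats every http type as recovery-dependent, which simplifies "some of the http" event types. NEL report types are dotted strings such as dns.unreachable or tls.cert.date_invalid, while abandoned, unknown and the success type ok have no prefix.

```javascript
// Hypothetical names throughout; this is not part of the NEL spec.
const ALWAYS_HIGH = new Set(['tls']);
// Non-deterministic failures, plus successful requests ("ok").
const ALWAYS_LOW = new Set(['unknown', 'abandoned', 'ok']);

// "dns.unreachable" -> "dns"; "abandoned" -> "abandoned".
function eventClass(type) {
  return type.split('.')[0];
}

// Severity implied by the type alone. 'depends' means it hinges on
// whether the failure was recovered from, which reports don't say.
function baselineSeverity(type) {
  const cls = eventClass(type);
  if (ALWAYS_HIGH.has(cls)) return 'error';
  if (ALWAYS_LOW.has(cls)) return 'info';
  return 'depends'; // dns, tcp and http types
}

console.log(baselineSeverity('tls.cert.date_invalid')); // error
console.log(baselineSeverity('abandoned')); // info
console.log(baselineSeverity('dns.unreachable')); // depends
```

The 'depends' bucket is exactly where a collector is stuck today, because nothing in the report resolves it.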
Both of these issues could be addressed by adding a severity property to NEL events, derived from whether the event was recovered from and from the event type itself, as follows:
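// Proposed logic: isRecoveredFrom and class.alwaysHighSeverity are
// hypothetical properties that NEL events do not currently carry.
const severity =
  event.isRecoveredFrom === false || event.class.alwaysHighSeverity === true
    ? 'error'
    : 'info';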
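A runnable version of the proposal, extended to route reports into the two buckets described at the start of this issue, might look like the following. It's a sketch only: recovered is a hypothetical field that NEL report bodies do not carry today, the 'alert' and 'triage' bucket names are made up, and treating a missing recovered field as "not known to have failed" mirrors the pseudocode, where only an explicit false counts as unrecovered.

```javascript
// `recovered` is a HYPOTHETICAL field; current NEL report bodies lack it.
const HIGH_SEVERITY_CLASSES = new Set(['tls']);

function proposedSeverity(body) {
  const cls = body.type.split('.')[0];
  // Only an explicit `recovered: false` counts as unrecovered.
  if (body.recovered === false || HIGH_SEVERITY_CLASSES.has(cls)) {
    return 'error';
  }
  return 'info';
}

// Hypothetical routing: errors alert, everything else goes to triage.
function bucketFor(report) {
  return proposedSeverity(report.body) === 'error' ? 'alert' : 'triage';
}

const reports = [
  { type: 'network-error', body: { type: 'dns.unreachable', recovered: true } },
  { type: 'network-error', body: { type: 'dns.unreachable', recovered: false } },
  { type: 'network-error', body: { type: 'tls.cert.date_invalid' } },
  { type: 'network-error', body: { type: 'abandoned' } },
];

for (const r of reports) {
  console.log(r.body.type, bucketFor(r));
}
// dns.unreachable triage
// dns.unreachable alert
// tls.cert.date_invalid alert
// abandoned triage
```

With such a field in place, the same dns.unreachable type lands in different buckets depending on whether the user was actually affected.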