Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

new feature: Provide more ways to format the error in LoggingLayer #4918

Closed
1 task done
evenyag opened this issue Jul 23, 2024 · 9 comments · Fixed by #4961
Closed
1 task done

new feature: Provide more ways to format the error in LoggingLayer #4918

evenyag opened this issue Jul 23, 2024 · 9 comments · Fixed by #4961
Assignees
Labels
enhancement New feature or request

Comments

@evenyag
Copy link
Contributor

evenyag commented Jul 23, 2024

Feature Description

The logging layer provides a flag backtrace_output to control how the layer formats the error, mainly for displaying the backtrace.

#[inline]
fn error_print(&self, err: &Error) -> String {
// Don't print backtrace if it's not unexpected error.
if err.kind() != ErrorKind::Unexpected {
return format!("{err}");
}
if self.backtrace_output {
format!("{err:?}")
} else {
format!("{err}")
}
}

By default, the layer formats the error using the Display trait. Even if we enable the backtrace_output flag, the layer may still use Display if the error kind isn't unexpected.

Sometimes, printing the Error in debug format is helpful because the display format may lack some information for debugging. The layer could add a flag to control how to format the error. e.g.

  • Adds a new flag debug_format_error
  • Adds a callback to format_error() and use that callback to format the error (This might conflict with backtrace_output)

Problem and Solution

The main issue is that reqwest no longer prints the source error in its Display implementation. It only prints the source in Debug. See: seanmonstar/reqwest#2199

So we can't get the source error while opendal's HTTP client returns an error opendal's error also uses Display to format the source error.

Here is an error log we captured:

ERROR log: service=s3 operation=read path=data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet -> Unexpected (persistent) at  => 
Unexpected (temporary) at read, context: { url: https://s3.amazonaws.com/my-bucket/data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet, called: http_util::Client::send, service: s3, path: data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet, range: 1030-2412 } => 
send http request, source: error sending request for url (https://s3.us-west-2.amazonaws.com/my-bucket/data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet) 

The root cause of the error while sending the request is missing. It's quite painful to further diagnose the problem.

Alternative Solution:

  • wraps the reqwest error into a new type that displays the source error
    • It is more straightforward
    • The pitfall is that we must remember to wrap errors by that type
  • attaches the source to the context

Additional Context

No response

Are you willing to contribute to the development of this feature?

  • Yes, I am willing to contribute to the development of this feature.
@evenyag evenyag added the enhancement New feature or request label Jul 23, 2024
@Xuanwo
Copy link
Member

Xuanwo commented Jul 23, 2024

Maybe we can introduce something like RetryInterceptor?

pub trait RetryInterceptor: Send + Sync + 'static {
/// Everytime RetryLayer is retrying, this function will be called.
///
/// # Timing
///
/// just before the retry sleep.
///
/// # Inputs
///
/// - err: The error that caused the current retry.
/// - dur: The duration that will sleep before next retry.
///
/// # Notes
///
/// The intercept must be quick and non-blocking. No heavy IO is
/// allowed. Otherwise the retry will be blocked.
fn intercept(&self, err: &Error, dur: Duration);
}
/// The DefaultRetryInterceptor will log the retry error in warning level.
pub struct DefaultRetryInterceptor;
impl RetryInterceptor for DefaultRetryInterceptor {
fn intercept(&self, err: &Error, dur: Duration) {
warn!(
target: "opendal::layers::retry",
"will retry after {}s because: {}",
dur.as_secs_f64(), err)
}
}

Users can control their own way for errors, or maybe further, the whole log event.

@evenyag
Copy link
Contributor Author

evenyag commented Jul 23, 2024

Makes sense to me.

@evenyag evenyag closed this as completed Jul 24, 2024
@Xuanwo Xuanwo reopened this Jul 24, 2024
@Xuanwo
Copy link
Member

Xuanwo commented Jul 24, 2024

Let's make it open to track the progress of this feature. @evenyag do you have interest to implement this?

@evenyag
Copy link
Contributor Author

evenyag commented Jul 24, 2024

@Xuanwo Oh, sorry. I found people can do this via the RetryInterceptor so I closed the issue. But I forgot that we might not always set a RetryLayer.......

do you have interest to implement this?

Sure.

I'm not sure what the "interceptor" should be.

  • Should it be a part of the LoggingLayer?
  • Is it an ErrorInterceptor or an EventInterceptor?
    • The logging layer logs in many places so we may need to define many events. Maybe we can only provide the ErrorInterceptor
  • If we log the error in the interceptor, how to avoid the logging layer log it again?

@Xuanwo
Copy link
Member

Xuanwo commented Jul 24, 2024

Should it be a part of the LoggingLayer?

Yes, my current idea is make it a part of LoggingLayer.

Is it an ErrorInterceptor or an EventInterceptor?

I wish to have an LoggingInterceptor like:

trait LoggingInterceptor {
  fn log(&self, scheme: Scheme, operation: &'static str, path: &str, message: &str, err: Option<&Error>);
}

The default behaivor will be like:

struct DefaultLoggingInterceptor;

impl LoggingInterceptor for DefaultLoggingInterceptor {
    fn log(&self, scheme: Scheme, operation: &'static str, path: &str, message: &str, err: Option<&Error>) {
        // We can decide level based on `operation` and `err`.
        let lvl = xxx;
        if err.is_none {
          log!(
              target: LOGGING_TARGET,
              lvl,
              "service={scheme} operation={operation} path={path} -> {message}",
          )
        } else {
          // print with error
        } 
   }
}

Some other change may also needed to make it work. We might can remove with_error_level, with_failure_level and backtrace_output.


What do you think? I believe this design will make this layer more useful to play with other logging systems.

@evenyag
Copy link
Contributor Author

evenyag commented Jul 24, 2024

Can we define a struct Record as the argument? Similar to https://docs.rs/log/latest/log/struct.Record.html

pub struct Record<'a> {
    scheme: Scheme,
    operation: &'static str,
    path: &'a str,
    message: &'a str,
    err: Option<&'a Error>,
}

impl<'a> Record<'a> {
    fn scheme(&self) -> Scheme {}
    // ...
}

What's more, should we pass the enum Operation instead of a &'static str for operation?

@Xuanwo
Copy link
Member

Xuanwo commented Jul 24, 2024

Can we define a struct Record as the argument?

Add a new struct means we need to expose another new API along with many getter/setter API at the layer level which I want to avoid.

What's more, should we pass the enum Operation instead of a &'static str for operation?

We have ReadOperation and WriteOperation which can't be represent by Operation directly. However, maybe it's meaningful for us to merge ReadOperation into Operation directly. We can discuss this in another issue.

@evenyag
Copy link
Contributor Author

evenyag commented Jul 25, 2024

Add a new struct means we need to expose another new API along with many getter/setter API at the layer level which I want to avoid.

Got it. I'll implement a prototype first. If I encounter any problems, I'll come back here.

@evenyag
Copy link
Contributor Author

evenyag commented Aug 5, 2024

I'll implement a prototype first.

#4961 It should be ready for review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants