new feature: Provide more ways to format the error in `LoggingLayer` #4918

evenyag · 2024-07-23T06:54:30Z

Feature Description

The logging layer provides a flag backtrace_output to control how the layer formats the error, mainly for displaying the backtrace.

opendal/core/src/layers/logging.rs

Lines 188 to 200 in 5e735cc

    
           #[inline] 
        
           fn error_print(&self, err: &Error) -> String { 
        
               // Don't print backtrace if it's not unexpected error. 
        
               if err.kind() != ErrorKind::Unexpected { 
        
                   return format!("{err}"); 
        
               } 
        
               if self.backtrace_output { 
        
                   format!("{err:?}") 
        
               } else { 
        
                   format!("{err}") 
        
               } 
        
           }

By default, the layer formats the error using the Display trait. Even if we enable the backtrace_output flag, the layer may still use Display if the error kind isn't unexpected.

Sometimes, printing the Error in debug format is helpful because the display format may lack some information for debugging. The layer could add a flag to control how to format the error. e.g.

Adds a new flag debug_format_error
Adds a callback to format_error() and use that callback to format the error (This might conflict with backtrace_output)

Problem and Solution

The main issue is that reqwest no longer prints the source error in its Display implementation. It only prints the source in Debug. See: seanmonstar/reqwest#2199

So we can't get the source error while opendal's HTTP client returns an error opendal's error also uses Display to format the source error.

Here is an error log we captured:

ERROR log: service=s3 operation=read path=data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet -> Unexpected (persistent) at  => 
Unexpected (temporary) at read, context: { url: https://s3.amazonaws.com/my-bucket/data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet, called: http_util::Client::send, service: s3, path: data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet, range: 1030-2412 } => 
send http request, source: error sending request for url (https://s3.us-west-2.amazonaws.com/my-bucket/data/bdf28185-5eaa-4a52-a71a-795686b1d039.parquet)

The root cause of the error while sending the request is missing. It's quite painful to further diagnose the problem.

Alternative Solution:

wraps the reqwest error into a new type that displays the source error
- It is more straightforward
- The pitfall is that we must remember to wrap errors by that type
attaches the source to the context

Additional Context

No response

Are you willing to contribute to the development of this feature?

Yes, I am willing to contribute to the development of this feature.

The text was updated successfully, but these errors were encountered:

Xuanwo · 2024-07-23T07:27:29Z

Maybe we can introduce something like RetryInterceptor?

opendal/core/src/layers/retry.rs

Lines 236 to 265 in 5e735cc

    
           pub trait RetryInterceptor: Send + Sync + 'static { 
        
               /// Everytime RetryLayer is retrying, this function will be called. 
        
               /// 
        
               /// # Timing 
        
               /// 
        
               /// just before the retry sleep. 
        
               /// 
        
               /// # Inputs 
        
               /// 
        
               /// - err: The error that caused the current retry. 
        
               /// - dur: The duration that will sleep before next retry. 
        
               /// 
        
               /// # Notes 
        
               /// 
        
               /// The intercept must be quick and non-blocking. No heavy IO is 
        
               /// allowed. Otherwise the retry will be blocked. 
        
               fn intercept(&self, err: &Error, dur: Duration); 
        
           } 
        
           /// The DefaultRetryInterceptor will log the retry error in warning level. 
        
           pub struct DefaultRetryInterceptor; 
        
           impl RetryInterceptor for DefaultRetryInterceptor { 
        
               fn intercept(&self, err: &Error, dur: Duration) { 
        
                   warn!( 
        
                       target: "opendal::layers::retry", 
        
                       "will retry after {}s because: {}", 
        
                       dur.as_secs_f64(), err) 
        
               } 
        
           }

Users can control their own way for errors, or maybe further, the whole log event.

evenyag · 2024-07-23T08:51:38Z

Makes sense to me.

Xuanwo · 2024-07-24T06:59:32Z

Let's make it open to track the progress of this feature. @evenyag do you have interest to implement this?

evenyag · 2024-07-24T07:30:27Z

@Xuanwo Oh, sorry. I found people can do this via the RetryInterceptor so I closed the issue. But I forgot that we might not always set a RetryLayer.......

do you have interest to implement this?

Sure.

I'm not sure what the "interceptor" should be.

Should it be a part of the LoggingLayer?
Is it an ErrorInterceptor or an EventInterceptor?
- The logging layer logs in many places so we may need to define many events. Maybe we can only provide the ErrorInterceptor
If we log the error in the interceptor, how to avoid the logging layer log it again?

Xuanwo · 2024-07-24T08:08:55Z

Should it be a part of the LoggingLayer?

Yes, my current idea is make it a part of LoggingLayer.

Is it an ErrorInterceptor or an EventInterceptor?

I wish to have an LoggingInterceptor like:

trait LoggingInterceptor {
  fn log(&self, scheme: Scheme, operation: &'static str, path: &str, message: &str, err: Option<&Error>);
}

The default behaivor will be like:

struct DefaultLoggingInterceptor;

impl LoggingInterceptor for DefaultLoggingInterceptor {
    fn log(&self, scheme: Scheme, operation: &'static str, path: &str, message: &str, err: Option<&Error>) {
        // We can decide level based on `operation` and `err`.
        let lvl = xxx;
        if err.is_none {
          log!(
              target: LOGGING_TARGET,
              lvl,
              "service={scheme} operation={operation} path={path} -> {message}",
          )
        } else {
          // print with error
        } 
   }
}

Some other change may also needed to make it work. We might can remove with_error_level, with_failure_level and backtrace_output.

What do you think? I believe this design will make this layer more useful to play with other logging systems.

evenyag · 2024-07-24T13:23:01Z

Can we define a struct Record as the argument? Similar to https://docs.rs/log/latest/log/struct.Record.html

pub struct Record<'a> {
    scheme: Scheme,
    operation: &'static str,
    path: &'a str,
    message: &'a str,
    err: Option<&'a Error>,
}

impl<'a> Record<'a> {
    fn scheme(&self) -> Scheme {}
    // ...
}

What's more, should we pass the enum Operation instead of a &'static str for operation?

Xuanwo · 2024-07-24T13:29:38Z

Can we define a struct Record as the argument?

Add a new struct means we need to expose another new API along with many getter/setter API at the layer level which I want to avoid.

What's more, should we pass the enum Operation instead of a &'static str for operation?

We have ReadOperation and WriteOperation which can't be represent by Operation directly. However, maybe it's meaningful for us to merge ReadOperation into Operation directly. We can discuss this in another issue.

evenyag · 2024-07-25T06:34:57Z

Add a new struct means we need to expose another new API along with many getter/setter API at the layer level which I want to avoid.

Got it. I'll implement a prototype first. If I encounter any problems, I'll come back here.

evenyag · 2024-08-05T14:05:11Z

I'll implement a prototype first.

#4961 It should be ready for review.

evenyag added the enhancement New feature or request label Jul 23, 2024

evenyag closed this as completed Jul 24, 2024

Xuanwo reopened this Jul 24, 2024

Xuanwo assigned evenyag Jul 25, 2024

zyy17 mentioned this issue Jul 25, 2024

refactor: add RetryInterceptor to print detailed error GreptimeTeam/greptimedb#4434

Merged

3 tasks

evenyag mentioned this issue Aug 5, 2024

feat(core)!: implement an interceptor for the logging layer #4961

Merged

Xuanwo closed this as completed in #4961 Aug 9, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

new feature: Provide more ways to format the error in `LoggingLayer` #4918

new feature: Provide more ways to format the error in `LoggingLayer` #4918

evenyag commented Jul 23, 2024 •

edited

Loading

Xuanwo commented Jul 23, 2024 •

edited

Loading

evenyag commented Jul 23, 2024

Xuanwo commented Jul 24, 2024

evenyag commented Jul 24, 2024

Xuanwo commented Jul 24, 2024 •

edited

Loading

evenyag commented Jul 24, 2024 •

edited

Loading

Xuanwo commented Jul 24, 2024 •

edited

Loading

evenyag commented Jul 25, 2024

evenyag commented Aug 5, 2024

new feature: Provide more ways to format the error in LoggingLayer #4918

new feature: Provide more ways to format the error in LoggingLayer #4918

Comments

evenyag commented Jul 23, 2024 • edited Loading

Feature Description

Problem and Solution

Additional Context

Are you willing to contribute to the development of this feature?

Xuanwo commented Jul 23, 2024 • edited Loading

evenyag commented Jul 23, 2024

Xuanwo commented Jul 24, 2024

evenyag commented Jul 24, 2024

Xuanwo commented Jul 24, 2024 • edited Loading

evenyag commented Jul 24, 2024 • edited Loading

Xuanwo commented Jul 24, 2024 • edited Loading

evenyag commented Jul 25, 2024

evenyag commented Aug 5, 2024

new feature: Provide more ways to format the error in `LoggingLayer` #4918

new feature: Provide more ways to format the error in `LoggingLayer` #4918

evenyag commented Jul 23, 2024 •

edited

Loading

Xuanwo commented Jul 23, 2024 •

edited

Loading

Xuanwo commented Jul 24, 2024 •

edited

Loading

evenyag commented Jul 24, 2024 •

edited

Loading

Xuanwo commented Jul 24, 2024 •

edited

Loading