Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Benchmarks and String Creation #316

Open
adligo opened this issue Jun 1, 2022 · 19 comments
Open

Benchmarks and String Creation #316

adligo opened this issue Jun 1, 2022 · 19 comments
Labels
P3 type=other Miscellaneous activities not covered by other type= labels

Comments

@adligo
Copy link

adligo commented Jun 1, 2022

Doesn't this style of logging api incur the cost of string creation, even when the logger is off ?

I haven't looked at the byte code, but I'm wondering what exactly is ment by 'essentially free'.

I remember reading in the initial log4j that the whole reason for the usage of the imperative paradigms if, i.e.;
if (log.isDebugEnabled()) {
log.debug("foo");
}

Was to avoid the cost of the creation of the string "foo". The impartive if statement would create branching bytecode, with GOTO statements. IIt would seem to me that this functional stream log API style would result in a lot of extra string creation and garbage collection. Has anyone benchmarked it ? Where are the benchmarks ?

@lowasser
Copy link

lowasser commented Jun 1, 2022

Not exactly. Here's a useful breakdown:

In the specific code you give, "foo" is free, because it's a class-level constant. When the class gets loaded, all constant strings in the source code get loaded, too, so you really don't pay anything extra for this; even if you have the if as you describe, "foo" would still get loaded as soon as the class gets loaded.

Flogger's format arguments are constant, and therefore free as well -- see e.g. the first example

logger.atInfo().withCause(exception).log("Log message with: %s", argument);

"Log message with: %s" is a constant and therefore free.

The expensive part is replacing %s with argument.toString(), and Flogger only bothers to do that when the log statement is enabled, which is what you'd want.

(I'm also looking right now at very positive benchmarks for Flogger, but I don't know if there is anything published...)

@hagbard
Copy link
Contributor

hagbard commented Jun 1, 2022

As Louis says, there should be no cost to "string creation" because you express that as "literal format string plus arguments" and let Flogger format it only when the log statement is enabled (and Flogger doesn't even have to format anything if the backend can do structured logging).

If what you're wondering is how:
logger.atFine().log("message: %d", myIntValue);

Avoids things like the overhead of varargs and auto-boxing, that's all explained in depth in:
https://google.github.io/flogger/anatomy.html

TL;DR: Flogger is completely zero-allocation for disabled log statements up to 2 arguments of any type (it has overloads to cover all necessary combinations of autoboxing for 1 or 2 arguments) and it's free from varargs up to 10 arguments.

Extensive analysis of many thousands of log statements in Google code showed that this would cover the vast majority of cases (2 arguments is the most common case, and above 5 arguments becomes extremely rare).

It also supports lazy arguments via simple lambda closures:
logger.atFine().log("message: %s", lazy(() -> myExpensiveCall()));
and while this can incur a small allocation for the capturing lambda, it's still very fast (and fast enough to be preferable to the error prone use of guards around log statements).

At some point we should probably publish our benchmarks, though we're also happy if people want to benchmark it for themselves (though you need to be quite careful to mimic real log statements, e.g. at representative stack depths and with a mix of lengths of format strings and types of argument ... it's quite fiddly).

@adligo
Copy link
Author

adligo commented Jun 1, 2022

Well all the bytecode I have ever looked at was doing new String("foo"); for the string literal. Not sure when or if that changed. But if you are both convinced I will get some bytecode from Eclipse and post here ...

@cslee00
Copy link
Contributor

cslee00 commented Jun 1, 2022 via email

@adligo
Copy link
Author

adligo commented Jun 1, 2022

/ Compiled from LogByteCode.java (version 11 : 55.0, super bit)
public class org.adligo.test.LogByteCode {

// Method descriptor #11 ([Ljava/lang/String;)V
// Stack: 2, Locals: 2
public static void main(java.lang.String[] args);
0 invokestatic com.google.common.flogger.FluentLogger.forEnclosingClass() : com.google.common.flogger.FluentLogger [1]
3 astore_1 [f]
4 aload_1 [f]
5 invokevirtual com.google.common.flogger.FluentLogger.atInfo() : com.google.common.flogger.LoggingApi [2]
8 checkcast com.google.common.flogger.FluentLogger$Api [3]
11 ldc <String "my main message"> [4]
13 invokeinterface com.google.common.flogger.FluentLogger$Api.log(java.lang.String) : void [5] [nargs: 2]
18 return
Line numbers:
[pc: 0, line: 8]
[pc: 4, line: 9]
[pc: 18, line: 10]
Local variable table:
[pc: 0, pc: 19] local: args index: 0 type: java.lang.String[]
[pc: 4, pc: 19] local: f index: 1 type: com.google.common.flogger.FluentLogger

// Method descriptor #20 ()V
// Stack: 2, Locals: 2
public LogByteCode();
0 aload_0 [this]
1 invokespecial java.lang.Object() [6]
4 invokestatic com.google.common.flogger.FluentLogger.forEnclosingClass() : com.google.common.flogger.FluentLogger [1]
7 astore_1 [f]
8 aload_1 [f]
9 invokevirtual com.google.common.flogger.FluentLogger.atInfo() : com.google.common.flogger.LoggingApi [2]
12 checkcast com.google.common.flogger.FluentLogger$Api [3]
15 ldc <String "my LogByteCode constructor message"> [7]
17 invokeinterface com.google.common.flogger.FluentLogger$Api.log(java.lang.String) : void [5] [nargs: 2]
22 return
Line numbers:
[pc: 0, line: 12]
[pc: 4, line: 13]
[pc: 8, line: 14]
[pc: 22, line: 15]
Local variable table:
[pc: 0, pc: 23] local: this index: 0 type: org.adligo.test.LogByteCode
[pc: 8, pc: 23] local: f index: 1 type: com.google.common.flogger.FluentLogger

Inner classes:
[inner class info: #3 com/google/common/flogger/FluentLogger$Api, outer class info: #25 com/google/common/flogger/FluentLogger
inner name: #29 Api, accessflags: 1545 public abstract static]
}

@hagbard
Copy link
Contributor

hagbard commented Jun 1, 2022

Well all the bytecode I have ever looked at was doing new String("foo"); for the string literal. Not sure when or if that changed. But if you are both convinced I will get some bytecode from Eclipse and post here ...

I've been using Java for 20 years and literals have always been loaded from the constant pool. It's required by the language specification:
https://docs.oracle.com/javase/specs/jls/se18/html/jls-3.html#jls-3.10.5
and (2005) https://docs.oracle.com/javase/specs/jls/se6/html/lexical.html#3.10.5

It's possible some JVM implementations don't do this, but it's incorrect (and there's even code in Flogger that assumes literals are interned so we can test via == for a fast test for known special cases).

@adligo
Copy link
Author

adligo commented Jun 1, 2022

Ok thanks, perhaps I was remembering the String concatenation instead of construction.

@hagbard
Copy link
Contributor

hagbard commented Jun 1, 2022

11 ldc <String "my main message"> [4]

That's the "load constant" which gets the "my main message" literal and puts a reference to the interned String on the stack. There is no allocation here (allocation and mapping from constant pool index to a heap allocated instance is done during class loading).

@adligo
Copy link
Author

adligo commented Jun 1, 2022

public static void main(String [] args) {
FluentLogger f = FluentLogger.forEnclosingClass();
f.atInfo().log("my main message");
}

public LogByteCode() {
FluentLogger f = FluentLogger.forEnclosingClass();
f.atInfo().log("my LogByteCode constructor message");
}
}

@hagbard
Copy link
Contributor

hagbard commented Jun 1, 2022

Tip (in case it wasn't clear in the docs): You really want to be setting up the logger in a private static final field in the class rather than doing all the work to obtain it many times. And assuming it's the first static field in the class this also lets you use it in any static method and during other static initialization.

@adligo
Copy link
Author

adligo commented Jun 1, 2022

I'm still wondering if there is a noticeable difference between the imperative style to this fluent / functional / chaning style. It seems like things are happening on the stack that wouldn't need to happen.

@adligo
Copy link
Author

adligo commented Jun 1, 2022

Ya, I know you advise the static logger style. Having done that for years (i_log.adligo.org), I'm actually starting to prefer a C# style of passing around a context to get the loggers. I think it's easier to mock. I'm thinking of writing yet another logging api (i_log2.adligo.org?), which is probably why I'm so interested in this minutiae.
THANKS FOR ALL THE RESPONSES :)

@hagbard
Copy link
Contributor

hagbard commented Jun 1, 2022

The hot-spot compiler is good at folding stuff like that into efficient code. CPU is cheap compared to allocations and GC pressure. Yes, there are 2 (or more) calls to the logging API instead of 1, but the advantages you get from having a modular, extensible API (and thus being able to get rid of boxing and varargs without combinatorial explosion of many hundreds of method overloads) is well worth it.

In Java there's no need to obtain logger contexts and pass them around and no need to ever mock loggers for testing. You test logging by using Handlers which can be injected behind the API facade, not by mocking the API itself (and if you're using a different underlying backend to output logs, you should make sure you select one that's equally testable without mocking the user API itself).

Needing some "logger provider" context is a serious downside to any logging API since now you cannot add logging to public static methods without changing the API, and you cannot add logging during static initialization of a class.

@adligo
Copy link
Author

adligo commented Jun 1, 2022

As I mentioned I have implemented a logging API. I didn't say it was necessary, I said I preferred it.
You could always create your own singleton LogFactory or something similar if you really wanted to wrap a context that could be passed around to get things initialized statically.

@adligo
Copy link
Author

adligo commented Jun 1, 2022

So I guess no one has benchmarked the differences in the extra stack calls anywhere ?

@hagbard
Copy link
Contributor

hagbard commented Jun 1, 2022

Oh yes, we certainly have, at least inside Google.

After a week of attempting to pull out any reliable measure of the potential overhead, I gave up. Under realistic conditions there was just no observable difference between a "single method with many parameters" API and a "fluent logging API". The hotspot compiler was just figuring everything out as far as we could tell.

Remember that the only case which matters is for the "disabled" log statements, and then only under the sort of load that would trigger the hot-spot compiler. Since Flogger has only 2 concrete implementations of the fluent API, the hot-spot compiler can do all sorts of tricks by spotting places where bi-morphic dispatching occurs.

Flogger's used everywhere in Google's Java code, including many mission critical systems, so we've measured and checked as much as we can think of with regard to things like this quite carefully.

What we haven't done is tidy up the benchmark code, remove internal dependencies, document it and publish it (and the results) to github. We probably should do that at some point.

@hagbard
Copy link
Contributor

hagbard commented Jun 2, 2022

As I mentioned I have implemented a logging API. I didn't say it was necessary, I said I preferred it. You could always create your own singleton LogFactory or something similar if you really wanted to wrap a context that could be passed around to get things initialized statically.

I suspect you'd run into cyclic static initialization issues pretty fast. Flogger sets up its "Platform" singleton on first use rather than as an explicit singleton exactly to avoid issues of cyclic dependencies during class loading. You also end up almost back at square one if you make a singleton to inject the factory since the singleton removes the choice of implementation and makes the logging in the code hard to test without explicit knowledge of what you need to do to inject a mock.

Since JDK Logger and other common backends all provide ways to capture logs for test purposes without having different implementations of the actual user facing API, I see no reason to try and solve the testability issue using mocks like this.

@adligo
Copy link
Author

adligo commented Jun 2, 2022

Looking forward to the benchmarks.
On the mocking issue, I think my preference is due to my work on tests4j.adligo.org , which is based on JUnit's API but intensely concurrent. You can back a Logging facade with Handlers that use tread locals to get it done, but I've become fond of simply passing around a context. It's old fashoned (pre the Spring Declaritive Style popularity), but since I'm also trying to transpile a lot of Java to JavaScript (and TypeScript) and make the APIs work the same in the different environments, it seems to be the way to go.

@adligo
Copy link
Author

adligo commented Jun 2, 2022

Also, you don't need to test the log API in most cases. But you MAY want to test the usage of the log API. There are certain situations where you need to audit things (i.e. don't log the user's passwords, etc).

@amalloy amalloy added the type=other Miscellaneous activities not covered by other type= labels label Jun 10, 2022
@chaoren chaoren added the P3 label Jun 15, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
P3 type=other Miscellaneous activities not covered by other type= labels
Projects
None yet
Development

No branches or pull requests

6 participants