Perfume
Perfume infers performance-aware behavioral models of software using existing runtime logs. Performance metrics present in these logs convey vital information about the execution of most systems, but previous model-inference work ignores performance data. Perfume generates more precise models by utilizing this information, which might include metrics such as event timestamps, network traffic, or energy consumption.
## Motivating examples ##
The following examples motivate (a) how Perfume might be helpful to developers and (b) how Perfume might be more useful than Synoptic and related approaches.
Caching:
Perfume can uncover underlying caching behavior that is not visible at the event-sequence level. For example, it might reveal a performance difference due to cache hits and cache misses along a single abstract path of execution.
Differentiating mocked and real object behavior:
Mocked objects are a standard means of unit-testing complex applications that depend on external resources, such as a database. A mocked object provides sufficient behavior to stand in for the true object that will be used in production. Perfume can be used to differentiate between executions that were derived with mocked objects and those that used the true underlying resources.
Login-database example:
Let's say the user logs into some application, and then the application accesses a database on the user's behalf. There might be two executions that look identical in the events that they generate: "load page", "user login", "get db info". But there might be an underlying dependency that is not apparent from this sequence, such as the impact of the user's authentication on the db operation. This dependency may be visible in the timing of the operations. For example, the first sequence with timing might look like "0s: load page", "5s: user login", "7s: get db info", while a second sequence might be "0s: load page", "2s: user login", "7s: get db info". The two paths have identical total times and identical event sequences, but in the first sequence the login takes a long time (e.g., perhaps the login is incorrect), while in the second sequence the login is fast (e.g., a correct login might be faster to verify).
Differentiating machine architectures:
Run the same code on different architectures/networks and you'll get different performance. Perfume might be used to differentiate traces based on the performance characteristics of the resources accessed by the program that generated the traces.
Enforce splitting of branches if they differ in performance:
Let's say we have four traces -- two fast traces that look like "a x c Z" and two slow traces that look like "a y c Z". Synoptic will create two branches -- one for "x" and another for "y". These two branches will merge back at the "c" event, followed by a merged tail of some "Z" sequence of events. Due to the performance differences between the "x" and the "y" traces, Perfume will break apart the merged tails and create completely independent branches -- one for "x" and another for "y".
Memory performance models:
If instead of timestamps the log contains a sequence of total memory usage, then we could use Perfume to construct a memory model of the system that generated the log. That is, an edge between two events would have a distribution of the delta in allocated memory --- this distribution may contain negative deltas (e.g., deallocations). Could the resulting model be used to find memory leaks?
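A minimal sketch (in Python, with a made-up trace representation) of mining the per-edge memory-delta distributions described above:

```python
from collections import defaultdict

# Each trace is a list of (event_type, total_memory_bytes) samples in
# execution order; this representation is hypothetical.
traces = [
    [("alloc-buffer", 1000), ("process", 5000), ("free-buffer", 1200)],
    [("alloc-buffer", 1100), ("process", 5600), ("free-buffer", 1300)],
]

# Map each (source, target) event pair to the distribution of memory deltas
# observed across all traces. Negative deltas correspond to deallocations.
deltas = defaultdict(list)
for trace in traces:
    for (src, src_mem), (dst, dst_mem) in zip(trace, trace[1:]):
        deltas[(src, dst)].append(dst_mem - src_mem)

# A positive-heavy delta distribution on a looping edge might hint at a leak.
for edge, ds in sorted(deltas.items()):
    print(edge, ds)
```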
File system performance models:
Use a tool, such as blktrace on Linux, to trace requests through the kernel vfs and the disk controller/driver for specific block requests. Build a model that describes performance of these block requests and differentiates blocks based on the path/latency that was observed in the corresponding requests.
WProf / webpage load performance:
WProf instruments Chrome and logs timing and dependency information for page loads. A page's load time will vary based on where the client is located, characteristics of the network, characteristics of the client device, etc. So, by collecting a bunch of page-load measurements for the same page, we can derive a non-trivial Perfume model that can help to pinpoint aspects of the page load that are slow under certain conditions. And, perhaps even more usefully, the model may reveal how the set/sequence of events in the load process varies with performance (e.g., for slow connections the site may show an image instead of a video preview).
Application-specific garbage collection optimization:
Perfume models could tell GC when not to run by predicting what's coming next. One likely positive outcome: Don't run GC if the execution is almost done. Could use the Boehm-Demers-Weiser GC for C/C++ here, which might be easier to control and modify since it's already an add-on itself.
Potential challenges:
- GC is very heavily researched, and it's hard to improve on that. (Although the "we're almost done" optimization is likely to help.)
- GC in Java seems to be very hard to control.
Performance debugging (generally):
Find poorly performing predicted paths to help identify what to remove.
Energy usage
Like time, energy usage is monotonic. Perfume models of power might be completely novel to the field.
- We can imagine inferring a model of a system before a change and after a change, and using the models to predict whether new paths now exist that will consume LOTS of energy, or whether such paths have been eliminated.
Potential challenges:
- Energy use is not linear, so two deltas are not directly comparable (i.e., different numbers of joules used on two edges in a model may actually correspond to identical consumption if the measurements were taken at different battery levels). Energy has really weird patterns, and researchers are still working on figuring out how we should model energy use.
- Retrieving a log with energy numbers is nontrivial. Potentially we could do something at method-call granularity. But there are tools out there, and the researchers who build them would be excited about us trying them out.
Resources:
- How do architectural choices influence energy consumption? link
- How do code changes affect energy consumption? link
- "V-edge: Fast Self-constructive Power Modeling of Smartphones Based on Battery Voltage Dynamics" link
- "eDoctor: Automatically Diagnosing Abnormal Battery Drain Issues on Smartphones" link
- "Carat: Collaborative Energy Diagnosis for Mobile Devices" link
- "Where is the energy spent inside my app? Fine grained energy accounting on smartphones with eprof." link
- "Empowering developers to estimate app energy consumption" link
- "Energy consumption in mobile phones: A measurement study and implications for network applications." link
Performance counters
Use some performance counter (retrieved by perf) to find known bad behavior. Then Perfume models could show the types of executions that lead to this bad behavior, and we could then apply some kind of fix.
Specific idea:
Look at executions that lead to lots of cache misses, and simply move the conflicting objects in memory farther apart if we're heading down one of these paths.
Utilizing multiple metrics
We could record multiple metrics as a program runs (time, memory usage, network bytes sent/received, etc.) and experiment with using them individually or all at once. Comparing models generated using each metric individually might convey correlations between them; e.g., if time is correlated with network traffic in some part of the execution, that part is being bottlenecked by the network or is at least network heavy.
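As a rough illustration, here is a minimal sketch (Python; the trace and its metrics are invented) of checking whether per-transition time deltas correlate with network-traffic deltas:

```python
from statistics import correlation  # Python 3.10+

# Hypothetical trace recording (event, elapsed_seconds, total_net_bytes)
# at each step.
trace = [
    ("start", 0.0, 0),
    ("fetch", 1.8, 90_000),
    ("parse", 2.0, 90_500),
    ("render", 2.4, 91_000),
]

# Per-transition deltas for each metric.
time_deltas = [b[1] - a[1] for a, b in zip(trace, trace[1:])]
net_deltas = [b[2] - a[2] for a, b in zip(trace, trace[1:])]

# A correlation near 1 suggests the network-heavy transitions are also the
# slow ones, i.e., that part of the execution may be network-bound.
print(correlation(time_deltas, net_deltas))
```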
## Work related to Perfume ##
(Warning: not all of this work is closely related.)
- Hard-to-Answer Questions about Code, PLATEAU 2010
  - Has some motivation for the kinds of performance questions developers ask about code
- Maintaining mental models: a study of developer work habits, ICSE 2006
  - General work on software comprehension and how developers maintain (and fail to maintain) mental models of software.
- IBM Whole-system Analysis of Idle Time (WAIT)
- Model-Based Software Performance Analysis (book)
- Deriving performance models of software architectures from message sequence charts (workshop paper)
- Model-based performance prediction in software development: a survey
Formalisms and theory:
- A theory of Timed Automata by Rajeev Alur and David L. Dill
- An Abstraction Refinement Technique for Timed Automata Based on Counterexample-Guided Abstraction Refinement Loop
- Behavioral Cartography of Timed Automata
Inference of timed automata:
- A bunch of work by Olga Grinchtein, but all of it (seems to be) constrained to the active learning context with positive and negative examples.
Specification patterns:
Systems performance debugging/tracing:
- "Performance Debugging for Distributed Systems of Black Boxes" link
- "Automatic construction of coordinated performance skeletons" link
- "Catch me if you can: performance bug detection in the wild" link
Systems work on tracing and log analysis:
- "Be Conservative: Enhancing Failure Diagnosis with Proactive Logging" OSDI'12 link
- X-Trace: A Pervasive Network Tracing Framework. Fonseca et al. NSDI 2007. project site.
- Using Magpie for request extraction and workload modeling. Barham et al. OSDI 2004.
- Performance debugging for distributed systems of black boxes. Aguilera et al. SOSP 2003.
- Structured Comparative Analysis of Systems Logs to Diagnose Performance Problems. NSDI 2012.
- Diagnosing performance changes by comparing request flows. NSDI 2011.
- Characterising Logging Practices in Open-Source Software. ICSE 2012.
Performance profiling by method/procedure:
Questions developers ask about software:
- "Maintaining mental models: a study of developer work habits" link
- "Hard-to-Answer Questions about Code" link
## Model Checking (Traversing loops) ##
Loops will often occur in models, and we need some method of deciding when to stop traversing a loop during model checking, or model checking will never terminate. Possible implementations:
- (Current) After completing a loop, if we inhabit any new state in the FSM that we did not inhabit at the start of the loop, continue. If we inhabit no new states, stop looping. (A sketch of this check appears after this list.)
  - This is simple but might result in over-splitting (and under-merging), as some legitimate loops that existed in the traces will be considered illegal during model checking and split out.
- Count the number of times a loop (e.g., A->B->C->A) appears in each individual trace. Use the count from the trace that contains the most of that loop as a loop limit. When model checking, allow that loop to be traversed at most the loop-limit number of times.
  - This is fairly straightforward to implement, although it is insensitive to the fact that traces might have 5 of some loop but never more than one in a row, so even 2x around the loop would not be accurate to the traces.
- At each node while model checking, follow all concrete traces to determine whether any actual loops existed at this point and how many. Allow at most this many loops at this node.
  - This looks at which loops actually occurred at this node in traces, but is harder to implement and would probably be slow.
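A minimal sketch (Python; the state sets stand in for whatever the model checker tracks) of the current loop-termination rule:

```python
def should_continue_loop(states_before: set, states_after: set) -> bool:
    """Return True if traversing the loop once more reached any FSM state
    that was not inhabited at the start of the loop, in which case model
    checking should keep going around the loop."""
    return bool(states_after - states_before)

# The loop reached state "q2" for the first time, so continue traversing.
assert should_continue_loop({"q0", "q1"}, {"q0", "q1", "q2"})
# No new states were reached, so stop looping.
assert not should_continue_loop({"q0", "q1"}, {"q1"})
```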
## Refinement ##
The purpose of constrained refinement is to split apart abstract groupings (partitions) of events when a partition as it currently exists causes a time-constrained invariant to be violated. Synoptic's refinement is only concerned with reachability; for example, if some partition of type X can reach some partition of type Y even though no actual trace had an X before a Y, some intermediate partition needs to be split. Perfume's constrained refinement can do this while also splitting partitions that are illegal only because, following some path of partitions, one can get from some event X to some event Y with a combined time smaller than the minimum or larger than the maximum observed on any actual X-to-Y path in the traces.
Min/max transitions: Consider all concrete transitions with both (1) the source event in some partition i and (2) the target event in some partition j. When refining based on an upper-bound invariant violation, find the maximum time delta of any of those transitions. Now the set of transitions with exactly that maximum time delta are the max transitions between i and j. For a lower-bound invariant, find the minimum time delta instead, and the resulting set is min transitions. Min/max transitions means using the former for upper-bound refinement and the latter for lower-bound. (TODO: Add diagram)
Stitch: In a counter-example path, partition previous immediately precedes partition this, and partition this immediately precedes partition next. Define set arriving as the min/max transitions between previous and this. Define set departing as the min/max transitions between this and next. Partition this contains a stitch if the set of arriving's target events and the set of departing's source events are not equal. (TODO: Add diagram)
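A minimal sketch (Python, with hypothetical representations: a concrete transition is a (source_event, target_event, time_delta) tuple, and a partition is a set of events) of the two definitions above:

```python
def extremal_transitions(transitions, part_i, part_j, upper_bound):
    """Max transitions between partitions i and j when refining an
    upper-bound violation; min transitions for a lower bound."""
    between = [t for t in transitions if t[0] in part_i and t[1] in part_j]
    extreme = (max if upper_bound else min)(t[2] for t in between)
    return [t for t in between if t[2] == extreme]

def has_stitch(arriving, departing):
    """A partition contains a stitch if the target events of the arriving
    min/max transitions differ from the source events of the departing
    min/max transitions."""
    return {t[1] for t in arriving} != {t[0] for t in departing}
```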
```
 1  for (partition j from end of violation subpath to beginning)
 2      for (partition i from j to beginning)
 3          if (stitch at i)
 4              if (>0 legal paths and >0 illegal paths between i and j)
 5                  add to potential splits
 6
 7  for (potential splits)
 8      if (globally resolves)
 9          split(this)
10          return
11      else if (locally resolves)
12          localresolution = this
13      else
14          arbitrary = this
15
16  if (localresolution exists)
17      split(localresolution)
18  else
19      split(arbitrary)
```
1: We considered not including this outer loop, in which case 'j' in line 2 would instead be 'end of violation subpath'. However, that can prevent the check in line 4 from working in some double-stitch cases, where the needed split would not be detected.
4: Find all concrete paths starting from an event in i and ending on an event in j. Consider what happens if we replace the counter-example path's current subpath between i and j with one of the just-mentioned concrete paths. If this would mean the violation subpath (including the replaced path between i and j) no longer violates the invariant, this concrete path is legal. If not, it is illegal.
5: Events beginning legal paths go in one half of the split, and events beginning illegal ones go in the other half. All other events are randomly distributed between the two halves. (See the sketch after these notes.)
8: If performing this split resolves all violations of the current invariant type (e.g., login AP logout upper-bound=60) from the entire partition graph, then this split globally resolves the violation.
11: If performing this split resolves all violations of the current invariant type only between the start and end of the violation subpath, then this split locally resolves the violation.
11-12, 16-17: Planned to be implemented at a later stage. These are not necessary for correct refinement but should help maximize the amount of progress made on each iteration of this algorithm and might result in a more concise final partition graph.
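A minimal sketch (Python; `violates` and the path/partition representations are assumptions) of the legal/illegal classification in line 4 and the split described for line 5:

```python
import random

def classify_paths(paths, violates):
    """Split concrete i->j subpaths into legal and illegal. violates(p) is
    assumed to replay the violation subpath with p substituted for the
    current i->j segment and report whether the invariant is still
    violated."""
    legal = [p for p in paths if not violates(p)]
    illegal = [p for p in paths if violates(p)]
    return legal, illegal

def split_halves(partition_events, legal, illegal):
    """Events beginning legal paths go in one half, events beginning
    illegal paths in the other; remaining events are placed randomly.
    (This sketch ignores events that begin both kinds of path.)"""
    half_a = {p[0] for p in legal}
    half_b = {p[0] for p in illegal}
    for e in partition_events - half_a - half_b:
        random.choice((half_a, half_b)).add(e)
    return half_a, half_b
```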
- Starting from the end (as we do) or the beginning of the violation subpath makes very little difference. Simple examples can be constructed where the former splits better than the latter and vice versa. (TODO: contrive a better way to visualize the splitting process, and make such examples)
- There must be a stitch in a partition before it is considered for splitting because if there is no stitch, then we passed through this partition by following a real path that existed in a trace. So splitting this partition cannot help resolve this violation.
- A counter-example is some combination of concrete transitions that can be followed based on the current partition configuration but lead to a violation of an invariant.
- Progress (toward resolving the violation) is defined as breaking at least one counter-example without creating any new counter-examples. Splitting one partition into two can only remove combinations of concrete paths, never create new ones, so we will never create new counter-examples with this algorithm.
- As per the comment for line 4, we look for concrete paths between two partitions that, if used, (1) would resolve the violation or (2) wouldn't affect anything / would retain the violation.
  - If we have only the former (legal paths), then there is no counter-example because none of these subpaths could be taken to cause a violation. The algorithm should never have been run in the first place in this case.
  - If we have only the latter (illegal paths), then the counter-example path could follow any of these, and the violation would remain. So splitting here does not help.
  - If we have at least one of each, then this split makes progress toward resolving the violation because we break some combination of transitions between events (specifically start, through the illegal i->j path, to end) that currently violates an invariant but after splitting can no longer be followed.
  - Because of the previous point, every split chosen by this algorithm makes progress toward resolving the violation, so the violation is guaranteed eventually to be resolved.
  - TODO: Find a way to show that the algorithm always finds some split to make; otherwise the above claim is technically incomplete.
- When considering which partition to split, looking only at those within the violation subpath is an optimization. The algorithm would never find a partition to split outside the violation subpath because the timing information outside it is not pertinent to the current violation.
## Invariant-related additions ##
Adding new invariants or new types of time constraints should help Perfume's models convey more useful performance information.
Every trace has the same number of x and y.
- ex: xxxyyxyyyx
- Possible time constraint: Time since the counts of x and y were last unequal.
After an x, there cannot be another x without a y first.
- ex: xyx
- Possible time constraint: Min/max time from x->y, min/max time from y->x.
Find the median time of an invariant (e.g., for login AP logout, consider the times of all such paths between login->logout and compute the median). We can then ensure during refinement and coarsening that the median time of login->logout paths is within some range, maybe no more than 1 stdev away from the mined median.
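A minimal sketch (Python; the times are invented) of the mined median/stdev constraint described above:

```python
from statistics import median, stdev

# Hypothetical mined times of login->logout paths across all traces.
mined_times = [4.0, 5.0, 5.5, 6.0, 30.0]
mined_median = median(mined_times)
allowed_dev = stdev(mined_times)

def median_ok(model_path_times):
    """Check that the median login->logout time in the current model stays
    within one standard deviation of the mined median."""
    return abs(median(model_path_times) - mined_median) <= allowed_dev
```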
As a pre-processing step, take a given trace and normalize its times into [0,1]. This would be very helpful in circumstances where some traces came from a very slow machine and others came from a very fast machine but where they both exhibit the same relative timing patterns. Normalizing such traces would mean the constrained invariants would actually capture patterns present in the log rather than finding outlier maxes and mins that don't say much about the system.
## Miscellaneous ##
The following observation seems to be useful only for optimization: an invariant violation can never be resolved by splitting a partition corresponding to the start or end of the violation subpath of that invariant violation.
Violation subpath: The subpath of this counter-example that starts with the last encountered t=0 state and ends with the first encountered permanent failure state. In other words, where the actual violation was detected during model checking.
As an example, a counter-example path for the invariant "x AP z upper-bound=4" might be:
INIT --> a --1--> **x** --3--> x --3--> **z** --> TERM
The claim is not that no partition of type x or of type z should be split, but that the x and z partitions which form the endpoints of the actual violation (in bold above) should not be split.
Some server log might contain traces from users with very fast connections/PCs/environments and others from much slower ones. Imagine there are some relative patterns that exist in all traces, e.g., A->B takes twice as long as B->C. But if the slow traces take 10x as long for every transition in the entire trace, Perfume cannot currently detect that these traces are all very similar, even though the absolute times are very different. We might therefore consider normalizing the traces in some way:
- Normalize each trace individually to sum to 1, so A --2--> B --4--> C --6--> D becomes A --2/12--> B --4/12--> C --6/12--> D, etc. (A sketch of this option appears after this list.)
- Apply some sort of sequence alignment algorithm to the traces, and then normalize the traces in some way that would take advantage of this
- Normalize not relative to the trace but horizontally across common transitions, so if 3 traces each have an A->B, respectively A --2--> B and A --4--> B and A --6--> B, the weights would become 2/12, 4/12, and 6/12. Possibly apply sequence alignment first
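A minimal sketch (Python) of the first option, per-trace normalization:

```python
def normalize_trace(deltas):
    """Normalize one trace's per-transition time deltas to sum to 1."""
    total = sum(deltas)
    return [d / total for d in deltas]

# A --2--> B --4--> C --6--> D becomes A --2/12--> B --4/12--> C --6/12--> D.
print(normalize_trace([2, 4, 6]))  # [0.1666..., 0.3333..., 0.5]
```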
Early tests suggest that merging is slow compared to the rest of Perfume. Possibly optimize the merging process by maintaining a list/matrix of already-rejected merges to cut down on the amount of model checking needed during merging.
Given a log where each event occurred exactly 1 unit (e.g., second) after the previous one, Synoptic and Perfume will produce different models even though no illuminating performance information is present. Experimenting with this might reveal something about one method or the other.
Perfume is currently restricted to modeling logs where each trace's performance information is monotonically increasing (e.g., time, total network traffic). It would become more widely applicable if non-monotonically-increasing traces were supported, as this would allow metrics such as memory usage, CPU usage, network transfer speed, and disk I/O to be used to infer Perfume models.
- Mining may or may not need to change. If it does, changes will likely be confined to ConstrainedInvariantMiner.
- Model checking will definitely need to change. Currently, much of it expects that when an upper bound is violated or a lower bound is satisfied, the violation or satisfaction cannot be undone, as this is the case when confined to monotonically-increasing series. State machines for the invariants (TracingStateSets) and/or the model checker (FsmModelChecker) will need to be updated, and likely other classes will as well.
When Synoptic and Perfume models are identical, it means that the performance invariants do not induce any further refinement and that Synoptic invariants encompass the Perfume invariants.
Now, if the Synoptic and Perfume models are different, we are guaranteed to have less behavior in the Perfume model (traces in the Perfume model have to satisfy the same Synoptic invariants plus the performance invariants, so if these cause more refinement then there will be less behavior in the Perfume model).
If the two models differ, we could focus on/show the parts of the model that are different and ignore the rest. That is, show just the sub-model of Perfume that is relevant to the splits that removed behavior present in the Synoptic model. In some sense, Perfume's refinement beyond the Synoptic model can be used to focus the user's attention on the parts of the model that are most relevant to performance (since the Synoptic invariants are mostly there to ensure sufficient model-log matching accuracy).