
Optimizing Trace load and parsing functionality #48

Open
briancoutinho opened this issue May 5, 2023 · 3 comments
Assignees
Labels
feature request New feature request

Comments

@briancoutinho (Contributor)

🚀 Motivation and context

As we analyze larger traces, and more of them, at scale, the time spent parsing trace files gets into the critical path.
In this work-stream, we plan to identify and fix performance bottlenecks in trace loading and parsing.

Description

Details

To investigate this we will need: 1) a benchmarking setup, 2) test trace data, and 3) a profiling methodology. These are described below.

Benchmark Setup and Test Trace data

We can leverage pyperf for a reliable benchmarking setup.

For trace data we will use the hta/tests/data/ directory, optionally including any additional test traces a user wants to run the benchmarks with.
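A minimal sketch of what such a load benchmark could measure. The trace file here is a tiny generated stand-in, not one of the hta/tests/data/ traces; pyperf is the planned tool (via `pyperf.Runner.bench_func`), but the same timing pattern is shown with the stdlib `timeit` to keep the sketch dependency-free:

```python
import json
import tempfile
import timeit

# Hypothetical stand-in for a trace file; a real benchmark would point at
# traces under hta/tests/data/.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as f:
    json.dump({"traceEvents": [{"name": "op", "ts": i} for i in range(1000)]}, f)
    trace_path = f.name

def load_trace(path):
    # The step we want to benchmark: read and parse the JSON trace.
    with open(path) as fh:
        return json.load(fh)

# In the real setup, pyperf's Runner.bench_func would replace this timeit
# call and handle warmup, worker processes, and statistics for us.
elapsed = timeit.timeit(lambda: load_trace(trace_path), number=10)
print(f"10 loads took {elapsed:.4f}s")
```

pyperf is preferable for the real benchmarks because it isolates runs in worker processes and reports mean and standard deviation, as in the results further below.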

Profiling Methodology

In addition to the benchmark measurements, we can leverage py-spy to analyze the CPU time breakdown across functions.
To install py-spy, simply run:

```
pip install py-spy
```

Then profile the benchmark using:

```
sudo /opt/miniconda3/envs/trace-analyzer/bin/py-spy record -p <pid of benchmark>
```

Alternatives

No response

Additional context

No response

@briancoutinho (Contributor, Author)

Initial Analysis

Looking at the py-spy results, a large fraction of trace_load() time was being spent computing the memory footprint of the loaded JSON.

[Screenshot, 2023-04-28: py-spy profile of trace_load()]

This was fixed in #43 (PR #44).

We now look to optimize the loading and parsing together. This could be done by merging the JSON load and the parsing into a single step in pandas; this is still WIP.

@briancoutinho briancoutinho self-assigned this May 5, 2023
@briancoutinho briancoutinho mentioned this issue May 6, 2023
@anupambhatnagar (Contributor)

Currently the rank parsing is pretty fast with the use of re.search. The trace file is loaded once and converted into a pandas df. What exactly are we trying to optimize here?
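For context, rank parsing with `re.search` could look like the following sketch. The filename pattern here is hypothetical; HTA's actual trace naming convention may differ:

```python
import re

def parse_rank(filename):
    """Extract the rank from a trace filename, or return None.

    Assumes a hypothetical pattern like "worker_rank-3.json.gz"; the
    real HTA convention may encode the rank differently.
    """
    m = re.search(r"rank[-_]?(\d+)", filename)
    return int(m.group(1)) if m else None
```

A single compiled-or-cached regex search per filename is cheap, which is why rank parsing is not the bottleneck; the JSON-to-DataFrame step is.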

@briancoutinho (Contributor, Author)

> Currently the rank parsing is pretty fast with the use of re.search. The trace file is loaded once and converted into a pandas df. What exactly are we trying to optimize here?

We are currently loading the trace as a JSON object and then constructing the dataframe from it; the intermediate step consumes a lot of memory and time (it's a dynamic object with many memory allocations). It may be possible to incrementally parse the JSON and fill the dataframe directly, which is how pandas read_json works.

The time is low for the example traces, but larger traces take 120s or more to load. Also, your optimization sped things up quite a bit; that was low-hanging fruit.
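The two approaches being compared can be sketched as below. The event records are a tiny hypothetical stand-in for a trace's `traceEvents` array; whether `read_json` actually avoids enough intermediate allocation to win on real traces is exactly what the benchmarks above would measure:

```python
import io
import json
import pandas as pd

# Tiny hypothetical stand-in for a trace's "traceEvents" array.
raw = '[{"name": "kernel_a", "ts": 100, "dur": 10}, {"name": "kernel_b", "ts": 110, "dur": 5}]'

# Current approach: parse into Python objects (list of dicts), then build
# the DataFrame from them -- two passes and an intermediate object graph.
df_two_step = pd.DataFrame(json.loads(raw))

# Candidate optimization: let pandas parse the JSON itself, skipping the
# explicit intermediate Python object.
df_direct = pd.read_json(io.StringIO(raw))
```

Both frames should carry the same events; the open question is memory and time at scale, where the intermediate list of dicts dominates.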

facebook-github-bot pushed a commit that referenced this issue Mar 7, 2024
Summary:
## What does this PR do?
Adds benchmarks for trace load and parsing  #48

## Before submitting

- [x] Was this discussed/approved via a Github issue? (no need for typos, doc improvements)
  - [ ] N/A
- [ ] Did you write any new necessary tests?
  - [x] N/A
- [ ] Did you make sure to update the docs?
  - [x] N/A
- [ ] Did you update the [changelog](https://github.com/facebookresearch/HolisticTraceAnalysis/blob/main/CHANGELOG.md)?
  - [ ] N/A
    not sure?

## Benchmark results

Run using
```
python3 benchmarks/trace_load_benchmark.py -p 10 -l 1
```

Before Anupam's PR #48, which optimized out the get_memory_size call, the load/parsing times were:
```
parse[tests/data/vision_transformer]: Mean +- std dev: 12.4 sec +- 0.5 sec
parse[tests/data/inference_single_rank]: Mean +- std dev: 15.0 sec +- 0.5 sec
```

After his PR, we see reduced load times:
```
parse[tests/data/vision_transformer]: Mean +- std dev: 6.24 sec +- 0.19 sec
parse[tests/data/inference_single_rank]: Mean +- std dev: 9.25 sec +- 0.19 sec
```

Pull Request resolved: #49

Differential Revision: D54600738

Pulled By: briancoutinho

fbshipit-source-id: f5b70bf72381397b672a2fcccfd485cbea3e3c9d