Tracking peak memory usage #689

Closed
itamarst opened this issue May 19, 2020 · 3 comments

Comments

@itamarst

Hi,

I am helping out a company that would like to use Dr. Elephant. In particular, they would like to track peak execution memory stats and get recommendations based on them (should more or less memory be allocated?). They're using Spark 2.3, and concluded that it's not possible to track that specific thing with the current Dr. Elephant.

Since I understand you're in the process of updating it, I was curious how the process will go and how it might interact with external contributions:

  1. Do you have a timeline, e.g. when newer Spark versions will work out of the box?
  2. Do you have a sense of how intrusive the changes are going to be? Are they minor updates, or will they break PRs against the current code base?
  3. Do you expect to do the updates in private and then release them when done, or do development in the open?

Thank you!

@ShubhamGupta29
Contributor

Hi @itamarst,

  1. I have made the changes for Spark 2.3 and am in the process of making the changes for Spark 2.4. The Spark 2.3 changes are being tried out by several users and are in review. I am hopeful that they will be merged by mid-June at the latest. If you want to try out Spark 2.3 support now, you can check out my personal branch; that would help get the changes merged sooner and would give you the changes right away.

  2. The changes are definitely major: the current Dr. Elephant supports Spark 1.x, and migrating to Spark 2.x (especially Spark 2.3/2.4) required a lot of changes. However, these changes are confined to the Fetcher part, and the Heuristics part is unaffected. So if your changes touch the SparkFetcher or FSFetcher class, your PRs will most likely break. You can gauge the extent of the migration changes by looking at the personal branch mentioned above.

  3. I am making the changes only in the public forked repo. I also post updates on the changes to the issue "Support for Spark 2.3/2.4 in Dr.Elephant" (#683).

@itamarst
Author

So it turns out that there are a bunch of extra metrics one wants for peak memory, and it would be great to have them in Dr. Elephant... once they're in Spark. They are being added in a PR to Spark that will hopefully be merged soon: apache/spark#29020

@ShubhamGupta29
Contributor

Thanks @itamarst for the update. I noticed that this PR has been merged; I will track which Spark 3.x release it lands in.
