-
Notifications
You must be signed in to change notification settings - Fork 37
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
support for per-vGPU mode profiling #165
Conversation
Looks like this PR breaks some of the travis tests. |
For profiling counters in per-vGPU mode, this menu requests for vgpu_id and hw_id of each vGPU when vGPUs are created Signed-off-by: Pengyuan Jiao <[email protected]>
When a vGPU is selected on the menu, the related hw_id is passed to client-c through webui, then per-vgpu filtering by hw_id is achieved in gputop-client-c.c in userspace. Signed-off-by: Pengyuan Jiao <[email protected]>
add new option "-vgpu" for gputop-csv to print counter data in per-vGPU mode. Signed-off-by: Pengyuan Jiao <[email protected]>
I have modified my patch to pass the travis CI test, please check, thanks. |
@@ -486,6 +568,16 @@ parser.addArgument( | |||
); | |||
|
|||
parser.addArgument( | |||
[ '-vgpu', '--vgpu' ], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you can drop '-vgpu'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have dropped '-vgpu'.
@@ -241,6 +296,7 @@ GputopCSV.prototype.update_features = function(features) | |||
var col_width = 0; | |||
|
|||
if (this.pretty_print_csv_) { | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the newlines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my mistake, I have removed this line.
@@ -350,6 +406,7 @@ GputopCSV.prototype.update_features = function(features) | |||
this.column_titles_.map((line) => { | |||
this.stream.write(line + this.endl); | |||
}); | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please remove the newlines.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my mistake, I have removed this line.
@@ -456,19 +515,42 @@ function write_rows(metric, accumulator) | |||
} | |||
} | |||
|
|||
var flag = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe you find a better name for this.
Using a boolean type seems appropriate too.
Probably make it part of the GputopCSV instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have renamed this parameter with "warning_once" and put it into GputopCSV instance.
for (var i = 0; i < vgpu_id_.length; i++) { | ||
if (vgpu_id_[i] === parseInt(args.vgpu)) { | ||
vgpu_id = vgpu_id_[i]; | ||
break; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please align this break statement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my mistake, I have corrected it.
}); | ||
}).call(this, this.metrics_[i]); | ||
} | ||
this.select_metric_set(this.metrics_[0]); | ||
this.select_metric_set(this.metrics_[0], 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not this.current_hw_id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, I have replaced this with this.current_hw_id.
this.update_metric_set_stream(metric); | ||
var hw_id = this.current_hw_id; | ||
if (hw_id === undefined) | ||
return; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So with this change, if no hw id is selected, we can't pause the stream?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me briefly introduce my idea. If we create 2 vGPUs, there are 3 states. ctx_mode = ["Global", "vGPU1", "vGPU2"], and corresponding hw_id = ["0", hw_id1, hw_id2]. The "Global" state means the current state, without applying my patch, that is, it shows the whole counter values for any context id. All of the vgpu_id and hw_id starts from non-zero value. Hence I define the hw_id as 0 to map the "Global" state. So in the beginning without selecting any hw id, the current_hw_id is 0, it works in the "Global" state.
As a result, we can pause the stream whether we select one hw id or not.
{ | ||
repeated uint64 ctx_hw_id = 1; | ||
repeated uint64 vgpu_id = 2; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean you have the UI query every single vgpu, hw_id tuple one by one?
Sounds like you should return an array instead.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I have the UI query each vgpu to the vgpu_id and ctx_hw_id array one by one. Beside of the vgpu_id and ctx_id, we also get the current vgpu number, which is defined as "n_ctx_hw_id" and "n_vgpu_id".
In the function handle_get_features, the array notices[] = "RC6 power saving mode disabled" is sent through defining the "notices" and "n_notices". In the same way, we send the vgpu_id array, ctx_hw_id array, n_ctx_hw_id and n_ctx_hw_id every time after receiving the query in the gputop-server. Then the vgpu_id and ctx_hw_id array can be resolved in the client.
@@ -973,6 +972,71 @@ gputop_get_cmd_line_pid(uint32_t pid, char *buf, int len) | |||
return res; | |||
} | |||
|
|||
void handle_update_hw_id_map(uint64_t *vgpu_id, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function should be static.
You can remove the double space after void.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I have corrected it.
|
||
while (entry = readdir(vgpu_dir)) { | ||
if (entry->d_type == DT_DIR && (strlen(entry->d_name)==UUID_length)) { | ||
int ret_vgpu_id_path = asprintf(&vgpu_id_path, "%s/%s/intel_vgpu/vgpu_id", vgpu_path, entry->d_name); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indentation is wrong here.
You seem to mix 2 different spacing in your change (2 vs 4 spaces), please be consistent with the rest of the file. I believe it's 4 spaces.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for my mistake, I have corrected it.
Closing since you've opened #166. |
There are 4 commits in the pull-request.
The first one is, same as #164, to fix counter filtering and graph plotting issues when one metric set is selected again. I think you can ignore or close that #164 request.
The last 3 commits are about per virtual GPU counter profiling, which are usage for graphics workloads running inside virtual machines. User can select a specific vGPU on the webui or through gputop-csv to observe each vGPU performance.