Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

support for per-vGPU mode profiling #165

Closed
wants to merge 3 commits into from

Conversation

jiaopengyuan
Copy link
Contributor

There are 4 commits in the pull-request.
The first one is, same as #164, to fix counter filtering and graph plotting issues when one metric set is selected again. I think you can ignore or close that #164 request.
The last 3 commits are about per virtual GPU counter profiling, which are usage for graphics workloads running inside virtual machines. User can select a specific vGPU on the webui or through gputop-csv to observe each vGPU performance.

@djdeath
Copy link
Collaborator

djdeath commented Oct 10, 2017

Looks like this PR breaks some of the travis tests.

For profiling counters in per-vGPU mode, this menu requests
for vgpu_id and hw_id of each vGPU when vGPUs are created

Signed-off-by: Pengyuan Jiao <[email protected]>
When a vGPU is selected on the menu, the related hw_id is
passed to client-c through webui, then per-vgpu filtering
by hw_id is achieved in gputop-client-c.c in userspace.

Signed-off-by: Pengyuan Jiao <[email protected]>
add new option "-vgpu" for gputop-csv to print counter
data in per-vGPU mode.

Signed-off-by: Pengyuan Jiao <[email protected]>
@jiaopengyuan
Copy link
Contributor Author

I have modified my patch to pass the travis CI test, please check, thanks.

@@ -486,6 +568,16 @@ parser.addArgument(
);

parser.addArgument(
[ '-vgpu', '--vgpu' ],
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can drop '-vgpu'.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have dropped '-vgpu'.

@@ -241,6 +296,7 @@ GputopCSV.prototype.update_features = function(features)
var col_width = 0;

if (this.pretty_print_csv_) {

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the newlines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my mistake, I have removed this line.

@@ -350,6 +406,7 @@ GputopCSV.prototype.update_features = function(features)
this.column_titles_.map((line) => {
this.stream.write(line + this.endl);
});

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please remove the newlines.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my mistake, I have removed this line.

@@ -456,19 +515,42 @@ function write_rows(metric, accumulator)
}
}

var flag = 0;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe you find a better name for this.
Using a boolean type seems appropriate too.
Probably make it part of the GputopCSV instance.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have renamed this parameter with "warning_once" and put it into GputopCSV instance.

for (var i = 0; i < vgpu_id_.length; i++) {
if (vgpu_id_[i] === parseInt(args.vgpu)) {
vgpu_id = vgpu_id_[i];
break;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please align this break statement.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my mistake, I have corrected it.

});
}).call(this, this.metrics_[i]);
}
this.select_metric_set(this.metrics_[0]);
this.select_metric_set(this.metrics_[0], 0);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not this.current_hw_id?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yes, I have replaced this with this.current_hw_id.

this.update_metric_set_stream(metric);
var hw_id = this.current_hw_id;
if (hw_id === undefined)
return;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So with this change, if no hw id is selected, we can't pause the stream?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let me briefly introduce my idea. If we create 2 vGPUs, there are 3 states. ctx_mode = ["Global", "vGPU1", "vGPU2"], and corresponding hw_id = ["0", hw_id1, hw_id2]. The "Global" state means the current state, without applying my patch, that is, it shows the whole counter values for any context id. All of the vgpu_id and hw_id starts from non-zero value. Hence I define the hw_id as 0 to map the "Global" state. So in the beginning without selecting any hw id, the current_hw_id is 0, it works in the "Global" state.
As a result, we can pause the stream whether we select one hw id or not.

{
repeated uint64 ctx_hw_id = 1;
repeated uint64 vgpu_id = 2;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this mean you have the UI query every single vgpu, hw_id tuple one by one?
Sounds like you should return an array instead.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I have the UI query each vgpu to the vgpu_id and ctx_hw_id array one by one. Beside of the vgpu_id and ctx_id, we also get the current vgpu number, which is defined as "n_ctx_hw_id" and "n_vgpu_id".
In the function handle_get_features, the array notices[] = "RC6 power saving mode disabled" is sent through defining the "notices" and "n_notices". In the same way, we send the vgpu_id array, ctx_hw_id array, n_ctx_hw_id and n_ctx_hw_id every time after receiving the query in the gputop-server. Then the vgpu_id and ctx_hw_id array can be resolved in the client.

@@ -973,6 +972,71 @@ gputop_get_cmd_line_pid(uint32_t pid, char *buf, int len)
return res;
}

void handle_update_hw_id_map(uint64_t *vgpu_id,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function should be static.
You can remove the double space after void.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, I have corrected it.


while (entry = readdir(vgpu_dir)) {
if (entry->d_type == DT_DIR && (strlen(entry->d_name)==UUID_length)) {
int ret_vgpu_id_path = asprintf(&vgpu_id_path, "%s/%s/intel_vgpu/vgpu_id", vgpu_path, entry->d_name);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation is wrong here.
You seem to mix 2 different spacing in your change (2 vs 4 spaces), please be consistent with the rest of the file. I believe it's 4 spaces.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for my mistake, I have corrected it.

@djdeath
Copy link
Collaborator

djdeath commented Jan 30, 2018

Closing since you've opened #166.

@djdeath djdeath closed this Jan 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants