Error when running #41

Open
leibo-hust opened this issue Mar 19, 2021 · 19 comments

Comments

@leibo-hust

I want to test the mlp_l4_mnist model. When I get to the last step I encounter the following error and I don't know how to fix it. Thanks!

python dpe.py -n mlp

error:
image

@negishubham
Contributor

Hi,

We have provided some test scripts in the simulator (https://github.com/Aayush-Ankit/puma-simulator/tree/training/test/utils). Please use the mlp_layer.sh script to run the MLP model; this script is well tested and runs the mlp_l4_mnist model. If you still get the above error, please let us know. There have been a few updates related to this error, so I hope you are using the most recent version of the simulator from GitHub.

Thanks,
Shubham

@leibo-hust
Author

@negishubham Thanks. I'm using the latest version, which I cloned a few hours ago. Should I use the training branch or not?

@leibo-hust
Author

@negishubham Hi, I just ran the run-mlp-layer.sh file and used the default fully-connected-layer model. There were some small errors, such as a missing populate.py file. I solved these, but the final run is still incorrect.
image
And I think I've seen a similar error.

Thanks,
Bo Lei

@leibo-hust
Author

I was using the default i_mvm before. When I changed it to the following definition,
image
I got the same error as the first one.
image

@negishubham
Contributor

Hi,

Yes, please use the training branch. Please follow all the instructions in this file (https://github.com/Aayush-Ankit/puma-simulator/blob/training/how_to_run.md) before running the test scripts; this will help with the populate.py errors. It includes instructions for copying a few files from the simulator to the compiler.

I think you don't need to change the i_mvm function in the default code for inference. But if you are working on training, please follow the comments in the code. I would suggest first checking the setup with inference, without changing anything in the src files, and running the test scripts for both the mlp and conv layers.

Thanks
Shubham

@leibo-hust
Author

I just followed the steps in the how_to_run.md file exactly, from the beginning. When running generate-py.sh, I get some similar errors that seem to be related to instrn_proto.py. I didn't modify it and used the original file. I don't know whether the AssertionError is related to instrn_proto.py, but I can still get the mlp folder.
image

Finally, when I execute python dpe.py -n mlp, I get the following error.
image
I checked the subdirectories under the mlp folder and found that only *.npy files exist in tile0 and tile1.
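
For anyone who wants to repeat this check, here is a minimal sketch that lists which file extensions are present in each tile folder. It assumes only that the generated model folder contains subfolders whose names start with "tile"; the function name `extensions_per_tile` and the demo file names are hypothetical and are not part of the simulator.

```python
import os
import tempfile


def extensions_per_tile(net_dir):
    """Map each tile folder name to the sorted list of file extensions inside it."""
    report = {}
    for name in sorted(os.listdir(net_dir)):
        path = os.path.join(net_dir, name)
        if name.startswith("tile") and os.path.isdir(path):
            exts = {os.path.splitext(f)[1] for f in os.listdir(path)}
            report[name] = sorted(exts)
    return report


# Demo on a throwaway folder with made-up file names (not real simulator output).
with tempfile.TemporaryDirectory() as net_dir:
    for tile, files in {"tile0": ["a.npy"], "tile1": ["a.npy", "b.txt"]}.items():
        os.mkdir(os.path.join(net_dir, tile))
        for fname in files:
            open(os.path.join(net_dir, tile, fname), "w").close()
    demo_report = extensions_per_tile(net_dir)

print(demo_report)
```

A tile folder that reports only `.npy` would match the situation described above.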

Thanks
Bo Lei.

@leibo-hust
Author

@negishubham Thanks to @FrankWu1998's help, I solved the problem (not sure how, though). I found that I can't use the -t parameter, and num_tile_compute in config.py doesn't seem to affect the results. Also, if you want to test the mlp model, you should use the default instrn_proto.py.

@negishubham
Contributor

Hi,

Thanks, @FrankWu1998, for helping. The assertion error might be due to not setting the number of matrices correctly in the compiler (as mentioned in how_to_run.md). But it's good that you solved it.

Regarding the num_tile_compute parameter: in the current version, a function inside dpe.py calculates the number of compute tiles itself, so you no longer need to set it manually.
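
As a rough illustration of how such a count could be derived, the sketch below counts the tileN subfolders of a model folder and subtracts the input and output tiles. This is not the actual function in dpe.py; the name `count_compute_tiles` and the "subtract two" rule are assumptions based on the description of num_tile_compute as the number of tiles mapped by the DNN, leaving out the input and output tiles.

```python
import os
import re
import tempfile


def count_compute_tiles(net_dir):
    """Count tileN subfolders, then subtract the input and output tiles."""
    tile_dirs = [
        d for d in os.listdir(net_dir)
        if re.fullmatch(r"tile\d+", d)
        and os.path.isdir(os.path.join(net_dir, d))
    ]
    # Assumption: exactly one input tile and one output tile per model.
    return max(len(tile_dirs) - 2, 0)


# Demo: a throwaway folder with tile0..tile4, as in a 5-tile model.
with tempfile.TemporaryDirectory() as net_dir:
    for i in range(5):
        os.mkdir(os.path.join(net_dir, "tile%d" % i))
    demo_count = count_compute_tiles(net_dir)

print(demo_count)  # 3
```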

@amankr1279

@leibo-hust @negishubham I am facing the same AssertionError that leibo-hust reported on March 22 (link). Please help. A picture is attached for reference.

Thanks
Aman Kumar
Problem_git

@deepika7497
Collaborator

Hi @amankr1279,

This error probably occurs because the number of constant MVMUs in the puma-compiler's common.h file is not equal to num_matrices in the puma-simulator's config file.
Please make sure that they are the same and try running again.
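
A quick way to compare the two values before re-running is sketched below. It is only an illustration: it works on the text of the two files, it assumes the settings appear as `N_CONSTANT_MVMUS_PER_CORE` and `num_matrix` followed by an integer (with or without `=`), and the helper names are hypothetical. Adjust the names and patterns to whatever your copies of common.h and config.py actually contain.

```python
import re


def find_int_setting(text, name):
    """Return the first integer that follows `name` in `text`, or None."""
    # Accepts both 'name = 6' and '#define name 6' style lines.
    pattern = r"\b" + re.escape(name) + r"\b\s*=?\s*(\d+)"
    match = re.search(pattern, text)
    return int(match.group(1)) if match else None


def settings_match(common_h_text, config_py_text):
    """True only if both values are found and are equal."""
    mvmus = find_int_setting(common_h_text, "N_CONSTANT_MVMUS_PER_CORE")
    matrices = find_int_setting(config_py_text, "num_matrix")
    return mvmus is not None and mvmus == matrices


# Demo with made-up file contents (the real files may be formatted differently).
common_h_demo = "#define N_CONSTANT_MVMUS_PER_CORE 6\n"
config_py_demo = "num_matrix = 2\n"
print(settings_match(common_h_demo, config_py_demo))  # False
```

To use it on real files, read each file with `open(path).read()` and pass the two strings in.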

Hope this helps.
Regards,
Deepika

@amankr1279

Hi @deepika7497 ,

Thanks for responding. I looked at common.h, where N_CONSTANT_MVMUS_PER_CORE = 6, while in config.py num_matrix = 2, so I changed the constant MVMU count to 2. With this change, ./generate-py.sh runs with no AssertionError.
However, when running python dpe.py -n lstm, the simulation should have stopped at 10,000 cycles, but it kept running until ~30k cycles (maybe because of the 68 tiles), though I did get the full simulation.
Thanks for helping.

Regards
Aman

@msabri1372

I have the same problem. In common.h, N_CONSTANT_MVMUS_PER_CORE = 6, and I have changed num_matrix to 6 as well. I am using mlp, and the number of tiles is 5 (tile0 to tile4). I have also changed num_tile_compute to 7, but I still have a problem. Please help me.
image

@deepika7497
Collaborator

Hi @msabri1372,

It looks like you probably missed a step: either you did not copy the correct folder from the compiler to the simulator, or you forgot to run generate-py.sh. Please follow the steps again and check. If this happens again, please let us know.

Regards,
Deepika

@U201814647

Hi @deepika7497

I believe I have followed the steps in the how_to_run.md file exactly, from the beginning. I used the default instrn_proto.py and changed num_matrix to 6. Because there are 5 tiles in the mlp model, and the comment tells us that num_tile_compute is the number of tiles mapped by the DNN (excluding the input and output tiles), I changed num_tile_compute to 3. Is there anything wrong with these steps? The problem appears when I run python dpe.py -n mlp. Please help me, thank you.

image

Regards
U201814647

@msabri1372

Hi @deepika7497

I have followed the instructions in how_to_run.md again, but my problem remains.

Best regards,

@amankr1279

Hello @negishubham @deepika7497 and others.

While running the command python dpe.py -n nmt, I am facing the following problem. Please help.

Traceback (most recent call last):
  File "dpe.py", line 231, in <module>
    DPE().run(net)
  File "dpe.py", line 160, in run
    node_dut.node_run(cycle)
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/node.py", line 82, in node_run
    self.tile_list[i].tile_run (cycle, self.tile_fid_list[i])
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/tile.py", line 275, in tile_run
    [tag_hit, data] = self.receive_buffer.read (vtile_id)
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/tile_modules.py", line 75, in read
    if (not self.isempty(vtile_id)):
  File "/mnt/hpe/shubhankar/PUMA/puma-simulator/src/tile_modules.py", line 58, in isempty
    if (self.buffer[vtile_id]['valid']):
IndexError: list index out of range
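
To make the failure mode concrete, the sketch below models a receive buffer as a plain Python list indexed by vtile_id. It is not the code in tile_modules.py: the class `ReceiveBufferSketch`, its constructor argument, and the explicit bounds check are assumptions made for illustration. The traceback only shows that `self.buffer[vtile_id]` is indexed with a vtile_id that is beyond the end of the list.

```python
class ReceiveBufferSketch:
    """Toy stand-in for a receive buffer with one entry per virtual tile."""

    def __init__(self, num_entries):
        self.buffer = [{"valid": False, "data": None} for _ in range(num_entries)]

    def isempty(self, vtile_id):
        # Explicit check so the error names the offending id and the buffer size.
        if not 0 <= vtile_id < len(self.buffer):
            raise IndexError(
                "vtile_id %d is outside a buffer of %d entries"
                % (vtile_id, len(self.buffer))
            )
        return not self.buffer[vtile_id]["valid"]


buf = ReceiveBufferSketch(4)
print(buf.isempty(3))  # True: entry 3 exists and holds no valid data

try:
    buf.isempty(4)     # one past the last entry, like the error above
except IndexError as err:
    print(err)
```

Under this model, the error appears whenever an instruction refers to a vtile_id greater than or equal to the number of buffer entries the simulator allocated.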

@negishubham
Contributor

Hi @amankr1279

How did you select the value for variable "nmt"?
You don't need to give the number of tiles manually, it is internally calculated in the simulator files.
Please follow this reply: #41 (comment)
There are some test scripts in the same folder for CNN as well.

Thanks,
Shubham

@amankr1279

Hi @negishubham , @deepika7497 and others.
We did a tile-wise analysis to identify the workload distribution across tiles and found that some were heavily loaded while others were relatively free. So we are devising an algorithm that identifies the instructions suitable for shifting. Currently, we ensure that the core_num of a shifted instruction is the same in both the original and the new tile.

However, after shifting the instructions, when I simulate the new ".puma" files in PUMASim, I face the following problem (picture attached).
This happens with any instruction that I shift. FYI, I have taken care of tile send/receive too, so that the data flow is not changed. Please help.
problem

@xlonghu

xlonghu commented Jul 20, 2022

Hi @amankr1279 and others,
While running "python dpe.py -n nmt", I get "IndexError: list index out of range".
How did you solve this problem?
Another problem: while running ./vgg16.test or other commands, a lot of memory is required, so the program gets killed. What should be done to fix this?

regards
Xlong Hu


7 participants