Hi,
I have implemented Rocket64b1gem16 on my FPGA with default configs and 8GiB DDR3.
The ONNX ResNet18 model sometimes runs with '-O 99' and gives the right result, but sometimes it gets stuck.
With the '-O 1' option, the model runs every time, but it takes longer.
Besides, the Chipyard Spike simulator always runs this model correctly with both '-O 1' and '-O 99'.
Here are the results for comparison.
Below is the rocket64b1gem16 result with '-O 99'. The model only occasionally runs correctly with '-O 99'.
debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
2023-02-28 11:29:18.129004800 [W:onnxruntime:, graph.cc:1074 Graph] Initializer 301 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2023-02-28 11:29:18.188448000 [W:onnxruntime:, graph.cc:1074 Graph] Initializer 302 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
2023-02-28 11:29:18.217388800 [W:onnxruntime:, graph.cc:1074 Graph] Initializer 303 appears in graph inputs and will not be treated as constant value/weight. This may prevent some of the graph optimizations, like const folding. Move it out of graph inputs if there is no need to override it, by either re-generating the model with latest exporter/converter or with the tool onnxruntime/tools/python/remove_initializer_from_input.py.
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
Number of inputs = 1
Input 0 : name=input.1, type=1, num_dims=4: [1, 3, 256, 256, ]
Number of outputs = 1
Output 0 : name=231, type=1, num_dims=4: [1, 21, 64, 64, ]
yolox init
pose init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 5.487418 s
normalize_transpose took 0 cycles 3.011997 s
Done! Pre Process 1 took 0 cycles 8.499517 s
Done! Inference 1 took 0 cycles 5.220877 s
Done! Pre Process 1 took 0 cycles 1.010774 s
Below is the rocket64b1gem16 output with '-O 99' when it gets stuck. When the model hangs, it always hangs at the same place.
debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -2 pose_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic
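One way to pin down where the run stops is to compare the accelerator-related lines of a good run against the stuck run. The sketch below is only an illustration: `op_trace`, `first_divergence` and `to_log` are helper names made up here, and the op lists are a compact copy of the first 18 accelerator lines from the two gem16 '-O 99' logs above. Note that the last line of the stuck log is cut off mid-line, which may simply mean stdout was not flushed before the hang, so the real stall could be somewhat later than the last printed line.

```python
# Sketch only: helper names are made up for this illustration.
PREFIXES = ("Called into systolic", "Using systolic pooling")


def op_trace(log_text):
    """Return the accelerator-related lines of a runner log, in order."""
    lines = (line.strip() for line in log_text.splitlines())
    return [line for line in lines if line.startswith(PREFIXES)]


def first_divergence(good, stuck):
    """Index of the first entry where `stuck` differs from `good`.

    Returns None when the two traces are identical.
    """
    for i, (g, s) in enumerate(zip(good, stuck)):
        if g != s:
            return i
    if len(good) != len(stuck):
        return min(len(good), len(stuck))
    return None


def to_log(ops):
    """Rebuild log lines from a compact op list (an empty op is a cut-off line)."""
    return "\n".join(
        "Using systolic pooling" if op == "pooling"
        else f"Called into systolic {op}".rstrip()
        for op in ops
    )


# First 18 accelerator lines of the successful gem16 '-O 99' run above.
GOOD_OPS = ("conv pooling conv conv add conv conv add "
            "conv conv conv add conv conv add conv conv conv").split()
# The stuck run prints the same 17 lines, then a line cut off after "systolic".
STUCK_OPS = GOOD_OPS[:17] + [""]

good = op_trace(to_log(GOOD_OPS))
stuck = op_trace(to_log(STUCK_OPS))
idx = first_divergence(good, stuck)
print(idx, repr(good[idx]), repr(stuck[idx]))
```

With real log files, `op_trace(open(path).read())` could be used in place of the rebuilt excerpts.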
Below is the rocket64b1gem16 result with '-O 1'. The model runs correctly with '-O 1'.
debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem16 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 1
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 25600, 147)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (16, 6400, 144)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 16)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 144)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic matmul!
Using accelerated matmul with dimensions (32, 1600, 288)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 32)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 288)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 400, 576)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 64)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic add
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 100, 1152)
Called into systolic add
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 100, 128)
Called into systolic matmul!
Using accelerated matmul with dimensions (512, 25, 2304)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 25, 512)
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 9, 1152)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 9, 256)
Called into systolic matmul!
Using accelerated matmul with dimensions (256, 4, 1152)
1x1 case!
Called into systolic matmul!
Using accelerated matmul with dimensions (64, 4, 256)
Called into systolic matmul!
Using accelerated matmul with dimensions (128, 1, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 1, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 4, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 9, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 25, 4608)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (24, 400, 576)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 1, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 4, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 9, 2304)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 25, 4608)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 100, 1152)
Called into systolic matmul!
Using accelerated matmul with dimensions (12, 400, 576)
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 5.440022 s
normalize_transpose took 0 cycles 2.139706 s
Done! Pre Process 1 took 0 cycles 7.579837 s
Done! Inference 1 took 0 cycles 17.962803 s
Done! Pre Process 1 took 0 cycles 1.224211 s
I also tried to run this model on Rocket64b1gem8. There it always runs correctly with '-O 99', and its inference time is much shorter than on gem16, which is weird.
Below is the rocket64b1gem8 result with '-O 99'.
debian@debian:~/imagenet_runner_0.7.1$ ./ort_test_gem8 -1 detection_quanV2.onnx -i images/2.jpg -x 2 -O 99
Loaded runner program
Using systolic in mode 2
Using Onnxruntime C++ API
Number of inputs = 1
Input 0 : name=input, type=1, num_dims=4: [1, 3, 320, 320, ]
Number of outputs = 12
Output 0 : name=299, type=1, num_dims=4: [1, 12, 20, 20, ]
Output 1 : name=301, type=1, num_dims=4: [1, 12, 10, 10, ]
Output 2 : name=303, type=1, num_dims=4: [1, 12, 5, 5, ]
Output 3 : name=305, type=1, num_dims=4: [1, 12, 3, 3, ]
Output 4 : name=307, type=1, num_dims=4: [1, 12, 2, 2, ]
Output 5 : name=309, type=1, num_dims=4: [1, 12, 1, 1, ]
Output 6 : name=300, type=1, num_dims=4: [1, 24, 20, 20, ]
Output 7 : name=302, type=1, num_dims=4: [1, 24, 10, 10, ]
Output 8 : name=304, type=1, num_dims=4: [1, 24, 5, 5, ]
Output 9 : name=306, type=1, num_dims=4: [1, 24, 3, 3, ]
Output 10 : name=308, type=1, num_dims=4: [1, 24, 2, 2, ]
Output 11 : name=310, type=1, num_dims=4: [1, 24, 1, 1, ]
yolox init
Loading image
Image dimensions: 256 256 3
Called into systolic conv
Using systolic pooling
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic add
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
Called into systolic conv
0
11
2
0
0
0
84 140 207 264 0
resize took 0 cycles 1.830045 s
normalize_transpose took 0 cycles 1.073210 s
Done! Pre Process 1 took 0 cycles 2.903357 s
Done! Inference 1 took 0 cycles 1.933709 s
Done! Pre Process 1 took 0 cycles 0.445910 s
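To put the timing gap in numbers, the "took ... s" lines from the gem16 and gem8 '-O 99' runs can be compared directly. The snippet below is a rough sketch; `parse_timings` is a helper name invented here, and the values are copied from the logs above. Since "Pre Process 1" is printed twice per log, only the first occurrence of each label is kept.

```python
import re

# Matches lines such as "resize took 0 cycles 5.487418 s"
# and "Done! Inference 1 took 0 cycles 5.220877 s".
TIMING_RE = re.compile(
    r"^(?:Done! )?(.+?) took \d+ cycles ([0-9.]+) s$", re.MULTILINE
)


def parse_timings(log_text):
    """Map each label to its time in seconds, keeping the first occurrence."""
    timings = {}
    for label, seconds in TIMING_RE.findall(log_text):
        timings.setdefault(label, float(seconds))
    return timings


GEM16_O99 = """\
resize took 0 cycles 5.487418 s
normalize_transpose took 0 cycles 3.011997 s
Done! Pre Process 1 took 0 cycles 8.499517 s
Done! Inference 1 took 0 cycles 5.220877 s
Done! Pre Process 1 took 0 cycles 1.010774 s
"""

GEM8_O99 = """\
resize took 0 cycles 1.830045 s
normalize_transpose took 0 cycles 1.073210 s
Done! Pre Process 1 took 0 cycles 2.903357 s
Done! Inference 1 took 0 cycles 1.933709 s
Done! Pre Process 1 took 0 cycles 0.445910 s
"""

gem16 = parse_timings(GEM16_O99)
gem8 = parse_timings(GEM8_O99)
ratios = {label: gem16[label] / gem8[label] for label in gem16}
for label, ratio in ratios.items():
    print(f"{label}: gem16 takes {ratio:.2f}x the gem8 time")
```

All four ratios come out at roughly 2.7x to 3x. If the resize and normalize steps run only on the CPU (an assumption, not something the logs confirm), a similar slowdown there would suggest the gap is not specific to the systolic array, and things like the clock frequency or memory setup of the two builds may be worth comparing.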
I also changed the DDR to 2Gib DDR3. I get the same result, and the model gets stuck at the same place.
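Because the hang is intermittent, it may help to run the binary many times with a timeout and record how often it hangs and what the last output line was. The following is a minimal sketch using Python's `subprocess.run`; `run_once` and `hang_stats` are names made up for this example, the timeout is arbitrary, and the demo uses stand-in commands so it runs anywhere. When stdout is a pipe, the runner's output may be buffered, so the last captured line can lag behind the actual stall point.

```python
import subprocess
import sys


def run_once(cmd, timeout_s):
    """Run cmd once; return (hung, last_line_of_output)."""
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,
            timeout=timeout_s,
        )
        out, hung = proc.stdout, False
    except subprocess.TimeoutExpired as exc:
        # The child is killed by subprocess.run; keep whatever was captured.
        out, hung = exc.stdout or b"", True
    lines = out.decode(errors="replace").strip().splitlines()
    return hung, (lines[-1] if lines else "")


def hang_stats(cmd, runs, timeout_s):
    """Return (number_of_hangs, last lines of the hung runs)."""
    last_lines = []
    for _ in range(runs):
        hung, last = run_once(cmd, timeout_s)
        if hung:
            last_lines.append(last)
    return len(last_lines), last_lines


# Stand-in commands for the demo (hypothetical, not the real runner).
demo_ok = [sys.executable, "-c", "print('Done!')"]
demo_hang = [
    sys.executable,
    "-c",
    "import time; print('Called into systolic conv', flush=True); time.sleep(30)",
]

ok_result = run_once(demo_ok, timeout_s=10)
hang_result = run_once(demo_hang, timeout_s=2)
print(ok_result, hang_result)
```

On the board, the stand-in could be replaced with the command from this post, for example `hang_stats(["./ort_test_gem16", "-1", "detection_quanV2.onnx", "-i", "images/2.jpg", "-x", "2", "-O", "99"], runs=20, timeout_s=120)`, where the run count and timeout are guesses to adjust.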
What might be the problem?
Thanks!
@Alwinnnn Hi, were you able to solve this issue? I'm facing the same thing.
@Leo-Z-Li
Sorry, I haven't fixed this issue yet. However, when I replaced rocket64 with the BOOM medium core, things got worse: the program gets stuck earlier and cannot even execute ROCCTEST_RESTNET50-linux.
@Alwinnnn Which FPGA are you using? Is it the nexys-video?