CPP emitted causes Vitis to segfault #47

Open
makslevental opened this issue Apr 22, 2022 · 8 comments

@makslevental

makslevental commented Apr 22, 2022

Description

I tried to run a simple model through ScaleHLS and Vitis HLS using the instructions in the README, namely:

python export_braggnn_mlir.py > braggnn.mlir

torch-mlir-opt braggnn.mlir \
    -torchscript-module-to-torch-backend-pipeline="optimize=true" \
    -torch-backend-to-tosa-backend-pipeline="optimize=true" > braggnn.tosa.mlir

scalehls-opt braggnn.tosa.mlir \
    -scalehls-pytorch-pipeline-v1="top-func=forward loop-tile-size=4 loop-unroll-factor=2" \
    | scalehls-translate -emit-hlscpp > braggnn.cpp

and got this on Vitis v2021.2:

INFO: [XFORM 203-712] Applying dataflow to function 'forward' (braggnn.cpp:2233), detected/extracted 21 process function(s): 
	 'entry_proc484'
	 'forward_node0'
	 'forward_node1'
	 'forward_node2'
	 'forward_node3'
	 'forward_node4'
	 'forward_node5'
	 'forward_node6'
	 'forward_node7'
	 'forward_node8'
	 'forward_node9'
	 'forward_node10'
	 'forward_node11'
	 'forward_node12'
	 'forward_node13'
	 'forward_node14'
	 'forward_node15'
	 'forward_node16'
	 'forward_node17'
	 'forward_node18'
	 'forward_node19'.
Stack dump:
0.	Running pass 'Dead Global Elimination' on module '/home/mlevental/dev_projects/scalehls/samples/pytorch/braggnn/proj/solution1/.autopilot/db/a.o.1.bc'.
Abnormal program termination (11)

with stack dump:

Stack:
/lib/x86_64-linux-gnu/libc.so.6(+0x430c0) [0x7f8f8187e0c0]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libLLVM-3.1.so(llvm::TargetData::getTypeSizeInBits(llvm::Type*) const+0) [0x7f8f6b394210]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libLLVM-3.1.so(llvm::AliasAnalysis::getTypeStoreSize(llvm::Type*)+0x12) [0x7f8f6b708312]
...

Dropping #pragma HLS dataflow in both forward and forward_node19 doesn't fix the problem; it produces a different segfault instead:

WARNING: [HLS 200-1888] The stable scalar argument 'v1012' is written in a dataflow region ((braggnn.cpp:144:1)). This is not supported and may lead to incorrect RTL code.
Stack dump:
0.      Running pass 'Check subprocesses communication behavior in dataflow region' on module '/home/mlevental/dev_projects/scalehls/samples/pytorch/braggnn/proj/solution1/.autopilot/db/a.o.1.bc'.
Abnormal program termination (11)

with a different stack dump:

Stack:
/lib/x86_64-linux-gnu/libc.so.6(+0x430c0) [0x7f31dba080c0]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(llvm::PtrUserTree::visitTree(std::set<llvm::Argument*, std::less<llvm::Argument*>, std::allocator<llvm::Argument*> >&, llvm::SetVector<llvm::Instruction*, std::vector<llvm::Instruction*, std::allocator<llvm::Instruction*> >, llvm::SmallSet<llvm::Instruction*, 16u, std::less<llvm::Instruction*> > >&, bool) const+0xfb) [0x7f31c356f4ab]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(pass::DataflowDepGraph::analyzeAccessBehavior(llvm::CallSite, llvm::Value*)+0x12b) [0x7f31c358876b]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(pass::DataflowDepGraph::initializeSubprocesses(std::map<llvm::Function*, std::vector<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> >, std::allocator<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> > > >, std::less<llvm::Function*>, std::allocator<std::pair<llvm::Function* const, std::vector<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> >, std::allocator<llvm::PointerIntPair<llvm::GlobalVariable*, 2u, pass::DataflowDepGraph::AccessType, llvm::PointerLikeTypeTraits<llvm::GlobalVariable*> > > > > > > const&)+0x108) [0x7f31c3588948]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libhls_hwsyn.so(pass::CheckDFChannels::runOnModule(llvm::Module&)+0xa16) [0x7f31c3563eb6]
/home/mlevental/dev_projects/Xilinx/Vitis_HLS/2021.2/lib/lnx64.o/libLLVM-3.1.so(llvm::MPPassManager::runOnModule(llvm::Module&)+0x182) [0x7f31c9393aa2]

Let me know if there's anything I can do to help debug.

Artifacts

  1. PyTorch model
  2. export script
  3. torch-mlir IR
  4. tosa IR
  5. ScaleHLS emitted CPP
  6. tcl script
  7. autopilot log
  8. first stack dump
  9. second stack dump
@makslevental
Author

FWIW rewinding to d6ffcd0 and running with -scalehls-pytorch-pipeline="top-func=forward dataflow-gran=0 opt-level=2" (i.e., completely disabling dataflow) does fix the issue and the synthesis goes all the way through.

@hanchenye
Collaborator

This is an error that I've never seen. But it seems Vivado has recognized all the dataflow stages, which is a good sign :) Thanks for providing the artifacts, will have a try on them.

@stephenneuendorffer
Collaborator

stephenneuendorffer commented Apr 22, 2022

It would be good to capture this for the Vitis HLS folks. Do you have the source code and TCL that fail?

@makslevental
Author

> It would be good to capture this for the Vitis HLS folks. Do you have the source code and TCL that fail?

@stephenneuendorffer I think the artifacts above should be enough for a repro, but I can provide whatever else is needed.

@SerenaC94

Does this mean ScaleHLS is guaranteed to work with Vivado, but not with Vitis?

@chhzh123

chhzh123 commented May 10, 2022

> Does this mean ScaleHLS is guaranteed to work with Vivado, but not with Vitis?

Same question. I found the test_gemm_dse.cpp in the README also could not pass the vitis_hls compilation, since it tried to partition the input array with interface specification. My vitis_hls version is v2019.2.1.

@hanchenye
Collaborator

> I found the test_gemm_dse.cpp in the README also could not pass the vitis_hls compilation, since it tried to partition the input array with interface specification. My vitis_hls version is v2019.2.1.

I have also observed this issue. Starting from a certain version, Vitis HLS replaced the "resource" directive with the "bind_op" and "bind_storage" directives. In addition, "bind_storage" can no longer be applied to interface arrays. Instead, the "storage_impl" and "storage_type" options have been merged into the "interface" directive.

A temporary workaround for Vitis HLS is to update the emission logic here: https://github.com/hanchenye/scalehls/blob/4acb8795839dd2ba291733a521f2646db756edd2/lib/Translation/EmitHLSCpp.cpp#L1751-L1754

Ultimately, I think the C++ emitter should take a target triple that specifies the vendor tool and version to emit for.
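
For illustration only, here is a minimal sketch of what target-aware pragma emission could look like. It is not the ScaleHLS emitter: `HlsTool`, `ArrayInfo`, and `emitStoragePragma` are hypothetical names, the choice of a dual-port BRAM is arbitrary, and the exact pragma options should be checked against the documentation of the tool version being targeted.

```cpp
#include <cassert>
#include <string>

// Hypothetical target selector; ScaleHLS does not define this type.
enum class HlsTool { VivadoHLS, VitisHLS };

// Hypothetical description of an array in the emitted C++.
struct ArrayInfo {
  std::string name;  // variable name in the emitted code
  bool isInterface;  // true if the array is an argument of the top function
};

// Returns a storage pragma requesting a dual-port RAM for `array`.
// Assumption: Vivado HLS uses the legacy "resource" directive, while
// Vitis HLS uses "bind_storage" for local arrays and an "interface"
// option for top-level arguments.
std::string emitStoragePragma(HlsTool tool, const ArrayInfo &array) {
  assert(!array.name.empty() && "array must have a name");

  if (tool == HlsTool::VivadoHLS)
    return "#pragma HLS resource variable=" + array.name +
           " core=RAM_2P_BRAM";

  // Vitis HLS rejects bind_storage on interface arrays, so the storage
  // type is attached to the interface directive instead.
  if (array.isInterface)
    return "#pragma HLS interface mode=ap_memory port=" + array.name +
           " storage_type=ram_2p";

  return "#pragma HLS bind_storage variable=" + array.name +
         " type=ram_2p impl=bram";
}
```

A real target triple would presumably also carry the tool version, since the directive set changed between releases; the two-value enum above only separates the legacy and current spellings.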

@makslevental
Author

> Ultimately, I think the C++ emitter should take a target triple that specifies the vendor tool and version to emit for.

probably ultimately @stephenneuendorffer and Xilinx should just buy ScaleHLS 😉
