rocketchip study notes 6: TileLink-based DMA design #30

Open
meton-robean opened this issue Dec 27, 2019 · 3 comments

Comments

@meton-robean
Owner

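Background

  • DMA (Direct Memory Access) frees the CPU from busy data-transfer work, decoupling data access from the CPU.

  • Most RoCC accelerators today fetch their data through the cache. Some extend the cache path, for example by pulling data from the L2 cache, and some add ping-pong buffering, data prefetching, and so on, but few use DMA to fetch data.

  • The rocketchip generator does not include a DMA, so data can only be fetched through the cache.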

@meton-robean
Owner Author

My classmate @zwk recently learned Chisel and designed a DMA module. For details, see his blog post:
TileLink-based DMA design

Repository owner locked and limited conversation to collaborators Dec 27, 2019
@meton-robean
Owner Author

You can also look at the DMA modules in existing designs, for example icenet and the gemmini accelerator in chipyard.

@meton-robean
Owner Author

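How DMA data fetching roughly works on RocketChip:

  • A DMA read works like this: for a request to read, say, an array of 4096 64-bit elements, the DMA splits it into multiple cache-block requests to DRAM. Currently each cache block has at most 8 beats of 64 bits each.

  • Each cache-block request takes 10-15 cycles.

  • Once DRAM responds to a cache-block request, the block's 8 × 64 bits arrive in order over the following 8 clock cycles.

  • To read the next cache block, a new request has to be issued to DRAM, and the process above repeats.

  • If the response time of a cache-block request stays the same, widening the DRAM -> DMA path delivers more data in the same time. For example, going from 64 to 128 bits should give roughly a twofold improvement.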
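
The arithmetic behind the list above can be sketched as a small cycle model. This is only a back-of-the-envelope estimate, not code from the DMA itself: the function names `split_into_blocks` and `dma_read_cycles` are made up for illustration, and the model assumes requests are fully serialized (one outstanding cache-block request at a time, with no overlap between the request latency of one block and the data beats of the previous one).

```python
def split_into_blocks(base_addr, num_bytes, block_bytes=64):
    """Return the block-aligned addresses covering [base_addr, base_addr + num_bytes).

    Hypothetical helper: models how one large DMA read is broken into
    cache-block requests. 64 bytes = 8 beats x 64 bits.
    """
    if num_bytes <= 0:
        return []
    first = (base_addr // block_bytes) * block_bytes
    last = ((base_addr + num_bytes - 1) // block_bytes) * block_bytes
    return list(range(first, last + block_bytes, block_bytes))


def dma_read_cycles(num_words, word_bits=64, bus_bits=64,
                    beats_per_block=8, request_latency=10):
    """Estimate total cycles for a serialized DMA read.

    Assumption: each block costs `request_latency` cycles before the first
    beat, then one beat per cycle, and the next request starts only after
    the last beat of the previous block has arrived.
    """
    total_bits = num_words * word_bits
    block_bits = bus_bits * beats_per_block
    num_blocks = -(-total_bits // block_bits)  # ceiling division
    return num_blocks * (request_latency + beats_per_block)


if __name__ == "__main__":
    # 4096 x 64-bit words over a 64-bit path, 8 beats per block
    print(len(split_into_blocks(0x80000000, 4096 * 8)))   # number of block requests
    print(dma_read_cycles(4096, request_latency=10))      # best-case latency
    print(dma_read_cycles(4096, request_latency=15))      # worst-case latency
    # 128-bit path, still 8 beats per request (block grows to 128 bytes)
    print(dma_read_cycles(4096, bus_bits=128))
    # 128-bit path, block kept at 64 bytes (only 4 beats per request)
    print(dma_read_cycles(4096, bus_bits=128, beats_per_block=4))
```

Under these assumptions, the twofold gain from a 128-bit path holds when each request still returns 8 beats, so every request carries twice the data. If the block size stays at 64 bytes instead, each request only shrinks from 8 beats to 4 while the 10-15 cycle request latency is unchanged, so the gain in this model is noticeably smaller than 2×.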
