Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Adding DX12 support to OpenSubdiv #938

Open
wants to merge 4 commits into
base: dev
Choose a base branch
from

Conversation

wallisc
Copy link

@wallisc wallisc commented Jun 24, 2017

At a high-level, this change includes:

  1. New DX12 back-end support to OSD
  2. DXViewer modified to interop with DX12 using D3D11on12
  3. Fixed device leaks in the D3D11 implementation
  4. Modified D3D11HUD to use dynamic buffers instead of allocating a new buffer each frame

Please do ask away if you have any questions!

DX12 Back End:

The DX12 implementation is ported from the existing DX11 code, and is overall similar in design. Regarding DX12 specific details:

  • Because the actual CPU-side work for OSD is small, the implementation is single-threaded and centralized around D3D12CommandQueueContext
  • Unlike DX11, DX12 requires the application to track resource usage and defer deletion until after the GPU is finished. Resource usages are tracked with fences, and inserted into a deferred deletion queue when deleted.
  • Resource barriers are not needed since OSD works entirely with buffers and relies on common state promotion: https://msdn.microsoft.com/en-us/library/windows/desktop/dn899226(v=vs.85).aspx#Common_state_promotion
  • SRV/UAV descriptors are allocated out of a descriptor heap in ring-allocator fashion.
  • The implementation uses the same shaders as DX11 with the exception of class interfaces since these are not supported in DX12.

DXViewer:

  • Unfortunately there's no ability to share buffers between DX11 and DX12. Currently the only strategy for semi-efficient interop between DX11 and DX12 code is to use D3D11on12 (emulates DX11 by using DX12 underneath the covers, more details here: https://msdn.microsoft.com/en-us/library/windows/desktop/dn913195(v=vs.85).aspx). I've made the minimal set of changes to DXViewer to allow interop with the DX12 implementation, but the changes in DXViewer are fairly intrusive. I'm not completely against just making DX12 port of DXViewer, but I figured this would be a good first step. I'll leave it to the reviewers to see if they'd prefer a full DX12 port instead. (Would be much cleaner at the cost of duplicating a lot of the DXViewer code).
  • DX12 requires flip-model swapchains, I've changed some of the swapchain code to allow for using the more efficient flip-model swapchain
  • When running DXViewer and going from DX11/CPU/etc. -> DX12, it requires the DX11 devices and all resources to be recreated using a DX11 device created with 11on12. This required a bit of refactoring to allow for DXViewer to handle re-construction all DX resources.

Performance:

Note: Performance is not a completely fair test since DXViewer with DX12 is using DX11on12 which is an emulation layer and slightly slower on both the CPU & GPU. A full DX12 port of DXViewer would be required for complete data. Nonetheless, I've put some GPU frames times of workloads (all at Lv 10 tessellation) to give a gist of the perf. Data was taken on a NVidia GTX 760:
------------------------------------ DX11 -------------- DX12
catmark_car ---------------------- 3.9ms ------------- 3.7ms
catmark_bishop ------------------ 2.2ms ------------- 2.0ms
catmark_nonman_qualpole360 - 70.0ms ------------72.0ms

Future Optimizations

  • Async compute: This is the big one. Certain GPUs have the ability to run compute work on a separate engine from where graphics work is done. This can allow for huge wins, and allows apps like DXViewer to run rendering work in parallel with the compute work handling subdivision.
  • Buffer Allocation: Currently buffers are allocated using CreateCommittedResource as needed. A smarter optimization is to have a pool that allocates these rather than making expensive calls to creating/destroying resources. This can be take one step further by allocating one large heap and suballocating buffers out of the heap.

Adds basic d3d12 support for OpenSubdiv. This includes:
* D3D12 port of the D3D11 backend osd code
* Pooled allocation of command lists and command allocators
* Deferred deletion of resources based on fence tracking
* Shaders modified to use structured buffers if USE_STRUCTURED_BUFFERS
macro is true. This allows the d3d12 implementation to use root views
* CMake files modified to grab the Win10 SDK

Adding missing FindDX12SDK.cmake

Using descriptor tables instead of root views, fixing synchronization bug

Adding D3D12VertexBuffer files that were dropped in last commit

Removing defaulting of the d3d12/dxgi debug layer and CMake fixes

Adding 11on12 to interop with 12 OSD implementation

Enabling DXViewer to switch between using DX11 and DX11on12

Additionally fixing DX11 implementation device leaks and removing
unnecessary dx12 files

Fixing several refcount leaks that were causing issues when transitioning in and out of d3d12

Fixing a missed rebase conflict

Cleaning up debug code and other dxviewer cleanup

Reverting garbage that came with my last commit

First Iteration of adding D3D12 backend to OpenSubdiv

Cleaning up hlsl

First Iteration of adding D3D12 backend to OpenSubdiv

Fixing DXViewer to compile with NODX12 flag

First Iteration of adding D3D12 backend to OpenSubdiv
@wallisc
Copy link
Author

wallisc commented Jun 24, 2017

One other note, I'd highly recommend moving the D3D11 code to using CComPtr or some type of RAII structure. I went the route of manually releasing since CComPtr's don't seem to be the coding style, but it's incredibly difficult to avoid regressions if you rely on manual release's. If this sounds interesting, I'd be happy to take a look at this in a separate PR.

@davidgyu
Copy link
Member

Thanks! Taking a look through the PR this evening.

@davidgyu
Copy link
Member

The AppVeyor build failure can be fixed by changing:
#include <dxx12.h>
to
#include "d3dx12.h"
in opensubdiv/osd/d3d12DescriptorHeapManager.cpp

@wallisc
Copy link
Author

wallisc commented Jun 29, 2017

Will play around with this more tomorrow, it's correctly finding the directory I'd expect d3d12.h to live in (C:/Program Files (x86)/Windows Kits/10/Include/10.0.14393.0/um), but somehow still not finding d3d12.h itself. Any idea on how I can find out more about the build environment AppVeyor uses to try and repro this break myself?

@davidgyu
Copy link
Member

CMake is finding the Windows 10 SDK here:
"Found DX12SDK: C:/Program Files (x86)/Windows Kits/10/Include/10.0.14393.0/um;C:/Program Files (x86)/Windows Kits/10/Include/10.0.14393.0/shared"

With the updates to your branch, I'm able to build locally without errors...

@wallisc
Copy link
Author

wallisc commented Jun 30, 2017

Hah, silly cmake issue. I suspect it was building locally because Visual Studio sometimes automagically adds the Windows 10 SDK includes, AppVeyor was catching a legitimate issue.

@wallisc
Copy link
Author

wallisc commented Aug 4, 2017

Talked briefly with David at SIGGRAPH, I'm going move the 11on12 implementation to instead just use pure D3D12 which will be cleaner and more performant, at the cost of partially duplicating a lot of the existing DX11 logic (GUI/DXViewer code).

@jtran56
Copy link

jtran56 commented Oct 6, 2017

Filed as internal issue #151992.

@MSoegtropIMC
Copy link

I wanted to ask how the progress with this is. Since DirectX is now out since 6 years I guess one can even discuss to replace the DirectX11 backend with a DirectX12 backend. As far as I understand what is blocking this was the code duplication caused by DirectX 11/12 coexistence. IMHO those who want to use DirectX11 can use the current release. I don't see a good reason to support DirectX 11 forever and not DirectX 12. At least for new software DirectX 11 is deprecated.

@davidgyu
Copy link
Member

Hi, we've been working on this along with looking into improving support for the other current APIs like Vulkan and Metal, though unfortunately we don't have a timeline to share at the moment for this work. One concern here isn't so much about DX11 vs DX12 and instead about taking into consideration some of the other issues called out in the notes above, e.g. async compute, and staged buffer allocation and data transfers, etc. which are important for efficiency and good performance. The outcome will probably be quite different that what is represented in this pull request.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants