Thursday, July 6, 2017

Automated marshalling of managed to unmanaged structures

An art that I wouldn't even wish upon my greatest enemies to figure out.

I recently started a new project in a language very familiar to me: C#, a managed language. This means that all the memory management is done for you, which is both a blessing and a curse. In this project, I happened to struggle with the "curse part" of automatic memory management.

The problem I encountered is as follows: when you have two structures that are identical in terms of their members, their respective sizes can (and often will) differ in managed languages compared to unmanaged languages. For example, I have the following structure in C# and C++ respectively:

// C#
[StructLayout(LayoutKind.Sequential)]
struct DebugComponent
{
    public float4 Float4;
    public float Float;
}

// C++
struct CPP_DebugComponent
{
    float4 Float4;
    float Float;
};

In C#, Marshal.SizeOf() (or sizeof() in unsafe code) reports that the structure is 20 bytes in size, which is correct: five 4-byte floats. Note that I already set the StructLayout attribute to LayoutKind.Sequential, which lays the members out in declaration order, similar to unmanaged code.

In C++, sizeof() reports that the same structure is 32 bytes. This is also correct, because the float4 type here is aligned to 16 bytes, so the structure receives another 12 bytes of padding at the end to make its size a multiple of 16 bytes.
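
The padding described above can be reproduced in plain C++. This is only a sketch: Float4 is a hypothetical stand-in for CUDA's float4 (assumed here to be four floats with 16-byte alignment), and SequentialDebugComponent is a hypothetical stand-in for the layout the C# side reports, not the actual C# type.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's float4: four floats, aligned to 16 bytes.
struct alignas(16) Float4 { float x, y, z, w; };

// Mirrors CPP_DebugComponent from the text.
struct NativeDebugComponent {
    Float4 Float4Value;
    float FloatValue;
};

// Hypothetical stand-in for the C# layout: the same five floats,
// but with no alignment requirement stricter than that of a float.
struct SequentialDebugComponent {
    float Float4Value[4];
    float FloatValue;
};

// Bytes of padding the compiler appends after the last member.
constexpr std::size_t TailPadding() {
    return sizeof(NativeDebugComponent)
         - (offsetof(NativeDebugComponent, FloatValue) + sizeof(float));
}

static_assert(sizeof(float) == 4, "this sketch assumes 4-byte floats");
static_assert(sizeof(SequentialDebugComponent) == 20, "five floats, no padding");
static_assert(sizeof(NativeDebugComponent) == 32, "size rounded up to the 16-byte alignment");
static_assert(TailPadding() == 12, "12 bytes of tail padding");
```

The static_asserts make the compiler itself confirm the 20 versus 32 byte difference.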

Unfortunately, when you use this structure with a tool such as ManagedCuda, the CUDA kernel uses the C++ version of the struct, while the C# code that calls the kernel has to use the C# version. This creates a mismatch in memory layout, which results in very weird artifacts after running the kernel, or even a crash, because in this case you are writing to unallocated memory.

The "simple" solution I found is to manually expand the C# structure by setting the Size field of the StructLayout attribute, extending the structure to 32 bytes instead of 20. I also asked about this on StackOverflow, but I did not find a way to create these structures automatically without counting the sizes of every individual type in the structure myself.
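
To illustrate what "counting the sizes of every individual type" amounts to, here is a minimal sketch of the usual C/C++ layout rule: each member is placed at the next multiple of its alignment, and the total is rounded up to the largest member alignment. MemberInfo, RoundUp and PaddedStructSize are hypothetical names made up for this example, and the sketch assumes natural alignment without any packing pragmas.

```cpp
#include <cstddef>
#include <initializer_list>

// Hypothetical description of one struct member: its size and alignment in bytes.
struct MemberInfo {
    std::size_t size;
    std::size_t alignment;
};

// Round value up to the next multiple of alignment.
constexpr std::size_t RoundUp(std::size_t value, std::size_t alignment) {
    return (value + alignment - 1) / alignment * alignment;
}

// Size of a struct whose members are laid out in order (requires C++14).
constexpr std::size_t PaddedStructSize(std::initializer_list<MemberInfo> members) {
    std::size_t offset = 0;
    std::size_t structAlignment = 1;
    for (const MemberInfo& member : members) {
        // Place the member at the next suitably aligned offset.
        offset = RoundUp(offset, member.alignment) + member.size;
        if (member.alignment > structAlignment) {
            structAlignment = member.alignment;
        }
    }
    // Tail padding: the total size is a multiple of the strictest alignment.
    return RoundUp(offset, structAlignment);
}

// DebugComponent: a 16-byte float4 aligned to 16, followed by a 4-byte float.
static_assert(PaddedStructSize({{16, 16}, {4, 4}}) == 32, "20 bytes of data padded to 32");
```

The result is the value that would go into the Size field for that structure.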

So I had to switch up my solution a little bit. I created a project which contains the raw C# structures that I want to use, along with all their functionality like loading and serialization. I then created another C# project for the automated code generation. This project loads the Structures project as a DLL, and its text templates derive all the structures and the types they contain from it:
  • Structures: project that contains raw structures that will be used on the GPU
  • Tools: my general purpose project that will generate GPU versions of the structs defined in Structures.
To make this conversion as safe as possible, I don't want to manually check that the GPU version has the same alignment and size every time I create a new structure. So I created two more projects:
  • AlignedStructsWrapper: A C++/CLI project that combines managed and unmanaged code
  • Tests: a unit test project
Using the CLI project, we can load both our versions of the structure: the managed C# version and the unmanaged C++ version. We can now measure the difference in their sizes:

public ref struct WrapperGpuDebugComponent
{
public:
 int SizeDiff()
 {
  int managedSize = sizeof(Tools::Content::Generated::DebugComponent);
  int nativeSize = sizeof(CUDA::CPP_DebugComponent);
  return managedSize - nativeSize;
 }
};

In the unit test project we reference the C++/CLI assembly, so we can create a simple unit test that calls the SizeDiff function and checks that the difference is indeed 0:

[TestMethod]
public void CheckStructureSizes()
{
 WrapperGpuDebugComponent debugcomponent = new WrapperGpuDebugComponent();
 // Assert.AreEqual takes the expected value first, then the actual value
 Assert.AreEqual(0, debugcomponent.SizeDiff());
}

Of course, I also generated the C++/CLI structures and the unit test functions automatically for every structure, so I only have to recompile the projects to have everything tested.
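
The same size comparison as SizeDiff can be sketched in plain C++ and checked at compile time. This is an illustration under assumptions, not the generated code: Float4 stands in for CUDA's float4, and GeneratedDebugComponent is a hypothetical stand-in for the C# structure after it was extended to 32 bytes, with the padding written out explicitly. The sketch also compares a member offset, since two structures can have equal sizes while a member sits at a different position.

```cpp
#include <cstddef>

// Hypothetical stand-in for CUDA's float4 (16-byte aligned).
struct alignas(16) Float4 { float x, y, z, w; };

// Mirrors CPP_DebugComponent from the text.
struct NativeDebugComponent {
    Float4 Float4Value;
    float FloatValue;
};

// Hypothetical stand-in for the generated managed struct, padded to 32 bytes.
struct GeneratedDebugComponent {
    float Float4Value[4];
    float FloatValue;
    float Padding[3];  // explicit version of the 12 bytes of tail padding
};

// Compile-time analogue of SizeDiff(): zero means the sizes match.
template <typename Managed, typename Native>
constexpr int SizeDiff() {
    return static_cast<int>(sizeof(Managed)) - static_cast<int>(sizeof(Native));
}

static_assert(SizeDiff<GeneratedDebugComponent, NativeDebugComponent>() == 0,
              "generated and native sizes differ");
static_assert(offsetof(GeneratedDebugComponent, FloatValue) ==
              offsetof(NativeDebugComponent, FloatValue),
              "FloatValue sits at a different offset");
```

With checks like these a mismatch fails the build instead of a unit test, but they only work when both layouts are visible to the same C++ compiler.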

Sunday, May 14, 2017

CUDA in Visual Studio 2017

Edit: CUDA 9.0 RC is released. This version shows full Visual Studio 2017 support.

Note: this article only shows how to compile Visual Studio 2015 CUDA projects in Visual Studio 2017. For actual VS2017 support we will have to wait for a new CUDA release.

I previously wrote a small article on CUDA support for VS2015, to support CUDA compilation of older projects. Following the same principle we can 'hack' CUDA compilation support in VS2017. 

What you need
  • A CUDA installation with Visual Studio integration for a supported VS version. I used CUDA 8.0 with VS2015.
  • VS2017 (any edition)
Copying the required files
  • To allow CUDA compilation we have to copy a few files. Find the CUDA 8.0 setting files in the VS2015 buildcustomizations directory:
C:\Program Files (x86)\MSBuild\Microsoft.Cpp\v4.0\V140\BuildCustomizations
Note: If you use a different VS version, you have to change the 'V140' accordingly (V120 for VS2013 for example).
  • Copy the following files: CUDA 8.0.props, CUDA 8.0.targets, CUDA 8.0.xml, and Nvda.Build.CudaTasks.v8.0.dll
  • Find the VS2017 buildcustomizations directory:
C:\Program Files (x86)\Microsoft Visual Studio\2017\Community\Common7\IDE\VC\VCTargets\BuildCustomizations
Note: I used the VS2017 Community edition. If you have another edition, change 'Community' in the path accordingly.

  • Paste the CUDA files here.

That's it
You can now load and compile your VS2015 CUDA projects in VS2017. When you first open your project in VS2017, make sure not to upgrade it to VS2017, otherwise this won't work.

Friday, April 28, 2017

IniGenerator

I wrote a simple C# code interface for ini files. There are already great NuGet packages for reading and writing ini files, but not many that automatically generate a code layer. For this project I based my solution on the ini parser, which does the file IO.

My package only provides a small overlay that creates a C# class which handles all the file IO behind the scenes. Using text templates, we both define the ini file and create the C# class. An example file I used to create my configuration:

<#@ include file="$(ProjectDir)IniTemplate.tt" #>
<#
    // All properties in the ini file
    // Name, default value and category
    CreateProperty("Width", 1280, "Video");
    CreateProperty("Height", 720, "Video");
    CreateProperty("Fullscreen", false, "Video");

    // Generate the code layer
    GenerateIniClass();
#>

This creates a C# class with the same name as your text template. The first time you use this class, the ini file is created; after that, the existing values are read from the file.

Usually there is no backwards compatibility with older versions of the ini file: if you add new properties, all values in the ini file are reset to their defaults. I avoid this by using the ini parser's functionality to merge two ini files, so I can simply add the new properties to the old ini file without changing the existing values.
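
The idea behind that merge can be sketched independently of any library. The C++ snippet below is not the ini parser's API: IniSection, IniData and MergeWithDefaults are hypothetical names, and ini data is modelled simply as nested string maps. Existing values win, and only missing keys are filled in from the defaults.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical in-memory model of an ini file: section name -> (key -> value).
using IniSection = std::map<std::string, std::string>;
using IniData = std::map<std::string, IniSection>;

// Start from the values already on disk and add only the keys that are missing.
IniData MergeWithDefaults(const IniData& defaults, const IniData& existing) {
    IniData merged = existing;
    for (const auto& section : defaults) {
        IniSection& target = merged[section.first];
        for (const auto& entry : section.second) {
            // insert() leaves the map untouched if the key is already present,
            // so values the user changed are never overwritten.
            target.insert(entry);
        }
    }
    return merged;
}
```

For example, if the old file only contains Width=1920 in the Video section, merging it with the defaults from the template keeps Width at 1920 and adds Height and Fullscreen with their default values.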

Finally, an example of the above template used in code:

// Use the namespace where you placed the template
using IniGenerator.Content.Generated;

// Name of ini file
var config = new Config();
// Can be used directly in code without parsing
var size = new Size(config.Width, config.Height);
var fullscreen = config.Fullscreen;

You can view the source code on GitHub, or download the package from NuGet.

If you have any feedback, leave a comment or post an issue on the GitHub project.

Saturday, April 22, 2017

Master Thesis

Update: You can download the full thesis here.

Level-of-Detail Independent Voxel-Based Surface Approximations was the subject of my master thesis. I wrote a short summary that explains the basics of my thesis on this page.


This image shows the final result of my thesis work. The models above are voxel models with 4096 (2^12) voxels along every axis. If they were all filled, I would have to store 4096^3 = 68719476736 voxels in total. There has been a lot of research into compressing the huge amount of data this requires; I mention some examples on the thesis page.

Using a Sparse Voxel Octree (SVO) storing scalar field values, the six models above can be stored in 12GB of memory total. Using my multiresolution method we can store visually comparable models in only 2GB of memory total.

Here is a small video showing the current state of the voxel path tracer: