
Super shapes pixel shader.
Computing frustum planes.
By dominic in 3D, Maths-physics
Computing the planes for the viewing frustum in any space can be easily done by taking advantage of clip space.
Clip space:
Clip space (or projection space) is a post-projection space that represents the viewable area of a scene. In clip space the corners of the view frustum map to the corners of a cube that spans [-1, 1] on the x and y axes, and [0, 1] on the z axis, as you can see on this image shamelessly stolen from the MSDN:

Basically this means that in clip space, there is a well defined frustum volume that is no more than a box whose plane normals are the following:
    // the normals are defined to be pointing inside the volume.
    near plane:   [ 0,  0,  1]
    far plane:    [ 0,  0, -1]
    left plane:   [ 1,  0,  0]
    right plane:  [-1,  0,  0]
    top plane:    [ 0, -1,  0]
    bottom plane: [ 0,  1,  0]
We now have our viewing frustum in clip space.
Points in 3D:
You may need to perform frustum culling in view or world space. Observing how a point gets transformed from one space to another, to end up in clip space will give us an intuition on how to transform the clip space planes to any other space.
Let’s take a random point Q in 3D space, such that Q = (x, y, z), a view matrix V and a projection matrix P. If we transform this point by the view matrix, we get the point Q’ in view space, and if we then transform Q’ by the projection matrix, we get the point Q” in clip space.
We know from basic matrix maths that matrix multiplication is associative, so the following is true:
    (QV)P = Q(VP)
Transforming Q by the concatenated view-projection matrix will give us the point Q” in clip space directly.
Now, going the other way: to go from a point Q in clip space to Q’ in view space, we need to transform it by the inverse of the projection matrix, and to go from there to world space, we need to transform Q’ by the inverse of the view matrix. Because matrix multiplication is associative, and because the inverse of a product is the product of the inverses in reverse order, we can go from Q in clip space straight to world space by transforming Q by the inverse of the view-projection matrix:
    (QP⁻¹)V⁻¹ = Q(P⁻¹V⁻¹) = Q(VP)⁻¹
Transforming the planes:
The same rules can be applied to the planes from the clip space viewing frustum. There is one little difference. According to the rules of transformation of normal vectors we must transform the normal vector by the inverse-transpose of the matrix.
Let’s say we want to find the planes in world space. We know that to transform a point from clip space to world space, we transform it by the inverse of the view-projection matrix. According to what was said above, to go from a normal vector in clip space to world space, we must transform it by the inverse-transpose of the inverse of the view-projection matrix.
Given a normal vector N in clip space, we find N’ in world space like this:
    N(((VP)⁻¹)⁻¹)ᵀ = N'
We can simplify this as we know that the inverse of the matrix that transforms a point from clip space to world space is actually none other than the view-projection matrix. In other words, the inverse-transpose of the inverse of the view-projection matrix is the transpose of the view-projection matrix:
    ((VP)⁻¹)⁻¹ = VP
    N(((VP)⁻¹)⁻¹)ᵀ = N'
    N(VP)ᵀ = N'
To find the frustum planes in world space, we simply need to transform each clip space plane normal by the transpose of the view-projection matrix.
We can simplify this even further. Because we have so many zeroes in our clip space normals, we don’t need to do a full vector-matrix product.
Each clip space plane is written as a 4D vector (a, b, c, d), where (a, b, c) is the normal and d is the offset of the plane. The near plane sits at z = 0, so its d is 0. The five other planes sit at a distance of 1 from the origin, so their d is 1.
This shows the process for the near plane N, where _11 to _44 are the elements of the transpose of the view-projection matrix:
    N = (0, 0, 1, 0)

    (VP)ᵀ = [ _11, _12, _13, _14,
              _21, _22, _23, _24,
              _31, _32, _33, _34,
              _41, _42, _43, _44 ]

    N(VP)ᵀ = [ 0 * _11 + 0 * _21 + 1 * _31 + 0 * _41,
               0 * _12 + 0 * _22 + 1 * _32 + 0 * _42,
               0 * _13 + 0 * _23 + 1 * _33 + 0 * _43,
               0 * _14 + 0 * _24 + 1 * _34 + 0 * _44 ]

    N(VP)ᵀ = [ _31, _32, _33, _34 ]
Using this optimization we find the world space near plane by simply reading the third row of the transpose of the view-projection matrix. For the other planes, which have d = 1, we add one of the first three rows to the fourth row, or subtract it from the fourth row when the normal is negative:
    near   = (0, 0, 1, 0)(VP)ᵀ  = [_31, _32, _33, _34]
    far    = (0, 0, -1, 1)(VP)ᵀ = [_41 - _31, _42 - _32, _43 - _33, _44 - _34]
    top    = (0, -1, 0, 1)(VP)ᵀ = [_41 - _21, _42 - _22, _43 - _23, _44 - _24]
    bottom = (0, 1, 0, 1)(VP)ᵀ  = [_41 + _21, _42 + _22, _43 + _23, _44 + _24]
    left   = (1, 0, 0, 1)(VP)ᵀ  = [_41 + _11, _42 + _12, _43 + _13, _44 + _14]
    right  = (-1, 0, 0, 1)(VP)ᵀ = [_41 - _11, _42 - _12, _43 - _13, _44 - _14]
A little extra step:
Often, frustum culling tutorials use the transpose of the view-projection matrix to extract the frustum planes. Even though computing the transpose of a matrix is cheap, there is no reason to do such a thing.
Since we are not using vector-matrix multiplication any more, we don’t need to compute the transpose of the view-projection matrix, we can just use the view-projection matrix as it is. If we do this, care must be taken to add the columns of the matrix, and not the rows:
    // _11 to _44 are now the elements of the view-projection matrix itself
    near   = (0, 0, 1, 0)(VP)ᵀ  = [_13, _23, _33, _43]
    far    = (0, 0, -1, 1)(VP)ᵀ = [_14 - _13, _24 - _23, _34 - _33, _44 - _43]
    top    = (0, -1, 0, 1)(VP)ᵀ = [_14 - _12, _24 - _22, _34 - _32, _44 - _42]
    bottom = (0, 1, 0, 1)(VP)ᵀ  = [_14 + _12, _24 + _22, _34 + _32, _44 + _42]
    left   = (1, 0, 0, 1)(VP)ᵀ  = [_14 + _11, _24 + _21, _34 + _31, _44 + _41]
    right  = (-1, 0, 0, 1)(VP)ᵀ = [_14 - _11, _24 - _21, _34 - _31, _44 - _41]
In order to get the planes in view space, all those steps are exactly the same, but instead of using the view-projection matrix, you just use the projection matrix.
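For reference, here is a minimal C++ sketch of the column-based extraction. It assumes a Direct3D style setup: row vectors, a row-major matrix, and a clip space z range of [0, 1]. The Mat4 and Plane types and the function names are made up for this example and are not part of any library. The planes it returns are not normalized, so divide each plane by the length of (a, b, c) if you need real distances.

```cpp
#include <array>

// Row-major 4x4 matrix used with row vectors (p' = p * M).
// m[r][c] holds the element written _(r+1)(c+1) in the text.
struct Mat4 { float m[4][4]; };

// Plane a*x + b*y + c*z + d = 0. A point is inside when a*x + b*y + c*z + d >= 0.
struct Plane { float a, b, c, d; };

enum PlaneIndex { kNear, kFar, kLeft, kRight, kTop, kBottom };

// Returns sign * column(col) + w * column(4) of the view-projection matrix.
static Plane combine(const Mat4 &vp, int col, float sign, float w)
{
    return Plane{
        sign * vp.m[0][col] + w * vp.m[0][3],
        sign * vp.m[1][col] + w * vp.m[1][3],
        sign * vp.m[2][col] + w * vp.m[2][3],
        sign * vp.m[3][col] + w * vp.m[3][3]};
}

// Pass the view-projection matrix for world space planes,
// or the projection matrix alone for view space planes.
std::array<Plane, 6> extractFrustumPlanes(const Mat4 &vp)
{
    std::array<Plane, 6> p;
    p[kNear]   = combine(vp, 2,  1.0f, 0.0f); // (0, 0, 1, 0):  column 3
    p[kFar]    = combine(vp, 2, -1.0f, 1.0f); // (0, 0, -1, 1): column 4 - column 3
    p[kLeft]   = combine(vp, 0,  1.0f, 1.0f); // (1, 0, 0, 1):  column 4 + column 1
    p[kRight]  = combine(vp, 0, -1.0f, 1.0f); // (-1, 0, 0, 1): column 4 - column 1
    p[kTop]    = combine(vp, 1, -1.0f, 1.0f); // (0, -1, 0, 1): column 4 - column 2
    p[kBottom] = combine(vp, 1,  1.0f, 1.0f); // (0, 1, 0, 1):  column 4 + column 2
    return p;
}
```

A quick sanity check is to pass in the identity matrix: the function should hand back the clip space planes unchanged.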
Tags: C-C++, linear algebra, math
Cold – 01
By dominic in C/C++, Demo, Direct3D 11, HLSL
A more complete water shader capable of rendering above and underwater. Hardware instancing is used to render the triangles, while a shatter shader is used to break the icosahedrons in sync with the beat. On the other hand the beat detection is very basic. This (small) demo was coded with my own DX11 renderer/framework.
Cold – 01 from signalsondisplay on Vimeo.
Vignette post process effect
By dominic in HLSL
Vignetting is an effect that can be used to draw the viewer’s attention to the center of the screen, or to add some spookiness to a scene by darkening the contour of the image, or for underwater rendering, for example.
The effect is very simple, it works by taking the distance from a pixel to the center of the screen, and using that distance to modify the pixel color.
The first line of the PS function samples the texture at the given texture coordinates:
    float4 color = g_texture.Sample(g_sampler, v.texcoord);
Then we need to map the texture coordinates from [0, 1] to [-.5, .5] by simply subtracting .5 from the texture coordinates. This allows us to know how far the texel is from the center:
    float2 dist = v.texcoord - 0.5f;
In this next step, we calculate a factor we will use to scale the color. The closer to the center, the brighter the color, and the further away, the darker it gets. Taking the dot product of the pixel position with itself gives us the squared distance to the center. If we subtract that value from 1, we get a value that starts at 1 when the pixel is at the center of the image and decreases as we move away from the center. Therefore, pixels at the center of the screen will have full intensity:
    dist.x = 1 - dot(dist, dist);
The last line is where we actually scale the sample color depending on the distance from the center. We use the saturate function to clamp the resulting factor to [0, 1]:
    color *= saturate(pow(dist.x, 5.5f));
    SamplerState g_sampler : register(s0);
    Texture2D g_texture : register(t0);

    struct PS_Input
    {
        float4 posH : SV_POSITION;
        float2 texcoord : TEXCOORD;
    };

    float4 PS( PS_Input v ) : SV_Target
    {
        float4 color = g_texture.Sample(g_sampler, v.texcoord);
        float2 dist = v.texcoord - 0.5f;
        dist.x = 1 - dot(dist, dist);
        color *= saturate(pow(dist.x, 5.5f));
        return color;
    }
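To get a feel for the numbers without firing up a renderer, the same factor can be computed on the CPU. This is only a C++ sketch that mirrors the shader maths. The function name and the default exponent parameter are mine, chosen for the example.

```cpp
#include <algorithm>
#include <cmath>

// CPU version of the vignette factor for a texture coordinate (u, v) in [0, 1].
float vignetteFactor(float u, float v, float power = 5.5f)
{
    float dx = u - 0.5f;                      // map [0, 1] to [-0.5, 0.5]
    float dy = v - 0.5f;
    float f = 1.0f - (dx * dx + dy * dy);     // 1 - squared distance to the center
    f = std::pow(f, power);
    return std::min(std::max(f, 0.0f), 1.0f); // same job as saturate()
}
```

With the 5.5 exponent, the factor is exactly 1 at the center, roughly 0.21 in the middle of an edge and roughly 0.02 in the corners, which is why the corners end up almost black. A smaller exponent gives a softer falloff.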
Post process running on a solid color background:

No post processing applied:

Same image with vignetting:

Tags: hlsl, postprocessing
Fast vector truncation
By dominic in AS3.0, C/C++, Lua, Maths-physics
Vector truncation is used quite a lot in physics when working with forces. Sometimes if a vector’s length is more than a given maximum length you need to truncate it to the max length allowed. It’s always nice for physics code to run as fast as possible, for reasons we all know.
I’ve seen this vector “truncate” function many times, even in books like this one, written by a guy who seems to know what he is talking about.
Said function looks like this:
    void truncate( t_vec2 *v, float maxLen )
    {
        float vLen;
        float cLen;
        float angle;

        vLen = v_length(v);
        cLen = (maxLen > vLen) ? vLen : maxLen;
        angle = atan2(v->y, v->x);
        v->x = cos(angle) * cLen;
        v->y = sin(angle) * cLen;
    }
Assuming the rest of the code looks like this:
    #include <math.h>

    typedef struct s_vec2
    {
        float x;
        float y;
    } t_vec2;

    float v_length( t_vec2 *v )
    {
        return sqrt(v->x * v->x + v->y * v->y);
    }

    void truncate( t_vec2 *v, float maxLen )
    {
        float vLen;
        float cLen;
        float angle;

        vLen = v_length(v);
        cLen = (maxLen > vLen) ? vLen : maxLen;
        angle = atan2(v->y, v->x);
        v->x = cos(angle) * cLen;
        v->y = sin(angle) * cLen;
    }
Now that’s cool, it’s slow, but it works. If the length of the vector is more than maxLen, the vector is scaled down to maxLen.
The answer is in the previous line: “If the length of the vector is more than maxLen, the vector is scaled down to maxLen“.
Exactly, it’s SCALED down, so why do the users/coders of that function need a call to atan2, cos and sin? Why not just scale the vector by the ratio of maxLen to the current vector length?
By doing just that, the function now looks like this:
    void truncate( t_vec2 *v, float maxLen )
    {
        float s;

        s = maxLen / v_length(v);
        /* only ever scale down: a vector shorter than maxLen is left alone */
        s = (s < 1.0f) ? s : 1.0f;
        v->x *= s;
        v->y *= s;
    }
The result is fewer lines of code, fewer instructions and more speed. It’s also more logical to do it this way.
I’ve compared both functions in C, C++, Lua and AS3.0, and I get a 2x speedup, that’s pretty good.
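If you want to convince yourself that both approaches give the same vector, here is a small self-contained C++ sketch that puts them side by side. The Vec2 type and the function names are made up for this comparison, and the scaling version uses an explicit length test instead of the ternary, which amounts to the same thing.

```cpp
#include <cmath>

struct Vec2 { float x, y; };

float length(const Vec2 &v)
{
    return std::sqrt(v.x * v.x + v.y * v.y);
}

// The trigonometric version: rebuild the vector from its angle and clamped length.
Vec2 truncateTrig(Vec2 v, float maxLen)
{
    float len = length(v);
    float clamped = (maxLen > len) ? len : maxLen;
    float angle = std::atan2(v.y, v.x);
    return Vec2{std::cos(angle) * clamped, std::sin(angle) * clamped};
}

// The scaling version: one division and two multiplications, only when needed.
Vec2 truncateScale(Vec2 v, float maxLen)
{
    float len = length(v);
    if (len > maxLen) {
        float s = maxLen / len;
        v.x *= s;
        v.y *= s;
    }
    return v;
}
```

For example, truncating (3, 4), which has a length of 5, to a max length of 2.5 scales it by 0.5 and gives (1.5, 2). The trigonometric version lands on the same result, give or take floating point error.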
C++/HLSL memory alignment
By dominic in C/C++, Direct3D 11, HLSL
To paraphrase wikipedia, data alignment is the way data is arranged and accessed in computer memory. If you don’t know what data alignment is or how it works, you should have a read of these links:
Data structure alignment
About data alignment
How this works in HLSL is that data is packed into 4 byte chunks in such a way that it does not cross a 16 byte boundary. Essentially what this means is that a variable cannot overlap two chunks of 16 bytes. As an example, a float3 will not be stored like this in memory:

In this case, the float3 is laid out over two chunks, and HLSL will not let this happen.
On the other hand, what will happen if you have a structure with two float3 variables is that the first float3 will be placed at memory location 0 (to keep it simple), and the second float3 will not be placed right after it at location 3, because this would mean that the y and z components of the second vector would be in a different chunk than the x component. What HLSL does is place the second vector at an aligned memory location, 4, in our case:

So with this in mind, if you have a structure that looks like this:
    struct s
    {
        float a;
        float b;
        float3 c;
    };
    // total memory: 4 bytes + 4 bytes + 3 * 4 bytes = 20 bytes
    // aligned memory: 4 bytes + 4 bytes + 8 "padding" bytes + 4 * 4 bytes = 32 bytes
Variables a and b will be placed one after the other in memory, and c will be placed 8 bytes after b. An extra 4 bytes are added at the end of the structure to pad it in order to keep whatever comes after s from crossing a 16 bytes boundary:

The dark gray boxes show where the padding bytes are placed.
The HLSL documentation has more information and examples on packing rules.
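The rule described above is simple enough to simulate. The following C++ sketch computes the byte offsets HLSL would pick for a list of scalar and vector members. It is a simplification that only handles float, float2, float3 and float4 style members (no arrays, matrices or nested structs, which have their own rules), and the Layout type and function name are made up for the example.

```cpp
#include <cstddef>
#include <vector>

struct Layout
{
    std::vector<std::size_t> offsets; // byte offset of each member
    std::size_t size;                 // total size, rounded up to 16 bytes
};

// components[i] is the number of 4 byte components of member i
// (1 = float, 2 = float2, 3 = float3, 4 = float4).
Layout packConstantBuffer(const std::vector<int> &components)
{
    Layout out;
    std::size_t offset = 0;
    for (int n : components) {
        std::size_t bytes = 4 * static_cast<std::size_t>(n);
        // If the member would straddle a 16 byte boundary,
        // push it to the start of the next 16 byte chunk.
        if (offset / 16 != (offset + bytes - 1) / 16)
            offset = (offset + 15) / 16 * 16;
        out.offsets.push_back(offset);
        offset += bytes;
    }
    out.size = (offset + 15) / 16 * 16; // pad the end of the structure
    return out;
}
```

Feeding it {1, 1, 3}, which is the struct s from above, gives the offsets 0, 4 and 16 and a total size of 32 bytes. Feeding it {3, 3} puts the second float3 at byte 16, which is location 4 when counting in floats.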
This is useful (and important) when using constant buffers in C/C++. When you create a constant buffer and send it to your shader(s), you need to make sure the data inside the buffer is packed in a way that makes sense to HLSL.
Given a struct that looks like this:
    struct light
    {
        float a;
        float s;
        float4 color;
    };
We now know that it will look like this in memory:

Once again, the gray boxes are the 8 bytes used for padding.
Now, on the C++ side, the struct holding the data for the buffer would look like this:
    struct light
    {
        float a;
        float s;
        XMFLOAT4 color;
    };
While this looks pretty nice and easy, it’s actually not quite right. Let’s say you set the light parameters to be:
    light.a = .46f;
    light.s = .52f;
    light.color = XMFLOAT4(1, 1, 1, 1);
Using the Visual Studio debugger, let’s see what the structure looks like in memory:
    0x0013F7C0  0.46000001  0.51999998  1.0000000  1.0000000
    0x0013F7D0  1.0000000  1.0000000  -1.0737418e+008  -1.0737418e+008
As expected, we see the two floating point values followed by our 4 component vector.
If we compare that to the HLSL structure we notice that they do not look the same in memory:
C++:

HLSL:

Therefore, when the buffer is sent to the GPU, the shader is reading from something that looks like this:
    light.a = .46f; // bytes 0 to 3
    light.s = .52f; // bytes 4 to 7
    // skip 8 padding bytes from the first chunk
    light.color = float4(1, 1, ?, ?); // read 16 remaining bytes from the second chunk
This happens because the two structures are not aligned the same way. Even more so, the C struct is not aligned at all. So the shader reads the a and s floats from the buffer, and then moves along, ignoring the next 8 bytes to access the float4.
When reading the values for the color, the GPU will gather the second chunk and will build a float4 using the blue and alpha channels as the red and green components, and will fill the actual blue and alpha components with whatever is in the remaining 8 bytes. This shows that care needs to be taken when using constant buffers.
In order to avoid this type of issue there are a few options:
- try and see if you can store the data in an order that will automatically align it.
- you can pad the structure accordingly on your own. For example, instead of having two floats, a vector4 and another float, having the 3 floats one after the other, then adding 4 bytes for padding and having the vector4 be the last member in the structure will save some memory and align the struct.
- use __declspec(align(#)) to force a member to be packed on a 16 bytes boundary.
- As mentioned by Guille in his comment, you can also use the aligned XMFLOAT*A types, they will avoid you having to use __declspec(align(16)).
In our case, changing the order of the members will always result in the struct being 32 bytes or more, so we can use __declspec(align(16)) to force the XMFLOAT4 to be packed in its own chunk, just like what HLSL does to the float4:
    struct light
    {
        float a;
        float s;
        __declspec(align(16)) XMFLOAT4 color;
    };
And now, as expected, the memory looks like this:
    // two floating point values + 8 bytes of padding
    0x002DF750  0.46000001  0.51999998  0.00000000  0.00000000
    // nicely aligned vector4
    0x002DF760  1.0000000  1.0000000  1.0000000  1.0000000
This forces the buffer to be packed like HLSL expects, and fixes the yellow light problem.
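A cheap way to catch this kind of mismatch early is to let the compiler check the layout for you. The sketch below uses the standard C++11 alignas specifier instead of __declspec(align(16)), and a plain Float4 struct as a stand-in for XMFLOAT4, so it builds without the DirectX headers. Both of those are substitutions made for the example, and the struct names are made up. It assumes a platform with 4 byte floats.

```cpp
#include <cstddef>

// Stand-in for XMFLOAT4: four tightly packed floats.
struct Float4 { float x, y, z, w; };

// The layout the C++ compiler picks on its own: color starts right after s.
struct LightUnpadded
{
    float a;
    float s;
    Float4 color;
};

// Forcing color onto a 16 byte boundary, like HLSL does for the float4.
struct LightPadded
{
    float a;
    float s;
    alignas(16) Float4 color;
};

static_assert(offsetof(LightUnpadded, color) == 8, "color follows s directly");
static_assert(sizeof(LightUnpadded) == 24, "no padding at all");
static_assert(offsetof(LightPadded, color) == 16, "color starts the second chunk");
static_assert(sizeof(LightPadded) == 32, "matches the HLSL layout");
```

If someone later adds or reorders a member and the layout drifts away from what the shader expects, the build fails instead of the light turning yellow.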
Using Brownian motion to create a maze
By dominic in C/C++
I needed to generate random mazes for a small project I was working on, and while there are already some nice maze generation algorithms out there that were written by people way smarter than me, I decided to write my own algorithm (partly because I was too lazy to implement an existing one).
The idea is very basic but it works out just fine if you don’t need the most complex and precise maze.
The algorithm starts out by filling the maze (a 2D array) with walls, this just puts an ‘x’ in each grid cell.
In a second pass, it uses simple Brownian motion to “walk” the maze setting the cell it is at with a ‘.’ character, effectively drawing some corridors.
The cool part is that Brownian motion is random, and this creates a bunch of corridors that may or may not end up anywhere close to the maze exit.
In a last pass, a number of waypoints are randomly spawned across the maze. Those points are then sorted based on their x coordinate, and finally a path is drawn from the entrance to the first waypoint, then from the first point to the second, and so on and so forth until the exit, thus creating a path that leads out of the maze.
This very simple process ends up creating a maze that is guaranteed to have a path from start to finish, as well as adding a bunch of misleading paths that lead to nowhere.
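The three passes can be sketched in a few lines of C++. To be clear, this is not the code from the repository, just a compact illustration of the idea. The function names, the L-shaped corridors between waypoints and the choice of putting the entrance on the left edge and the exit on the right edge are assumptions made for the example. It expects a grid of at least 3 by 3 cells.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <random>
#include <string>
#include <utility>
#include <vector>

using Grid = std::vector<std::string>;
using Point = std::pair<int, int>; // (x, y)

// Carve a corridor from a to b: first along x, then along y.
static void carve(Grid &g, Point a, Point b)
{
    int x = a.first, y = a.second;
    g[y][x] = '.';
    while (x != b.first)  { x += (b.first  > x) ? 1 : -1; g[y][x] = '.'; }
    while (y != b.second) { y += (b.second > y) ? 1 : -1; g[y][x] = '.'; }
}

Grid generateMaze(int w, int h, int steps, int waypoints, unsigned seed)
{
    std::mt19937 rng(seed);
    std::uniform_int_distribution<int> rx(1, w - 2), ry(1, h - 2), rdir(0, 3);

    Grid g(h, std::string(w, 'x'));               // pass 1: walls everywhere

    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    int x = rx(rng), y = ry(rng);
    for (int i = 0; i < steps; ++i) {             // pass 2: random walk
        g[y][x] = '.';
        int d = rdir(rng);
        x = std::min(std::max(x + dx[d], 1), w - 2); // stay inside the border
        y = std::min(std::max(y + dy[d], 1), h - 2);
    }

    std::vector<Point> pts;                       // pass 3: waypoints
    for (int i = 0; i < waypoints; ++i) {
        int px = rx(rng), py = ry(rng);
        pts.push_back(Point(px, py));
    }
    std::sort(pts.begin(), pts.end());            // pairs sort by x first
    int sy = ry(rng), ey = ry(rng);
    pts.insert(pts.begin(), Point(1, sy));        // cell next to the entrance
    pts.push_back(Point(w - 2, ey));              // cell next to the exit
    for (std::size_t i = 0; i + 1 < pts.size(); ++i)
        carve(g, pts[i], pts[i + 1]);
    g[sy][0] = 'S';
    g[ey][w - 1] = 'E';
    return g;
}

// Flood fill from 'S': returns true if 'E' can be reached through open cells.
bool hasPath(const Grid &g)
{
    int h = static_cast<int>(g.size()), w = static_cast<int>(g[0].size());
    std::vector<std::vector<bool> > seen(h, std::vector<bool>(w, false));
    std::queue<Point> q;
    for (int y = 0; y < h; ++y)
        for (int x = 0; x < w; ++x)
            if (g[y][x] == 'S') { q.push(Point(x, y)); seen[y][x] = true; }
    const int dx[4] = {1, -1, 0, 0}, dy[4] = {0, 0, 1, -1};
    while (!q.empty()) {
        Point p = q.front(); q.pop();
        if (g[p.second][p.first] == 'E') return true;
        for (int d = 0; d < 4; ++d) {
            int nx = p.first + dx[d], ny = p.second + dy[d];
            if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
            if (seen[ny][nx] || g[ny][nx] == 'x') continue;
            seen[ny][nx] = true;
            q.push(Point(nx, ny));
        }
    }
    return false;
}
```

Calling generateMaze(60, 25, 2000, 8, 42) gives a grid of the same size as the mazes shown in this post. Because the waypoints are chained together from the entrance to the exit, hasPath returns true no matter where the random walk wandered off to.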
The entrance is marked 'S', and the exit is marked 'E':
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    x.........................................................xx
    x........xxxx.xxx...xxxxxxxxxxxxxx........................xx
    x.xxx....xxxx.xxx...xxxxxxxxxxxxx.........................xx
    x.xxx....xxxx.xxx...xxxxxxxxxxxxxxxxxxxxxx.x.xxxxx.xxxxxx.xx
    x.xxx.................................xxxx.x.xxxxx.xxxxxx.xx
    x..............xx.x.xxxxxxxx.xxxxxxxx.xxxx.x.xxxxx.xxxxxx.xx
    x.xx...x.xxxx.xxx.x.xxxxxxxx.xxxx..............xxx.xxxxxx.xx
    x.xx.x.x.xxxx.xxx.x.xxxxxxxx.x............................xx
    x.xx.x.x.xxxx.xxx.x.xxxxxxxx.x.xxxxx.xxxxxxx.x.xxx.xxxxx..xx
    x.xxxx.x.xxxx.xxx......xxxxx.x.xxxxx.xxxxxxx.x.xxx.xxxxx..xx
    x.xxx..................xxxxx.x.xxxxx.xxx...................E
    S.xxxx.x.xxx..x.xx...x.xxxxx.x.xxxxx.xxxxxxx.x.xxx.xxxx.x.xx
    xxxxxx.x.xxx..x.xx...x.xxxxx.x.xxxxx.xxxxxxx.xxxxx.xxxx.x.xx
    xxxxxx.x.............x.xxxxx.x.xxxxx.xxxxxxx.xxxxx.xxxx.x.xx
    xxxxxx.x.xxx..x.xxx..........x.xxxxxxxxxxxxx.xxxxx.xxxx.x.xx
    xxxxxx...........xx.xx.xxxxx.x.xxxxxxxxxxxxx.xxxxx.xxxx.x.xx
    xxxxxx.x...................x.x............xx.xxxxx.xx.....xx
    x.xxxx.x.xxx.........................xxxxxxx.xxxxx.xxxx.x.xx
    x..xxx.x.x......xxx.xxxxxxxx.x.xxxxxxxxxxxxx.xxxxx.xxxx.x.xx
    x.................x.xxxxxxxx.x.xxxxxxxxxxxxx.......xxxx.x.xx
    x...xx.x.x.x..xxx.x.xxxxxxxx.x.xxxxxxxxxxxxxxxxxxx.xxxx.x.xx
    x.................................xxxxxxxxxxxxxxxx........xx
    xxxxxx.....xxxxxxxxxxxxxxxxx...xxxxxxxxxxxxxxxxxxx........xx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
In this maze, an additional pass is used to try and remove some of the large blocks of walls, for better or for worse:
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    x.....................................................x.x.xx
    x..x.x.x.x.x.x.x.x.x.x.x...x.x..x...x.x.x..xxx.x.xxx.x.xx..x
    x.x.x...x.......x.x.x.x.xxx.x.xx.x.......x....x....x.xxxx..x
    x.x..xxx.xxxxxx......x..........x.........x.......x......x.x
    x.x..x.x.x.x.xx..xxxx.xxxxxxxxxx.x.xxxx.x..xxx.x.x.x.xxxx..x
    x.x..xxx.xxxxxx..xxxx...........x.....x.x..x.x.x.x.x.x.xx..x
    x.x..xxx.xx....x.................x.xxxx.x..xxx.x.x.x..x.x..x
    x..x....x.xxxxx..xxxxxxxxxxxxxxxx...x.x.xx.x.x.x.x.x.xxxx..x
    x.x..xxx.x.xx...x..........x.x.x.x.x.xx.xx.xxx.x.x.x.x.xx..x
    x..x.........xxx.xxxxxxxxxx.x.x.x...x..........x.x.x.x.xx..x
    x.x.x.xx....x.xxx.x.x.x.x.xx.x.x.x.x.xx.x.x....x.x.x.x.xx..x
    S.x...xx.....xx.xx.x.x.x.x.xx.x.x...x.x..x.xx.x..x.x.x.xx..x
    x.x.x.xx...........................x.xx..x.x...x.x.x.x.xx..x
    x..x..xx..x....x......xx.x.x.xx.x.x.x.x..x.xx....x.x.x...x.x
    x............xx.xxxxxxxxx.x.x.xx.x.x.xx..x.x.x.x.x.x.......x
    x.xx..xx.x...xx.........xx.x.xx.x.x.x.x..x.xx.x.xx.x.x.xx..x
    x.xx.x.xxx.xx.x.......x.x.x.xx.x.x.xxxx..x.xxxxxxx.xxxxxx..x
    x.xx.xx.xx.x.x.xxxxxxx.xxx.xxx.xxxxx...x..x.......x.....x..x
    x.xx.x.x.x.x.xx.x.x.xx.x.xx........x.xx..x.xxxxxxx.xxxxxx..x
    x.xx.xx.xx.x...xxxxxxx.x.xxxxxxxxxxx.xx..x........x........x
    x.xx.xxxxx..x.x.......x.x...........x..x.xx.xxxxxx.xxxxxx..x
    x...x.....x..x.x.x.x.x.x.x.x.x.x.x.x.x..x..x...............E
    x...........................................xxxxxxxxxxxxxxxx
    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
However, the end result is not the sexiest maze you will ever see. It’s a quick and dirty solution, so for a nicer and more precise maze, you might want to check out a smarter algorithm. You can check out the source at github.
Tags: algorithms, C++
Saving DX11 shared resources
By dominic in C/C++, Direct3D 11
Using shared resources in Direct3D can be useful for a number of reasons. They are pretty useful when it comes to saving a texture to file, or more commonly, saving a screen capture of the running application.
Saving a texture to a file in Direct3D is very easy, there is a function that does it for you: D3DX11SaveTextureToFile. That’s pretty cool, let’s check out what parameters the function takes. As per the documentation, the function takes a pointer to the device context, a pointer to the texture you want to save and an image file format and file name. That sounds sweet, but there is one problem, if you take a look at the first parameter, it’s a pointer to the device’s immediate context, the same one that is being used for draw calls and stuff. That’s a bummer, because if you are trying to save a large texture or you are using a “complex” image file format, this can kill the frame rate because the D3DX11SaveTextureToFile hogs the device context and no rendering can be done until the function returns. Not cool.
The obvious way to fix this would be to move the texture saving code over to its own thread. While that works, we also need to create another device and device context so that we can use the secondary context to save the texture, letting the main context deal with the rendering without spending any time in the D3DX11SaveTextureToFile function.
Ok, so it looks like this time we are going to be able to save our big ass texture without destroying the frame rate in the process. Well not quite, because we can’t save a texture created by the main device with our “secondary” device context. This is where shared resources kick in.
What we need to do is create a shared resource and a shared handle to that resource with our main device in order to open that shared resource in our worker thread, grab the texture and save it with our second device context.
Before we get into some code, a quick disclaimer: this is a kind of “works for me” scenario. I’ve been using this snippet for all of 24 hours, so it might crash/break/deadlock/explode under certain circumstances, I don’t know, try before you buy.
The code for this is actually pretty simple. In the main thread, use the D3D11_RESOURCE_MISC_SHARED MiscFlag in the texture description if you want to make the texture a shared resource.
    // running in the main thread
    // WaitForSingleObject returns a DWORD, not an HRESULT
    DWORD h = WaitForSingleObject(g_mutex, INFINITE);
    if (h == WAIT_OBJECT_0)
    {
        // g_hasFrame is shared with the worker
        // if we don't have a frame waiting to be saved
        if (!g_hasFrame)
        {
            // we can create a shared texture
            this->canShareFrame = true;
        }
    }
    ReleaseMutex(g_mutex);

    if (this->canShareFrame)
    {
        ID3D11Texture2D *backBuffer = 0;
        this->_swapChain->GetBuffer(0, __uuidof(ID3D11Texture2D), reinterpret_cast<void **>(&backBuffer));
        D3D11_TEXTURE2D_DESC td;
        backBuffer->GetDesc(&td);
        // flags our resource as "shared"
        td.MiscFlags = D3D11_RESOURCE_MISC_SHARED;
        this->_device->CreateTexture2D(&td, 0, &g_tex);
        this->_context->CopyResource(g_tex, backBuffer);
        IDXGIResource *dxgiResource = 0;
        g_tex->QueryInterface(__uuidof(IDXGIResource), reinterpret_cast<void **>(&dxgiResource));
        // create our shared handle that will be used by the worker to get the shared texture
        dxgiResource->GetSharedHandle(&g_shaderHandle);
        dxgiResource->Release();
        backBuffer->Release();
        this->canShareFrame = false;
    }

    h = WaitForSingleObject(g_mutex, INFINITE);
    if (h == WAIT_OBJECT_0)
    {
        // let the worker know there is a texture to be saved
        if (!this->canShareFrame)
            g_hasFrame = true;
    }
    ReleaseMutex(g_mutex);
Now, the worker can get that texture from the shared handle, and save it to a file with the second device and device context:
    // in the worker thread
    // device_2 and context_2 are the "secondary" device and context
    bool needsSaved = false;
    ID3D11Texture2D *texture = 0;

    // WaitForSingleObject returns a DWORD, not an HRESULT
    DWORD h = WaitForSingleObject(g_mutex, INFINITE);
    if (h == WAIT_OBJECT_0)
    {
        // check to see if there is an image to save
        if (g_hasFrame)
            needsSaved = true;
    }
    ReleaseMutex(g_mutex);

    if (needsSaved)
    {
        // use the shared handle to open the texture
        device_2->OpenSharedResource(g_shaderHandle, __uuidof(ID3D11Texture2D), (LPVOID*)&texture);
        if (texture)
        {
            D3DX11SaveTextureToFile(context_2, texture, D3DX11_IFF_PNG, "image.png");
            texture->Release();
            needsSaved = false;
        }
    }

    h = WaitForSingleObject(g_mutex, INFINITE);
    if (h == WAIT_OBJECT_0)
    {
        // let the main thread know it can share a new texture
        if (!needsSaved)
            g_hasFrame = false;
    }
    ReleaseMutex(g_mutex);
Et voilà, you can now save big ass textures to files without slowing down the application.
A better way to do it, might be to share a texture every frame, and add it to a queue, and let the worker(s) save the textures from the queue. This would allow you to record what the application is rendering. Note that there will be an increase in memory consumption and that you might drop a frame or two from time to time.
Here again, using the queued approach works for me but might not be bullet proof. I might post the code for the queued version if I ever get the time to refactor it and make it look like it wasn’t coded by an epileptic monkey.
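In the meantime, the queue part on its own is plain producer/consumer code. Here is a rough C++11 sketch of what such a queue could look like. It is not the code I am using: the Frame struct is a hypothetical stand-in for whatever you need to hand over to the worker (a shared handle, a file name and so on), and the FrameQueue class is written with the standard library mutex and condition variable rather than the Win32 mutex used above.

```cpp
#include <condition_variable>
#include <mutex>
#include <queue>
#include <string>
#include <utility>

// Hypothetical description of a frame waiting to be saved.
struct Frame
{
    int id;
    std::string fileName;
};

class FrameQueue
{
public:
    // Called by the render thread once a frame has been shared.
    void push(Frame f)
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            frames_.push(std::move(f));
        }
        ready_.notify_one();
    }

    // Called by a worker. Blocks until a frame is available.
    // Returns false once the queue is closed and empty.
    bool pop(Frame &out)
    {
        std::unique_lock<std::mutex> lock(mutex_);
        ready_.wait(lock, [this] { return closed_ || !frames_.empty(); });
        if (frames_.empty())
            return false;
        out = std::move(frames_.front());
        frames_.pop();
        return true;
    }

    // Called on shutdown so that waiting workers can exit.
    void close()
    {
        {
            std::lock_guard<std::mutex> lock(mutex_);
            closed_ = true;
        }
        ready_.notify_all();
    }

private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<Frame> frames_;
    bool closed_ = false;
};
```

The render thread pushes one Frame per shared texture, and each worker loops on pop, opening and saving the texture for every frame it gets, until pop returns false. Since the queue is unbounded, memory use grows if the workers cannot keep up, so you may want to cap its size and drop frames past that point.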
This is worth reading too.
Tags: C++, direct3d, multithreading
Water shader
By dominic in C/C++, Direct3D 11, HLSL
First render of a water shader. I will be adding atmospheric scattering as well as nicer reflection/refraction.
HLSL water shader from signalsondisplay on Vimeo.
A few articles worth reading:
Fresnel equations.
Schlick’s approximation.
Tags: direct3d, hlsl, water rendering
