To paraphrase Wikipedia, data alignment is the way data is arranged and accessed in computer memory. If you don’t know what data alignment is or how it works, you should have a read of these links:
Data structure alignment
About data alignment
The way this works in HLSL is that data is packed into 4-byte slots in such a way that no variable crosses a 16-byte boundary. Essentially, what this means is that a single variable cannot straddle two 16-byte chunks. As an example, a float3 will not be stored like this in memory:

In this case, the float3 is laid out over two chunks, and HLSL will not let this happen.
On the other hand, if you have a structure with two float3 variables, the first float3 will be placed at memory location 0 (counting in 4-byte slots, to keep it simple), and the second float3 will not be placed right after it at location 3, because this would mean that its y and z components would be in a different chunk than its x component. What HLSL does is place the second vector at an aligned memory location, 4 in our case:

So with this in mind, if you have a structure that looks like this:
struct s
{
    float a;
    float b;
    float3 c;
};
// total memory: 4 bytes + 4 bytes + 3 * 4 bytes = 20 bytes
// aligned memory: 4 bytes + 4 bytes + 8 "padding" bytes + 4 * 4 bytes = 32 bytes
Variables a and b will be placed one after the other in memory, and c will be placed at the start of the next chunk, leaving 8 bytes of padding after b. An extra 4 bytes are added at the end of the structure so that it fills whole 16-byte chunks and whatever comes after s does not cross a 16-byte boundary:

The dark gray boxes show where the padding bytes are placed.
The HLSL documentation has more information and examples on packing rules.
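To make the rule concrete, here is a minimal C++ sketch that computes where each member would land under the simplified rule described above. The helpers `place_member`, `round_up_16` and `layout` are hypothetical names made up for this illustration, not part of any Direct3D API, and the sketch only handles scalars and vectors of up to 16 bytes (no arrays, matrices or nested structs).

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: given the running byte offset and the size of the next
// member, returns the offset where that member would start. If the member
// would straddle a 16-byte boundary, it is pushed to the next chunk.
std::size_t place_member(std::size_t offset, std::size_t size) {
    const std::size_t chunk = 16;
    const std::size_t used = offset % chunk;  // bytes already used in this chunk
    return (used + size > chunk) ? offset + (chunk - used) : offset;
}

// Rounds a size up to the next multiple of 16 bytes.
std::size_t round_up_16(std::size_t n) {
    return (n + 15) / 16 * 16;
}

// Lays out members in declaration order. Returns each member's offset and
// writes the padded total size of the structure to total_size.
std::vector<std::size_t> layout(const std::vector<std::size_t>& sizes,
                                std::size_t& total_size) {
    std::vector<std::size_t> offsets;
    std::size_t offset = 0;
    for (std::size_t size : sizes) {
        offset = place_member(offset, size);
        offsets.push_back(offset);
        offset += size;
    }
    total_size = round_up_16(offset);
    return offsets;
}
```

Calling `layout({4, 4, 12}, total)` for the structure s above should give offsets 0, 4 and 16 with a total of 32 bytes, which matches the hand computation in the comments.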
This can be (and is) useful (and important) when using constant buffers in C/C++. When you create a constant buffer and send it to your shader(s), you need to make sure the data inside the buffer is packed in a way that makes sense to HLSL.
Given a struct that looks like this:
struct light
{
    float a;
    float s;
    float4 color;
};
We now know that it will look like this in memory:

Once again, the gray boxes are the 8 bytes used for padding.
Now, on the C++ side, the struct holding the data for the buffer would look like this:
struct light
{
    float a;
    float s;
    XMFLOAT4 color;
};
While this looks pretty nice and easy, it’s actually a problem. Let’s say you set the light parameters to be:
light.a = .46f;
light.s = .52f;
light.color = XMFLOAT4(1, 1, 1, 1);
Using the Visual Studio debugger, let’s see what the structure looks like in memory:
0x0013F7C0 0.46000001 0.51999998 1.0000000 1.0000000
0x0013F7D0 1.0000000 1.0000000 -1.0737418e+008 -1.0737418e+008
As expected, we see the two floating point values followed by our 4 component vector.
If we compare that to the HLSL structure we notice that they do not look the same in memory:
C++:

HLSL:

Therefore, when the buffer is sent to the GPU, the shader is reading from something that looks like this:
light.a = .46f; // bytes 0 to 3
light.s = .52f; // bytes 4 to 7
// skip 8 padding bytes from the first chunk
light.color = float4(1, 1, ?, ?); // read 16 remaining bytes from the second chunk
This happens because the two structures are not aligned the same way. Even more so, the C++ struct is not padded at all. So the shader reads the a and s floats from the buffer, and then moves along, ignoring the next 8 bytes to access the float4.
When reading the values for the color, the GPU will fetch the second chunk and build a float4 that uses the original blue and alpha channels as its red and green components, and fills its blue and alpha components with whatever is in the remaining 8 bytes. This shows that care needs to be taken when using constant buffers.
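The mismatch can be reproduced on the CPU with a small sketch. It assumes a stand-in `Float4` type in place of XMFLOAT4 (four tightly packed floats) and a hypothetical helper `color_as_seen_by_shader` that reads the color from byte offset 16, where HLSL expects it. Unlike the debugger dump above, the buffer here is zero-filled, so the two trailing components come out as 0 rather than garbage.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Stand-in for XMFLOAT4 (assumption: four tightly packed floats).
struct Float4 { float x, y, z, w; };

// The unpadded C++ struct: color starts right after s, at byte offset 8.
struct LightUnpadded {
    float a;
    float s;
    Float4 color;
};

static_assert(sizeof(LightUnpadded) <= 32, "struct must fit in two 16-byte chunks");

// Hypothetical helper: copies the CPU struct into a zero-filled 32-byte
// buffer, then reads a float4 from byte offset 16, which is where the HLSL
// layout places the color member.
Float4 color_as_seen_by_shader(const LightUnpadded& light) {
    std::array<unsigned char, 32> buffer{};                // zeroed buffer
    std::memcpy(buffer.data(), &light, sizeof(light));     // CPU-side layout
    Float4 seen;
    std::memcpy(&seen, buffer.data() + 16, sizeof(seen));  // HLSL-side read
    return seen;
}
```

With a color of (1, 2, 3, 4), the helper returns (3, 4, 0, 0): the original blue and alpha values end up in the red and green components, as described above.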
In order to avoid this type of issue there are a few options:
- try and see if you can store the data in an order that will automatically align it.
- you can pad the structure yourself. For example, instead of having two floats, a vector4 and another float, put the three floats one after the other, add 4 bytes of padding, and make the vector4 the last member of the structure. This saves some memory and aligns the struct.
- use __declspec(align(#)), where # is the alignment in bytes, to force a member to start on a 16-byte boundary.
- As mentioned by Guille in his comment, you can also use the aligned XMFLOAT*A types, which save you from having to use __declspec(align(16)).
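As a sketch of the manual padding option applied to the light struct, the 8 bytes that HLSL skips can be spelled out as an explicit member. `Float4` is again a stand-in for XMFLOAT4 and the `padding` member name is an arbitrary choice. The static_assert lines are an addition of mine, not something the options above require; they simply make the compiler reject the struct if the layout ever drifts.

```cpp
#include <cstddef>

// Stand-in for XMFLOAT4 (assumption: four tightly packed floats).
struct Float4 { float x, y, z, w; };

struct LightPadded {
    float a;
    float s;
    float padding[2];  // 8 explicit bytes, never read by the shader
    Float4 color;      // now starts at byte offset 16
};

// Compile-time checks that the layout matches what HLSL expects.
static_assert(offsetof(LightPadded, color) == 16, "color must start a new 16-byte chunk");
static_assert(sizeof(LightPadded) == 32, "struct must span exactly two 16-byte chunks");
```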
In our case, changing the order of the members will always result in the struct being 32 bytes or more, so we can use __declspec(align(16)) to force the XMFLOAT4 to be packed in its own chunk, just like what HLSL does to the float4:
struct light
{
    float a;
    float s;
    __declspec(align(16)) XMFLOAT4 color;
};
And now, as expected, the memory looks like this:
// two floating point values + 8 bytes of padding
0x002DF750 0.46000001 0.51999998 0.00000000 0.00000000
// nicely aligned vector4
0x002DF760 1.0000000 1.0000000 1.0000000 1.0000000
This forces the buffer to be packed the way HLSL expects, and fixes the yellow light problem.
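If you are not compiling with Visual Studio, the same layout can be sketched with the standard C++11 alignas specifier, which I am using here as a portable substitute for __declspec(align(16)). `Float4` stands in for XMFLOAT4, and `as_floats` is a hypothetical helper that views the struct as eight consecutive floats, much like the debugger memory window above.

```cpp
#include <array>
#include <cstddef>
#include <cstring>

// Stand-in for XMFLOAT4 (assumption: four tightly packed floats).
struct Float4 { float x, y, z, w; };

struct LightAligned {
    float a;
    float s;
    alignas(16) Float4 color;  // standard spelling of the 16-byte alignment request
};

static_assert(sizeof(LightAligned) == 32, "expected two 16-byte chunks");

// Hypothetical helper: copies the struct into an array of eight floats so the
// two chunks can be inspected. Indices 2 and 3 are padding and may hold any value.
std::array<float, 8> as_floats(const LightAligned& light) {
    std::array<float, 8> out{};
    std::memcpy(out.data(), &light, sizeof(light));
    return out;
}
```

For the light values used earlier, indices 0 and 1 of the returned array hold a and s, and indices 4 to 7 hold the four color components.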