module volt.llvm.abi.sysvamd64

Provides a solution for implementing structs passed by value.

This module could be called 'dothethingthatclangdoestocfunctions.volt', but that's a lot longer to type. Technically it's implementing the System V AMD64 ABI.

At any rate, here's how it breaks down.

ABI Fundamentals

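There are three ways that a struct can be passed to a C function: as INTEGER (in the general purpose registers), as FLOAT (in the XMM* registers), or as MEMORY (on the stack).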

(Note that while AMD64 processors have other ways to pass floating point values, the ABI spec either passes them in XMM* registers, or as MEMORY values.)

Classification

For primitive types, classification is fairly simple: integers and pointers are INTEGER, and floating point values are FLOAT. (Note that this is a gross over-simplification of the SysV ABI. We're only including what Volta cares about here.)

LLVM handles normal parameters itself, which is why the above is so simplified. But LLVM doesn't handle aggregate types (structs and unions): left to its own devices, it will pass them all as MEMORY, which is not what the C ABI expects.

The alignment of an aggregate is equal to the strictest (i.e. the largest) alignment among its constituent members.

The classifications have priority. If one member of a given group is classified as MEMORY, then the whole group is MEMORY. If there is an INTEGER member in a group of FLOATs, then that group is INTEGER.
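The priority rule can be sketched as a tiny merge function. This is a minimal sketch in Python rather than Volta's actual code; the names `Class` and `merge` are made up for illustration.

```python
from enum import IntEnum

class Class(IntEnum):
    # Hypothetical names, ordered so that a higher value wins a merge.
    FLOAT = 0
    INTEGER = 1
    MEMORY = 2

def merge(classes):
    """Classify a non-empty group: the highest-priority member wins."""
    return max(classes)

print(merge([Class.FLOAT, Class.INTEGER]).name)  # INTEGER
```

Encoding the priority in the enum's ordering keeps the merge down to a single `max`.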

If the size of the aggregate is less than or equal to a single u64 (the ABI documentation uses the term eightbyte, but I'll stick to Volt terminology here), then that type is treated as a normal parameter. If it's MEMORY, then it is treated as a normal struct, but if it is a FLOAT, then it's just an LLVM float or double. (Note that floating point values only have two sizes as far as we're concerned, 4 and 8 bytes, and they can't coexist in a group with an integer.) If it's an INTEGER, it's an LLVM iN, where N is at most 64 and is the size rounded up to alignment boundaries.

I'll repeat the last point, because it's important: it is treated as a normal parameter. Remember what I said about LLVM handling normal parameters? It applies here. If registers are exhausted, these aggregates still get decomposed to the float/integer (unless they're MEMORY, obviously).

If the aggregate is larger than 16 bytes (two u64s), then the whole thing is MEMORY.

Any remaining aggregates are considered in two eightbyte (u64) chunks. Those chunks are classified separately. That is to say, if you have the struct

struct S {
    d: f64;
    i: i8;
    f: f32;
}

Then that struct gets broken down into two eightbyte chunks for consideration:

[[dddddddd]][[i ][ffff]]

Remember what I said about alignment and precedence? That applies here. The first chunk is considered as a group, and is a FLOAT that fits in a double. The second chunk contains an INTEGER and a FLOAT. Its members may only occupy 5 bytes, but due to alignment, the chunk rounds up to 8. And as a FLOAT can't coexist with any other type, the chunk becomes an INTEGER; hence, i8+f32 (in this instance) == i64. The struct is therefore decomposed into (double, i64) when being passed into a C function.
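To make the chunking concrete, here is a minimal Python sketch that lays out the struct S and classifies each eightbyte. It is an illustration, not Volta's implementation: `layout` and `classify_chunks` are hypothetical helpers, and the sketch assumes every field is a primitive aligned to its own size and that the aggregate is at most 16 bytes.

```python
def layout(fields):
    """Assign C-style offsets; each primitive is aligned to its own size."""
    offset, placed = 0, []
    for kind, size in fields:
        offset = (offset + size - 1) // size * size  # round up to alignment
        placed.append((kind, size, offset))
        offset += size
    return placed

def classify_chunks(fields):
    """Classify each eightbyte chunk; INTEGER beats FLOAT within a chunk."""
    chunks = {}
    for kind, size, offset in layout(fields):
        index = offset // 8  # which eightbyte the field lands in
        cls = "FLOAT" if kind == "float" else "INTEGER"
        if chunks.get(index) != "INTEGER":
            chunks[index] = cls
    return [chunks[i] for i in sorted(chunks)]

# struct S { d: f64; i: i8; f: f32; }
S = [("float", 8), ("int", 1), ("float", 4)]
print(layout(S))           # [('float', 8, 0), ('int', 1, 8), ('float', 4, 12)]
print(classify_chunks(S))  # ['FLOAT', 'INTEGER']
```

The f32 lands at offset 12, not 9, which is why it shares the second chunk with the i8 and gets absorbed into an INTEGER.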

If an eightbyte chunk is composed of two f32s, then it is vectorized to an LLVM <2 x float>. This is considered to take up one FLOAT register.
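The mapping from a classified chunk to the LLVM type it is passed as might be sketched like this. `lower_chunk` is a hypothetical helper, and treating every INTEGER chunk of a two-chunk aggregate as i64 is an assumption taken from the i8+f32 example above.

```python
def lower_chunk(cls, float_sizes=()):
    """Return the LLVM type (as a string) for one classified eightbyte.

    float_sizes lists the byte sizes of the floating point fields in the
    chunk; it is only consulted for FLOAT chunks.
    """
    if cls == "INTEGER":
        return "i64"  # assumption: rounded up to the full eightbyte
    sizes = tuple(float_sizes)
    if sizes == (8,):
        return "double"
    if sizes == (4, 4):
        return "<2 x float>"  # two f32s share one FLOAT register
    if sizes == (4,):
        return "float"
    raise ValueError("not a FLOAT chunk we know how to lower")

# struct S from above: [dddddddd] then [i8 + f32]
print([lower_chunk("FLOAT", [8]), lower_chunk("INTEGER")])  # ['double', 'i64']
```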

Register exhaustion

So the above isn't too complicated, once you get down to it. Go over the function's parameters left to right; if you hit an aggregate, evaluate it according to the above rules, and modify the function parameters and calls accordingly.

But it's not so simple. There are a limited number of registers for parameters to be placed into, and once a multiparameter aggregate is exploded, LLVM doesn't know what type it was originally and can't push it onto the stack. (The same doesn't apply to aggregates that become a single INTEGER/FLOAT. They contribute to exhaustion, but they're always decomposed.) Registers are allocated for parameters left to right, and once an aggregate can't be entirely allocated, every chunk of it has to become MEMORY.

So remember, 6 integer registers, 8 float registers. For every parameter, subtract how many registers it'll take. The struct S above would take one integer register and one float register. If you go to allocate from a register group and it has no registers left, then the entire aggregate, every eightbyte chunk, becomes MEMORY. Any integer size is considered to take a single register, so if you start a function with six by-value structs with a single u8 field, any complex aggregate that would become INTEGER in one of its chunks would have to be MEMORY.
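The bookkeeping can be sketched as follows. This is a simplified Python model, not the module's `consumeRegisters`: `allocate` is a hypothetical function, and each parameter is represented by the list of chunk classifications it would decompose into.

```python
INTEGER_REGISTERS = 6
FLOAT_REGISTERS = 8

def allocate(params):
    """Walk parameters left to right and demote aggregates that don't fit.

    params is a list of chunk lists, e.g. ["FLOAT", "INTEGER"] for struct S.
    Returns the chunk list actually used for each parameter.
    """
    ints, floats = INTEGER_REGISTERS, FLOAT_REGISTERS
    result = []
    for chunks in params:
        need_i = chunks.count("INTEGER")
        need_f = chunks.count("FLOAT")
        # Only multi-chunk aggregates are demoted; single values are
        # always decomposed, they just use up registers.
        if len(chunks) > 1 and (need_i > ints or need_f > floats):
            result.append(["MEMORY"])
            continue
        ints = max(ints - need_i, 0)
        floats = max(floats - need_f, 0)
        result.append(list(chunks))
    return result

# Six { u8 } structs use up every integer register, so S becomes MEMORY.
print(allocate([["INTEGER"]] * 6 + [["FLOAT", "INTEGER"]])[-1])  # ['MEMORY']
```

Note that a demoted aggregate takes no registers in this model, so later parameters can still use whatever is left.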

Individual float parameters are not vectorized, and are considered to take a FLOAT register each. That is to say, a function starting (float, float, float, float) has 4 float registers remaining, while one that starts ({float, float}, {float, float}) has 6 remaining.
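The arithmetic in that example can be checked with a few lines of Python. `float_registers_left` is a hypothetical helper that takes the lowered parameter types as strings.

```python
FLOAT_REGISTERS = 8

def float_registers_left(params):
    """Each scalar float/double or vectorized pair takes one FLOAT register."""
    used = sum(1 for p in params if p in ("float", "double", "<2 x float>"))
    return max(FLOAT_REGISTERS - used, 0)

print(float_registers_left(["float"] * 4))        # 4
print(float_registers_left(["<2 x float>"] * 2))  # 6
```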

Argument coercion

Single argument ({u64} -> u64, {f64} -> double, etc.)

This one is fairly simple. Whereas before, the code generated was

a = load TheStruct from TheStruct*
call func(a)

It becomes

ptr = gep TheStruct* 0 0  // i64*, pointer to the first field
a = load i64 from i64* ptr
call func(a)

Basically, func(str) becomes func(*(cast(i64*)&str)).
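The reinterpretation can be mimicked in Python with the standard `struct` module. This is only an illustration of the byte-level effect, not generated code; the { u32, u32 } struct and its field values are made up for the example.

```python
import struct

# A by-value struct { u32, u32 } as it sits in memory (little-endian, as on AMD64).
raw = struct.pack("<II", 0x11111111, 0x22222222)

# Reinterpret the same 8 bytes as one u64: the value passed in the integer register.
(as_u64,) = struct.unpack("<Q", raw)
print(hex(as_u64))  # 0x2222222211111111
```

The first field ends up in the low half of the register because the layout is little-endian.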

Vectorized floats

So if { f32, f32 } is vectorized into a <2 x float> parameter, the code for the call is similar to the previous case, but slightly different.

The original call becomes

ptr = bitcast TheStruct* to <2 x float>*
a = load <2 x float> from ptr
func(a)

Which is the same concept as before, but with an explicit cast. I expect this method would work for the other types too, so you might want to investigate using this path for the above case as well.

Other decompositions

So this is an aggregate that is bigger than 8 bytes, but less than or equal to 16.

Step one is to create an aggregate that has the two members; these will be the two parameters we're decomposing down to.

If your initial aggregate is (say) { u32, u32, u32, u32 }, then create an aggregate of { u64, u64 }.
If your initial aggregate is (say) { u32, u32, f32, f32 }, then create an aggregate of { u64, <2 x float> }.

Then the remainder of the method is basically a combination of the last two.

sptr = bitcast TheStruct* to { u64, <2 x float> }*   // or w/e
aptr = gep sptr 0 0   // u64*
a = load u64* aptr  // u64
bptr = gep sptr 0 1   // <2 x float>*
b = load <2 x float>* bptr  // <2 x float>
func(a, b);
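The same decomposition can be imitated at the byte level in Python with the standard `struct` module. Again, this is a sketch of the effect rather than the generated code, and the field values are arbitrary.

```python
import struct

# { u32, u32, f32, f32 } as 16 bytes of little-endian memory.
raw = struct.pack("<IIff", 1, 2, 1.5, 2.5)

# First eightbyte viewed as a u64, second eightbyte viewed as two floats.
(a,) = struct.unpack_from("<Q", raw, 0)
b = struct.unpack_from("<2f", raw, 8)

print(hex(a), b)  # 0x200000001 (1.5, 2.5)
```

Here `a` and `b` play the role of the two values that func(a, b) receives.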

Code Map

//! Provides a solution for implementing structs passed by value.
module volt.llvm.abi.sysvamd64;


enum AMD64_SYSV_INTEGER_REGISTERS;
enum AMD64_SYSV_FLOAT_REGISTERS;
enum AMD64_SYSV_MAX_COERCIBLE_SZ;
enum AMD64_SYSV_WORD_SZ;
enum AMD64_SYSV_HALFWORD_SZ;
enum NOT_FLOAT;
enum ONE_FLOAT;
enum TWO_FLOATS;
enum ONE_DOUBLE;

enum Classification
{
	Memory,
	Integer,
	Float,
	CoercedStructSingle,
	CoercedStructDouble,
}

fn sysvAmd64AbiCoerceParameters(state: State, ft: ir.FunctionType, retType: LLVMTypeRef, params: LLVMTypeRef[]) { }
fn consumeRegisters(state: State, types: LLVMTypeRef[], integerRegisters: i32, floatRegisters: i32) { }
fn classifyType(state: State, type: LLVMTypeRef, structTypes: LLVMTypeRef[]) Classification { }
fn classifyStructType(state: State, type: LLVMTypeRef, structTypes: LLVMTypeRef[]) Classification { }
fn sysvAmd64AbiCoerceArguments(state: State, ct: ir.CallableType, args: LLVMValueRef[]) { }
fn sysvAmd64AbiCoercePrologueParameter(state: State, llvmFunc: LLVMValueRef, func: ir.Function, ct: ir.CallableType, val: LLVMValueRef, index: size_t, offset: size_t) CoercedStatus { }