Inline PTX
Cheetah supports inline PTX with positional and named operands, volatility, and memory clobbers.
Basics
(out.val,) = ch.asm("add.u32 $0, $a, $b;", ty.u32, a=a, b=b)
$0 is the first output (typed u32); $a and $b are named inputs.
Mixed positional and named
# named
(out.val,) = ch.asm("add.u32 $out, $a, $b;", out=ty.u32, a=a, b=b)
# positional
(out.val,) = ch.asm("add.u32 $0, $1, $2;", ty.u32, a, b)
# mixed
(out.val,) = ch.asm("add.u32 $0, $a, $b;", ty.u32, a=a, b=b)
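The three call forms above differ only in how operands are named. The sketch below is a plain-Python model of that binding, not Cheetah's implementation. Ty and bind_operands are hypothetical names, and the model assumes that a type operand declares an output, any other operand is an input, and positional operands are numbered in the order they are passed (flags such as ch.AsmVolatile() are ignored here).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ty:
    """Hypothetical stand-in for a Cheetah type such as ty.u32."""
    name: str


def bind_operands(*args, **kwargs):
    """Map each placeholder name to a (role, operand) pair.

    Assumption: a Ty instance marks an output; anything else is an input.
    Positional operands get the names "0", "1", ... in call order.
    """
    table = {}
    for index, operand in enumerate(args):
        role = "out" if isinstance(operand, Ty) else "in"
        table[str(index)] = (role, operand)
    for name, operand in kwargs.items():
        role = "out" if isinstance(operand, Ty) else "in"
        table[name] = (role, operand)
    return table


u32 = Ty("u32")
# Mixed form: positional output $0, named inputs $a and $b.
print(bind_operands(u32, a=7, b=9))
```

Under these assumptions the named, positional, and mixed calls produce the same roles and differ only in the keys used to refer to each operand.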
Volatile and clobbers
(v,) = ch.asm("mov.u32 $0, %clock;", ch.AsmVolatile(), ty.u32)
ch.asm("st.u32 [$0], $1;", ch.AsmMemoryClobber(), out, a)
Vector outputs
v4 = ch.asm(
    "ld.global.v4.u32 {$0, $1, $2, $3}, [$p];",
    ty.u32, ty.u32, ty.u32, ty.u32,
    p=p,
)
acc = v4[0] + v4[1] + v4[2] + v4[3]
Escaping and placeholders
- Use $$ to emit a literal $ and %% for a literal %.
- Output and input placeholder sets must be disjoint; e.g., don’t reuse $a as both an output and an input.
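To make the two rules concrete, here is a minimal sketch of an escape-aware template pass written in plain Python. It is an illustration only and does not reflect how Cheetah processes templates. The names render and check_disjoint and the registers mapping are hypothetical, and the sketch assumes placeholders consist of $ followed by letters, digits, or underscores. A lone % is passed through unchanged.

```python
import re

# Matches the escapes "$$" and "%%", or a placeholder such as "$0" or "$a".
_TOKEN = re.compile(r"\$\$|%%|\$(\w+)")


def render(template, registers):
    """Expand escapes and substitute placeholders in one left-to-right pass.

    registers maps placeholder names (without the "$") to replacement text.
    """
    def expand(match):
        token = match.group(0)
        if token == "$$":
            return "$"
        if token == "%%":
            return "%"
        return registers[match.group(1)]  # KeyError for an unknown name

    return _TOKEN.sub(expand, template)


def check_disjoint(outputs, inputs):
    """Raise ValueError if a name is used as both an output and an input."""
    shared = sorted(set(outputs) & set(inputs))
    if shared:
        raise ValueError(
            "placeholder(s) used as both output and input: "
            + ", ".join("$" + name for name in shared)
        )


print(render("mov.u32 $0, %%clock; // $$5", {"0": "%r1"}))
# prints: mov.u32 %r1, %clock; // $5
```

Handling escapes and placeholders in a single regular-expression pass keeps "$$a" from being misread as a placeholder: the "$$" alternative is tried first at each position, so the following "a" is left as literal text.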
Errors
- Passing an operand of an unsupported primitive type raises an error that names the offending type.
- Mismatched operand counts and unknown placeholders are each reported with a specific error.
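The checks for operand counts and unknown placeholders can be pictured with a small plain-Python validator. This is a sketch under assumptions, not Cheetah's actual diagnostics: AsmTemplateError, referenced_names, and validate are hypothetical names, the messages are invented for illustration, and operand type checking is left out.

```python
import re

# "$$" and "%%" are escapes; "$name" or "$0" is a placeholder (group 1).
_PLACEHOLDER = re.compile(r"\$\$|%%|\$(\w+)")


class AsmTemplateError(ValueError):
    """Hypothetical error type used only in this sketch."""


def referenced_names(template):
    """Return placeholder names in order of appearance, skipping escapes."""
    return [m.group(1) for m in _PLACEHOLDER.finditer(template)
            if m.group(1) is not None]


def validate(template, num_positional, named):
    """Check that the template and the supplied operands agree."""
    used = referenced_names(template)
    known = {str(i) for i in range(num_positional)} | set(named)

    unknown = sorted(set(used) - known)
    if unknown:
        raise AsmTemplateError(
            "unknown placeholder(s): " + ", ".join("$" + n for n in unknown)
        )

    used_positional = {n for n in used if n.isdigit()}
    if len(used_positional) != num_positional:
        raise AsmTemplateError(
            f"template references {len(used_positional)} positional "
            f"operand(s) but {num_positional} were supplied"
        )


# One positional output plus two named inputs: passes silently.
validate("add.u32 $0, $a, $b;", 1, {"a", "b"})
```

Calling validate("add.u32 $0, $c, $b;", 1, {"a", "b"}) in this sketch raises AsmTemplateError naming $c, and supplying three positional operands to a template that only references $0 and $1 raises the count error.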