Inline PTX

Cheetah supports inline PTX through ch.asm, with positional and named operands, volatile markers, and memory clobbers.

Basics

(out.val,) = ch.asm("add.u32 $0, $a, $b;", ty.u32, a=a, b=b)
  • $0 is the first output (typed u32); $a and $b are named inputs, bound by the keyword arguments a=a and b=b.

Mixed positional and named

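# named: every operand, including the output, is bound by keyword
(out.val,) = ch.asm("add.u32 $out, $a, $b;", out=ty.u32, a=a, b=b)
# positional: the output is $0, the inputs follow as $1 and $2
(out.val,) = ch.asm("add.u32 $0, $1, $2;", ty.u32, a, b)
# mixed: positional output $0, named inputs $a and $b
(out.val,) = ch.asm("add.u32 $0, $a, $b;", ty.u32, a=a, b=b)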
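
The three forms differ only in how placeholders are bound to operands. The sketch below models one plausible binding rule in plain Python, inferred from the examples above: positional outputs take the lowest indices, positional inputs follow, and named operands are looked up by name. `bind_operands` and the `out0`-style labels are hypothetical names used for illustration; they are not part of Cheetah's API, and Cheetah's actual rules may differ.

```python
def bind_operands(n_outputs, positional_inputs, named):
    """Map placeholder names to operand labels (illustrative only).

    Assumption: positional outputs are numbered first, then positional inputs.
    """
    table = {}
    # Outputs occupy $0 .. $(n_outputs - 1).
    for i in range(n_outputs):
        table[str(i)] = f"out{i}"
    # Positional inputs continue the numbering after the outputs.
    for j, operand in enumerate(positional_inputs, start=n_outputs):
        table[str(j)] = operand
    # Named operands are keyed by their own name.
    for name, operand in named.items():
        if name in table:
            raise ValueError(f"duplicate placeholder ${name}")
        table[name] = operand
    return table


# positional: "add.u32 $0, $1, $2;" with one output and inputs a, b
print(bind_operands(1, ["a", "b"], {}))
# mixed: "add.u32 $0, $a, $b;" with one positional output and named inputs
print(bind_operands(1, [], {"a": "a", "b": "b"}))
```

Under this assumed rule, the positional and mixed calls above resolve to the same three operands; only the placeholder spelling changes.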

Volatile and clobbers

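# Volatile: the read of %clock has a side effect, so the asm must not be removed or merged.
v = ch.asm("mov.u32 $0, %clock;", ch.AsmVolatile(), ty.u32)
# Memory clobber: the store writes memory through the pointer passed as $0.
ch.asm("st.u32 [$0], $1;", ch.AsmMemoryClobber(), out, a)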

Vector outputs

v4 = ch.asm(
    "ld.global.v4.u32 {$0, $1, $2, $3}, [$p];",
    ty.u32, ty.u32, ty.u32, ty.u32,
    p=p,
)
acc = v4[0] + v4[1] + v4[2] + v4[3]

Escaping and placeholders

  • Use $$ to emit a literal $ and %% to emit a literal %.
  • Output and input placeholder sets must be disjoint: a name such as $a cannot refer to both an output and an input.
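
As a rough illustration of the two rules above, the following plain-Python sketch expands a template using the stated escapes and rejects overlapping output and input names. `expand` and `check_disjoint` are hypothetical helpers, and the register name `%r1` is made up for the example; none of this is Cheetah's actual implementation.

```python
import re

# Match the two escapes first, then a $name or $N placeholder.
_TOKEN = re.compile(r"\$\$|%%|\$(\w+)")


def expand(template, operands):
    """Substitute placeholders; $$ becomes $ and %% becomes % (illustrative only)."""
    def repl(match):
        text = match.group(0)
        if text == "$$":
            return "$"
        if text == "%%":
            return "%"
        return operands[match.group(1)]
    return _TOKEN.sub(repl, template)


def check_disjoint(outputs, inputs):
    """Raise if a placeholder name is used as both an output and an input."""
    shared = sorted(set(outputs) & set(inputs))
    if shared:
        raise ValueError("placeholders used as both out and in: " + ", ".join(shared))


print(expand("mov.u32 $0, %%clock; // costs $$0", {"0": "%r1"}))
check_disjoint(["out"], ["a", "b"])  # passes silently
```

Handling the escapes and the placeholders in a single pass keeps `$$0` from being read as an escape followed by a stray `0` in one step and as placeholder `$0` in another.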

Errors

  • An operand of an unsupported primitive type raises an error that names the offending type.
  • A mismatch between placeholder and operand counts, or a placeholder with no matching operand, raises an error that identifies the mismatch.
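
To make the second kind of error concrete, here is a minimal plain-Python sketch of a check that compares the placeholders in a template against the supplied operand names. `validate` is a hypothetical helper and the message wording is invented for illustration; Cheetah's real diagnostics are not shown in this document and may look different.

```python
import re


def validate(template, operand_names):
    """Return a list of human-readable problems (illustrative only)."""
    # Drop the $$ escape first so "$$0" is not mistaken for placeholder $0.
    stripped = template.replace("$$", "")
    used = set(re.findall(r"\$(\w+)", stripped))
    supplied = set(operand_names)
    problems = []
    # Placeholders that appear in the template but have no operand.
    for name in sorted(used - supplied):
        problems.append(f"unknown placeholder ${name}")
    # Operands that were supplied but never referenced.
    for name in sorted(supplied - used):
        problems.append(f"operand '{name}' is never referenced")
    return problems


print(validate("add.u32 $0, $a, $c;", ["0", "a", "b"]))
print(validate("add.u32 $0, $a, $b;", ["0", "a", "b"]))
```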