Skip to content

Internals overview

Cheetah implements a two-level IR pipeline and a small solver for index algebra.

Data flow

  1. User code in cheetah.api builds a typed core IR inside a global Context.
  2. The function decorator wraps the Python function to capture argument/local names and feed statements into the IR (via operator overloading and context managers).
  3. core_to_codegen.render:
  4. Gathers transitive dependencies of selected entry points
  5. Assigns export and internal names
  6. Analyses inlining and side-effects
  7. Lowers core IR → codegen IR and pretty-prints to CUDA/C++

IR layers

  • Core IR (cheetah.internal.core_ir): typed nodes, control blocks, validation
  • Codegen IR (cheetah.internal.codegen_ir): target-oriented expressions and statements with precedence-aware printing

Index tools

  • cheetah.index_tools models dimension sizes, scoped equations, and runtime indices. A small algebraic solver ensures equations are well-formed and can be turned into linear index computations.

Inline PTX

  • Inline assembly supports positional/named operands with type-checked constraints, volatile flag, and memory clobbers.