Skip to content

Cheetah

Overview

Internals overview

Cheetah implements a two-level IR pipeline and a small solver for index algebra.

Data flow

User code in cheetah.api builds a typed core IR inside a global Context.
The function decorator wraps the Python function to capture argument/local names and feed statements into the IR (via operator overloading and context managers).
core_to_codegen.render:
Gathers transitive dependencies of selected entry points
Assigns export and internal names
Analyses inlining and side-effects
Lowers core IR → codegen IR and pretty-prints to CUDA/C++

IR layers

Core IR (cheetah.internal.core_ir): typed nodes, control blocks, validation
Codegen IR (cheetah.internal.codegen_ir): target-oriented expressions and statements with precedence-aware printing

Index tools

cheetah.index_tools models dimension sizes, scoped equations, and runtime indices. A small algebraic solver ensures equations are well-formed and can be turned into linear index computations.

Inline PTX

Inline assembly supports positional/named operands with type-checked constraints, volatile flag, and memory clobbers.