Cheetah

Cheetah is a small experimental DSL for CUDA/C++ metaprogramming in Python, allowing users to:

Define kernels and helper functions in Python using a typed, expression-based API
Render readable CUDA/C++ with correct scoping and naming
Express index relationships and scoped constraints for structured codegen
Emit inline PTX safely with typed operands

Its features include: - A strongly-typed core IR with error handling - Clean codegen with support for variable naming, comments, and more - Inline PTX with constraints, volatile, clobbers - Builtin support for exporting kernels to python, including PyTorch

This allows for kernels written in python, imported into python, and then tested in python, all in the same file! Compared to other

On top of cheetah, users can build powerful libraries. Some of the ones we have built: - A scoped dimension algebra and index computation library, called index_tools - A set of cuda utilities for

Cheetah is experimental and evolving; the API may change. See Troubleshooting for common pitfalls.

Quick links

Getting started: Installation and quickstart
Guides: Kernels, Index tools, Inline PTX, API Cookbook
Internals: Overview, Dims solver
Troubleshooting: Common issues

Contributing

Use golden tests to stabilize codegen output. Open issues/PRs with a description, minimal example, and proposed tests. Docs are built from docs/ via MkDocs Material.

Some of these pages were written by the sleep deprived or GPT 5, but all have been reviewed by a well rested author. Feel to email anin@mit.edu with questions or clarifications!