Cheetah
Cheetah is a small experimental DSL for CUDA/C++ metaprogramming in Python, allowing users to:
- Define kernels and helper functions in Python using a typed, expression-based API
- Render readable CUDA/C++ with correct scoping and naming
- Express index relationships and scoped constraints for structured codegen
- Emit inline PTX safely with typed operands
Its features include:
- A strongly-typed core IR with error handling
- Clean codegen with support for variable naming, comments, and more
- Inline PTX with constraints, volatile, clobbers
- Built-in support for exporting kernels to Python, including PyTorch
This lets you write a kernel in Python, import it into Python, and test it in Python, all in the same file! Compared to other
On top of Cheetah, users can build powerful libraries. Some of the ones we have built:
- A scoped dimension algebra and index computation library, called index_tools
- A set of CUDA utilities for
Cheetah is experimental and evolving; the API may change. See Troubleshooting for common pitfalls.
Quick links
- Getting started: Installation and quickstart
- Guides: Kernels, Index tools, Inline PTX, API Cookbook
- Internals: Overview, Dims solver
- Troubleshooting: Common issues
Contributing
Use golden tests to stabilize codegen output. Open issues/PRs with a description, minimal example, and proposed tests. Docs are built from docs/ via MkDocs Material.
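For contributors new to golden tests, here is a minimal sketch of the idea in plain Python. It does not use the Cheetah API: render_kernel is a hypothetical stand-in for whatever codegen call your test exercises, and check_golden and the golden file path are illustrative names, not part of this repo. The first run records the generated source; later runs fail with a diff if the output drifts.

```python
import difflib
import tempfile
from pathlib import Path


def render_kernel(name: str) -> str:
    """Hypothetical stand-in for a codegen call; NOT the real Cheetah API."""
    return (
        f'extern "C" __global__ void {name}(float* out, int n) {{\n'
        "    int i = blockIdx.x * blockDim.x + threadIdx.x;\n"
        "    if (i < n) { out[i] = 0.0f; }\n"
        "}\n"
    )


def check_golden(actual: str, golden_path: Path, update: bool = False) -> None:
    """Compare generated source to a stored golden file (illustrative helper)."""
    if update or not golden_path.exists():
        # First run, or an intentional refresh: record the current output.
        golden_path.parent.mkdir(parents=True, exist_ok=True)
        golden_path.write_text(actual)
        return
    expected = golden_path.read_text()
    if actual != expected:
        # Show a unified diff so the reviewer can see exactly what changed.
        diff = "".join(
            difflib.unified_diff(
                expected.splitlines(keepends=True),
                actual.splitlines(keepends=True),
                fromfile=str(golden_path),
                tofile="generated",
            )
        )
        raise AssertionError("codegen output drifted from golden file:\n" + diff)


if __name__ == "__main__":
    # Demo in a temporary directory so nothing is written into the repo.
    golden = Path(tempfile.mkdtemp()) / "golden" / "zero_fill.cu"
    check_golden(render_kernel("zero_fill"), golden)  # records the golden file
    check_golden(render_kernel("zero_fill"), golden)  # passes: output unchanged
```

When a codegen change is intentional, rerun with update=True and commit the refreshed golden file alongside the change, so the diff shows up in review.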
Some of these pages were written by the sleep-deprived or GPT 5, but all have been reviewed by a well-rested author. Feel free to email anin@mit.edu with questions or clarifications!