What is PyC? PyC is an experimental compiler designed to translate a subset of Python-like code into executable machine code. It leverages the LLVM infrastructure for code generation and optimization, aiming to provide a lightweight and educational tool for understanding compiler design. Additionally, PyC explores CUDA integration for GPU acceleration, targeting AI and scientific computing workloads.
PyC follows a traditional compiler pipeline, broken into several stages:
Lexer: Converts the input source code into tokens, handling Python-style indentation and keywords. Parser: Analyzes the tokens to build an Abstract Syntax Tree (AST), representing the program's structure.
Manages variable scopes and tracks their usage throughout the code, preparing for IR generation.
Transforms the AST into LLVM Intermediate Representation (IR), a low-level, platform-agnostic format.
Applies various LLVM optimization passes to improve the efficiency of the generated code.
Compiles the optimized IR into machine code, either through Just-In-Time (JIT) compilation for immediate execution or by generating object files for later use.
Provides detailed error and warning messages, helping developers identify and fix issues in their code.
Aims to accelerate certain compiler tasks, like tokenization, using GPU parallelism.
- Lexer and Parser: Handle the syntax of the Python-like language, including indentation-based blocks.
- Symbol Table: Ensures variables are correctly scoped and accessible during compilation.
- IR Generation: Bridges the gap between high-level code and machine instructions.
- Backend: Manages code generation and execution, with support for multithreading.
- Error Handling: Enhances the development experience with informative diagnostics.
- CUDA Kernels: Explore the potential of GPU acceleration in compiler tasks.
- PyC can currently compile basic Python-like code, including:
Variable assignments (e.g., x = 5) Expressions (e.g., x + 42) If statements with indentation-based blocks
It also provides:
Detailed error reporting Basic optimizations Experimental GPU tokenization
Future Directions The project aims to expand its capabilities to support:
Full Python syntax, including loops and functions Advanced optimizations for performance Functional CUDA integration for accelerated compilation Enhanced symbol table for complex types and functions
PyC is not just a compiler but a platform for learning and experimentation in compiler design, optimization, and GPU computing.