# GPU Programming 101 🚀

A comprehensive hands-on course for learning GPU programming with CUDA and HIP, covering fundamental concepts through advanced optimization techniques.

## 🎯 Course Overview

This course provides practical, hands-on experience with GPU programming, covering everything from basic parallel computing concepts to advanced optimization techniques. Each module contains theory, working code examples, and exercises.

## 📚 Course Structure

### Module 1: Foundations of GPU Computing ✅
**Status**: Complete
**Duration**: 4-6 hours
**Level**: Beginner

**Topics Covered**:
- GPU architecture and SIMT execution model
- CUDA and HIP programming fundamentals
- Memory management and data transfers
- Basic parallel execution patterns
- Debugging and optimization basics

**[📁 Go to Module 1](modules/module1/)**
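
To give a feel for what Module 1 covers, here is a minimal sketch of the classic vector-add pattern in CUDA. The names, sizes, and launch configuration are illustrative rather than taken from the module's examples, and error checking is omitted for brevity:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Each thread adds one element; the grid is sized so every element is covered.
__global__ void vectorAdd(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];   // bounds check for the last, partially filled block
}

int main() {
    // Illustrative sketch: error checking omitted for brevity.
    const int n = 1 << 20;
    size_t bytes = n * sizeof(float);

    float *h_a = (float *)malloc(bytes), *h_b = (float *)malloc(bytes), *h_c = (float *)malloc(bytes);
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    // Device allocations and host-to-device transfers.
    float *d_a, *d_b, *d_c;
    cudaMalloc(&d_a, bytes); cudaMalloc(&d_b, bytes); cudaMalloc(&d_c, bytes);
    cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice);

    int threads = 256, blocks = (n + threads - 1) / threads;
    vectorAdd<<<blocks, threads>>>(d_a, d_b, d_c, n);
    cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost);

    printf("c[0] = %f\n", h_c[0]);   // expect 3.0
    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    free(h_a); free(h_b); free(h_c);
    return 0;
}
```

The HIP equivalents (`hipMalloc`, `hipMemcpy`, and the same kernel syntax) follow the same structure, which is why the course teaches both APIs side by side.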

### Module 2: Multi-Dimensional Data Processing ✅
**Status**: Complete
**Duration**: 6-8 hours
**Level**: Beginner-Intermediate

**Topics Covered**:
- Multidimensional grid organization
- Thread mapping to data structures
- Image processing kernels
- Matrix multiplication algorithms
- Advanced memory management

**[📁 Go to Module 2](modules/module2/)**
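
As a taste of the 2D thread-mapping idea behind these topics, here is a sketch of an RGB-to-grayscale kernel. The packed row-major RGB layout, the kernel name, and the launch configuration shown in the comments are assumptions made for illustration:

```cuda
#include <cuda_runtime.h>

// Map a 2D grid of threads onto a 2D image stored in row-major order.
// One thread converts one packed RGB pixel to grayscale (illustrative kernel).
__global__ void rgbToGray(const unsigned char *rgb, unsigned char *gray, int width, int height) {
    int col = blockIdx.x * blockDim.x + threadIdx.x;
    int row = blockIdx.y * blockDim.y + threadIdx.y;
    if (col < width && row < height) {
        int idx = row * width + col;   // linearized pixel index
        unsigned char r = rgb[3 * idx], g = rgb[3 * idx + 1], b = rgb[3 * idx + 2];
        gray[idx] = (unsigned char)(0.299f * r + 0.587f * g + 0.114f * b);
    }
}

// Example launch configuration: 16x16 thread blocks tiled over the image.
// dim3 block(16, 16);
// dim3 grid((width + block.x - 1) / block.x, (height + block.y - 1) / block.y);
// rgbToGray<<<grid, block>>>(d_rgb, d_gray, width, height);
```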

### Module 3: GPU Architecture and Execution Models ✅
**Status**: Complete
**Duration**: 6-8 hours
**Level**: Intermediate

**Topics Covered**:
- GPU architecture deep dive
- Warp scheduling and SIMD hardware
- Control divergence and optimization
- Resource partitioning and occupancy
- Advanced parallel patterns

**[📁 Go to Module 3](modules/module3/)**
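
Control divergence is easiest to see in a small example. The sketch below contrasts a branch that splits threads within a warp against one that is uniform per warp; the kernel names are illustrative and the code is a sketch, not the module's example:

```cuda
// Divergent branch: even and odd lanes of the same warp take different paths,
// so the warp executes both paths serially.
__global__ void divergentKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (threadIdx.x % 2 == 0)
        data[i] *= 2.0f;
    else
        data[i] += 1.0f;
}

// Warp-uniform branch: threadIdx.x / warpSize is the warp index within the block,
// so every thread of a warp takes the same path and no serialization occurs.
__global__ void uniformKernel(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((threadIdx.x / warpSize) % 2 == 0)
        data[i] *= 2.0f;
    else
        data[i] += 1.0f;
}
```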

### Module 4: Advanced GPU Programming Techniques ✅
**Status**: Complete
**Duration**: 8-10 hours
**Level**: Intermediate-Advanced

**Topics Covered**:
- Multi-GPU programming and scalability
- Asynchronous execution with streams
- Dynamic parallelism techniques
- Advanced memory optimization
- Cross-platform development strategies

**[📁 Go to Module 4](modules/module4/)**
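
A common building block here is overlapping transfers with kernel execution using streams. The following sketch splits a buffer across two streams; the helper name, the even chunking, and the assumption of pinned host memory are illustrative choices, not the module's code:

```cuda
#include <cuda_runtime.h>

__global__ void process(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = data[i] * 2.0f + 1.0f;
}

// Overlap transfers and compute by splitting the work across two streams.
// Truly asynchronous copies require pinned host memory (cudaMallocHost).
void processInStreams(float *h_data, float *d_data, int n) {
    const int nStreams = 2;
    int chunk = n / nStreams;                       // assume n divides evenly for brevity
    cudaStream_t streams[nStreams];
    for (int s = 0; s < nStreams; ++s) cudaStreamCreate(&streams[s]);

    for (int s = 0; s < nStreams; ++s) {
        int offset = s * chunk;
        size_t bytes = chunk * sizeof(float);
        cudaMemcpyAsync(d_data + offset, h_data + offset, bytes,
                        cudaMemcpyHostToDevice, streams[s]);
        process<<<(chunk + 255) / 256, 256, 0, streams[s]>>>(d_data + offset, chunk);
        cudaMemcpyAsync(h_data + offset, d_data + offset, bytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }
    for (int s = 0; s < nStreams; ++s) {
        cudaStreamSynchronize(streams[s]);
        cudaStreamDestroy(streams[s]);
    }
}
```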

### Module 5: Performance Engineering and Optimization ✅
**Status**: Complete
**Duration**: 6-8 hours
**Level**: Advanced

**Topics Covered**:
- Performance profiling and analysis
- Memory bandwidth optimization
- Kernel optimization strategies
- Bottleneck identification and resolution
- Production performance engineering

**[📁 Go to Module 5](modules/module5/)**
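
Event-based timing is the starting point for most of this module. Below is a sketch that times a trivial copy kernel with CUDA events and converts the result to effective bandwidth; the kernel, function name, and buffer sizes are placeholders for whatever you measure:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Placeholder kernel: one read and one write per element.
__global__ void copyKernel(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Time the kernel with CUDA events and report effective memory bandwidth.
float timeKernelMs(const float *d_in, float *d_out, int n) {
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    cudaEventRecord(start);
    copyKernel<<<(n + 255) / 256, 256>>>(d_in, d_out, n);
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);

    double gb = 2.0 * n * sizeof(float) / 1e9;   // bytes moved: one read + one write per element
    printf("%.3f ms, %.1f GB/s effective bandwidth\n", ms, gb / (ms / 1e3));

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return ms;
}
```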

### Module 6: Fundamental Parallel Algorithms 🚧
**Status**: Planned
**Duration**: 8-10 hours
**Level**: Intermediate-Advanced

**Topics**:
- Convolution and filtering algorithms
- Stencil computations
- Histogram and atomic operations
- Reduction patterns and optimizations
- Prefix sum (scan) algorithms
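
As a preview of the reduction material, here is a sketch of a shared-memory tree reduction that produces one partial sum per block. It assumes a power-of-two block size and leaves the final combine step to the host or a second launch; it is not the module's final, fully optimized version:

```cuda
// One block reduces 2 * blockDim.x elements into a single partial sum in shared memory.
__global__ void reduceSum(const float *in, float *blockSums, int n) {
    extern __shared__ float sdata[];
    unsigned int tid = threadIdx.x;
    unsigned int i = blockIdx.x * blockDim.x * 2 + threadIdx.x;

    float sum = 0.0f;
    if (i < n)              sum += in[i];
    if (i + blockDim.x < n) sum += in[i + blockDim.x];
    sdata[tid] = sum;
    __syncthreads();

    // Tree reduction in shared memory; the stride halves each step.
    for (unsigned int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) blockSums[blockIdx.x] = sdata[0];
}

// Launch: reduceSum<<<blocks, threads, threads * sizeof(float)>>>(d_in, d_blockSums, n);
```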

### Module 7: Advanced Algorithmic Patterns 🚧
**Status**: Planned
**Duration**: 8-10 hours
**Level**: Advanced

**Topics**:
- Merge and sorting algorithms
- Sparse matrix computations
- Graph traversal algorithms
- Dynamic programming on GPU
- Load balancing techniques
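
For a preview of the sparse-matrix topic, here is a sketch of a row-per-thread sparse matrix-vector multiply over the CSR format. The array names are illustrative, and the per-row load imbalance this simple mapping suffers from is exactly what the load-balancing topic addresses:

```cuda
// y = A * x with A in CSR format (rowPtr, colIdx, vals are illustrative names).
// One thread computes one row's dot product.
__global__ void spmvCsr(int numRows, const int *rowPtr, const int *colIdx,
                        const float *vals, const float *x, float *y) {
    int row = blockIdx.x * blockDim.x + threadIdx.x;
    if (row < numRows) {
        float dot = 0.0f;
        for (int j = rowPtr[row]; j < rowPtr[row + 1]; ++j)
            dot += vals[j] * x[colIdx[j]];   // gather from x via the column index
        y[row] = dot;
    }
}
```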

### Module 8: Domain-Specific Applications 🚧
**Status**: Planned
**Duration**: 10-12 hours
**Level**: Advanced

**Topics**:
- Deep learning inference kernels
- Scientific computing applications
- Image and signal processing
- Monte Carlo simulations
- Numerical methods optimization
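
As a preview of the Monte Carlo topic, here is a sketch of a pi estimator using the cuRAND device API. The global hit counter, the per-thread sample count, and the kernel name are illustrative choices:

```cuda
#include <curand_kernel.h>

// Each thread draws 'samplesPerThread' random points and counts how many
// fall inside the unit quarter circle (illustrative sketch).
__global__ void monteCarloPi(unsigned long long *hits, int samplesPerThread,
                             unsigned long long seed) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    curandState state;
    curand_init(seed, tid, 0, &state);   // independent random stream per thread

    unsigned long long local = 0;
    for (int s = 0; s < samplesPerThread; ++s) {
        float x = curand_uniform(&state);
        float y = curand_uniform(&state);
        if (x * x + y * y <= 1.0f) ++local;
    }
    atomicAdd(hits, local);              // simple global accumulation
}

// Host side: pi ~= 4.0 * (*hits) / (totalThreads * samplesPerThread)
```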

### Module 9: Production GPU Programming 🚧
**Status**: Planned
**Duration**: 6-8 hours
**Level**: Expert

**Topics**:
- Cluster computing with MPI
- Dynamic parallelism patterns
- Performance regression testing
- Cross-platform deployment
- Future GPU architectures

## 🛠️ Prerequisites

### Hardware Requirements
- **NVIDIA GPU**: GeForce GTX 1060 or better, or Tesla/Quadro equivalent
- **OR AMD GPU**: RX 580 or better, with ROCm support
- **Memory**: 8GB+ system RAM, 4GB+ GPU memory recommended

### Software Requirements
- **Operating System**: Linux (recommended), Windows 10/11, or macOS
- **CUDA Toolkit**: 11.0+ for NVIDIA GPUs
- **ROCm**: 4.0+ for AMD GPUs
- **Compiler**: GCC 7+, Clang 8+, or MSVC 2019+
- **Build Tools**: Make, CMake (optional)

### Programming Knowledge
- **C/C++**: Intermediate level (pointers, memory management, basic OOP)
- **Command Line**: Basic terminal/shell usage
- **Math**: Linear algebra basics helpful but not required

## 🚀 Quick Start

### Option 1: Docker (Recommended)
Perfect for getting started without installing CUDA/ROCm on your host system.

```bash
git clone https://github.com/yourusername/gpu-programming-101.git
cd gpu-programming-101

# Test your Docker setup
./docker/scripts/test.sh

# Build development container
./docker/scripts/build.sh --all

# Auto-detect GPU and start appropriate container
./docker/scripts/run.sh --auto

# Inside container - test your GPU
/workspace/test-gpu.sh

# Start learning!
cd modules/module1 && cat README.md
```

### Option 2: Native Installation

```bash
git clone https://github.com/yourusername/gpu-programming-101.git
cd gpu-programming-101

# Check system requirements
# For NVIDIA systems
nvidia-smi && nvcc --version

# For AMD systems
rocm-smi && hipcc --version

# Start with Module 1
cd modules/module1
cat README.md # Read module overview
cd examples
make # Build examples
./04_device_info_cuda # Check your GPU
```

### 🐳 Docker Benefits
- **No host setup required**: Complete development environment in containers
- **Multi-platform**: Test both CUDA and HIP code easily
- **Consistent environment**: Same setup across different systems
- **Integrated tools**: Profilers, debuggers, and Jupyter Lab included
- **Easy cleanup**: Remove containers when done

**[📖 Full Docker Guide](docker/README.md)**

### Follow the Learning Path
Each module contains:
- **README.md** - Module overview and learning objectives
- **content.md** - Comprehensive theory and explanations
- **examples/** - Working code examples with build system
- **exercises/** - Additional practice problems (when available)

## 📁 Project Structure

```
gpu-programming-101/
├── README.md                  # This file
├── SUMMARY.md                 # Detailed curriculum overview
├── Makefile                   # Project-wide build system
└── modules/
    ├── module1/               # Heterogeneous Data Parallel Computing
    │   ├── README.md          # Module overview
    │   ├── content.md         # Theory and explanations
    │   └── examples/          # Working code examples
    │       ├── Makefile
    │       ├── README.md
    │       └── *.cu, *.cpp    # Source files
    ├── module2/               # Multidimensional Grids and Data
    │   └── [Same structure as module1]
    ├── module3/               # Compute Architecture and Scheduling
    │   └── [Same structure as module1]
    └── [Additional Modules]
```

## 🎓 Learning Path Recommendations

### For Complete Beginners
1. **Start with Module 1** - Focus on understanding basic concepts
2. **Practice extensively** - Modify examples and experiment
3. **Use debugging tools** - Learn proper error handling
4. **Progress gradually** - Master each concept before moving on

### For Experienced Programmers
1. **Skim Module 1 theory** - Focus on GPU-specific concepts
2. **Run all examples** - Understand performance characteristics
3. **Jump to specific topics** - Use course as reference material
4. **Contribute improvements** - Help expand the course content

### For Researchers/Scientists
1. **Focus on relevant modules** - Skip graphics-specific content
2. **Emphasize performance** - Pay special attention to optimization
3. **Explore libraries** - Learn cuBLAS, cuFFT, Thrust, etc.
4. **Real-world applications** - Adapt examples to your domain

## 🔧 Build System

### Project-wide Build
```bash
# Build all available modules
make all

# Build specific module
make module1

# Clean all builds
make clean

# Run tests
make test
```

### Module-specific Build
```bash
cd modules/module1/examples
make # Build all examples
make vector_add_cuda # Build specific example
make test # Run module tests
```

## 🐛 Troubleshooting

### Common Setup Issues

**"nvcc: command not found"**
```bash
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
```

**"No CUDA-capable device found"**
- Check that `nvidia-smi` shows your GPU
- Verify driver installation
- Ensure GPU is not in exclusive/prohibited mode
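
If you prefer to check programmatically, a small sketch like the following (compile it with `nvcc`; the file name is up to you) reports what the CUDA runtime can actually see:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    // Ask the runtime how many CUDA devices are visible and report any error.
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    printf("CUDA devices visible: %d\n", count);
    for (int d = 0; d < count; ++d) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, d);
        printf("  [%d] %s, compute capability %d.%d\n", d, prop.name, prop.major, prop.minor);
    }
    return 0;
}
```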

**"HIP compilation failed"**
```bash
# For AMD GPUs
export HIP_PLATFORM=amd

# For NVIDIA GPUs with HIP
export HIP_PLATFORM=nvidia
```

### Getting Help
- **Module Issues**: Check module-specific README files
- **Code Problems**: Look at debugging examples in Module 1
- **Performance**: Use profiling tools covered in later modules
- **Community**: Create an issue in the repository for help

## 📖 Additional Resources

### Official Documentation
- [CUDA Programming Guide](https://docs.nvidia.com/cuda/cuda-c-programming-guide/)
- [HIP Programming Guide](https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-GUIDE.html)
- [ROCm Documentation](https://rocmdocs.amd.com/)

### Books
- "CUDA by Example" - Sanders, Kandrot
- "Professional CUDA C Programming" - Cheng, Grossman, McKercher
- "GPU Computing Gems" - NVIDIA Corporation

### Online Resources
- [NVIDIA Developer Zone](https://developer.nvidia.com/)
- [AMD Developer Central](https://developer.amd.com/)
- [GPU Computing Community](https://forums.developer.nvidia.com/)

## 🤝 Contributing

We welcome contributions! Please follow standard open source contribution practices.

### Ways to Contribute
- **Add examples** for existing modules
- **Create new modules** following the established structure
- **Improve documentation** and fix typos
- **Add exercises** and solutions
- **Port examples** between CUDA and HIP
- **Optimize performance** and add benchmarks

## 📝 License

This course is released under an open source license. Feel free to use, modify, and distribute for educational purposes.

## 🏆 Acknowledgments

- Thanks to the CUDA and ROCm development communities
- Inspired by hands-on learning approaches in parallel computing education
- Built with contributions from GPU programming educators and practitioners

---

**Happy GPU Programming!** 🚀⚡️

*Last Updated: September 2025*
