This repository contains my assignments from CS 6501: CPU/GPU Memory & Near-Data Processing at UVA (Spring 2025), taught by Prof. Kevin Skadron. The coursework covers CPU/GPU memory architecture, cache design, DRAM simulation, GPU programming, and near-data processing, including processing-in-memory (PIM). Each assignment uses an established profiling or simulation tool to analyze, simulate, or optimize memory and compute behavior.
- Assignment PDF: HW1 Assignment
- Report: HW1 Report
- Analysis of memory and compute bottlenecks across multiple matrix/vector kernels using Intel Advisor's Roofline model. The assignment involved:
- Profiling 10 distinct matrix/vector implementations with varying optimization levels
- Generating Roofline plots to visualize performance bottlenecks
- Measuring INTOPS/sec and arithmetic intensity across different implementations
- Identifying the ridge point where code transitions from memory-bound to compute-bound
Tools: Intel Advisor, C++, Roofline visualization
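The ridge point mentioned above follows from the standard Roofline bound, where attainable performance is the lower of the compute roof and the memory roof. The sketch below is a minimal illustration of that relationship, not the code used in the report. The function names and the machine peaks are placeholder assumptions rather than measured values.

```python
def ridge_point(peak_ops_per_sec: float, peak_bytes_per_sec: float) -> float:
    """Arithmetic intensity (ops/byte) at which the memory and compute roofs meet."""
    return peak_ops_per_sec / peak_bytes_per_sec


def attainable_performance(intensity: float,
                           peak_ops_per_sec: float,
                           peak_bytes_per_sec: float) -> float:
    """Roofline bound: the lower of the compute roof and the memory roof."""
    return min(peak_ops_per_sec, intensity * peak_bytes_per_sec)


def classify(intensity: float,
             peak_ops_per_sec: float,
             peak_bytes_per_sec: float) -> str:
    """Label a kernel by which side of the ridge point its intensity falls on."""
    ridge = ridge_point(peak_ops_per_sec, peak_bytes_per_sec)
    return "memory-bound" if intensity < ridge else "compute-bound"


# Placeholder machine peaks (assumptions, not values from the HW1 report).
PEAK_OPS = 100e9  # integer operations per second
PEAK_BW = 25e9    # bytes per second

if __name__ == "__main__":
    for ai in (0.5, 4.0, 16.0):  # hypothetical kernel intensities in ops/byte
        print(ai, classify(ai, PEAK_OPS, PEAK_BW),
              attainable_performance(ai, PEAK_OPS, PEAK_BW))
```

A kernel whose arithmetic intensity falls below `ridge_point(...)` is limited by memory bandwidth, so raising its intensity (for example through blocking or data reuse) moves it toward the compute roof.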
- Assignment PDF: HW2 Assignment
- Report: HW2 Report
- Systematic exploration of cache design tradeoffs using the CACTI cache simulator. Key aspects:
- Parameter sweeps across cache sizes (16 KB to 8 MB) and associativity (1-way to 16-way)
- Analysis of access time, area, energy consumption, and data efficiency
- Examination of technology node impact (65nm vs. 32nm) on cache performance
- Determination of optimal configurations for both L1 and LLC caches
Tools: CACTI 7.0, Bash scripting, data visualization
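As an illustration of the sweep described above, the sketch below enumerates one possible parameter grid. It assumes power-of-two steps for both cache size and associativity and a full cross product with the two technology nodes. The actual step sizes and the scripts used in the assignment may differ, and the dictionary keys are hypothetical names, not CACTI configuration options.

```python
from itertools import product

KB = 1024


def power_of_two_range(lo: int, hi: int) -> list:
    """Return lo, 2*lo, 4*lo, ... up to and including hi."""
    values = []
    v = lo
    while v <= hi:
        values.append(v)
        v *= 2
    return values


def sweep_points() -> list:
    """Enumerate every (size, associativity, technology node) combination."""
    sizes = power_of_two_range(16 * KB, 8 * 1024 * KB)  # 16 KB to 8 MB
    ways = power_of_two_range(1, 16)                    # 1-way to 16-way
    nodes_nm = [65, 32]                                 # technology nodes
    return [
        {"size_bytes": s, "associativity": w, "technology_nm": n}
        for s, w, n in product(sizes, ways, nodes_nm)
    ]


if __name__ == "__main__":
    points = sweep_points()
    print(len(points), "configurations")
    print(points[0])
```

Each dictionary could then be written into a simulator configuration file by a driver script, which keeps the grid definition separate from the tool-specific file format.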
- Assignment PDF: HW3 Assignment
- Report: HW3 Report
- Comprehensive simulation of various DRAM technologies under different memory access patterns:
- Comparison of DDR4, LPDDR4, GDDR6, and HBM2 under random, streaming, and mixed patterns
- Analysis of bandwidth scaling, energy consumption, and latency characteristics
- Detailed examination of command-level activity distribution (ACT, PRE, RD/WR)
- DRAM selection recommendations for power-constrained vs. performance-driven scenarios
Tools: DRAMsim3, Python for data processing, JSON-to-CSV conversion
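The JSON-to-CSV step can be sketched as follows. This is a minimal example that assumes the simulator output is a JSON object mapping each channel ID to a dictionary of statistics. The function name and the stat names in the sample are placeholders, not confirmed DRAMsim3 field names, and nested values such as histograms are simply skipped.

```python
import csv
import io
import json

SCALAR_TYPES = (int, float, str)


def stats_json_to_csv(json_text: str) -> str:
    """Flatten {channel: {stat: value}} JSON into CSV text, one row per channel."""
    per_channel = json.loads(json_text)

    # Union of all scalar stat names, sorted for a stable column order.
    columns = sorted({
        key
        for stats in per_channel.values()
        for key, value in stats.items()
        if isinstance(value, SCALAR_TYPES)
    })

    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=["channel"] + columns,
                            lineterminator="\n")
    writer.writeheader()
    # Rows are ordered by channel key compared as a string.
    for channel, stats in sorted(per_channel.items()):
        row = {"channel": channel}
        for key in columns:
            # Missing or nested stats are left as empty cells.
            if key in stats and isinstance(stats[key], SCALAR_TYPES):
                row[key] = stats[key]
        writer.writerow(row)
    return out.getvalue()


if __name__ == "__main__":
    # Made-up sample input for illustration only.
    sample = json.dumps({
        "0": {"num_reads": 10, "num_writes": 4, "histogram": {"1": 2}},
        "1": {"num_reads": 7, "num_writes": 5},
    })
    print(stats_json_to_csv(sample))
```

Writing one row per channel keeps the CSV easy to aggregate later, for example by summing command counts across channels for each DRAM technology.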
- Assignment PDF: HW4 Assignment
- Report: HW4 Report
- Implementation and optimization of parallel algorithms using NVIDIA CUDA:
- Development of matrix addition, matrix multiplication, and parallel reduction kernels
- Implementation of shared-memory optimizations and cooperative thread strategies
- Performance evaluation using CUDA events and nvprof profiling
- Comparative analysis between optimized GPU implementations and CPU baselines
Tools: NVIDIA CUDA Toolkit, nvcc compiler, nvprof, CUDA events timing
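The access pattern behind a shared-memory parallel reduction can be illustrated with a sequential emulation. The sketch below is not the CUDA kernel from the assignment. It is a Python stand-in that assumes a single thread block using a halving-stride scheme, with a list playing the role of shared memory and the inner loop standing in for threads that would run concurrently.

```python
def tree_reduce_sum(values):
    """Sum `values` using the halving-stride pattern of a block-level reduction."""
    if not values:
        return 0

    # Pad to a power of two with the additive identity, as a kernel might do
    # for threads whose index falls past the end of the input.
    n = 1
    while n < len(values):
        n *= 2
    shared = list(values) + [0] * (n - len(values))  # stands in for shared memory

    stride = n // 2
    while stride > 0:
        # On a GPU, threads 0..stride-1 would perform this step concurrently,
        # followed by a barrier such as __syncthreads() before the next step.
        for tid in range(stride):
            shared[tid] += shared[tid + stride]
        stride //= 2

    return shared[0]  # thread 0 holds the block's result


if __name__ == "__main__":
    data = list(range(1, 11))
    print(tree_reduce_sum(data), sum(data))
```

Within each step, the emulated threads read only from indices at or above `stride` and write only below it, which is why the real kernel needs a barrier only between steps and not inside one.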
- Assignment PDF: HW5 Assignment
- Report: HW5 Report
- Exploration of near-data processing using UVA's PIMeval-PIMbench simulator:
- Implementation of RMS Norm and Layer Norm algorithms for the PIM architecture
- Performance analysis across varying HBM configurations (1-32 computing banks)
- Energy efficiency analysis of PIM vs. traditional CPU implementations
- Evaluation of parallelism scalability and resource utilization in a PIM context
Tools: PIMeval-PIMbench, C++ for kernel implementation, OpenMP, HBM modeling
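For reference, the two normalizations can be written as plain CPU functions. This is a minimal sketch of the standard definitions, not the PIM kernels from the assignment. It omits the learnable gain and bias terms, and the `eps` default is an assumed value.

```python
import math


def rms_norm(x, eps=1e-6):
    """RMS Norm: divide each element by the root mean square of the vector."""
    mean_sq = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_sq + eps)
    return [v * scale for v in x]


def layer_norm(x, eps=1e-6):
    """Layer Norm: subtract the mean, then divide by the standard deviation."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)  # population variance
    scale = 1.0 / math.sqrt(var + eps)
    return [(v - mean) * scale for v in x]


if __name__ == "__main__":
    vec = [1.0, 2.0, 3.0, 4.0]
    print(rms_norm(vec))
    print(layer_norm(vec))
```

Both functions reduce the vector to a scalar and then scale every element. RMS Norm needs one reduction (the sum of squares), while Layer Norm as written here needs two (the mean, then the variance).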
- Intel Advisor: Roofline modeling and performance characterization
- CACTI 7.0: Cache architecture simulation and power/area analysis
- DRAMsim3: DRAM timing and energy simulation
- NVIDIA CUDA Toolkit: GPU kernel development and profiling
- PIMeval-PIMbench: Near-memory processing simulation framework
- Supporting tools: Python for data analysis, visualization libraries, shell scripting
Each assignment folder contains:
- Source code and implementations
- Configuration files and execution scripts
- Results and analysis visualizations
- Detailed technical reports
This repository is licensed under the MIT License.