CPU/GPU Memory & Near-Data Processing Assignments

This repository contains my assignments from CS 6501 (CPU/GPU Memory & Near-Data Processing) at UVA, Spring '25, taught by Prof. Kevin Skadron. The coursework covers CPU/GPU memory architecture, cache design, DRAM simulation, GPU programming, and near-data processing, also known as processing-in-memory (PIM). Each assignment uses established profiling or simulation tools to analyze, simulate, and optimize memory and processing behavior.

📁 Assignments Overview

HW1: Roofline Model Analysis

Assignment PDF: HW1 Assignment
Report: HW1 Report
Analysis of memory and compute bottlenecks across multiple matrix/vector kernels using Intel Advisor's Roofline model. The assignment involved:
- Profiling 10 distinct matrix/vector implementations with varying optimization levels
- Generating Roofline plots to visualize performance bottlenecks
- Measuring INTOPS/sec and arithmetic intensity across different implementations
- Identifying the ridge point where code transitions from memory-bound to compute-bound
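
As a rough illustration of the ridge point mentioned above, the Roofline bound can be written as the minimum of the compute roof and the memory roof. This is a minimal sketch, not the code from the assignment: the function names and the machine peaks (`PEAK_OPS`, `PEAK_BW`) are made-up values chosen for readability, not measurements from the report.

```python
def attainable_performance(arithmetic_intensity, peak_ops_per_sec, peak_bytes_per_sec):
    """Roofline bound: the lower of the compute roof and the memory roof."""
    memory_roof = arithmetic_intensity * peak_bytes_per_sec
    return min(peak_ops_per_sec, memory_roof)


def ridge_point(peak_ops_per_sec, peak_bytes_per_sec):
    """Arithmetic intensity (ops/byte) at which the two roofs meet."""
    return peak_ops_per_sec / peak_bytes_per_sec


def classify(arithmetic_intensity, peak_ops_per_sec, peak_bytes_per_sec):
    """Label a kernel by which side of the ridge point it falls on."""
    if arithmetic_intensity < ridge_point(peak_ops_per_sec, peak_bytes_per_sec):
        return "memory-bound"
    return "compute-bound"


# Hypothetical machine peaks (assumptions, not values from the report).
PEAK_OPS = 100e9  # integer operations per second
PEAK_BW = 25e9    # bytes per second

if __name__ == "__main__":
    for ai in (0.5, 4.0, 16.0):
        bound = attainable_performance(ai, PEAK_OPS, PEAK_BW)
        print(ai, classify(ai, PEAK_OPS, PEAK_BW), bound)
```

A kernel whose arithmetic intensity sits left of the ridge point is limited by memory bandwidth, so raising its intensity (for example through blocking) moves it up the sloped part of the roof.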

Tools: Intel Advisor, C++, Roofline visualization

HW2: Cache Design with CACTI

Assignment PDF: HW2 Assignment
Report: HW2 Report
Systematic exploration of cache design tradeoffs using the CACTI cache modeling tool. Key aspects:
- Parameter sweeps across cache sizes (16KB to 8MB) and associativity (1-way to 16-way)
- Analysis of access time, area, energy consumption, and data efficiency
- Examination of technology node impact (65nm vs. 32nm) on cache performance
- Determination of optimal configurations for both L1 and LLC caches
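
The sweep space described above can be enumerated with a few lines of Python. This is only a sketch under stated assumptions: it assumes power-of-two steps between the endpoints and a full cross product with both technology nodes, neither of which is stated here, and it deliberately stops short of invoking CACTI, since the actual configuration files and scripts live in the assignment folder.

```python
from itertools import product


def powers_of_two(start, stop):
    """Return start, 2*start, 4*start, ... up to and including stop."""
    values = []
    value = start
    while value <= stop:
        values.append(value)
        value *= 2
    return values


# Endpoints come from the list above; the power-of-two stepping is an assumption.
CACHE_SIZES_BYTES = powers_of_two(16 * 1024, 8 * 1024 * 1024)  # 16KB to 8MB
ASSOCIATIVITIES = powers_of_two(1, 16)                         # 1-way to 16-way
TECH_NODES_NM = [65, 32]


def sweep_points():
    """Cross product of all parameters, one dict per configuration."""
    return [
        {"size_bytes": size, "associativity": ways, "tech_nm": node}
        for size, ways, node in product(CACHE_SIZES_BYTES, ASSOCIATIVITIES, TECH_NODES_NM)
    ]


if __name__ == "__main__":
    points = sweep_points()
    print(len(points), "configurations; first:", points[0])
```

Each dictionary would then be turned into one simulator run, with access time, area, and energy collected per configuration.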

Tools: CACTI 7.0, Bash scripting, data visualization

HW3: DRAM Simulation with DRAMsim3

Assignment PDF: HW3 Assignment
Report: HW3 Report
Comprehensive simulation of various DRAM technologies under different memory access patterns:
- Comparison of DDR4, LPDDR4, GDDR6, and HBM2 under random, streaming, and mixed patterns
- Analysis of bandwidth scaling, energy consumption, and latency characteristics
- Detailed examination of command-level activity distribution (ACT, PRE, RD/WR)
- DRAM selection recommendations for power-constrained vs. performance-driven scenarios

Tools: DRAMsim3, Python for data processing, JSON-to-CSV conversion
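
The JSON-to-CSV step listed above might look roughly like the following. This is a hedged sketch rather than the script used in the assignment: it assumes the statistics arrive as a JSON object keyed by channel, and the field names in `SAMPLE` are illustrative placeholders, so check them against the simulator's real output before reuse.

```python
import csv
import io
import json


def stats_json_to_csv(json_text, fields):
    """Flatten {channel: {stat: value}} JSON into CSV text, one row per channel."""
    stats = json.loads(json_text)
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["channel"] + list(fields))
    for channel in sorted(stats):
        row = stats[channel]
        # Missing statistics become empty cells instead of raising KeyError.
        writer.writerow([channel] + [row.get(name, "") for name in fields])
    return out.getvalue()


# Illustrative input; the key names are assumptions, not verified simulator output.
SAMPLE = json.dumps({
    "0": {"num_act_cmds": 120, "num_pre_cmds": 118, "num_read_cmds": 900},
    "1": {"num_act_cmds": 95, "num_pre_cmds": 93, "num_read_cmds": 870},
})

if __name__ == "__main__":
    print(stats_json_to_csv(SAMPLE, ["num_act_cmds", "num_pre_cmds", "num_read_cmds"]))
```

Keeping the field list as a parameter makes it easy to export only the command counts (ACT, PRE, RD/WR) needed for a given plot.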

HW4: GPU Programming with CUDA

Assignment PDF: HW4 Assignment
Report: HW4 Report
Implementation and optimization of parallel algorithms using NVIDIA CUDA:
- Development of matrix addition, matrix multiplication, and parallel reduction kernels
- Implementation of shared-memory optimizations and cooperative thread strategies
- Performance evaluation using CUDA events and nvprof profiling
- Comparative analysis between optimized GPU implementations and CPU baselines
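
To make the parallel reduction pattern concrete, here is a CPU-side Python model of a tree reduction of the kind commonly used within a CUDA thread block. It is not the kernel from the assignment: the function name is hypothetical, the inner loop stands in for threads running concurrently, and the comment marks where a block-level barrier such as `__syncthreads()` would normally sit.

```python
def tree_reduce_sum(values):
    """Sum values with a halving-stride tree, mimicking a block-level reduction."""
    data = list(values)
    # Pad to the next power of two with zeros, the identity for addition.
    size = 1
    while size < len(data):
        size *= 2
    data += [0] * (size - len(data))

    stride = size // 2
    while stride > 0:
        # On a GPU, each tid below would be a separate thread working in parallel.
        for tid in range(stride):
            data[tid] += data[tid + stride]
        # A barrier would go here so every partial sum is visible before the next step.
        stride //= 2
    return data[0]


if __name__ == "__main__":
    print(tree_reduce_sum(range(1, 11)))
```

Each pass halves the number of active "threads", so n elements are reduced in about log2(n) steps instead of n sequential additions.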

Tools: NVIDIA CUDA Toolkit, nvcc compiler, nvprof, CUDA events timing

HW5: PIM Programming with PIMeval-PIMbench

Assignment PDF: HW5 Assignment
Report: HW5 Report
Exploration of near-data processing using UVA's PIMeval-PIMbench simulator:
- Implementation of RMS Norm and Layer Norm algorithms for the PIM architecture
- Performance analysis across varying HBM configurations (1-32 computing banks)
- Energy efficiency analysis of PIM vs. traditional CPU implementations
- Evaluation of parallel scalability and resource utilization in a PIM context
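
For reference, the two normalizations named above can be written in plain Python as follows. This is a minimal scalar sketch of the standard definitions, not the PIM kernels from the assignment: it omits the learned gain and bias terms, and the `eps` default is an assumed value.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale x by the reciprocal of its root mean square (no mean subtraction)."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale for v in x]


def layer_norm(x, eps=1e-6):
    """Subtract the mean, then scale by the reciprocal standard deviation."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    scale = 1.0 / math.sqrt(variance + eps)
    return [(v - mean) * scale for v in x]


if __name__ == "__main__":
    sample = [1.0, 2.0, 3.0, 4.0]
    print(rms_norm(sample))
    print(layer_norm(sample))
```

Both functions boil down to a reduction (sum or sum of squares) followed by an element-wise scale, which is why they map naturally onto bank-parallel execution.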

Tools: PIMeval-PIMbench, C++ for kernel implementation, OpenMP, HBM modeling

🧰 Technical Environment

- Intel Advisor: Roofline modeling and performance characterization
- CACTI 7.0: Cache architecture modeling and power/area analysis
- DRAMsim3: DRAM timing and energy simulation
- NVIDIA CUDA Toolkit: GPU kernel development and profiling
- PIMeval-PIMbench: Processing-in-memory simulation framework
- Supporting tools: Python for data analysis, visualization libraries, shell scripting

📌 Repository Structure

Each assignment folder contains:

- Source code and implementations
- Configuration files and execution scripts
- Results and analysis visualizations
- Detailed technical reports

🔍 License

This repository is licensed under the MIT License.

huygnguyen04/cpu-gpu-ndp-work
