Exploring CPU/GPU memory hierarchies, cache modeling, DRAM simulation, GPU programming with CUDA, and near-data processing using PIMeval-PIMbench - CS 6501 CPU/GPU Memory Systems @ UVA Spring '25

huygnguyen04/cpu-gpu-ndp-work

CPU/GPU Memory & Near-Data Processing Assignments

This repository contains my assignments for CS 6501 (CPU/GPU Memory & Near-Data Processing), taught by Prof. Kevin Skadron at UVA in Spring '25. The coursework covers CPU/GPU memory architecture, cache design, DRAM simulation, GPU programming, and near-data processing (PIM). Each assignment uses an established profiling or simulation tool to analyze, simulate, or optimize memory and compute behavior.

📁 Assignments Overview

  • Assignment PDF: HW1 Assignment
  • Report: HW1 Report
  • Analysis of memory and compute bottlenecks across multiple matrix/vector kernels using Intel Advisor's Roofline model. The assignment involved:
    • Profiling 10 distinct matrix/vector implementations with varying optimization levels
    • Generating Roofline plots to visualize performance bottlenecks
    • Measuring INTOPS/sec and arithmetic intensity across different implementations
    • Identifying the ridge point, the arithmetic intensity at which code transitions from memory-bound to compute-bound

Tools: Intel Advisor, C++, Roofline visualization
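
As a rough illustration of the Roofline reasoning above, the sketch below computes a ridge point and the attainable throughput at a given arithmetic intensity. The function names and the peak values in the usage note are hypothetical placeholders, not measurements from the report or output from Intel Advisor.

```python
def ridge_point(peak_ops_per_sec, peak_bytes_per_sec):
    """Arithmetic intensity (ops/byte) where the memory roof meets the compute roof."""
    return peak_ops_per_sec / peak_bytes_per_sec


def attainable_ops_per_sec(arithmetic_intensity, peak_ops_per_sec, peak_bytes_per_sec):
    """Roofline bound: the lower of the compute roof and the bandwidth roof."""
    return min(peak_ops_per_sec, arithmetic_intensity * peak_bytes_per_sec)


def classify(arithmetic_intensity, peak_ops_per_sec, peak_bytes_per_sec):
    """Label a kernel by which roof limits it at this arithmetic intensity."""
    if arithmetic_intensity < ridge_point(peak_ops_per_sec, peak_bytes_per_sec):
        return "memory-bound"
    return "compute-bound"
```

For example, with an assumed peak of 100e9 INTOPS/sec and 25e9 bytes/sec, `ridge_point(100e9, 25e9)` is 4.0 ops/byte, so a kernel at 1 op/byte would be classified as memory-bound.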


  • Assignment PDF: HW2 Assignment
  • Report: HW2 Report
  • Systematic exploration of cache design tradeoffs using the CACTI cache simulator. Key aspects:
    • Parameter sweeps across cache sizes (16 KB to 8 MB) and associativity (1-way to 16-way)
    • Analysis of access time, area, energy consumption, and data efficiency
    • Examination of technology node impact (65nm vs. 32nm) on cache performance
    • Determination of optimal configurations for both L1 and LLC caches

Tools: CACTI 7.0, Bash scripting, data visualization
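
The sweep described above can be sketched as a simple grid enumeration. This is a minimal sketch that assumes power-of-two steps between the stated endpoints, which the README does not specify. The helper name `sweep_grid` is hypothetical, and the sketch only lists configurations rather than invoking CACTI.

```python
from itertools import product


def powers_of_two(low, high):
    """Return low, 2*low, 4*low, ... up to and including high."""
    values = []
    value = low
    while value <= high:
        values.append(value)
        value *= 2
    return values


def sweep_grid(min_kb=16, max_kb=8192, max_ways=16):
    """Enumerate (cache size in bytes, associativity) pairs for a sweep.

    Assumption: both parameters step in powers of two.
    """
    sizes_bytes = [kb * 1024 for kb in powers_of_two(min_kb, max_kb)]
    ways = powers_of_two(1, max_ways)
    return list(product(sizes_bytes, ways))
```

Under these assumptions the grid has 10 sizes and 5 associativities, so each technology node would need 50 simulator runs.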


  • Assignment PDF: HW3 Assignment
  • Report: HW3 Report
  • Comprehensive simulation of various DRAM technologies under different memory access patterns:
    • Comparison of DDR4, LPDDR4, GDDR6, and HBM2 under random, streaming, and mixed patterns
    • Analysis of bandwidth scaling, energy consumption, and latency characteristics
    • Detailed examination of command-level activity distribution (ACT, PRE, RD/WR)
    • DRAM selection recommendations for power-constrained vs. performance-driven scenarios

Tools: DRAMsim3, Python for data processing, JSON-to-CSV conversion
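
The JSON-to-CSV step listed in the tools above might look like the sketch below. The input layout (one object of numeric statistics per channel) and the key names in the usage note are assumptions for illustration and may not match the simulator's actual output or the scripts in this repository.

```python
import csv
import io
import json


def stats_json_to_csv(json_text):
    """Flatten {"<channel>": {"<stat>": number, ...}, ...} into CSV text.

    Assumed layout: one JSON object per channel, keyed by channel id.
    Non-numeric entries are skipped.
    """
    data = json.loads(json_text)
    columns = sorted(
        {
            key
            for stats in data.values()
            for key, value in stats.items()
            if isinstance(value, (int, float)) and not isinstance(value, bool)
        }
    )
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["channel"] + columns)
    for channel in sorted(data):  # note: sorts channel ids as strings
        writer.writerow([channel] + [data[channel].get(col, "") for col in columns])
    return buffer.getvalue()
```

Calling it on `'{"0": {"reads": 10, "avg_latency": 42.5}}'` (hypothetical keys) yields a header row `channel,avg_latency,reads` followed by one row per channel.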


  • Assignment PDF: HW4 Assignment
  • Report: HW4 Report
  • Implementation and optimization of parallel algorithms using NVIDIA CUDA:
    • Development of matrix addition, matrix multiplication, and parallel reduction kernels
    • Implementation of shared memory optimizations and cooperative-thread strategies
    • Performance evaluation using CUDA events and nvprof profiling
    • Comparative analysis between optimized GPU implementations and CPU baselines

Tools: NVIDIA CUDA Toolkit, nvcc compiler, nvprof, CUDA events timing
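
To make the parallel-reduction pattern above concrete, here is a sequential Python model of a block-level tree reduction with a halving stride. It is a sketch of the general shared-memory technique, not the kernels in this repository, and the function name and block size are hypothetical. In a real CUDA kernel each inner-loop iteration would be one thread, with a `__syncthreads()` barrier between strides.

```python
def block_tree_reduce(values, block_size=8):
    """Return one partial sum per block, computed with a tree reduction.

    block_size must be a power of two, as in the usual CUDA formulation.
    """
    if block_size < 1 or block_size & (block_size - 1):
        raise ValueError("block_size must be a power of two")
    partial_sums = []
    for start in range(0, len(values), block_size):
        # Models the per-block shared-memory buffer, zero-padded at the tail.
        shared = list(values[start:start + block_size])
        shared += [0] * (block_size - len(shared))
        stride = block_size // 2
        while stride > 0:
            # Each tid models one active thread; reads come from the upper
            # half, which no thread writes in this step.
            for tid in range(stride):
                shared[tid] += shared[tid + stride]
            stride //= 2  # a barrier would sit here on the GPU
        partial_sums.append(shared[0])
    return partial_sums
```

The per-block partial sums still need a final pass (a second kernel launch or a host-side sum) to produce the total.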


  • Assignment PDF: HW5 Assignment
  • Report: HW5 Report
  • Exploration of near-data processing using UVA's PIMeval-PIMbench simulator:
    • Implementation of RMS Norm and Layer Norm algorithms for the PIM architecture
    • Performance analysis across varying HBM configurations (1-32 computing banks)
    • Energy efficiency analysis of PIM vs. traditional CPU implementations
    • Evaluation of parallelism scalability and resource utilization in PIM context

Tools: PIMeval-PIMbench, C++ for kernel implementation, OpenMP, HBM modeling
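
For reference, the two normalizations named above can be written as the following plain-Python sketch. It uses the standard definitions and omits the learnable gain and bias terms. The `eps` default is an assumption, and none of this reflects how the PIM kernels in this repository are structured.

```python
import math


def rms_norm(x, eps=1e-6):
    """Scale x by the reciprocal of its root mean square."""
    mean_square = sum(v * v for v in x) / len(x)
    scale = 1.0 / math.sqrt(mean_square + eps)
    return [v * scale for v in x]


def layer_norm(x, eps=1e-6):
    """Subtract the mean, then scale by the reciprocal standard deviation."""
    mean = sum(x) / len(x)
    variance = sum((v - mean) ** 2 for v in x) / len(x)
    scale = 1.0 / math.sqrt(variance + eps)
    return [(v - mean) * scale for v in x]
```

RMS Norm needs a single reduction (the sum of squares), while Layer Norm needs the mean and then the variance.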


🧰 Technical Environment

  • Intel Advisor: Roofline modeling and performance characterization
  • CACTI 7.0: Cache architecture simulation and power/area analysis
  • DRAMsim3: DRAM timing and energy simulation
  • NVIDIA CUDA Toolkit: GPU kernel development and profiling
  • PIMeval-PIMbench: Near-data processing (PIM) simulation framework
  • Supporting tools: Python for data analysis, visualization libraries, shell scripting

📌 Repository Structure

Each assignment folder contains:

  • Source code and implementations
  • Configuration files and execution scripts
  • Results and analysis visualizations
  • Detailed technical reports

🔍 License

This repository is licensed under the MIT License.
