nsight

The MNIST classification problem is a fundamental machine learning task that involves recognizing handwritten digits (0-9) from a dataset of 70,000 grayscale images (28x28 pixels each). It serves as a benchmark for evaluating machine learning models, particularly neural networks.

benchmarking deep-learning parallel-computing cuda mnist neural-networks high-performance-computing gpu-acceleration profiling shared-memory openacc performance-optimization c-cpp nsight tensor-cores cuda-streams pinned-memory

Updated Sep 12, 2025
Cuda

itm-unipi / Parallelized-Nearest-Neighbor-Upscaler

University Project for "Computer Architecture" course (MSc Computer Engineering @ University of Pisa). Implementation of a Parallelized Nearest Neighbor Upscaler using CUDA.

gpu nvidia nvidia-cuda nvidia-gpu nsight image-upscaling parallelized nearest-neighbor-algorithm nsight-compute

Updated Dec 29, 2023
C

yasser1-0 / FP16-vs-FP32-A-GPU-Lab-in-Frames

🎬 Explore GPU training efficiency with FP32 vs FP16 in this modular lab, utilizing Tensor Core acceleration for deep learning insights.

performance-engineering deep-learning reproducible-research cuda pytorch fp16 cupy mixed-precision nsight gpu-benchmark nvtx fp32 tensor-core

Updated Sep 6, 2025
Python

K-Wu / HET_nsight_utils

cuda nvidia trace gspread profiling ncu nsight nsys

Updated Aug 12, 2024
Python

Dartayous / FP16-vs-FP32-A-GPU-Lab-in-Frames

A reproducible GPU benchmarking lab that compares FP16 vs FP32 training on MNIST using PyTorch, CuPy, and Nsight profiling tools. This project blends performance engineering with cinematic storytelling—featuring NVTX-tagged training loops, fused CuPy kernels, and a profiler-driven README that narrates the GPU’s inner workings frame by frame.

performance-engineering deep-learning reproducible-research cuda pytorch fp16 cupy mixed-precision nsight gpu-benchmark nvtx fp32 tensor-core

Updated Sep 5, 2025
Python

nsight

Here are 14 public repositories matching this topic...

BrainTwister / docker-devel-env

sharcnet / vscode-hpc

mnicely / computeWorks_examples

Tyler-Hilbert / CUDA-LinearRegression

HROlive / Fundamentals-of-Accelerated-Computing-with-CUDA-C-Cpp

kayush2O6 / nsight-for-remote-gpu-server

salehjg / batch-matmul-cuda

Kulasus / APPS-2.0

Juanx65 / yolov8test

Umer-Farooq-CS / MNIST-Classification

itm-unipi / Parallelized-Nearest-Neighbor-Upscaler

yasser1-0 / FP16-vs-FP32-A-GPU-Lab-in-Frames

K-Wu / HET_nsight_utils

Dartayous / FP16-vs-FP32-A-GPU-Lab-in-Frames
