PrimeProfiler: Cross-Language Prime Algorithm Performance Analysis

matt wolffe, 2025

A comprehensive performance analysis framework comparing identical prime number algorithms across multiple programming languages using Intel PIN binary instrumentation.

Key Findings

Algorithm Performance Validation

Algorithmic optimizations work as expected: 6k±1 pattern > √n optimization > brute force
Hand-optimized algorithms consistently outperform brute force, even with aggressive compiler optimizations
Original hypothesis confirmed: compiler optimizations cannot overcome poor algorithmic choices

Cross-Language Compiler Analysis

Fortran's gfortran shows superior optimization of brute force algorithms (9.08x better than C)
Fortran achieves 92.3% instruction reduction in naive algorithms (16M → 1.2M instructions)
C's GCC excels at optimizing mathematical functions like sqrt() operations
Different compilers have fundamentally different optimization strengths

Implemented Languages

Fully Analyzed

C (GCC): Complete performance profiling across all optimization levels
Fortran 2008 (gfortran): Complete performance profiling with optimization analysis

Implemented but Not Profiled

Rust (rustc/LLVM): Algorithms implemented and verified correct, but Intel PIN binary instrumentation incompatible with LLVM optimization strategy

Algorithms Implemented

Three prime checking algorithms of increasing algorithmic sophistication:

naive_prime: Pure brute force trial division (2 to n-1) - slowest approach
naive_prime_squares: Square root optimization (2 to √n) - significantly faster
less_naive_prime: 6k±1 pattern optimization - fastest algorithmic approach

Research Question: Can aggressive compiler optimizations make brute force competitive with hand-optimized algorithms?

Each algorithm implemented identically across all languages with 27 comprehensive test cases.

Performance Results

Algorithm Performance Hierarchy (as expected)

less_naive_prime (6k±1): ~250K-650K cycles/call - fastest
naive_prime_squares (√n): ~300K-1.4M cycles/call - middle
naive_prime (brute force): ~7M-71M cycles/call - slowest

Cross-Language Performance Leaders

Algorithm	C vs Fortran Winner	Performance Advantage	Best Configuration
`less_naive_prime`	Fortran	2.17x faster	Fortran -O3
`naive_prime_squares`	C	2.78x faster	C -O0
`naive_prime`	Fortran	9.08x faster	Fortran -O3

Key Insight: While Fortran dramatically optimizes brute force better than C, the 6k±1 hand-optimized algorithm still outperforms even the best compiler-optimized brute force by 30x.

Project Structure

PrimeProfiler/
├── src/          # C implementations
├── fortran/      # Fortran 2008 implementations  
├── rust/         # Rust implementations (complete but not profiled)
├── pin-tools/    # Intel PIN instrumentation tools
├── scripts/      # Profiling automation scripts
├── makefiles/    # Build configurations for all languages
├── results/      # Performance analysis results
└── C_vs_Fortran_Performance_Analysis.md  # Detailed analysis report

Build System

Supported Optimization Levels

C: -O0, -O3, -Ofast
Fortran: -O0, -O3, -Ofast
Rust: dev, release, fast (builds successfully, profiling limited)

Quick Start

# Build all C variants
make -f makefiles/Makefile.opt0
make -f makefiles/Makefile.opt3
make -f makefiles/Makefile.optfast

# Build all Fortran variants
make -f makefiles/Makefile.fortran-opt0
make -f makefiles/Makefile.fortran-opt3
make -f makefiles/Makefile.fortran-optfast

# Build Rust variants (for testing)
make -f makefiles/Makefile.rust-dev
make -f makefiles/Makefile.rust-release
make -f makefiles/Makefile.rust-fast

Profiling Framework

Intel PIN Binary Instrumentation

Precise cycle counting: Hardware-level performance measurement
Function-level granularity: Exact instruction and memory access counting
Language-aware symbol resolution: Handles Fortran name mangling
Address range instrumentation: Captures exact function boundaries

Automated Analysis Scripts

scripts/profile_single_optimization.sh: Individual optimization level profiling
scripts/profile_all_optimizations.sh: Complete C optimization analysis
scripts/profile_fortran_optimization.sh: Fortran-specific profiling with symbol detection
scripts/profile_c_vs_fortran.sh: Cross-language comparison framework

Test Suite

Each algorithm tested against 27 comprehensive test cases:

15 prime numbers: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47
12 composite numbers: 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20

Identical test cases ensure fair cross-language comparison.

Technical Requirements

Intel PIN Framework: Binary instrumentation (tested with PIN 3.30)
GCC: C compilation and optimization
gfortran: Fortran 2008 compilation
rustc/cargo: Rust compilation (optional, for algorithm verification)
Linux x86_64: Development and testing platform

Key Technical Insights

Compiler Optimization Strategies

gfortran: Aggressive loop transformation and memory optimization
GCC: Conservative, predictable optimization maintaining structure
LLVM/rustc: Optimization so aggressive it breaks binary instrumentation tools

Performance Analysis Methodology Limitations

Traditional binary instrumentation works excellently for GCC-family compilers
LLVM-optimized binaries require alternative profiling approaches
Compiler choice affects both performance AND analysis feasibility

Results and Analysis

Complete performance analysis available in:

C_vs_Fortran_Performance_Analysis.md: Comprehensive technical report
results/c_vs_fortran/: Raw profiling data and CSV results
analyze_results.py: Analysis automation script

Future Work

Alternative Rust profiling: Investigate LLVM-compatible profiling tools
Additional languages: Extend to other compilers (Clang, Intel ICC)
Algorithm expansion: Test with more complex numerical algorithms
Hardware analysis: Explore different CPU architectures

Project History

Originally conceived as an exploration of Intel PIN binary instrumentation on naive prime algorithms posited as a challenge by a professor at JMU. The work began with curiosity about the 6k±1 optimization pattern for prime checking and expanded to reveal fundamental differences in compiler optimization strategies.

For detailed methodology, complete results, and technical analysis, see C_vs_Fortran_Performance_Analysis.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

PrimeProfiler: Cross-Language Prime Algorithm Performance Analysis

Key Findings

Algorithm Performance Validation

Cross-Language Compiler Analysis

Implemented Languages

Fully Analyzed

Implemented but Not Profiled

Algorithms Implemented

Performance Results

Algorithm Performance Hierarchy (as expected)

Cross-Language Performance Leaders

Project Structure

Build System

Supported Optimization Levels

Quick Start

Profiling Framework

Intel PIN Binary Instrumentation

Automated Analysis Scripts

Test Suite

Technical Requirements

Key Technical Insights

Compiler Optimization Strategies

Performance Analysis Methodology Limitations

Results and Analysis

Future Work

Project History

About

Uh oh!

Releases

Packages

Contributors 2

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 51 Commits
fortran		fortran
makefiles		makefiles
pin-tools		pin-tools
results		results
rust		rust
scripts		scripts
src		src
.gitignore		.gitignore
C_vs_Fortran_Performance_Analysis.md		C_vs_Fortran_Performance_Analysis.md
Makefile		Makefile
PERFORMANCE_ANALYSIS.md		PERFORMANCE_ANALYSIS.md
README.md		README.md
analyze_results.py		analyze_results.py
dynamic_profile.out		dynamic_profile.out
show_results.py		show_results.py

mfwolffe/PrimeProfiler

Folders and files

Latest commit

History

Repository files navigation

PrimeProfiler: Cross-Language Prime Algorithm Performance Analysis

Key Findings

Algorithm Performance Validation

Cross-Language Compiler Analysis

Implemented Languages

Fully Analyzed

Implemented but Not Profiled

Algorithms Implemented

Performance Results

Algorithm Performance Hierarchy (as expected)

Cross-Language Performance Leaders

Project Structure

Build System

Supported Optimization Levels

Quick Start

Profiling Framework

Intel PIN Binary Instrumentation

Automated Analysis Scripts

Test Suite

Technical Requirements

Key Technical Insights

Compiler Optimization Strategies

Performance Analysis Methodology Limitations

Results and Analysis

Future Work

Project History

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Contributors 2

Uh oh!

Languages

Packages