matt wolffe, 2025
A comprehensive performance analysis framework comparing identical prime number algorithms across multiple programming languages using Intel PIN binary instrumentation.
- Algorithmic optimizations work as expected: 6k±1 pattern > √n optimization > brute force
- Hand-optimized algorithms consistently outperform brute force, even with aggressive compiler optimizations
- Original hypothesis confirmed: compiler optimizations cannot overcome poor algorithmic choices
- Fortran's gfortran shows superior optimization of brute force algorithms (9.08x better than C)
- Fortran achieves 92.3% instruction reduction in naive algorithms (16M → 1.2M instructions)
- C's GCC excels at optimizing mathematical functions like sqrt() operations
- Different compilers have fundamentally different optimization strengths
- C (GCC): Complete performance profiling across all optimization levels
- Fortran 2008 (gfortran): Complete performance profiling with optimization analysis
- Rust (rustc/LLVM): Algorithms implemented and verified correct, but Intel PIN binary instrumentation incompatible with LLVM optimization strategy
Three prime checking algorithms of increasing algorithmic sophistication:
naive_prime: Pure brute force trial division (2 to n-1) - slowest approachnaive_prime_squares: Square root optimization (2 to √n) - significantly fasterless_naive_prime: 6k±1 pattern optimization - fastest algorithmic approach
Research Question: Can aggressive compiler optimizations make brute force competitive with hand-optimized algorithms?
Each algorithm implemented identically across all languages with 27 comprehensive test cases.
less_naive_prime(6k±1): ~250K-650K cycles/call - fastestnaive_prime_squares(√n): ~300K-1.4M cycles/call - middlenaive_prime(brute force): ~7M-71M cycles/call - slowest
| Algorithm | C vs Fortran Winner | Performance Advantage | Best Configuration |
|---|---|---|---|
less_naive_prime |
Fortran | 2.17x faster | Fortran -O3 |
naive_prime_squares |
C | 2.78x faster | C -O0 |
naive_prime |
Fortran | 9.08x faster | Fortran -O3 |
Key Insight: While Fortran dramatically optimizes brute force better than C, the 6k±1 hand-optimized algorithm still outperforms even the best compiler-optimized brute force by 30x.
PrimeProfiler/
├── src/ # C implementations
├── fortran/ # Fortran 2008 implementations
├── rust/ # Rust implementations (complete but not profiled)
├── pin-tools/ # Intel PIN instrumentation tools
├── scripts/ # Profiling automation scripts
├── makefiles/ # Build configurations for all languages
├── results/ # Performance analysis results
└── C_vs_Fortran_Performance_Analysis.md # Detailed analysis report
- C:
-O0,-O3,-Ofast - Fortran:
-O0,-O3,-Ofast - Rust:
dev,release,fast(builds successfully, profiling limited)
# Build all C variants
make -f makefiles/Makefile.opt0
make -f makefiles/Makefile.opt3
make -f makefiles/Makefile.optfast
# Build all Fortran variants
make -f makefiles/Makefile.fortran-opt0
make -f makefiles/Makefile.fortran-opt3
make -f makefiles/Makefile.fortran-optfast
# Build Rust variants (for testing)
make -f makefiles/Makefile.rust-dev
make -f makefiles/Makefile.rust-release
make -f makefiles/Makefile.rust-fast- Precise cycle counting: Hardware-level performance measurement
- Function-level granularity: Exact instruction and memory access counting
- Language-aware symbol resolution: Handles Fortran name mangling
- Address range instrumentation: Captures exact function boundaries
scripts/profile_single_optimization.sh: Individual optimization level profilingscripts/profile_all_optimizations.sh: Complete C optimization analysisscripts/profile_fortran_optimization.sh: Fortran-specific profiling with symbol detectionscripts/profile_c_vs_fortran.sh: Cross-language comparison framework
Each algorithm tested against 27 comprehensive test cases:
- 15 prime numbers: 2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47
- 12 composite numbers: 1, 4, 6, 8, 9, 10, 12, 14, 15, 16, 18, 20
Identical test cases ensure fair cross-language comparison.
- Intel PIN Framework: Binary instrumentation (tested with PIN 3.30)
- GCC: C compilation and optimization
- gfortran: Fortran 2008 compilation
- rustc/cargo: Rust compilation (optional, for algorithm verification)
- Linux x86_64: Development and testing platform
- gfortran: Aggressive loop transformation and memory optimization
- GCC: Conservative, predictable optimization maintaining structure
- LLVM/rustc: Optimization so aggressive it breaks binary instrumentation tools
- Traditional binary instrumentation works excellently for GCC-family compilers
- LLVM-optimized binaries require alternative profiling approaches
- Compiler choice affects both performance AND analysis feasibility
Complete performance analysis available in:
C_vs_Fortran_Performance_Analysis.md: Comprehensive technical reportresults/c_vs_fortran/: Raw profiling data and CSV resultsanalyze_results.py: Analysis automation script
- Alternative Rust profiling: Investigate LLVM-compatible profiling tools
- Additional languages: Extend to other compilers (Clang, Intel ICC)
- Algorithm expansion: Test with more complex numerical algorithms
- Hardware analysis: Explore different CPU architectures
Originally conceived as an exploration of Intel PIN binary instrumentation on naive prime algorithms posited as a challenge by a professor at JMU. The work began with curiosity about the 6k±1 optimization pattern for prime checking and expanded to reveal fundamental differences in compiler optimization strategies.
For detailed methodology, complete results, and technical analysis, see C_vs_Fortran_Performance_Analysis.md