
Commit 6a57409

Remove obsolete SIMD code (#57)
Probably due to compiler optimizations the Go code is now faster than the SSSE3/AVX/AVX2 code:

```
BenchmarkHash/GEN_/8Bytes-32     6486468       184 ns/op    43.46 MB/s
BenchmarkHash/GEN_/1K-32          545470      2172 ns/op   471.36 MB/s
BenchmarkHash/GEN_/8K-32           74073     16106 ns/op   508.64 MB/s
BenchmarkHash/GEN_/1M-32             584   2034247 ns/op   515.46 MB/s
BenchmarkHash/GEN_/5M-32             100  10190003 ns/op   514.51 MB/s
BenchmarkHash/GEN_/10M-32             56  20357139 ns/op   515.09 MB/s
BenchmarkHash/AVX2/8Bytes-32     5263258       226 ns/op    35.44 MB/s
BenchmarkHash/AVX2/1K-32          444441      2633 ns/op   388.98 MB/s
BenchmarkHash/AVX2/8K-32           61855     19513 ns/op   419.81 MB/s
BenchmarkHash/AVX2/1M-32             487   2462013 ns/op   425.90 MB/s
BenchmarkHash/AVX2/5M-32              91  12384626 ns/op   423.34 MB/s
BenchmarkHash/AVX2/10M-32             44  26636364 ns/op   393.66 MB/s
BenchmarkHash/AVX_/8Bytes-32     6349206       188 ns/op    42.54 MB/s
BenchmarkHash/AVX_/1K-32          461538      2620 ns/op   390.91 MB/s
BenchmarkHash/AVX_/8K-32           61224     19567 ns/op   418.65 MB/s
BenchmarkHash/AVX_/1M-32             484   2473140 ns/op   423.99 MB/s
BenchmarkHash/AVX_/5M-32              99  12505052 ns/op   419.26 MB/s
BenchmarkHash/AVX_/10M-32             46  24869557 ns/op   421.63 MB/s
BenchmarkHash/SSSE/8Bytes-32     6282679       192 ns/op    41.71 MB/s
BenchmarkHash/SSSE/1K-32          461614      2628 ns/op   389.69 MB/s
BenchmarkHash/SSSE/8K-32           60913     19651 ns/op   416.88 MB/s
BenchmarkHash/SSSE/1M-32             481   2488563 ns/op   421.36 MB/s
BenchmarkHash/SSSE/5M-32              91  12516477 ns/op   418.88 MB/s
BenchmarkHash/SSSE/10M-32             46  24869561 ns/op   421.63 MB/s
```
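The MB/s column follows from the ns/op column, so the gap between the generic and SIMD paths can be checked directly from the figures in the commit message. The following is a minimal sketch, assuming the `go test -bench` convention of 1 MB = 10^6 bytes and assuming that the "1M" case hashes 1<<20 bytes per operation. The helper names `throughputMBps` and `speedup` are hypothetical and not part of the package.

```go
package main

import "fmt"

// throughputMBps converts a benchmark result into MB/s, using
// 1 MB = 1e6 bytes (the convention behind the MB/s column of `go test -bench`).
func throughputMBps(bytesPerOp int64, nsPerOp float64) float64 {
	// bytes/ns * 1e9 ns/s / 1e6 bytes/MB = bytes/ns * 1e3
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup reports how many times faster implementation A is than
// implementation B, given the ns/op of each.
func speedup(nsPerOpA, nsPerOpB float64) float64 {
	return nsPerOpB / nsPerOpA
}

func main() {
	const oneMiB = 1 << 20 // assumption: the "1M" case hashes 1 MiB per op

	gen := throughputMBps(oneMiB, 2034247)  // GEN_/1M from the commit message
	avx2 := throughputMBps(oneMiB, 2462013) // AVX2/1M from the commit message

	fmt.Printf("generic: %.2f MB/s\n", gen)
	fmt.Printf("AVX2:    %.2f MB/s\n", avx2)
	fmt.Printf("generic is %.2fx the AVX2 speed\n", speedup(2034247, 2462013))
}
```

For the 1M case this works out to roughly a 1.2x advantage for the generic Go code over the AVX2 assembly, which is consistent with the decision to drop the SIMD paths.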
1 parent f675151 commit 6a57409

26 files changed, +90 -2829 lines changed

.github/workflows/go.yml

Lines changed: 5 additions & 2 deletions
@@ -15,8 +15,8 @@ jobs:
     strategy:
       max-parallel: 4
       matrix:
-        go-version: [1.13.x, 1.12.x]
-        os: [ubuntu-latest, windows-latest]
+        go-version: [1.16.x, 1.15.x, 1.14.x]
+        os: [ubuntu-latest, windows-latest, macos-latest]
     steps:
     - name: Set up Go ${{ matrix.go-version }}
       uses: actions/setup-go@v1
@@ -30,6 +30,9 @@ jobs:
     - name: Build on ${{ matrix.os }}
       if: matrix.os == 'windows-latest'
       run: go test -race -v ./...
+    - name: Build on ${{ matrix.os }}
+      if: matrix.os == 'macos-latest'
+      run: go test -race -v ./...
     - name: Build on ${{ matrix.os }}
       if: matrix.os == 'ubuntu-latest'
       run: |

README.md

Lines changed: 13 additions & 9 deletions
@@ -1,14 +1,18 @@
 # sha256-simd

-Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions and AVX2 for Intel and ARM64 for ARM. On AVX512 it provides an up to 8x improvement (over 3 GB/s per core) in comparison to AVX2. SHA Extensions give a performance boost of close to 4x over AVX2.
+Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM.
+On AVX512 it provides an up to 8x improvement (over 3 GB/s per core).
+SHA Extensions give a performance boost of close to 4x over native.

 ## Introduction

-This package is designed as a replacement for `crypto/sha256`. For Intel CPUs it has two flavors for AVX512 and AVX2 (AVX/SSE are also supported). For ARM CPUs with the Cryptography Extensions, advantage is taken of the SHA2 instructions resulting in a massive performance improvement.
+This package is designed as a replacement for `crypto/sha256`.
+For ARM CPUs with the Cryptography Extensions, advantage is taken of the SHA2 instructions resulting in a massive performance improvement.

-This package uses Golang assembly. The AVX512 version is based on the Intel's "multi-buffer crypto library for IPSec" whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.
+This package uses Golang assembly.
+The AVX512 version is based on the Intel's "multi-buffer crypto library for IPSec" whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.

-## New: Support for Intel SHA Extensions
+## Support for Intel SHA Extensions

 Support for the Intel SHA Extensions has been added by Kristofer Peterson (@svenski123), originally developed for spacemeshos [here](https://github.com/spacemeshos/POET/issues/23). On CPUs that support it (known thus far Intel Celeron J3455 and AMD Ryzen) it gives a significant boost in performance (with thanks to @AudriusButkevicius for reporting the results; full results [here](https://github.com/minio/sha256-simd/pull/37#issuecomment-451607827)).

@@ -18,7 +22,9 @@ benchmark AVX2 MB/s SHA Ext MB/s speedup
 BenchmarkHash5M 514.40 1975.17 3.84x
 ```

-Thanks to Kristofer Peterson, we also added additional performance changes such as optimized padding, endian conversions which sped up all implementations i.e. Intel SHA alone while doubled performance for small sizes, the other changes increased everything roughly 50%.
+Thanks to Kristofer Peterson, we also added additional performance changes such as optimized padding,
+endian conversions which sped up all implementations i.e. Intel SHA alone while doubled performance for small sizes,
+the other changes increased everything roughly 50%.

 ## Support for AVX512

@@ -58,7 +64,8 @@ More detailed information can be found in this [blog](https://blog.minio.io/acce

 ## Drop-In Replacement

-The following code snippet shows how you can use `github.com/minio/sha256-simd`. This will automatically select the fastest method for the architecture on which it will be executed.
+The following code snippet shows how you can use `github.com/minio/sha256-simd`.
+This will automatically select the fastest method for the architecture on which it will be executed.

 ```go
 import "github.com/minio/sha256-simd"
@@ -80,9 +87,6 @@ Below is the speed in MB/s for a single core (ranked fast to slow) for blocks la
 | 3.0 GHz Intel Xeon Platinum 8124M | AVX512 | 3498 |
 | 3.7 GHz AMD Ryzen 7 2700X | SHA Ext | 1979 |
 | 1.2 GHz ARM Cortex-A53 | ARM64 | 638 |
-| 3.0 GHz Intel Xeon Platinum 8124M | AVX2 | 449 |
-| 3.1 GHz Intel Core i7 | AVX | 362 |
-| 3.1 GHz Intel Core i7 | SSE | 299 |

 ## asm2plan9s

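The README text in this diff describes the package as a replacement for `crypto/sha256`. A minimal sketch of what that means for calling code follows. It uses the standard library so that it runs without extra dependencies. It assumes, based on the README's drop-in claim, that swapping the import path for `github.com/minio/sha256-simd` leaves the calls to `New` and `Sum256` unchanged. The helper names `hexDigest` and `streamDigest` are hypothetical.

```go
package main

import (
	// Assumption: replacing this path with "github.com/minio/sha256-simd"
	// (imported under the name sha256) keeps the code below unchanged.
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// hexDigest hashes data in one shot with Sum256 and returns the hex string.
func hexDigest(data []byte) string {
	sum := sha256.Sum256(data)
	return hex.EncodeToString(sum[:])
}

// streamDigest hashes data through the hash.Hash returned by New,
// writing it in two pieces to exercise the incremental interface.
func streamDigest(data []byte) string {
	h := sha256.New()
	half := len(data) / 2
	h.Write(data[:half]) // hash.Hash.Write never returns an error
	h.Write(data[half:])
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	msg := []byte("hello world")
	fmt.Println(hexDigest(msg))
	// One-shot and streaming hashing must agree on the digest.
	fmt.Println(hexDigest(msg) == streamDigest(msg))
}
```

Because the calling code depends only on `New` and `Sum256`, removing the SSSE3/AVX/AVX2 paths in this commit should not require changes on the caller's side.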
cpuid.go

Lines changed: 0 additions & 119 deletions
This file was deleted.

cpuid_386.go

Lines changed: 0 additions & 24 deletions
This file was deleted.

cpuid_386.s

Lines changed: 0 additions & 53 deletions
This file was deleted.

cpuid_amd64.go

Lines changed: 0 additions & 24 deletions
This file was deleted.

cpuid_amd64.s

Lines changed: 0 additions & 53 deletions
This file was deleted.

cpuid_arm.go

Lines changed: 0 additions & 32 deletions
This file was deleted.
