1 files changed, 137 insertions, 0 deletions
diff --git a/vendor/github.com/minio/sha256-simd/README.md b/vendor/github.com/minio/sha256-simd/README.md
new file mode 100644
index 0000000..6117488
--- /dev/null
+++ b/vendor/github.com/minio/sha256-simd/README.md
@@ -0,0 +1,137 @@
+# sha256-simd
+Accelerate SHA256 computations in pure Go using AVX512, SHA Extensions for x86 and ARM64 for ARM. 
+On AVX512 it provides an up to 8x improvement (over 3 GB/s per core).
+SHA Extensions give a performance boost of close to 4x over AVX2.
+## Introduction
+This package is designed as a replacement for `crypto/sha256`.
+On ARM CPUs with the Cryptography Extensions, it takes advantage of the SHA2 instructions, resulting in a massive performance improvement.
+This package uses Go assembly.
+The AVX512 version is based on Intel's "multi-buffer crypto library for IPSec", whereas the other Intel implementations are described in "Fast SHA-256 Implementations on Intel Architecture Processors" by J. Guilford et al.
+## Support for Intel SHA Extensions
+Support for the Intel SHA Extensions was added by Kristofer Peterson (@svenski123), originally developed for spacemeshos [here](https://github.com/spacemeshos/POET/issues/23). On CPUs that support it (known so far: Intel Celeron J3455 and AMD Ryzen), it gives a significant boost in performance (with thanks to @AudriusButkevicius for reporting the results; full results [here](https://github.com/minio/sha256-simd/pull/37#issuecomment-451607827)).
+```
+$ benchcmp avx2.txt sha-ext.txt
+benchmark           AVX2 MB/s    SHA Ext MB/s  speedup
+BenchmarkHash5M     514.40       1975.17       3.84x
+```
+Thanks to Kristofer Peterson, we also added further performance changes, such as optimized padding and endian conversions, which sped up all implementations.
+Intel SHA alone doubled performance for small sizes,
+while the other changes increased everything by roughly 50%.
+## Support for AVX512
+We have added support for AVX512 which results in an up to 8x performance improvement over AVX2 (3.0 GHz Xeon Platinum 8124M CPU):
+```
+$ benchcmp avx2.txt avx512.txt
+benchmark           AVX2 MB/s    AVX512 MB/s  speedup
+BenchmarkHash5M     448.62       3498.20      7.80x
+```
+The original code was developed by Intel as part of the [multi-buffer crypto library](https://github.com/intel/intel-ipsec-mb) for IPSec, or more specifically this [AVX512](https://github.com/intel/intel-ipsec-mb/blob/master/avx512/sha256_x16_avx512.asm) implementation. The key idea behind it is to process a total of 16 checksums in parallel by “transposing” 16 (independent) messages of 64 bytes across a total of 16 ZMM registers (each 64 bytes wide).
+Transposing the input messages means that, in order to take full advantage of the speedup, you need a (server) workload where multiple threads are doing SHA256 calculations in parallel. With this algorithm, two message blocks processed in parallel cannot depend on one another. Blocks of the same message do depend on one another, because the (interim) result of the first part of the message has to be an input into the processing of the second part, so the 16 lanes must be filled with blocks from different messages.
+Whereas the original Intel C implementation requires some sort of explicit scheduling of messages to be processed in parallel, for Golang it makes sense to take advantage of channels in order to group messages together and use channels as well for sending back the results (thereby effectively decoupling the calculations). We have implemented a fairly simple scheduling mechanism that seems to work well in practice.
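The channel-based grouping described above can be sketched roughly as follows. This is a simplified illustration using only the standard library, not the package's actual scheduler: the names `request`, `batchServer`, and `hashVia` and the `lanes` constant are hypothetical, and each message is hashed with `crypto/sha256` where a real server would hand the whole batch to one AVX512 routine.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// lanes mirrors the 16 parallel lanes mentioned above (illustrative only).
const lanes = 16

// request is a hypothetical unit of work: one message plus a reply channel.
type request struct {
	msg   []byte
	reply chan [32]byte
}

// batchServer groups up to `lanes` pending requests and processes them
// together. Here each message in the batch is simply hashed with
// crypto/sha256; a SIMD implementation would process the batch in one call.
func batchServer(in <-chan request) {
	for first := range in {
		batch := []request{first}
	fill:
		for len(batch) < lanes {
			select {
			case r, ok := <-in:
				if !ok {
					break fill // channel closed
				}
				batch = append(batch, r)
			default:
				break fill // nothing else pending: process what we have
			}
		}
		for _, r := range batch {
			r.reply <- sha256.Sum256(r.msg)
		}
	}
}

// hashVia sends one message to the server and waits for its digest.
func hashVia(in chan<- request, msg []byte) [32]byte {
	reply := make(chan [32]byte, 1) // buffered so the server never blocks
	in <- request{msg: msg, reply: reply}
	return <-reply
}

func main() {
	in := make(chan request)
	go batchServer(in)

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			msg := []byte(fmt.Sprintf("message %d", i))
			fmt.Printf("%d: %x\n", i, hashVia(in, msg))
		}(i)
	}
	wg.Wait()
	close(in)
}
```

Callers only ever see a request going in and a digest coming back, which is what decouples their calculations from how the server fills its lanes.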
+Due to this different way of scheduling, we decided to use an explicit method to instantiate the AVX512 version. Essentially, one or more AVX512 processing servers ([`Avx512Server`](https://github.com/minio/sha256-simd/blob/master/sha256blockAvx512_amd64.go#L294)) have to be created, each of which can hash over 3 GB/s on a single core. A `hash.Hash` object ([`Avx512Digest`](https://github.com/minio/sha256-simd/blob/master/sha256blockAvx512_amd64.go#L45)) is then instantiated using one of these servers and used in the regular fashion:
+```go
+import "github.com/minio/sha256-simd"
+func main() {
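+        // Create a processing server; digests are instantiated from it.
+        server := sha256.NewAvx512Server()
+        // h512 implements hash.Hash and is used like any other hash.
+        h512 := sha256.NewAvx512(server)
+        h512.Write(fileBlock) // fileBlock is a []byte holding the data to hash
+        digest := h512.Sum([]byte{})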
+}
+```
+Note that, because of the scheduling overhead, for small messages (< 1 MB) you will be better off using the regular SHA256 hashing (but those are typically not performance critical anyway). Some other tips to get the best performance:
+* Have many goroutines doing SHA256 calculations in parallel.
+* Try to `Write()` messages in multiples of 64 bytes.
+* Try to keep the overall length of messages to a roughly similar size, e.g. 5 MB (this way all 16 ‘lanes’ in the AVX512 computations are contributing as much as possible).
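As a rough sketch of these tips, the following hashes several similarly sized messages from separate goroutines and writes them in 64-byte multiples. It uses the standard library's `crypto/sha256` so that it runs anywhere; with this package, each goroutine would presumably obtain its hash from `sha256.NewAvx512(server)` on a shared server instead. The helper name `hashAll`, the chunk size, and the message sizes are illustrative assumptions.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// hashAll hashes every message in its own goroutine and returns the digests
// in the same order as the inputs.
func hashAll(msgs [][]byte) [][32]byte {
	digests := make([][32]byte, len(msgs))
	var wg sync.WaitGroup
	for i, m := range msgs {
		wg.Add(1)
		go func(i int, m []byte) {
			defer wg.Done()
			h := sha256.New()
			// Write in chunks that are a multiple of 64 bytes (the SHA256
			// block size); only the final chunk may be shorter.
			const chunk = 64 * 1024
			for len(m) > 0 {
				n := chunk
				if n > len(m) {
					n = len(m)
				}
				h.Write(m[:n])
				m = m[n:]
			}
			copy(digests[i][:], h.Sum(nil))
		}(i, m)
	}
	wg.Wait()
	return digests
}

func main() {
	// Messages of similar size keep all lanes busy for about the same time.
	msgs := make([][]byte, 16)
	for i := range msgs {
		msgs[i] = make([]byte, 1<<20) // 1 MiB each (illustrative size)
		msgs[i][0] = byte(i)
	}
	for i, d := range hashAll(msgs) {
		fmt.Printf("%2d: %x\n", i, d)
	}
}
```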
+More detailed information, including scaling across cores, can be found in this [blog](https://blog.minio.io/accelerate-sha256-up-to-8x-over-3-gb-s-per-core-with-avx512-a0b1d64f78f) post.
+## Drop-In Replacement
+The following code snippet shows how you can use `github.com/minio/sha256-simd`. 
+This will automatically select the fastest method for the architecture on which it will be executed.
+```go
+import (
+        "io"
+        "github.com/minio/sha256-simd"
+)
+func main() {
+        ...
+        shaWriter := sha256.New() // same call as crypto/sha256's New()
+        io.Copy(shaWriter, file)  // file is any io.Reader
+        ...
+}
+```
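Because the package is designed as a replacement for `crypto/sha256`, a complete program might look like the sketch below. To keep it runnable without external dependencies it imports the standard library; swapping the import path for `github.com/minio/sha256-simd` should be the only change needed. The helpers `hashReader` and `hashString` are hypothetical names introduced for illustration.

```go
package main

import (
	"crypto/sha256" // swap for "github.com/minio/sha256-simd" to use the accelerated version
	"encoding/hex"
	"fmt"
	"io"
	"strings"
)

// hashReader streams r through a SHA256 hash and returns the hex digest.
func hashReader(r io.Reader) (string, error) {
	shaWriter := sha256.New()
	if _, err := io.Copy(shaWriter, r); err != nil {
		return "", err
	}
	return hex.EncodeToString(shaWriter.Sum(nil)), nil
}

// hashString is a convenience wrapper for in-memory data.
func hashString(s string) (string, error) {
	return hashReader(strings.NewReader(s))
}

func main() {
	// An in-memory string stands in for a file here.
	sum, err := hashString("abc")
	if err != nil {
		fmt.Println("hashing failed:", err)
		return
	}
	fmt.Println(sum)
}
```

Passing an `*os.File` to `hashReader` works the same way, since it only requires an `io.Reader`.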
+## Performance
+Below is the speed in MB/s for a single core (ranked fast to slow) for blocks larger than 1 MB.
+| Processor                         | SIMD    | Speed (MB/s) |
+| --------------------------------- | ------- | ------------:|
+| 3.0 GHz Intel Xeon Platinum 8124M | AVX512  |         3498 |
+| 3.7 GHz AMD Ryzen 7 2700X         | SHA Ext |         1979 |
+| 1.2 GHz ARM Cortex-A53            | ARM64   |          638 |
+## asm2plan9s
+In order to be able to work more easily with AVX512/AVX2 instructions, a separate tool was developed to convert SIMD instructions into the corresponding BYTE sequence as accepted by Go assembly. See [asm2plan9s](https://github.com/minio/asm2plan9s) for more information.
+## Why and benefits
+One of the most performance-sensitive parts of the [Minio](https://github.com/minio/minio) object storage server is the calculation of SHA256 hash sums. For instance, during multipart uploads, each uploaded part needs to be verified for data integrity by the server.
+Other applications that can benefit from enhanced SHA256 performance are deduplication in storage systems, intrusion detection, version control systems, integrity checking, etc.
+## ARM SHA Extensions
+The 64-bit ARMv8 core has introduced new instructions for SHA1 and SHA2 acceleration as part of the [Cryptography Extensions](http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0501f/CHDFJBCJ.html). Below you can see a small excerpt highlighting one of the rounds as is done for the SHA256 calculation process (for full code see [sha256block_arm64.s](https://github.com/minio/sha256-simd/blob/master/sha256block_arm64.s)).
+ ```
+ sha256h    q2, q3, v9.4s
+ sha256h2   q3, q4, v9.4s
+ sha256su0  v5.4s, v6.4s
+ rev32      v8.16b, v8.16b
+ add        v9.4s, v7.4s, v18.4s
+ mov        v4.16b, v2.16b
+ sha256h    q2, q3, v10.4s
+ sha256h2   q3, q4, v10.4s
+ sha256su0  v6.4s, v7.4s
+ sha256su1  v5.4s, v7.4s, v8.4s
+ ```
+### Detailed benchmarks
+Benchmarks generated on a [Pine64](https://www.pine64.com/) equipped with a 1.2 GHz quad-core ARM Cortex-A53.
+```
+minio@minio-arm:$ benchcmp golang.txt arm64.txt
+benchmark                 golang         arm64        speedup
+BenchmarkHash8Bytes-4     0.68 MB/s      5.70 MB/s      8.38x
+BenchmarkHash1K-4         5.65 MB/s    326.30 MB/s     57.75x
+BenchmarkHash8K-4         6.00 MB/s    570.63 MB/s     95.11x
+BenchmarkHash1M-4         6.05 MB/s    638.23 MB/s    105.49x
+```
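The benchmark sources are not shown here, so the following is only an assumed sketch of how throughput figures of this kind can be measured in Go. It uses `testing.Benchmark` with the standard library's `crypto/sha256`; the helper `benchHash` is a hypothetical name, and the sizes are chosen to match the benchmark names above.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"testing"
)

// benchHash measures SHA256 throughput for inputs of the given size.
func benchHash(size int) testing.BenchmarkResult {
	data := make([]byte, size)
	return testing.Benchmark(func(b *testing.B) {
		b.SetBytes(int64(size)) // records bytes processed per iteration
		for i := 0; i < b.N; i++ {
			sha256.Sum256(data)
		}
	})
}

func main() {
	for _, size := range []int{8, 1 << 10, 8 << 10, 1 << 20} {
		r := benchHash(size)
		// Throughput in MB/s: bytes per iteration times iterations over elapsed time.
		mbps := float64(r.Bytes) * float64(r.N) / 1e6 / r.T.Seconds()
		fmt.Printf("%8d bytes: %10.2f MB/s\n", size, mbps)
	}
}
```

Running the same measurement against two implementations and dividing the MB/s figures gives the speedup column.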
+## License
+Released under the Apache License v2.0. You can find the complete text in the file LICENSE.
+## Contributing
+Contributions are welcome; please send PRs for any enhancements.
