path: root/vendor/github.com/minio/md5-simd
Diffstat (limited to 'vendor/github.com/minio/md5-simd')
-rw-r--r--  vendor/github.com/minio/md5-simd/LICENSE                  202
-rw-r--r--  vendor/github.com/minio/md5-simd/LICENSE.Golang            27
-rw-r--r--  vendor/github.com/minio/md5-simd/README.md                198
-rw-r--r--  vendor/github.com/minio/md5-simd/block16_amd64.s          228
-rw-r--r--  vendor/github.com/minio/md5-simd/block8_amd64.s           281
-rw-r--r--  vendor/github.com/minio/md5-simd/block_amd64.go           210
-rw-r--r--  vendor/github.com/minio/md5-simd/md5-digest_amd64.go      188
-rw-r--r--  vendor/github.com/minio/md5-simd/md5-server_amd64.go      397
-rw-r--r--  vendor/github.com/minio/md5-simd/md5-server_fallback.go    12
-rw-r--r--  vendor/github.com/minio/md5-simd/md5-util_amd64.go         85
-rw-r--r--  vendor/github.com/minio/md5-simd/md5.go                    63
-rw-r--r--  vendor/github.com/minio/md5-simd/md5block_amd64.go         11
-rw-r--r--  vendor/github.com/minio/md5-simd/md5block_amd64.s         714
13 files changed, 2616 insertions, 0 deletions
diff --git a/vendor/github.com/minio/md5-simd/LICENSE b/vendor/github.com/minio/md5-simd/LICENSE
new file mode 100644
index 0000000..d645695
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/LICENSE
@@ -0,0 +1,202 @@
1
2 Apache License
3 Version 2.0, January 2004
4 http://www.apache.org/licenses/
5
6 TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
7
8 1. Definitions.
9
10 "License" shall mean the terms and conditions for use, reproduction,
11 and distribution as defined by Sections 1 through 9 of this document.
12
13 "Licensor" shall mean the copyright owner or entity authorized by
14 the copyright owner that is granting the License.
15
16 "Legal Entity" shall mean the union of the acting entity and all
17 other entities that control, are controlled by, or are under common
18 control with that entity. For the purposes of this definition,
19 "control" means (i) the power, direct or indirect, to cause the
20 direction or management of such entity, whether by contract or
21 otherwise, or (ii) ownership of fifty percent (50%) or more of the
22 outstanding shares, or (iii) beneficial ownership of such entity.
23
24 "You" (or "Your") shall mean an individual or Legal Entity
25 exercising permissions granted by this License.
26
27 "Source" form shall mean the preferred form for making modifications,
28 including but not limited to software source code, documentation
29 source, and configuration files.
30
31 "Object" form shall mean any form resulting from mechanical
32 transformation or translation of a Source form, including but
33 not limited to compiled object code, generated documentation,
34 and conversions to other media types.
35
36 "Work" shall mean the work of authorship, whether in Source or
37 Object form, made available under the License, as indicated by a
38 copyright notice that is included in or attached to the work
39 (an example is provided in the Appendix below).
40
41 "Derivative Works" shall mean any work, whether in Source or Object
42 form, that is based on (or derived from) the Work and for which the
43 editorial revisions, annotations, elaborations, or other modifications
44 represent, as a whole, an original work of authorship. For the purposes
45 of this License, Derivative Works shall not include works that remain
46 separable from, or merely link (or bind by name) to the interfaces of,
47 the Work and Derivative Works thereof.
48
49 "Contribution" shall mean any work of authorship, including
50 the original version of the Work and any modifications or additions
51 to that Work or Derivative Works thereof, that is intentionally
52 submitted to Licensor for inclusion in the Work by the copyright owner
53 or by an individual or Legal Entity authorized to submit on behalf of
54 the copyright owner. For the purposes of this definition, "submitted"
55 means any form of electronic, verbal, or written communication sent
56 to the Licensor or its representatives, including but not limited to
57 communication on electronic mailing lists, source code control systems,
58 and issue tracking systems that are managed by, or on behalf of, the
59 Licensor for the purpose of discussing and improving the Work, but
60 excluding communication that is conspicuously marked or otherwise
61 designated in writing by the copyright owner as "Not a Contribution."
62
63 "Contributor" shall mean Licensor and any individual or Legal Entity
64 on behalf of whom a Contribution has been received by Licensor and
65 subsequently incorporated within the Work.
66
67 2. Grant of Copyright License. Subject to the terms and conditions of
68 this License, each Contributor hereby grants to You a perpetual,
69 worldwide, non-exclusive, no-charge, royalty-free, irrevocable
70 copyright license to reproduce, prepare Derivative Works of,
71 publicly display, publicly perform, sublicense, and distribute the
72 Work and such Derivative Works in Source or Object form.
73
74 3. Grant of Patent License. Subject to the terms and conditions of
75 this License, each Contributor hereby grants to You a perpetual,
76 worldwide, non-exclusive, no-charge, royalty-free, irrevocable
77 (except as stated in this section) patent license to make, have made,
78 use, offer to sell, sell, import, and otherwise transfer the Work,
79 where such license applies only to those patent claims licensable
80 by such Contributor that are necessarily infringed by their
81 Contribution(s) alone or by combination of their Contribution(s)
82 with the Work to which such Contribution(s) was submitted. If You
83 institute patent litigation against any entity (including a
84 cross-claim or counterclaim in a lawsuit) alleging that the Work
85 or a Contribution incorporated within the Work constitutes direct
86 or contributory patent infringement, then any patent licenses
87 granted to You under this License for that Work shall terminate
88 as of the date such litigation is filed.
89
90 4. Redistribution. You may reproduce and distribute copies of the
91 Work or Derivative Works thereof in any medium, with or without
92 modifications, and in Source or Object form, provided that You
93 meet the following conditions:
94
95 (a) You must give any other recipients of the Work or
96 Derivative Works a copy of this License; and
97
98 (b) You must cause any modified files to carry prominent notices
99 stating that You changed the files; and
100
101 (c) You must retain, in the Source form of any Derivative Works
102 that You distribute, all copyright, patent, trademark, and
103 attribution notices from the Source form of the Work,
104 excluding those notices that do not pertain to any part of
105 the Derivative Works; and
106
107 (d) If the Work includes a "NOTICE" text file as part of its
108 distribution, then any Derivative Works that You distribute must
109 include a readable copy of the attribution notices contained
110 within such NOTICE file, excluding those notices that do not
111 pertain to any part of the Derivative Works, in at least one
112 of the following places: within a NOTICE text file distributed
113 as part of the Derivative Works; within the Source form or
114 documentation, if provided along with the Derivative Works; or,
115 within a display generated by the Derivative Works, if and
116 wherever such third-party notices normally appear. The contents
117 of the NOTICE file are for informational purposes only and
118 do not modify the License. You may add Your own attribution
119 notices within Derivative Works that You distribute, alongside
120 or as an addendum to the NOTICE text from the Work, provided
121 that such additional attribution notices cannot be construed
122 as modifying the License.
123
124 You may add Your own copyright statement to Your modifications and
125 may provide additional or different license terms and conditions
126 for use, reproduction, or distribution of Your modifications, or
127 for any such Derivative Works as a whole, provided Your use,
128 reproduction, and distribution of the Work otherwise complies with
129 the conditions stated in this License.
130
131 5. Submission of Contributions. Unless You explicitly state otherwise,
132 any Contribution intentionally submitted for inclusion in the Work
133 by You to the Licensor shall be under the terms and conditions of
134 this License, without any additional terms or conditions.
135 Notwithstanding the above, nothing herein shall supersede or modify
136 the terms of any separate license agreement you may have executed
137 with Licensor regarding such Contributions.
138
139 6. Trademarks. This License does not grant permission to use the trade
140 names, trademarks, service marks, or product names of the Licensor,
141 except as required for reasonable and customary use in describing the
142 origin of the Work and reproducing the content of the NOTICE file.
143
144 7. Disclaimer of Warranty. Unless required by applicable law or
145 agreed to in writing, Licensor provides the Work (and each
146 Contributor provides its Contributions) on an "AS IS" BASIS,
147 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
148 implied, including, without limitation, any warranties or conditions
149 of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
150 PARTICULAR PURPOSE. You are solely responsible for determining the
151 appropriateness of using or redistributing the Work and assume any
152 risks associated with Your exercise of permissions under this License.
153
154 8. Limitation of Liability. In no event and under no legal theory,
155 whether in tort (including negligence), contract, or otherwise,
156 unless required by applicable law (such as deliberate and grossly
157 negligent acts) or agreed to in writing, shall any Contributor be
158 liable to You for damages, including any direct, indirect, special,
159 incidental, or consequential damages of any character arising as a
160 result of this License or out of the use or inability to use the
161 Work (including but not limited to damages for loss of goodwill,
162 work stoppage, computer failure or malfunction, or any and all
163 other commercial damages or losses), even if such Contributor
164 has been advised of the possibility of such damages.
165
166 9. Accepting Warranty or Additional Liability. While redistributing
167 the Work or Derivative Works thereof, You may choose to offer,
168 and charge a fee for, acceptance of support, warranty, indemnity,
169 or other liability obligations and/or rights consistent with this
170 License. However, in accepting such obligations, You may act only
171 on Your own behalf and on Your sole responsibility, not on behalf
172 of any other Contributor, and only if You agree to indemnify,
173 defend, and hold each Contributor harmless for any liability
174 incurred by, or claims asserted against, such Contributor by reason
175 of your accepting any such warranty or additional liability.
176
177 END OF TERMS AND CONDITIONS
178
179 APPENDIX: How to apply the Apache License to your work.
180
181 To apply the Apache License to your work, attach the following
182 boilerplate notice, with the fields enclosed by brackets "[]"
183 replaced with your own identifying information. (Don't include
184 the brackets!) The text should be enclosed in the appropriate
185 comment syntax for the file format. We also recommend that a
186 file or class name and description of purpose be included on the
187 same "printed page" as the copyright notice for easier
188 identification within third-party archives.
189
190 Copyright [yyyy] [name of copyright owner]
191
192 Licensed under the Apache License, Version 2.0 (the "License");
193 you may not use this file except in compliance with the License.
194 You may obtain a copy of the License at
195
196 http://www.apache.org/licenses/LICENSE-2.0
197
198 Unless required by applicable law or agreed to in writing, software
199 distributed under the License is distributed on an "AS IS" BASIS,
200 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
201 See the License for the specific language governing permissions and
202 limitations under the License.
diff --git a/vendor/github.com/minio/md5-simd/LICENSE.Golang b/vendor/github.com/minio/md5-simd/LICENSE.Golang
new file mode 100644
index 0000000..6a66aea
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/LICENSE.Golang
@@ -0,0 +1,27 @@
Copyright (c) 2009 The Go Authors. All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

   * Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
   * Redistributions in binary form must reproduce the above
copyright notice, this list of conditions and the following disclaimer
in the documentation and/or other materials provided with the
distribution.
   * Neither the name of Google Inc. nor the names of its
contributors may be used to endorse or promote products derived from
this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
OWNER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
diff --git a/vendor/github.com/minio/md5-simd/README.md b/vendor/github.com/minio/md5-simd/README.md
new file mode 100644
index 0000000..fa6fce1
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/README.md
@@ -0,0 +1,198 @@

# md5-simd

This is a SIMD-accelerated MD5 package, allowing up to either 8 (AVX2) or 16 (AVX512) independent MD5 sums to be calculated on a single CPU core.

It was originally based on the [md5vec](https://github.com/igneous-systems/md5vec) repository by Igneous Systems, but has been made more flexible by, among other things, supporting different message sizes per lane and adding AVX512.

`md5-simd` integrates a mechanism similar to the one described in [minio/sha256-simd](https://github.com/minio/sha256-simd#support-for-avx512) to make it easy for clients to take advantage of the parallel nature of the MD5 calculation. This results in a reduced overall CPU load.

It is important to understand that `md5-simd` **does not speed up** a single-threaded MD5 hash sum.
Rather, it allows multiple __independent__ MD5 sums to be computed in parallel on the same CPU core,
thereby making more efficient use of the computing resources.

## Usage

[![Documentation](https://godoc.org/github.com/minio/md5-simd?status.svg)](https://pkg.go.dev/github.com/minio/md5-simd?tab=doc)


In order to use `md5-simd`, you must first create a `Server`, which can be
used to instantiate one or more objects for MD5 hashing.

These objects conform to the regular [`hash.Hash`](https://pkg.go.dev/hash?tab=doc#Hash) interface,
so the normal Write/Reset/Sum functionality works as expected.

As an example:
```
	// Create server
	server := md5simd.NewServer()
	defer server.Close()

	// Create hashing object (conforming to hash.Hash)
	md5Hash := server.NewHash()
	defer md5Hash.Close()

	// Write one (or more) blocks
	md5Hash.Write(block)

	// Return digest
	digest := md5Hash.Sum([]byte{})
```

To maintain performance, both a [Server](https://pkg.go.dev/github.com/minio/md5-simd?tab=doc#Server)
and each individual [Hasher](https://pkg.go.dev/github.com/minio/md5-simd?tab=doc#Hasher) should
be closed using the `Close()` function when no longer needed.

A Hasher can efficiently be re-used via its [`Reset()`](https://pkg.go.dev/hash?tab=doc#Hash) method.

If your system does not support the required instructions, the package falls back to using `crypto/md5` for hashing.

## Limitations

As explained above, `md5-simd` does not speed up an individual MD5 hash sum computation,
unless some hierarchical tree construct is used, but this would result in different outcomes.
Running a single hash on a server results in approximately half the throughput.

Instead, it allows running multiple MD5 calculations in parallel on a single CPU core.
This can be beneficial in e.g. multi-threaded server applications where many goroutines
are dealing with many requests, and multiple MD5 calculations can be packed/scheduled for parallel execution on a single core.

This results in lower overall CPU usage as compared to the standard `crypto/md5`
functionality, where each MD5 hash computation consumes a single thread (core).

It is best to test and measure the overall CPU usage in a representative usage scenario in your application,
ideally under heavy CPU load, to get an overall understanding of the benefits of `md5-simd` as compared to `crypto/md5`.

Also note that `md5-simd` works best with large objects,
so if your application only hashes small objects of a few kilobytes
you may be better off using `crypto/md5`.

## Performance

For the best performance, writes should be a multiple of 64 bytes, ideally a multiple of 32KB.
To help with that, a [`buffered := bufio.NewWriterSize(hasher, 32<<10)`](https://golang.org/pkg/bufio/#NewWriterSize)
can be inserted if you are unsure of the sizes of the writes.
Remember to [flush](https://golang.org/pkg/bufio/#Writer.Flush) `buffered` before reading the hash.

A single 'server' can process 16 streams concurrently with 1 core (AVX-512) or 2 cores (AVX2).
In situations where it is likely that more than 16 streams are fully loaded, it may be beneficial
to use multiple servers.

The following chart compares multi-core performance of `crypto/md5` against the AVX2 and AVX512 code:

![md5-performance-overview](chart/Multi-core-MD5-Aggregated-Hashing-Performance.png)

Compared to `crypto/md5`, the AVX2 version is up to 4x faster:

```
$ benchcmp crypto-md5.txt avx2.txt
benchmark                     old MB/s     new MB/s     speedup
BenchmarkParallel/32KB-4      2229.22      7370.50      3.31x
BenchmarkParallel/64KB-4      2233.61      8248.46      3.69x
BenchmarkParallel/128KB-4     2235.43      8660.74      3.87x
BenchmarkParallel/256KB-4     2236.39      8863.87      3.96x
BenchmarkParallel/512KB-4     2238.05      8985.39      4.01x
BenchmarkParallel/1MB-4       2233.56      9042.62      4.05x
BenchmarkParallel/2MB-4       2224.11      9014.46      4.05x
BenchmarkParallel/4MB-4       2199.78      8993.61      4.09x
BenchmarkParallel/8MB-4       2182.48      8748.22      4.01x
```

Compared to `crypto/md5`, the AVX512 version is up to 8x faster (for larger block sizes):

```
$ benchcmp crypto-md5.txt avx512.txt
benchmark                     old MB/s     new MB/s     speedup
BenchmarkParallel/32KB-4      2229.22      11605.78     5.21x
BenchmarkParallel/64KB-4      2233.61      14329.65     6.42x
BenchmarkParallel/128KB-4     2235.43      16166.39     7.23x
BenchmarkParallel/256KB-4     2236.39      15570.09     6.96x
BenchmarkParallel/512KB-4     2238.05      16705.83     7.46x
BenchmarkParallel/1MB-4       2233.56      16941.95     7.59x
BenchmarkParallel/2MB-4       2224.11      17136.01     7.70x
BenchmarkParallel/4MB-4       2199.78      17218.61     7.83x
BenchmarkParallel/8MB-4       2182.48      17252.88     7.91x
```

These measurements were performed on an AWS EC2 instance of type `c5.xlarge` equipped with a Xeon Platinum 8124M CPU at 3.0 GHz.

If only one or two inputs are available, the scalar calculation method will be used for
optimal speed in these cases.

## Operation

To make operation as easy as possible, there is a “Server” coordinating everything. The server keeps track of individual hash states and updates them as new data comes in. This can be visualized as follows:

![server-architecture](chart/server-architecture.png)

The data is sent to the server from each hash input in blocks of up to 32KB per round. In our testing we found this to be the block size that yielded the best results.

Whenever there is data available, the server will collect data for up to 16 hashes and process all 16 lanes in parallel. This means that if 16 hashes have data available, all the lanes will be filled. However, since that may not be the case, the server will fill fewer lanes and do a round anyway. Lanes can also be partially filled if less than 32KB of data is written.

![server-lanes-example](chart/server-lanes-example.png)

In this example 4 lanes are fully filled and 2 lanes are partially filled. In this case the black areas will simply be masked out from the results and ignored. This is also why calculating a single hash on a server will not result in any speedup, and hash writes should be a multiple of 32KB for the best performance.

With AVX512, all 16 calculations are done on a single core; with AVX2, 2 cores are used if there is data for more than 8 lanes.
So for optimal usage there should be data available for all 16 hashes. It may be perfectly reasonable to use more than 16 concurrent hashes.


## Design & Tech

md5-simd has both an AVX2 (8-lane parallel) and an AVX512 (16-lane parallel) algorithm to accelerate the computation, with the following function definitions:
```
//go:noescape
func block8(state *uint32, base uintptr, bufs *int32, cache *byte, n int)

//go:noescape
func block16(state *uint32, ptrs *int64, mask uint64, n int)
```

The AVX2 version is based on the [md5vec](https://github.com/igneous-systems/md5vec) repository and is essentially unchanged except for minor (cosmetic) changes.

The AVX512 version is derived from the AVX2 version but adds some further optimizations and simplifications.

### Caching in upper ZMM registers

The AVX2 version passes in a `cache8` block of memory (about 0.5 KB) for temporary storage of intermediate results during `ROUND1`, which are subsequently used during `ROUND2` through to `ROUND4`.

Since AVX512 has double the number of registers (32 ZMM registers as compared to 16 YMM registers), it is possible to use the upper 16 ZMM registers for keeping the intermediate states on the CPU. As such, there is no need to pass in a corresponding `cache16` into the AVX512 block function.

### Direct loading using 64-bit pointers

The AVX2 version uses the `VPGATHERDD` instruction (for YMM) to do a parallel load of 8 lanes using (8 independent) 32-bit offsets. Since there is no control over how the 8 slices that are passed into the (Golang) `blockMd5` function are laid out in memory, it is not possible to derive a "base" address and corresponding offsets (all within 32-bits) for all 8 slices.

As such, the AVX2 version uses an interim buffer to collect the byte slices to be hashed from all 8 input slices, and passes this buffer along with (fixed) 32-bit offsets into the assembly code.

For the AVX512 version this interim buffer is not needed since the AVX512 code uses a pair of `VPGATHERQD` instructions to directly dereference 64-bit pointers (from a base register address that is initialized to zero).

Note that two load (gather) instructions are needed because the AVX512 version processes 16 lanes in parallel, requiring 16 times 64-bit = 1024 bits in total to be loaded. A simple `VALIGND` and `VPORD` are subsequently used to merge the lower and upper halves together into a single ZMM register (that contains 16 lanes of 32-bit DWORDS).

### Masking support

Because pointers are passed directly from the Golang slices, we need to protect against NULL pointers.
For this, a 16-bit mask is passed to the AVX512 assembly code; it is used during the `VPGATHERQD` instructions to mask out lanes that could otherwise result in segment violations.

### Minor optimizations

The `roll` macro (three instructions on AVX2) is no longer needed for AVX512 and is replaced by a single `VPROLD` instruction.

Also, several logical operations from the various ROUNDS of the AVX2 version could be combined into a single instruction using ternary logic (with the `VPTERNLOGD` instruction), resulting in a further simplification and speed-up.
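To see how a ternary-logic immediate encodes an arbitrary three-input boolean function, the helper below derives the imm8 value from a truth table. The `0xCA` result for the classic MD5 select function F(b,c,d) is illustrative only; the immediates in the assembly above (e.g. `$0x6C`) depend on the actual operand ordering used there:

```go
package main

import "fmt"

// ternImm computes a VPTERNLOGD-style immediate for a three-input
// boolean function f: bit ((a<<2)|(b<<1)|c) of the immediate holds
// the value of f(a,b,c) for that input combination.
func ternImm(f func(a, b, c uint32) uint32) uint8 {
	var imm uint8
	for a := uint32(0); a < 2; a++ {
		for b := uint32(0); b < 2; b++ {
			for c := uint32(0); c < 2; c++ {
				if f(a, b, c)&1 == 1 {
					imm |= 1 << ((a << 2) | (b << 1) | c)
				}
			}
		}
	}
	return imm
}

func main() {
	// MD5 round-1 function F(b,c,d) = (b AND c) OR (NOT b AND d),
	// i.e. "select c or d depending on b".
	sel := func(b, c, d uint32) uint32 { return (b & c) | (^b & d) }
	fmt.Printf("0x%02X\n", ternImm(sel)) // 0xCA with this operand order
}
```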

## Low level block function performance

The benchmark below shows the (single-threaded) maximum performance of the `block()` function for AVX2 (8 lanes) and AVX512 (16 lanes). The baseline single-core performance of the standard `crypto/md5` package is shown for comparison.

```
BenchmarkCryptoMd5-4    687.66 MB/s   0 B/op   0 allocs/op
BenchmarkBlock8-4      4144.80 MB/s   0 B/op   0 allocs/op
BenchmarkBlock16-4     8228.88 MB/s   0 B/op   0 allocs/op
```

## License

`md5-simd` is released under the Apache License v2.0. You can find the complete text in the file LICENSE.

## Contributing

Contributions are welcome, please send PRs for any enhancements.
\ No newline at end of file
diff --git a/vendor/github.com/minio/md5-simd/block16_amd64.s b/vendor/github.com/minio/md5-simd/block16_amd64.s
new file mode 100644
index 0000000..be0a43a
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/block16_amd64.s
@@ -0,0 +1,228 @@
1// Copyright (c) 2020 MinIO Inc. All rights reserved.
2// Use of this source code is governed by a license that can be
3// found in the LICENSE file.
4
5//+build !noasm,!appengine,gc
6
7// This is the AVX512 implementation of the MD5 block function (16-way parallel)
8
9#define prep(index) \
10 KMOVQ kmask, ktmp \
11 VPGATHERDD index*4(base)(ptrs*1), ktmp, mem
12
13#define ROUND1(a, b, c, d, index, const, shift) \
14 VPXORQ c, tmp, tmp \
15 VPADDD 64*const(consts), a, a \
16 VPADDD mem, a, a \
17 VPTERNLOGD $0x6C, b, d, tmp \
18 prep(index) \
19 VPADDD tmp, a, a \
20 VPROLD $shift, a, a \
21 VMOVAPD c, tmp \
22 VPADDD b, a, a
23
24#define ROUND1noload(a, b, c, d, const, shift) \
25 VPXORQ c, tmp, tmp \
26 VPADDD 64*const(consts), a, a \
27 VPADDD mem, a, a \
28 VPTERNLOGD $0x6C, b, d, tmp \
29 VPADDD tmp, a, a \
30 VPROLD $shift, a, a \
31 VMOVAPD c, tmp \
32 VPADDD b, a, a
33
34#define ROUND2(a, b, c, d, zreg, const, shift) \
35 VPADDD 64*const(consts), a, a \
36 VPADDD zreg, a, a \
37 VANDNPD c, tmp, tmp \
38 VPTERNLOGD $0xEC, b, tmp, tmp2 \
39 VMOVAPD c, tmp \
40 VPADDD tmp2, a, a \
41 VMOVAPD c, tmp2 \
42 VPROLD $shift, a, a \
43 VPADDD b, a, a
44
45#define ROUND3(a, b, c, d, zreg, const, shift) \
46 VPADDD 64*const(consts), a, a \
47 VPADDD zreg, a, a \
48 VPTERNLOGD $0x96, b, d, tmp \
49 VPADDD tmp, a, a \
50 VPROLD $shift, a, a \
51 VMOVAPD b, tmp \
52 VPADDD b, a, a
53
54#define ROUND4(a, b, c, d, zreg, const, shift) \
55 VPADDD 64*const(consts), a, a \
56 VPADDD zreg, a, a \
57 VPTERNLOGD $0x36, b, c, tmp \
58 VPADDD tmp, a, a \
59 VPROLD $shift, a, a \
60 VPXORQ c, ones, tmp \
61 VPADDD b, a, a
62
63TEXT ·block16(SB), 4, $0-40
64
65 MOVQ state+0(FP), BX
66 MOVQ base+8(FP), SI
67 MOVQ ptrs+16(FP), AX
68 KMOVQ mask+24(FP), K1
69 MOVQ n+32(FP), DX
70 MOVQ ·avx512md5consts+0(SB), DI
71
72#define a Z0
73#define b Z1
74#define c Z2
75#define d Z3
76
77#define sa Z4
78#define sb Z5
79#define sc Z6
80#define sd Z7
81
82#define tmp Z8
83#define tmp2 Z9
84#define ptrs Z10
85#define ones Z12
86#define mem Z15
87
88#define kmask K1
89#define ktmp K3
90
91// ----------------------------------------------------------
92// Registers Z16 through to Z31 are used for caching purposes
93// ----------------------------------------------------------
94
95#define dig BX
96#define count DX
97#define base SI
98#define consts DI
99
100 // load digest into state registers
101 VMOVUPD (dig), a
102 VMOVUPD 0x40(dig), b
103 VMOVUPD 0x80(dig), c
104 VMOVUPD 0xc0(dig), d
105
106 // load source pointers
107 VMOVUPD 0x00(AX), ptrs
108
109 MOVQ $-1, AX
110 VPBROADCASTQ AX, ones
111
112loop:
113 VMOVAPD a, sa
114 VMOVAPD b, sb
115 VMOVAPD c, sc
116 VMOVAPD d, sd
117
118 prep(0)
119 VMOVAPD d, tmp
120 VMOVAPD mem, Z16
121
122 ROUND1(a,b,c,d, 1,0x00, 7)
123 VMOVAPD mem, Z17
124 ROUND1(d,a,b,c, 2,0x01,12)
125 VMOVAPD mem, Z18
126 ROUND1(c,d,a,b, 3,0x02,17)
127 VMOVAPD mem, Z19
128 ROUND1(b,c,d,a, 4,0x03,22)
129 VMOVAPD mem, Z20
130 ROUND1(a,b,c,d, 5,0x04, 7)
131 VMOVAPD mem, Z21
132 ROUND1(d,a,b,c, 6,0x05,12)
133 VMOVAPD mem, Z22
134 ROUND1(c,d,a,b, 7,0x06,17)
135 VMOVAPD mem, Z23
136 ROUND1(b,c,d,a, 8,0x07,22)
137 VMOVAPD mem, Z24
138 ROUND1(a,b,c,d, 9,0x08, 7)
139 VMOVAPD mem, Z25
140 ROUND1(d,a,b,c,10,0x09,12)
141 VMOVAPD mem, Z26
142 ROUND1(c,d,a,b,11,0x0a,17)
143 VMOVAPD mem, Z27
144 ROUND1(b,c,d,a,12,0x0b,22)
145 VMOVAPD mem, Z28
146 ROUND1(a,b,c,d,13,0x0c, 7)
147 VMOVAPD mem, Z29
148 ROUND1(d,a,b,c,14,0x0d,12)
149 VMOVAPD mem, Z30
150 ROUND1(c,d,a,b,15,0x0e,17)
151 VMOVAPD mem, Z31
152
153 ROUND1noload(b,c,d,a, 0x0f,22)
154
155 VMOVAPD d, tmp
156 VMOVAPD d, tmp2
157
158 ROUND2(a,b,c,d, Z17,0x10, 5)
159 ROUND2(d,a,b,c, Z22,0x11, 9)
160 ROUND2(c,d,a,b, Z27,0x12,14)
161 ROUND2(b,c,d,a, Z16,0x13,20)
162 ROUND2(a,b,c,d, Z21,0x14, 5)
163 ROUND2(d,a,b,c, Z26,0x15, 9)
164 ROUND2(c,d,a,b, Z31,0x16,14)
165 ROUND2(b,c,d,a, Z20,0x17,20)
166 ROUND2(a,b,c,d, Z25,0x18, 5)
167 ROUND2(d,a,b,c, Z30,0x19, 9)
168 ROUND2(c,d,a,b, Z19,0x1a,14)
169 ROUND2(b,c,d,a, Z24,0x1b,20)
170 ROUND2(a,b,c,d, Z29,0x1c, 5)
171 ROUND2(d,a,b,c, Z18,0x1d, 9)
172 ROUND2(c,d,a,b, Z23,0x1e,14)
173 ROUND2(b,c,d,a, Z28,0x1f,20)
174
175 VMOVAPD c, tmp
176
177 ROUND3(a,b,c,d, Z21,0x20, 4)
178 ROUND3(d,a,b,c, Z24,0x21,11)
179 ROUND3(c,d,a,b, Z27,0x22,16)
180 ROUND3(b,c,d,a, Z30,0x23,23)
181 ROUND3(a,b,c,d, Z17,0x24, 4)
182 ROUND3(d,a,b,c, Z20,0x25,11)
183 ROUND3(c,d,a,b, Z23,0x26,16)
184 ROUND3(b,c,d,a, Z26,0x27,23)
185 ROUND3(a,b,c,d, Z29,0x28, 4)
186 ROUND3(d,a,b,c, Z16,0x29,11)
187 ROUND3(c,d,a,b, Z19,0x2a,16)
188 ROUND3(b,c,d,a, Z22,0x2b,23)
189 ROUND3(a,b,c,d, Z25,0x2c, 4)
190 ROUND3(d,a,b,c, Z28,0x2d,11)
191 ROUND3(c,d,a,b, Z31,0x2e,16)
192 ROUND3(b,c,d,a, Z18,0x2f,23)
193
194 VPXORQ d, ones, tmp
195
196 ROUND4(a,b,c,d, Z16,0x30, 6)
197 ROUND4(d,a,b,c, Z23,0x31,10)
198 ROUND4(c,d,a,b, Z30,0x32,15)
199 ROUND4(b,c,d,a, Z21,0x33,21)
200 ROUND4(a,b,c,d, Z28,0x34, 6)
201 ROUND4(d,a,b,c, Z19,0x35,10)
202 ROUND4(c,d,a,b, Z26,0x36,15)
203 ROUND4(b,c,d,a, Z17,0x37,21)
204 ROUND4(a,b,c,d, Z24,0x38, 6)
205 ROUND4(d,a,b,c, Z31,0x39,10)
206 ROUND4(c,d,a,b, Z22,0x3a,15)
207 ROUND4(b,c,d,a, Z29,0x3b,21)
208 ROUND4(a,b,c,d, Z20,0x3c, 6)
209 ROUND4(d,a,b,c, Z27,0x3d,10)
210 ROUND4(c,d,a,b, Z18,0x3e,15)
211 ROUND4(b,c,d,a, Z25,0x3f,21)
212
213 VPADDD sa, a, a
214 VPADDD sb, b, b
215 VPADDD sc, c, c
216 VPADDD sd, d, d
217
218 LEAQ 64(base), base
219 SUBQ $64, count
220 JNE loop
221
222 VMOVUPD a, (dig)
223 VMOVUPD b, 0x40(dig)
224 VMOVUPD c, 0x80(dig)
225 VMOVUPD d, 0xc0(dig)
226
227 VZEROUPPER
228 RET
diff --git a/vendor/github.com/minio/md5-simd/block8_amd64.s b/vendor/github.com/minio/md5-simd/block8_amd64.s
new file mode 100644
index 0000000..f57db17
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/block8_amd64.s
@@ -0,0 +1,281 @@
1//+build !noasm,!appengine,gc
2
3// Copyright (c) 2018 Igneous Systems
4// MIT License
5//
6// Permission is hereby granted, free of charge, to any person obtaining a copy
7// of this software and associated documentation files (the "Software"), to deal
8// in the Software without restriction, including without limitation the rights
9// to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
10// copies of the Software, and to permit persons to whom the Software is
11// furnished to do so, subject to the following conditions:
12//
13// The above copyright notice and this permission notice shall be included in all
14// copies or substantial portions of the Software.
15//
16// THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
17// IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
18// FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
19// AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
20// LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
21// OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
22// SOFTWARE.
23
24// Copyright (c) 2020 MinIO Inc. All rights reserved.
25// Use of this source code is governed by a license that can be
26// found in the LICENSE file.
27
28// This is the AVX2 implementation of the MD5 block function (8-way parallel)
29
30// block8(state *uint64, base uintptr, bufs *int32, cache *byte, n int)
31TEXT ·block8(SB), 4, $0-40
32 MOVQ state+0(FP), BX
33 MOVQ base+8(FP), SI
34 MOVQ bufs+16(FP), AX
35 MOVQ cache+24(FP), CX
36 MOVQ n+32(FP), DX
37 MOVQ ·avx256md5consts+0(SB), DI
38
39 // Align cache (which is stack allocated by the compiler)
40 // to a 256 bit boundary (ymm register alignment)
41 // The cache8 type is deliberately oversized to permit this.
42 ADDQ $31, CX
43 ANDB $-32, CL
44
45#define a Y0
46#define b Y1
47#define c Y2
48#define d Y3
49
50#define sa Y4
51#define sb Y5
52#define sc Y6
53#define sd Y7
54
55#define tmp Y8
56#define tmp2 Y9
57
58#define mask Y10
59#define off Y11
60
61#define ones Y12
62
63#define rtmp1 Y13
64#define rtmp2 Y14
65
66#define mem Y15
67
68#define dig BX
69#define cache CX
70#define count DX
71#define base SI
72#define consts DI
73
74#define prepmask \
75 VPXOR mask, mask, mask \
76 VPCMPGTD mask, off, mask
77
78#define prep(index) \
79 VMOVAPD mask, rtmp2 \
80 VPGATHERDD rtmp2, index*4(base)(off*1), mem
81
82#define load(index) \
83 VMOVAPD index*32(cache), mem
84
85#define store(index) \
86 VMOVAPD mem, index*32(cache)
87
88#define roll(shift, a) \
89 VPSLLD $shift, a, rtmp1 \
90 VPSRLD $32-shift, a, a \
91 VPOR rtmp1, a, a
92
93#define ROUND1(a, b, c, d, index, const, shift) \
94 VPXOR c, tmp, tmp \
95 VPADDD 32*const(consts), a, a \
96 VPADDD mem, a, a \
97 VPAND b, tmp, tmp \
98 VPXOR d, tmp, tmp \
99 prep(index) \
100 VPADDD tmp, a, a \
101 roll(shift,a) \
102 VMOVAPD c, tmp \
103 VPADDD b, a, a
104
105#define ROUND1load(a, b, c, d, index, const, shift) \
106 VXORPD c, tmp, tmp \
107 VPADDD 32*const(consts), a, a \
108 VPADDD mem, a, a \
109 VPAND b, tmp, tmp \
110 VPXOR d, tmp, tmp \
111 load(index) \
112 VPADDD tmp, a, a \
113 roll(shift,a) \
114 VMOVAPD c, tmp \
115 VPADDD b, a, a
116
117#define ROUND2(a, b, c, d, index, const, shift) \
118 VPADDD 32*const(consts), a, a \
119 VPADDD mem, a, a \
120 VPAND b, tmp2, tmp2 \
121 VANDNPD c, tmp, tmp \
122 load(index) \
123 VPOR tmp, tmp2, tmp2 \
124 VMOVAPD c, tmp \
125 VPADDD tmp2, a, a \
126 VMOVAPD c, tmp2 \
127 roll(shift,a) \
128 VPADDD b, a, a
129
130#define ROUND3(a, b, c, d, index, const, shift) \
131 VPADDD 32*const(consts), a, a \
132 VPADDD mem, a, a \
133 load(index) \
134 VPXOR d, tmp, tmp \
135 VPXOR b, tmp, tmp \
136 VPADDD tmp, a, a \
137 roll(shift,a) \
138 VMOVAPD b, tmp \
139 VPADDD b, a, a
140
141#define ROUND4(a, b, c, d, index, const, shift) \
142 VPADDD 32*const(consts), a, a \
143 VPADDD mem, a, a \
144 VPOR b, tmp, tmp \
145 VPXOR c, tmp, tmp \
146 VPADDD tmp, a, a \
147 load(index) \
148 roll(shift,a) \
149 VPXOR c, ones, tmp \
150 VPADDD b, a, a
151
152 // load digest into state registers
153 VMOVUPD (dig), a
154 VMOVUPD 32(dig), b
155 VMOVUPD 64(dig), c
156 VMOVUPD 96(dig), d
157
158 // load source buffer offsets
159 VMOVUPD (AX), off
160
161 prepmask
162 VPCMPEQD ones, ones, ones
163
164loop:
165 VMOVAPD a, sa
166 VMOVAPD b, sb
167 VMOVAPD c, sc
168 VMOVAPD d, sd
169
170 prep(0)
171 VMOVAPD d, tmp
172 store(0)
173
174 ROUND1(a,b,c,d, 1,0x00, 7)
175 store(1)
176 ROUND1(d,a,b,c, 2,0x01,12)
177 store(2)
178 ROUND1(c,d,a,b, 3,0x02,17)
179 store(3)
180 ROUND1(b,c,d,a, 4,0x03,22)
181 store(4)
182 ROUND1(a,b,c,d, 5,0x04, 7)
183 store(5)
184 ROUND1(d,a,b,c, 6,0x05,12)
185 store(6)
186 ROUND1(c,d,a,b, 7,0x06,17)
187 store(7)
188 ROUND1(b,c,d,a, 8,0x07,22)
189 store(8)
190 ROUND1(a,b,c,d, 9,0x08, 7)
191 store(9)
192 ROUND1(d,a,b,c,10,0x09,12)
193 store(10)
194 ROUND1(c,d,a,b,11,0x0a,17)
195 store(11)
196 ROUND1(b,c,d,a,12,0x0b,22)
197 store(12)
198 ROUND1(a,b,c,d,13,0x0c, 7)
199 store(13)
200 ROUND1(d,a,b,c,14,0x0d,12)
201 store(14)
202 ROUND1(c,d,a,b,15,0x0e,17)
203 store(15)
204 ROUND1load(b,c,d,a, 1,0x0f,22)
205
206 VMOVAPD d, tmp
207 VMOVAPD d, tmp2
208
209 ROUND2(a,b,c,d, 6,0x10, 5)
210 ROUND2(d,a,b,c,11,0x11, 9)
211 ROUND2(c,d,a,b, 0,0x12,14)
212 ROUND2(b,c,d,a, 5,0x13,20)
213 ROUND2(a,b,c,d,10,0x14, 5)
214 ROUND2(d,a,b,c,15,0x15, 9)
215 ROUND2(c,d,a,b, 4,0x16,14)
216 ROUND2(b,c,d,a, 9,0x17,20)
217 ROUND2(a,b,c,d,14,0x18, 5)
218 ROUND2(d,a,b,c, 3,0x19, 9)
219 ROUND2(c,d,a,b, 8,0x1a,14)
220 ROUND2(b,c,d,a,13,0x1b,20)
221 ROUND2(a,b,c,d, 2,0x1c, 5)
222 ROUND2(d,a,b,c, 7,0x1d, 9)
223 ROUND2(c,d,a,b,12,0x1e,14)
224 ROUND2(b,c,d,a, 0,0x1f,20)
225
226 load(5)
227 VMOVAPD c, tmp
228
229 ROUND3(a,b,c,d, 8,0x20, 4)
230 ROUND3(d,a,b,c,11,0x21,11)
231 ROUND3(c,d,a,b,14,0x22,16)
232 ROUND3(b,c,d,a, 1,0x23,23)
233 ROUND3(a,b,c,d, 4,0x24, 4)
234 ROUND3(d,a,b,c, 7,0x25,11)
235 ROUND3(c,d,a,b,10,0x26,16)
236 ROUND3(b,c,d,a,13,0x27,23)
237 ROUND3(a,b,c,d, 0,0x28, 4)
238 ROUND3(d,a,b,c, 3,0x29,11)
239 ROUND3(c,d,a,b, 6,0x2a,16)
240 ROUND3(b,c,d,a, 9,0x2b,23)
241 ROUND3(a,b,c,d,12,0x2c, 4)
242 ROUND3(d,a,b,c,15,0x2d,11)
243 ROUND3(c,d,a,b, 2,0x2e,16)
244 ROUND3(b,c,d,a, 0,0x2f,23)
245
246 load(0)
247 VPXOR d, ones, tmp
248
249 ROUND4(a,b,c,d, 7,0x30, 6)
250 ROUND4(d,a,b,c,14,0x31,10)
251 ROUND4(c,d,a,b, 5,0x32,15)
252 ROUND4(b,c,d,a,12,0x33,21)
253 ROUND4(a,b,c,d, 3,0x34, 6)
254 ROUND4(d,a,b,c,10,0x35,10)
255 ROUND4(c,d,a,b, 1,0x36,15)
256 ROUND4(b,c,d,a, 8,0x37,21)
257 ROUND4(a,b,c,d,15,0x38, 6)
258 ROUND4(d,a,b,c, 6,0x39,10)
259 ROUND4(c,d,a,b,13,0x3a,15)
260 ROUND4(b,c,d,a, 4,0x3b,21)
261 ROUND4(a,b,c,d,11,0x3c, 6)
262 ROUND4(d,a,b,c, 2,0x3d,10)
263 ROUND4(c,d,a,b, 9,0x3e,15)
264 ROUND4(b,c,d,a, 0,0x3f,21)
265
266 VPADDD sa, a, a
267 VPADDD sb, b, b
268 VPADDD sc, c, c
269 VPADDD sd, d, d
270
271 LEAQ 64(base), base
272 SUBQ $64, count
273 JNE loop
274
275 VMOVUPD a, (dig)
276 VMOVUPD b, 32(dig)
277 VMOVUPD c, 64(dig)
278 VMOVUPD d, 96(dig)
279
280 VZEROUPPER
281 RET
diff --git a/vendor/github.com/minio/md5-simd/block_amd64.go b/vendor/github.com/minio/md5-simd/block_amd64.go
new file mode 100644
index 0000000..16edda2
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/block_amd64.go
@@ -0,0 +1,210 @@
1//+build !noasm,!appengine,gc
2
3// Copyright (c) 2020 MinIO Inc. All rights reserved.
4// Use of this source code is governed by a license that can be
5// found in the LICENSE file.
6
7package md5simd
8
9import (
10 "fmt"
11 "math"
12 "unsafe"
13
14 "github.com/klauspost/cpuid/v2"
15)
16
17var hasAVX512 bool
18
19func init() {
20 // VANDNPD requires AVX512DQ. Technically it could be VPTERNLOGQ which is AVX512F.
21 hasAVX512 = cpuid.CPU.Supports(cpuid.AVX512F, cpuid.AVX512DQ)
22}
23
24//go:noescape
25func block8(state *uint32, base uintptr, bufs *int32, cache *byte, n int)
26
27//go:noescape
28func block16(state *uint32, base uintptr, ptrs *int32, mask uint64, n int)
29
30// 8-way 4x uint32 digests in 4 ymm registers
31// (ymm0, ymm1, ymm2, ymm3)
32type digest8 struct {
33 v0, v1, v2, v3 [8]uint32
34}
35
36// Stack cache for 8x64 byte md5.BlockSize bytes.
37// Must be 32-byte aligned, so allocate 512+32 and
38// align upwards at runtime.
39type cache8 [512 + 32]byte
40
41// MD5 magic numbers for one lane of hashing; inflated
42// 8x below at init time.
43var md5consts = [64]uint32{
44 0xd76aa478, 0xe8c7b756, 0x242070db, 0xc1bdceee,
45 0xf57c0faf, 0x4787c62a, 0xa8304613, 0xfd469501,
46 0x698098d8, 0x8b44f7af, 0xffff5bb1, 0x895cd7be,
47 0x6b901122, 0xfd987193, 0xa679438e, 0x49b40821,
48 0xf61e2562, 0xc040b340, 0x265e5a51, 0xe9b6c7aa,
49 0xd62f105d, 0x02441453, 0xd8a1e681, 0xe7d3fbc8,
50 0x21e1cde6, 0xc33707d6, 0xf4d50d87, 0x455a14ed,
51 0xa9e3e905, 0xfcefa3f8, 0x676f02d9, 0x8d2a4c8a,
52 0xfffa3942, 0x8771f681, 0x6d9d6122, 0xfde5380c,
53 0xa4beea44, 0x4bdecfa9, 0xf6bb4b60, 0xbebfbc70,
54 0x289b7ec6, 0xeaa127fa, 0xd4ef3085, 0x04881d05,
55 0xd9d4d039, 0xe6db99e5, 0x1fa27cf8, 0xc4ac5665,
56 0xf4292244, 0x432aff97, 0xab9423a7, 0xfc93a039,
57 0x655b59c3, 0x8f0ccc92, 0xffeff47d, 0x85845dd1,
58 0x6fa87e4f, 0xfe2ce6e0, 0xa3014314, 0x4e0811a1,
59 0xf7537e82, 0xbd3af235, 0x2ad7d2bb, 0xeb86d391,
60}
61
62// inflate the consts 8-way for 8x md5 (256 bit ymm registers)
63var avx256md5consts = func(c []uint32) []uint32 {
64 inf := make([]uint32, 8*len(c))
65 for i := range c {
66 for j := 0; j < 8; j++ {
67 inf[(i*8)+j] = c[i]
68 }
69 }
70 return inf
71}(md5consts[:])
72
73// 16-way 4x uint32 digests in 4 zmm registers
74type digest16 struct {
75 v0, v1, v2, v3 [16]uint32
76}
77
78// inflate the consts 16-way for 16x md5 (512 bit zmm registers)
79var avx512md5consts = func(c []uint32) []uint32 {
80 inf := make([]uint32, 16*len(c))
81 for i := range c {
82 for j := 0; j < 16; j++ {
83 inf[(i*16)+j] = c[i]
84 }
85 }
86 return inf
87}(md5consts[:])
88
89// Interface function to assembly code
90func (s *md5Server) blockMd5_x16(d *digest16, input [16][]byte, half bool) {
91 if hasAVX512 {
92 blockMd5_avx512(d, input, s.allBufs, &s.maskRounds16)
93 return
94 }
95
96 // Preparing data using copy is slower since copies aren't inlined.
97
98 // Calculate on this goroutine
99 if half {
100 for i := range s.i8[0][:] {
101 s.i8[0][i] = input[i]
102 }
103 for i := range s.d8a.v0[:] {
104 s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i] = d.v0[i], d.v1[i], d.v2[i], d.v3[i]
105 }
106 blockMd5_avx2(&s.d8a, s.i8[0], s.allBufs, &s.maskRounds8a)
107 for i := range s.d8a.v0[:] {
108 d.v0[i], d.v1[i], d.v2[i], d.v3[i] = s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i]
109 }
110 return
111 }
112
113 for i := range s.i8[0][:] {
114 s.i8[0][i], s.i8[1][i] = input[i], input[8+i]
115 }
116
117 for i := range s.d8a.v0[:] {
118 j := (i + 8) & 15
119 s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i] = d.v0[i], d.v1[i], d.v2[i], d.v3[i]
120 s.d8b.v0[i], s.d8b.v1[i], s.d8b.v2[i], s.d8b.v3[i] = d.v0[j], d.v1[j], d.v2[j], d.v3[j]
121 }
122
 123 // Benchmarks appear to be slightly faster when spinning up 2 goroutines instead
 124 // of using the current goroutine for one of the blocks.
125 s.wg.Add(2)
126 go func() { blockMd5_avx2(&s.d8a, s.i8[0], s.allBufs, &s.maskRounds8a); s.wg.Done() }()
127 go func() { blockMd5_avx2(&s.d8b, s.i8[1], s.allBufs, &s.maskRounds8b); s.wg.Done() }()
128 s.wg.Wait()
129 for i := range s.d8a.v0[:] {
130 d.v0[i], d.v1[i], d.v2[i], d.v3[i] = s.d8a.v0[i], s.d8a.v1[i], s.d8a.v2[i], s.d8a.v3[i]
131 }
132 for i := range s.d8b.v0[:] {
133 j := (i + 8) & 15
134 d.v0[j], d.v1[j], d.v2[j], d.v3[j] = s.d8b.v0[i], s.d8b.v1[i], s.d8b.v2[i], s.d8b.v3[i]
135 }
136}
137
138// Interface function to AVX512 assembly code
139func blockMd5_avx512(s *digest16, input [16][]byte, base []byte, maskRounds *[16]maskRounds) {
140 baseMin := uint64(uintptr(unsafe.Pointer(&(base[0]))))
141 ptrs := [16]int32{}
142
143 for i := range ptrs {
144 if len(input[i]) > 0 {
145 if len(input[i]) > internalBlockSize {
146 panic(fmt.Sprintf("Sanity check fails for lane %d: maximum input length cannot exceed internalBlockSize", i))
147 }
148
149 off := uint64(uintptr(unsafe.Pointer(&(input[i][0])))) - baseMin
150 if off > math.MaxUint32 {
151 panic(fmt.Sprintf("invalid buffer sent with offset %x", off))
152 }
153 ptrs[i] = int32(off)
154 }
155 }
156
157 sdup := *s // create copy of initial states to receive intermediate updates
158
159 rounds := generateMaskAndRounds16(input, maskRounds)
160
161 for r := 0; r < rounds; r++ {
162 m := maskRounds[r]
163
164 block16(&sdup.v0[0], uintptr(baseMin), &ptrs[0], m.mask, int(64*m.rounds))
165
166 for j := 0; j < len(ptrs); j++ {
167 ptrs[j] += int32(64 * m.rounds) // update pointers for next round
168 if m.mask&(1<<j) != 0 { // update digest if still masked as active
169 (*s).v0[j], (*s).v1[j], (*s).v2[j], (*s).v3[j] = sdup.v0[j], sdup.v1[j], sdup.v2[j], sdup.v3[j]
170 }
171 }
172 }
173}
174
175// Interface function to AVX2 assembly code
176func blockMd5_avx2(s *digest8, input [8][]byte, base []byte, maskRounds *[8]maskRounds) {
177 baseMin := uint64(uintptr(unsafe.Pointer(&(base[0])))) - 4
178 ptrs := [8]int32{}
179
180 for i := range ptrs {
181 if len(input[i]) > 0 {
182 if len(input[i]) > internalBlockSize {
183 panic(fmt.Sprintf("Sanity check fails for lane %d: maximum input length cannot exceed internalBlockSize", i))
184 }
185
186 off := uint64(uintptr(unsafe.Pointer(&(input[i][0])))) - baseMin
187 if off > math.MaxUint32 {
188 panic(fmt.Sprintf("invalid buffer sent with offset %x", off))
189 }
190 ptrs[i] = int32(off)
191 }
192 }
193
194 sdup := *s // create copy of initial states to receive intermediate updates
195
196 rounds := generateMaskAndRounds8(input, maskRounds)
197
198 for r := 0; r < rounds; r++ {
199 m := maskRounds[r]
200 var cache cache8 // stack storage for block8 tmp state
201 block8(&sdup.v0[0], uintptr(baseMin), &ptrs[0], &cache[0], int(64*m.rounds))
202
203 for j := 0; j < len(ptrs); j++ {
204 ptrs[j] += int32(64 * m.rounds) // update pointers for next round
205 if m.mask&(1<<j) != 0 { // update digest if still masked as active
206 (*s).v0[j], (*s).v1[j], (*s).v2[j], (*s).v3[j] = sdup.v0[j], sdup.v1[j], sdup.v2[j], sdup.v3[j]
207 }
208 }
209 }
210}
diff --git a/vendor/github.com/minio/md5-simd/md5-digest_amd64.go b/vendor/github.com/minio/md5-simd/md5-digest_amd64.go
new file mode 100644
index 0000000..5ea23a4
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/md5-digest_amd64.go
@@ -0,0 +1,188 @@
1//+build !noasm,!appengine,gc
2
3// Copyright (c) 2020 MinIO Inc. All rights reserved.
4// Use of this source code is governed by a license that can be
5// found in the LICENSE file.
6
7package md5simd
8
9import (
10 "encoding/binary"
11 "errors"
12 "fmt"
13 "sync"
14 "sync/atomic"
15)
16
17// md5Digest - Type for computing MD5 using either AVX2 or AVX512
18type md5Digest struct {
19 uid uint64
20 blocksCh chan blockInput
21 cycleServer chan uint64
22 x [BlockSize]byte
23 nx int
24 len uint64
25 buffers <-chan []byte
26}
27
28// NewHash - initialize instance for Md5 implementation.
29func (s *md5Server) NewHash() Hasher {
30 uid := atomic.AddUint64(&s.uidCounter, 1)
31 blockCh := make(chan blockInput, buffersPerLane)
32 s.newInput <- newClient{
33 uid: uid,
34 input: blockCh,
35 }
36 return &md5Digest{
37 uid: uid,
38 buffers: s.buffers,
39 blocksCh: blockCh,
40 cycleServer: s.cycle,
41 }
42}
43
44// Size - Return size of checksum
45func (d *md5Digest) Size() int { return Size }
46
47// BlockSize - Return blocksize of checksum
48func (d md5Digest) BlockSize() int { return BlockSize }
49
50func (d *md5Digest) Reset() {
51 if d.blocksCh == nil {
52 panic("reset after close")
53 }
54 d.nx = 0
55 d.len = 0
56 d.sendBlock(blockInput{uid: d.uid, reset: true}, false)
57}
58
59// write to digest
60func (d *md5Digest) Write(p []byte) (nn int, err error) {
61 if d.blocksCh == nil {
62 return 0, errors.New("md5Digest closed")
63 }
64
65 // break input into chunks of maximum internalBlockSize size
66 for {
67 l := len(p)
68 if l > internalBlockSize {
69 l = internalBlockSize
70 }
71 nnn, err := d.write(p[:l])
72 if err != nil {
73 return nn, err
74 }
75 nn += nnn
76 p = p[l:]
77
78 if len(p) == 0 {
79 break
80 }
81
82 }
83 return
84}
85
86func (d *md5Digest) write(p []byte) (nn int, err error) {
87
88 nn = len(p)
89 d.len += uint64(nn)
90 if d.nx > 0 {
91 n := copy(d.x[d.nx:], p)
92 d.nx += n
93 if d.nx == BlockSize {
94 // Create a copy of the overflow buffer in order to send it async over the channel
95 // (since we will modify the overflow buffer down below with any access beyond multiples of 64)
96 tmp := <-d.buffers
97 tmp = tmp[:BlockSize]
98 copy(tmp, d.x[:])
99 d.sendBlock(blockInput{uid: d.uid, msg: tmp}, len(p)-n < BlockSize)
100 d.nx = 0
101 }
102 p = p[n:]
103 }
104 if len(p) >= BlockSize {
105 n := len(p) &^ (BlockSize - 1)
106 buf := <-d.buffers
107 buf = buf[:n]
108 copy(buf, p)
109 d.sendBlock(blockInput{uid: d.uid, msg: buf}, len(p)-n < BlockSize)
110 p = p[n:]
111 }
112 if len(p) > 0 {
113 d.nx = copy(d.x[:], p)
114 }
115 return
116}
117
118func (d *md5Digest) Close() {
119 if d.blocksCh != nil {
120 close(d.blocksCh)
121 d.blocksCh = nil
122 }
123}
124
125var sumChPool sync.Pool
126
127func init() {
128 sumChPool.New = func() interface{} {
129 return make(chan sumResult, 1)
130 }
131}
132
133// Sum - Return MD5 sum in bytes
134func (d *md5Digest) Sum(in []byte) (result []byte) {
135 if d.blocksCh == nil {
136 panic("sum after close")
137 }
138
139 trail := <-d.buffers
140 trail = append(trail[:0], d.x[:d.nx]...)
141
142 length := d.len
143 // Padding. Add a 1 bit and 0 bits until 56 bytes mod 64.
144 var tmp [64]byte
145 tmp[0] = 0x80
146 if length%64 < 56 {
147 trail = append(trail, tmp[0:56-length%64]...)
148 } else {
149 trail = append(trail, tmp[0:64+56-length%64]...)
150 }
151
152 // Length in bits.
153 length <<= 3
154 binary.LittleEndian.PutUint64(tmp[:], length) // append length in bits
155
156 trail = append(trail, tmp[0:8]...)
157 if len(trail)%BlockSize != 0 {
158 panic(fmt.Errorf("internal error: sum block was not aligned. len=%d, nx=%d", len(trail), d.nx))
159 }
160 sumCh := sumChPool.Get().(chan sumResult)
161 d.sendBlock(blockInput{uid: d.uid, msg: trail, sumCh: sumCh}, true)
162
163 sum := <-sumCh
164 sumChPool.Put(sumCh)
165
166 return append(in, sum.digest[:]...)
167}
168
169// sendBlock will send a block for processing.
170// If cycle is true we will block on cycle, otherwise we will only block
171// if the block channel is full.
172func (d *md5Digest) sendBlock(bi blockInput, cycle bool) {
173 if cycle {
174 select {
175 case d.blocksCh <- bi:
176 d.cycleServer <- d.uid
177 }
178 return
179 }
180 // Only block on cycle if we filled the buffer
181 select {
182 case d.blocksCh <- bi:
183 return
184 default:
185 d.cycleServer <- d.uid
186 d.blocksCh <- bi
187 }
188}
diff --git a/vendor/github.com/minio/md5-simd/md5-server_amd64.go b/vendor/github.com/minio/md5-simd/md5-server_amd64.go
new file mode 100644
index 0000000..94f741c
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/md5-server_amd64.go
@@ -0,0 +1,397 @@
1//+build !noasm,!appengine,gc
2
3// Copyright (c) 2020 MinIO Inc. All rights reserved.
4// Use of this source code is governed by a license that can be
5// found in the LICENSE file.
6
7package md5simd
8
9import (
10 "encoding/binary"
11 "fmt"
12 "runtime"
13 "sync"
14
15 "github.com/klauspost/cpuid/v2"
16)
17
18// MD5 initialization constants
19const (
20 // Lanes is the number of concurrently calculated hashes.
21 Lanes = 16
22
23 init0 = 0x67452301
24 init1 = 0xefcdab89
25 init2 = 0x98badcfe
26 init3 = 0x10325476
27
28 // Use scalar routine when below this many lanes
29 useScalarBelow = 3
30)
31
 32 // md5ServerUID - Does not start at 0 but at the next multiple of 16 so as to be able to
 33 // differentiate it from the default initialisation value of 0
34const md5ServerUID = Lanes
35
36const buffersPerLane = 3
37
38// Message to send across input channel
39type blockInput struct {
40 uid uint64
41 msg []byte
42 sumCh chan sumResult
43 reset bool
44}
45
46type sumResult struct {
47 digest [Size]byte
48}
49
50type lanesInfo [Lanes]blockInput
51
52// md5Server - Type to implement parallel handling of MD5 invocations
53type md5Server struct {
54 uidCounter uint64
55 cycle chan uint64 // client with uid has update.
56 newInput chan newClient // Add new client.
57 digests map[uint64][Size]byte // Map of uids to (interim) digest results
58 maskRounds16 [16]maskRounds // Pre-allocated static array for max 16 rounds
59 maskRounds8a [8]maskRounds // Pre-allocated static array for max 8 rounds (1st AVX2 core)
60 maskRounds8b [8]maskRounds // Pre-allocated static array for max 8 rounds (2nd AVX2 core)
61 allBufs []byte // Preallocated buffer.
62 buffers chan []byte // Preallocated buffers, sliced from allBufs.
63
64 i8 [2][8][]byte // avx2 temporary vars
65 d8a, d8b digest8
66 wg sync.WaitGroup
67}
68
69// NewServer - Create new object for parallel processing handling
70func NewServer() Server {
71 if !cpuid.CPU.Supports(cpuid.AVX2) {
72 return &fallbackServer{}
73 }
74 md5srv := &md5Server{}
75 md5srv.digests = make(map[uint64][Size]byte)
76 md5srv.newInput = make(chan newClient, Lanes)
77 md5srv.cycle = make(chan uint64, Lanes*10)
78 md5srv.uidCounter = md5ServerUID - 1
79 md5srv.allBufs = make([]byte, 32+buffersPerLane*Lanes*internalBlockSize)
80 md5srv.buffers = make(chan []byte, buffersPerLane*Lanes)
81 // Fill buffers.
82 for i := 0; i < buffersPerLane*Lanes; i++ {
83 s := 32 + i*internalBlockSize
84 md5srv.buffers <- md5srv.allBufs[s : s+internalBlockSize : s+internalBlockSize]
85 }
86
87 // Start a single thread for reading from the input channel
88 go md5srv.process(md5srv.newInput)
89 return md5srv
90}
91
92type newClient struct {
93 uid uint64
94 input chan blockInput
95}
96
97// process - Sole handler for reading from the input channel.
98func (s *md5Server) process(newClients chan newClient) {
99 // To fill up as many lanes as possible:
100 //
101 // 1. Wait for a cycle id.
102 // 2. If not already in a lane, add, otherwise leave on channel
103 // 3. Start timer
104 // 4. Check if lanes is full, if so, goto 10 (process).
105 // 5. If timeout, goto 10.
106 // 6. Wait for new id (goto 2) or timeout (goto 10).
107 // 10. Process.
108 // 11. Check all input if there is already input, if so add to lanes.
109 // 12. Goto 1
110
111 // lanes contains the lanes.
112 var lanes lanesInfo
113 // lanesFilled contains the number of filled lanes for current cycle.
114 var lanesFilled int
115 // clients contains active clients
116 var clients = make(map[uint64]chan blockInput, Lanes)
117
118 addToLane := func(uid uint64) {
119 cl, ok := clients[uid]
120 if !ok {
121 // Unknown client. Maybe it was already removed.
122 return
123 }
124 // Check if we already have it.
125 for _, lane := range lanes[:lanesFilled] {
126 if lane.uid == uid {
127 return
128 }
129 }
130 // Continue until we get a block or there is nothing on channel
131 for {
132 select {
133 case block, ok := <-cl:
134 if !ok {
135 // Client disconnected
136 delete(clients, block.uid)
137 return
138 }
139 if block.uid != uid {
140 panic(fmt.Errorf("uid mismatch, %d (block) != %d (client)", block.uid, uid))
141 }
142 // If reset message, reset and we're done
143 if block.reset {
144 delete(s.digests, uid)
145 continue
146 }
147
148 // If requesting sum, we will need to maintain state.
149 if block.sumCh != nil {
150 var dig digest
151 d, ok := s.digests[uid]
152 if ok {
153 dig.s[0] = binary.LittleEndian.Uint32(d[0:4])
154 dig.s[1] = binary.LittleEndian.Uint32(d[4:8])
155 dig.s[2] = binary.LittleEndian.Uint32(d[8:12])
156 dig.s[3] = binary.LittleEndian.Uint32(d[12:16])
157 } else {
158 dig.s[0], dig.s[1], dig.s[2], dig.s[3] = init0, init1, init2, init3
159 }
160
161 sum := sumResult{}
162 // Add end block to current digest.
163 blockScalar(&dig.s, block.msg)
164
165 binary.LittleEndian.PutUint32(sum.digest[0:], dig.s[0])
166 binary.LittleEndian.PutUint32(sum.digest[4:], dig.s[1])
167 binary.LittleEndian.PutUint32(sum.digest[8:], dig.s[2])
168 binary.LittleEndian.PutUint32(sum.digest[12:], dig.s[3])
169 block.sumCh <- sum
170 if block.msg != nil {
171 s.buffers <- block.msg
172 }
173 continue
174 }
175 if len(block.msg) == 0 {
176 continue
177 }
178 lanes[lanesFilled] = block
179 lanesFilled++
180 return
181 default:
182 return
183 }
184 }
185 }
186 addNewClient := func(cl newClient) {
187 if _, ok := clients[cl.uid]; ok {
188 panic("internal error: duplicate client registration")
189 }
190 clients[cl.uid] = cl.input
191 }
192
193 allLanesFilled := func() bool {
194 return lanesFilled == Lanes || lanesFilled >= len(clients)
195 }
196
197 for {
198 // Step 1.
199 for lanesFilled == 0 {
200 select {
201 case cl, ok := <-newClients:
202 if !ok {
203 return
204 }
205 addNewClient(cl)
206 // Check if it already sent a payload.
207 addToLane(cl.uid)
208 continue
209 case uid := <-s.cycle:
210 addToLane(uid)
211 }
212 }
213
214 fillLanes:
215 for !allLanesFilled() {
216 select {
217 case cl, ok := <-newClients:
218 if !ok {
219 return
220 }
221 addNewClient(cl)
222
223 case uid := <-s.cycle:
224 addToLane(uid)
225 default:
226 // Nothing more queued...
227 break fillLanes
228 }
229 }
230
231 // If we did not fill all lanes, check if there is more waiting
232 if !allLanesFilled() {
233 runtime.Gosched()
234 for uid := range clients {
235 addToLane(uid)
236 if allLanesFilled() {
237 break
238 }
239 }
240 }
241 if false {
242 if !allLanesFilled() {
243 fmt.Println("Not all lanes filled", lanesFilled, "of", len(clients))
244 //pprof.Lookup("goroutine").WriteTo(os.Stdout, 1)
245 } else if true {
246 fmt.Println("all lanes filled")
247 }
248 }
249 // Process the lanes we could collect
250 s.blocks(lanes[:lanesFilled])
251
252 // Clear lanes...
253 lanesFilled = 0
254 // Add all current queued
255 for uid := range clients {
256 addToLane(uid)
257 if allLanesFilled() {
258 break
259 }
260 }
261 }
262}
263
264func (s *md5Server) Close() {
265 if s.newInput != nil {
266 close(s.newInput)
267 s.newInput = nil
268 }
269}
270
271// Invoke assembly and send results back
272func (s *md5Server) blocks(lanes []blockInput) {
273 if len(lanes) < useScalarBelow {
274 // Use scalar routine when below this many lanes
275 switch len(lanes) {
276 case 0:
277 case 1:
278 lane := lanes[0]
279 var d digest
280 a, ok := s.digests[lane.uid]
281 if ok {
282 d.s[0] = binary.LittleEndian.Uint32(a[0:4])
283 d.s[1] = binary.LittleEndian.Uint32(a[4:8])
284 d.s[2] = binary.LittleEndian.Uint32(a[8:12])
285 d.s[3] = binary.LittleEndian.Uint32(a[12:16])
286 } else {
287 d.s[0] = init0
288 d.s[1] = init1
289 d.s[2] = init2
290 d.s[3] = init3
291 }
292 if len(lane.msg) > 0 {
293 // Update...
294 blockScalar(&d.s, lane.msg)
295 }
296 dig := [Size]byte{}
297 binary.LittleEndian.PutUint32(dig[0:], d.s[0])
298 binary.LittleEndian.PutUint32(dig[4:], d.s[1])
299 binary.LittleEndian.PutUint32(dig[8:], d.s[2])
300 binary.LittleEndian.PutUint32(dig[12:], d.s[3])
301 s.digests[lane.uid] = dig
302
303 if lane.msg != nil {
304 s.buffers <- lane.msg
305 }
306 lanes[0] = blockInput{}
307
308 default:
309 s.wg.Add(len(lanes))
310 var results [useScalarBelow]digest
311 for i := range lanes {
312 lane := lanes[i]
313 go func(i int) {
314 var d digest
315 defer s.wg.Done()
316 a, ok := s.digests[lane.uid]
317 if ok {
318 d.s[0] = binary.LittleEndian.Uint32(a[0:4])
319 d.s[1] = binary.LittleEndian.Uint32(a[4:8])
320 d.s[2] = binary.LittleEndian.Uint32(a[8:12])
321 d.s[3] = binary.LittleEndian.Uint32(a[12:16])
322 } else {
323 d.s[0] = init0
324 d.s[1] = init1
325 d.s[2] = init2
326 d.s[3] = init3
327 }
328 if len(lane.msg) == 0 {
329 results[i] = d
330 return
331 }
332 // Update...
333 blockScalar(&d.s, lane.msg)
334 results[i] = d
335 }(i)
336 }
337 s.wg.Wait()
338 for i, lane := range lanes {
339 dig := [Size]byte{}
340 binary.LittleEndian.PutUint32(dig[0:], results[i].s[0])
341 binary.LittleEndian.PutUint32(dig[4:], results[i].s[1])
342 binary.LittleEndian.PutUint32(dig[8:], results[i].s[2])
343 binary.LittleEndian.PutUint32(dig[12:], results[i].s[3])
344 s.digests[lane.uid] = dig
345
346 if lane.msg != nil {
347 s.buffers <- lane.msg
348 }
349 lanes[i] = blockInput{}
350 }
351 }
352 return
353 }
354
355 inputs := [16][]byte{}
356 for i := range lanes {
357 inputs[i] = lanes[i].msg
358 }
359
360 // Collect active digests...
361 state := s.getDigests(lanes)
362 // Process all lanes...
363 s.blockMd5_x16(&state, inputs, len(lanes) <= 8)
364
365 for i, lane := range lanes {
366 uid := lane.uid
367 dig := [Size]byte{}
368 binary.LittleEndian.PutUint32(dig[0:], state.v0[i])
369 binary.LittleEndian.PutUint32(dig[4:], state.v1[i])
370 binary.LittleEndian.PutUint32(dig[8:], state.v2[i])
371 binary.LittleEndian.PutUint32(dig[12:], state.v3[i])
372
373 s.digests[uid] = dig
374 if lane.msg != nil {
375 s.buffers <- lane.msg
376 }
377 lanes[i] = blockInput{}
378 }
379}
380
381func (s *md5Server) getDigests(lanes []blockInput) (d digest16) {
382 for i, lane := range lanes {
383 a, ok := s.digests[lane.uid]
384 if ok {
385 d.v0[i] = binary.LittleEndian.Uint32(a[0:4])
386 d.v1[i] = binary.LittleEndian.Uint32(a[4:8])
387 d.v2[i] = binary.LittleEndian.Uint32(a[8:12])
388 d.v3[i] = binary.LittleEndian.Uint32(a[12:16])
389 } else {
390 d.v0[i] = init0
391 d.v1[i] = init1
392 d.v2[i] = init2
393 d.v3[i] = init3
394 }
395 }
396 return
397}
diff --git a/vendor/github.com/minio/md5-simd/md5-server_fallback.go b/vendor/github.com/minio/md5-simd/md5-server_fallback.go
new file mode 100644
index 0000000..7814dad
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/md5-server_fallback.go
@@ -0,0 +1,12 @@
1//+build !amd64 appengine !gc noasm
2
3// Copyright (c) 2020 MinIO Inc. All rights reserved.
4// Use of this source code is governed by a license that can be
5// found in the LICENSE file.
6
7package md5simd
8
9// NewServer - Create new object for parallel processing handling
10func NewServer() *fallbackServer {
11 return &fallbackServer{}
12}
diff --git a/vendor/github.com/minio/md5-simd/md5-util_amd64.go b/vendor/github.com/minio/md5-simd/md5-util_amd64.go
new file mode 100644
index 0000000..73981b0
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/md5-util_amd64.go
@@ -0,0 +1,85 @@
1//+build !noasm,!appengine,gc
2
3// Copyright (c) 2020 MinIO Inc. All rights reserved.
4// Use of this source code is governed by a license that can be
5// found in the LICENSE file.
6
7package md5simd
8
9// Helper struct for sorting blocks based on length
10type lane struct {
11 len uint
12 pos uint
13}
14
15type digest struct {
16 s [4]uint32
17}
18
 19 // Helper struct for generating the number of rounds in combination with a mask for valid lanes
20type maskRounds struct {
21 mask uint64
22 rounds uint64
23}
24
25func generateMaskAndRounds8(input [8][]byte, mr *[8]maskRounds) (rounds int) {
26 // Sort on blocks length small to large
27 var sorted [8]lane
28 for c, inpt := range input[:] {
29 sorted[c] = lane{uint(len(inpt)), uint(c)}
30 for i := c - 1; i >= 0; i-- {
31 // swap so largest is at the end...
32 if sorted[i].len > sorted[i+1].len {
33 sorted[i], sorted[i+1] = sorted[i+1], sorted[i]
34 continue
35 }
36 break
37 }
38 }
39
40 // Create mask array including 'rounds' (of processing blocks of 64 bytes) between masks
41 m, round := uint64(0xff), uint64(0)
42
43 for _, s := range sorted[:] {
44 if s.len > 0 {
45 if uint64(s.len)>>6 > round {
46 mr[rounds] = maskRounds{m, (uint64(s.len) >> 6) - round}
47 rounds++
48 }
49 round = uint64(s.len) >> 6
50 }
51 m = m & ^(1 << uint(s.pos))
52 }
53 return
54}
55
56func generateMaskAndRounds16(input [16][]byte, mr *[16]maskRounds) (rounds int) {
57 // Sort on blocks length small to large
58 var sorted [16]lane
59 for c, inpt := range input[:] {
60 sorted[c] = lane{uint(len(inpt)), uint(c)}
61 for i := c - 1; i >= 0; i-- {
62 // swap so largest is at the end...
63 if sorted[i].len > sorted[i+1].len {
64 sorted[i], sorted[i+1] = sorted[i+1], sorted[i]
65 continue
66 }
67 break
68 }
69 }
70
71 // Create mask array including 'rounds' (of processing blocks of 64 bytes) between masks
72 m, round := uint64(0xffff), uint64(0)
73
74 for _, s := range sorted[:] {
75 if s.len > 0 {
76 if uint64(s.len)>>6 > round {
77 mr[rounds] = maskRounds{m, (uint64(s.len) >> 6) - round}
78 rounds++
79 }
80 round = uint64(s.len) >> 6
81 }
82 m = m & ^(1 << uint(s.pos))
83 }
84 return
85}
diff --git a/vendor/github.com/minio/md5-simd/md5.go b/vendor/github.com/minio/md5-simd/md5.go
new file mode 100644
index 0000000..11b0cb9
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/md5.go
@@ -0,0 +1,63 @@
1package md5simd
2
3import (
4 "crypto/md5"
5 "hash"
6 "sync"
7)
8
9const (
10 // The blocksize of MD5 in bytes.
11 BlockSize = 64
12
13 // The size of an MD5 checksum in bytes.
14 Size = 16
15
16	// internalBlockSize is the size of the internal data blocks (32 KiB).
17 internalBlockSize = 32 << 10
18)
19
20type Server interface {
21 NewHash() Hasher
22 Close()
23}
24
25type Hasher interface {
26 hash.Hash
27 Close()
28}
29
30// StdlibHasher returns a Hasher that uses the stdlib for hashing.
31// Used hashers are stored in a pool for fast reuse.
32func StdlibHasher() Hasher {
33	return &md5Wrapper{Hash: md5Pool.Get().(hash.Hash)}
34}
35
36// md5Wrapper is a wrapper around the builtin hasher.
37type md5Wrapper struct {
38 hash.Hash
39}
40
41var md5Pool = sync.Pool{New: func() interface{} {
42 return md5.New()
43}}
44
45// fallbackServer - Fallback when no assembly is available.
46type fallbackServer struct {
47}
48
49// NewHash - returns a regular Go md5 hasher from crypto/md5
50func (s *fallbackServer) NewHash() Hasher {
51	return &md5Wrapper{Hash: md5Pool.Get().(hash.Hash)}
52}
53
54func (s *fallbackServer) Close() {
55}
56
57func (m *md5Wrapper) Close() {
58 if m.Hash != nil {
59 m.Reset()
60 md5Pool.Put(m.Hash)
61 m.Hash = nil
62 }
63}
diff --git a/vendor/github.com/minio/md5-simd/md5block_amd64.go b/vendor/github.com/minio/md5-simd/md5block_amd64.go
new file mode 100644
index 0000000..4c27936
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/md5block_amd64.go
@@ -0,0 +1,11 @@
1// Code generated by command: go run gen.go -out ../md5block_amd64.s -stubs ../md5block_amd64.go -pkg=md5simd. DO NOT EDIT.
2
3// +build !appengine
4// +build !noasm
5// +build gc
6
7package md5simd
8
9// Encode p to digest
10//go:noescape
11func blockScalar(dig *[4]uint32, p []byte)
diff --git a/vendor/github.com/minio/md5-simd/md5block_amd64.s b/vendor/github.com/minio/md5-simd/md5block_amd64.s
new file mode 100644
index 0000000..fbc4a21
--- /dev/null
+++ b/vendor/github.com/minio/md5-simd/md5block_amd64.s
@@ -0,0 +1,714 @@
1// Code generated by command: go run gen.go -out ../md5block_amd64.s -stubs ../md5block_amd64.go -pkg=md5simd. DO NOT EDIT.
2
3// +build !appengine
4// +build !noasm
5// +build gc
6
7// func blockScalar(dig *[4]uint32, p []byte)
8TEXT ·blockScalar(SB), $0-32
9 MOVQ p_len+16(FP), AX
10 MOVQ dig+0(FP), CX
11 MOVQ p_base+8(FP), DX
12 SHRQ $0x06, AX
13 SHLQ $0x06, AX
14 LEAQ (DX)(AX*1), AX
15 CMPQ DX, AX
16 JEQ end
17 MOVL (CX), BX
18 MOVL 4(CX), BP
19 MOVL 8(CX), SI
20 MOVL 12(CX), CX
21 MOVL $0xffffffff, DI
22
23loop:
24 MOVL (DX), R8
25 MOVL CX, R9
26 MOVL BX, R10
27 MOVL BP, R11
28 MOVL SI, R12
29 MOVL CX, R13
30
31 // ROUND1
32 XORL SI, R9
33 ADDL $0xd76aa478, BX
34 ADDL R8, BX
35 ANDL BP, R9
36 XORL CX, R9
37 MOVL 4(DX), R8
38 ADDL R9, BX
39 ROLL $0x07, BX
40 MOVL SI, R9
41 ADDL BP, BX
42 XORL BP, R9
43 ADDL $0xe8c7b756, CX
44 ADDL R8, CX
45 ANDL BX, R9
46 XORL SI, R9
47 MOVL 8(DX), R8
48 ADDL R9, CX
49 ROLL $0x0c, CX
50 MOVL BP, R9
51 ADDL BX, CX
52 XORL BX, R9
53 ADDL $0x242070db, SI
54 ADDL R8, SI
55 ANDL CX, R9
56 XORL BP, R9
57 MOVL 12(DX), R8
58 ADDL R9, SI
59 ROLL $0x11, SI
60 MOVL BX, R9
61 ADDL CX, SI
62 XORL CX, R9
63 ADDL $0xc1bdceee, BP
64 ADDL R8, BP
65 ANDL SI, R9
66 XORL BX, R9
67 MOVL 16(DX), R8
68 ADDL R9, BP
69 ROLL $0x16, BP
70 MOVL CX, R9
71 ADDL SI, BP
72 XORL SI, R9
73 ADDL $0xf57c0faf, BX
74 ADDL R8, BX
75 ANDL BP, R9
76 XORL CX, R9
77 MOVL 20(DX), R8
78 ADDL R9, BX
79 ROLL $0x07, BX
80 MOVL SI, R9
81 ADDL BP, BX
82 XORL BP, R9
83 ADDL $0x4787c62a, CX
84 ADDL R8, CX
85 ANDL BX, R9
86 XORL SI, R9
87 MOVL 24(DX), R8
88 ADDL R9, CX
89 ROLL $0x0c, CX
90 MOVL BP, R9
91 ADDL BX, CX
92 XORL BX, R9
93 ADDL $0xa8304613, SI
94 ADDL R8, SI
95 ANDL CX, R9
96 XORL BP, R9
97 MOVL 28(DX), R8
98 ADDL R9, SI
99 ROLL $0x11, SI
100 MOVL BX, R9
101 ADDL CX, SI
102 XORL CX, R9
103 ADDL $0xfd469501, BP
104 ADDL R8, BP
105 ANDL SI, R9
106 XORL BX, R9
107 MOVL 32(DX), R8
108 ADDL R9, BP
109 ROLL $0x16, BP
110 MOVL CX, R9
111 ADDL SI, BP
112 XORL SI, R9
113 ADDL $0x698098d8, BX
114 ADDL R8, BX
115 ANDL BP, R9
116 XORL CX, R9
117 MOVL 36(DX), R8
118 ADDL R9, BX
119 ROLL $0x07, BX
120 MOVL SI, R9
121 ADDL BP, BX
122 XORL BP, R9
123 ADDL $0x8b44f7af, CX
124 ADDL R8, CX
125 ANDL BX, R9
126 XORL SI, R9
127 MOVL 40(DX), R8
128 ADDL R9, CX
129 ROLL $0x0c, CX
130 MOVL BP, R9
131 ADDL BX, CX
132 XORL BX, R9
133 ADDL $0xffff5bb1, SI
134 ADDL R8, SI
135 ANDL CX, R9
136 XORL BP, R9
137 MOVL 44(DX), R8
138 ADDL R9, SI
139 ROLL $0x11, SI
140 MOVL BX, R9
141 ADDL CX, SI
142 XORL CX, R9
143 ADDL $0x895cd7be, BP
144 ADDL R8, BP
145 ANDL SI, R9
146 XORL BX, R9
147 MOVL 48(DX), R8
148 ADDL R9, BP
149 ROLL $0x16, BP
150 MOVL CX, R9
151 ADDL SI, BP
152 XORL SI, R9
153 ADDL $0x6b901122, BX
154 ADDL R8, BX
155 ANDL BP, R9
156 XORL CX, R9
157 MOVL 52(DX), R8
158 ADDL R9, BX
159 ROLL $0x07, BX
160 MOVL SI, R9
161 ADDL BP, BX
162 XORL BP, R9
163 ADDL $0xfd987193, CX
164 ADDL R8, CX
165 ANDL BX, R9
166 XORL SI, R9
167 MOVL 56(DX), R8
168 ADDL R9, CX
169 ROLL $0x0c, CX
170 MOVL BP, R9
171 ADDL BX, CX
172 XORL BX, R9
173 ADDL $0xa679438e, SI
174 ADDL R8, SI
175 ANDL CX, R9
176 XORL BP, R9
177 MOVL 60(DX), R8
178 ADDL R9, SI
179 ROLL $0x11, SI
180 MOVL BX, R9
181 ADDL CX, SI
182 XORL CX, R9
183 ADDL $0x49b40821, BP
184 ADDL R8, BP
185 ANDL SI, R9
186 XORL BX, R9
187 MOVL 4(DX), R8
188 ADDL R9, BP
189 ROLL $0x16, BP
190 MOVL CX, R9
191 ADDL SI, BP
192
193 // ROUND2
194 MOVL CX, R9
195 MOVL CX, R14
196 XORL DI, R9
197 ADDL $0xf61e2562, BX
198 ADDL R8, BX
199 ANDL BP, R14
200 ANDL SI, R9
201 MOVL 24(DX), R8
202 ORL R9, R14
203 MOVL SI, R9
204 ADDL R14, BX
205 MOVL SI, R14
206 ROLL $0x05, BX
207 ADDL BP, BX
208 XORL DI, R9
209 ADDL $0xc040b340, CX
210 ADDL R8, CX
211 ANDL BX, R14
212 ANDL BP, R9
213 MOVL 44(DX), R8
214 ORL R9, R14
215 MOVL BP, R9
216 ADDL R14, CX
217 MOVL BP, R14
218 ROLL $0x09, CX
219 ADDL BX, CX
220 XORL DI, R9
221 ADDL $0x265e5a51, SI
222 ADDL R8, SI
223 ANDL CX, R14
224 ANDL BX, R9
225 MOVL (DX), R8
226 ORL R9, R14
227 MOVL BX, R9
228 ADDL R14, SI
229 MOVL BX, R14
230 ROLL $0x0e, SI
231 ADDL CX, SI
232 XORL DI, R9
233 ADDL $0xe9b6c7aa, BP
234 ADDL R8, BP
235 ANDL SI, R14
236 ANDL CX, R9
237 MOVL 20(DX), R8
238 ORL R9, R14
239 MOVL CX, R9
240 ADDL R14, BP
241 MOVL CX, R14
242 ROLL $0x14, BP
243 ADDL SI, BP
244 XORL DI, R9
245 ADDL $0xd62f105d, BX
246 ADDL R8, BX
247 ANDL BP, R14
248 ANDL SI, R9
249 MOVL 40(DX), R8
250 ORL R9, R14
251 MOVL SI, R9
252 ADDL R14, BX
253 MOVL SI, R14
254 ROLL $0x05, BX
255 ADDL BP, BX
256 XORL DI, R9
257 ADDL $0x02441453, CX
258 ADDL R8, CX
259 ANDL BX, R14
260 ANDL BP, R9
261 MOVL 60(DX), R8
262 ORL R9, R14
263 MOVL BP, R9
264 ADDL R14, CX
265 MOVL BP, R14
266 ROLL $0x09, CX
267 ADDL BX, CX
268 XORL DI, R9
269 ADDL $0xd8a1e681, SI
270 ADDL R8, SI
271 ANDL CX, R14
272 ANDL BX, R9
273 MOVL 16(DX), R8
274 ORL R9, R14
275 MOVL BX, R9
276 ADDL R14, SI
277 MOVL BX, R14
278 ROLL $0x0e, SI
279 ADDL CX, SI
280 XORL DI, R9
281 ADDL $0xe7d3fbc8, BP
282 ADDL R8, BP
283 ANDL SI, R14
284 ANDL CX, R9
285 MOVL 36(DX), R8
286 ORL R9, R14
287 MOVL CX, R9
288 ADDL R14, BP
289 MOVL CX, R14
290 ROLL $0x14, BP
291 ADDL SI, BP
292 XORL DI, R9
293 ADDL $0x21e1cde6, BX
294 ADDL R8, BX
295 ANDL BP, R14
296 ANDL SI, R9
297 MOVL 56(DX), R8
298 ORL R9, R14
299 MOVL SI, R9
300 ADDL R14, BX
301 MOVL SI, R14
302 ROLL $0x05, BX
303 ADDL BP, BX
304 XORL DI, R9
305 ADDL $0xc33707d6, CX
306 ADDL R8, CX
307 ANDL BX, R14
308 ANDL BP, R9
309 MOVL 12(DX), R8
310 ORL R9, R14
311 MOVL BP, R9
312 ADDL R14, CX
313 MOVL BP, R14
314 ROLL $0x09, CX
315 ADDL BX, CX
316 XORL DI, R9
317 ADDL $0xf4d50d87, SI
318 ADDL R8, SI
319 ANDL CX, R14
320 ANDL BX, R9
321 MOVL 32(DX), R8
322 ORL R9, R14
323 MOVL BX, R9
324 ADDL R14, SI
325 MOVL BX, R14
326 ROLL $0x0e, SI
327 ADDL CX, SI
328 XORL DI, R9
329 ADDL $0x455a14ed, BP
330 ADDL R8, BP
331 ANDL SI, R14
332 ANDL CX, R9
333 MOVL 52(DX), R8
334 ORL R9, R14
335 MOVL CX, R9
336 ADDL R14, BP
337 MOVL CX, R14
338 ROLL $0x14, BP
339 ADDL SI, BP
340 XORL DI, R9
341 ADDL $0xa9e3e905, BX
342 ADDL R8, BX
343 ANDL BP, R14
344 ANDL SI, R9
345 MOVL 8(DX), R8
346 ORL R9, R14
347 MOVL SI, R9
348 ADDL R14, BX
349 MOVL SI, R14
350 ROLL $0x05, BX
351 ADDL BP, BX
352 XORL DI, R9
353 ADDL $0xfcefa3f8, CX
354 ADDL R8, CX
355 ANDL BX, R14
356 ANDL BP, R9
357 MOVL 28(DX), R8
358 ORL R9, R14
359 MOVL BP, R9
360 ADDL R14, CX
361 MOVL BP, R14
362 ROLL $0x09, CX
363 ADDL BX, CX
364 XORL DI, R9
365 ADDL $0x676f02d9, SI
366 ADDL R8, SI
367 ANDL CX, R14
368 ANDL BX, R9
369 MOVL 48(DX), R8
370 ORL R9, R14
371 MOVL BX, R9
372 ADDL R14, SI
373 MOVL BX, R14
374 ROLL $0x0e, SI
375 ADDL CX, SI
376 XORL DI, R9
377 ADDL $0x8d2a4c8a, BP
378 ADDL R8, BP
379 ANDL SI, R14
380 ANDL CX, R9
381 MOVL 20(DX), R8
382 ORL R9, R14
383 MOVL CX, R9
384 ADDL R14, BP
385 MOVL CX, R14
386 ROLL $0x14, BP
387 ADDL SI, BP
388
389 // ROUND3
390 MOVL SI, R9
391 ADDL $0xfffa3942, BX
392 ADDL R8, BX
393 MOVL 32(DX), R8
394 XORL CX, R9
395 XORL BP, R9
396 ADDL R9, BX
397 ROLL $0x04, BX
398 MOVL BP, R9
399 ADDL BP, BX
400 ADDL $0x8771f681, CX
401 ADDL R8, CX
402 MOVL 44(DX), R8
403 XORL SI, R9
404 XORL BX, R9
405 ADDL R9, CX
406 ROLL $0x0b, CX
407 MOVL BX, R9
408 ADDL BX, CX
409 ADDL $0x6d9d6122, SI
410 ADDL R8, SI
411 MOVL 56(DX), R8
412 XORL BP, R9
413 XORL CX, R9
414 ADDL R9, SI
415 ROLL $0x10, SI
416 MOVL CX, R9
417 ADDL CX, SI
418 ADDL $0xfde5380c, BP
419 ADDL R8, BP
420 MOVL 4(DX), R8
421 XORL BX, R9
422 XORL SI, R9
423 ADDL R9, BP
424 ROLL $0x17, BP
425 MOVL SI, R9
426 ADDL SI, BP
427 ADDL $0xa4beea44, BX
428 ADDL R8, BX
429 MOVL 16(DX), R8
430 XORL CX, R9
431 XORL BP, R9
432 ADDL R9, BX
433 ROLL $0x04, BX
434 MOVL BP, R9
435 ADDL BP, BX
436 ADDL $0x4bdecfa9, CX
437 ADDL R8, CX
438 MOVL 28(DX), R8
439 XORL SI, R9
440 XORL BX, R9
441 ADDL R9, CX
442 ROLL $0x0b, CX
443 MOVL BX, R9
444 ADDL BX, CX
445 ADDL $0xf6bb4b60, SI
446 ADDL R8, SI
447 MOVL 40(DX), R8
448 XORL BP, R9
449 XORL CX, R9
450 ADDL R9, SI
451 ROLL $0x10, SI
452 MOVL CX, R9
453 ADDL CX, SI
454 ADDL $0xbebfbc70, BP
455 ADDL R8, BP
456 MOVL 52(DX), R8
457 XORL BX, R9
458 XORL SI, R9
459 ADDL R9, BP
460 ROLL $0x17, BP
461 MOVL SI, R9
462 ADDL SI, BP
463 ADDL $0x289b7ec6, BX
464 ADDL R8, BX
465 MOVL (DX), R8
466 XORL CX, R9
467 XORL BP, R9
468 ADDL R9, BX
469 ROLL $0x04, BX
470 MOVL BP, R9
471 ADDL BP, BX
472 ADDL $0xeaa127fa, CX
473 ADDL R8, CX
474 MOVL 12(DX), R8
475 XORL SI, R9
476 XORL BX, R9
477 ADDL R9, CX
478 ROLL $0x0b, CX
479 MOVL BX, R9
480 ADDL BX, CX
481 ADDL $0xd4ef3085, SI
482 ADDL R8, SI
483 MOVL 24(DX), R8
484 XORL BP, R9
485 XORL CX, R9
486 ADDL R9, SI
487 ROLL $0x10, SI
488 MOVL CX, R9
489 ADDL CX, SI
490 ADDL $0x04881d05, BP
491 ADDL R8, BP
492 MOVL 36(DX), R8
493 XORL BX, R9
494 XORL SI, R9
495 ADDL R9, BP
496 ROLL $0x17, BP
497 MOVL SI, R9
498 ADDL SI, BP
499 ADDL $0xd9d4d039, BX
500 ADDL R8, BX
501 MOVL 48(DX), R8
502 XORL CX, R9
503 XORL BP, R9
504 ADDL R9, BX
505 ROLL $0x04, BX
506 MOVL BP, R9
507 ADDL BP, BX
508 ADDL $0xe6db99e5, CX
509 ADDL R8, CX
510 MOVL 60(DX), R8
511 XORL SI, R9
512 XORL BX, R9
513 ADDL R9, CX
514 ROLL $0x0b, CX
515 MOVL BX, R9
516 ADDL BX, CX
517 ADDL $0x1fa27cf8, SI
518 ADDL R8, SI
519 MOVL 8(DX), R8
520 XORL BP, R9
521 XORL CX, R9
522 ADDL R9, SI
523 ROLL $0x10, SI
524 MOVL CX, R9
525 ADDL CX, SI
526 ADDL $0xc4ac5665, BP
527 ADDL R8, BP
528 MOVL (DX), R8
529 XORL BX, R9
530 XORL SI, R9
531 ADDL R9, BP
532 ROLL $0x17, BP
533 MOVL SI, R9
534 ADDL SI, BP
535
536 // ROUND4
537 MOVL DI, R9
538 XORL CX, R9
539 ADDL $0xf4292244, BX
540 ADDL R8, BX
541 ORL BP, R9
542 XORL SI, R9
543 ADDL R9, BX
544 MOVL 28(DX), R8
545 MOVL DI, R9
546 ROLL $0x06, BX
547 XORL SI, R9
548 ADDL BP, BX
549 ADDL $0x432aff97, CX
550 ADDL R8, CX
551 ORL BX, R9
552 XORL BP, R9
553 ADDL R9, CX
554 MOVL 56(DX), R8
555 MOVL DI, R9
556 ROLL $0x0a, CX
557 XORL BP, R9
558 ADDL BX, CX
559 ADDL $0xab9423a7, SI
560 ADDL R8, SI
561 ORL CX, R9
562 XORL BX, R9
563 ADDL R9, SI
564 MOVL 20(DX), R8
565 MOVL DI, R9
566 ROLL $0x0f, SI
567 XORL BX, R9
568 ADDL CX, SI
569 ADDL $0xfc93a039, BP
570 ADDL R8, BP
571 ORL SI, R9
572 XORL CX, R9
573 ADDL R9, BP
574 MOVL 48(DX), R8
575 MOVL DI, R9
576 ROLL $0x15, BP
577 XORL CX, R9
578 ADDL SI, BP
579 ADDL $0x655b59c3, BX
580 ADDL R8, BX
581 ORL BP, R9
582 XORL SI, R9
583 ADDL R9, BX
584 MOVL 12(DX), R8
585 MOVL DI, R9
586 ROLL $0x06, BX
587 XORL SI, R9
588 ADDL BP, BX
589 ADDL $0x8f0ccc92, CX
590 ADDL R8, CX
591 ORL BX, R9
592 XORL BP, R9
593 ADDL R9, CX
594 MOVL 40(DX), R8
595 MOVL DI, R9
596 ROLL $0x0a, CX
597 XORL BP, R9
598 ADDL BX, CX
599 ADDL $0xffeff47d, SI
600 ADDL R8, SI
601 ORL CX, R9
602 XORL BX, R9
603 ADDL R9, SI
604 MOVL 4(DX), R8
605 MOVL DI, R9
606 ROLL $0x0f, SI
607 XORL BX, R9
608 ADDL CX, SI
609 ADDL $0x85845dd1, BP
610 ADDL R8, BP
611 ORL SI, R9
612 XORL CX, R9
613 ADDL R9, BP
614 MOVL 32(DX), R8
615 MOVL DI, R9
616 ROLL $0x15, BP
617 XORL CX, R9
618 ADDL SI, BP
619 ADDL $0x6fa87e4f, BX
620 ADDL R8, BX
621 ORL BP, R9
622 XORL SI, R9
623 ADDL R9, BX
624 MOVL 60(DX), R8
625 MOVL DI, R9
626 ROLL $0x06, BX
627 XORL SI, R9
628 ADDL BP, BX
629 ADDL $0xfe2ce6e0, CX
630 ADDL R8, CX
631 ORL BX, R9
632 XORL BP, R9
633 ADDL R9, CX
634 MOVL 24(DX), R8
635 MOVL DI, R9
636 ROLL $0x0a, CX
637 XORL BP, R9
638 ADDL BX, CX
639 ADDL $0xa3014314, SI
640 ADDL R8, SI
641 ORL CX, R9
642 XORL BX, R9
643 ADDL R9, SI
644 MOVL 52(DX), R8
645 MOVL DI, R9
646 ROLL $0x0f, SI
647 XORL BX, R9
648 ADDL CX, SI
649 ADDL $0x4e0811a1, BP
650 ADDL R8, BP
651 ORL SI, R9
652 XORL CX, R9
653 ADDL R9, BP
654 MOVL 16(DX), R8
655 MOVL DI, R9
656 ROLL $0x15, BP
657 XORL CX, R9
658 ADDL SI, BP
659 ADDL $0xf7537e82, BX
660 ADDL R8, BX
661 ORL BP, R9
662 XORL SI, R9
663 ADDL R9, BX
664 MOVL 44(DX), R8
665 MOVL DI, R9
666 ROLL $0x06, BX
667 XORL SI, R9
668 ADDL BP, BX
669 ADDL $0xbd3af235, CX
670 ADDL R8, CX
671 ORL BX, R9
672 XORL BP, R9
673 ADDL R9, CX
674 MOVL 8(DX), R8
675 MOVL DI, R9
676 ROLL $0x0a, CX
677 XORL BP, R9
678 ADDL BX, CX
679 ADDL $0x2ad7d2bb, SI
680 ADDL R8, SI
681 ORL CX, R9
682 XORL BX, R9
683 ADDL R9, SI
684 MOVL 36(DX), R8
685 MOVL DI, R9
686 ROLL $0x0f, SI
687 XORL BX, R9
688 ADDL CX, SI
689 ADDL $0xeb86d391, BP
690 ADDL R8, BP
691 ORL SI, R9
692 XORL CX, R9
693 ADDL R9, BP
694 ROLL $0x15, BP
695 ADDL SI, BP
696 ADDL R10, BX
697 ADDL R11, BP
698 ADDL R12, SI
699 ADDL R13, CX
700
701 // Prepare next loop
702 ADDQ $0x40, DX
703 CMPQ DX, AX
704 JB loop
705
706 // Write output
707 MOVQ dig+0(FP), AX
708 MOVL BX, (AX)
709 MOVL BP, 4(AX)
710 MOVL SI, 8(AX)
711 MOVL CX, 12(AX)
712
713end:
714 RET