Hi,
This is an updated post from https://github.com/Mbed-TLS/mbedtls/issues/6464, which should have been posted to the Mbed TLS mailing list.
My question is how to significantly improve SHA-256 performance on big files (regardless of architecture).
*=== Updates*
I used the same code with mbedtls-3.1.0 to run tests on x86, and performance is still poor.
Mbed TLS version (number or commit id): *3.1.0*
Operating system and version: *CentOS 8.5, CPU 11900K*
Configuration (if not default, please attach mbedtls_config.h):
Compiler and options (if you used a pre-built binary, please indicate how you obtained it): *gcc/g++ 8.5*
Additional environment information:
*Test files and performance*
CentOS-8.5.2111-x86_64-boot.iso (827.3 MB): sha256 *5 sec*
CentOS-8.5.2111-x86_64-boot.iso (10.79 GB): sha256 *66 sec*
Also, as advised, I tried to turn on "MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT" and "MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT" using mbedtls-3.2.0 on M1, but the build reported the following error:
CMake Error at library/CMakeLists.txt:257 (add_library):
  Cannot find source file:
    psa_crypto_driver_wrappers.c
  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc
CMake Error at library/CMakeLists.txt:257 (add_library):
  No SOURCES given to target: mbedcrypto
Thanks for your help.
*=== Original message at github*
Summary
sha256() and sha1() incur significant overhead on big files (about 1 GB and above). *This might not be an issue*, and I'm looking for an efficient way to calculate hashes of big files.
System information
Mbed TLS version (number or commit id): 3.1.0
Operating system and version: M1 OSX
Configuration (if not default, please attach mbedtls_config.h):
Compiler and options (if you used a pre-built binary, please indicate how you obtained it): Clang++
Additional environment information:
Expected behavior
Fast calculation of big files in less than 1 second
Actual behavior
Test files:
CentOS-8.5.2111-x86_64-boot.iso (827.3 MB): sha1 *3.3 sec*, sha256 *5.9 sec*
CentOS-8.5.2111-x86_64-boot.iso (10.79 GB): sha1 *40 sec*, sha256 *78 sec*
Steps to reproduce
ISO files can be downloaded at: http://ftp.iij.ad.jp/pub/linux/centos-vault/8.5.2111/isos/x86_64/
Make sure to use a fast disk, such as NVMe, to store the ISO files, or else loading big files could take a lot of time. Also use the user time reported by the time command to measure performance.
Working code for sha256:
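    #include <cstdint>
    #include <cstdio>
    #include <string>
    #include "mbedtls/sha256.h"

    using std::string;

    string test_sha256(string file_path) {
        mbedtls_sha256_context ctx;
        FILE *fp;
        string output;
        const int BUFFER_SIZE = 4096;  // const, so the array below has a fixed size
        uint8_t buffer[BUFFER_SIZE];
        size_t read;
        uint8_t hash[32];

        mbedtls_sha256_init(&ctx);
        mbedtls_sha256_starts(&ctx, 0);  // 0 selects SHA-256 rather than SHA-224

        fp = fopen(file_path.c_str(), "rb");  // binary mode: hash the bytes exactly as stored
        if (fp == NULL) {
            mbedtls_sha256_free(&ctx);
            return output;
        }

        // Feed the file to the hash one buffer at a time.
        while ((read = fread(buffer, 1, BUFFER_SIZE, fp))) {
            mbedtls_sha256_update(&ctx, buffer, read);
        }
        mbedtls_sha256_finish(&ctx, hash);

        mbedtls_sha256_free(&ctx);
        fclose(fp);

        // update hash string, omit here
        return output;
    }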
On 22/10/2022 11:28, Liu James via mbed-tls wrote:
> Also, as advised I try to turn on "MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT" and "MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT" using mbedtls-3.2.0 in M1, but compiler reported the following error:
> CMake Error at library/CMakeLists.txt:257 (add_library):
>   Cannot find source file:
>     psa_crypto_driver_wrappers.c
>   Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc
> CMake Error at library/CMakeLists.txt:257 (add_library):
>   No SOURCES given to target: mbedcrypto
There was a mistake in the build scripts of the 3.2.0 release. Please use the 3.2.1 release.
Best regards,
Hi
> I use same code with mbedtls-3.1.0 to run tests in x86, and performance is still downgraded
Mbed TLS has no acceleration for SHA-256 on x86 or x86_64 - optional or otherwise - it just uses C code. So this is as expected.
Thanks
Tom
________________________________
From: Liu James via mbed-tls <mbed-tls@lists.trustedfirmware.org>
Sent: 22 October 2022 10:28
To: mbed-tls@lists.trustedfirmware.org
Subject: [mbed-tls] Performance tuning of SHA256 on big files
> My question is how to significantly improve SHA256 performance on big files (regardless of architectures).
Hi,
Thanks for the tip. I tested mbedtls-3.2.1 on M1 by enabling two options in mbedtls_config.h:
MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT, MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT.
There are substantial improvements on the two big files using sha256:
CentOS-8.5.2111-x86_64-boot.iso (827.3 MB): (before) *5.9 sec*, (after) *32 sec*
CentOS-8.5.2111-x86_64-boot.iso (10.79 GB): (before) *78 sec*, (after) *41 sec*
But the problem I'm trying to solve is still there:
1) sha256 incurs high overhead on big files (a few seconds at most is desired), considering there are many big files to process in real time;
2) I'm not sure whether similar tuning can work on x86.
Is it possible to slice a big file into chunks, compute the hashes separately, and merge them? I guess other crypto libraries or utilities have the same overhead on big files.
Regards
On Mon, 24 Oct 2022 at 16:24, Tom Cosgrove <Tom.Cosgrove@arm.com> wrote:
> Hi
> > I use same code with mbedtls-3.1.0 to run tests in x86, and performance is still downgraded
> Mbed TLS has no acceleration for SHA-256 on x86 or x86_64 - optional or otherwise - it just uses C code. So this is as expected.
> Thanks
> Tom
Hi
> MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT
If you're building the software to run on a system that you know has the crypto extensions, then use MBEDTLS_SHA256_USE_A64_CRYPTO_ONLY - it will be (marginally) faster. There are few aarch64 systems without the crypto extensions, but one of them is the Raspberry Pi, which is used widely.
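As a rough illustration of the two variants, the relevant lines of mbedtls_config.h might look like the fragment below. It is a sketch that assumes Mbed TLS 3.2.1, where the option names mentioned in this thread exist; the two options are understood to be alternatives, so only one of them is left enabled here, and the rest of the configuration is assumed to be the default.

```cpp
/* mbedtls_config.h (fragment) */

/* Variant A: detect the crypto extensions at run time and fall back to the
 * portable C implementation when they are absent. */
#define MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT

/* Variant B: assume the crypto extensions are always present (for example
 * on Apple Silicon). Enable this INSTEAD OF variant A, not together with it. */
/* #define MBEDTLS_SHA256_USE_A64_CRYPTO_ONLY */
```

Variant B only makes sense when every target machine is known to have the extensions, which is the Raspberry Pi caveat made above.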
> Is it possible to slice a big file into chunks and compute hash separately and merge?
No, the hash algorithms are sequential.
I've seen up to around 2 GB/s raw hashing speed (i.e. on data in memory) on Apple Silicon.
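The timings quoted in this thread can be turned into throughput figures as a sanity check. The sketch below assumes decimal units (1 GB = 1000 MB) and uses only numbers that appear in the messages; the function names are made up for the illustration.

```cpp
#include <cstdio>

// Average throughput in MB/s. Decimal units are assumed: 1 GB = 1000 MB.
double mb_per_second(double megabytes, double seconds) {
    return megabytes / seconds;
}

// Time in seconds needed to hash `megabytes` at a sustained `rate_mb_s`.
double seconds_needed(double megabytes, double rate_mb_s) {
    return megabytes / rate_mb_s;
}

// Prints figures derived from the file sizes and timings quoted in the thread.
void print_thread_figures() {
    std::printf("M1, C code:  %.0f MB/s and %.0f MB/s\n",
                mb_per_second(827.3, 5.9), mb_per_second(10790.0, 78.0));
    std::printf("x86, C code: %.0f MB/s and %.0f MB/s\n",
                mb_per_second(827.3, 5.0), mb_per_second(10790.0, 66.0));
    std::printf("At 2 GB/s:   %.2f s and %.2f s\n",
                seconds_needed(827.3, 2000.0), seconds_needed(10790.0, 2000.0));
}
```

The small and the large file come out at nearly the same rate (roughly 140 MB/s on the M1 and 165 MB/s on x86 with the C code), so the cost grows linearly with file size rather than being an extra overhead specific to big files. Even at 2 GB/s, the 10.79 GB file would need about 5.4 seconds, so a one-second target is only realistic for the smaller file.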
> int BUFFER_SIZE = 4096
That seems very short. Even though fread() is buffered, a quick google suggests a typical buffer size of 8 KB, which means lots of calling into the kernel and context switches. I'd be inclined to read 512 MB at a time.
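A minimal sketch of reading with a configurable block size follows. The helper name for_each_block and its callback are hypothetical, not part of Mbed TLS; the buffer lives on the heap so that large sizes such as the 512 MB mentioned above do not overflow the stack. Which block size is actually fastest is an assumption to verify by measurement.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Reads `path` in blocks of up to `block_size` bytes and hands each block to
// `sink`. Returns the total number of bytes read, or -1 if the file cannot
// be opened.
long long for_each_block(
    const std::string& path, std::size_t block_size,
    const std::function<void(const std::uint8_t*, std::size_t)>& sink) {
    std::FILE* fp = std::fopen(path.c_str(), "rb");
    if (fp == nullptr) {
        return -1;
    }
    std::vector<std::uint8_t> buffer(block_size);  // heap allocation, not stack
    long long total = 0;
    std::size_t n = 0;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
        sink(buffer.data(), n);
        total += static_cast<long long>(n);
    }
    std::fclose(fp);
    return total;
}
```

With the code from the original message, the callback would simply forward each block to mbedtls_sha256_update(&ctx, data, size), with the init, starts, finish and free calls left unchanged around the loop.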
But if you want the fastest processing, the thing to do is benchmark the libraries you have access to (Mbed TLS, OpenSSL, WolfSSL come to mind) on the different systems you have access to (aarch64, x86_64) and use the winner.
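One way to run such a comparison is a small harness that times a hash call over data already in memory, so that disk speed does not affect the result. The sketch below is library-agnostic: hash_once is a hypothetical callback that would wrap whichever library's SHA-256 routine is being measured, and the buffer size and repeat count are arbitrary parameters.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Calls `hash_once` on an in-memory buffer `repeats` times and returns the
// measured throughput in MB/s (decimal megabytes). Returns 0.0 if the run
// was too short for the clock to register.
double measure_mb_per_s(
    const std::function<void(const std::uint8_t*, std::size_t)>& hash_once,
    std::size_t buffer_bytes, int repeats) {
    std::vector<std::uint8_t> buffer(buffer_bytes, 0xA5);  // arbitrary fill
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        hash_once(buffer.data(), buffer.size());
    }
    const auto stop = std::chrono::steady_clock::now();
    const double seconds = std::chrono::duration<double>(stop - start).count();
    const double megabytes = static_cast<double>(buffer_bytes) * repeats / 1e6;
    return seconds > 0.0 ? megabytes / seconds : 0.0;
}
```

Running the same harness once per library and per machine gives directly comparable numbers; a buffer of at least several megabytes and a few repeats should keep timer resolution from dominating.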
Thanks
Tom
________________________________
From: James Liu <icefrog1950@gmail.com>
Sent: 24 October 2022 12:35
To: Tom Cosgrove <Tom.Cosgrove@arm.com>
Cc: mbed-tls@lists.trustedfirmware.org
Subject: Re: [mbed-tls] Performance tuning of SHA256 on big files
> Thanks for the tip. I test mbedtls-3.2.1 in M1 by adding two options in mbedtls_config.h:
> MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT, MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT.
Hi James,
Looking around, top performance for SHA-256 on an M1 seems to be a little over 2 GB/s. Mbed TLS can get close; at this point it probably depends more on achieving efficient (prefetch-friendly) memory and storage accesses than on the speed of the hashes.
You can get similar speeds on x86, but you'll need to use an implementation with hardware acceleration. Mbed TLS doesn't have acceleration for hashes on x86.
If you want to validate file chunks quickly, you need to store the hash of each chunk separately. You can verify the hash of a file by hashing the list of hashes of the chunks. This is a standard technique, used by many synchronization tools. For example, BitTorrent can exchange files one piece at a time because the torrent file contains a hash for each piece.
So for example, if you break up files into chunks of 1 GB, you can validate each chunk in about 0.5 s. Validating a 10 GB file still takes 10 times this. You may be able to parallelize the calculation: you can have each core calculate a hash independently. Your RAM and I/O may not be able to keep up though.
The downside of this approach is that you have to pick a chunk size and a format for the hash list. There's no standard for that. The hash of a file will always be different if you change the chunk size: it's mathematically impossible to calculate the hash(C1+C2) from hash(C1) and hash(C2).
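The scheme described above can be sketched as follows. The structure (hash each chunk, then hash the concatenated chunk digests) follows the description; the type names, the concatenation format and the chunk size are illustrative assumptions, since there is no standard for them. The toy_hash function is a non-cryptographic 64-bit FNV-1a stand-in used only to keep the example self-contained; a real tool would pass a SHA-256 wrapper instead.

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Bytes = std::vector<std::uint8_t>;
using HashFn = std::function<Bytes(const std::uint8_t*, std::size_t)>;

// Digest of every `chunk_size`-byte chunk of `data` (the last may be shorter).
std::vector<Bytes> chunk_digests(const Bytes& data, std::size_t chunk_size,
                                 const HashFn& hash) {
    std::vector<Bytes> out;
    if (chunk_size == 0) return out;  // avoid an endless loop
    for (std::size_t off = 0; off < data.size(); off += chunk_size) {
        const std::size_t len = std::min(chunk_size, data.size() - off);
        out.push_back(hash(data.data() + off, len));
    }
    return out;
}

// File-level digest: the hash of the concatenated list of chunk digests.
Bytes digest_of_digests(const std::vector<Bytes>& digests, const HashFn& hash) {
    Bytes joined;
    for (const Bytes& d : digests) joined.insert(joined.end(), d.begin(), d.end());
    return hash(joined.data(), joined.size());
}

// Placeholder hash, NOT cryptographic: 64-bit FNV-1a, little-endian output.
Bytes toy_hash(const std::uint8_t* p, std::size_t n) {
    std::uint64_t h = 14695981039346656037ULL;
    for (std::size_t i = 0; i < n; ++i) {
        h ^= p[i];
        h *= 1099511628211ULL;
    }
    Bytes out(8);
    for (int i = 0; i < 8; ++i) out[i] = static_cast<std::uint8_t>(h >> (8 * i));
    return out;
}
```

Because each call to the hash function in chunk_digests is independent, the per-chunk work is what could be spread across cores; the final digest_of_digests step is cheap. Changing chunk_size changes the resulting file-level digest, which is the downside noted above.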
Best regards,