Hi James,
Looking around, top performance for SHA256 on an M1 seems to be a little
over 2GB/s. Mbed TLS can get close; at this point it probably depends
more on achieving efficient (prefetch-friendly) memory and storage
accesses than the speed of the hashes.
You can get similar speeds on x86, but you'll need to use an
implementation with hardware acceleration. Mbed TLS doesn't have
acceleration for hashes on x86.
If you want to validate file chunks quickly, you need to store the hash
of each chunk separately. You can verify the hash of a file by hashing
the list of hashes of the chunks. This is a standard technique, used by
many synchronization tools. For example, BitTorrent can exchange files
one piece at a time because the torrent file contains a hash for each piece.
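
As a rough illustration of the hash-list idea, here is a minimal C++ sketch. The names Digest, HashFn, HashList, build_hash_list and verify_chunk are made up for this example and are not part of any library. The hash is passed in as a callable so the sketch does not depend on a particular implementation; with Mbed TLS it could be a small wrapper around mbedtls_sha256().

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// 32-byte digest, the size of a SHA-256 output.
using Digest = std::array<uint8_t, 32>;

// Any one-shot hash: (data, length) -> digest. Assumption: with Mbed TLS this
// would wrap mbedtls_sha256(data, len, out.data(), 0).
using HashFn = std::function<Digest(const uint8_t *, size_t)>;

struct HashList {
    std::vector<Digest> chunk_hashes; // one digest per chunk, stored with the file
    Digest file_hash;                 // hash of the concatenated chunk digests
};

// Hypothetical helper: split `data` into chunks of `chunk_size` bytes
// (the last chunk may be shorter) and build the hash list.
HashList build_hash_list(const std::vector<uint8_t> &data, size_t chunk_size,
                         const HashFn &hash) {
    assert(chunk_size > 0);
    HashList out;
    std::vector<uint8_t> concatenated;
    for (size_t off = 0; off < data.size(); off += chunk_size) {
        size_t len = std::min(chunk_size, data.size() - off);
        Digest d = hash(data.data() + off, len);
        out.chunk_hashes.push_back(d);
        concatenated.insert(concatenated.end(), d.begin(), d.end());
    }
    // The file-level hash covers the list of chunk digests, not the raw data.
    out.file_hash = hash(concatenated.data(), concatenated.size());
    return out;
}

// Validate a single chunk without reading the rest of the file.
bool verify_chunk(const HashList &list, size_t index, const uint8_t *chunk,
                  size_t len, const HashFn &hash) {
    return index < list.chunk_hashes.size() &&
           hash(chunk, len) == list.chunk_hashes[index];
}
```

To validate the whole file, you would recompute file_hash from the stored chunk digests and compare it with a trusted value. Individual chunks can then be checked one at a time with verify_chunk.
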
So, for example, if you break up files into chunks of 1GB, you can
validate each chunk in about 0.5s. Validating a whole 10GB file still takes
10 times that, about 5s. You may be able to parallelize the calculation: you can have
each core calculate a hash independently. Your RAM and I/O may not be
able to keep up though.
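
A minimal sketch of the per-chunk parallelism, assuming the data is already in memory. hash_chunks_parallel, Digest and HashFn are hypothetical names for this example, and the hash is again a pluggable callable (with Mbed TLS, a wrapper around mbedtls_sha256()). It starts one task per chunk for simplicity; a real tool would probably cap the number of tasks at the number of cores.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <future>
#include <vector>

using Digest = std::array<uint8_t, 32>;
using HashFn = std::function<Digest(const uint8_t *, size_t)>;

// Hypothetical helper: hash every chunk of `data` on its own asynchronous task.
// `hash` must be safe to call from several threads at once.
std::vector<Digest> hash_chunks_parallel(const std::vector<uint8_t> &data,
                                         size_t chunk_size, const HashFn &hash) {
    assert(chunk_size > 0);
    std::vector<std::future<Digest>> tasks;
    for (size_t off = 0; off < data.size(); off += chunk_size) {
        size_t len = std::min(chunk_size, data.size() - off);
        const uint8_t *p = data.data() + off;
        // std::launch::async requests a separate thread for each chunk.
        tasks.push_back(std::async(std::launch::async,
                                   [&hash, p, len] { return hash(p, len); }));
    }
    std::vector<Digest> digests;
    for (auto &t : tasks) {
        digests.push_back(t.get()); // collected in chunk order
    }
    return digests;
}
```

Whether this helps in practice depends on whether memory and storage can feed all the cores, which this sketch does not measure.
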
The downside of this approach is that you have to pick a chunk size and
a format for the hash list. There's no standard for that. The combined
hash of a file will be different if you change the chunk size, and it
won't match the plain hash of the whole file: it's mathematically
impossible to calculate hash(C1+C2) from hash(C1) and hash(C2).
Best regards,
--
Gilles Peskine
Mbed TLS developer
On 24/10/2022 13:35, James Liu via mbed-tls wrote:
> Hi,
>
> Thanks for the tip. I tested mbedtls-3.2.1 on an M1 by adding two options
> in mbedtls_config.h:
>
> MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT,
> MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT.
>
> There are substantial improvements on two big files using sha256:
> CentOS-8.5.2111-x86_64-boot.iso (827.3 MB): (before) *5.9 sec*, (after) *32 sec*
> CentOS-8.5.2111-x86_64-boot.iso (10.79 GB): (before) *78 sec*, (after) *41 sec*
>
> But the problem I'm trying to solve is still there:
> 1) sha256 incurs high overhead on big files (less than a few seconds
> are desired), considering there are many big files to process in real
> time;
> 2) not sure if tuning could work in x86.
>
> Is it possible to slice a big file into chunks and compute hash
> separately and merge? I guess other crypto libraries or utilities
> have same overhead on big files.
>
> Regards
>
> Tom Cosgrove <Tom.Cosgrove@arm.com> wrote on Monday, 24 October 2022 at 16:24:
>
> Hi
>
> > I use same code with mbedtls-3.1.0 to run tests in x86, and
> > performance is still downgraded
>
> Mbed TLS has no acceleration for SHA-256 on x86 or x86_64 -
> optional or otherwise - it just uses C code. So this is as expected.
>
> Thanks
>
> Tom
>
> ------------------------------------------------------------------------
> *From:* Liu James via mbed-tls <mbed-tls@lists.trustedfirmware.org>
> *Sent:* 22 October 2022 10:28
> *To:* mbed-tls@lists.trustedfirmware.org
> *Subject:* [mbed-tls] Performance tuning of SHA256 on big files
> Hi,
>
> This is an updated post from
> https://github.com/Mbed-TLS/mbedtls/issues/6464, which should have been
> posted to the mbedtls mailing list.
>
> My question is how to significantly improve SHA256 performance on
> big files (regardless of architectures).
>
> *=== Updates*
> I use same code with mbedtls-3.1.0 to run tests in x86, and
> performance is still downgraded.
>
> Mbed TLS version (number or commit id): *3.1.0*
> Operating system and version: *CentOS-8.5, CPU 11900K*
> Configuration (if not default, please attach |mbedtls_config.h|):
> Compiler and options (if you used a pre-built binary, please
> indicate how you obtained it): *gcc/g++ 8.5*
> Additional environment information:
>
> *Test files and performance*
> CentOS-8.5.2111-x86_64-boot.iso (827.3 MB): |sha256| *5 sec*
> CentOS-8.5.2111-x86_64-boot.iso (10.79 GB): |sha256| *66 sec*
>
>
> Also, as advised, I tried to turn on
> "MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT" and
> "MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT" using mbedtls-3.2.0 on
> M1, but the build reported the following error:
>
> CMake Error at library/CMakeLists.txt:257 (add_library):
> Cannot find source file:
>
> psa_crypto_driver_wrappers.c
>
> Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm
> .ixx .cppm .h
> .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03
> .hip .ispc
>
>
> CMake Error at library/CMakeLists.txt:257 (add_library):
> No SOURCES given to target: mbedcrypto
>
>
> Thanks for your help.
>
> *=== Original message at github*
>
>
> Summary
>
> |sha256()| and |sha1()| incur significant overhead on big files (~1 GB
> and above). *This might not be an issue*, and I'm looking for an
> efficient way to calculate hashes of big files.
>
>
> System information
>
> Mbed TLS version (number or commit id): 3.1.0
> Operating system and version: M1 OSX
> Configuration (if not default, please attach |mbedtls_config.h|):
> Compiler and options (if you used a pre-built binary, please
> indicate how you obtained it): Clang++
> Additional environment information:
>
>
> Expected behavior
>
> Fast calculation of big files in less than 1 second
>
>
> Actual behavior
>
> Test files:
> CentOS-8.5.2111-x86_64-boot.iso (827.3 MB): |sha1| *3.3 sec*, |sha256| *5.9 sec*
> CentOS-8.5.2111-x86_64-boot.iso (10.79 GB): |sha1| *40 sec*, |sha256| *78 sec*
>
>
> Steps to reproduce
>
> ISO files can be downloaded at:
> http://ftp.iij.ad.jp/pub/linux/centos-vault/8.5.2111/isos/x86_64/
>
> Make sure to use a fast disk, say NVMe, to store the ISO files, or else
> loading big files could take a lot of time. Also use the |user| value
> from the |time| command to measure performance.
>
> Workable code of sha256:
>
>     string test_sha256(string file_path) {
>         mbedtls_sha256_context ctx;
>         FILE *fp;
>         string output;
>         const int BUFFER_SIZE = 4096;  // const, so the array size is a constant
>         uint8_t buffer[BUFFER_SIZE];
>         size_t read;
>         uint8_t hash[32];
>
>         mbedtls_sha256_init(&ctx);
>         mbedtls_sha256_starts(&ctx, 0);  // 0 selects SHA-256, not SHA-224
>         fp = fopen(file_path.c_str(), "rb");  // binary mode
>         if (fp == NULL) {
>             mbedtls_sha256_free(&ctx);
>             return output;
>         }
>         // Feed the file to the hash one buffer at a time.
>         while ((read = fread(buffer, 1, BUFFER_SIZE, fp))) {
>             mbedtls_sha256_update(&ctx, buffer, read);
>         }
>         mbedtls_sha256_finish(&ctx, hash);
>         mbedtls_sha256_free(&ctx);
>         fclose(fp);
>         // update hash string, omit here
>         return output;
>     }
>
>
>