Hi

> MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT

If you're building the software to run on a system that you know has the crypto extensions, then use MBEDTLS_SHA256_USE_A64_CRYPTO_ONLY -
it will be (marginally) faster. There are few aarch64 systems without the crypto extensions, but one of them is the Raspberry Pi, which is used widely.
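
As a rough sketch, the choice in mbedtls_config.h looks something like the fragment below. I'm assuming the two options are mutually exclusive, so define only one of them; the comments in your version's config file are the authority here.

```cpp
// mbedtls_config.h (fragment): define ONE of the two options, not both.

// Option A: use the Armv8 crypto extensions if the CPU has them at run
// time, otherwise fall back to the portable C implementation.
#define MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT

// Option B: assume the crypto extensions are always present. There is no
// C fallback, so only use this if every target CPU has the extensions.
// #define MBEDTLS_SHA256_USE_A64_CRYPTO_ONLY
```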

> Is it possible to slice a big file into chunks and compute hash separately and merge?

No, the hash algorithms are sequential.

I've seen up to around 2 GB/s raw hashing speed (i.e. on data in memory) on Apple Silicon.
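
To put that next to the timings quoted below, here is a back-of-the-envelope sketch. It assumes decimal units (1 GB = 10^9 bytes), and the helper names are made up for illustration. The 10.79 GB file works out to roughly 138 MB/s at 78 sec and roughly 263 MB/s at 41 sec; even at 2 GB/s, that file would need about 5.4 sec of pure hashing, before any disk I/O.

```cpp
#include <cstdio>

// Throughput in MB/s, using decimal units (1 MB = 1e6 bytes).
double throughput_mb_per_s(double bytes, double seconds)
{
    return bytes / seconds / 1e6;
}

// Seconds needed to process `bytes` at a rate given in bytes per second.
double seconds_at_rate(double bytes, double bytes_per_s)
{
    return bytes / bytes_per_s;
}

// Prints the estimates for the 10.79 GB file from this thread.
void print_estimates()
{
    const double big_file = 10.79e9;  // assumption: decimal gigabytes
    std::printf("at 78 sec: %.0f MB/s\n", throughput_mb_per_s(big_file, 78.0));
    std::printf("at 41 sec: %.0f MB/s\n", throughput_mb_per_s(big_file, 41.0));
    std::printf("at 2 GB/s: %.1f sec\n", seconds_at_rate(big_file, 2e9));
}
```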

> int BUFFER_SIZE = 4096

That seems very short. Even though fread() is buffered, a quick google suggests a typical buffer size of 8 KB, which
means lots of calling into the kernel and context switches. I'd be inclined to read 512 MB at a time.
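
Something along these lines is what I have in mind: a minimal sketch, not tested against your setup. read_in_chunks and consume are names I've made up; the callback is where the call to mbedtls_sha256_update() would go. The buffer lives on the heap because a buffer of many megabytes won't fit on the stack, and the chunk size is a parameter so you can measure what works best.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <functional>
#include <string>
#include <vector>

// Reads `path` in chunks of up to `chunk_size` bytes and hands each chunk
// to `consume`. Returns the total number of bytes read, or -1 on error.
// (Hypothetical helper for illustration; not part of any library.)
long long read_in_chunks(
    const std::string &path, std::size_t chunk_size,
    const std::function<void(const std::uint8_t *, std::size_t)> &consume)
{
    if (chunk_size == 0) {
        return -1;
    }
    std::FILE *fp = std::fopen(path.c_str(), "rb");
    if (fp == nullptr) {
        return -1;
    }

    // Heap-allocated so that large chunk sizes do not overflow the stack.
    std::vector<std::uint8_t> buffer(chunk_size);
    long long total = 0;
    std::size_t n = 0;
    while ((n = std::fread(buffer.data(), 1, buffer.size(), fp)) > 0) {
        consume(buffer.data(), n);  // e.g. feed the chunk to the hash here
        total += static_cast<long long>(n);
    }

    const bool failed = std::ferror(fp) != 0;
    std::fclose(fp);
    return failed ? -1 : total;
}
```

You would call it with a lambda that captures the hash context and passes each chunk to the update function.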

But if you want the fastest processing, the thing to do is benchmark the libraries you have access to (Mbed TLS, OpenSSL, WolfSSL come to mind)
on the different systems you have access to (aarch64, x86_64) and use the winner.
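
A minimal timing harness for that comparison might look like the sketch below. time_hash, BenchResult and the toy xor_fold stand-in are all made-up names; xor_fold is not a real hash, and in a real benchmark you would swap it for a wrapper around each library's SHA-256. It hashes data already in memory, so it measures raw hashing speed and leaves disk I/O out.

```cpp
#include <chrono>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

struct BenchResult {
    std::size_t bytes;     // total bytes processed across all repeats
    double seconds;        // wall-clock time for all repeats
    std::uint64_t digest;  // result of the last call, so the work is observable
};

using HashFn = std::function<std::uint64_t(const std::uint8_t *, std::size_t)>;

// Runs `hash` over `data` `repeats` times and records the elapsed time.
BenchResult time_hash(const HashFn &hash,
                      const std::vector<std::uint8_t> &data, int repeats)
{
    std::uint64_t digest = 0;
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < repeats; ++i) {
        digest = hash(data.data(), data.size());
    }
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double> elapsed = stop - start;
    return {data.size() * static_cast<std::size_t>(repeats), elapsed.count(),
            digest};
}

// Toy stand-in for a library hash call. NOT a cryptographic hash.
std::uint64_t xor_fold(const std::uint8_t *p, std::size_t n)
{
    std::uint64_t acc = 0;
    for (std::size_t i = 0; i < n; ++i) {
        acc = ((acc << 1) | (acc >> 63)) ^ p[i];  // rotate left by 1, then mix
    }
    return acc;
}
```

Dividing bytes by seconds gives the throughput for each library on each system; run each case a few times and keep the data set large enough that the timer resolution doesn't matter.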

Thanks

Tom



From: James Liu <icefrog1950@gmail.com>
Sent: 24 October 2022 12:35
To: Tom Cosgrove <Tom.Cosgrove@arm.com>
Cc: mbed-tls@lists.trustedfirmware.org <mbed-tls@lists.trustedfirmware.org>
Subject: Re: [mbed-tls] Performance tuning of SHA256 on big files
 


Hi,

Thanks for the tip. I tested mbedtls-3.2.1 on an M1 by adding two options to mbedtls_config.h:

MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT,  MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT.

There are substantial improvements on two big files using sha256:
CentOS-8.5.2111-x86_64-boot.iso (827.3 MB):  (before)  5.9 sec, (after) 3.2 sec
CentOS-8.5.2111-x86_64-boot.iso (10.79 GB):   (before) 78 sec,   (after) 41 sec

But the problem I'm trying to solve is still there:
1) sha256 is still too slow on big files (I'd like each file to take no more than a few seconds), given that there are many big files to process in real time;
2) I'm not sure whether any tuning is possible on x86.

Is it possible to slice a big file into chunks and compute hash separately and merge?  I guess other crypto libraries or utilities have the same overhead on big files.

Regards

Tom Cosgrove <Tom.Cosgrove@arm.com> wrote on Monday, 24 October 2022 at 16:24:
Hi

> I use same code with mbedtls-3.1.0 to run tests in x86, and performance is still downgraded

Mbed TLS has no acceleration for SHA-256 on x86 or x86_64 - optional or otherwise - it just uses C code. So this is as expected.

Thanks

Tom


From: Liu James via mbed-tls <mbed-tls@lists.trustedfirmware.org>
Sent: 22 October 2022 10:28
To: mbed-tls@lists.trustedfirmware.org <mbed-tls@lists.trustedfirmware.org>
Subject: [mbed-tls] Performance tuning of SHA256 on big files
 
Hi,

This is an updated post from https://github.com/Mbed-TLS/mbedtls/issues/6464, which should have been posted to the mbedtls mailing list.

My question is how to significantly improve SHA256 performance on big files (regardless of architecture).

=== Updates
I use same code with mbedtls-3.1.0 to run tests in x86, and performance is still downgraded.

Mbed TLS version (number or commit id): 3.1.0
Operating system and version:  Centos-8.5, CPU 11900K
Configuration (if not default, please attach mbedtls_config.h):
Compiler and options (if you used a pre-built binary, please indicate how you obtained it): gcc/g++ 8.5
Additional environment information:

Test files and performance
CentOS-8.5.2111-x86_64-boot.iso (827.3 MB): sha256  5 sec
CentOS-8.5.2111-x86_64-boot.iso (10.79 GB):  sha256  66 sec


Also, as advised, I tried to turn on "MBEDTLS_SHA256_USE_A64_CRYPTO_IF_PRESENT" and "MBEDTLS_SHA512_USE_A64_CRYPTO_IF_PRESENT" using mbedtls-3.2.0 on an M1, but CMake reported the following error:

CMake Error at library/CMakeLists.txt:257 (add_library):
  Cannot find source file:

    psa_crypto_driver_wrappers.c

  Tried extensions .c .C .c++ .cc .cpp .cxx .cu .mpp .m .M .mm .ixx .cppm .h
  .hh .h++ .hm .hpp .hxx .in .txx .f .F .for .f77 .f90 .f95 .f03 .hip .ispc


CMake Error at library/CMakeLists.txt:257 (add_library):
  No SOURCES given to target: mbedcrypto


Thanks for your help.

=== Original message at github

Summary

sha256() and sha1() incur significant overhead on big files (about 1 GB and above). This might not be an issue, and I'm looking for an efficient way to calculate hashes of big files.

System information

Mbed TLS version (number or commit id): 3.1.0
Operating system and version: M1 OSX
Configuration (if not default, please attach mbedtls_config.h):
Compiler and options (if you used a pre-built binary, please indicate how you obtained it): Clang++
Additional environment information:

Expected behavior

Fast calculation of big files in less than 1 second

Actual behavior

Test files:
CentOS-8.5.2111-x86_64-boot.iso (827.3 MB):  sha1  3.3 sec, sha256  5.9 sec
CentOS-8.5.2111-x86_64-boot.iso (10.79 GB):  sha1  40 sec, sha256  78 sec

Steps to reproduce

ISO files can be downloaded at:  http://ftp.iij.ad.jp/pub/linux/centos-vault/8.5.2111/isos/x86_64/

Make sure to use a fast disk, such as NVMe, to store the ISO files, or else loading big files could take a lot of time. Also use the user time reported by the time command to measure performance.

Workable code of sha256:

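#include <cstdint>
#include <cstdio>
#include <string>
#include "mbedtls/sha256.h"

using std::string;

// Returns the SHA-256 of the file as a string, or an empty string on failure.
string test_sha256(string file_path)
{
    mbedtls_sha256_context ctx;
    FILE *fp;
    string output;
    const int BUFFER_SIZE = 4096;   // const, so the array below is not a VLA
    uint8_t buffer[BUFFER_SIZE];
    size_t n_read;
    uint8_t hash[32];               // a SHA-256 digest is 32 bytes

    mbedtls_sha256_init(&ctx);
    mbedtls_sha256_starts(&ctx, 0); // 0 selects SHA-256 rather than SHA-224

    // Binary mode, so that no platform translates line endings.
    fp = fopen(file_path.c_str(), "rb");
    if (fp == NULL)
    {
        mbedtls_sha256_free(&ctx);
        return output;
    }

    // Feed the file to the hash one buffer at a time.
    while ((n_read = fread(buffer, 1, BUFFER_SIZE, fp)) > 0)
    {
        mbedtls_sha256_update(&ctx, buffer, n_read);
    }

    mbedtls_sha256_finish(&ctx, hash);

    mbedtls_sha256_free(&ctx);
    fclose(fp);

    // update hash string, omit here

    return output;
}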