Thanks Tamas,
Did you try with CC312 disabled as well? I am wondering if you would be able to at least reproduce what I saw.
Ray
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> On Behalf Of Tamas Ban via TF-M
Sent: Thursday, March 19, 2020 4:30 AM
To: tf-m(a)lists.trustedfirmware.org
Cc: nd <nd(a)arm.com>
Subject: [TF-M] FW: 2048-bit RSA Keypair generation is slow
CAUTION: This email originated from outside of the organization. Do not click links or open attachments unless you recognize the sender and know the content is safe.
Hi,
We did measurements on Musca-B1 with enabled CC312. We did 10 run with both compiler( armclang and gnuarm). RSA2048 key generation takes 10-35 sec. In average gnuarm seems a bit slower. I suspect the reason behind that random number still need to be tested that are they fit for rsa2048 generation (might are they primes?) and the testing algorithm is might faster with armclang.
But for sure we have not seen any run which could take minutes.
Tamas
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Adrian Shaw via TF-M
Sent: 19 March 2020 12:17
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>; Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Cc: nd <nd(a)arm.com<mailto:nd@arm.com>>; tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: Re: [TF-M] 2048-bit RSA Keypair generation is slow
If the platform has on-chip NVM then you should consider using NV seed entropy. You still need a TRNG to get the initial entropy, but seems like it could improve average runtime performance. I believe there's a macro like MBEDTLS_ENTROPY_NV_SEED that you have to enable, as well as providing the hooks to read/write the flash.
Adrian
________________________________
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> on behalf of Soby Mathew via TF-M <tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>>
Sent: 18 March 2020 16:24
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Cc: nd <nd(a)arm.com<mailto:nd@arm.com>>; tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org> <tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>>
Subject: Re: [TF-M] 2048-bit RSA Keypair generation is slow
Hi Ray,
That is good information. The software RNG is not truly random and is not production quality. The predictable timing could be an indicator that the randomness is reproducible. That's all I know, but the mbed-crypto team may have more inputs and they can be queried here : https://github.com/ARMmbed/mbed-crypto/issues
Best Regards
Soby Mathew
From: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Sent: 18 March 2020 15:33
To: Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Cc: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>; nd <nd(a)arm.com<mailto:nd@arm.com>>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Soby,
Musca-B1 has CC312 support which will introduce an rng for this purpose. Unfortunately, I didn't test Musca-B1 with cc312 enabled and I don't have a board now to test with.
However, I did test with a h/w rng on the PSoC64 and the performance isn't always better. When using s/w, the time it takes was quite consistent at 1.5 minutes or 5.5 minutes depending on toolchain. After introducing the h/w rng, I've seen it finish quickly (less than a minute) but I've also seen it take 15 minutes.
Thanks,
Ray
From: Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Sent: Wednesday, March 18, 2020 4:44 AM
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Cc: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>; nd <nd(a)arm.com<mailto:nd@arm.com>>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Raymond
Thanks for bring this to the forum. When I did the migration to the latest mbed-crypto tag, I faced the same issue. After couple of discussion within the team and mbedcrypto, this is the understanding I have.
Currently mbedcrypto is configured to use NULL entropy. Hence this is the print message when building mbedcrypto :
#warning "**** WARNING! MBEDTLS_TEST_NULL_ENTROPY defined! "
This means that mbedcrypto uses a software algorithm to generate randomness. It iterates this algorithm till it can generate enough randomness for the keypair generation and this is the reason for the delay.
If the hardware has an entropy source which can be used for this random number generation, then I understand the performance will be much better. But this will need suitable configuration to be enabled from TF-M.
I am not much familiar with hardware architecture of M-Class systems. One of the questions I have to the forum is, do we have such entropy sources on the hardware? Then it is worth scheduling some task to investigate how to abstract/use the entropy source.
Best Regards
Soby Mathew
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Raymond Ngun via TF-M
Sent: 17 March 2020 21:02
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: [TF-M] 2048-bit RSA Keypair generation is slow
Hi all,
When testing the PSA API Crypto Dev API test suite, I noticed that psa_generate_key() is awfully slow when generating a 2048-bit RSA keypair and the performance is different depending on toolchain used. I've run these tests on Musca-B1 and with GCC, key generation generally completes in 25s but an armclang build requires 2.5 minutes to complete. This is more pronounced on PSoC64 where we are building for armv6m. Here, the key generation can take up to 5.5 minutes and I've seen it take even longer - I assume this is due to values returned by rng.
Anybody have comments on the performance of psa_generate_key() or more specifically mbedtls_rsa_gen_key()? What can be done to help improve the performance of this software implementation?
Thank you,
Ray
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
Hi Thomas,
Thanks your mail.
For your 1st question. There is indeed a problem to build the PSA arch crypto test on the Musca_a board for the RAM size is too small. I suggest you split the crypto tests into different test images on the Musca_a board.
For the 2nd question, I only test that on FVP AN521 but never met this problem. We will test that on the MPS2 AN521 board soon to check if it is a real problem. If yes, we will try to fix that quickly.
Thanks,
Edison
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> On Behalf Of Thomas Törnblom via TF-M
Sent: Wednesday, March 18, 2020 9:38 PM
To: tf-m(a)lists.trustedfirmware.org
Subject: [TF-M] psa-arch-tests console baud rate
Triggered by Rays mail on slow RSA key pair generation, I built and tried to run the tests, but ran into a couple of issues:
1) I attempted to build it for the Musca A, but apparently that doesn't have enough RAM to run these tests
2) Switching to an MPS2+ with AN521 configuration I see that the baud rate on the UART appears wrong when running the tests.
mcuboot produces correct messages at 115200 bps, as does the normal ConfigRegression* tests, but the ConfigPsaApiTest* appears to produce just garbage.
Is there a setting I need to set to get any useful data on the terminal when running the ConfigPsaApiTests?
Cheers,
Thomas
--
Thomas T�rnblom, Product Engineer
IAR Systems AB
Box 23051, Strandbodgatan 1
SE-750 23 Uppsala, SWEDEN
Mobile: +46 76 180 17 80 Fax: +46 18 16 78 01
E-mail: thomas.tornblom(a)iar.com<mailto:thomas.tornblom@iar.com> Website: www.iar.com<http://www.iar.com>
Twitter: www.twitter.com/iarsystems<http://www.twitter.com/iarsystems>
Note that you'll still to re-seed from the TRNG periodically, but doesn't have to be done every time the application needs a random number.
Adrian
________________________________
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> on behalf of Adrian Shaw via TF-M <tf-m(a)lists.trustedfirmware.org>
Sent: 19 March 2020 11:16
To: Raymond Ngun <Raymond.Ngun(a)cypress.com>; Soby Mathew <Soby.Mathew(a)arm.com>
Cc: nd <nd(a)arm.com>; tf-m(a)lists.trustedfirmware.org <tf-m(a)lists.trustedfirmware.org>
Subject: Re: [TF-M] 2048-bit RSA Keypair generation is slow
If the platform has on-chip NVM then you should consider using NV seed entropy. You still need a TRNG to get the initial entropy, but seems like it could improve average runtime performance. I believe there's a macro like MBEDTLS_ENTROPY_NV_SEED that you have to enable, as well as providing the hooks to read/write the flash.
Adrian
________________________________
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> on behalf of Soby Mathew via TF-M <tf-m(a)lists.trustedfirmware.org>
Sent: 18 March 2020 16:24
To: Raymond Ngun <Raymond.Ngun(a)cypress.com>
Cc: nd <nd(a)arm.com>; tf-m(a)lists.trustedfirmware.org <tf-m(a)lists.trustedfirmware.org>
Subject: Re: [TF-M] 2048-bit RSA Keypair generation is slow
Hi Ray,
That is good information. The software RNG is not truly random and is not production quality. The predictable timing could be an indicator that the randomness is reproducible. That’s all I know, but the mbed-crypto team may have more inputs and they can be queried here : https://github.com/ARMmbed/mbed-crypto/issues
Best Regards
Soby Mathew
From: Raymond Ngun <Raymond.Ngun(a)cypress.com>
Sent: 18 March 2020 15:33
To: Soby Mathew <Soby.Mathew(a)arm.com>
Cc: tf-m(a)lists.trustedfirmware.org; nd <nd(a)arm.com>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Soby,
Musca-B1 has CC312 support which will introduce an rng for this purpose. Unfortunately, I didn’t test Musca-B1 with cc312 enabled and I don’t have a board now to test with.
However, I did test with a h/w rng on the PSoC64 and the performance isn’t always better. When using s/w, the time it takes was quite consistent at 1.5 minutes or 5.5 minutes depending on toolchain. After introducing the h/w rng, I’ve seen it finish quickly (less than a minute) but I’ve also seen it take 15 minutes.
Thanks,
Ray
From: Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Sent: Wednesday, March 18, 2020 4:44 AM
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Cc: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>; nd <nd(a)arm.com<mailto:nd@arm.com>>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Raymond
Thanks for bring this to the forum. When I did the migration to the latest mbed-crypto tag, I faced the same issue. After couple of discussion within the team and mbedcrypto, this is the understanding I have.
Currently mbedcrypto is configured to use NULL entropy. Hence this is the print message when building mbedcrypto :
#warning "**** WARNING! MBEDTLS_TEST_NULL_ENTROPY defined! "
This means that mbedcrypto uses a software algorithm to generate randomness. It iterates this algorithm till it can generate enough randomness for the keypair generation and this is the reason for the delay.
If the hardware has an entropy source which can be used for this random number generation, then I understand the performance will be much better. But this will need suitable configuration to be enabled from TF-M.
I am not much familiar with hardware architecture of M-Class systems. One of the questions I have to the forum is, do we have such entropy sources on the hardware? Then it is worth scheduling some task to investigate how to abstract/use the entropy source.
Best Regards
Soby Mathew
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Raymond Ngun via TF-M
Sent: 17 March 2020 21:02
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: [TF-M] 2048-bit RSA Keypair generation is slow
Hi all,
When testing the PSA API Crypto Dev API test suite, I noticed that psa_generate_key() is awfully slow when generating a 2048-bit RSA keypair and the performance is different depending on toolchain used. I’ve run these tests on Musca-B1 and with GCC, key generation generally completes in 25s but an armclang build requires 2.5 minutes to complete. This is more pronounced on PSoC64 where we are building for armv6m. Here, the key generation can take up to 5.5 minutes and I’ve seen it take even longer – I assume this is due to values returned by rng.
Anybody have comments on the performance of psa_generate_key() or more specifically mbedtls_rsa_gen_key()? What can be done to help improve the performance of this software implementation?
Thank you,
Ray
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi,
We did measurements on Musca-B1 with enabled CC312. We did 10 run with both compiler( armclang and gnuarm). RSA2048 key generation takes 10-35 sec. In average gnuarm seems a bit slower. I suspect the reason behind that random number still need to be tested that are they fit for rsa2048 generation (might are they primes?) and the testing algorithm is might faster with armclang.
But for sure we have not seen any run which could take minutes.
Tamas
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Adrian Shaw via TF-M
Sent: 19 March 2020 12:17
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>; Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Cc: nd <nd(a)arm.com<mailto:nd@arm.com>>; tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: Re: [TF-M] 2048-bit RSA Keypair generation is slow
If the platform has on-chip NVM then you should consider using NV seed entropy. You still need a TRNG to get the initial entropy, but seems like it could improve average runtime performance. I believe there's a macro like MBEDTLS_ENTROPY_NV_SEED that you have to enable, as well as providing the hooks to read/write the flash.
Adrian
________________________________
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> on behalf of Soby Mathew via TF-M <tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>>
Sent: 18 March 2020 16:24
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Cc: nd <nd(a)arm.com<mailto:nd@arm.com>>; tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org> <tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>>
Subject: Re: [TF-M] 2048-bit RSA Keypair generation is slow
Hi Ray,
That is good information. The software RNG is not truly random and is not production quality. The predictable timing could be an indicator that the randomness is reproducible. That's all I know, but the mbed-crypto team may have more inputs and they can be queried here : https://github.com/ARMmbed/mbed-crypto/issues
Best Regards
Soby Mathew
From: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Sent: 18 March 2020 15:33
To: Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Cc: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>; nd <nd(a)arm.com<mailto:nd@arm.com>>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Soby,
Musca-B1 has CC312 support which will introduce an rng for this purpose. Unfortunately, I didn't test Musca-B1 with cc312 enabled and I don't have a board now to test with.
However, I did test with a h/w rng on the PSoC64 and the performance isn't always better. When using s/w, the time it takes was quite consistent at 1.5 minutes or 5.5 minutes depending on toolchain. After introducing the h/w rng, I've seen it finish quickly (less than a minute) but I've also seen it take 15 minutes.
Thanks,
Ray
From: Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Sent: Wednesday, March 18, 2020 4:44 AM
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Cc: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>; nd <nd(a)arm.com<mailto:nd@arm.com>>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Raymond
Thanks for bring this to the forum. When I did the migration to the latest mbed-crypto tag, I faced the same issue. After couple of discussion within the team and mbedcrypto, this is the understanding I have.
Currently mbedcrypto is configured to use NULL entropy. Hence this is the print message when building mbedcrypto :
#warning "**** WARNING! MBEDTLS_TEST_NULL_ENTROPY defined! "
This means that mbedcrypto uses a software algorithm to generate randomness. It iterates this algorithm till it can generate enough randomness for the keypair generation and this is the reason for the delay.
If the hardware has an entropy source which can be used for this random number generation, then I understand the performance will be much better. But this will need suitable configuration to be enabled from TF-M.
I am not much familiar with hardware architecture of M-Class systems. One of the questions I have to the forum is, do we have such entropy sources on the hardware? Then it is worth scheduling some task to investigate how to abstract/use the entropy source.
Best Regards
Soby Mathew
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Raymond Ngun via TF-M
Sent: 17 March 2020 21:02
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: [TF-M] 2048-bit RSA Keypair generation is slow
Hi all,
When testing the PSA API Crypto Dev API test suite, I noticed that psa_generate_key() is awfully slow when generating a 2048-bit RSA keypair and the performance is different depending on toolchain used. I've run these tests on Musca-B1 and with GCC, key generation generally completes in 25s but an armclang build requires 2.5 minutes to complete. This is more pronounced on PSoC64 where we are building for armv6m. Here, the key generation can take up to 5.5 minutes and I've seen it take even longer - I assume this is due to values returned by rng.
Anybody have comments on the performance of psa_generate_key() or more specifically mbedtls_rsa_gen_key()? What can be done to help improve the performance of this software implementation?
Thank you,
Ray
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
If the platform has on-chip NVM then you should consider using NV seed entropy. You still need a TRNG to get the initial entropy, but seems like it could improve average runtime performance. I believe there's a macro like MBEDTLS_ENTROPY_NV_SEED that you have to enable, as well as providing the hooks to read/write the flash.
Adrian
________________________________
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> on behalf of Soby Mathew via TF-M <tf-m(a)lists.trustedfirmware.org>
Sent: 18 March 2020 16:24
To: Raymond Ngun <Raymond.Ngun(a)cypress.com>
Cc: nd <nd(a)arm.com>; tf-m(a)lists.trustedfirmware.org <tf-m(a)lists.trustedfirmware.org>
Subject: Re: [TF-M] 2048-bit RSA Keypair generation is slow
Hi Ray,
That is good information. The software RNG is not truly random and is not production quality. The predictable timing could be an indicator that the randomness is reproducible. That’s all I know, but the mbed-crypto team may have more inputs and they can be queried here : https://github.com/ARMmbed/mbed-crypto/issues
Best Regards
Soby Mathew
From: Raymond Ngun <Raymond.Ngun(a)cypress.com>
Sent: 18 March 2020 15:33
To: Soby Mathew <Soby.Mathew(a)arm.com>
Cc: tf-m(a)lists.trustedfirmware.org; nd <nd(a)arm.com>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Soby,
Musca-B1 has CC312 support which will introduce an rng for this purpose. Unfortunately, I didn’t test Musca-B1 with cc312 enabled and I don’t have a board now to test with.
However, I did test with a h/w rng on the PSoC64 and the performance isn’t always better. When using s/w, the time it takes was quite consistent at 1.5 minutes or 5.5 minutes depending on toolchain. After introducing the h/w rng, I’ve seen it finish quickly (less than a minute) but I’ve also seen it take 15 minutes.
Thanks,
Ray
From: Soby Mathew <Soby.Mathew(a)arm.com<mailto:Soby.Mathew@arm.com>>
Sent: Wednesday, March 18, 2020 4:44 AM
To: Raymond Ngun <Raymond.Ngun(a)cypress.com<mailto:Raymond.Ngun@cypress.com>>
Cc: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>; nd <nd(a)arm.com<mailto:nd@arm.com>>
Subject: RE: 2048-bit RSA Keypair generation is slow
Hi Raymond
Thanks for bring this to the forum. When I did the migration to the latest mbed-crypto tag, I faced the same issue. After couple of discussion within the team and mbedcrypto, this is the understanding I have.
Currently mbedcrypto is configured to use NULL entropy. Hence this is the print message when building mbedcrypto :
#warning "**** WARNING! MBEDTLS_TEST_NULL_ENTROPY defined! "
This means that mbedcrypto uses a software algorithm to generate randomness. It iterates this algorithm till it can generate enough randomness for the keypair generation and this is the reason for the delay.
If the hardware has an entropy source which can be used for this random number generation, then I understand the performance will be much better. But this will need suitable configuration to be enabled from TF-M.
I am not much familiar with hardware architecture of M-Class systems. One of the questions I have to the forum is, do we have such entropy sources on the hardware? Then it is worth scheduling some task to investigate how to abstract/use the entropy source.
Best Regards
Soby Mathew
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Raymond Ngun via TF-M
Sent: 17 March 2020 21:02
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: [TF-M] 2048-bit RSA Keypair generation is slow
Hi all,
When testing the PSA API Crypto Dev API test suite, I noticed that psa_generate_key() is awfully slow when generating a 2048-bit RSA keypair and the performance is different depending on toolchain used. I’ve run these tests on Musca-B1 and with GCC, key generation generally completes in 25s but an armclang build requires 2.5 minutes to complete. This is more pronounced on PSoC64 where we are building for armv6m. Here, the key generation can take up to 5.5 minutes and I’ve seen it take even longer – I assume this is due to values returned by rng.
Anybody have comments on the performance of psa_generate_key() or more specifically mbedtls_rsa_gen_key()? What can be done to help improve the performance of this software implementation?
Thank you,
Ray
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
Hi Raymond
Thanks for bring this to the forum. When I did the migration to the latest mbed-crypto tag, I faced the same issue. After couple of discussion within the team and mbedcrypto, this is the understanding I have.
Currently mbedcrypto is configured to use NULL entropy. Hence this is the print message when building mbedcrypto :
#warning "**** WARNING! MBEDTLS_TEST_NULL_ENTROPY defined! "
This means that mbedcrypto uses a software algorithm to generate randomness. It iterates this algorithm till it can generate enough randomness for the keypair generation and this is the reason for the delay.
If the hardware has an entropy source which can be used for this random number generation, then I understand the performance will be much better. But this will need suitable configuration to be enabled from TF-M.
I am not much familiar with hardware architecture of M-Class systems. One of the questions I have to the forum is, do we have such entropy sources on the hardware? Then it is worth scheduling some task to investigate how to abstract/use the entropy source.
Best Regards
Soby Mathew
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> On Behalf Of Raymond Ngun via TF-M
Sent: 17 March 2020 21:02
To: tf-m(a)lists.trustedfirmware.org
Subject: [TF-M] 2048-bit RSA Keypair generation is slow
Hi all,
When testing the PSA API Crypto Dev API test suite, I noticed that psa_generate_key() is awfully slow when generating a 2048-bit RSA keypair and the performance is different depending on toolchain used. I've run these tests on Musca-B1 and with GCC, key generation generally completes in 25s but an armclang build requires 2.5 minutes to complete. This is more pronounced on PSoC64 where we are building for armv6m. Here, the key generation can take up to 5.5 minutes and I've seen it take even longer - I assume this is due to values returned by rng.
Anybody have comments on the performance of psa_generate_key() or more specifically mbedtls_rsa_gen_key()? What can be done to help improve the performance of this software implementation?
Thank you,
Ray
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
Triggered by Rays mail on slow RSA key pair generation, I built and
tried to run the tests, but ran into a couple of issues:
1) I attempted to build it for the Musca A, but apparently that doesn't
have enough RAM to run these tests
2) Switching to an MPS2+ with AN521 configuration I see that the baud
rate on the UART appears wrong when running the tests.
mcuboot produces correct messages at 115200 bps, as does the normal
ConfigRegression* tests, but the ConfigPsaApiTest* appears to produce
just garbage.
Is there a setting I need to set to get any useful data on the terminal
when running the ConfigPsaApiTests?
Cheers,
Thomas
--
*Thomas Törnblom*, /Product Engineer/
IAR Systems AB
Box 23051, Strandbodgatan 1
SE-750 23 Uppsala, SWEDEN
Mobile: +46 76 180 17 80 Fax: +46 18 16 78 01
E-mail: thomas.tornblom(a)iar.com <mailto:thomas.tornblom@iar.com>
Website: www.iar.com <http://www.iar.com>
Twitter: www.twitter.com/iarsystems <http://www.twitter.com/iarsystems>
Hi all,
When testing the PSA API Crypto Dev API test suite, I noticed that psa_generate_key() is awfully slow when generating a 2048-bit RSA keypair and the performance is different depending on toolchain used. I've run these tests on Musca-B1 and with GCC, key generation generally completes in 25s but an armclang build requires 2.5 minutes to complete. This is more pronounced on PSoC64 where we are building for armv6m. Here, the key generation can take up to 5.5 minutes and I've seen it take even longer - I assume this is due to values returned by rng.
Anybody have comments on the performance of psa_generate_key() or more specifically mbedtls_rsa_gen_key()? What can be done to help improve the performance of this software implementation?
Thank you,
Ray
This message and any attachments may contain confidential information from Cypress or its subsidiaries. If it has been received in error, please advise the sender and immediately delete this message.
Thanks,
The library model seems to treat the entire SPE as a single resource as it appears to lock and only allow a single entry at a time:
/* This is the "Big Lock" on the secure side, to guarantee single entry
* to SPE
*/
extern int32_t tfm_secure_lock;
Is this a limitation of the current implementation or is this the envisioned design for the library model?
If the latter, is there a technical/security reason why only a single entry is allowed? Or is it that no use-case to allow multiple entry has been discussed.
To back up a bit, we have use cases where I would like to see the NSPE RTOS make all the scheduling decisions and further have all the secure peripheral resources be managed individually with respect to blocking threads.
The IPC model seems to be built around having a scheduler in the SPM and the library model seems to preclude the ability to manage secure peripherals individually.
Erik Shreve, PSEM
Software Security Engineer & Architect (CMCU Platform Development)
From: TF-M [mailto:tf-m-bounces@lists.trustedfirmware.org] On Behalf Of Mate Toth-Pal via TF-M
Sent: Friday, March 13, 2020 7:50 AM
To:
Cc: nd
Subject: [EXTERNAL] Re: [TF-M] Cooperative Scheduling Rules and CMSIS tz_context
Hi Erik,
Yes, the two working models are mutually exclusive, and the model is selected compile time, by choosing a corresponding config cmake file.
I'm afraid there isn't a comprehensive specification for the Library model, like for IPC model.
For examples on calling services in library model, you can have a look in the Secure Service API implementations. Look in interface/src for NS, and in the directory of the services, for example services/initial_attestation/tfm_attestation_secure_api.c for secure examples.
The working model specific code parts that are in common files are (across the whole TF-M codebase) guarded by the TFM_PSA_API define:
#ifdef TFM_PSA_API
/* IPC model implementation */
#else /* TFM_PSA_API */
/* Library model implementation */
#endif /* TFM_PSA_API */
If a source file is IPC model specific it has 'ipc' in its name, or is in a directory called 'ipc'. Library model specific files have 'func' in their name.
Regards,
Mate
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> On Behalf Of Shreve, Erik via TF-M
Sent: Friday, March 13, 2020 1:34 PM
To: tf-m(a)lists.trustedfirmware.org
Subject: Re: [TF-M] Cooperative Scheduling Rules and CMSIS tz_context
Mate,
Thanks for providing your time to respond. Understanding there are two models in play helps a lot. A few follow up questions...
Can you confirm that the models are mutually exclusive?
This page indicates to me that they are mutually exclusive: https://ci.trustedfirmware.org/job/tf-m-build-test-nightly/lastSuccessfulBu…
Or are there provisions to allow a system to run both models on the _same_ execution node? For example running both on the same v7 or v8 with TrustZone core.
If there are provisions for this, where can I go to get a better understanding of them?
Also, is there any spec for the library model beyond the API descriptions in the CMSIS documentation (https://arm-software.github.io/CMSIS_5/Core/html/using_TrustZone_pg.html#RT…
If not, then I'll rely on reading the existing code.
Please feel free to point me more documentation if I can get answers there. (I don't mind a friendly "read the manual" or "run this example code.")
Thanks again for your time.
Erik Shreve, PSEM
Software Security Engineer & Architect (CMCU Platform Development)
From: TF-M [mailto:tf-m-bounces@lists.trustedfirmware.org] On Behalf Of Mate Toth-Pal via TF-M
Sent: Thursday, March 12, 2020 2:34 PM
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Cc: nd
Subject: [EXTERNAL] Re: [TF-M] Cooperative Scheduling Rules and CMSIS tz_context
Hi Erik,
Your mail contains quite a few questions. I'll try to give an overview in the topic you are asking, hoping that it is going to answer most (hopefully all) questions. If there is anything left unanswered, please follow up.
So currently TF-M supports two working model: IPC model, and Library model. Library model has no scheduling on the secure side, secure services are called like normal functions.
The IPC model contains a simple scheduler. (IPC model is an implementation of the PSA Firmware Framework for M (PSA-FF-M): https://developer.arm.com/-/media/Files/pdf/PlatformSecurityArchitecture/Ar… ) In this model, all the secure partitions have a runtime context associated with them. They have a separate stack, and their runtime state is saved when they are not scheduled by the Secure scheduler. This means that the number of secure contexts is determined compile time.
'Cooperative Scheduling Rules' document is a proposal of how a Non-Secure and a Secure scheduler should work together, to efficiently handle secure and non-secure asynchronous events. Currently The Secure scheduler has no knowledge of the state of the NS scheduler, and vice-versa. There is a roadmap item 'Scheduler - Initial Support' (currently without deadline) at https://developer.trustedfirmware.org/w/tf_m/planning/ which refers to this.
The RTOS Thread Context Management API implementation had been added to TF-M, to support a feature called TF-M Non-Secure client identification. It is used by the
/**
* \brief Stores caller's client id in state context
*/
void tfm_core_get_caller_client_id_handler(const uint32_t svc_args[]);
API that is available for the secure services. Through this API secure services can distinguish Non-Secure clients.
TF-M manages a database of clients reported by the Non secure software (through the TZ_*APIs), and also tracks the ID of the currently active client. That client ID is returned by 'tfm_core_get_caller_client_id_handler' (in case the Secure Service was called from Non-Secure code)
Please note that the 'tfm_core_get_caller_client_id_handler' API is only supported in Library model. The TZ_* APIs are implemented in the NSPM module: secure_fw/core/tfm_nspm_func.c.
The IPC model's implementation of the NSPM module is empty: secure_fw/core/tfm_nspm_ipc.c
The original idea behind the RTOS Thread Context Management API is that the Secure context creation (allocating stack, and other supporting data structures) and destruction is initiated from the Non-Secure code. The number of active secure contexts is decided by the NS code in run time (of course if there is no more resource, the context creation fails). On scheduling, the non-secure SW notifies the secure code, to activate the context for the thread to be scheduled. This results in setting the secure stack pointer to the desired context (hence the "prepare"). After the NS Scheduler returns to the thread, the Secure or the Non-Secure code continues execution, depending on whether Secure, or Non-Secure execution had been interrupted by the NS scheduling.
Neither the single secure context design of the library model, and the above described design of the IPC model supports this approach, and that's why the semantics of the APIs in TF-M are different in TF-M.
Regards,
Mate
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Shreve, Erik via TF-M
Sent: Wednesday, March 11, 2020 3:07 PM
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: [TF-M] Cooperative Scheduling Rules and CMSIS tz_context
Hello everyone, I'm new to the list. Quick introduction: I work at Texas Instruments and I'm investigating TFM for use in some of our platforms here.
I'm trying to understand the scheduling and task architecture for PSA and the TFM implementation and how that intersects with NS tasks.
I found the following:
https://ci.trustedfirmware.org/job/tf-m-build-test-nightly/lastSuccessfulBu…https://arm-software.github.io/CMSIS_5/Core/html/group__context__trustzone_…
The existing TFM implementation doesn't seem to do much with the tz_context interface. It looks like it tracks which NS client is active, but I don't see where it uses that information to do anything.
Am I missing something?
I'd like to understand more about where TFM is going with the tz_context interface.
Will there be a tz_context focused scheduler? I can imagine one where the SPM associates NS clients with running Secure Partitions and resumes execution of a running a Secure Partition when the NS RTOS indicates the NS client is "active." But it isn't clear to me if this is the direction or not.
I note that the text for TZ_LoadContext_S says "**Prepare** the secure context for execution so that a thread in the non-secure state can call secure library modules." Preparing sounds different than immediate execution.
Are there other documents I can read or source implementations I can reference to get more of a handle on this?
Thanks,
Erik Shreve, PSEM
Software Security Engineer & Architect (CMCU Platform Development)
Texas Instruments
Hi Erik,
Yes, the two working models are mutually exclusive, and the model is selected compile time, by choosing a corresponding config cmake file.
I'm afraid there isn't a comprehensive specification for the Library model, like for IPC model.
For examples on calling services in library model, you can have a look in the Secure Service API implementations. Look in interface/src for NS, and in the directory of the services, for example services/initial_attestation/tfm_attestation_secure_api.c for secure examples.
The working model specific code parts that are in common files are (across the whole TF-M codebase) guarded by the TFM_PSA_API define:
#ifdef TFM_PSA_API
/* IPC model implementation */
#else /* TFM_PSA_API */
/* Library model implementation */
#endif /* TFM_PSA_API */
If a source file is IPC model specific it has 'ipc' in its name, or is in a directory called 'ipc'. Library model specific files have 'func' in their name.
Regards,
Mate
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org> On Behalf Of Shreve, Erik via TF-M
Sent: Friday, March 13, 2020 1:34 PM
To: tf-m(a)lists.trustedfirmware.org
Subject: Re: [TF-M] Cooperative Scheduling Rules and CMSIS tz_context
Mate,
Thanks for providing your time to respond. Understanding there are two models in play helps a lot. A few follow up questions...
Can you confirm that the models are mutually exclusive?
This page indicates to me that they are mutually exclusive: https://ci.trustedfirmware.org/job/tf-m-build-test-nightly/lastSuccessfulBu…
Or are there provisions to allow a system to run both models on the _same_ execution node? For example running both on the same v7 or v8 with TrustZone core.
If there are provisions for this, where can I go to get a better understanding of them?
Also, is there any spec for the library model beyond the API descriptions in the CMSIS documentation (https://arm-software.github.io/CMSIS_5/Core/html/using_TrustZone_pg.html#RT…
If not, then I'll rely on reading the existing code.
Please feel free to point me more documentation if I can get answers there. (I don't mind a friendly "read the manual" or "run this example code.")
Thanks again for your time.
Erik Shreve, PSEM
Software Security Engineer & Architect (CMCU Platform Development)
From: TF-M [mailto:tf-m-bounces@lists.trustedfirmware.org] On Behalf Of Mate Toth-Pal via TF-M
Sent: Thursday, March 12, 2020 2:34 PM
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Cc: nd
Subject: [EXTERNAL] Re: [TF-M] Cooperative Scheduling Rules and CMSIS tz_context
Hi Erik,
Your mail contains quite a few questions. I'll try to give an overview in the topic you are asking, hoping that it is going to answer most (hopefully all) questions. If there is anything left unanswered, please follow up.
So currently TF-M supports two working model: IPC model, and Library model. Library model has no scheduling on the secure side, secure services are called like normal functions.
The IPC model contains a simple scheduler. (IPC model is an implementation of the PSA Firmware Framework for M (PSA-FF-M): https://developer.arm.com/-/media/Files/pdf/PlatformSecurityArchitecture/Ar… ) In this model, all the secure partitions have a runtime context associated with them. They have a separate stack, and their runtime state is saved when they are not scheduled by the Secure scheduler. This means that the number of secure contexts is determined compile time.
'Cooperative Scheduling Rules' document is a proposal of how a Non-Secure and a Secure scheduler should work together, to efficiently handle secure and non-secure asynchronous events. Currently The Secure scheduler has no knowledge of the state of the NS scheduler, and vice-versa. There is a roadmap item 'Scheduler - Initial Support' (currently without deadline) at https://developer.trustedfirmware.org/w/tf_m/planning/ which refers to this.
The RTOS Thread Context Management API implementation had been added to TF-M, to support a feature called TF-M Non-Secure client identification. It is used by the
/**
* \brief Stores caller's client id in state context
*/
void tfm_core_get_caller_client_id_handler(const uint32_t svc_args[]);
API that is available for the secure services. Through this API secure services can distinguish Non-Secure clients.
TF-M manages a database of clients reported by the Non secure software (through the TZ_*APIs), and also tracks the ID of the currently active client. That client ID is returned by 'tfm_core_get_caller_client_id_handler' (in case the Secure Service was called from Non-Secure code)
Please note that the 'tfm_core_get_caller_client_id_handler' API is only supported in Library model. The TZ_* APIs are implemented in the NSPM module: secure_fw/core/tfm_nspm_func.c.
The IPC model's implementation of the NSPM module is empty: secure_fw/core/tfm_nspm_ipc.c
The original idea behind the RTOS Thread Context Management API is that the Secure context creation (allocating stack, and other supporting data structures) and destruction is initiated from the Non-Secure code. The number of active secure contexts is decided by the NS code in run time (of course if there is no more resource, the context creation fails). On scheduling, the non-secure SW notifies the secure code, to activate the context for the thread to be scheduled. This results in setting the secure stack pointer to the desired context (hence the "prepare"). After the NS Scheduler returns to the thread, the Secure or the Non-Secure code continues execution, depending on whether Secure, or Non-Secure execution had been interrupted by the NS scheduling.
Neither the single secure context design of the library model, and the above described design of the IPC model supports this approach, and that's why the semantics of the APIs in TF-M are different in TF-M.
Regards,
Mate
From: TF-M <tf-m-bounces(a)lists.trustedfirmware.org<mailto:tf-m-bounces@lists.trustedfirmware.org>> On Behalf Of Shreve, Erik via TF-M
Sent: Wednesday, March 11, 2020 3:07 PM
To: tf-m(a)lists.trustedfirmware.org<mailto:tf-m@lists.trustedfirmware.org>
Subject: [TF-M] Cooperative Scheduling Rules and CMSIS tz_context
Hello everyone, I'm new to the list. Quick introduction: I work at Texas Instruments and I'm investigating TFM for use in some of our platforms here.
I'm trying to understand the scheduling and task architecture for PSA and the TFM implementation and how that intersects with NS tasks.
I found the following:
https://ci.trustedfirmware.org/job/tf-m-build-test-nightly/lastSuccessfulBu…https://arm-software.github.io/CMSIS_5/Core/html/group__context__trustzone_…
The existing TFM implementation doesn't seem to do much with the tz_context interface. It looks like it tracks which NS client is active, but I don't see where it uses that information to do anything.
Am I missing something?
I'd like to understand more about where TFM is going with the tz_context interface.
Will there be a tz_context focused scheduler? I can imagine one where the SPM associates NS clients with running Secure Partitions and resumes execution of a running a Secure Partition when the NS RTOS indicates the NS client is "active." But it isn't clear to me if this is the direction or not.
I note that the text for TZ_LoadContext_S says "**Prepare** the secure context for execution so that a thread in the non-secure state can call secure library modules." Preparing sounds different than immediate execution.
Are there other documents I can read or source implementations I can reference to get more of a handle on this?
Thanks,
Erik Shreve, PSEM
Software Security Engineer & Architect (CMCU Platform Development)
Texas Instruments