Re: [mbed-tls] mbedtls_x509write_crt_set_subject_name, mbedtls_x509_dn_gets fail to handle UTF8 multibyte properly

11 Jun 2020


On 11 Jun 2020, at 12:09, Gilles Peskine via mbed-tls mbed-tls@lists.trustedfirmware.org wrote:
On 11/06/2020 11:24, Martin Man via mbed-tls wrote:
...
I think this is a bug and the dn_gets should simply leave the UTF-8
multibyte untouched when parsing it out from a field tagged with ASN.1
tag 12 (utf-8).
We are not going to do Unicode normalization in Mbed TLS: that would be
far too complex for a library that runs on systems with ~1e5 bytes
available for code. So Unicode strings would only be processed correctly
if the application passes normalized strings and CAs only generate
certificates with normalized strings. But that would be an improvement
on converting non-ASCII characters to '?'.
Definitely agree that normalization is not needed. I think this problem could be split into two parts:
1) When a const char* is passed into mbedtls_x509write_crt_set_subject_name, Mbed TLS currently encodes it as an ASN.1 UTF8String (tag 12). I am not sure what validation is done, but it could perhaps do at least a basic check that the C string passed in is valid UTF-8, to avoid generating a cert with a malformed DN. Alternatively, you can simply trust the developer to pass in correct UTF-8 and document this. This is an API design decision about what input the function accepts and what validation it performs.
2) When mbedtls_x509_dn_gets extracts a C string from an ASN.1 field tagged as 12, it could validate that the content is indeed valid UTF-8, or just leave it as is and pass it through. Again, this is about what we expect the library to do.
I’m not an expert on whether this could in any way be used to trick Mbed TLS into doing bad things when it is given a malformed certificate, say, one where the DN is tagged as UTF-8 but contains illegal UTF-8 in the payload.
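To make the "basic validation" in both points more concrete, here is a rough sketch of a well-formedness check. The function name utf8_is_well_formed is hypothetical and is not part of the Mbed TLS API; this only illustrates what such a check might involve (per RFC 3629), and it does no normalization at all.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, NOT an Mbed TLS function.
 * Returns 1 if buf[0..len) is well-formed UTF-8 as defined by RFC 3629,
 * 0 otherwise. Rejects truncated sequences, overlong encodings, UTF-16
 * surrogates and code points above U+10FFFF. No normalization is done. */
int utf8_is_well_formed(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = buf[i];
        size_t n;       /* number of continuation bytes expected */
        uint32_t cp;    /* decoded code point */
        uint32_t min;   /* smallest code point allowed for this length */
        size_t k;

        if (c < 0x80) {                 /* plain ASCII */
            i++;
            continue;
        } else if ((c & 0xE0) == 0xC0) {
            n = 1; cp = c & 0x1F; min = 0x80;
        } else if ((c & 0xF0) == 0xE0) {
            n = 2; cp = c & 0x0F; min = 0x800;
        } else if ((c & 0xF8) == 0xF0) {
            n = 3; cp = c & 0x07; min = 0x10000;
        } else {
            return 0;                   /* stray continuation or bad lead byte */
        }

        if (len - i - 1 < n)
            return 0;                   /* sequence is truncated */

        for (k = 1; k <= n; k++) {
            unsigned char cc = buf[i + k];
            if ((cc & 0xC0) != 0x80)
                return 0;               /* not a continuation byte */
            cp = (cp << 6) | (cc & 0x3F);
        }

        if (cp < min)
            return 0;                   /* overlong encoding */
        if (cp >= 0xD800 && cp <= 0xDFFF)
            return 0;                   /* UTF-16 surrogate */
        if (cp > 0x10FFFF)
            return 0;                   /* outside the Unicode range */

        i += n + 1;
    }
    return 1;
}
```

Something along these lines could in principle run on the input to mbedtls_x509write_crt_set_subject_name and on the bytes that mbedtls_x509_dn_gets reads from a tag-12 field. Whether the code size is acceptable for the library is of course for the maintainers to judge.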
thanks for listening,
Martin
