mbedtls_x509write_crt_set_subject_name, mbedtls_x509_dn_gets fail to handle UTF8 multibyte properly - mbed-tls

11 Jun 2020


      Hi guys,
I’m new to the list and bringing the discussion over here from https://github.com/ARMmbed/mbedtls/issues/3413 https://github.com/ARMmbed/mbedtls/issues/3413.
I’m creating a certificate using mbedtls and setting it’s issuer and and subject using the mbedtls_x509write_crt_set_subject_name, and mbedtls_x509write_crt_set_issuer_name.
The name passed in is in UTF8 and contains a sequence 0xe2 0x80 0x99 (apostrophe) in the CN string. If I debugged this correctly, the underlying sequence of bytes is correctly encoded in ASN.1 and tagged as 12 (UTF-8).
When I later parse the cert and try to extract its subject back using mbedtls_x509_dn_gets from crt.subject and crt.issuer the UTF-8 gets corrupted.
The code in mbedtls_x509_dn_gets fails to properly handle the UTF-8 multibyte sequence 0xe2 0x80 0x99 and turns it into 0xe2 0x80 0x3f.
There is a fix floating around development branch mentioned here https://github.com/ARMmbed/mbedtls/pull/3326/files https://github.com/ARMmbed/mbedtls/pull/3326/files which essentially replaces all control chars with question marks.
I think this is a bug and the dn_gets should simply leave the UTF-8 multibyte untouched when parsing it out from a field tagged with ASN.1 tag 12 (utf-8).
What’s your opinion?
Martin