Hi guys,

I’m new to the list and bringing the discussion over here from https://github.com/ARMmbed/mbedtls/issues/3413.

I’m creating a certificate using mbedtls and setting it’s issuer and and subject using the mbedtls_x509write_crt_set_subject_name, and mbedtls_x509write_crt_set_issuer_name.

The name passed in is in UTF8 and contains a sequence 0xe2 0x80 0x99 (apostrophe) in the CN string. If I debugged this correctly, the underlying sequence of bytes is correctly encoded in ASN.1 and tagged as 12 (UTF-8).

When I later parse the cert and try to extract its subject back using mbedtls_x509_dn_gets from crt.subject and crt.issuer the UTF-8 gets corrupted.

The code in mbedtls_x509_dn_gets fails to properly handle the UTF-8 multibyte sequence 0xe2 0x80 0x99 and turns it into 0xe2 0x80 0x3f.

There is a fix floating around development branch mentioned here https://github.com/ARMmbed/mbedtls/pull/3326/files which essentially replaces all control chars with question marks.

I think this is a bug and the dn_gets should simply leave the UTF-8 multibyte untouched when parsing it out from a field tagged with ASN.1 tag 12 (utf-8).

What’s your opinion?
Martin