Re: [mbed-tls] mbedtls_x509write_crt_set_subject_name, mbedtls_x509_dn_gets fail to handle UTF8 multibyte properly

11 Jun 2020


On 11/06/2020 11:24, Martin Man via mbed-tls wrote:
...
> The code in mbedtls_x509_dn_gets fails to properly handle the UTF-8
> multibyte sequence 0xe2 0x80 0x99 and turns it into 0xe2 0x80 0x3f.
> There is a fix floating around development branch mentioned
> here https://github.com/ARMmbed/mbedtls/pull/3326/files which
> essentially replaces all control chars with question marks.
>
> I think this is a bug and the dn_gets should simply leave the UTF-8
> multibyte untouched when parsing it out from a field tagged with ASN.1
> tag 12 (utf-8).

That code is from an earlier era (mid 2000s, I think) when most systems
used an 8-bit encoding, but non-8-bit-clean systems were still common. A
'\x80' in text might be transformed to '\x00' with disastrous consequences.
But over a decade later, I don't think non-8-bit-clean systems are a
concern anymore. Leaving all non-ASCII characters alone sounds
reasonable to me.
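
To make the difference concrete, here is a minimal sketch of the two
policies. It is not the library source: both function names are
hypothetical, and the byte ranges in sanitize_legacy are an assumption
chosen to reproduce the reported output (0xe2 0x80 0x99 becoming
0xe2 0x80 0x3f). sanitize_8bit_clean only replaces ASCII control
characters and leaves every byte >= 0x80 alone.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not an Mbed TLS API. Assumed legacy policy:
 * ASCII control characters, DEL and the bytes 0x81..0x9f become '?'.
 * This mangles UTF-8 continuation bytes that fall in that range. */
static void sanitize_legacy(const unsigned char *in, size_t len,
                            unsigned char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        out[i] = (c < 32 || c == 127 || (c > 128 && c < 160)) ? '?' : c;
    }
}

/* Hypothetical helper, not an Mbed TLS API. 8-bit-clean policy:
 * only ASCII control characters and DEL become '?'; all non-ASCII
 * bytes pass through, so multibyte UTF-8 sequences stay intact. */
static void sanitize_8bit_clean(const unsigned char *in, size_t len,
                                unsigned char *out)
{
    assert(in != NULL && out != NULL);
    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        out[i] = (c < 32 || c == 127) ? '?' : c;
    }
}
```

Fed the bytes 0xe2 0x80 0x99, the first function yields 0xe2 0x80 0x3f
and the second returns the input unchanged. Neither one validates that
the input is well-formed UTF-8.
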
We are not going to do Unicode normalization in Mbed TLS: that would be
far too complex for a library that runs on systems with ~1e5 bytes
available for code. So Unicode strings would only be processed correctly
if the application passes normalized strings and CAs only generate
certificates with normalized strings. But that would be an improvement
on converting non-ASCII characters to '?'.
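
As an illustration of why normalization matters, here is a small sketch,
assuming a comparison that does no normalization at all (plain
byte-for-byte). The helper name and the two sample arrays are made up
for this example and are not part of Mbed TLS. The same visible
character "é" can be encoded as U+00E9, or as "e" followed by the
combining accent U+0301, and the two encodings do not match as bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not an Mbed TLS API: compares two strings
 * byte for byte, with no Unicode normalization. */
static int names_equal_bytewise(const unsigned char *a, size_t a_len,
                                const unsigned char *b, size_t b_len)
{
    assert(a != NULL && b != NULL);
    return a_len == b_len && memcmp(a, b, a_len) == 0;
}

/* "é" precomposed: U+00E9, UTF-8 bytes c3 a9 */
static const unsigned char e_acute_composed[] = { 0xc3, 0xa9 };

/* "é" decomposed: 'e' (U+0065) + U+0301, UTF-8 bytes 65 cc 81 */
static const unsigned char e_acute_decomposed[] = { 0x65, 0xcc, 0x81 };
```

names_equal_bytewise reports the two arrays as different even though
they render identically, which is why both sides have to agree on one
normalized form up front.
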
-- 
Gilles Peskine
Mbed TLS developer
