Tegra calls console_flush() and then console_switch_state(0) in tegra_pwr_domain_power_down_wfi(), which no other platform does.
Should console_flush() always be called by default whenever the console state is switched, and should the console also be disabled when the system goes down?
It seems that the Tegra platform intentionally stops all console output when going into suspend and turns it back on after resume. I'm not sure why they do that, but there may be a platform-specific reason (e.g. some code on the resume path might try to print to the UART before its clock is re-enabled, which could make it hang).
If you care about seeing all console output during suspend, then you should generally call console_flush() explicitly somewhere at the end, yes. The suspend code is always platform-specific, though, so this is something every platform needs to implement on its own. I assume most forget to do it because, in practice, UART output tends to come out anyway even without an explicit flush call.
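To make the ordering concrete, here is a small self-contained C simulation of the Tegra-style suspend path. It is not TF-A code: everything prefixed with sim_ is hypothetical, including a fake UART whose characters wait in a 16-byte TX FIFO until flushed, a global console state, and two made-up power-down/resume hooks. It only assumes the behavior described above: a console prints only while its flags match the current state, and the power-down hook flushes before switching the state to 0.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scope flags for this simulation (not TF-A's header values). */
#define SIM_FLAG_BOOT    (1u << 0)
#define SIM_FLAG_RUNTIME (1u << 1)

/* Fake UART: characters wait in a small TX FIFO until flushed to the "wire". */
typedef struct {
	unsigned int flags;
	char fifo[16];
	size_t fifo_len;
	char wire[128];
	size_t wire_len;
} sim_uart_t;

static sim_uart_t uart;
static unsigned int sim_console_state = SIM_FLAG_BOOT;

/* Move everything queued in the FIFO onto the wire. */
static void sim_uart_drain(sim_uart_t *u)
{
	for (size_t i = 0; i < u->fifo_len && u->wire_len < sizeof(u->wire) - 1; i++)
		u->wire[u->wire_len++] = u->fifo[i];
	u->wire[u->wire_len] = '\0';
	u->fifo_len = 0;
}

/* Output is dropped unless the console's flags match the current state. */
static void sim_console_putc(char c)
{
	if ((uart.flags & sim_console_state) == 0)
		return;
	if (uart.fifo_len == sizeof(uart.fifo))
		sim_uart_drain(&uart); /* FIFO full: wait for it to empty */
	uart.fifo[uart.fifo_len++] = c;
}

static void sim_console_puts(const char *s)
{
	while (*s != '\0')
		sim_console_putc(*s++);
}

static void sim_console_flush(void)
{
	if ((uart.flags & sim_console_state) != 0)
		sim_uart_drain(&uart);
}

static void sim_console_switch_state(unsigned int state)
{
	sim_console_state = state;
}

/* Hypothetical power-down hook following the Tegra pattern. */
static void sim_pwr_domain_power_down_wfi(void)
{
	sim_console_flush();         /* push out everything already queued */
	sim_console_switch_state(0); /* no console matches state 0 */
	/* ...the UART clock could be gated and WFI executed here... */
}

/* Hypothetical resume hook: re-enable output only once the UART is usable. */
static void sim_pwr_domain_on_finish(void)
{
	/* ...the UART clock would be re-enabled here first... */
	sim_console_switch_state(SIM_FLAG_RUNTIME);
}
```

In this model, anything printed while the state is 0 is silently dropped, and output queued before power-down only reaches the wire because of the explicit flush.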
Why isn't console_switch_state(CONSOLE_FLAG_RUNTIME) called from bl31_main(), given that bl31_main() already calls console_flush() right before bl31_plat_runtime_setup()?
I believe this may be a holdover from back when MULTI_CONSOLE_API was an optional feature that a platform had to enable explicitly. Now that it is the default, moving that call to bl31_main() would probably make sense.
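A rough sketch of the ordering suggested here, written as a standalone simulation rather than actual bl31_main() code: flush the boot output, switch to the runtime scope, then call the platform's runtime setup hook. The sim_ functions are hypothetical stand-ins that only record the order in which they run.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical scope flags for this sketch. */
#define SIM_FLAG_BOOT    (1u << 0)
#define SIM_FLAG_RUNTIME (1u << 1)

static unsigned int sim_console_state = SIM_FLAG_BOOT;
static char call_log[64]; /* records the order of the steps below */

static void log_step(const char *step)
{
	assert(strlen(call_log) + strlen(step) + 1 < sizeof(call_log));
	strcat(call_log, step);
	strcat(call_log, ";");
}

static void sim_console_flush(void)
{
	log_step("flush");
}

static void sim_console_switch_state(unsigned int state)
{
	sim_console_state = state;
	log_step("switch");
}

/* Hypothetical platform hook; a platform could still adjust console flags here. */
static void sim_plat_runtime_setup(void)
{
	log_step("plat_runtime_setup");
}

/* Suggested tail of cold boot: flush boot output, enter the runtime scope,
 * then let the platform do its own runtime setup. */
static void sim_bl31_main_tail(void)
{
	sim_console_flush();
	sim_console_switch_state(SIM_FLAG_RUNTIME);
	sim_plat_runtime_setup();
}
```

Doing the switch before the platform hook would, in this sketch, still leave a platform free to pick a different state in its own runtime setup if it has a reason to.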
I would like to understand what the right behavior should be. Why are platforms removing the CRASH flag after registration? (I see that many platforms have their own plat_crash_console_init(), but the crash console is pretty much the same as the regular console.) And why is the runtime console set up directly in bl31_early_platform_setup2() when the guidance says it should be done much later?
I don't know the details of each platform or why they override these flags; they may all have their own reasons. The default (BOOT | CRASH) should be right for most cases. If a console additionally sets the RUNTIME flag, that just means TF-A will keep printing to it after the main operating system has booted. These might be platforms where the main operating system and TF-A use different UART ports, so they don't clash and there's no need to disable the TF-A output.
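The flag semantics described above can be modelled roughly as follows. This is a simplified, hypothetical C model (sim_ names, flag values chosen for the sketch), not the TF-A console API: each console carries a set of scope flags, and a character reaches a console only if one of its flags matches the current global state. One console is assumed to be shared with the OS and uses the default BOOT | CRASH; the other is assumed to sit on a separate port and also sets RUNTIME.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scope flags for this sketch. */
#define SIM_FLAG_BOOT    (1u << 0)
#define SIM_FLAG_RUNTIME (1u << 1)
#define SIM_FLAG_CRASH   (1u << 2)

typedef struct sim_console {
	struct sim_console *next;
	unsigned int flags;
	char out[64]; /* captured output, standing in for a UART */
	size_t len;
} sim_console_t;

static sim_console_t *sim_console_list;
static unsigned int sim_console_state = SIM_FLAG_BOOT;

static void sim_console_register(sim_console_t *c, unsigned int flags)
{
	assert(c != NULL);
	c->flags = flags;
	c->len = 0;
	c->out[0] = '\0';
	c->next = sim_console_list;
	sim_console_list = c;
}

/* A console only receives output if one of its flags matches the current state. */
static void sim_console_puts(const char *s)
{
	for (sim_console_t *c = sim_console_list; c != NULL; c = c->next) {
		if ((c->flags & sim_console_state) == 0)
			continue;
		for (const char *p = s; *p != '\0' && c->len < sizeof(c->out) - 1; p++)
			c->out[c->len++] = *p;
		c->out[c->len] = '\0';
	}
}

static void sim_console_switch_state(unsigned int state)
{
	sim_console_state = state;
}
```

After the switch to the runtime state, only the console with RUNTIME still receives output and the shared one goes quiet, so in this model TF-A does not clash with the OS on the shared UART. The CRASH flag is not consulted here at all; the crash path checks it separately.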
Many of the platforms that override the plat_crash_console functions may just be older platforms that implemented this manually (usually following the same standard pattern) before the default implementation was added. They could probably be converted to use the default instead, but I assume nobody cares enough to put in that effort. It technically doesn't matter whether they clear CONSOLE_FLAG_CRASH (once you override the plat_crash_console macros, nothing looks at that flag anymore), but they may have done it anyway for clarity.
Also, commit 63c52d0071ef ("plat/common/crash_console_helpers.S: Fix MULTI_CONSOLE_API support") removed CONSOLE_FLAG_CRASH from plat_crash_console_init, but only in the 64-bit version; the 32-bit version still has it. This suggests that no C code should be called there. Do we really need CONSOLE_FLAG_CRASH?
The 64-bit version still respects the CRASH flag; it is just checked by plat_crash_console_putc() directly. Before that patch, the crash console worked by calling console_set_scope(CRASH) and then having plat_crash_console_putc() and plat_crash_console_flush() chain into the normal console_putc() and console_flush() functions. That worked back when the entire console framework was still written in assembly. Then somebody rewrote it in C, which broke this crash console implementation, because console_putc() and console_flush() no longer work without a valid stack. So that patch essentially reimplements in assembly what the main console framework does (loop through all registered consoles and call each console's individual function pointers). Those functions check CONSOLE_FLAG_CRASH directly, so there was no longer any reason to call console_set_scope() there.
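In C terms, the loop that the 64-bit assembly performs looks roughly like the sketch below. It only illustrates the logic: the real code has to be assembly precisely because no valid stack can be assumed at crash time, and the sim_ types and names here are hypothetical rather than TF-A's own. The point is that the CRASH flag is tested on each console directly and no global console state is involved, so nothing needs to call console_set_scope() first.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical scope flags for this sketch. */
#define SIM_FLAG_BOOT    (1u << 0)
#define SIM_FLAG_RUNTIME (1u << 1)
#define SIM_FLAG_CRASH   (1u << 2)

typedef struct sim_console {
	struct sim_console *next;
	unsigned int flags;
	int (*putc_fn)(int c, struct sim_console *console); /* driver callback */
	char out[32];
	size_t len;
} sim_console_t;

static sim_console_t *sim_console_list;

/* Example driver callback: append the character to this console's buffer. */
static int sim_buffer_putc(int c, sim_console_t *console)
{
	if (console->len >= sizeof(console->out) - 1)
		return -1;
	console->out[console->len++] = (char)c;
	console->out[console->len] = '\0';
	return c;
}

/* Crash-path putc: walk every registered console and call its own callback
 * if it carries the CRASH flag. No global console state is consulted, so no
 * scope-switching call is needed beforehand. */
static int sim_crash_console_putc(int c)
{
	for (sim_console_t *con = sim_console_list; con != NULL; con = con->next) {
		if ((con->flags & SIM_FLAG_CRASH) != 0 && con->putc_fn != NULL)
			(void)con->putc_fn(c, con);
	}
	return c;
}
```

A crash-path flush would follow the same pattern, calling each matching console's flush callback instead.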
The 32-bit version is honestly just in disrepair and doesn't work. I only fixed the 64-bit version because I don't have a 32-bit device to test with. In theory it shouldn't be hard to implement the equivalent of what the current 64-bit version does there as well, but somebody else wrote the 32-bit version and I guess they're no longer around to maintain it.