Besides what Soby mentioned earlier (items 2b and 2c below), the TF-A team at Arm is working on the following optimizations:

1. Remove unused NS-EL2 init code from TF-A. This code exists for legacy reasons, for systems where NS-EL2 is not present and EL3 boots directly into NS-EL1. On all recent platforms, NS-EL2 is mandatory.
    The code will be guarded by INIT_UNUSED_NS_EL2 for legacy support.

2. TF-A statically allocates a context entry per world, per CPU, containing the EL3, EL2 and EL1 registers, which makes it large. A few optimizations can be done:
      a) Move the EL3 registers from the CPU context to a per-world variable, as they are the same for all CPUs.

      b) The EL1 part of the struct is only used in some configurations and appears to be mutually exclusive with the EL2 part, so it should be compiled out when not needed. So, with CTX_INCLUDE_EL2_REGS=1, do not save the EL1 registers (a few special cases need to be taken care of).

      c) The EL2 part of the struct is heavily feature dependent, but its memory is always allocated, regardless of whether those features are enabled. The context entries should be coupled with their respective feature enables and be optimized away when unused.
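To make 2a-2c concrete, here is a minimal sketch of how such a layout could look. It is loosely modelled on TF-A's context structures but is not TF-A code: the type and field names, NUM_CPUS, NUM_WORLDS and the reduced register sets are hypothetical, while CTX_INCLUDE_EL2_REGS and the ENABLE_FEAT_FGT/ECV/HCX names follow TF-A's build-option naming. The point it shows is that the memory for a disabled feature, or for the unused EL1 half, is compiled out rather than merely left unused.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Build options: hypothetical defaults, normally passed as -D flags. */
#ifndef CTX_INCLUDE_EL2_REGS
#define CTX_INCLUDE_EL2_REGS 1
#endif
#ifndef ENABLE_FEAT_FGT
#define ENABLE_FEAT_FGT 0
#endif
#ifndef ENABLE_FEAT_ECV
#define ENABLE_FEAT_ECV 0
#endif
#ifndef ENABLE_FEAT_HCX
#define ENABLE_FEAT_HCX 0
#endif

#define NUM_CPUS   8   /* hypothetical platform */
#define NUM_WORLDS 3   /* e.g. Secure, Non-secure, Realm */

typedef uint64_t reg_t;

/* 2a: EL3 state is identical on every CPU of a world, so keep it per world. */
typedef struct {
	reg_t scr_el3;
	reg_t mdcr_el3;
	reg_t cptr_el3;
} el3_world_state_t;

/* 2b: EL1 registers (representative subset), only needed without EL2 regs. */
typedef struct {
	reg_t sctlr_el1, tcr_el1, ttbr0_el1, ttbr1_el1, mair_el1, vbar_el1;
} el1_sysregs_t;

/* 2c: EL2 registers, with feature registers tied to their feature enables. */
typedef struct {
	reg_t hcr_el2, sctlr_el2, vbar_el2;
#if ENABLE_FEAT_FGT
	reg_t hfgrtr_el2, hfgwtr_el2, hfgitr_el2, hdfgrtr_el2, hdfgwtr_el2;
#endif
#if ENABLE_FEAT_ECV
	reg_t cntpoff_el2;
#endif
#if ENABLE_FEAT_HCX
	reg_t hcrx_el2;
#endif
} el2_sysregs_t;

/* Per-CPU, per-world context: no EL3 state, and only one of EL1/EL2. */
typedef struct {
	reg_t gp_regs[32];      /* x0-x30 and SP_EL0 */
#if CTX_INCLUDE_EL2_REGS
	el2_sysregs_t el2;      /* EL1 part compiled out (2b) */
#else
	el1_sysregs_t el1;
#endif
} cpu_context_t;

/* With every optional feature disabled, EL2 costs only its base registers. */
#if !ENABLE_FEAT_FGT && !ENABLE_FEAT_ECV && !ENABLE_FEAT_HCX
static_assert(sizeof(el2_sysregs_t) == 3 * sizeof(reg_t),
	      "disabled features must not add to the context");
#endif

static el3_world_state_t el3_state[NUM_WORLDS];       /* one copy per world */
static cpu_context_t cpu_ctx[NUM_CPUS][NUM_WORLDS];   /* one per CPU per world */

/* Total statically allocated context memory, in bytes. */
size_t context_footprint(void)
{
	return sizeof(el3_state) + sizeof(cpu_ctx);
}
```

Building with -DENABLE_FEAT_FGT=1 would grow every CPU's EL2 context by five registers, and only on platforms that opt in; building with -DCTX_INCLUDE_EL2_REGS=0 swaps in the EL1 part instead of carrying both.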

Thanks
Manish

From: Soby Mathew via TF-A <tf-a@lists.trustedfirmware.org>
Sent: 17 July 2023 14:05
To: Okash Khawaja <okash@google.com>; tf-a@lists.trustedfirmware.org <tf-a@lists.trustedfirmware.org>
Subject: [TF-A] Re: The increasing size of BL31
 
Hi Okash,
Thanks for raising this. There have been many discussions on this within the team. There are security trade-offs when moving data structures from SRAM to DRAM, as they become susceptible to DDR attacks such as Rowhammer. So any move of data out of SRAM needs to be preceded by an analysis of these trade-offs, and the decision would need to be platform specific, based on the threat model of the deployment. Also, it would be good if TF-A had a framework to easily allocate memory in DDR rather than via linker scripts as is done today. One possible mitigation for DDR allocations is a mechanism to store a SHA digest of the data in SRAM so that BL31 can verify its integrity (although the PoC-PoU threat remains, which needs to be mitigated separately).
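A rough sketch of keeping a buffer in DRAM while holding its digest in SRAM might look like the following. Assumptions: the `.bss.dram` section name and the `DRAM_BSS` macro are hypothetical, and a platform linker script would have to map that section to DRAM; GCC and Clang emit a section whose name starts with `.bss.` as NOBITS, so it adds nothing to the loaded image (the property Okash points out below); ordinary `.bss`, where the digest lives, is assumed to be in SRAM. The 64-bit FNV-1a hash is only a self-contained placeholder and is not cryptographic; a real implementation would use SHA-256 or similar. The sketch does nothing about the PoC-PoU issue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical NOBITS section that a platform linker script places in DRAM. */
#define DRAM_BSS __attribute__((section(".bss.dram")))

#define DRAM_BUF_SIZE 4096

static uint8_t dram_buf[DRAM_BUF_SIZE] DRAM_BSS; /* bulk data: DRAM */
static uint64_t dram_buf_digest;                 /* digest: SRAM (.bss) */

/* Placeholder digest (64-bit FNV-1a). NOT cryptographic; use SHA-256 in practice. */
static uint64_t digest(const uint8_t *p, size_t len)
{
	uint64_t h = 0xcbf29ce484222325ULL;

	for (size_t i = 0; i < len; i++) {
		h ^= p[i];
		h *= 0x100000001b3ULL;
	}
	return h;
}

/* Record the digest after every legitimate update of the DRAM buffer. */
void dram_buf_seal(void)
{
	dram_buf_digest = digest(dram_buf, sizeof(dram_buf));
}

/* Check the DRAM buffer against the SRAM digest before trusting it. */
bool dram_buf_verify(void)
{
	return digest(dram_buf, sizeof(dram_buf)) == dram_buf_digest;
}

/* Example writer: update the DRAM copy, then reseal. */
void dram_buf_write(size_t off, const void *src, size_t len)
{
	assert(off <= sizeof(dram_buf) && len <= sizeof(dram_buf) - off);
	memcpy(&dram_buf[off], src, len);
	dram_buf_seal();
}
```

Every reader would call dram_buf_verify() before use, which costs a full pass over the buffer; that runtime cost is part of the per-platform trade-off described above.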

From my PoV, these are some of the improvements that need to be made:

1. New architectural features should not affect platforms that do not use them. Today, even though the code is compiled out with the corresponding ENABLE_FEAT_yyy flag, the data footprint cost remains. The EL2 sysreg context is a single data structure, and every feature register added to it increases the memory footprint: https://github.com/ARM-software/arm-trusted-firmware/blob/master/include/lib/el3_runtime/aarch64/context.h#L158

2. Make it easier for platforms to reclaim boot code and use it as R/W data, such as stack. We did some work on this with the introduction of the `__init` attribute. As TF-A has evolved since that initial work, there are probably more functions that could be given the `__init` attribute. I am not sure whether any platform other than FVP uses this reclaim feature.

3. Add the capability to reclaim boot-time data at runtime. Such data could be marked with an `__initdata` attribute. The init data that come to mind are the RW data used by FCONF and perhaps by XLAT_TABLE_LIB.

4. Remove the EL1 context when the SPM is present in S-EL2. This requires Hafnium to save the incoming NS-EL1 context, which it does not do today. It would also help CCA systems, since the RMM follows the same model, and it is potentially a large saving: in a CCA system it saves 528 bytes of SRAM per CPU, so on a 64-CPU system it saves about 33 KB (528 × 64 = 33,792 bytes). This will help any S-EL2/CCA system reduce its SRAM footprint.
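Items 2 and 3 rely on tagging boot-only code and data so that a linker script can group them and hand the memory back after cold boot. A minimal sketch of the annotations, assuming the section names `.text.init` and `.data.init` and a build switch named after TF-A's `RECLAIM_INIT_CODE` option; `__initdata` is not in TF-A today (it is the proposal in item 3), the configuration table and function names are hypothetical, and the linker-script and reclaim steps are not shown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifndef RECLAIM_INIT_CODE
#define RECLAIM_INIT_CODE 1
#endif

/* Boot-only code and data go to dedicated sections that can be reclaimed. */
#if RECLAIM_INIT_CODE
#define __init     __attribute__((section(".text.init")))
#define __initdata __attribute__((section(".data.init")))
#else
#define __init
#define __initdata
#endif

/* Hypothetical boot-time table, only needed while parsing configuration. */
static uint32_t boot_cfg_table[64] __initdata = { 1 };

/* Runs once during cold boot; its code can be reclaimed afterwards. */
static uint32_t __init boot_parse_config(void)
{
	uint32_t sum = 0;

	for (size_t i = 0; i < sizeof(boot_cfg_table) / sizeof(boot_cfg_table[0]); i++)
		sum += boot_cfg_table[i];
	return sum;
}

/* Runtime state stays in ordinary .data/.bss and is never reclaimed. */
static uint32_t runtime_cfg;
static bool boot_done;

/* Cold-boot entry: the only caller of __init code. */
void bl_setup(void)
{
	assert(!boot_done);              /* __init code must run only once */
	runtime_cfg = boot_parse_config();
	boot_done = true;
	/* From here on a platform could reuse .text.init/.data.init memory. */
}

/* Runtime service: must not touch any __init/__initdata object. */
uint32_t runtime_get_cfg(void)
{
	return runtime_cfg;
}
```

The design constraint is that nothing reachable at runtime may reference an `__init` function or `__initdata` object once the memory has been reclaimed, so any result needed later must be copied into ordinary data first, as runtime_cfg is here.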

Best Regards
Soby Mathew

> -----Original Message-----
> From: Okash Khawaja via TF-A <tf-a@lists.trustedfirmware.org>
> Sent: Monday, July 17, 2023 12:54 PM
> To: tf-a@lists.trustedfirmware.org
> Subject: [TF-A] The increasing size of BL31
>
> Hi,
>
> Typically, BL31 runs in SRAM which tends to be limited. As we add support for
> newer architectural features e.g. CCA, general features and standards, the size
> of BL31 image will grow and become harder to fit in most SRAMs.
>
> This email is to share ideas on how to address this problem.
>
> A simple approach will be to identify parts of NOBITS ELF sections of
> BL31 which can be moved out to DRAM. Since NOBITS sections aren't part of
> the file image, loading and authentication code doesn't have to change. The
> challenge will be to come up with some criteria to help decide what kind of
> buffers can be kept in DRAM vs SRAM.
>
> Other ideas are also welcome. Please share your thoughts.
>
> Thanks,
> Okash
> --
> TF-A mailing list -- tf-a@lists.trustedfirmware.org To unsubscribe send an
> email to tf-a-leave@lists.trustedfirmware.org