Hi Sandeep,
For two world systems, applying the principle of least privilege, BL2 can complete its task running at S-EL1, so there is no need to run it at EL3. Dan explains this in [1]: "The main reason for running BL2 at S-EL1 is to minimise the amount
of code running at EL3, which is slightly more secure. Any other benefits are a side effect".
A few more reasons not to run BL2 at EL3:
- BL2 loads components from flash to DRAM and relies on memcpy calls, which become dangerous if an image's base and limit can be manipulated by some means.
- BL2 can also contain drivers, which are a common source of implementation bugs and have no real reason to exist in EL3/the secure monitor.
- Code running at S-EL1 cannot access EL3 registers.
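
To illustrate the memcpy point above, here is a minimal sketch of the kind of bounds check a loader needs before copying an image. This is not the actual TF-A code; struct load_region, range_in_region() and checked_image_copy() are hypothetical names made up for this example. If a check like this is missing or wrong, the copy writes wherever the (possibly attacker-influenced) base and size point, which is far more damaging at EL3 than at S-EL1.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor for the memory window an image may occupy. */
struct load_region {
    uintptr_t base;  /* first valid address of the window */
    size_t    size;  /* length of the window in bytes */
};

/*
 * Return true only if [dst, dst + len) lies wholly inside the region.
 * Written with subtractions so that a huge len cannot wrap around.
 */
static bool range_in_region(const struct load_region *r,
                            uintptr_t dst, size_t len)
{
    if (dst < r->base)
        return false;

    uintptr_t off = dst - r->base;

    if (off > r->size)
        return false;

    return len <= r->size - off;
}

/* Copy an image only after the destination range has been validated. */
static int checked_image_copy(const struct load_region *r,
                              void *dst, const void *src, size_t len)
{
    if (!range_in_region(r, (uintptr_t)dst, len))
        return -1;  /* refuse to copy outside the permitted window */

    memcpy(dst, src, len);
    return 0;
}
```

The naive form of the check (dst + len <= base + size) can overflow for a large len, which is exactly the sort of subtle bug that is better kept out of EL3.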
For four world systems, since we can't complete all loader functionality at S-EL1, we had to run BL2 at EL3 (as the GPT HW is only accessible at EL3).
Having said that, it is possible to refactor BL2 into two parts, one running at EL3 and the other at S-EL1. Considering the effort required to achieve this, the current design (running the whole of BL2
at EL3) was chosen, but refactoring BL2 remains a possibility for the future.
Finally, to answer your question "Should BL2 execution state be different for 2 and 4 world system at the cost of diverging from basic security principle(in 2 world)?" - IMHO it's better not to diverge from the security principle.