I notice that your branch target buffer is a register file, which has no read latency. So I have two questions:
Is it usual to implement the BTB as a register file rather than as a block of SRAM?
And if we use SRAM instead of a register file, there may be a one-cycle delay on reads. I think that read delay would disturb the design of the branch prediction. How should it be handled?
Best regards
Good question! I’ve been thinking about this recently...
The Rocket and WD SweRV cores use flops for the BTB.
Those cores and this one use 28-32 fully associative BTB entries.
A larger SRAM-based BTB would be possible, I think. You would lose the fully associative capability, but it could be much bigger to compensate (and would be much more FPGA friendly).
As you note, there would be a read latency to deal with. The less-than-ideal workaround is to read ahead by one address and accept that, following a predicted branch, the BTB would not be able to provide another prediction for one cycle.
I have been modelling various branch prediction designs, and it seems that a BTB with one cycle latency is indeed worse for various benchmarks, but it’s not terrible.
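To make the read-ahead trade-off concrete, here is a minimal Python sketch of that kind of model. It is not the model I actually used: the function name `run`, the always-taken branches, the unbounded BTB and the 3-cycle miss penalty are all assumptions chosen to keep it short. With `btb_latency=1`, the fetch that immediately follows a predicted-taken branch gets no BTB result.

```python
def run(branches, start, n_fetches, btb_latency, miss_penalty=3):
    """Count cycles needed to fetch n_fetches instructions.

    branches: dict mapping pc -> target for always-taken branches
              (hypothetical toy program, 4-byte instructions).
    btb_latency: 0 models a flop-based BTB, 1 models an SRAM BTB
                 that is read one address ahead.
    """
    btb = {}       # unbounded BTB; capacity is not modelled here
    pc = start
    cycles = 0
    blind = False  # True when no BTB result is available for this fetch
    for _ in range(n_fetches):
        cycles += 1
        predicted = None if blind else btb.get(pc)
        blind = False
        actual = branches.get(pc)
        if actual is None:
            pc += 4                       # not a branch: fall through
            continue
        btb[pc] = actual                  # train the BTB
        if predicted == actual:
            # Correct redirect. With a one-cycle read, the lookup that was
            # issued ahead was for pc + 4, so the target is fetched blind.
            blind = btb_latency == 1
        else:
            cycles += miss_penalty        # target resolved later in the pipe
        pc = actual
    return cycles


# Ordinary loop: one backward branch whose target is not itself a branch.
loop = {0x20: 0x00}
# Worst case: two taken branches that jump straight to each other.
ping_pong = {0x00: 0x10, 0x10: 0x00}

for name, prog, n in (("loop", loop, 90), ("ping_pong", ping_pong, 100)):
    print(name, run(prog, 0, n, 0), run(prog, 0, n, 1))
```

In this toy model the ordinary loop costs the same with either BTB, because the blind fetch lands on a non-branch. Only back-to-back taken branches pay for the latency, which fits with "worse, but not terrible".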
@cool-ic There are varying sizes of BTB. Some CPUs have an initial micro-BTB that is implemented in flops, and then have a secondary, larger BTB implemented in SRAM. It's not possible to look up the second, larger BTB on a cycle-by-cycle basis, but nevertheless its presence allows the branch target to be resolved earlier in the pipeline should the address miss in the micro-BTB (although not at the very start). You will only really see this in pipelines that have a couple of stages at the front end.
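As a rough illustration of the two-level arrangement, here is a small Python sketch. Everything specific in it is an assumption for the example rather than a figure from any real core: the class name `TwoLevelBTB`, the LRU replacement in the micro-BTB, and the 2-cycle and 5-cycle costs.

```python
from collections import OrderedDict


class TwoLevelBTB:
    """Toy model: small flop-based micro-BTB backed by a larger SRAM BTB."""

    def __init__(self, micro_entries=4, main_latency=2, mispredict_penalty=5):
        self.micro = OrderedDict()  # small, LRU-replaced (models flops)
        self.main = {}              # large (models SRAM; capacity not modelled)
        self.micro_entries = micro_entries
        self.main_latency = main_latency              # assumed bubble cycles
        self.mispredict_penalty = mispredict_penalty  # assumed flush cost

    def taken_branch_cost(self, pc, target):
        """Return bubble cycles for a taken branch, then train both levels."""
        if self.micro.get(pc) == target:
            cost = 0                        # redirect in the first fetch stage
        elif self.main.get(pc) == target:
            cost = self.main_latency        # redirect a few stages later
        else:
            cost = self.mispredict_penalty  # resolved at execute
        self._train(pc, target)
        return cost

    def _train(self, pc, target):
        self.main[pc] = target
        self.micro[pc] = target
        self.micro.move_to_end(pc)          # mark as most recently used
        if len(self.micro) > self.micro_entries:
            self.micro.popitem(last=False)  # evict least recently used


btb = TwoLevelBTB(micro_entries=2)
pcs = (0x00, 0x40, 0x80)  # three branches thrash a 2-entry micro-BTB
first_pass = [btb.taken_branch_cost(pc, pc + 0x100) for pc in pcs]
second_pass = [btb.taken_branch_cost(pc, pc + 0x100) for pc in pcs]
print(first_pass, second_pass)
```

On the second pass every branch misses the micro-BTB but hits the larger one, so each pays the short bubble rather than the full mispredict penalty.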