The second optimization for Altera FPGAs moves the BHT to LUTRAM. As with the previous optimization, the approach used for Xilinx does not work here because it relies on asynchronous RAM primitives, which Altera does not support. This optimization therefore uses synchronous RAM for the BHT.

The main changes to the existing code are:

- New RAM module (SyncThreePortRam.sv) that infers synchronous RAM on Altera with two independent read ports and one write port.
- frontend.sv: the input to the vpc_i port of the BHT is changed to use the advanced read address, to compensate for the delay of synchronous RAM.
- bht.sv: this case is more complex because of the logic performed inside the BHT. First, the entry for the pc pointed to by bht_update_i is read from memory, modified according to the saturation counter and valid bit, and then written back to memory. The prediction output is based on vpc_i.

With asynchronous memory, new data written via update_i is available one clock cycle after the write, so a vpc_i read of an address that was just written by update_i works correctly. With synchronous memory, however, there are three clock cycles of latency: one to read the pc content (read port 1), one to write it, and one to read it on the other port (read port 0). The design is adapted to these latency constraints as follows:

- The write address of the synchronous RAM is delayed, so that the write waits for the preceding pc read and stores the correctly modified data.
- As in the FIFO case, an auxiliary buffer stores the data being written to the RAM, making it available 2 clock cycles after update_i was valid. This is needed because, once the correct data exists, the RAM takes 2 clock cycles until that data appears at the output (one clock cycle for writing and one for reading).
- A multiplexer at the output delivers the correct prediction from the update logic (1 cycle of delay), the auxiliary register (2 cycles of delay), or the RAM (3 or more cycles of delay), depending on how many cycles have passed since update_i was valid (i.e. written to the memory).
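The latency argument above can be illustrated with a small cycle-level software model. This is only a sketch in Python, not the RTL from this commit: the class names `SyncRam` and `ForwardedRam` are hypothetical, the model has a single read port, and it shows just one forwarding stage (an auxiliary register plus an output multiplexer) rather than the three-source multiplexer described in the message. It assumes the RAM returns the old contents when the same address is read and written on the same clock edge.

```python
class SyncRam:
    """Cycle-level model of a synchronous RAM with a registered read output."""

    def __init__(self, depth):
        self.mem = [0] * depth
        self.rd_data = 0  # read output register, updated on each clock edge

    def clock(self, wr_en, wr_addr, wr_data, rd_addr):
        # Assumption: the read samples the array before the write lands,
        # so a same-edge read of the written address returns the old value.
        self.rd_data = self.mem[rd_addr]
        if wr_en:
            self.mem[wr_addr] = wr_data


class ForwardedRam:
    """SyncRam plus an auxiliary register and an output multiplexer."""

    def __init__(self, depth):
        self.ram = SyncRam(depth)
        self.aux_valid = False  # auxiliary register: last write, if any
        self.aux_addr = 0
        self.aux_data = 0
        self.rd_addr_q = 0      # registered copy of the read address

    def clock(self, wr_en, wr_addr, wr_data, rd_addr):
        self.ram.clock(wr_en, wr_addr, wr_data, rd_addr)
        self.aux_valid, self.aux_addr, self.aux_data = wr_en, wr_addr, wr_data
        self.rd_addr_q = rd_addr

    @property
    def rd_data(self):
        # Output multiplexer: prefer the auxiliary register when the previous
        # edge wrote the address that was being read; otherwise use the RAM.
        if self.aux_valid and self.aux_addr == self.rd_addr_q:
            return self.aux_data
        return self.ram.rd_data


# Write 7 to address 3 while reading address 3 on the same clock edge.
plain = SyncRam(8)
plain.clock(True, 3, 7, 3)
stale = plain.rd_data          # old contents: the write is not visible yet
plain.clock(False, 0, 0, 3)
fresh = plain.rd_data          # visible one edge later

fwd = ForwardedRam(8)
fwd.clock(True, 3, 7, 3)
bypassed = fwd.rd_data         # the auxiliary register supplies the new value
```

In the plain model the written value reaches the output only after a second clock edge (one for writing, one for reading), which is the two-cycle delay the message describes; the forwarding wrapper hides that delay for the most recent write.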
1 parent
8a84f78
commit c389382
Showing 3 changed files with 221 additions and 60 deletions.
@@ -0,0 +1,65 @@
// Copyright 2024 PlanV Technologies
//
// Licensed under the Solderpad Hardware Licence, Version 2.0 (the "License");
// you may not use this file except in compliance with the License.
// SPDX-License-Identifier: Apache-2.0 WITH SHL-2.0
// You may obtain a copy of the License at https://solderpad.org/licenses
//
// Inferable, Synchronous Three-Port RAM with one write port and two read ports
//
//
// This module is designed to work with Xilinx, Microchip and Altera FPGA tools by following the respective
// guidelines:
// - Xilinx UG901 Vivado Design Suite User Guide: Synthesis
// - Inferring Microchip PolarFire RAM Blocks
// - Altera Quartus II Handbook Volume 1: Design and Synthesis (p. 768)
//
// Current Maintainers: Angela Gonzalez - PlanV Technologies


module SyncThreePortRam
#(
  parameter ADDR_WIDTH = 10,
  parameter DATA_DEPTH = 1024, // usually 2**ADDR_WIDTH, but can be lower
  parameter DATA_WIDTH = 32
)(
  input  logic                    Clk_CI,

  // Write port
  input  logic                    WrEn_SI,
  input  logic [ADDR_WIDTH-1:0]   WrAddr_DI,
  input  logic [DATA_WIDTH-1:0]   WrData_DI,

  // Read ports
  input  logic [ADDR_WIDTH-1:0]   RdAddr_DI_0,
  input  logic [ADDR_WIDTH-1:0]   RdAddr_DI_1,

  output logic [DATA_WIDTH-1:0]   RdData_DO_0,
  output logic [DATA_WIDTH-1:0]   RdData_DO_1
);

  logic [DATA_WIDTH-1:0] mem [DATA_DEPTH-1:0] = '{default:0};

  // Write port and registered (synchronous) read ports
  always_ff @(posedge Clk_CI)
  begin
    if (WrEn_SI) begin
      mem[WrAddr_DI] <= WrData_DI;
    end

    RdData_DO_0 <= mem[RdAddr_DI_0]; // non-blocking: read data is registered
    RdData_DO_1 <= mem[RdAddr_DI_1];

  end

  ////////////////////////////
  // assertions
  ////////////////////////////

  // pragma translate_off
  assert property
    (@(posedge Clk_CI) (longint'(2)**longint'(ADDR_WIDTH) >= longint'(DATA_DEPTH)))
    else $error("depth out of bounds");
  // pragma translate_on

endmodule