
Sunday, June 9, 2013

FPGA.GP.01.04.ML506

After trying out the different Coregen adders, the DSP48 adder was used to try to fit a more heavily unrolled design into the ML506 board. The ML506 uses a Virtex-5 SX50T device, which has 288 DSP48E slices. The adders were instantiated as shown below, and the sha256_transform LOOP_LOG2 parameter was set to 2.

wire [31:0] t1, t2, new_w, t3, t4;
wire cout0, cout1, cout2;
wire gnd = 0;

dsp48_32bit_adder adder0 (.a(rx_state[`IDX(7)]),.b(e1_w),.c_in(gnd),.c_out(cout0) );
dsp48_32bit_adder adder1 (.a(ch_w),.b(rx_w[31:0]),.c_in(cout0),.c_out(cout1) );
dsp48_32bit_adder adder2 (.a(k),.b(gnd),.c_in(cout1),.s(t1) );
dsp48_32bit_adder adder3 (.a(e0_w),.b( maj_w),.c_in(gnd),.s(t2) );
dsp48_32bit_adder adder4 (.a(s1_w),.b(rx_w[319:288]),.c_in(gnd),.c_out(cout2) );
dsp48_32bit_adder adder5 (.a(s0_w),.b(rx_w[31:0]),.c_in(cout2),.s(new_w) );
dsp48_32bit_adder adder6 (.a(rx_state[`IDX(3)]),.b(t1),.c_in(gnd),.s(t3) );
dsp48_32bit_adder adder7 (.a(t1),.b(t2),.c_in(gnd),.s(t4) );
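The Coregen-generated netlist for dsp48_32bit_adder is not shown, so for simulation a behavioral stand-in with the same port names can help. The sketch below assumes the core is purely combinational and computes {c_out, s} = a + b + c_in (a registered Coregen configuration would add latency). Under that assumption, tying one adder's c_out to the next adder's c_in builds a wider adder rather than adding more operands. The testbench first checks that two chained instances perform a 64-bit addition, then wires adder0..adder2 the same way as above and compares the result with the five-operand t1 expression from the generic RTL. The module and signal names are illustrative, not the author's.

```verilog
// Behavioral stand-in for the Coregen dsp48_32bit_adder, for simulation only.
// Assumption: combinational, s = low 32 bits of a + b + c_in, c_out = carry out of bit 31.
module dsp48_32bit_adder_model (
  input  [31:0] a,
  input  [31:0] b,
  input         c_in,
  output [31:0] s,
  output        c_out
);
  // The 33-bit left-hand side widens the addition so the carry is kept.
  assign {c_out, s} = a + b + c_in;
endmodule

module adder_chain_tb;
  // (1) Two carry-chained 32-bit adders form one 64-bit adder.
  reg  [63:0] x, y;
  wire [31:0] lo, hi;
  wire        c_mid;
  dsp48_32bit_adder_model add_lo (.a(x[31:0]),  .b(y[31:0]),  .c_in(1'b0),  .s(lo), .c_out(c_mid));
  dsp48_32bit_adder_model add_hi (.a(x[63:32]), .b(y[63:32]), .c_in(c_mid), .s(hi), .c_out());

  // (2) Same wiring as adder0..adder2 above: only the carries link the stages,
  //     and the s outputs of the first two adders are left unconnected.
  reg  [31:0] st7, e1, ch, w0, k;
  wire [31:0] t1_chain;
  wire        cout0, cout1;
  dsp48_32bit_adder_model adder0 (.a(st7), .b(e1),    .c_in(1'b0),  .c_out(cout0));
  dsp48_32bit_adder_model adder1 (.a(ch),  .b(w0),    .c_in(cout0), .c_out(cout1));
  dsp48_32bit_adder_model adder2 (.a(k),   .b(32'd0), .c_in(cout1), .s(t1_chain));

  // Five-operand sum as written in the generic RTL (t1 in the 01.03 entry).
  wire [31:0] t1_rtl = st7 + e1 + ch + w0 + k;

  initial begin
    x = 64'h0000_0001_FFFF_FFFF;
    y = 64'h0000_0000_0000_0001;
    st7 = 32'd1; e1 = 32'd2; ch = 32'd3; w0 = 32'd4; k = 32'd5;
    #1;
    $display("64-bit sum via chain = %h, expected %h", {hi, lo}, x + y);
    $display("t1 via carry chain = %0d, t1 via RTL = %0d", t1_chain, t1_rtl);
  end
endmodule
```

With these values the chained 64-bit sum matches (0000000200000000), while the adder0..adder2 chain gives t1 = 5 against 15 from the RTL, because the s outputs of adder0 and adder1 never reach adder2. If the real core behaves as assumed, each stage's s output would need to feed the next stage's operands to reproduce the RTL sum.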

Below is a Map Report snippet that summarizes the resources required for this build.

Design Summary
--------------
Slice Logic Utilization:
  Number of Slice Registers:                26,148 out of  32,640   80%
  Number of Slice LUTs:                     31,944 out of  32,640   97%
    Number of fully used LUT-FF pairs:      25,863 out of  32,229   80%
Slice Logic Distribution:
  Number of occupied Slices:                 8,099 out of   8,160   99%
Specific Feature Utilization:
  Number of DSP48Es:                           256 out of     288   88%

By using 88% of the DSP48Es, the design fits into the SX50T device. LUT usage is almost maxed out at 97%, yet only 80% of the LUT-FF pairs are fully used. This is most likely because register usage is itself only 80%: a LUT-FF pair only counts as fully used when its flip-flop is occupied, so the 25,863 fully used pairs can never exceed the 26,148 slice registers. This might hint that the design does not have much more potential for resource savings.

This build will be the baseline for characterizing the hash rate. A similar approach will be tried on the ML505, but since its Virtex-5 LX50T device has only a handful of DSP48Es, the same design is not expected to fit. The unrolling will probably need to be dialed back by raising the sha256_transform LOOP_LOG2 parameter to 3, perhaps.
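A quick DSP48E budget per unroll setting helps when sizing the ML505 build. This sketch assumes that the two sha256_transform instances of the reference design together unroll 128 >> LOOP_LOG2 rounds, that every round uses the eight adders listed above, and that each adder maps to its own DSP48E. The 48-DSP48E figure for the ML505's XC5VLX50T is taken from the Virtex-5 family overview and is worth confirming against the data sheet. The module and function names are hypothetical.

```verilog
// Hypothetical sizing helper, not part of the reference design.
module dsp48_budget;
  localparam integer ADDERS_PER_ROUND = 8;    // adder0..adder7 in the listing above
  localparam integer ML506_DSP48E     = 288;  // XC5VSX50T
  localparam integer ML505_DSP48E     = 48;   // XC5VLX50T (assumed, check the data sheet)

  // Assumption: both sha256_transform instances together unroll 128 >> LOOP_LOG2 rounds,
  // and every adder of every round is mapped to its own DSP48E.
  function integer dsp48_needed(input integer loop_log2);
    dsp48_needed = (128 >> loop_log2) * ADDERS_PER_ROUND;
  endfunction

  integer l;
  initial begin
    for (l = 0; l <= 5; l = l + 1)
      $display("LOOP_LOG2=%0d: %0d DSP48Es, fits ML506=%0d, fits ML505=%0d",
               l, dsp48_needed(l),
               dsp48_needed(l) <= ML506_DSP48E,
               dsp48_needed(l) <= ML505_DSP48E);
  end
endmodule
```

Under these assumptions LOOP_LOG2 = 2 reproduces the 256 DSP48Es in the Map report above. On the ML505 only LOOP_LOG2 = 5 (32 DSP48Es) keeps every adder in a DSP48E; at LOOP_LOG2 = 3 the design would need 128, so most adders would have to stay in the fabric.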

Thursday, May 23, 2013

FPGA.GP.01.03 Coregen Adders

After the first stab in the previous entry, the next step is to modify the design to make better use of the FPGA resources. One way is to replace some of the RTL additions with adder instances generated from the Coregen libraries. Coregen produces instantiation templates for predefined adders that can map onto the FPGA more efficiently than generic RTL left to the tools to map. Two kinds are available: "fabric" adders, built from LUTs and carry logic, and, on Virtex-series FPGAs such as the Virtex-5, DSP48E adders.

The uncommented code below is the original generic Verilog RTL for the adders. The commented-out code contains the fabric and DSP48E adder instantiations from the Coregen templates. For each build, the relevant block is toggled in or out by commenting.

wire [31:0] t1 = rx_state[`IDX(7)] + e1_w + ch_w + rx_w[31:0] + k;
wire [31:0] t2 = e0_w + maj_w;
wire [31:0] new_w = s1_w + rx_w[319:288] + s0_w + rx_w[31:0];
wire [31:0] t3 = rx_state[`IDX(3)] + t1;
wire [31:0] t4 = t1 + t2;

// wire [31:0] t1, t2, new_w, t3, t4;
// wire cout0, cout1, cout2;
// wire gnd = 0;

// fabric_32bit_adder adder0 (.a(rx_state[`IDX(7)]),.b(e1_w),.c_in(gnd),.c_out(cout0) );
// fabric_32bit_adder adder1 (.a(ch_w),.b(rx_w[31:0]),.c_in(cout0),.c_out(cout1) );
// fabric_32bit_adder adder2 (.a(k),.b(gnd),.c_in(cout1),.s(t1) );
// fabric_32bit_adder adder3 (.a(e0_w),.b( maj_w),.c_in(gnd),.s(t2) );
// fabric_32bit_adder adder4 (.a(s1_w),.b(rx_w[319:288]),.c_in(gnd),.c_out(cout2) );
// fabric_32bit_adder adder5 (.a(s0_w),.b(rx_w[31:0]),.c_in(cout2),.s(new_w) );
// fabric_32bit_adder adder6 (.a(rx_state[`IDX(3)]),.b(t1),.c_in(gnd),.s(t3) );
// fabric_32bit_adder adder7 (.a(t1),.b(t2),.c_in(gnd),.s(t4) );

// dsp48_32bit_adder adder0 (.a(rx_state[`IDX(7)]),.b(e1_w),.c_in(gnd),.c_out(cout0) );
// dsp48_32bit_adder adder1 (.a(ch_w),.b(rx_w[31:0]),.c_in(cout0),.c_out(cout1) );
// dsp48_32bit_adder adder2 (.a(k),.b(gnd),.c_in(cout1),.s(t1) );
// dsp48_32bit_adder adder3 (.a(e0_w),.b( maj_w),.c_in(gnd),.s(t2) );
// dsp48_32bit_adder adder4 (.a(s1_w),.b(rx_w[319:288]),.c_in(gnd),.c_out(cout2) );
// dsp48_32bit_adder adder5 (.a(s0_w),.b(rx_w[31:0]),.c_in(cout2),.s(new_w) );
// dsp48_32bit_adder adder6 (.a(rx_state[`IDX(3)]),.b(t1),.c_in(gnd),.s(t3) );
// dsp48_32bit_adder adder7 (.a(t1),.b(t2),.c_in(gnd),.s(t4) );
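Rather than toggling comment blocks between builds, the variants could live side by side behind Verilog preprocessor macros, so each build only changes a define. A minimal sketch for a single two-operand adder follows; the wrapper name add32 and the macro names USE_DSP48_ADDERS and USE_FABRIC_ADDERS are hypothetical, and the Coregen port names are assumed to match the instantiations above.

```verilog
// Hypothetical wrapper: pick the adder implementation at compile time.
// Define USE_DSP48_ADDERS or USE_FABRIC_ADDERS (hypothetical macro names);
// with neither defined, the generic RTL "+" is used and the tools choose the mapping.
module add32 (
  input  [31:0] a,
  input  [31:0] b,
  output [31:0] s
);
`ifdef USE_DSP48_ADDERS
  dsp48_32bit_adder u_add (.a(a), .b(b), .c_in(1'b0), .s(s));   // Coregen DSP48E core
`elsif USE_FABRIC_ADDERS
  fabric_32bit_adder u_add (.a(a), .b(b), .c_in(1'b0), .s(s));  // Coregen fabric core
`else
  assign s = a + b;                                              // generic RTL
`endif
endmodule

// Example use inside the digester (t2 from the listing above):
//   wire [31:0] t2;
//   add32 add_t2 (.a(e0_w), .b(maj_w), .s(t2));
```

The build variant is then selected through the synthesis tool's Verilog macro (define) setting, and the three versions cannot drift apart the way separately commented copies can.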

Four target boards were used for the Coregen adder insertion: the Spartan-3E Starter Kit, the Spartan-3A Starter Kit, the ML505 and the ML506. Identical source code was used for all four boards, with the Coregen adders generated separately for each board.

With the fabric adders, LUT usage went down by 16.9%, 16.9%, 6.4% and 6.4% respectively. With the DSP48E adders on the ML505 and ML506 boards, LUT usage went down by 20%.

ISE 13.2 Map Results

Since the sha256_transform LOOP_LOG2 parameter was set to 5 for all runs, the next step is to unroll the design further and make full use of the available FPGA fabric.



Tuesday, May 21, 2013

FPGA.GP.01.02 BitCoin Miner First Stab

Here are the results of building the design for the Spartan-3E Starter Kit, ML505 and ML506.









FPGA.GP.01.01 BitCoin Miner RoadMap

The first project will be a Bitcoin miner, based on the GitHub reference design that uses the sha256_transform module. The target boards will be the Spartan-3E Starter Kit, the ML505 and the ML506. As with most designs, the objective is to pack as much of the SHA-256 transform as possible into these small FPGAs, at the highest operating frequency and the lowest power consumption, to save cost.
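Each round of the SHA-256 transform combines a few standard logical functions, and their outputs are the operands that the adders in this design sum (e0_w, e1_w, ch_w, maj_w, s0_w and s1_w in the reference code). A sketch of those functions as defined in FIPS 180-4, section 4.1.2; the module and function names here are illustrative, and the mapping to the reference design's signal names in the comments is assumed rather than taken from this blog.

```verilog
// SHA-256 logical functions from FIPS 180-4 (section 4.1.2) as Verilog functions.
// Comments give the assumed matching signal names in the reference design.
module sha256_functions;
  // Rotate right by n, for 1 <= n <= 31; bits shifted past bit 31 are dropped.
  function [31:0] rotr(input [31:0] x, input integer n);
    rotr = (x >> n) | (x << (32 - n));
  endfunction

  function [31:0] big_sigma0(input [31:0] x);    // Sigma0(a) -> e0_w
    big_sigma0 = rotr(x, 2) ^ rotr(x, 13) ^ rotr(x, 22);
  endfunction

  function [31:0] big_sigma1(input [31:0] x);    // Sigma1(e) -> e1_w
    big_sigma1 = rotr(x, 6) ^ rotr(x, 11) ^ rotr(x, 25);
  endfunction

  function [31:0] small_sigma0(input [31:0] x);  // sigma0(W[t-15]) -> s0_w
    small_sigma0 = rotr(x, 7) ^ rotr(x, 18) ^ (x >> 3);
  endfunction

  function [31:0] small_sigma1(input [31:0] x);  // sigma1(W[t-2]) -> s1_w
    small_sigma1 = rotr(x, 17) ^ rotr(x, 19) ^ (x >> 10);
  endfunction

  function [31:0] ch(input [31:0] x, input [31:0] y, input [31:0] z);   // Ch(e,f,g) -> ch_w
    ch = (x & y) ^ (~x & z);
  endfunction

  function [31:0] maj(input [31:0] x, input [31:0] y, input [31:0] z);  // Maj(a,b,c) -> maj_w
    maj = (x & y) ^ (x & z) ^ (y & z);
  endfunction

  initial begin
    $display("Sigma0(1) = %h", big_sigma0(32'd1));
    $display("Sigma1(1) = %h", big_sigma1(32'd1));
    $display("Ch        = %h", ch(32'hFFFF0000, 32'h12345678, 32'h9ABCDEF0));
  end
endmodule
```

Per round, t1 adds five 32-bit values and t2 adds two, which is why the adder count per unrolled round matters so much for fitting the design.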

Below is the project roadmap. The ultimate goal is to build a parallel hasher platform that runs the ML505 and ML506 at the same time. The stretch goal is to run the platform from a Raspberry Pi or an embedded Linux board for lower power consumption.



In the future, the platform could be migrated to a Parallella board (ARM/Zynq).