024 – Floating-Point FPGA Audio Equalizer (2)

In this post, the second part of our series exploring a floating-point FPGA audio equalizer, we will describe the logic of our double-precision floating-point Biquad and use it implement a low-pass filter.

Floating-Point Biquad Filter Architecture

As we discussed in the first part of this series, we will use the Direct Form II Transposed Structure of the Biquad Filter as the basis for our equalizer. This topology is shown in the figure below.

Direct Form II Transformed of the Biquad Filter. Source keil.com

We will use two hierarchical levels: the Biquad Filter module will remain at the upper level, and will be responsible for managing the delayed samples and the coefficients used by the filter. The Biquad Equation module will be instantiated inside the Biquad Filter module and will be responsable for performing the calculations dictated by the filter’s transfer function.

The advantage of this two-level approach is twofold. First, by managing the delayed samples and coefficients independently from the calculation, we can more easily reuse the Biquad Equation logic for different channels. Second, this kind of separation will make it possible to change how the calculation is carried out (for instance when experimenting with different optimization goals or an HLS description) while keeping the rest of our design unchanged.

Biquad Filter

The logic in the Biquad Filter module is described in the Biquad Control FSM, which performs the following tasks:

Update the delayed input samples each time a new sample arrives
Trigger the processing of each channel and wait for it to be completed
Update the delayed output samples when the calculation has been completed

The code for the Biquad Filter module is shown below.

module biquad_filter # (
    parameter integer DP_FLOATING_POINT_BIT_WIDTH = 64
    ) (
    input   logic                                       i_clock,
    // Audio Input
    input   logic                                       i_data_valid,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_data_left,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_data_right,
    // Audio Output
    output  logic                                       o_data_valid,
    output  logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] o_data_left,
    output  logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] o_data_right
);

    timeunit 1ns;
    timeprecision 1ps;

    // Biquad equation: y[n] = a0 * x[n] + d1
    //                  d1 = a1 * x[n-1] + b1 * y[n-1] + d2
    //                  d2 = a2 * x[n-2] + b2 * y[n-2]

    // Filter coefficients (lowpass, 44.1 kHz, 500 Hz Fc, 0.7071 Q, 6 dB Gain)
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] a0 = 64'h3F53C838DB03A294;    // a0 = 0.0012074046354035072
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] a1 = 64'h3F63C838DB03A294;    // a1 = 0.0024148092708070144
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] a2 = 64'h3F53C838DB03A294;    // a2 = 0.0012074046354035072
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] b1 = 64'h3FFE63AA866C6F75;    // b1 = 1.8993325472756315
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] b2 = 64'hBFECEEE57E8EE62E;    // b2 = -0.9041621658172454

    // Biquad Equation
    logic biquad_equation_start;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   xn_left   = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   xn_1_left = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   xn_2_left = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   yn_left   = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   yn_1_left = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   yn_2_left = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   xn_right = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   xn_1_right = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   xn_2_right = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   yn_right = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   yn_1_right = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   yn_2_right = 'b0;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   biquad_equation_xn;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   biquad_equation_xn_1;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   biquad_equation_xn_2;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   biquad_equation_yn_1;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   biquad_equation_yn_2;
    logic                                       biquad_equation_done;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   biquad_equation_yn;
    biquad_equation # (
      .DP_FLOATING_POINT_BIT_WIDTH (DP_FLOATING_POINT_BIT_WIDTH)
    ) biquad_equation_inst (
      .i_clock  (i_clock),
      .i_start  (biquad_equation_start),
      .i_a0     (a0),
      .i_a1     (a1),
      .i_a2     (a2),
      .i_b1     (b1),
      .i_b2     (b2),
      .i_xn     (biquad_equation_xn),
      .i_xn_1   (biquad_equation_xn_1),
      .i_xn_2   (biquad_equation_xn_2),
      .i_yn_1   (biquad_equation_yn_1),
      .i_yn_2   (biquad_equation_yn_2),
      .o_done   (biquad_equation_done),
      .o_yn     (biquad_equation_yn)
    );

    // Biquad Control FSM
    enum logic [2 : 0]  {IDLE,
                        PROCESS_LEFT_CHANNEL,
                        UPDATE_LEFT_SAMPLES,
                        PROCESS_RIGHT_CHANNEL,
                        UPDATE_RIGHT_SAMPLES} biquad_control_fsm_state = IDLE;

    always_ff @(posedge i_clock) begin : biquad_control_fsm
        case (biquad_control_fsm_state)
            IDLE : begin
                biquad_equation_start <= 1'b0;
                o_data_valid <= 1'b0;
                if (i_data_valid == 1'b1) begin
                    xn_2_left <= xn_1_left;
                    xn_1_left <= xn_left;
                    xn_left <= i_data_left;
                    xn_2_right <= xn_1_right;
                    xn_1_right <= xn_right;
                    xn_right <= i_data_right;
                    biquad_control_fsm_state <= PROCESS_LEFT_CHANNEL;
                end
            end

            PROCESS_LEFT_CHANNEL : begin
                biquad_equation_xn <= xn_left;
                biquad_equation_xn_1 <= xn_1_left;
                biquad_equation_xn_2 <= xn_2_left;
                biquad_equation_yn_1 <= yn_left;
                biquad_equation_yn_2 <= yn_1_left;
                biquad_equation_start <= 1'b1;
                biquad_control_fsm_state <= UPDATE_LEFT_SAMPLES;
            end

            UPDATE_LEFT_SAMPLES : begin
                biquad_equation_start <= 1'b0;
                if (biquad_equation_done == 1'b1) begin
                    yn_2_left <= yn_1_left;
                    yn_1_left <= yn_left;
                    yn_left <= biquad_equation_yn;
                    biquad_control_fsm_state <= PROCESS_RIGHT_CHANNEL;
                end
            end

            PROCESS_RIGHT_CHANNEL : begin
                biquad_equation_xn <= xn_right;
                biquad_equation_xn_1 <= xn_1_right;
                biquad_equation_xn_2 <= xn_2_right;
                biquad_equation_yn_1 <= yn_right;
                biquad_equation_yn_2 <= yn_1_right;
                biquad_equation_start <= 1'b1;
                biquad_control_fsm_state <= UPDATE_RIGHT_SAMPLES;
            end

            UPDATE_RIGHT_SAMPLES : begin
                biquad_equation_start <= 1'b0;
                if (biquad_equation_done == 1'b1) begin
                    yn_2_right <= yn_1_right;
                    yn_1_right <= yn_right;
                    yn_right <= biquad_equation_yn;
                    o_data_valid <= 1'b1;
                    biquad_control_fsm_state <= IDLE;
                end
            end

            default : begin
                biquad_control_fsm_state <= IDLE;
            end
        endcase
    end

    assign o_data_left = yn_left;
    assign o_data_right = yn_right;

endmodule

One thing to note about the coefficients Biquad Filter module: they were calculated using this tool in the Ear Level Engineering website, which is based on their description of the Direct Form II Transposed Structure. However, the formula that we use is taken from the CMSIS-DSP library, and in order to use the coefficients from the Ear Level Engineering calculator, we must invert the sign of the coefficients used on the delayed outputs (b1, b2). If we don’t, our filter will become unstable and its output will grow infinitely.

Biquad Equation

The Biquad Equation module performs the actual calculation of the Biquad Filter output. As mentioned before, we use the Direct Form II Transposed Structure, whose transfer function is described by the CMSIS-DSP library as follows:

y[n] = b0 * x[n] + d1, where:
d1 = b1 * x[n-1] + a1 * y[n-1] + d2
d2 = b2 * x[n-2] + a2 * y[n-2]

The Biquad Equation module takes all the inputs required for the transfer function (delayed input samples, delayed out samples, coefficients) and calculates the corresponding output sample. To achieve this, we instantiate double-precision floating-point adder and multiplier IP cores and then design a state machine that triggers each operation contained in the transfer function until we have calculated our desired output. The code for the Biquad Equation module is shown below.

module biquad_equation # (
    parameter integer DP_FLOATING_POINT_BIT_WIDTH = 64
    ) (
    input   logic                                       i_clock,
    input   logic                                       i_start,
    // Coefficients
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_a0,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_a1,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_a2,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_b1,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_b2,
    // Samples
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_xn,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_xn_1,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_xn_2,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_yn_1,
    input   logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] i_yn_2,
    // Audio Output
    output  logic                                       o_done,
    output  logic   [DP_FLOATING_POINT_BIT_WIDTH-1 : 0] o_yn
);

    timeunit 1ns;
    timeprecision 1ps;

    // Biquad equation: y[n] = a0 * x[n] + d1
    //                  d1 = a1 * x[n-1] + b1 * y[n-1] + d2
    //                  d2 = a2 * x[n-2] + b2 * y[n-2]
 
    logic                                       adder_data_in_valid;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   adder_data_in_a;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   adder_data_in_b;
    logic                                       adder_result_valid;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   adder_result;
    dp_fp_adder dp_fp_adder_inst (
        .aclk                   (i_clock),
        .s_axis_a_tvalid        (adder_data_in_valid),
        .s_axis_a_tdata         (adder_data_in_a),
        .s_axis_b_tvalid        (adder_data_in_valid),
        .s_axis_b_tdata         (adder_data_in_b),
        .m_axis_result_tvalid   (adder_result_valid),
        .m_axis_result_tdata    (adder_result)
    );

    logic                                       mult_data_in_valid;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   mult_data_in_a;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   mult_data_in_b;
    logic                                       mult_result_valid;
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   mult_result;
    dp_fp_multiplier dp_fp_multiplier_inst (
        .aclk                   (i_clock),
        .s_axis_a_tvalid        (mult_data_in_valid),
        .s_axis_a_tdata         (mult_data_in_a),
        .s_axis_b_tvalid        (mult_data_in_valid),
        .s_axis_b_tdata         (mult_data_in_b),
        .m_axis_result_tvalid   (mult_result_valid),
        .m_axis_result_tdata    (mult_result)
        );

        // Biquad Equation FSM
    logic [DP_FLOATING_POINT_BIT_WIDTH-1 : 0]   aux;
    enum logic [3 : 0]  {IDLE,
                        MULT_A2_XN2,
                        MULT_B2_YN2,
                        ADD_D2,
                        MULT_A1_XN1,
                        ADD_D1_AUX,
                        MULT_B1_YN1,
                        ADD_D1,
                        MULT_A0_XN,
                        ADD_YN} biquad_equation_fsm_state = IDLE;
    always_ff @(posedge i_clock) begin : biquad_equation_fsm
        case (biquad_equation_fsm_state)
            IDLE : begin
                o_done <= 1'b0;
                adder_data_in_valid <= 1'b0;
                mult_data_in_valid <= 1'b0;
                if (i_start == 1'b1) begin
                    mult_data_in_a <= i_a2;
                    mult_data_in_b <= i_xn_2;
                    mult_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= MULT_A2_XN2;
                end
            end

            MULT_A2_XN2 : begin
                mult_data_in_valid <= 1'b0;
                if (mult_result_valid == 1'b1) begin
                    aux <= mult_result;
                    mult_data_in_a <= i_b2;
                    mult_data_in_b <= i_yn_2;
                    mult_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= MULT_B2_YN2;
                end
            end

            MULT_B2_YN2 : begin
                mult_data_in_valid <= 1'b0;
                if (mult_result_valid == 1'b1) begin
                    adder_data_in_a <= mult_result;
                    adder_data_in_b <= aux;
                    adder_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= ADD_D2;
                end
            end

            ADD_D2 : begin
                adder_data_in_valid <= 1'b0;
                if (adder_result_valid == 1'b1) begin
                    aux <= adder_result;                // aux holds d2
                    mult_data_in_a <= i_a1;
                    mult_data_in_b <= i_xn_1;
                    mult_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= MULT_A1_XN1;
                end
            end

            MULT_A1_XN1 : begin
                mult_data_in_valid <= 1'b0;
                if (mult_result_valid == 1'b1) begin
                    adder_data_in_a <= mult_result;
                    adder_data_in_b <= aux;
                    adder_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= ADD_D1_AUX;
                end
            end

            ADD_D1_AUX : begin
                adder_data_in_valid <= 1'b0;
                if (adder_result_valid == 1'b1) begin
                    aux <= adder_result;
                    mult_data_in_a <= i_b1;
                    mult_data_in_b <= i_yn_1;
                    mult_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= MULT_B1_YN1;
                end
            end

            MULT_B1_YN1 : begin
                mult_data_in_valid <= 1'b0;
                if (mult_result_valid == 1'b1) begin
                    adder_data_in_a <= mult_result;
                    adder_data_in_b <= aux;
                    adder_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= ADD_D1;
                end
            end

            ADD_D1 : begin
                adder_data_in_valid <= 1'b0;
                if (adder_result_valid == 1'b1) begin
                    aux <= adder_result;                // aux holds d1
                    mult_data_in_a <= i_a0;
                    mult_data_in_b <= i_xn;
                    mult_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= MULT_A0_XN;
                end
            end

            MULT_A0_XN : begin
                mult_data_in_valid <= 1'b0;
                if (mult_result_valid == 1'b1) begin
                    adder_data_in_a <= mult_result;
                    adder_data_in_b <= aux;
                    adder_data_in_valid <= 1'b1;
                    biquad_equation_fsm_state <= ADD_YN;
                end
            end

            ADD_YN : begin
                adder_data_in_valid <= 1'b0;
                if (adder_result_valid == 1'b1) begin
                    o_done <= 1'b1;
                    o_yn <= adder_result;
                    biquad_equation_fsm_state <= IDLE;
                end
            end

            default : begin
                biquad_equation_fsm_state <= IDLE;
            end
        endcase
    end

endmodule

Simulation and Implementation Results

We are now ready to test our equalizer. We use the testbench introduced in the first part of this series to make sure our design works as expected. We are aiming for a low-pass filter with 500 Hz Fc, 6 dB Gain and a Q factor of 0.7071 for our 44.1 kHz sample rate. The result of the simulation with our mono snare hit test file is shown in the figure below.

An exhaustive analysis of the performance of this filter is well outside the scope of this post, so we’ll accept the visual representation of the waveforms as the first evidence that the filters does what we designed it to do. The high-frequency components of the snare hit are smoothed out and we are left with the low-frequency content that we would expect to see after applying a low-pass filter. A much better (and more satisfying) test is to program the FPGA and use the ZedBoard’s SW1 to enable/disable the low-pass filter in real time while playing some of your favorite music.

The figure below shows the resource utilization of our Biquad Filter. We can see that the double-precision floating-point multiplier consumes 11 DSP slices out of the 18 used by the entire design so far. This increased resource utilization is one of the major disadvantages of floating-point processing with FPGAs, especially in double precision. There might also be potential for reducing resource utilization by managing intermediate signals more efficiently, but that was not a priority for this design.

Another way in which the cost of floating-point processing manifests itself is the latency and throughput of the system. Our Biquad Filter has a latency of around 3.1299 us, which given our current 44.100 Hz sample rate would allow us to process six channels of audio. While this is acceptable for the stereo processing in which we have focused so far, it is good to keep in mind the ways in which it could be improved. These are roughly in the order that I would approach them in a real project:

The Biquad Equation FSM is a very straightforward, ‘naive’ implementation, in which only one addition or one multiplication takes place at a time. While there are some dependencies between these operands, the overall performance can almost certainly be increased by a more thoughtful design of this control logic.
Both the control logic and the floating-point cores currently run at 100 MHz. This can probably be more than doubled while still achieving timing closure.
Adjusting the floating-point core’s Latency, Rate and Cycles per Operation parameters might yield a better compromise between performance and resource utilization than what we have seen here.
Multiple instances of the Biquad Equation module will linearly increase the performance of the system (as well as its resource utilization).

In the next part of this series we will replace our Biquad Filter logic with an HLS-generated IP core. See you then!

Cheers,

Isaac

The RTL and simulation files for this post are available in the FPGA Audio Processor repository under this tag.

024 – Floating-Point FPGA Audio Equalizer (2)

Floating-Point Biquad Filter Architecture

Biquad Filter

Biquad Equation

Simulation and Implementation Results

Leave a Reply Cancel reply