Overview of the DCT and IDCT
A two-dimensional DCT on M×M pixels is defined as:
$$y = A\,x\,A^{T}$$
where x is the M×M block of input pixels, y is the M×M block of DCT coefficients, and A is the M×M DCT matrix
$$A_{ij} = c_i \cos\!\left(\frac{(2j+1)\,i\,\pi}{2M}\right), \qquad c_0 = \sqrt{1/M}, \quad c_i = \sqrt{2/M} \ (i > 0), \qquad i, j = 0, \dots, M-1.$$
It is important that A is an orthonormal transform matrix, i.e. $A\,A^{T} = I$. In other words, y is produced from the input x by a one-to-one mapping, going from the left hand side (input) to the right hand side (output) in Figure 2. Therefore, if one "reverses" A (formally, uses $A^{T} = A^{-1}$), one can recover x from y, which is now the input data, entering from the right hand side and flowing to the left hand side. The inverse transform is then defined as:
$$x = A^{T}\,y\,A$$
This can be implemented with two matrix multiplications, which correspond to two 1D-DCTs: one on the columns and one on the rows (Figure 1). See the MATLAB versions of the above IDCT (idct64.m and idct16.m) and DCT (dct64.m) in the /tools/synop/ece6132/fall2002/part1.1/idct1 directory.
Figure 1: decomposition of the 2D-IDCT into two 1D-IDCTs
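Some techniques can be used to reduce the number of multiplications. For example, the fast 1D-DCT of Arai, Agui and Nakajima uses 5 + 8 multiplications (5 in the butterfly network and 8 for the final output scaling) instead of the 64 required by a direct 8-point matrix-vector product.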
Figure 2: Dataflow of a fast 1D-DCT by Arai, Agui and Nakajima
Implementation
For the project, we provide an implementation of a fast IDCT algorithm. The dataflow presented in Figure 3 needs 16 multiplications instead of 64. However, its irregular structure makes the scheduling more complex. Note: Ck = cos(k*pi/16) and Sk = sin(k*pi/16).
Figure 3: Dataflow of a fast IDCT
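Assuming 8x8 blocks and the row-column decomposition of Figure 1 (8 column transforms plus 8 row transforms), the multiplication counts per block work out as follows, counting multiplications only and not additions:

```latex
\begin{align*}
\text{direct 2D sum (64 outputs, 64 terms each):} \quad & 64 \times 64 = 4096 \\
\text{row-column, direct 1D (64 each):} \quad & (8 + 8) \times 64 = 1024 \\
\text{row-column, fast 1D IDCT of Figure 3 (16 each):} \quad & (8 + 8) \times 16 = 256
\end{align*}
```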
Objective
The IDCT IP block is provided to accelerate the MPEG decoder. First, you need to verify the functionality of the IDCT block by using the provided IDCT testbench. Then, the IDCT block will be integrated with the other parts of the MPEG decoder (which could be HW or SW). The interface of the IDCT block needs to be redesigned in order to connect it to the processor correctly. Finally, you need to evaluate the performance of your MPEG decoder implementation with the IDCT in hardware.
Design
Please follow these steps to verify the functionality of the IDCT block.
- Verify the functionality of the IDCT block with a Verilog testbench
  - Go to the directory /part2/ip_blocks/idct/idct_v_test/
  - To compile the Verilog files, use the "make" command.
  - Follow the instructions in lab2 to verify the functionality of "idct_tb.v" and "idct.v"
  - Compare the results with the data in the file "out.dat"
  - Learn how the idct_tb block communicates with the idct block
- Verify the functionality of the IDCT block with a C code testbench
  - Go to the directory /ip_blocks/idct/idct_c_test/
  - Run the "make compile_vcs" command to compile the Verilog code
  - Run "make idct_test_sw.exe" to compile the C code of the IDCT algorithm without using the IDCT block. This produces the executable file "idct_test_sw.exe" for the MPC755.
  - Run "make idct_test_hw.exe" to compile the C code that uses the IDCT block. This produces the executable file "idct_test_hw.exe" for the MPC755.
  - Follow the instructions in the Seamless mini-tutorial to run each of the two implementations
  - Compare the final values in the array "block"
  - Compare the execution times of the two implementations
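For comparing the final values in the array "block", a small helper is convenient once the results of the SW and HW runs have been dumped. This is a minimal hypothetical C sketch: it assumes "block" holds 64 `short` coefficients in row-major order, which should be checked against the decoder source. A tolerance of 0 demands bit-exact agreement; a larger value can be used if the two IDCTs round differently.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: count coefficients of two 8x8 blocks that differ by
 * more than tol, printing each mismatch with its (row, column) position. */
int compare_blocks(const short sw[64], const short hw[64], int tol)
{
    int mismatches = 0;
    for (int i = 0; i < 64; i++) {
        int diff = abs(sw[i] - hw[i]);   /* shorts are promoted to int */
        if (diff > tol) {
            printf("block[%d][%d]: sw=%d hw=%d (diff %d)\n",
                   i / 8, i % 8, sw[i], hw[i], diff);
            mismatches++;
        }
    }
    return mismatches;
}
```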
Questions
- What is the memory size used in the IDCT block? (i.e. mainly the registers used for data)
- How does the testbench component handshake with the IDCT component?
- How many cycles does the IDCT take to read a block of data?
- What is the execution time when using the SW IDCT? What is the execution time when using the HW IDCT? What is the performance improvement?
- How much time is spent on data transfer between the IDCT block and the memory? How much time is spent on the IDCT algorithm excluding data transfer? Please evaluate both the SW and the HW implementations. What is the maximum speedup that can possibly be achieved by moving the IDCT to hardware?
- In the given IDCT block, the data input/output port is 32 bits wide. Since the MPC755 has a 64-bit data bus, is it possible to fully use the MPC755 data bus by expanding the data input/output of the IDCT to 64 bits? If so, how? What is the impact on the performance?
- Based on the above questions, what do you suggest to improve the IDCT performance, and what would be the impact on the overall performance of the decoder?
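For the speedup questions, Amdahl's law gives a quick bound. The C sketch below is a set of hypothetical helpers: `amdahl_speedup(f, s)` returns the overall speedup when a fraction f of the original execution time becomes s times faster, and `amdahl_limit(f)` is the bound as s grows without limit. Because moving the IDCT to hardware does not remove the data transfers, f should count only the time that the hardware actually accelerates. `transfers_per_block` counts bus transfers for one 8x8 block, assuming 64 coefficients of `coeff_bits` bits each packed into `bus_bits`-wide words; the coefficient width is an assumption to check against idct.v.

```c
/* Amdahl's law: overall speedup when a fraction f (0 <= f <= 1) of the
 * original execution time is accelerated by a factor s (s > 0). */
double amdahl_speedup(double f, double s)
{
    return 1.0 / ((1.0 - f) + f / s);
}

/* Limit of amdahl_speedup as s grows without bound; requires f < 1.
 * Only the unaccelerated part (1 - f) of the time remains. */
double amdahl_limit(double f)
{
    return 1.0 / (1.0 - f);
}

/* Bus transfers needed to move the 64 coefficients of one 8x8 block,
 * assuming coeff_bits-wide coefficients packed into bus_bits-wide words
 * (bus_bits >= coeff_bits). */
int transfers_per_block(int coeff_bits, int bus_bits)
{
    int per_word = bus_bits / coeff_bits;   /* coefficients per transfer */
    return (64 + per_word - 1) / per_word;  /* round up */
}
```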