Wiener Filter Realization using Hardware.
QR decomposition of matrices and inversion
by Givens’ Rotation
***************************************
7th Semester Project Report
Akashdip Das
Abantika Chowdhury
Sayan Chaudhuri
Guide : Dr. Ayan Banerjee
Electronics and Telecommunication Engineering Department
December, 2016
Contents
1 Abstract
2 Introduction
3 Wiener Filtering
4 Q-R decomposition of a matrix
5 Hardware for inversion of an upper triangular matrix (R)
5.1 Storage in a RAM
5.2 Address generation Mechanism
5.3 Hardware for finding the inverse of diagonal elements
5.4 Hardware for finding the inverse of the other elements
6 Conclusions
6.1 Multi PORT RAM for faster performance
6.2 Distributed Arithmetic for computing the product of the two matrices
7 Acknowledgements
1 Abstract
Super-resolution reconstruction is a method for reconstructing higher
resolution images from a set of low resolution observations. The sub-pixel
differences among different observations of the same scene allow higher
resolution images with better quality to be created. In the last thirty
years, many methods for creating high resolution images have been proposed.
However, hardware implementations of such methods are limited. Wiener filter
design is one of the techniques we will use initially for this process.
Wiener filter design involves matrix inversion. A novel method for the
matrix inversion is proposed in this report. The computational algorithm
used is QR decomposition by Givens Rotation.
2 Introduction
The process of super resolution initially requires that the image be restored
from the effects of noise and degradation (assumed isotropic). For that
purpose the Wiener Filter is used, which forms an estimate of the image from
the degraded one. The fundamentals of Wiener Filtering are discussed in
Section 3. Wiener Filtering requires the inverse of a given matrix. The
method followed here is QR Decomposition (discussed in Section 4). The QR
decomposition involves generation of an upper triangular matrix, which we
invert in the proposed algorithm. Various techniques for decomposition of
the matrix have been discussed in papers [3],[4]. However, the matrix
inversion proposed there does not give a general solution to the problem;
rather, the solution is illustrated for a specific 3x3 system. The QR
decomposition involves forming an upper triangular matrix and an orthogonal
matrix. The inverse of an orthogonal matrix is obtained simply by computing
its transpose. The inversion of the upper triangular matrix is discussed in
this paper. The solutions available for this process are for a 3x3 or 4x4
system, so in this paper we have generalized the inversion to an nxn system.
The hardware required for this purpose is developed in Section 5, along with
the reasoning behind it. The hardware has scope for enhanced performance,
which is discussed in Section 6.
3 Wiener Filtering
In signal processing, the Wiener filter is a filter used to produce an estimate
of a desired or target random process by linear time-invariant (LTI) filtering
of an observed noisy process, assuming known stationary signal and noise
spectra, and additive noise. The Wiener filter minimizes the mean square
error between the estimated random process and the desired process. The
goal of the Wiener filter is to compute a statistical estimate of an unknown
signal using a related signal as an input and filtering that known signal to
produce the estimate as an output. For example, the known signal might
consist of an unknown signal of interest that has been corrupted by additive
noise. The Wiener filter can be used to filter out the noise from the corrupted
signal to provide an estimate of the underlying signal of interest. The Wiener
filter is based on a statistical approach, the MMSE (Minimum Mean
Square Error) criterion. The causal finite impulse response (FIR) Wiener filter, instead
of using some given data matrix X and output vector Y, finds optimal tap
weights by using the statistics of the input and output signals. It populates
the input matrix X with estimates of the auto-correlation of the input signal
(T) and populates the output vector Y with estimates of the cross-correlation
between the output and input signals (V).
In order to derive the coefficients of the Wiener filter, consider the signal
w[n] being fed to a Wiener filter of order N with coefficients
$\{a_0, \cdots, a_N\}$. The output of the filter is denoted x[n] and is
given by the expression
$$x[n] = \sum_{i=0}^{N} a_i\, w[n-i].$$
The residual error is denoted e[n] and is defined as e[n] = x[n] − s[n],
where s[n] is the desired signal (see the corresponding block diagram). The
Wiener filter is designed so as to minimize the mean square error (MMSE
criterion), which can be stated concisely as follows:
$$a_i = \arg\min E\left[e^2[n]\right],$$
where E[·] denotes the expectation operator. In the general case, the
coefficients $a_i$ may be complex and may be derived for the case where w[n]
and s[n] are complex as well. With a complex signal, the matrix to be solved
is a Hermitian Toeplitz matrix, rather than a symmetric Toeplitz matrix. For
simplicity, the following considers only the case where all these quantities
are real. The mean square error (MSE) may be rewritten as:
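$$\begin{aligned}
E\left[e^2[n]\right] &= E\left[(x[n] - s[n])^2\right] \\
&= E\left[x^2[n]\right] + E\left[s^2[n]\right] - 2E\left[x[n]\,s[n]\right] \\
&= E\left[\left(\sum_{i=0}^{N} a_i\, w[n-i]\right)^{2}\right] + E\left[s^2[n]\right] - 2E\left[\sum_{i=0}^{N} a_i\, w[n-i]\, s[n]\right]
\end{aligned}$$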
To find the vector [a0, . . . , aN ] which minimizes the expression above, calcu-
late its derivative with respect to each ai
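$$\begin{aligned}
\frac{\partial}{\partial a_i} E\left[e^2[n]\right]
&= \frac{\partial}{\partial a_i}\left\{ E\left[\left(\sum_{i=0}^{N} a_i\, w[n-i]\right)^{2}\right] + E\left[s^2[n]\right] - 2E\left[\sum_{i=0}^{N} a_i\, w[n-i]\, s[n]\right] \right\} \\
&= 2E\left[\left(\sum_{j=0}^{N} a_j\, w[n-j]\right) w[n-i]\right] - 2E\left[s[n]\, w[n-i]\right] \\
&= 2\sum_{j=0}^{N} E\left[w[n-j]\, w[n-i]\right] a_j - 2E\left[w[n-i]\, s[n]\right]
\end{aligned}$$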
Assuming that w[n] and s[n] are each stationary and jointly stationary, the
sequences $R_w[m]$ and $R_{ws}[m]$, known respectively as the autocorrelation
of w[n] and the cross-correlation between w[n] and s[n], can be defined as
follows:
$$R_w[m] = E\{w[n+m]\, w[n]\}$$
$$R_{ws}[m] = E\{w[n+m]\, s[n]\}$$
The derivative of the MSE may therefore be rewritten as (notice that
$E[w[n-i]\, s[n]] = R_{ws}[-i] = R_{sw}[i]$)
$$\frac{\partial}{\partial a_i} E\left[e^2[n]\right] = 2\sum_{j=0}^{N} R_w[j-i]\, a_j - 2R_{sw}[i], \qquad i = 0, \cdots, N.$$
Letting the derivative be equal to zero results in
$$\sum_{j=0}^{N} R_w[j-i]\, a_j = R_{sw}[i], \qquad i = 0, \cdots, N,$$
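which can be rewritten in matrix form
$$\underbrace{\begin{bmatrix}
R_w[0] & R_w[1] & \cdots & R_w[N] \\
R_w[1] & R_w[0] & \cdots & R_w[N-1] \\
\vdots & \vdots & \ddots & \vdots \\
R_w[N] & R_w[N-1] & \cdots & R_w[0]
\end{bmatrix}}_{T}
\underbrace{\begin{bmatrix} a_0 \\ a_1 \\ \vdots \\ a_N \end{bmatrix}}_{a}
=
\underbrace{\begin{bmatrix} R_{sw}[0] \\ R_{sw}[1] \\ \vdots \\ R_{sw}[N] \end{bmatrix}}_{v}$$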
These equations are known as the Wiener–Hopf equations. The matrix T
appearing in the equation is a symmetric Toeplitz matrix. Under suitable
conditions on R, these matrices are known to be positive definite and
therefore non-singular, yielding a unique solution for the Wiener filter
coefficient vector,
$$a = T^{-1} v$$
It is this equation that makes it necessary to design matrix inversion
hardware that is faster than the existing designs, so that there is less
delay in image processing, and that generalizes to the N x N form. In this
paper the inversion of the matrix is done by QR decomposition using Givens
Rotation.
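As a software reference for the equation a = T^{-1} v (a minimal sketch, not the hardware proposed in this report), the following NumPy code estimates the correlations from sample data, builds the symmetric Toeplitz matrix T and solves for the coefficient vector. The function names `correlation` and `wiener_fir`, the biased correlation estimator and the use of `numpy.linalg.solve` in place of an explicit inverse are assumptions made for illustration only.

```python
import numpy as np


def correlation(x, y, max_lag):
    """Biased sample estimate of E[x[n] * y[n - m]] for m = 0..max_lag."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    length = len(x)
    return np.array(
        [np.dot(x[m:], y[: length - m]) / length for m in range(max_lag + 1)]
    )


def wiener_fir(w, s, order):
    """Solve the Wiener-Hopf equations T a = v for an FIR filter of given order.

    w is the observed (noisy) input, s is the desired signal.
    """
    r_w = correlation(w, w, order)    # autocorrelation R_w[0..N]
    r_sw = correlation(s, w, order)   # cross-correlation R_sw[0..N]
    lags = np.abs(np.subtract.outer(np.arange(order + 1), np.arange(order + 1)))
    T = r_w[lags]                     # symmetric Toeplitz: T[i, j] = R_w[|j - i|]
    return np.linalg.solve(T, r_sw)   # a = T^{-1} v without forming T^{-1}
```

A simple sanity check under these assumptions: when the observed signal equals the desired signal (no noise), v is the first column of T, so the solution is a = [1, 0, ..., 0].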
4 Q-R decomposition of a matrix
QR Decomposition: QR decomposition is one of the most important operations
in linear algebra. It can be used to find the inverse of a matrix, to solve
a set of simultaneous equations, and in numerous applications in scientific
computing. It is one of a relatively small number of matrix operation
primitives from which a wide range of algorithms can be realized. QR
decomposition is an elementary operation which decomposes a matrix into an
orthogonal and a triangular matrix. The QR decomposition of a real square
matrix A is a decomposition of A as A = QR, where Q is an orthogonal matrix
(Q^T Q = I) and R is an upper triangular matrix. More generally, we can
factor m x n matrices (with m ≥ n) of full rank as the product of an m x n
matrix with orthonormal columns (Q^T Q = I) and an n x n upper triangular
matrix. There are different methods which can be used to compute the QR
decomposition: the Gram-Schmidt ortho-normalization method, Householder
reflections, and Givens rotations. Each decomposition method has its own
advantages and disadvantages because of its specific solution process. The
Givens Rotation technique is discussed here.
If there are two nonzero vectors, x and y, in a plane, the angle θ between
them can be formalized as
$$\cos(\theta) = \frac{(x, y)}{\|x\|_2\, \|y\|_2}$$
where (x, y) denotes the inner product of x and y. This formula can be
extended to n-dimensional vectors. The angle θ can be defined as
$$\theta = \arccos\frac{(x, y)}{\|x\|_2\, \|y\|_2}$$
The rotation will be performed using a 16 bit pipelined CORDIC.
The inversion relies on the following relations:
$$(A^{-1})^{-1} = A$$
$$A = QR$$
where R is an upper triangular matrix and Q is an orthogonal matrix, so that
$$I = QQ^T$$
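Consider a 4x4 system
$$A = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & a_{1,4} \\
a_{2,1} & a_{2,2} & a_{2,3} & a_{2,4} \\
a_{3,1} & a_{3,2} & a_{3,3} & a_{3,4} \\
a_{4,1} & a_{4,2} & a_{4,3} & a_{4,4}
\end{bmatrix}$$
The upper triangular matrix R to be formed has the structure
$$R = \begin{bmatrix}
r_{1,1} & r_{1,2} & r_{1,3} & r_{1,4} \\
0 & r_{2,2} & r_{2,3} & r_{2,4} \\
0 & 0 & r_{3,3} & r_{3,4} \\
0 & 0 & 0 & r_{4,4}
\end{bmatrix}$$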
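The matrix of Givens Rotation, shown here for a 4x4 system with the rotation
acting on rows 2 and 3, is
$$G(i, j, \theta) = \begin{bmatrix}
1 & 0 & 0 & 0 \\
0 & \cos(\theta) & \sin(\theta) & 0 \\
0 & -\sin(\theta) & \cos(\theta) & 0 \\
0 & 0 & 0 & 1
\end{bmatrix}$$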
The Givens Rotation process applies a sequence of rotations, each of which
nulls one element below the diagonal of the matrix, until the upper
triangular matrix R is formed. The Q matrix is obtained by concatenating all
the Givens Rotations.
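R is found from three rotations, each of which nulls one sub-diagonal
element. The Givens Rotation matrices needed for a 3x3 system are
$$G_1 = \begin{bmatrix} c_1 & 0 & s_1 \\ 0 & 1 & 0 \\ -s_1 & 0 & c_1 \end{bmatrix}
\qquad
G_2 = \begin{bmatrix} c_2 & s_2 & 0 \\ -s_2 & c_2 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\qquad
G_3 = \begin{bmatrix} 1 & 0 & 0 \\ 0 & c_3 & s_3 \\ 0 & -s_3 & c_3 \end{bmatrix}$$
where $c_k = \cos(\theta_k)$ and $s_k = \sin(\theta_k)$. The values that null
the elements A(3,1), A(2,1) and A(3,2) respectively can be obtained using
$$c_1 = \frac{A_1(1,1)}{\sqrt{A_1(1,1)^2 + A_1(3,1)^2}} \qquad s_1 = \frac{A_1(3,1)}{\sqrt{A_1(1,1)^2 + A_1(3,1)^2}}$$
$$c_2 = \frac{A_2(1,1)}{\sqrt{A_2(1,1)^2 + A_2(2,1)^2}} \qquad s_2 = \frac{A_2(2,1)}{\sqrt{A_2(1,1)^2 + A_2(2,1)^2}}$$
$$c_3 = \frac{A_3(2,2)}{\sqrt{A_3(2,2)^2 + A_3(3,2)^2}} \qquad s_3 = \frac{A_3(3,2)}{\sqrt{A_3(2,2)^2 + A_3(3,2)^2}}$$
where $A_1 = A$, and $A_2$ and $A_3$ are the intermediate matrices obtained
after the first and second rotations.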
$$Q = G_1^T\, G_2^T\, G_3^T$$
$$A_2 = G_1 A_1$$
$$A_3 = G_2 A_2$$
$$R = G_3 A_3$$
$$A = QR$$
$$A^{-1} = (QR)^{-1} = R^{-1} Q^{-1} = R^{-1} Q^T$$
This necessitates the formation of the inverse of the upper triangular
matrix and its subsequent multiplication by the transpose of the orthogonal
matrix.
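The rotation sequence described above can be sketched in software as follows. This is a floating-point NumPy illustration under stated assumptions, not the CORDIC-based hardware of this report: the function names `givens_qr` and `invert_via_qr` are hypothetical, each rotation is built as a full matrix for clarity rather than efficiency, and `numpy.linalg.solve` stands in for the triangular inversion developed in Section 5.

```python
import numpy as np


def givens_qr(A):
    """QR decomposition of a square matrix by Givens rotations.

    Each rotation nulls one sub-diagonal element, column by column,
    working from the bottom row upwards. Returns (Q, R) with A = Q R.
    """
    R = np.array(A, dtype=float)
    n = R.shape[0]
    Q = np.eye(n)
    for col in range(n - 1):
        for row in range(n - 1, col, -1):
            a, b = R[col, col], R[row, col]
            r = np.hypot(a, b)
            if r == 0.0:
                continue                 # element is already zero
            c, s = a / r, b / r          # cos(theta), sin(theta)
            G = np.eye(n)
            G[col, col], G[col, row] = c, s
            G[row, col], G[row, row] = -s, c
            R = G @ R                    # nulls R[row, col]
            Q = Q @ G.T                  # accumulates Q = G1^T G2^T ...
    return Q, R


def invert_via_qr(A):
    """Inverse as R^{-1} Q^T; solve() replaces the explicit triangular inverse."""
    Q, R = givens_qr(A)
    return np.linalg.solve(R, Q.T)
```

For a 3x3 input the loop order is (3,1), (2,1), (3,2), matching the three rotations G1, G2, G3 above.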
Figure 1: Basic Hardware for matrix inversion using QR decomposition. The
G matrix is formed using Givens Rotation performed using CORDIC
5 Hardware for inversion of an upper triangular matrix (R)
We have designed the hardware for inversion of a generalised n x n upper
triangular matrix R, where
$$R = \begin{bmatrix}
r_{1,1} & r_{1,2} & \cdots & r_{1,n} \\
0 & r_{2,2} & \cdots & r_{2,n} \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & r_{n,n}
\end{bmatrix}$$
Let B be R^{-1}. The algorithm is as follows:
    for (row = 1; row <= n; row++)
        B(row, row) = 1 / R(row, row)
    next row
    for (row = 1; row <= n; row++)
        for (col = row + 1; col <= n; col++)
            s = 0
            // terms with k < row vanish because B(row, k) = 0
            for (k = row; k <= col - 1; k++)
                s = s + B(row, k) * R(k, col)
            next k
            // divide only after the whole sum has been accumulated
            s = -s / R(col, col)
            B(row, col) = s
        next col
    next row
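The algorithm above can be checked in software with a direct transcription. The sketch below is an assumption-laden reference model (zero-based indices, floating point, hypothetical function name `invert_upper_triangular`); it accumulates the product of a row of B with a column of R, so it enforces B R = I.

```python
import numpy as np


def invert_upper_triangular(R):
    """Invert an upper triangular matrix R by row-wise substitution."""
    R = np.asarray(R, dtype=float)
    n = R.shape[0]
    B = np.zeros((n, n))
    # Diagonal elements are the reciprocals of the diagonal of R.
    for row in range(n):
        B[row, row] = 1.0 / R[row, row]
    # Off-diagonal elements, left to right within each row.
    for row in range(n):
        for col in range(row + 1, n):
            s = 0.0
            for k in range(row, col):
                s += B[row, k] * R[k, col]
            B[row, col] = -s / R[col, col]
    return B
```

Every B(row, k) used in the inner sum lies to the left of B(row, col) in the same row, so it has already been computed when it is needed.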
We observe that the inverse of an upper triangular matrix is also an upper
triangular matrix, whose diagonal elements are the reciprocals of the
diagonal elements of the original matrix. The other elements of the inverse
are calculated recursively using the algorithm given above. An example to
illustrate how the algorithm works is shown below. Let A be an upper
triangular matrix and B be its inverse; then
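$$A = \begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0 & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
0 & 0 & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{n,n}
\end{bmatrix}
\qquad
B = \begin{bmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\
0 & b_{2,2} & b_{2,3} & \cdots & b_{2,n} \\
0 & 0 & b_{3,3} & \cdots & b_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_{n,n}
\end{bmatrix}$$
Since AB = I,
$$\begin{bmatrix}
a_{1,1} & a_{1,2} & a_{1,3} & \cdots & a_{1,n} \\
0 & a_{2,2} & a_{2,3} & \cdots & a_{2,n} \\
0 & 0 & a_{3,3} & \cdots & a_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & a_{n,n}
\end{bmatrix}
\begin{bmatrix}
b_{1,1} & b_{1,2} & b_{1,3} & \cdots & b_{1,n} \\
0 & b_{2,2} & b_{2,3} & \cdots & b_{2,n} \\
0 & 0 & b_{3,3} & \cdots & b_{3,n} \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & b_{n,n}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & 0 & \cdots & 0 \\
0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 1 & \cdots & 0 \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1
\end{bmatrix}$$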
Multiplying the i-th row of matrix A with the i-th column of B yields
$a_{i,i} b_{i,i} = 1$. Hence we see that $b_{i,i} = 1/a_{i,i}$.
Now we solve for the non-diagonal elements of the matrix B. We first
multiply the first row of A by the second column of B to get
$a_{1,1} b_{1,2} + a_{1,2} b_{2,2} = 0$. We already know the value of
$b_{2,2}$, so the only unknown is $b_{1,2}$. In general, to obtain the value
of $b_{i,j}$ we multiply the i-th row of A by the j-th column of B and equate
the result to 0, proceeding in a proper sequence of steps so that the values
of b needed for the substitution have already been obtained.
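As a small worked illustration of these steps (the numbers are chosen here purely as an example and do not come from the project data), take a 2x2 upper triangular matrix:

$$A = \begin{bmatrix} 2 & 1 \\ 0 & 4 \end{bmatrix}, \qquad b_{1,1} = \frac{1}{2}, \qquad b_{2,2} = \frac{1}{4}$$

$$a_{1,1} b_{1,2} + a_{1,2} b_{2,2} = 0 \;\Rightarrow\; 2\, b_{1,2} + 1 \cdot \frac{1}{4} = 0 \;\Rightarrow\; b_{1,2} = -\frac{1}{8}$$

$$AB = \begin{bmatrix} 2 & 1 \\ 0 & 4 \end{bmatrix} \begin{bmatrix} \frac{1}{2} & -\frac{1}{8} \\ 0 & \frac{1}{4} \end{bmatrix} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$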
5.1 Storage in a RAM
In any n x n matrix the total number of elements is n x n = n^2. In the
upper triangular matrix generated here, at most n(n+1)/2 elements are
non-zero, since the remaining n(n−1)/2 elements in the bottom left triangle
are zero. So to minimise the hardware we have come up with an algorithm that
omits storage of the zeros in the RAM. If the zeros were not omitted, the
position of the element $r_{i,j}$ would be j + (i−1) x n. Since the zeros
are omitted, we are required to develop an algorithm that generates the RAM
location address for given i, j and n.
5.2 Address generation Mechanism
In the upper triangular matrix, $r_{i,j} = 0$ for i > j, so there is no need
to store these zeros individually in the RAM. Instead we omit the zeros and
find the position in the RAM corresponding to the inputs (i, j): given the
indices of $r_{i,j}$, our mechanism produces the corresponding location in a
RAM in which zeros are not stored. The address in the RAM for $r_{i,j}$
(with i ≤ j) is
$$n(i-1) + j - \frac{i(i-1)}{2} - 1.$$
This formula is obtained from the fact that with full storage the position
of the element $r_{i,j}$ would be j + (i−1) x n, but here row k omits its
k−1 leading zeros, so the cumulative number of zeros omitted up to and
including row i is
$$\sum_{k=1}^{i} (k-1) = \frac{i(i-1)}{2}.$$
The final −1 makes the addresses start from zero.
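The address formula can be checked with a few lines of software. The sketch below is a behavioural model only; the function name `packed_address` is hypothetical, and the one-bit right shift stands in for the division by 2, which is exact because i(i−1) is always even.

```python
def packed_address(i, j, n):
    """Zero-based RAM address of r(i, j) when the zeros below the diagonal
    are not stored. Indices i and j are 1-based with i <= j <= n."""
    if not (1 <= i <= j <= n):
        raise ValueError("expected 1 <= i <= j <= n")
    return n * (i - 1) + j - ((i * (i - 1)) >> 1) - 1
```

Walking the stored elements row by row should produce consecutive addresses 0, 1, ..., n(n+1)/2 − 1 with no gaps, which is the property the packed storage relies on.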
Figure 2: Block diagram of the address generation block
Figure 3: Circuit diagram of the address generation block
Hardware Required :
4 adders/subtractors
2 multipliers
1 bit right shifter
5.3 Hardware for finding the inverse of diagonal elements
The following circuit (Figure 4) can be used for inversion of the diagonal
elements of the upper triangular matrix. The circuit consists of a loadable
up counter that counts up to the number of rows in the matrix, and a
comparator that indicates that the process needs to stop when the value n is
reached. The counter value is sent to the address generator block of RAM A,
and the same address is then sent to RAM B, so that the data is modified at
the same location in both RAM A and RAM B.
Hardware Required :
1 Loadable Up Counter
1 Comparator
1 Inverter Block that computes the inverse of a 16 bit number.
Time Required :
n clock pulses
Figure 4: Schematic hardware design for inversion of diagonal elements
5.4 Hardware for finding the inverse of the other elements
The following circuit (Figure 5) can be used for computing the elements of
the inverse other than the diagonal elements.
Hardware Required :
3 Loadable Up Counters
4 address generation blocks
1 divider
1 multiplier
4 adders/subtractors
1 Register
Necessary control circuits for termination of loops
No. of clock cycles needed :
$O(n^2)$
Figure 5: Schematic hardware design for inversion of elements other than
those lying in the principal diagonal
6 Conclusions
6.1 Multi PORT RAM for faster performance
One of the obstacles in the way of obtaining high performance in computing
is the memory wall. If the processing elements cannot get data from the
register file (RF) at the processing rate, a bottleneck arises that
adversely affects the overall performance. To keep the computational units
supplied with data, such a system needs a register file that can meet the
requirements of the different computing units on the FPGA. The demand to
process more data per unit time requires multiple read and write operations
at a time, which can be achieved by using multi-port register files
(MPo-RFs) instead of conventional single-port RFs (SPo-RFs). Multi-ported
memories are challenging to implement on FPGAs, since the block RAMs
included in the fabric typically have only two ports. Hence we must
construct memories requiring more than two ports either out of logic
elements or by combining multiple block RAMs. Some conventional multi-port
register file implementations that can be used are:
1. Distributed Memory
2. Replication
3. Banking
4. Multi-pumping
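Of the techniques listed, replication is the simplest to model: every read port gets its own copy of the memory, and each write is broadcast to all copies. The following is a behavioural sketch only, not an HDL design; the class name `ReplicatedRAM` and its interface are hypothetical, and a single write port is assumed.

```python
class ReplicatedRAM:
    """Behavioural model of a 1-write, k-read memory built by replication."""

    def __init__(self, depth, read_ports):
        # One full copy of the memory per read port.
        self.banks = [[0] * depth for _ in range(read_ports)]

    def write(self, addr, value):
        # The single write port updates every copy, keeping them identical.
        for bank in self.banks:
            bank[addr] = value

    def read(self, addrs):
        # Each read port serves one address from its own copy, so all
        # reads can happen in the same cycle.
        if len(addrs) > len(self.banks):
            raise ValueError("more simultaneous reads than read ports")
        return [self.banks[port][addr] for port, addr in enumerate(addrs)]
```

The cost of this approach is memory: k read ports require k copies of the data, which is the trade-off against banking or multi-pumping.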
Figure 6: 4 Read + 1 Write block RAM as an example of Multiport RAM
6.2 Distributed Arithmetic for computing the product of the two matrices
Distributed arithmetic is a technique developed for the real-time
computation of the inner product of a vector with constant elements and a
vector with varying elements. The inner product is computed without
splitting it into separate multiplication and addition operations. Instead,
the partial inner products of the unchanging vector with successive
bit-slices of the changing vector are shifted and summed. In the classical
scheme, all possible values of the partial inner products are calculated
offline and written to a Look Up Table (LUT). For the matrix product, the
content of the LUT is instead computed dynamically in online mode; it
remains invariable for the period of multiplication of the left matrix by a
column of the right matrix. Despite the need to calculate the contents of
the LUT, the total number of addition micro-operations decreases in
comparison with the classical way of calculating the matrix product.
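The bit-slice idea can be illustrated with a short software model. This is a minimal sketch under simplifying assumptions: integer coefficients, unsigned inputs of a known bit width, and hypothetical function names `build_lut` and `da_inner_product`. Here `coeffs` plays the role of one row of the left matrix, which stays fixed while the LUT is in use, and `xs` plays the role of one column of the right matrix.

```python
def build_lut(coeffs):
    """LUT[addr] = sum of coeffs[k] over every bit k that is set in addr."""
    count = len(coeffs)
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << count)
    ]


def da_inner_product(coeffs, xs, bits):
    """Inner product sum(coeffs[k] * xs[k]) using only look-up, shift and add.

    xs must be unsigned integers smaller than 2**bits.
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Bit-slice: bit b of every input forms the LUT address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> b) & 1) << k
        acc += lut[addr] << b    # weight the partial inner product by 2**b
    return acc
```

The LUT has 2^K entries for K coefficients, so in practice long vectors are split into shorter groups; that partitioning is left out of this sketch.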
7 Acknowledgements
The authors would like to thank their Project Guide Dr. Ayan Banerjee
for his invaluable suggestions and proper direction throughout the course
of the project. Heartfelt gratitude is also extended to Mr. Anirban
Chakraborty, who is currently pursuing his Ph.D under the guidance of
Prof. Ayan Banerjee.
References
[1] Gonzalez, R. C., Woods, R. E. (2002). Digital Image Processing. Upper
Saddle River, NJ: Prentice Hall.
[2] Seyid K., Blanc S., Leblebici Y., "Hardware Implementation of Real-Time
Multiple Frame Super-Resolution", 2015 IFIP/IEEE International Conference
on Very Large Scale Integration (VLSI-SoC).
[3] Nafiz Ahmed Chisty, "Matrix Inversion Using QR Decomposition by Parabolic
Synthesis".
[4] Brown, Robert Grover; Hwang, Patrick Y.C. (1996). Introduction to Random
Signals and Applied Kalman Filtering (3 ed.). New York: John Wiley & Sons.
ISBN 0-471-12839-2.
[5] D. Boulfelfel, R.M. Rangayyan, L.J. Hahn, and R. Kloiber, "Three-
dimensional restoration of single photon emission computed tomography
images", IEEE Transactions on Nuclear Science, 41(5): 1746-1754, October
1994.
[6] Wiener, Norbert (1949). Extrapolation, Interpolation, and Smoothing of
Stationary Time Series. New York: Wiley. ISBN 0-262-73005-7.
[7] Thomas Kailath, Ali H. Sayed, and Babak Hassibi, Linear Estimation,
Prentice-Hall, NJ, 2000, ISBN 978-0-13-022464-4.
[8] Wiener N., "The interpolation, extrapolation and smoothing of stationary
time series", Report of the Services 19, Research Project DIC-6037, MIT,
February 1942.
[9] Kolmogorov A.N., "Stationary sequences in Hilbert space" (in Russian),
Bull. Moscow Univ., 1941, vol. 2, no. 6, 1-40. English translation in Kailath
T. (ed.), Linear Least Squares Estimation, Dowden, Hutchinson & Ross, 1977.
[10] Vladislav Lesnikov, Tatiana Naumovich, Alexander Chastikov, "Modification
of the architecture of a distributed arithmetic", East-West Design & Test
Symposium (EWDTS), 2015 IEEE, pp. 1-4, 2015.
[11] Álvaro Lopes (Senior Software Engineer, Critical Software), "Tips &
Tricks: Creating a 2W+4R FPGA Block RAM, Part 1".
[12] "An Efficient FPGA Implementation of Scalable Matrix Inversion Core
using QR Decomposition".

Mais conteúdo relacionado

Mais procurados

DSP_FOEHU - Lec 07 - Digital Filters
DSP_FOEHU - Lec 07 - Digital FiltersDSP_FOEHU - Lec 07 - Digital Filters
DSP_FOEHU - Lec 07 - Digital FiltersAmr E. Mohamed
 
13 fourierfiltrationen
13 fourierfiltrationen13 fourierfiltrationen
13 fourierfiltrationenhoailinhtinh
 
Image restoration1
Image restoration1Image restoration1
Image restoration1moorthim7
 
A Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR FiltersA Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR FiltersIDES Editor
 
DSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier TransformDSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier TransformAmr E. Mohamed
 
ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)
ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)
ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)Shajun Nisha
 
Digital Signal Processing
Digital Signal ProcessingDigital Signal Processing
Digital Signal ProcessingSandip Ladi
 
Dsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processingDsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processingAmr E. Mohamed
 
Image trnsformations
Image trnsformationsImage trnsformations
Image trnsformationsJohn Williams
 
Paper id 252014114
Paper id 252014114Paper id 252014114
Paper id 252014114IJRAT
 
Fast Fourier Transform
Fast Fourier TransformFast Fourier Transform
Fast Fourier Transformop205
 
Image transforms 2
Image transforms 2Image transforms 2
Image transforms 2Ali Baig
 
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoidFourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoidXavier Davias
 
EC8562 DSP Viva Questions
EC8562 DSP Viva Questions EC8562 DSP Viva Questions
EC8562 DSP Viva Questions ssuser2797e4
 
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsDSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsAmr E. Mohamed
 

Mais procurados (20)

DSP_FOEHU - Lec 07 - Digital Filters
DSP_FOEHU - Lec 07 - Digital FiltersDSP_FOEHU - Lec 07 - Digital Filters
DSP_FOEHU - Lec 07 - Digital Filters
 
13 fourierfiltrationen
13 fourierfiltrationen13 fourierfiltrationen
13 fourierfiltrationen
 
Image restoration1
Image restoration1Image restoration1
Image restoration1
 
A Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR FiltersA Novel Methodology for Designing Linear Phase IIR Filters
A Novel Methodology for Designing Linear Phase IIR Filters
 
DSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier TransformDSP_FOEHU - Lec 09 - Fast Fourier Transform
DSP_FOEHU - Lec 09 - Fast Fourier Transform
 
ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)
ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)
ESTIMATING NOISE PARAMETER & FILTERING (Digital Image Processing)
 
Digital Signal Processing
Digital Signal ProcessingDigital Signal Processing
Digital Signal Processing
 
Dsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processingDsp 2018 foehu - lec 10 - multi-rate digital signal processing
Dsp 2018 foehu - lec 10 - multi-rate digital signal processing
 
Image trnsformations
Image trnsformationsImage trnsformations
Image trnsformations
 
Unit ii
Unit iiUnit ii
Unit ii
 
Paper id 252014114
Paper id 252014114Paper id 252014114
Paper id 252014114
 
Image transforms
Image transformsImage transforms
Image transforms
 
Signal Processing Homework Help
Signal Processing Homework HelpSignal Processing Homework Help
Signal Processing Homework Help
 
Fast Fourier Transform
Fast Fourier TransformFast Fourier Transform
Fast Fourier Transform
 
Dft,fft,windowing
Dft,fft,windowingDft,fft,windowing
Dft,fft,windowing
 
Image transforms 2
Image transforms 2Image transforms 2
Image transforms 2
 
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoidFourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
Fourier transforms & fft algorithm (paul heckbert, 1998) by tantanoid
 
EC8562 DSP Viva Questions
EC8562 DSP Viva Questions EC8562 DSP Viva Questions
EC8562 DSP Viva Questions
 
Matched filter
Matched filterMatched filter
Matched filter
 
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and SystemsDSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
DSP_2018_FOEHU - Lec 03 - Discrete-Time Signals and Systems
 

Semelhante a Wiener Filter Hardware Realization

Performance Assessment of Polyphase Sequences Using Cyclic Algorithm
Performance Assessment of Polyphase Sequences Using Cyclic AlgorithmPerformance Assessment of Polyphase Sequences Using Cyclic Algorithm
Performance Assessment of Polyphase Sequences Using Cyclic Algorithmrahulmonikasharma
 
Investigation of repeated blasts at Aitik mine using waveform cross correlation
Investigation of repeated blasts at Aitik mine using waveform cross correlationInvestigation of repeated blasts at Aitik mine using waveform cross correlation
Investigation of repeated blasts at Aitik mine using waveform cross correlationIvan Kitov
 
Time of arrival based localization in wireless sensor networks a non linear ...
Time of arrival based localization in wireless sensor networks  a non linear ...Time of arrival based localization in wireless sensor networks  a non linear ...
Time of arrival based localization in wireless sensor networks a non linear ...sipij
 
International Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentInternational Journal of Engineering Research and Development
International Journal of Engineering Research and DevelopmentIJERD Editor
 
Performance evaluation of ds cdma
Performance evaluation of ds cdmaPerformance evaluation of ds cdma
Performance evaluation of ds cdmacaijjournal
 
Channel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewChannel and clipping level estimation for ofdm in io t –based networks a review
Channel and clipping level estimation for ofdm in io t –based networks a reviewIJARIIT
 
A novel approach for high speed convolution of finite and infinite length seq...
A novel approach for high speed convolution of finite and infinite length seq...A novel approach for high speed convolution of finite and infinite length seq...
A novel approach for high speed convolution of finite and infinite length seq...eSAT Journals
 
Mining of time series data base using fuzzy neural information systems
Mining of time series data base using fuzzy neural information systemsMining of time series data base using fuzzy neural information systems
Mining of time series data base using fuzzy neural information systemsDr.MAYA NAYAK
 
A novel architecture of rns based
A novel architecture of rns basedA novel architecture of rns based
A novel architecture of rns basedVLSICS Design
 
Espacios y subepacios vectoriales
Espacios y subepacios vectorialesEspacios y subepacios vectoriales
Espacios y subepacios vectorialesMirianArcos1
 
A robust blind and secure watermarking scheme using positive semi definite ma...
A robust blind and secure watermarking scheme using positive semi definite ma...A robust blind and secure watermarking scheme using positive semi definite ma...
A robust blind and secure watermarking scheme using positive semi definite ma...ijcsit
 
DESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGIC
DESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGICDESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGIC
DESIGN OF QUATERNARY LOGICAL CIRCUIT USING VOLTAGE AND CURRENT MODE LOGICVLSICS Design
 
A novel approach for high speed convolution of finite
A novel approach for high speed convolution of finiteA novel approach for high speed convolution of finite
A novel approach for high speed convolution of finiteeSAT Publishing House
 
Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check  Area efficient parallel LFSR for cyclic redundancy check
Area efficient parallel LFSR for cyclic redundancy check IJECEIAES
 

Semelhante a Wiener Filter Hardware Realization (20)

Performance Assessment of Polyphase Sequences Using Cyclic Algorithm
Performance Assessment of Polyphase Sequences Using Cyclic AlgorithmPerformance Assessment of Polyphase Sequences Using Cyclic Algorithm
Performance Assessment of Polyphase Sequences Using Cyclic Algorithm
 
Investigation of repeated blasts at Aitik mine using waveform cross correlation
Investigation of repeated blasts at Aitik mine using waveform cross correlationInvestigation of repeated blasts at Aitik mine using waveform cross correlation
Investigation of repeated blasts at Aitik mine using waveform cross correlation
 
Time of arrival based localization in wireless sensor networks a non linear ...
Time of arrival based localization in wireless sensor networks  a non linear ...Time of arrival based localization in wireless sensor networks  a non linear ...
Time of arrival based localization in wireless sensor networks a non linear ...
 
International Journal of Engineering Research and Development