FACULTY OF ELECTRICAL ENGINEERING
UNIVERSITY OF BANJA LUKA

Address: Patre 5, 78000 Banja Luka, Bosnia and Herzegovina
Phone: +387 51 211824
Fax: +387 51 211408
Web: www.etfbl.net

ELECTRONICS
Web: www.electronics.etfbl.net
E-mail: electronics@etfbl.net

Editor-in-Chief:
Branko L. Dokić, Ph. D.
Faculty of Electrical Engineering, University of Banja Luka, Bosnia and Herzegovina
E-mail: branko.dokic@etfbl.net

Co-Editor-in-Chief:
Prof. Tatjana Pešić-Brdanin, University of Banja Luka, Bosnia and Herzegovina
E-mail: tatjanapb@etfbl.net

International Editorial Board:
• Prof. Goce Arsov, St. Cyril and Methodius University, Macedonia
• Prof. Zdenka Babić, University of Banja Luka, Bosnia and Herzegovina
• Prof. Petar Biljanović, University of Zagreb, Croatia
• Prof. Milorad Božić, University of Banja Luka, Bosnia and Herzegovina
• Prof. Octavio Nieto-Taladriz Garcia, Polytechnic University of Madrid, Spain
• Dr Zoran Jakšić, IHTM, Serbia
• Prof. Vladimir Katić, University of Novi Sad, Serbia
• Prof. Tom J. Kazmierski, University of Southampton, United Kingdom
• Prof. Vančo Litovski, University of Niš, Serbia
• Dr Duško Lukač, University of Applied Sciences, Germany
• Prof. Danilo Mandić, Imperial College, London, United Kingdom
• Prof. Bratislav Milovanović, University of Niš, Serbia
• Prof. Vojin Oklobdžija, University of Texas at Austin, USA
• Prof. Predrag Pejović, University of Belgrade, Serbia
• Prof. Ninoslav Stojadinović, University of Niš, Serbia
• Prof. Robert Šobot, Western University, Canada
• Prof. Slobodan Vukosavić, University of Belgrade, Serbia
• Prof. Volker Zerbe, University of Applied Sciences of Erfurt, Germany

Secretary:
Aleksandar Pajkanović
Drago Čavka
Svjetlana Kovačević
Jovica Bulović

Publisher:
Faculty of Electrical Engineering, University of Banja Luka, Bosnia and Herzegovina

Number of printed copies: 100
Technology-Dependent Optimization of FIR Filters based on Carry-Save Multiplier and 4:2 Compressor unit

Burhan Khurshid and Roohie Naaz

Abstract - This work presents an FPGA implementation of an FIR filter based on a 4:2 compressor and a carry-save adder (CSA) multiplier unit. The hardware realizations presented in this paper are based on technology-dependent optimization of these individual units, with the aim of mapping them efficiently onto Xilinx FPGAs. Conventional filter implementations consider only technology-independent optimizations and rely on the Xilinx CAD tools to map the logic onto the FPGA fabric, which very often results in inefficient mapping. In this paper, we consider traditional CSA and 4:2 compressor based FIR filters and restructure these units to achieve higher integration levels. The technology-optimized Boolean networks are then coded using an instantiation-based coding strategy. The Xilinx tool then applies its own optimization strategies to further optimize the networks and generate circuits with high logic densities and reduced depths. Experimental results indicate a significant improvement in performance over traditional realizations. An important property of technology-dependent optimization is that it improves all the performance parameters (area, timing and power) simultaneously. This is in contrast to technology-independent optimization, where there is always an application-driven trade-off between different performance parameters.

Index Terms—FIR filters, FPGA, Look-up table, Technology Mapping, Carry-save Arithmetic

Original Research Paper
DOI: 10.7251/ELS1620043K

I. INTRODUCTION

Finite Impulse Response (FIR) filters are basic components in many Digital Signal Processing (DSP) applications, such as multimedia and wireless communication, image processing, video and audio processing, and speech recognition [1], [2], [3]. Owing to their iterative nature, DSP computations differ drastically from general-purpose computations. The non-terminating nature of DSP algorithms allows efficient systems to be designed by exploiting concurrency both within an iteration and across multiple iterations [4]. This has given designers sufficient impetus to look beyond traditional software-oriented solutions and to consider hardware platforms whose resources can be used to build a complete System on Chip (SoC) solution, with an architecture that matches the algorithmic complexity.

Application Specific Integrated Circuits (ASICs) have long been used to develop custom architectures for realizing FIR filters. However, the ASIC design flow is complicated and time consuming, resulting in huge non-recurring engineering (NRE) costs [5]. This has typically reserved ASICs for high-volume markets and for some specialized domains. FPGAs offer reduced design time owing to a simplified design flow and their pre-fabricated nature, which eliminates the need to verify deep sub-micron effects. Other advantages include a reconfigurable design approach [6], [7], large-scale integration [6], [8], the availability of many intellectual property (IP) cores [9] and reduced NRE costs [6], [7]. The FPGA design cycle also has strong computer-aided design (CAD) support: the software handles the time-consuming mapping, placement, routing and floor-planning phases. Because these phases are automated and restructure the logic on their own terms, the effectiveness of technology-independent optimizations, which are generally well suited to ASICs [10], is limited in FPGAs. To get maximum performance from the target FPGA device, optimizations that are specific to the underlying technology therefore have to be considered. This requires detailed knowledge of the target device, and the choice of the target device itself has a prominent effect on the end performance of the system [11]. In this work, we carry out the hardware realization of 4:2 compressor and carry-save adder (CSA) based multiplier units that are optimized for FPGAs with 6-input look-up tables (LUTs). Since all modern FPGAs from Xilinx support 6-input LUTs [12], [13], [14], filter realizations based on these individually optimized units should provide improved performance.

FIR filters in general tend to have high arithmetic complexity due to the required number of multipliers, adders and delay elements. However, the main computational bottleneck is the multiply operation, which requires a large computation time [15]. A variety of approaches have been used to speed up the multiply operation in a filter structure. These approaches either eliminate the multiplier unit completely or reduce its architectural complexity. A widely used multiplier-less approach replaces the multiplier unit with some form of memory. Two frequently used memory-based techniques are the direct ROM-based implementation and the distributed arithmetic (DA) based implementation. ROM-based implementations replace the multipliers with LUTs, resulting in faster outputs than a multiply-and-accumulate design [16]. DA-based techniques have high-throughput processing capabilities and a regular structure, resulting in cost-optimal designs [17], [18], [19]. In DA-based multipliers, pre-computed partial products are stored in memory elements and are later read out and accumulated to obtain the desired result. To minimize the logic requirements, the authors in [20] present an FPGA implementation algorithm based on the remainder theorem. Another approach uses a divided-LUT method to decrease the computational complexity [21]. The problem with memory-based techniques is that the on-chip memory requirement grows as the operand word-length increases. Low-capacity FPGAs often switch to bit-serial arithmetic for realizing multiplier circuits [22]. Special-purpose bit-serial implementations include power-of-two sum or difference approaches, which replace multiplication with faster shift and add operations [23], [24], [25]. Linear systolic structures have also been used as bit-serial architectures [4]; in these approaches, conventional 2-D bit-parallel architectures are transformed into linear 1-D bit-serial systolic structures. The drawback of bit-serial structures is their reduced speed.
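The DA principle described above (pre-computed partial sums read out and accumulated) can be sketched in a few lines of Python. This is a minimal illustration rather than the formulations of [17]-[19]: it assumes unsigned B-bit input samples and integer coefficients, and the names `build_da_table` and `da_inner_product` are hypothetical. The table holds all 2^K sums of the K coefficients, and each bit-plane of the inputs addresses it once.

```python
def build_da_table(coeffs):
    """Pre-compute the sum of coefficients for every K-bit table address."""
    table = []
    for addr in range(1 << len(coeffs)):
        # Bit i of the address selects coefficient i.
        table.append(sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1))
    return table


def da_inner_product(coeffs, samples, bits):
    """Compute sum(c_i * x_i) one bit-plane of the inputs at a time.

    Assumes unsigned samples with 0 <= x_i < 2**bits; a 2's complement
    version would subtract, rather than add, the MSB bit-plane term.
    """
    table = build_da_table(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one table address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        # Read the pre-computed partial sum and accumulate it with weight 2**b.
        acc += table[addr] << b
    return acc


if __name__ == "__main__":
    h = [3, -1, 4, 2]
    x = [10, 7, 255, 0]
    print(da_inner_product(h, x, bits=8))          # 1043
    print(sum(hk * xk for hk, xk in zip(h, x)))    # 1043, direct dot product
```

No multiplier appears in the datapath: only table reads, shifts and additions. The table size grows as 2^K in the number of coefficients addressed together, which illustrates why memory cost is the limiting factor for these techniques.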

Apart from the multiplier-less approaches, several techniques have been used to reduce the complexity of the multiplication operation. In [26] the authors present an improved multiplier design based on Canonic Signed Digit (CSD) representation and Horner's scheme. Two multiplier structures are proposed: a cascaded adder structure and an accumulator structure. Both take advantage of the fixed nature of the filter coefficients and reduce the number of partial products, resulting in an area-efficient realization. Similar constant-coefficient multipliers have been reported in [27], [28], [29]. The Residue Number System (RNS) is yet another approach that has been used for designing efficient high-speed multipliers [30]; its limited inter-moduli carry propagation and parallel computation make RNS desirable for add/multiply-intensive applications. Similarly, the authors in [31] propose a design scheme that combines sum-of-power-of-two (SPT) coefficients with carry-save addition to implement fast multiplier blocks.
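To make the CSD idea concrete, the following Python sketch recodes a constant into canonic signed digits and multiplies by it using only shifts and additions/subtractions. It is an illustration of plain CSD recoding only; it does not reproduce the Horner-scheme structures of [26], and the function names are hypothetical.

```python
def to_csd(value):
    """Canonic signed digit (CSD) recoding of an integer, LSB first.

    Every digit is -1, 0 or +1 and no two adjacent digits are non-zero,
    which minimizes the number of non-zero digits.
    """
    digits = []
    n = value
    while n != 0:
        if n & 1:
            # n % 4 == 1 -> digit +1, n % 4 == 3 -> digit -1; either choice
            # leaves n - d divisible by 4, so the next digit is forced to 0.
            d = 2 - (n & 3)
            n -= d
        else:
            d = 0
        digits.append(d)
        n >>= 1
    return digits


def csd_multiply(x, coeff):
    """Multiply x by a constant coefficient using only shifts and add/subtract."""
    acc = 0
    for shift, d in enumerate(to_csd(coeff)):
        if d == 1:
            acc += x << shift
        elif d == -1:
            acc -= x << shift
    return acc


if __name__ == "__main__":
    print(to_csd(7))            # [-1, 0, 0, 1], i.e. 7 = 8 - 1
    print(csd_multiply(5, 7))   # 35
```

For the coefficient 7, binary needs three additions of shifted operands while CSD needs one subtraction, which is the partial-product reduction that constant-coefficient multipliers exploit.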

At the system level, one frequently used modification is to develop systolic architectures for the filter structures. Systolic designs possess a high potential to yield high throughput rates, together with simplicity, regularity and modularity of structure [32]. The authors in [33] and [34] take a systolic approach to filter design, using multipliers based on direct ROM and DA approaches. Similarly, the work in [35] uses systolic structures but focuses on replacing the original adder unit with a parallel prefix adder (PPA) based on a minimal-depth algorithm. Poly-phase decomposition has been used to design high-speed and low-power parallel filters [36], [7], [37], [38], [39], [40]. A modification of poly-phase decomposition is the Fast FIR Algorithm (FFA); filters based on FFA are area efficient because they use fewer multiplier units [41], [42].

All the above-mentioned approaches use technology-independent optimizations to enhance the performance of the filtering structures. In this paper, we take an alternative, low-level design approach and propose realizations based on technology-dependent optimizations.

The rest of the paper is organized as follows. Section II discusses the basic FIR structure. Section III introduces the preliminary terminology used in this paper. Section IV discusses the technology-dependent optimization of the 4:2 compressor and the CSA multiplier unit. Synthesis and implementation are carried out in Section V. Conclusions are drawn in Section VI, and references are listed at the end.

II. BASIC FIR FILTER

An N-tap FIR filter is defined as:

$$y[n] = \sum_{k=0}^{N-1} h_k \cdot x[n-k] \tag{1}$$

where $h_k$ is the $k$-th filter coefficient (the coefficients together determine the frequency behavior of the filter) and $x[n-k]$, $0 \leq k \leq N-1$, are the time-delayed samples of the input sequence. A direct mapping of equation (1) results in the Direct form realization of the FIR filter shown in figure 1. The critical path of the Direct form consists of one multiplier unit and the chain of $N-1$ adder units that sums the $N$ products. Alternatively, the transposition theorem may be applied to obtain the Transposed form shown in figure 2 [4]. The critical path of the Transposed form consists of only one adder and one multiplier unit. The Transposed and Direct forms may be combined into different Hybrid forms; however, in this paper we consider only the Transposed and Direct forms of the FIR filter.

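As a behavioral check that both forms compute equation (1), the following Python sketch models them at the sample level. It is a hypothetical model (function names are our own) that ignores word-length and timing; the comments note where each form's critical path lies in hardware.

```python
def fir_direct(h, x):
    """Direct form: a delay line of past inputs feeds N multipliers whose
    products are summed by an adder chain (critical path: 1 multiplier
    plus N-1 adders)."""
    delay = [0] * len(h)          # delay[k] holds x[n-k]
    y = []
    for sample in x:
        delay = [sample] + delay[:-1]
        y.append(sum(hk * xk for hk, xk in zip(h, delay)))
    return y


def fir_transposed(h, x):
    """Transposed form: every multiplier sees the current input, and the
    registers hold partial sums (critical path: 1 multiplier plus 1 adder)."""
    regs = [0] * (len(h) - 1)     # regs[k] is added to the product of tap k
    y = []
    for sample in x:
        products = [hk * sample for hk in h]
        y.append(products[0] + (regs[0] if regs else 0))
        # Register k is loaded with product k+1 plus the register to its right.
        regs = [products[k + 1] + (regs[k + 1] if k + 1 < len(regs) else 0)
                for k in range(len(regs))]
    return y


if __name__ == "__main__":
    h = [1, 2, 3]
    x = [1, 0, 0, 0, 5]
    print(fir_direct(h, x))       # [1, 2, 3, 0, 5]
    print(fir_transposed(h, x))   # [1, 2, 3, 0, 5]
```

Both functions return identical sequences; the difference lies only in where the registers sit, which is what moves the adder chain off the critical path in the Transposed form.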
III. PRELIMINARY TERMINOLOGIES

Logic synthesis is concerned with realizing a desired functionality at the minimum possible cost. In the context of digital design, the cost of a circuit is a measure of its speed, area, power or any combination of these. A combinational function may be represented graphically as a directed acyclic graph (DAG) called a Boolean network. Nodes within this network represent logic gates, primary inputs (PIs) and primary outputs (POs). Each node implements a local function and, together with its predecessor nodes, a global function. A cone of a node \( v \), \( C_v \), is a sub-graph consisting of \( v \) and some of its non-PI predecessor nodes, such that every node \( u \) within the cone has a path to the root node \( v \) that lies entirely in \( C_v \). The level of a node \( v \) is the length of the longest path from any PI node to \( v \). The depth of the network is the largest level of any node in the network, which is attained at a PO node. The critical path and area of a mapped Boolean network are measured by its depth and by the number of LUTs it uses, respectively. A network is said to be \( k \)-bounded if the fan-in of every node does not exceed \( k \).
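The definitions of level, depth and k-boundedness can be illustrated on a small network. The sketch below assumes a gate-level description of the CSA multiplier cell (S = (a AND b) XOR s XOR c, C = majority(a AND b, s, c)); the node names and gate decomposition are hypothetical and need not match figure 6 exactly.

```python
from functools import lru_cache

# Hypothetical gate-level CSA multiplier cell: node -> list of fan-in nodes.
# Primary inputs (a, b, s, c) have no entry.
CELL = {
    "p": ["a", "b"],          # partial-product bit a AND b
    "x1": ["p", "s"],         # first XOR of the sum path
    "S": ["x1", "c"],         # sum output (PO)
    "g1": ["p", "s"],
    "g2": ["p", "c"],
    "g3": ["s", "c"],
    "C": ["g1", "g2", "g3"],  # 3-input OR forming the carry output (PO)
}


def levels(fanins):
    """Return the level of every node: the longest path length from any PI."""
    @lru_cache(maxsize=None)
    def level(node):
        preds = fanins.get(node, [])
        return 0 if not preds else 1 + max(level(p) for p in preds)

    nodes = set(fanins) | {p for preds in fanins.values() for p in preds}
    return {n: level(n) for n in nodes}


def depth(fanins, outputs):
    """Network depth = largest level over the primary outputs."""
    lv = levels(fanins)
    return max(lv[o] for o in outputs)


def is_k_bounded(fanins, k):
    """True if no node has a fan-in larger than k."""
    return all(len(preds) <= k for preds in fanins.values())


if __name__ == "__main__":
    print(depth(CELL, ["S", "C"]))                        # 3
    print(is_k_bounded(CELL, 3), is_k_bounded(CELL, 2))   # True False
```

If each gate of this network were mapped onto its own LUT, the LUT depth would equal this network depth of three.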

IV. TECHNOLOGY-DEPENDENT OPTIMIZATION

Technology-dependent optimization transforms the initial Boolean network into a circuit netlist that uses the target logic elements efficiently. The aim is to distribute the logic among the target elements with the minimum possible depth and minimum resource utilization. The target element in the majority of FPGAs is a \( k \)-LUT [43], [44]. Efficient utilization of this element can lead to increased logic densities and reduced circuit depths.

Technology-dependent optimization using LUTs is a two-step process. In the first step, the entire network is partitioned into suitable sub-networks, and the individual nodes within each sub-network are covered with suitable cones. Redundancies within each sub-network are exploited during this covering phase, and the logic implemented by each cone is mapped onto a separate LUT. In the second step, the netlist for the entire network is constructed by assembling the individually optimized sub-networks. The overall aim is a circuit implementation that uses as few LUTs as possible and has the minimum possible depth.

The FIR structure considered in this paper is based on a multiplier unit and a 4:2 compressor unit. The multiplier is based on CSA logic and generates two partial vectors. The 4:2 compressor then combines these partial vectors with those generated by the previous stage and produces two new partial vectors. A final adder stage at the output combines the last pair of partial vectors to generate the final result. Because neither the CSA multiplier nor the 4:2 compressor ripples a carry across the word, the critical path is kept to a minimum. Direct and Transposed FIR structures based on these units are shown in figure 3 and figure 4, respectively. In each case, the input \( x[n] \) and the filter coefficients \( h_i \) are assumed to be in fixed-point 2's complement representation.
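The flow of partial vectors described above can be modeled at the word level in Python: each stage keeps a (sum, carry) pair, a 4:2 compressor built from two 3:2 carry-save levels folds in the next product pair, and a single carry-propagate addition happens only at the output. This is a hypothetical sketch (function names are our own) that ignores word-length truncation; Python integers behave like sign-extended 2's complement words, so the identities also hold for negative values.

```python
def csa(a, b, c):
    """3:2 carry-save adder on whole words.

    Returns (sum, carry) with a + b + c == sum + carry; every bit position
    is computed independently, so no carry ripples across the word.
    """
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def compress_4_2(w, x, y, z):
    """4:2 compressor built from two CSA levels: four vectors in, two out."""
    s1, c1 = csa(w, x, y)
    return csa(s1, c1, z)


def accumulate_products(pairs):
    """Fold (sum, carry) product pairs into a running (sum, carry) pair,
    mimicking a chain of 4:2 compressors, then merge once at the end."""
    acc_s, acc_c = 0, 0
    for ps, pc in pairs:
        acc_s, acc_c = compress_4_2(acc_s, acc_c, ps, pc)
    return acc_s + acc_c          # the single carry-propagate (final) adder


if __name__ == "__main__":
    # Hypothetical product pairs representing the values 8, 10 and 16.
    print(accumulate_products([(5, 3), (10, 0), (7, 9)]))   # 34
```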

A. Technology-Dependent Optimization of the Multiplier unit

The multiplier unit is an array multiplier based on carry-save logic. In carry-save logic, the carry outputs of one row are saved and passed to the adders in the next row, so the additions at different bit positions within a row are independent of each other. The details of a CSA multiplier are given in [4], and the schematic for 4-bit operands is shown in figure 5. The vector-merging adder, which is the main computational bottleneck in a CSA multiplier, need not be included, because the partial sum and carry outputs are fed directly to the 4:2 compressor unit.
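The following word-level Python model illustrates a carry-save array multiplier without its vector-merging adder. It assumes unsigned operands; the 2's complement operands used in this paper would additionally need sign handling (for example the Baugh-Wooley scheme), which the sketch omits. The function name is hypothetical.

```python
def csa_array_multiply(a, b, n):
    """Carry-save array multiplication of two unsigned n-bit operands.

    Row j adds the partial product (a AND b_j) << j to the saved sum and
    carry vectors using independent full adders, i.e. one carry-save row.
    The vector-merging adder is deliberately omitted: the function returns
    the (sum, carry) pair that would be fed to the 4:2 compressor.
    """
    s, c = 0, 0
    for j in range(n):
        pp = (a << j) if (b >> j) & 1 else 0          # AND gates of row j
        s, c = s ^ c ^ pp, ((s & c) | (s & pp) | (c & pp)) << 1
    return s, c


if __name__ == "__main__":
    s, c = csa_array_multiply(13, 11, 4)
    print(s + c)   # 143 == 13 * 11, recovered only once s and c are added
```

Each loop iteration corresponds to one row of n basic cells, so an n x n array has n rows of n cells; this is the structure whose cells are each mapped onto one LUT below.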
Figure 6 shows the Boolean network for the basic cell used in the CSA multiplier. The network is partitioned into two sub-networks corresponding to the sum (S) and carry (C) outputs. Each sub-network is separately mapped onto a circuit of LUTs by covering the individual nodes with suitable cones. A straightforward approach is to cover each node within a sub-network with a separate cone. The sub-network is then traversed in post-order depth-first fashion, and the local function implemented by each cone is mapped onto a separate LUT. This is shown in figure 7(a). The overall depth at PO nodes S and C is three, and the LUT count is seven. The number of LUTs may be reduced by decomposing the 3-input OR gate in the carry sub-network and duplicating the AND gate in each of the sum and carry sub-networks. The decomposed node is included in two separate cones, and the sub-network is again traversed in post-order depth-first fashion, resulting in the realization of figure 7(b), where the shaded nodes represent the duplicated logic. The circuit depth is now two and the LUT count is three. However, an optimal implementation may be obtained by exploiting the reconvergent PI nodes in the carry sub-network. Reconvergent paths originate from the same inputs, so realizing them entirely within one LUT reduces the number of PIs to a sub-network. This is shown in figure 7(c). The depth of the circuit is now reduced to one, and the total LUT count is reduced to one, because both the sum and carry sub-networks are mapped onto a single 6-LUT with dual outputs. An \( n \times n \) multiplier implemented using the realization of figure 7(c) will therefore require \( n^2 \) LUTs and will have an overall depth of \( n \).
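The effect of covering with node duplication can be reproduced with a small greedy heuristic. The sketch below is our own illustration, not the procedure used in this paper and not a depth-optimal mapper such as FlowMap: starting from each output, it absorbs internal fan-in nodes into the cone as long as the cone has at most k leaves, duplicating shared nodes. The gate-level cell description is the same hypothetical one used for illustration earlier, and all names are assumptions.

```python
# Hypothetical gate-level CSA cell: node -> fan-in list (PIs have no entry).
CELL = {
    "p": ["a", "b"], "x1": ["p", "s"], "S": ["x1", "c"],
    "g1": ["p", "s"], "g2": ["p", "c"], "g3": ["s", "c"],
    "C": ["g1", "g2", "g3"],
}


def cover(fanins, outputs, k):
    """Greedy cone covering with node duplication (illustrative heuristic).

    Starting from each output, absorb internal fan-in nodes into the cone
    while the cone has at most k distinct leaves. A node absorbed by several
    cones is duplicated. Returns {cone_root: set_of_leaves}.
    """
    cones = {}
    pending = list(outputs)
    while pending:
        root = pending.pop()
        if root in cones:
            continue
        leaves = set(fanins[root])
        grown = True
        while grown:
            grown = False
            for leaf in sorted(leaves):
                if leaf in fanins:                      # internal, not a PI
                    candidate = (leaves - {leaf}) | set(fanins[leaf])
                    if len(candidate) <= k:
                        leaves, grown = candidate, True
                        break
        cones[root] = leaves
        # Internal leaves must be produced by cones (LUTs) of their own.
        pending.extend(n for n in leaves if n in fanins)
    return cones


def lut_depth(cones, root):
    """Number of LUT levels between the PIs and a cone root."""
    inner = [n for n in cones[root] if n in cones]
    return 1 + max((lut_depth(cones, n) for n in inner), default=0)


if __name__ == "__main__":
    for k in (3, 6):
        cones = cover(CELL, ["S", "C"], k)
        print(k, len(cones), max(lut_depth(cones, o) for o in ["S", "C"]))
    six = cover(CELL, ["S", "C"], 6)
    # Two cones can share one LUT6_2 when their leaves fit in five inputs.
    print(sorted(six["S"]), len(six["S"] | six["C"]) <= 5)
```

With k = 6, both the S and C cones absorb the duplicated AND node and reach the same four PIs (a, b, c, s), so they fit together in one dual-output 6-LUT at depth one, in the spirit of figure 7(c). With k = 3, the same heuristic needs six LUTs and two levels.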

It was mentioned in the introduction that the FPGA design cycle has strong CAD support, which handles the majority of the technology-dependent steps such as mapping and place and route (PAR). Technology-dependent optimizations mainly focus on improving the mapping of Boolean networks onto target LUTs. However, with modern CAD tools both technology mapping and PAR are automated, and the optimization process is not transparent to the user [14]. Thus, any optimization done prior to design entry may be overridden during the mapping and PAR phases. To counter this issue, we redefine the coding strategy at the design-entry phase. Instead of writing conventional behavioral code, which is inferential in nature, we adopt an instantiation-based coding strategy, wherein a target element is directly instantiated and the desired functionality is assigned to it. This ensures a controlled mapping. The following instantiation was used to map the circuit of figure 7(c).

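-- LUT6_2 in dual-output mode (O6 => sum, O5 => carry); VHDL is case-insensitive, so outputs S, C and inputs s, c need distinct names
LUT_1: LUT6_2 generic map (INIT => X"96660000E8880000") port map (O6 => S_out, O5 => C_out, I0 => c_in, I1 => s_in, I2 => b, I3 => a, I4 => '1', I5 => '1');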
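The INIT string of such an instantiation follows directly from the two cell functions. The Python sketch below builds it under Xilinx's documented LUT6_2 behavior (O5 is addressed by I4..I0 into INIT[31:0], O6 by I5..I0 into the full 64-bit INIT) and the input order of the instantiation above (I0 = c, I1 = s, I2 = b, I3 = a, with I4 and I5 tied to '1'). The function names are hypothetical.

```python
def cell_sum(c, s, b, a):
    """S = (a AND b) XOR s XOR c."""
    return (a & b) ^ s ^ c


def cell_carry(c, s, b, a):
    """C = majority(a AND b, s, c)."""
    p = a & b
    return (p & s) | (p & c) | (s & c)


def lut6_2_init(f_o6, f_o5):
    """Build a 64-bit LUT6_2 INIT for two functions of I0..I3.

    Assumes I4 and I5 are tied to '1'. O5 reads INIT[31:0] at address
    I4..I0, so it uses INIT[31:16]; O6 reads INIT[63:0] at address I5..I0,
    so it uses INIT[63:48]. The remaining, unreachable bits are left at 0.
    """
    init = 0
    for idx in range(16):
        i0, i1, i2, i3 = ((idx >> bit) & 1 for bit in range(4))
        init |= f_o5(i0, i1, i2, i3) << (16 + idx)
        init |= f_o6(i0, i1, i2, i3) << (48 + idx)
    return init


if __name__ == "__main__":
    # Port mapping from the instantiation: I0 = c, I1 = s, I2 = b, I3 = a.
    print(f'X"{lut6_2_init(cell_sum, cell_carry):016X}"')   # X"96660000E8880000"
```

The result reproduces the INIT value of the instantiation, with the sum function (0x9666) driving O6 and the carry function (0xE888) driving O5.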

B. Technology-Dependent Optimization of the 4:2 Compressor unit

A 4:2 compressor unit takes four inputs and produces two outputs, sum (S) and carry (C). A 4:2 compressor can be realized using a carry-save stage followed by a ripple-carry stage, as shown in figure 8. The carry-save stage can easily be pipelined by inserting registers between subsequent stages; the registers are shown as small dots in the schematic of figure 8. The critical path of such a realization consists of the delay of the final ripple-carry chain. An n-bit 4:2 compressor unit requires 2n full-adder units and has a critical path of n full adders. Each full adder requires two LUTs, one for the sum and one for the carry, so the total LUT cost of an n-bit 4:2 compressor is 4n. Assuming each LUT has unit delay, the critical path of such a realization is n LUT delays. Furthermore, n flip-flops are required to pipeline the structure effectively.

For technology-dependent optimization, we first consider the covering process at a higher level. The adder units are covered in an inclined (diagonal) fashion, as shown in figure 9. Each covered portion implements a two-stage ripple-carry chain, with the difference that the sum output is rippled while the carry output is retained. The critical path of such a realization therefore consists of the delay of a single covered portion. Pipelining the circuit requires placing registers at the output nodes of each shaded portion, shown as small dots in figure 9. The entire structure requires approximately n flip-flops for pipelining.

Let us now consider the covering process at a lower level. Figures 10(a) and 10(b) show the block diagram and the corresponding parent network for a single shaded portion. For technology-dependent optimization, we again restructure the parent network directly, this time by duplicating the logic at node Z. Node duplication enables the parent network to be partitioned into three separate networks corresponding to the outputs Out₁, Out₂ and Out₃. Each network is covered with a suitable cone, and the logic function implemented by each cone is mapped onto a separate LUT. The overall realization is shown in figure 11. LUT₁ realizes a single three-input function, while LUT₂ is used in dual-output mode to realize the two five-input functions corresponding to the outputs Out₁ and Out₃. The overall depth of the circuit is one. Since the critical path of the 4:2 compressor unit is limited by the depth of each shaded portion, the overall critical path is one LUT level. The LUT cost of each shaded portion is two, so the 4:2 compressor has the same hardware utilization as a binary adder.
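Since figures 10 and 11 are not reproduced in this text, the sketch below assumes that a shaded portion consists of two cascaded full adders: the first adds a, b and c (its sum is node Z and its carry is Out₂), and the second adds Z, d and e (giving Out₁ as sum and Out₃ as carry). Under that assumption, Out₂ depends on three inputs while Out₁ and Out₃ each depend on all five once Z is duplicated, and an exhaustive check confirms that no value is lost. The function name is hypothetical.

```python
from itertools import product


def shaded_portion(a, b, c, d, e):
    """Assumed logic of one shaded portion: two cascaded full adders.

    The first adds a, b, c; its sum is the internal node Z and its carry is
    Out2. The second adds Z, d, e, giving Out1 (sum) and Out3 (carry).
    Duplicating Z lets Out1 and Out3 each be written directly in terms of
    the five inputs.
    """
    z = a ^ b ^ c                              # node Z (duplicated)
    out1 = z ^ d ^ e                           # five-input function
    out2 = (a & b) | (a & c) | (b & c)         # three-input function (LUT1)
    out3 = (z & d) | (z & e) | (d & e)         # five-input function
    return out1, out2, out3


if __name__ == "__main__":
    # Arithmetic check over all 32 input combinations:
    # a + b + c + d + e == Out1 + 2 * (Out2 + Out3)
    ok = all(sum(v) == o1 + 2 * (o2 + o3)
             for v in product((0, 1), repeat=5)
             for o1, o2, o3 in [shaded_portion(*v)])
    print(ok)   # True
```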

The following instantiations were used to map the circuit of figure 11.

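LUT1: LUT3_L generic map (INIT => X"E8") port map (LO => Out2, I0 => c, I1 => b, I2 => a);
-- dual five-input mode: O5 => Out1 (INIT[31:0]), O6 => Out3 (INIT[63:32], selected by I5 tied to '1')
LUT2: LUT6_2 generic map (INIT => X"E88E8EE896696996") port map (O6 => Out3, O5 => Out1, I0 => e, I1 => d, I2 => c, I3 => b, I4 => a, I5 => '1');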
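To show how such INIT strings follow from the logic functions, the Python sketch below computes them for the input order used in these instantiations (LUT₁: I0 = c, I1 = b, I2 = a; LUT₂: I0 = e through I4 = a). The functions are assumptions: Out₂ = majority(a, b, c), Out₁ = a XOR b XOR c XOR d XOR e, and Out₃ = majority(a XOR b XOR c, d, e), i.e. the carry and sum outputs of two cascaded full adders sharing the duplicated node Z. Under these assumptions the sketch yields X"E8" for LUT₁ and X"E88E8EE896696996" for LUT₂; the function names are hypothetical.

```python
def maj(x, y, z):
    """Majority function: the carry of a full adder."""
    return (x & y) | (x & z) | (y & z)


def lut_init(func, n_inputs):
    """INIT value of a single-output LUT: bit k holds func(I0, ..., In-1)
    evaluated at the input combination whose binary address is k."""
    init = 0
    for idx in range(1 << n_inputs):
        bits = [(idx >> i) & 1 for i in range(n_inputs)]   # bits[0] is I0
        init |= func(*bits) << idx
    return init


def out2(c, b, a):                 # LUT1 inputs: I0 = c, I1 = b, I2 = a
    return maj(a, b, c)


def out1(e, d, c, b, a):           # LUT2 inputs: I0 = e, ..., I4 = a
    return a ^ b ^ c ^ d ^ e       # assumed: sum of the second full adder


def out3(e, d, c, b, a):
    return maj(a ^ b ^ c, d, e)    # assumed: carry of the second full adder


if __name__ == "__main__":
    init_lut1 = lut_init(out2, 3)
    # Dual five-input mode: O5 = INIT[31:0], O6 = INIT[63:32] when I5 = '1'.
    init_lut2 = (lut_init(out3, 5) << 32) | lut_init(out1, 5)
    print(f'X"{init_lut1:02X}"  X"{init_lut2:016X}"')
```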
Fig. 8. Top-level schematic for an 8-bit 4:2 compressor unit based on CSA logic

Fig. 9. Covering of individual adder units. Each shaded portion represents a two-stage RCA chain

Fig. 10. Block diagram and Boolean network corresponding to a single shaded portion.

Fig. 11. Technology optimized realization for each shaded portion
V. SYNTHESIS, IMPLEMENTATION AND RESULTS

The implementation in this work targets 6-LUT FPGAs; in particular, we consider devices from the Virtex-5 and Virtex-6 FPGA families from Xilinx. The implementation is carried out for different filter orders with an operand word-length of 16 bits. The parameters considered are area, timing and power dissipation. Area is measured in terms of the different FPGA resources utilized. Timing analysis may be static or dynamic. Static timing analysis gives information about the critical path delay and operating frequency of the design, and can be performed after synthesis and after PAR. However, post-synthesis metrics are often not accurate enough, because the programmability of the FPGA allows interconnect delays to change significantly between iterations. Therefore, the metrics presented in this paper are post-PAR and have been recorded at a high optimization level, with area and speed as the optimization goals. Dynamic timing analysis verifies the functionality of the design by applying test vectors and checking for correct output vectors. It also gives information about the switching activity of the design, which is captured in the value change dump (VCD) file. Power dissipation is the sum of static power dissipation and dynamic power dissipation. Static power dissipation is device specific and is mainly determined by the FPGA family. Dynamic power dissipation is related to the charging and discharging of capacitances at the different logic nodes and interconnects, and mainly consists of logic power, clock power and signal power [45]. Logic power depends on the amount of on-chip resources utilized by the design, clock power is proportional to the operating frequency, and signal power depends on the switching activity and the density of the interconnects.
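The power components listed above can be summarized with the standard first-order CMOS model. This is a textbook approximation, not the model used internally by the XPower Analyzer tool; here $\alpha_i$ is the activity factor of node $i$ (the fraction of clock cycles in which it makes a 0→1 transition), $C_i$ its switched capacitance, $V_{DD}$ the supply voltage and $f_{\text{clk}}$ the clock frequency:

$$P_{\text{total}} = P_{\text{static}} + P_{\text{dynamic}}, \qquad P_{\text{dynamic}} = P_{\text{logic}} + P_{\text{clock}} + P_{\text{signal}} \approx \sum_{i} \alpha_i \, C_i \, V_{DD}^{2} \, f_{\text{clk}}$$

In this form, fewer utilized LUTs reduce the logic term, shorter routing reduces $C_i$ in the signal term, and the clock term scales with $f_{\text{clk}}$.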
Similar test benches have been used for simulation and metrics generation in every case; they are designed to represent the worst-case scenario (in terms of switching activity) for data entering the filter. Design entry is done in VHDL using the instantiation-based coding strategy described earlier. The constraints relating to synthesis and implementation are duly provided, and complete timing closure is ensured. Synthesis and implementation are carried out in Xilinx ISE 12.1 [46], and power analysis is done using the XPower Analyzer tool.
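The exact stimulus of the test benches is not listed here. As a hypothetical example of a worst-case input pattern for the 16-bit operands, alternating 0x5555 and 0xAAAA flips every input bit on every sample; the sketch below generates such a sequence and measures its input toggle rate. The function names are assumptions, and the internal switching of the filter still depends on its coefficients.

```python
WORD = 16  # operand word-length used in this paper


def alternating_stimulus(length, width=WORD):
    """Alternate 0101...01 and 1010...10 so every bit toggles each sample.

    Assumes an even width; for odd widths the top bit would not toggle.
    """
    a = int("01" * (width // 2), 2)
    b = a ^ ((1 << width) - 1)
    return [a if n % 2 == 0 else b for n in range(length)]


def toggle_rate(samples, width=WORD):
    """Average fraction of the `width` bits that toggle between consecutive
    samples (needs at least two samples)."""
    pairs = list(zip(samples, samples[1:]))
    toggles = sum(bin(x ^ y).count("1") for x, y in pairs)
    return toggles / (width * len(pairs))


if __name__ == "__main__":
    stim = alternating_stimulus(1000)
    print([f"{v:04X}" for v in stim[:4]])   # ['5555', 'AAAA', '5555', 'AAAA']
    print(toggle_rate(stim))                # 1.0
    print(toggle_rate(list(range(1000))))   # much lower for a simple ramp
```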

A. Area Analysis

Table 1 compares the FPGA resources utilized by the technology-optimized FIR filter with those of the traditional implementation and of the implementation based on the Xilinx multiply-adder IP v 2.0. The operand length in each case is 16 bits and the filter order is 16. The target device is the xc5vlx50-2ff324 from Virtex-5. Both the technology-optimized and the traditional implementations rely on the optimization strategies of the Xilinx CAD tool; however, the initial restructuring ensures that the end performance is better for the technology-dependent optimizations. Note that the multiply-adder IP v 2.0 is used with its optimum latency value (1).

### TABLE 1
RESOURCE UTILIZATION FOR TRANSPOSED FIR FILTERS (16-BIT OPERANDS, FILTER ORDER 16, XC5VLX50-2FF324).

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>LUTs</th>
<th>Flip-flops</th>
<th>Slices</th>
<th>DSPs</th>
</tr>
</thead>
<tbody>
<tr>
<td>Transposed [this work]</td>
<td>254 (223*)</td>
<td>185 (185)</td>
<td>143 (93)</td>
<td>0</td>
</tr>
<tr>
<td>Transposed [traditional]</td>
<td>323 (307)</td>
<td>271 (253)</td>
<td>311 (289)</td>
<td>0</td>
</tr>
<tr>
<td>Transposed [IP v 2.0]</td>
<td>273 (267)</td>
<td>303 (303)</td>
<td>243 (243)</td>
<td>30</td>
</tr>
</tbody>
</table>

* When area optimized
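The relative savings implied by Table 1 can be computed directly from its entries. The sketch below uses the values outside the parentheses; the dictionary layout and function name are our own.

```python
# Resource counts from Table 1 (values outside the parentheses).
TABLE_1 = {
    "LUTs":       {"this work": 254, "traditional": 323, "IP v 2.0": 273},
    "Flip-flops": {"this work": 185, "traditional": 271, "IP v 2.0": 303},
    "Slices":     {"this work": 143, "traditional": 311, "IP v 2.0": 243},
}


def reduction(ours, other):
    """Percentage of `other`'s resources saved by `ours`."""
    return 100.0 * (other - ours) / other


if __name__ == "__main__":
    for resource, row in TABLE_1.items():
        for ref in ("traditional", "IP v 2.0"):
            pct = reduction(row["this work"], row[ref])
            print(f"{resource:<10} vs {ref:<11} {pct:5.1f}% fewer")
```

For example, the slice count is 54.0% lower than that of the traditional design. Note that the IP-based design additionally uses 30 DSP blocks, which these percentages do not capture.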

Next, we compare our implementation against the realizations reported in [47] and [10]. In [47], the authors consider the Direct, Transposed and Hybrid realizations of the FIR filter and report two sets of results, one for symmetric coefficients and the other for asymmetric coefficients. For symmetric coefficients, Direct and Transposed forms are considered for a filter order of 120, and each form is implemented using architectures based on a generic multiplier (GM) and on shift-add (SA) operations. For asymmetric coefficients, Direct, Transposed and three different Hybrid forms (Hybrid-I-3, Hybrid-II-15 and Hybrid-III-15) are considered for a filter order of 75. Table 2 provides the comparison for symmetric coefficients and Table 3 the comparison for asymmetric coefficients. The devices considered are the xc5vlx50-2ff324 and the xc6vlx75t-2ff484 from Virtex-5 and Virtex-6, respectively. It is observed that the structures based on the technology-optimized multiplier and 4:2 compressor units use the underlying fabric efficiently; in particular, the area-optimized variants use considerably fewer LUTs than all the realizations of [47]. It should be noted that the authors in [47] used the Xilinx ISE 13.1 design suite for synthesis, while our implementation is carried out in Xilinx ISE 12.1; a newer tool version would, if anything, be expected to improve our results further.

In [10], the authors present an algorithm that speeds up filters by enabling efficient use of the embedded arithmetic blocks and custom compression trees. The filter realizations considered there include filters based on canonical signed digit (CSD) arithmetic with 6:3 compression trees, systolic architectures for transposed realizations, filters designed using IP multiply-accumulate (MAC) units, an unfolded MAC architecture, and a filter generated with the MATLAB Filter Design and Analysis (FDA) tool. Table 4 provides the area comparison in terms of the number of slices utilized; the target device is the xc5vlx50-2ff324 from Virtex-5. Again, the technology-optimized structures show efficient utilization of the underlying fabric. Moreover, they use only general-purpose logic elements and consume no DSP blocks or other special primitives or macros.
### TABLE 2
RESOURCE UTILIZATION FOR DIFFERENT FILTER REALIZATIONS WITH SYMMETRIC COEFFICIENTS.

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>Architecture</th>
<th>XC5VLX50-2FF324 (Virtex-5)</th>
<th>XC6VLX75T-2FF484 (Virtex-6)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>LUTs</td>
<td>FFs</td>
<td>Slices</td>
</tr>
<tr>
<td>Direct [47]</td>
<td>GM</td>
<td>4357</td>
<td>1904</td>
</tr>
<tr>
<td></td>
<td>SA^A</td>
<td>3838</td>
<td>1904</td>
</tr>
<tr>
<td></td>
<td>SA^D</td>
<td>4013</td>
<td>1904</td>
</tr>
<tr>
<td>Transposed [47]</td>
<td>GM</td>
<td>4236</td>
<td>3886</td>
</tr>
<tr>
<td></td>
<td>SA^A</td>
<td>5094</td>
<td>3886</td>
</tr>
<tr>
<td></td>
<td>SA^D</td>
<td>5070</td>
<td>3886</td>
</tr>
<tr>
<td>Direct [this work]</td>
<td>Tech. optimized^A</td>
<td>2508</td>
<td>1904</td>
</tr>
<tr>
<td></td>
<td>Tech. optimized^D</td>
<td>4975</td>
<td>1904</td>
</tr>
<tr>
<td>Transposed [this work]</td>
<td>Tech. optimized^A</td>
<td>1802</td>
<td>1888</td>
</tr>
<tr>
<td></td>
<td>Tech. optimized^D</td>
<td>3952</td>
<td>1888</td>
</tr>
</tbody>
</table>

^A When area optimized  
^D When delay optimized

### TABLE 3
RESOURCE UTILIZATION FOR DIFFERENT FILTER REALIZATIONS WITH ASYMMETRIC COEFFICIENTS.

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>Architecture</th>
<th>XC5VLX50-2FF324 (Virtex-5)</th>
<th>XC6VLX75T-2FF484 (Virtex-6)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>LUTs</td>
<td>FFs</td>
<td>Slices</td>
</tr>
<tr>
<td>Direct [47]</td>
<td>3398</td>
<td>1184</td>
<td>1148</td>
</tr>
<tr>
<td>Hybrid-I-3 [47]</td>
<td>3821</td>
<td>1212</td>
<td>1179</td>
</tr>
<tr>
<td>Hybrid-II-15 [47]</td>
<td>3941</td>
<td>546</td>
<td>1105</td>
</tr>
<tr>
<td>Hybrid-III-15 [47]</td>
<td>4121</td>
<td>598</td>
<td>1120</td>
</tr>
<tr>
<td>Transposed[47]</td>
<td>3617</td>
<td>2204</td>
<td>980</td>
</tr>
<tr>
<td>Direct [this work]^A</td>
<td>1538</td>
<td>1184</td>
<td>1137</td>
</tr>
<tr>
<td>Direct [this work]^D</td>
<td>2980</td>
<td>1192</td>
<td>1420</td>
</tr>
<tr>
<td>Transposed [this work]^A</td>
<td>1362</td>
<td>1222</td>
<td>960</td>
</tr>
<tr>
<td>Transposed [this work]^D</td>
<td>2479</td>
<td>1222</td>
<td>1084</td>
</tr>
</tbody>
</table>

^A When area optimized  
^D When delay optimized

### TABLE 4
RESOURCE UTILIZATION FOR DIFFERENT FILTER REALIZATIONS.

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>Filter Order</th>
<th>Bit-Slices</th>
<th>DSP Blocks</th>
</tr>
</thead>
<tbody>
<tr>
<td>CSD 6:3 compressor [10]</td>
<td>7</td>
<td>444</td>
<td>--</td>
</tr>
<tr>
<td>Transposed Systolic [10]</td>
<td>7</td>
<td>114</td>
<td>7</td>
</tr>
<tr>
<td>2-Unfolded MAC [10]</td>
<td>7</td>
<td>14</td>
<td>14</td>
</tr>
<tr>
<td>FDA HDL pipelined [10]</td>
<td>18</td>
<td>653</td>
<td>18</td>
</tr>
<tr>
<td>IP Systolic MAC [10]</td>
<td>18</td>
<td>548</td>
<td>9</td>
</tr>
<tr>
<td>Transposed CSD pipelined [10]</td>
<td>18</td>
<td>1071</td>
<td>--</td>
</tr>
<tr>
<td>Pipelined DSP48 MAC [10]</td>
<td>18</td>
<td>14</td>
<td>18</td>
</tr>
<tr>
<td>Direct form [This work]</td>
<td>7</td>
<td>162</td>
<td>--</td>
</tr>
<tr>
<td>Transposed form [This work]</td>
<td>7</td>
<td>132</td>
<td>--</td>
</tr>
<tr>
<td>Direct form [This work]</td>
<td>18</td>
<td>320</td>
<td>--</td>
</tr>
<tr>
<td>Transposed form [This work]</td>
<td>18</td>
<td>240</td>
<td>--</td>
</tr>
</tbody>
</table>
B. Timing Analysis

Table 5 provides a comparison of the critical path delay and maximum clock frequency for the technology optimized FIR filter against the traditional implementation and the one based on the Xilinx multiply-adder IP v 2.0. The realization considered is transposed. The operand length is 16 bits and the filter order is 16. Target device is xc5vlx50-2ff324 from Virtex-5.

Tables 6 and 7 provide a comparison of the critical path delay for the technology optimized realization and those reported in [47]. Table 6 is for symmetric coefficients with a filter order of 120 and table 7 is for asymmetric coefficients with a filter order of 75. The devices considered are xc5vlx50-2ff324 and xc6vlx75t-2ff484 from Virtex-5 and Virtex-6 respectively. The input operand length in each case is 16 bits.

Technology-optimized structures are implemented with the minimum possible depth; therefore, their critical path delays are low. Since the clock frequency is also a strong function of the propagation and routing delays along the critical path, a minimum-depth circuit also tends to yield high operating frequencies. This is indicated in Table 8, where the maximum clock frequency is compared against the various designs implemented in [10].

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>Critical path delay (ns)</th>
<th>Max. clock frequency (MHz)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Transposed [This work]</td>
<td>6.632</td>
<td>581.654</td>
</tr>
<tr>
<td>Transposed [Traditional]</td>
<td>10.541</td>
<td>375.38</td>
</tr>
<tr>
<td>Transposed [IP v 2.0]</td>
<td>8.751</td>
<td>480.517</td>
</tr>
</tbody>
</table>

TABLE 5
TIMING ANALYSES FOR TECHNOLOGY OPTIMIZED AND IP BASED TRANSPOSED FIR FILTER

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>Architecture</th>
<th>Critical path delay (ns)</th>
</tr>
</thead>
<tbody>
<tr>
<td>Direct [47]</td>
<td>GM</td>
<td>29.7</td>
</tr>
<tr>
<td></td>
<td>SA A</td>
<td>37.6</td>
</tr>
<tr>
<td></td>
<td>SA D</td>
<td>27.7</td>
</tr>
<tr>
<td>Transposed [47]</td>
<td>GM</td>
<td>11.5</td>
</tr>
<tr>
<td></td>
<td>SA A</td>
<td>15.8</td>
</tr>
<tr>
<td></td>
<td>SA D</td>
<td>10.9</td>
</tr>
<tr>
<td>Direct [This work]</td>
<td>Tech. optimized A</td>
<td>30.873</td>
</tr>
<tr>
<td></td>
<td>Tech. optimized D</td>
<td>21.64</td>
</tr>
<tr>
<td>Transposed [This work]</td>
<td>Tech. optimized A</td>
<td>11.741</td>
</tr>
<tr>
<td></td>
<td>Tech. optimized D</td>
<td>8.088</td>
</tr>
</tbody>
</table>

TABLE 6
CRITICAL PATH DELAY FOR DIFFERENT FILTER REALIZATIONS WITH SYMMETRIC COEFFICIENTS

C. Power Analysis

Technology-dependent optimization reduces power dissipation in two ways. First, high-activity switching nodes within a network are hidden inside the LUTs of the final circuit netlist, which reduces the overall switching activity associated with the logic nodes [48]. Second, technology-dependent optimization yields a minimum-depth circuit with high logic density, which shortens the interconnects. Since FPGA interconnects are built from programmable switches, shorter interconnects further reduce the switching activity and hence the dissipated power. The analysis is done for a constant supply voltage and at the maximum operating frequency in each case. Test benches were designed for worst-case switching activity, and the filter functionality was verified for more than 1000 input signals. The design node activity from the simulator database, together with the physical constraints file (PCF), was used for power analysis in the XPower Analyzer tool. Table 9 gives the detailed power dissipation of the technology optimized FIR filter compared with the traditional implementation and the one based on the Xilinx multiply-adder IP v 2.0. The target device is Virtex-5, and the filter order and input bit-width are both 16.

The power dissipated in clocking resources varies with the clock frequency. Since the technology optimized design operates at a higher frequency, its clocking resources dissipate more power. The power dissipated by on-chip resources (logic + DSP) is lower for the technology optimized design because of the more efficient utilization of the underlying resources. The reduced switching activity, due to node hiding and shorter interconnects, results in lower power dissipation in the signals.
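Table 9 also shows how the total splits between quiescent and activity-dependent power. The sketch below re-adds the Table 9 entries, treating the "--" DSP entry as 0 mW (an assumption), checks the sum against the reported total, and separates out the non-quiescent part.

```python
# Table 9 power breakdown in mW (technology optimized transposed filter, Virtex-5).
# The DSP entry is "--" in the table; it is treated as 0 mW here (assumption).
TABLE9 = {
    "Clock": 27.45,
    "Logic": 3.41,
    "DSP": 0.0,
    "Signals": 6.23,
    "I/Os": 9.12,
    "Quiescent": 529.91,
}
REPORTED_TOTAL_MW = 576.12

total = sum(TABLE9.values())
dynamic = total - TABLE9["Quiescent"]  # non-quiescent part of the budget
print(f"sum of entries: {total:.2f} mW (reported: {REPORTED_TOTAL_MW} mW)")
print(f"non-quiescent:  {dynamic:.2f} mW ({100 * dynamic / total:.1f} % of total)")
```

About 46 mW of the 576 mW total is non-quiescent, so quiescent power accounts for roughly 92 % of the total; the savings from node hiding and shorter interconnects act on that smaller non-quiescent share.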

Tables 10 and 11 compare the power dissipation of the technology optimized realizations with that of the designs reported in [47]. Table 10 is for symmetric coefficients with a filter order of 120, and Table 11 is for asymmetric coefficients with a filter order of 75. The devices considered are the Virtex-5 xc5vlx50-2ff324 and the Virtex-6 xc6vlx75t-2ff484. The input operand length in each case is 16 bits.

For high-throughput DSP systems, it is more appropriate to quantify power efficiency through an energy analysis. In [49], the authors define three energy-related parameters for FIR filters: Energy per Operation (EOP), in nJ; Energy Throughput (ET), in nJ per bit; and Energy Dissipation (ED), the energy dissipated per FPGA slice, in nJ per slice. Table 12 lists these parameters for the designs of [47] and for the technology optimized realizations.
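Table 12 does not state how EOP is computed. A consistency check suggests that EOP equals the power in Table 10 multiplied by the critical path delay in Table 6 (mW × ns = pJ), pairing rows in order, i.e. the energy spent per clock period at the maximum rate. The sketch below reproduces those entries under that assumption; the function name is hypothetical.

```python
def energy_per_operation_nj(power_mw, delay_ns):
    """Assumed definition: EOP = P * T_cp (mW * ns = pJ), returned in nJ."""
    return power_mw * delay_ns / 1000.0


# name: (power in mW from Table 10, critical path delay in ns from Table 6,
#        EOP in nJ as listed in Table 12)
ROWS = {
    "Direct GM [47]": (797, 29.7, 23.671),
    "Direct SA A [47]": (870, 37.6, 32.712),
    "Direct SA D [47]": (812, 27.7, 22.5),
    "Transposed GM [47]": (785, 11.5, 9.027),
    "Transposed SA A [47]": (848, 15.8, 13.4),
    "Transposed SA D [47]": (804, 10.9, 8.763),
    "Direct [this work]": (749.13, 30.873, 23.127),
    "Transposed [this work]": (733.61, 11.741, 8.613),
}

for name, (p_mw, t_ns, eop_listed) in ROWS.items():
    eop = energy_per_operation_nj(p_mw, t_ns)
    print(f"{name:24s} computed {eop:7.3f} nJ   Table 12: {eop_listed} nJ")
```

All eight pairs agree with Table 12 to within rounding, which supports reading EOP as power times critical path delay for these entries.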

### Table 9

<table>
<thead>
<tr>
<th>FPGA Resource</th>
<th>Power Dissipation (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>T’nosed [This work]</td>
</tr>
<tr>
<td>Clock</td>
<td>27.45</td>
</tr>
<tr>
<td>Logic</td>
<td>3.41</td>
</tr>
<tr>
<td>DSP</td>
<td>--</td>
</tr>
<tr>
<td>Signals</td>
<td>6.23</td>
</tr>
<tr>
<td>I/Os</td>
<td>9.12</td>
</tr>
<tr>
<td>Quiescent</td>
<td>529.91</td>
</tr>
<tr>
<td>Total</td>
<td>576.12</td>
</tr>
</tbody>
</table>

### Table 10

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>Architecture</th>
<th>Power Dissipation (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>XC5VLX50-2FF324</td>
<td>XC6VLX75T-2FF484</td>
</tr>
<tr>
<td>Direct [47]</td>
<td>GM</td>
<td>797</td>
</tr>
<tr>
<td></td>
<td>SA A</td>
<td>870</td>
</tr>
<tr>
<td></td>
<td>SA D</td>
<td>812</td>
</tr>
<tr>
<td>Transposed [47]</td>
<td>GM</td>
<td>785</td>
</tr>
<tr>
<td></td>
<td>SA A</td>
<td>848</td>
</tr>
<tr>
<td></td>
<td>SA D</td>
<td>804</td>
</tr>
<tr>
<td>Direct [this work]</td>
<td>Tech. optimized A</td>
<td>749.13</td>
</tr>
<tr>
<td>Transposed [this work]</td>
<td>Tech. optimized D</td>
<td>733.61</td>
</tr>
</tbody>
</table>

### Table 11

<table>
<thead>
<tr>
<th>Filter Design</th>
<th colspan="2">Power Dissipation (mW)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>XC5VLX50-2FF324</td>
<td>XC6VLX75T-2FF484</td>
</tr>
<tr>
<td>Direct [47]</td>
<td>820</td>
<td>1523</td>
</tr>
<tr>
<td>Hybrid-I-3 [47]</td>
<td>762</td>
<td>1493</td>
</tr>
<tr>
<td>Hybrid-I-15 [47]</td>
<td>834</td>
<td>1551</td>
</tr>
<tr>
<td>Hybrid-I-15 [47]</td>
<td>787</td>
<td>1507</td>
</tr>
<tr>
<td>Transposed [47]</td>
<td>760</td>
<td>1487</td>
</tr>
<tr>
<td>Direct [this work] A</td>
<td>745.78</td>
<td>1059.76</td>
</tr>
<tr>
<td>Direct [this work] D</td>
<td>658.54</td>
<td>851.26</td>
</tr>
<tr>
<td>Transposed [this work] A</td>
<td>755.65</td>
<td>1062.76</td>
</tr>
<tr>
<td>Transposed [this work] D</td>
<td>740.25</td>
<td>962.08</td>
</tr>
</tbody>
</table>

### Table 12

<table>
<thead>
<tr>
<th>Filter Design</th>
<th>Architecture</th>
<th>EOP (nJ)</th>
<th>ET (nJ/bit)</th>
<th>ED (nJ/Slice)</th>
</tr>
</thead>
<tbody>
<tr>
<td></td>
<td>XC5VLX50-2FF324</td>
<td>XC6VLX75T-2FF484</td>
<td></td>
<td></td>
</tr>
<tr>
<td>Direct [47]</td>
<td>GM</td>
<td>23.671</td>
<td>44.16</td>
<td>0.012329</td>
</tr>
<tr>
<td></td>
<td>SA A</td>
<td>32.712</td>
<td>48.64</td>
<td>0.017038</td>
</tr>
<tr>
<td></td>
<td>SA D</td>
<td>22.5</td>
<td>35.5</td>
<td>0.011719</td>
</tr>
<tr>
<td>Transposed [47]</td>
<td>GM</td>
<td>9.027</td>
<td>13.34</td>
<td>0.004702</td>
</tr>
<tr>
<td></td>
<td>SA A</td>
<td>13.4</td>
<td>17.98</td>
<td>0.006979</td>
</tr>
<tr>
<td></td>
<td>SA D</td>
<td>8.763</td>
<td>12.625</td>
<td>0.004564</td>
</tr>
<tr>
<td>Direct [this work]</td>
<td>Tech. optimized A</td>
<td>23.127</td>
<td>34.24</td>
<td>0.012045</td>
</tr>
<tr>
<td>Transposed [this work]</td>
<td>Tech. optimized D</td>
<td>15.964</td>
<td>17.975</td>
<td>0.008315</td>
</tr>
<tr>
<td>Transposed [this work]</td>
<td>Tech. optimized A</td>
<td>8.613</td>
<td>8.136</td>
<td>0.004486</td>
</tr>
<tr>
<td></td>
<td>5.84</td>
<td>4.62</td>
<td>0.003042</td>
<td>0.002406</td>
</tr>
</tbody>
</table>
VI. CONCLUSIONS

This paper focused on the realization of FIR filters using a technology optimized multiplier and 4:2 compressor unit. The results presented in this paper showed that technology-dependent optimizations have a direct impact on the area, delay and power dissipation of the design. Different filter forms (Direct, Transposed and Hybrid) were implemented, and it was shown that, for a given form, the technology optimized realizations always perform better. Another key feature of technology-dependent optimization is that the same optimization improves all the performance parameters (area, speed and power) simultaneously. This is generally not the case with technology-independent optimization, where an application-driven trade-off always guides the design process. However, the performance gain from technology-dependent optimization relies strongly on how much control the designer has over the mapping process. In this paper, we tackled this issue by modifying the coding strategy and writing instantiation-based code that maps the behavior of the optimized Boolean networks. This complicates design entry, and although an efficient mapping is achieved, complete control over the mapping process remains a bottleneck in technology-dependent optimizations.

Conflict of Interests

The authors declare that there is no conflict of interests regarding the publication of this paper.

REFERENCES

[22] Y. Zhou and P. Shi, “Distributed Arithmetic for FIR Implementation on


Telescopic Op-Amp Optimization for MDAC Circuit Design

Abdelghani Dendouga and Slimane Oussalah

Abstract—An 8-bit 40-MS/s low-power Multiplying Digital-to-Analog Converter (MDAC) for a pipelined Analog-to-Digital Converter (ADC) is presented. The dedicated operational amplifier (Op-Amp) uses a telescopic architecture, which features low power and small area. Further reductions in power and area are achieved by using a multifunction 1.5-bit/stage MDAC architecture. The Op-Amp is designed with a program, based on multi-objective genetic algorithms, that automates the optimization. The proposed program finds the optimal transistor sizes (length and width) that give the best Op-Amp performance for the MDAC. In this study, the performances considered are DC gain, unity-gain bandwidth, phase margin, power consumption, area, slew rate, thermal noise, and signal-to-noise ratio. The program is implemented with the MATLAB optimization toolbox. Simulations were performed with the Cadence Virtuoso Spectre circuit simulator in the standard AMS 0.18μm CMOS technology. Good agreement is observed between the optimization results and the simulation results; the Op-Amp is then inserted into the MDAC circuit to extract its performance.

Index Terms—Analog circuit design; MDAC; MOGAs; operational amplifier; pipelined ADC.

Original Research Paper
DOI: 10.7251/ELS1620055D

I. INTRODUCTION

HIGH-speed, high-resolution and low-power design has become the chief research domain for data converters. Analog-to-Digital Converters (ADCs) and Digital-to-Analog Converters (DACs) play a fundamental role in interfacing the digital processing core with the real analog world. ADCs are found in a wide range of applications, from imaging to ultrasound and communication systems. In particular, the pipelined ADC architecture [1-3] offers a good trade-off between conversion rate, resolution and power consumption.

Operational amplifiers (Op-Amps) are among the most widely used building blocks in ADCs. To fulfill the given requirements, the designer must choose a suitable circuit architecture, although various tools that partially automate topology synthesis have appeared in the past [4-5]. Therefore, multi-objective genetic algorithms (MOGAs) are of great importance for the automatic design of Op-Amps. Accuracy, ease of use, generality, robustness, and reasonable run-time are all necessary for an optimization-based circuit synthesis solution to gain acceptance [6-8]. The proposed method uses a program based on multi-objective optimization with a genetic algorithm (GA) to calculate the optimal dimensions (length and width) of the transistors of an Op-Amp used in the Multiplying Digital-to-Analog Converter (MDAC). The method handles a wide variety of specifications and constraints, is extremely fast, and searches for globally optimal designs.

The goal of this work is to design and optimize an MDAC circuit for a pipelined ADC intended for the front-end electronics of the semiconductor tracker (SCT) detector in the ATLAS (A Toroidal LHC Apparatus) experiment [9]. This experiment studies the basic forces that have shaped the universe since the beginning of time.

This paper is organized as follows. The current section introduces the context of this work, the challenges for its completion and the structure of the paper. In Section 2, the pipelined ADC structure is analyzed. Section 3 describes the MDAC circuit to be designed. The requirements and the optimization of the telescopic Op-Amp for the MDAC are described in Section 4. Section 5 presents the MOGA optimization methodology. Simulation results are presented in Section 6. Finally, concluding remarks are given in the last section.

II. THE PIPELINED ADC

The pipelined ADC is composed of a non-overlapping clock generator, a sample-and-hold amplifier, N pipelined 1.5-bit stages whose sub-ADCs and sub-DACs include a 0.5-bit over-range [10], a latch array, and a digital error correction circuit (Fig. 1). The non-overlapping clock generator provides the non-overlapping clock phases for the whole ADC. First, the sample-and-hold amplifier samples the analog input signal in sample mode and then holds this value at its output in hold mode. The held value is converted to digital by the N-stage pipelined 1.5-bit/stage ADC (Fig. 2), and the resulting codes are stored in the latch array. Finally, the digital error correction circuit corrects the errors produced by the offsets of each ADC stage, so that the correct digital code is obtained at the output of the overall ADC.
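The role of the 0.5-bit over-range and of the digital error correction can be illustrated with a small behavioral model. The Python sketch below assumes an ideal stage gain of 2 (\(C_s = C_f\)), sub-ADC thresholds at \(\pm V_{ref}/4\), a differential input range of \([-V_{ref}, V_{ref}]\), and no separate back-end flash (the last residue is simply dropped); all function names are hypothetical. It shows that comparator offsets smaller than \(V_{ref}/4\) leave the reconstruction error within \(V_{ref}/2^K\) for \(K\) stages, which is the redundancy that the correction circuit exploits.

```python
import random


def stage_1p5(vin, vref, offset=0.0):
    """Ideal 1.5-bit MDAC stage with gain 2 (assumes Cs = Cf).

    The sub-ADC compares vin with -vref/4 and +vref/4 (both shifted by a
    common comparator offset) and returns the decision d in {-1, 0, +1}
    together with the residue 2*vin - d*vref passed to the next stage.
    """
    if vin > vref / 4 + offset:
        d = 1
    elif vin < -vref / 4 + offset:
        d = -1
    else:
        d = 0
    return d, 2 * vin - d * vref


def pipeline_code(vin, offsets, vref=1.0):
    """Run vin through len(offsets) stages and combine the decisions.

    Digital error correction: the overlapping decisions are added with
    binary weights, code = sum(d_i * 2**(K - i)), computed Horner-style.
    """
    code, v = 0, vin
    for off in offsets:
        d, v = stage_1p5(v, vref, off)
        code = 2 * code + d
    return code


def reconstruct(code, n_stages, vref=1.0):
    """Map the digital code back to a voltage; error <= vref / 2**n_stages."""
    return code * vref / 2 ** n_stages


rng = random.Random(1)
n = 7
ideal = [0.0] * n
skewed = [rng.uniform(-0.2, 0.2) for _ in range(n)]  # |offset| < vref/4
for vin in (-0.93, -0.25, 0.1, 0.5, 0.99):
    e_ideal = abs(vin - reconstruct(pipeline_code(vin, ideal), n))
    e_skew = abs(vin - reconstruct(pipeline_code(vin, skewed), n))
    print(f"vin={vin:+.2f}  |error| ideal={e_ideal:.5f}  with offsets={e_skew:.5f}")
```

Because every residue stays inside \([-V_{ref}, V_{ref}]\) as long as the offsets are below \(V_{ref}/4\), the weighted sum of overlapping decisions needs no extra correction logic in this idealized model; a real converter must also handle stage gain errors, which this sketch ignores.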

III. THE MDAC

The MDAC is a single switched-capacitor circuit that implements the S/H operation, D/A conversion, subtraction, and amplification of the residue. Using charge conservation, the output in the hold phase is given by [2]:

\[
V_{out} = \left( \frac{C_f + C_s}{C_f} \right) \cdot V_{in} - \left( \frac{C_s}{C_f} \right) \cdot V_{DAC}
\]

(1)

where \(C_s\) is the sampling capacitor, \(C_f\) is the feedback capacitor, and \(V_{DAC}\) is the output voltage of the DAC circuit in the MDAC circuit. The MDAC circuit in the 1.5-bit/stage architecture is very simple as shown in Fig. 3.

The implementation of the MDAC stage is shown in Fig. 3 [2]. During the first phase, the input signal is sampled onto capacitors \(C_s\) and \(C_f\). In the next phase, called the amplifying phase, capacitor \(C_f\) is switched into feedback around the amplifier and \(C_s\) is connected to the reference voltage level, causing a charge redistribution that amplifies the signal and subtracts the reference voltage. Table I summarizes the 1.5-bit stage configuration.
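To see how the capacitor ratio and a finite amplifier gain enter the stage output, the sketch below evaluates the textbook charge-conservation result for a flip-around stage like the one described above: \(V_{out} = (1 + C_s/C_f)V_{in} - (C_s/C_f)V_{DAC}\), divided by \(1 + 1/(A_0\beta)\) when the op-amp DC gain \(A_0\) is finite, with feedback factor \(\beta = C_f/(C_s + C_f + C_p)\). The parasitic \(C_p\) and the finite-gain factor are standard additions not given in this paper, and the function name is hypothetical.

```python
def mdac_hold_output(vin, d, vref, cs, cf, a0=float("inf"), cp=0.0):
    """Hold-phase output of a flip-around 1.5-bit MDAC stage.

    Ideal charge conservation: (1 + cs/cf) * vin - (cs/cf) * d * vref,
    with d in {-1, 0, +1} selecting V_DAC = d * vref.
    A finite op-amp DC gain a0 scales this by 1 / (1 + 1/(a0 * beta)),
    where beta = cf / (cs + cf + cp) is the feedback factor and cp is an
    optional parasitic capacitance at the op-amp input.
    """
    ideal = (1.0 + cs / cf) * vin - (cs / cf) * d * vref
    beta = cf / (cs + cf + cp)
    return ideal / (1.0 + 1.0 / (a0 * beta))


vin, vref = 0.3, 1.0
a0 = 10 ** (62.8 / 20)  # 62.8 dB expressed as V/V (about 1380)
ideal = mdac_hold_output(vin, 0, vref, 1.0, 1.0)
finite = mdac_hold_output(vin, 0, vref, 1.0, 1.0, a0=a0)
mismatch = mdac_hold_output(vin, 0, vref, 1.01, 1.0, a0=a0)  # Cs 1 % larger than Cf
print(ideal, finite, mismatch)
```

With \(C_s = C_f\) and \(A_0\) of about 62.8 dB (the simulated gain reported later in the paper), the finite gain alone scales the residue by about \(1/(1 + 2/1380)\), i.e. a gain error of roughly 0.14 %.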

IV. AMPLIFIERS

Amplifiers in pipeline ADCs have a direct and major role in the operation of the individual pipeline stages by performing active sampling and residue amplification [6]. Consequently, the amplifier limitations have a direct impact on the overall ADC performance, which for high speed and very high resolutions may require the ADC to be digitally calibrated.

A. Op-Amp Requirements

The bandwidth and gain characteristics are crucial in the design of data converters. The Op-Amp should preferably have a 90° phase margin over full load conditions and process variations, to avoid a second-order step response and its associated ringing. Decreasing the phase margin increases the ringing amplitude, which can increase the settling time. It can be shown that the DC open-loop gain of an Op-Amp used in an ADC must satisfy the condition [11]:

\[
A_0 \geq 2^{N+2}
\]

(2)

where \(N\) is the number of bits of conversion.

In this work, the 8-bit data converter needs a minimum DC open-loop gain of 54 dB. The speed of the converter is determined by the Op-Amp used. The minimum unity-gain frequency (\(f_u\)) for a given settling time \(t_s<1/f_{clk}\), required to settle the output to within ±½ LSB of its final value, can be evaluated as [11]:

\[
f_u \geq 0.22(N + 1)\cdot f_{clk}
\]

(3)
For example, an 8-bit ADC at a 40 MHz clock frequency needs an Op-Amp with a unity-gain frequency of around 160 MHz.
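Eq. (3) matches single-pole settling: requiring \(e^{-2\pi f t_s} \le 2^{-(N+1)}\) (½ LSB of a full-scale step) with \(t_s\) equal to half a clock period gives \(f \ge (\ln 2/\pi)(N+1) f_{clk} \approx 0.22(N+1) f_{clk}\). For \(N = 8\) and \(f_{clk} = 40\) MHz this is about 79 MHz. The figure of about 160 MHz quoted above is recovered if this value is read as the required closed-loop bandwidth and divided by the feedback factor \(\beta = C_f/(C_s + C_f) = 1/2\) of a 1.5-bit MDAC with \(C_s = C_f\); that reading is our assumption and is not stated in the source. The function names are hypothetical.

```python
import math


def closed_loop_bw_required(n_bits, f_clk, settle_fraction=0.5):
    """Single-pole bandwidth (Hz) needed to settle to within 1/2 LSB.

    Solves exp(-2*pi*f*t_s) <= 2**-(n_bits + 1) with
    t_s = settle_fraction / f_clk (half a clock period by default).
    """
    t_s = settle_fraction / f_clk
    return (n_bits + 1) * math.log(2) / (2 * math.pi * t_s)


def unity_gain_freq_required(n_bits, f_clk, beta=0.5):
    """Op-amp unity-gain frequency when the loop has feedback factor beta."""
    return closed_loop_bw_required(n_bits, f_clk) / beta


print(f"{closed_loop_bw_required(8, 40e6) / 1e6:.1f} MHz")   # Eq. (3) as written
print(f"{unity_gain_freq_required(8, 40e6) / 1e6:.1f} MHz")  # divided by beta = 1/2
```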

Op-Amps are the key elements in analog-to-digital converter systems. For this reason, the circuit implementation used to achieve the Op-Amp function in the MDAC block must be chosen carefully. We have chosen the telescopic architecture to implement the Op-Amp because of its advantages. The telescopic amplifier is a single-stage amplifier with one dominant pole, so it typically has a higher frequency capability and consumes less power than other topologies; moreover, its speed is higher than that of most other types of amplifiers [3].

B. Telescopic Op-Amp Architecture

The simplest version of a single-stage Op-Amp is the telescopic architecture, shown in Fig. 4. The input differential pair injects the signal currents into common-gate stages. The circuit then performs the differential-to-single-ended conversion with a cascode current mirror. Note that the transistors are stacked one on top of the other, forming a sort of telescopic structure [4].

The most important specifications are the following:

- **Differential DC gain**
  The differential DC gain is given by the expression:
  \[ A_v = \frac{g_{m1}}{\frac{1}{2} I_{bias} (\lambda_{n} + \lambda_{p})} \]  

- **Unity gain bandwidth**
  The UGBW established by the dominant node is given by:
  \[ UGBW = \frac{g_{m1}}{C_L} \]  

- **Slew rate**
  For this Op-Amp, the condition to ensure a minimum slew rate is expressed as:
  \[ SR = \frac{I_{bias}}{C_L} \]  

- **Phase margin**
  The phase margin is given by:
  \[ PM = 180 - \tan^{-1}\left(\frac{GBW}{p_1}\right) - \tan^{-1}\left(\frac{GBW}{p_2}\right) \]  

- **Power consumption**
  The power consumption of this operational amplifier can be calculated as follows:
  \[ P = (V_{DD} - V_{SS}) \cdot I_{bias} \]

- **Area**
  The area occupied by the Op-Amp circuit is given by:
  \[ Area = \sum_{i=1}^{k} W_i \cdot L_i \]  
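The expressions above are easy to evaluate directly. The following sketch codes them as plain functions; the operating-point values in the usage lines (\(g_{m1}\) = 1 mS, \(C_L\) = 1 pF, \(I_{bias}\) = 100 µA and the pole positions) are hypothetical, not the paper's design values. Note that \(g_{m1}/C_L\) is an angular frequency, so it is divided by \(2\pi\) to obtain hertz. The DC-gain expression is left out because it needs the channel-length modulation parameters of the process.

```python
import math


def ugbw_hz(gm1, c_load):
    """UGBW = gm1 / C_L is in rad/s; divide by 2*pi to get Hz."""
    return gm1 / c_load / (2 * math.pi)


def slew_rate(i_bias, c_load):
    """SR = I_bias / C_L, in V/s."""
    return i_bias / c_load


def phase_margin_deg(gbw, p1, p2):
    """PM = 180 - atan(GBW/p1) - atan(GBW/p2); GBW, p1, p2 in the same units."""
    return 180.0 - math.degrees(math.atan(gbw / p1) + math.atan(gbw / p2))


def power(vdd, vss, i_bias):
    """P = (V_DD - V_SS) * I_bias, in W."""
    return (vdd - vss) * i_bias


def area(sizes):
    """First-order active area: sum of W_i * L_i (ignores wiring and DRC spacing)."""
    return sum(w * l for w, l in sizes)


# Hypothetical operating point, for illustration only
gm1, c_load, i_bias = 1e-3, 1e-12, 100e-6
gbw = ugbw_hz(gm1, c_load)
print(f"UGBW = {gbw / 1e6:.1f} MHz, SR = {slew_rate(i_bias, c_load) / 1e6:.0f} V/us")
print(f"PM = {phase_margin_deg(gbw, 1e3, 3 * gbw):.1f} deg")
print(f"P = {power(1.8, 0.0, i_bias) * 1e3:.3f} mW")
```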

In practice, the area must also account for all the interconnections drawn according to the design rules (DRC) of the technology, which can occupy a non-negligible area, so this approximation must be refined.

The Op-Amp, which forms the core of a switched-capacitor (SC) multiplying D/A converter, is the most critical block of a pipeline stage [6]. The resolution and speed of the whole ADC are usually determined by the Op-Amps of the MDACs. Usually, the amplifier's open-loop DC gain limits the settling accuracy of the amplifier output, while the bandwidth and slew rate of the amplifier determine the maximum clock frequency.

To maximize the signal-to-noise ratio, the Op-Amp should also provide a large output signal swing. The amplifier specifications follow from the resolution and sample rate: the DC gain of the amplifier in an MDAC is determined by the resolution, while the slew-rate and GBW specifications are derived from the sampling speed of the A/D converter. The sampling speed of a pipelined A/D converter is in turn dictated by the settling time of the amplifier in the MDAC. The settling time is determined first by the slew rate (SR) and then by the GBW of the amplifier.

![Telescopic CMOS Op-Amp architecture.](image)

Fig. 4. Telescopic CMOS Op-Amp architecture.

V. MULTI-OBJECTIVE OPTIMIZATION

A. Genetic Algorithms

Optimal design of analog circuits consists of finding a variable set \( x = \{x_1, x_2, ..., x_n\} \) that optimizes a set of performance functions, such as maximum operating frequency, gain, SNR, offset, etc., while meeting imposed specifications and/or inherent constraints, for example, technology limits, transistor saturation conditions, impedance matching, etc. Vector $x$ may encompass biases, gate lengths ($L$) and widths ($W$) of MOSFET transistors, component values, etc. [5]. In the optimization stage, the parameter space is explored and the design is improved with respect to the objective functions [12,13]. The optimization, which is called multi-objective optimization, is based on an evolutionary algorithm known as a weight-based GA [6]. Genetic algorithms are a particular class of evolutionary algorithms that use biology-inspired techniques such as selection, mutation, crossover and inheritance [2].

A weighted approach has been used to optimize Op-Amps. It uses adaptive weights along the optimization process to determine the overall fitness of an individual [10].

$$F = \sum_{i=1}^{n} \omega_i \cdot f_i$$  

where $\omega_i$ is the relative preference, or weight, associated with the $i$-th objective function $f_i$.
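A minimal sketch of the weighted fitness \(F = \sum_i \omega_i f_i\) follows. The paper adapts the weights during the run and does not give its normalization; here the weights are fixed and each objective is normalized to its specification (value/target for maximized quantities, target/value for minimized ones), both of which are assumptions. The example scores the Spectre results of Table III against its minimum specifications with equal weights.

```python
def normalized(value, target, maximize=True):
    """Dimensionless score; >= 1 means the specification is met.

    Hypothetical normalization: value/target for quantities to maximize,
    target/value for quantities to minimize.
    """
    return value / target if maximize else target / value


def weighted_fitness(scores, weights):
    """F = sum_i w_i * f_i (fixed weights here; the paper adapts them)."""
    if len(scores) != len(weights):
        raise ValueError("need exactly one weight per objective")
    return sum(w * f for w, f in zip(weights, scores))


# Spectre results vs. minimum specifications from Table III
scores = [
    normalized(62.81, 54.0),   # DC gain (dB)
    normalized(285.8, 160.0),  # unity-gain bandwidth (MHz)
    normalized(57.42, 50.0),   # phase margin (deg)
]
F = weighted_fitness(scores, [1 / 3, 1 / 3, 1 / 3])
print(F)
```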

B. Optimization Methodology

The heart of the whole algorithm is the optimization engine; our optimization algorithm is developed in MATLAB and implemented with the MATLAB MOGA optimization tool. The GA is responsible for exploring the solution space in search of optimal solutions. In general, the best individuals in a population tend to reproduce.

At the beginning, $n$ individuals are generated randomly ($n$ is the population size). Each individual is a binary code string encoding a particular sizing of the Op-Amp. The fitness of every individual is evaluated, and the GA then chooses the better individuals as the parents of the next generation. After crossover and mutation, the new generation is produced. Performing these steps iteratively, the goal is eventually achieved (Fig. 6) [5].
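The loop described above (random initial population, fitness evaluation, selection of parents, crossover, mutation, repeat) can be sketched as a small binary-coded GA. This is a generic illustration, not the MATLAB toolbox implementation used by the authors: tournament selection, single-point crossover, bit-flip mutation, elitism and all parameter values are assumptions, and the objective is a toy function.

```python
import random


def decode(bits, lo, hi):
    """Map a list of 0/1 genes to a real value in [lo, hi]."""
    n = int("".join(map(str, bits)), 2)
    return lo + (hi - lo) * n / (2 ** len(bits) - 1)


def run_ga(fitness, n_bits=16, pop_size=30, generations=60,
           p_cross=0.9, p_mut=0.02, seed=0):
    """Maximize fitness(bits) with a simple binary-coded GA."""
    rng = random.Random(seed)
    # 1) random initial population of pop_size binary strings
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        # 2) evaluate every individual
        scored = [(fitness(ind), ind) for ind in pop]
        best = max(scored, key=lambda s: s[0])[1]

        def select():
            # 3) binary tournament: the fitter of two random individuals is a parent
            a, b = rng.sample(scored, 2)
            return a[1] if a[0] >= b[0] else b[1]

        children = [best[:]]  # elitism: the best individual survives unchanged
        while len(children) < pop_size:
            p1, p2 = select(), select()
            # 4) single-point crossover
            if rng.random() < p_cross:
                cut = rng.randint(1, n_bits - 1)
                child = p1[:cut] + p2[cut:]
            else:
                child = p1[:]
            # 5) bit-flip mutation
            child = [1 - g if rng.random() < p_mut else g for g in child]
            children.append(child)
        pop = children  # 6) next generation
    return max(pop, key=fitness)


# Toy objective (hypothetical): maximize -(x - 0.3)^2 for x in [0, 1]
best = run_ga(lambda b: -(decode(b, 0.0, 1.0) - 0.3) ** 2)
print(decode(best, 0.0, 1.0))
```

Elitism keeps the best fitness from ever decreasing between generations, which makes this kind of short run well behaved on a smooth objective; a circuit-sizing run would replace the toy objective with a simulator-backed fitness.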

In the program developed with the MATLAB toolbox, every individual is represented by a binary code string. Fig. 4 shows that the telescopic Op-Amp contains nine transistors and a bias current ($I_{bias}$). In total, there are 11 parameters to be adjusted, and each gene of the chromosome stands for one parameter. Thus, the parameter vector is compressed to [9]: \([W_1, L_1, W_3, L_3, W_5, L_5, W_7, L_7, W_9, L_9, I_{bias}]\). The sizes of transistors $M_2$, $M_4$, $M_6$, and $M_8$ are equal to those of $M_1$, $M_3$, $M_5$, and $M_7$, respectively.
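The chromosome layout described above can be sketched as follows: each of the 11 genes is a fixed-width binary field mapped linearly onto the search range listed in Table II, and the matched transistors inherit the sizes of their partners. The 10-bit gene width and the linear mapping are assumptions, since the paper does not state its encoding resolution; the names `PARAMS`, `BITS_PER_GENE` and `decode_chromosome` are hypothetical.

```python
import random

# (name, min, max) from Table II: widths/lengths in um, I_bias in uA
PARAMS = [
    ("W1", 15, 35), ("L1", 0.18, 1), ("W3", 15, 35), ("L3", 0.18, 1),
    ("W5", 100, 300), ("L5", 0.18, 1), ("W7", 20, 50), ("L7", 0.18, 1),
    ("W9", 100, 200), ("L9", 0.18, 1), ("Ibias", 5, 200),
]
BITS_PER_GENE = 10  # hypothetical gene resolution, not given in the paper


def decode_chromosome(bits):
    """Decode a flat 0/1 list into the Op-Amp sizing dictionary."""
    if len(bits) != BITS_PER_GENE * len(PARAMS):
        raise ValueError("chromosome length does not match PARAMS")
    sizes = {}
    for k, (name, lo, hi) in enumerate(PARAMS):
        gene = bits[k * BITS_PER_GENE:(k + 1) * BITS_PER_GENE]
        n = int("".join(map(str, gene)), 2)
        sizes[name] = lo + (hi - lo) * n / (2 ** BITS_PER_GENE - 1)
    # matched devices share sizes: M2 = M1, M4 = M3, M6 = M5, M8 = M7
    for a, b in (("1", "2"), ("3", "4"), ("5", "6"), ("7", "8")):
        sizes["W" + b] = sizes["W" + a]
        sizes["L" + b] = sizes["L" + a]
    return sizes


rng = random.Random(0)
chromosome = [rng.randint(0, 1) for _ in range(BITS_PER_GENE * len(PARAMS))]
print(decode_chromosome(chromosome))
```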

The goal of multi-objective optimization is to find a vector $x$ of decision variables that satisfies the constraints and gives acceptable values for all objective functions $f(x)$. In this MOGA, the performances considered are DC gain, unity-gain bandwidth, phase margin, power consumption, circuit area, slew rate, thermal noise and signal-to-noise ratio. The variables obtained from the GA (Table II) are used to simulate the telescopic Op-Amp circuit with Cadence Virtuoso Spectre in AMS 0.18$\mu$m CMOS technology. The simulated DC gain and phase margin of the telescopic Op-Amp are shown in Fig. 6. The Op-Amp exhibits a high DC gain of 62.8 dB and a phase margin of 57.42° with a unity-gain bandwidth of 285.8 MHz. The power consumption is evaluated at 0.051 mW.

The simulation results shown in Fig. 7 and Table III confirm the efficiency of the GA in determining device sizes in analog circuits. Our results are compared with those given in [16]. The performance of the telescopic Op-Amp optimized by the proposed method shows that it is a good way to optimize an analog circuit.

Wafer production always shows some variation of technological parameters (process and mismatch parameters). Monte Carlo process simulation is the adequate tool for an early estimate of how this variation will affect the circuit's function [14,15].

<table>
<thead>
<tr>
<th>Variable</th>
<th>Min</th>
<th>Max</th>
<th>Chosen iteration</th>
<th>In circuit</th>
</tr>
</thead>
<tbody>
<tr>
<td>W1=W2</td>
<td>15 µm</td>
<td>35 µm</td>
<td>25.4 µm</td>
<td>25 µm</td>
</tr>
<tr>
<td>L1=L2</td>
<td>0.18 µm</td>
<td>1 µm</td>
<td>274 nm</td>
<td>0.28 µm</td>
</tr>
<tr>
<td>W3=W4</td>
<td>15 µm</td>
<td>35 µm</td>
<td>25.5 µm</td>
<td>25 µm</td>
</tr>
<tr>
<td>L3=L4</td>
<td>0.18 µm</td>
<td>1 µm</td>
<td>345 nm</td>
<td>0.35 µm</td>
</tr>
<tr>
<td>W5=W6</td>
<td>100 µm</td>
<td>300 µm</td>
<td>249 µm</td>
<td>250 µm</td>
</tr>
<tr>
<td>L5=L6</td>
<td>0.18 µm</td>
<td>1 µm</td>
<td>294 nm</td>
<td>0.3 µm</td>
</tr>
<tr>
<td>W7=W8</td>
<td>20 µm</td>
<td>50 µm</td>
<td>30.7 µm</td>
<td>30.7 µm</td>
</tr>
<tr>
<td>L7=L8</td>
<td>0.18 µm</td>
<td>1 µm</td>
<td>326 nm</td>
<td>0.33 µm</td>
</tr>
<tr>
<td>W9</td>
<td>100 µm</td>
<td>200 µm</td>
<td>124 µm</td>
<td>125 µm</td>
</tr>
<tr>
<td>L9</td>
<td>0.18 µm</td>
<td>1 µm</td>
<td>269 nm</td>
<td>0.27 µm</td>
</tr>
<tr>
<td>I<sub>bias</sub></td>
<td>5 µA</td>
<td>200 µA</td>
<td>158 µA</td>
<td>160 µA</td>
</tr>
</tbody>
</table>

<table>
<thead>
<tr>
<th>Performance Names</th>
<th>Specification</th>
<th>MOGA program</th>
<th>Spectre Simulation</th>
</tr>
</thead>
<tbody>
<tr>
<td>Supply Voltage (V)</td>
<td>1.8</td>
<td>1.8</td>
<td>1.8</td>
</tr>
<tr>
<td>DC Gain (dB)</td>
<td>≥ 54</td>
<td>62</td>
<td>62.81</td>
</tr>
<tr>
<td>Unity GBW (MHz)</td>
<td>≥ 160</td>
<td>280</td>
<td>285.8</td>
</tr>
<tr>
<td>Phase Margin (deg)</td>
<td>≥ 50</td>
<td>60</td>
<td>57.42</td>
</tr>
<tr>
<td>Slew Rate (V/µs)</td>
<td>Max</td>
<td>2.25</td>
<td>2.19</td>
</tr>
<tr>
<td>Area (µm²)</td>
<td>Min</td>
<td>559</td>
<td>580</td>
</tr>
<tr>
<td>Power (mW)</td>
<td>Min</td>
<td>0.047</td>
<td>0.051</td>
</tr>
<tr>
<td>Technology</td>
<td>AMS 0.18 µm</td>
<td>AMS 0.18 µm</td>
<td>0.18 µm</td>
</tr>
</tbody>
</table>

Fig. 7 shows the Monte Carlo analysis of the optimized telescopic Op-Amp. Over 200 runs, the Op-Amp was found to have a DC-gain standard deviation of 0.565 dB, i.e., a 3\(\sigma\) DC-gain spread of 1.69 dB, which keeps the gain above 54 dB; a unity-gain bandwidth standard deviation of 6.07 MHz, i.e., a 3\(\sigma\) spread of 18.21 MHz, which keeps the unity-gain bandwidth above 160 MHz; and a phase margin standard deviation of 0.52 deg, i.e., a 3\(\sigma\) spread of 1.57 deg, which keeps the telescopic Op-Amp stable.
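A quick way to read the Monte Carlo numbers is to compare the nominal minus 3σ value of each performance with its minimum specification. The sketch takes the nominal values from the Spectre column of Table III and the standard deviations quoted above, and assumes that the DC-gain spread is expressed in dB; the function name is hypothetical.

```python
def three_sigma_margin(nominal, sigma, spec_min):
    """Margin of the worst-case (nominal - 3*sigma) value over a minimum spec."""
    return (nominal - 3.0 * sigma) - spec_min


# name: (nominal from Table III Spectre column, Monte Carlo sigma, minimum spec)
CHECKS = {
    "DC gain (dB)": (62.81, 0.565, 54.0),  # sigma assumed to be in dB
    "Unity GBW (MHz)": (285.8, 6.07, 160.0),
    "Phase margin (deg)": (57.42, 0.52, 50.0),
}

for name, (nominal, sigma, spec) in CHECKS.items():
    m = three_sigma_margin(nominal, sigma, spec)
    print(f"{name}: worst case {nominal - 3 * sigma:.2f}, margin {m:+.2f}")
```

All three margins are positive, which is the quantitative content of the statement that the 3σ spreads keep the Op-Amp within specification.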

To verify the design effectively, it is important to simulate whether it works properly in its real environment. Fig. 8 shows a testbench that realizes the complete MDAC as it will be fabricated on silicon; the input of the system is driven by a ramp.

The main function of the sub-DAC is to output an analog voltage based on the comparators' decisions. The residue plot shown in Fig. 9 represents the transfer function created by the switched-capacitor circuit (MDAC). Fig. 10 shows the binary response of the 1.5-bit stage for the three different levels of the reference signal.

The ideal transfer function of an ADC can be represented by a best-fit line, typically either an endpoint fit or a least-squares fit, as shown in Fig. 11.

An ADC that exhibits integral non-linearity (INL) has a transfer function that is not a perfect line. The INL is the maximum difference between the actual and the ideal transfer characteristics. The telescopic Op-Amp is placed in the test bench illustrated in Fig. 8, the circuit is simulated, and its INL is calculated, as illustrated in Fig. 11. We note that the telescopic architecture produces an INL error of less than 3.5 LSB.
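The INL definition above can be computed directly: fit a reference line (endpoint or least-squares) to the measured transfer curve and take the largest deviation, expressed in LSB. The sketch below is a generic illustration with hypothetical data and function names; it is not the authors' post-processing script.

```python
def best_fit_line(x, y, method="endpoint"):
    """Return (slope, intercept) of the reference line used for INL."""
    if method == "endpoint":
        slope = (y[-1] - y[0]) / (x[-1] - x[0])
        return slope, y[0] - slope * x[0]
    if method == "lsq":
        n = len(x)
        mx, my = sum(x) / n, sum(y) / n
        sxx = sum((xi - mx) ** 2 for xi in x)
        sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
        slope = sxy / sxx
        return slope, my - slope * mx
    raise ValueError(f"unknown fit method: {method}")


def inl_lsb(x, y, lsb, method="endpoint"):
    """Maximum |actual - reference line| over the transfer curve, in LSB."""
    slope, intercept = best_fit_line(x, y, method)
    return max(abs(yi - (slope * xi + intercept)) for xi, yi in zip(x, y)) / lsb


# Hypothetical bowed transfer curve (input step k -> measured output, in LSB units)
x = [0, 1, 2, 3, 4]
y = [0.0, 1.5, 2.5, 3.5, 4.0]
print(inl_lsb(x, y, lsb=1.0, method="endpoint"), inl_lsb(x, y, lsb=1.0, method="lsq"))
```

For this curve the least-squares line gives 0.3 LSB against 0.5 LSB for the endpoint line, so the fit method should be stated whenever an INL figure is quoted.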

Fig. 8. The MDAC test bench.

Fig. 9. The transfer function of the MDAC.

Fig. 10. Binary response of the MDAC for a ramp signal input.

Fig. 7. Monte Carlo Simulation of (a) DC gain, (b) Unity Gain Bandwidth and (c) Phase Margin of the Telescopic Op-Amp.
Fig. 11. Transfer Function of the MDAC, Its Best-Fit Line, and the INL of the MDAC Using the Optimized Telescopic Op-Amp.

VII. CONCLUSION

This work has demonstrated the utility of evolutionary algorithms for automating electronic design, using multi-objective genetic algorithms (MOGAs), which can handle multi-objective optimization problems with two or more goals while also taking the constraints into account. A program based on MOGAs has been developed to optimize the performance of a telescopic operational amplifier by determining its CMOS transistor sizes, in order to design an optimized 8-bit 40-MS/s low-power Multiplying Digital-to-Analog Converter (MDAC) for a pipelined ADC. The optimization procedure was implemented using the MATLAB optimization toolbox, and the circuit simulation results were obtained from Spectre. The Op-Amp was designed in AMS 0.18µm CMOS technology.

The Monte Carlo analysis of the optimized telescopic Op-Amp shows that the DC gain, unity-gain bandwidth and phase margin stay within specification, which demonstrates the effectiveness of the approach for MDAC design, where the design space is too complex to explore with classical methods in a short time. It can be concluded that the proposed MOGA-based approach is efficient and gives promising results for circuit design and optimization problems.
Abstract—The atmospheric environment has a fundamental influence on Free-Space Optical (FSO) links. The atmosphere can degrade the communication quality of an FSO link so severely that the received power (RSSI) level drops low enough to cause a loss of communication. For this reason, the authors used a professional weather station, installed at the FSO link site, to measure real atmospheric conditions such as wind speed, temperature, relative air humidity, air pressure and solar radiation. Random changes in these atmospheric parameters create atmospheric turbulence as well as absorption and dispersion centers. The value of the refractive index structure parameter $C_n^2$ must be specified because it determines the influence of the atmosphere on the FSO link. The first part of this article covers the theoretical calculation of $C_n^2$ using two models: PAMELA and the Macroscale-Meteorological model. The article also evaluates the atmospheric influences on the received power level (RSSI) and simulates the OOK-RZ, OOK-NRZ and BPSK modulation formats in Optiwave.

Index Terms—atmospherics, eMOS, FSO, modulation format, MOS, PAMELA.

Original Research Paper
DOI: 10.7251/ELS1620062L

I. INTRODUCTION

FSO links have been used for a long time because of their advantages: communication in an unlicensed band, a high level of security, and simple and fast installation. FSO began to be used extensively as transmission speeds in last-mile networks increased and a system with higher capacity than conventional radio-wave communication systems was needed. The increase in transmission speed and the reliability of an FSO link depend largely on the modulation format used for communication [1]. The most commonly used modulation format is OOK (On-Off Keying), in its RZ (Return-to-Zero) and NRZ (Non-Return-to-Zero) variants [2]-[6]. Compared with RF systems, FSO is less affected by rain and snow, but atmospheric turbulence and fog dramatically affect data transmission. Therefore, in areas with dynamic weather changes, links that combine the advantages of FSO and RF began to be used. The main disadvantage, which greatly limits the deployment and utilization of FSO systems, is the atmosphere itself as the transmission medium [7].

The atmosphere is a dynamic and chaotic transmission medium in which the refractive index of air fluctuates during the day, especially at the transitions between day and night. Furthermore, the fluctuations of the refractive index are related to wind velocity, roughness of the earth's surface, amount of solar radiation, rainfall, air pressure, and the density and composition of the air. The intensity of these fluctuations is described by the refractive index structure parameter $C_n^2$. For the theoretical calculation of this parameter, basic mathematical models obtained from empirical studies of the influence of the atmosphere can be used. These models are known as horizontal models and include the PAMELA model, based on the Monin-Obukhov similarity (MOS) theory, and the Macroscale-Meteorological model, based on global meteorological data released by the U.S. Army Night Vision Laboratory. Of these two models, the PAMELA model is considerably more demanding to compute and includes more parameters. The Macroscale-Meteorological model is based mainly on a weight function of the relative part of the day [8]-[13].

The measurement location and the devices used are also described in this article. The following sections describe the theoretical background of the mathematical models used to evaluate the atmospheric effects on the FSO link.

II. DESCRIPTION OF THE MEASURED AREA, USED EQUIPMENT AND MEASURED VALUES

The data were analyzed for the first half of April 2015 (1st–15th). These days were chosen because the data were complete and because the RSSI (Received Signal Strength Indicator) value fluctuated within approximately the same range each day. It was assumed that in this period there were very few dispersion centers that would appreciably show in the RSSI diagram.
By modifying the expression for the refractive index of air, where \( \lambda \) represents the wavelength of the radiation light source, \( T = 288 \) K and \( P_a = 1013 \) hPa, we obtain the following equation:

\[ \Delta n = \frac{77.6 \times 10^{-6} \times P_a}{T} \left(1 + \frac{7.52 \times 10^{-3}}{\lambda^2} \right). \tag{1} \]

The refractive index structure parameter is related to the vertical gradient of the refractive index \( \partial n / \partial h \) by:

\[ C_n^2 = \frac{b \times K_b}{\epsilon^{1/3}} \left( \frac{\partial n}{\partial h} \right)^2 , \tag{2} \]

where \( b \) is a constant commonly approximated by 2.8 and \( \epsilon \) is the dissipation rate of turbulent kinetic energy. If we ignore the small contributions to the total differential from fluctuations in atmospheric pressure and from the wavelength, we can differentiate (1) with respect to the potential temperature \( \theta \) and, using the Monin-Obukhov expression for the vertical gradient of \( \theta \) (more in [7] or [13]), obtain [7], [11], [13]:

\[ \frac{\partial n}{\partial h} \approx \frac{\partial n}{\partial \theta} \frac{\partial \theta}{\partial h} \approx -\frac{77.6 \times 10^{-6} \times P_a}{T^2} \cdot \frac{T_* \, \Phi_k(h/L)}{k_p \times h} , \tag{3} \]

where \( k_p \) is von Kármán's constant, taken to be 0.4, \( h \) is the height above ground, \( T_* \) is the characteristic (scaling) temperature, \( \Phi_k \) is the dimensionless temperature gradient, \( L \) is the Monin-Obukhov length and \( u_* \) is the friction velocity.
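A short numerical sketch of equation (1) shows the order of magnitude of the refractivity. It assumes \( P_a \) in hPa, \( T \) in kelvin and \( \lambda \) in micrometres (the usual units for this dispersion formula); the function name `refractivity` is illustrative only and is not part of the authors' Matlab code.

```python
def refractivity(p_hpa: float, t_kelvin: float, wavelength_um: float) -> float:
    """Evaluate equation (1): 77.6e-6 * P / T * (1 + 7.52e-3 / lambda^2).

    Assumed units: pressure in hPa, temperature in kelvin, wavelength in micrometres.
    """
    dispersion = 1.0 + 7.52e-3 / wavelength_um ** 2  # weak wavelength dependence
    return 77.6e-6 * p_hpa / t_kelvin * dispersion


# Conditions quoted in the text (T = 288 K, P_a = 1013 hPa), evaluated at 1550 nm.
print(f"{refractivity(1013.0, 288.0, 1.55):.4e}")
```

The wavelength term changes the result only by a fraction of a percent in the near infrared, which is why it can be neglected when (1) is differentiated.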

The other model is the Macroscale-Meteorological model developed by the U.S. Army Night Vision Laboratory. This model is based on standard meteorological parameters measured around the world and on the concept of temporal hours, i.e. the relative part of the day. The relationship between \( C_n^2 \) and the temporal hour \( th \) led to the construction of a weight function. The values of \( th \) are obtained as follows [8]:

1. The value of one \( th \) is obtained by subtracting the hour of sunrise from the hour of sunset and dividing by 12.
2. The current \( th \) is obtained by subtracting the hour of sunrise from the current hour and dividing by the value of one \( th \).

\[ C_n^2 = 3.8\times10^{-14} W + f(T) + f(U) + f(RH) - 5.3\times10^{-13} , \tag{4} \]

\[ f(T) = 2\times10^{-15} \times T , \tag{5} \]

\[ f(U) = -2.5\times10^{-15} U + 1.2\times10^{-15} U^2 - 8.5\times10^{-17} U^3 , \tag{6} \]

\[ f(RH) = -2.8\times10^{-15} RH + 2.9\times10^{-17} RH^2 - 1.1\times10^{-19} RH^3 , \tag{7} \]

where \( W \) is the temporal-hour weight, \( T \) is the air temperature in kelvin, \( RH \) is the relative humidity in % and \( U \) is the wind speed in m/s [2]. This model is valid only for a limited range of weather conditions. According to [9], the basic MOS model is limited to a temperature range of 9–35\(^\circ\)C, a relative humidity range of 14–92% and a wind speed range of 0–10 m/s. Because of these limitations, it is not suitable, e.g., for coastal areas, where higher wind velocities and higher air humidity can be expected. For this reason, the original MOS model was extended into a new version, eMOS (extended Macroscale-Meteorological model):

\[ C_n^2 = 3.8\times10^{-14} W + \frac{A}{\exp(T)} \times 10^{-14} + f(U) + f(RH) , \tag{8} \]

\[ f(U) = 2.58 \times 10^{-14} U , \tag{9} \]

\[ f(RH) = -6.79 \times 10^{-15} RH . \tag{10} \]

Equations (9) and (10) are valid when the area is covered by vegetation. Reference [9] describes alternative variants of (9) and (10), which are designated mainly for mountainous areas. This model is limited to wind speeds in the range 5–10 m/s and relative humidity of 92–100%.
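A minimal sketch of the temporal-hour procedure and of the basic model (4)–(7) is given below. It assumes that one temporal hour is one twelfth of the time between sunrise and sunset, and it takes the weight \( W \) as an input, because the weight function itself is tabulated in [8] and is not reproduced here. All function names are hypothetical; the quadratic wind coefficient \( 1.2\times10^{-15} \) and the cubic humidity coefficient \( -1.1\times10^{-19} \) follow the commonly cited form of this model.

```python
def temporal_hour(current_h: float, sunrise_h: float, sunset_h: float) -> float:
    """Current temporal hour th; all times in decimal hours of the same day."""
    one_th = (sunset_h - sunrise_h) / 12.0  # length of one temporal hour
    return (current_h - sunrise_h) / one_th


def f_t(t_kelvin: float) -> float:
    """Equation (5): temperature term, T in kelvin."""
    return 2e-15 * t_kelvin


def f_u(u: float) -> float:
    """Equation (6): wind-speed term, U in m/s."""
    return -2.5e-15 * u + 1.2e-15 * u**2 - 8.5e-17 * u**3


def f_rh(rh: float) -> float:
    """Equation (7): relative-humidity term, RH in percent."""
    return -2.8e-15 * rh + 2.9e-17 * rh**2 - 1.1e-19 * rh**3


def cn2_mos(w: float, t_kelvin: float, u: float, rh: float) -> float:
    """Equation (4); w is the temporal-hour weight looked up for the current th."""
    return 3.8e-14 * w + f_t(t_kelvin) + f_u(u) + f_rh(rh) - 5.3e-13


def within_mos_limits(t_kelvin: float, rh: float, u: float) -> bool:
    """Validity range quoted from [9]: 9-35 degC, 14-92 % RH, 0-10 m/s."""
    t_celsius = t_kelvin - 273.15
    return 9.0 <= t_celsius <= 35.0 and 14.0 <= rh <= 92.0 and 0.0 <= u <= 10.0


# Example: 09:30 with sunrise at 06:15 and sunset at 19:45.
print(temporal_hour(9.5, 6.25, 19.75))
```

Checking `within_mos_limits` before evaluating `cn2_mos` is one simple way to flag minutes, such as humid nights above 92% RH, for which the basic model should not be trusted.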

IV. MEASUREMENT OF CLIMATIC EFFECTS

The Davis Vantage Pro2 weather station was used to measure the meteorological data. The station is installed close to the FSO head on the roof of the main building of VSB–Technical University of Ostrava. In addition to wind speed, temperature, pressure, relative humidity and solar radiation, the weather station can also measure wind direction and rainfall. The individual values were read at one-minute intervals throughout the measurement period. During the measurement of climatic influences, a sensor failure occurred on the 10th and 11th day of measurement due to a power outage. Because of this failure, the days 10 April 2015 and 11 April 2015 were removed from subsequent calculations.

Figure 2 shows the measured values of the temperature gradient in correlation with solar radiation for the monitored period. Fig. 2 clearly shows an unambiguous trend of temperature increasing with increasing solar radiation. In the period 1st–8th April 2015, the relative humidity was around 90% during the night and up to 70% during the day. In the period 9th–14th April 2015, the mean relative humidity decreased, which is related to the increase in outdoor air temperature, see Fig. 2. The maximum wind speed was 9 m/s, which is within, but close to, the 10 m/s upper wind-speed limit of the MOS and eMOS models. Fig. 4, the last figure in this section, shows the behavior of the atmospheric pressure, which increased from the start of the measurement to a maximum of around 1030 hPa. The RSSI level of the FSO link (measured as a voltage level, see Fig. 4) periodically increased and decreased, approximately between 250 and 530. Note the steady increase of the RSSI values from 3rd April 2015. The other figures also show how the behavior of the atmosphere changed as the air temperature increased.
V. CALCULATION OF REFRACTIVE INDEX STRUCTURE PARAMETER $C_n^2$

According to the theory described in Section III, the PAMELA, MOS and eMOS models were calculated in Matlab. The calculations were based on the data measured from 1st to 14th April 2015, which serve as the input variables for calculating $C_n^2$ with the PAMELA, MOS and eMOS models. The results are displayed in Fig. 5a, 5b and 5c, where the x-axis represents the minutes of one day and the y-axis shows the values of the refractive index structure parameter $C_n^2$. The values calculated for individual minutes are represented by grey crosses. In our case, 12 calculations were made for each minute (one per valid measurement day), and the arithmetic mean of these 12 values (red line) was calculated for each minute. Fig. 5d shows the final comparison of the three models.
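The per-minute averaging over the valid days can be sketched as follows. The sketch assumes that the \( C_n^2 \) values of each day are stored as one list of per-minute values (1440 for a full day); this data layout and the function name are assumptions, not the authors' Matlab implementation.

```python
from statistics import fmean


def mean_per_minute(days: list[list[float]]) -> list[float]:
    """Arithmetic mean across days for each minute of the day (the red line).

    `days` holds one list of per-minute Cn^2 values per valid measurement day;
    all lists must have the same length.
    """
    if len({len(day) for day in days}) != 1:
        raise ValueError("every day must contain the same number of minutes")
    # zip(*days) groups the values of the same minute from all days together.
    return [fmean(values) for values in zip(*days)]


print(mean_per_minute([[1e-15, 2e-15], [3e-15, 4e-15]]))
```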

Fig. 4. Measurement of voltage level RSSI and atmospheric pressure in period 1st to 14th April 2015.

Fig. 5. Calculation of refractive index structure parameter for mean values of atmospheric conditions in period 1st to 14th April 2015.
VI. SIMULATION OF INFLUENCE OF ATMOSPHERIC PHENOMENA ON FSO IN OPTISYSTEM

Optiwave OptiSystem software ver. 11 was used to simulate the influence of atmospheric conditions on the FSO link. To simulate the influence of atmospheric phenomena on communication quality, we selected the modulation formats most commonly used in FSO communications: RZ, NRZ and BPSK (see Fig. 6). The modulation format affects important parameters such as bandwidth and energy efficiency, which determine overall system performance. The main goal of a modulation format is to increase spectral/power efficiency and to reduce sensitivity to fluctuations of the received power [14], [15]. The optical beam is modulated in the optical domain by a Mach-Zehnder modulator, see Fig. 6.

The simulations were set up to correspond to the real FSO link located on the campus. The OOK (On-Off Keying) modulation format is the most commonly used format in commercial terrestrial FSO communications. Its advantage is resistance to nonlinearities of the laser and of an external modulator. However, this format is much more sensitive to turbulence and other disturbances that lead to fluctuations of the received optical power. These fluctuations can be mitigated by replacing the fixed decision threshold with an adaptive one. The OOK modulation format can be used with the non-return-to-zero (NRZ) or return-to-zero (RZ) line code. For the OOK-NRZ modulation format, the BER can be expressed as [2], [7], [15]:

\[
\text{BER}_{\text{OOK-NRZ}} = \frac{1}{2} \cdot \text{erfc} \left( \frac{1}{2} \sqrt{\frac{\text{SNR}}{2}} \right), \tag{11}
\]

where \( \text{erfc} \) is the complementary error function, \( \text{erfc}(x) = 1 - \text{erf}(x) \), and \( \text{SNR} \) is the signal-to-noise ratio, which enters the formula as a linear ratio (a value given in dB must first be converted). In the OOK-RZ format, the pulse representing a “1” is shorter than the bit period, which increases energy efficiency, but RZ requires a larger bandwidth than NRZ. The BER of OOK-RZ can be expressed as [2], [7], [15]:

\[
\text{BER}_{\text{OOK-RZ}} = \frac{1}{2} \cdot \text{erfc} \left( \frac{1}{2} \sqrt{\frac{\text{SNR}}{2}} \right). \tag{12}
\]

BPSK (also called Phase Reversal Keying (PRK) or 2PSK) is the simplest form of PSK (Phase Shift Keying) modulation. It uses two phases separated by 180°; the exact positions of the constellation points do not matter. It is the most robust of all PSK modulations, because it takes the highest level of noise or distortion to make the demodulator reach an incorrect decision.

However, it carries only 1 bit per symbol, which makes it unsuitable for high data-rate applications. BPSK has a variety of applications in digital communication systems, such as the IEEE 802.11 wireless LAN standard, digital modems and wireless telephone networks. BPSK is functionally equivalent to 2-QAM (Quadrature Amplitude Modulation). The BER for BPSK can be expressed as [2], [7]:

\[
\text{BER}_{\text{BPSK}} = \frac{1}{2} \cdot \text{erfc} \left( \frac{\sqrt{\text{SNR}}}{\sqrt{2}} \right). \tag{13}
\]
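For a quick comparison of (11)–(13), the sketch below evaluates the BER expressions with Python's standard `math.erfc`. It assumes the SNR is given as a linear ratio and provides a helper for converting from dB; since (11) and (12) are printed with the same expression, a single function covers both OOK variants here. The function names are illustrative.

```python
import math


def db_to_linear(snr_db: float) -> float:
    """Convert an SNR in dB to the linear ratio used inside erfc."""
    return 10.0 ** (snr_db / 10.0)


def ber_ook(snr: float) -> float:
    """Equations (11) and (12): 1/2 * erfc(1/2 * sqrt(SNR / 2)), SNR linear."""
    return 0.5 * math.erfc(0.5 * math.sqrt(snr / 2.0))


def ber_bpsk(snr: float) -> float:
    """Equation (13): 1/2 * erfc(sqrt(SNR) / sqrt(2)), SNR linear."""
    return 0.5 * math.erfc(math.sqrt(snr) / math.sqrt(2.0))


for snr_db in (5.0, 10.0, 15.0):
    snr = db_to_linear(snr_db)
    print(f"{snr_db:4.1f} dB  OOK: {ber_ook(snr):.3e}  BPSK: {ber_bpsk(snr):.3e}")
```

For the same SNR, the argument of erfc is twice as large for BPSK as for OOK, which is consistent with the lower BER of BPSK reported in the simulations.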

The values of the refractive index structure parameter calculated from the real data were entered successively into OptiSystem, and the simulations calculated the BER and the eye diagrams. Figure 7 shows the calculated dependence of the BER on the refractive index structure parameter. The values of the refractive index structure parameter were swept from 10\(^{-17}\) to 10\(^{-12}\) in logarithmic steps.

In Fig. 7 we highlighted the range from 2.21·10\(^{-16}\) to 4.27·10\(^{-14}\); these values correspond to the minimum and maximum values of the refractive index structure parameter calculated by the models. The expected increase of the BER appears in this range and reflects the logarithmic character of the BER. However, the computational model behaves unexpectedly around 4·10\(^{-14}\), where the BER begins to decrease as turbulence increases. This behavior contradicts the logical expectation and can be attributed to an error in the program's internal algorithm.
The development of the BER with respect to the chosen modulation format and the refractive index structure parameter for the PAMELA, MOS and eMOS models is shown in Fig. 9–11. The simulations show good results for the BPSK modulation format, as can be seen from the development of the BER (Fig. 8 c, f). This is not the case for the RZ (Fig. 8 a, d) or NRZ (Fig. 8 b, e) modulation formats, where the increase of the BER is more significant. With these formats, the communication channel could be interrupted due to the higher BER values.

The last simulation calculates the maximum range of the FSO link for the minimum and maximum $C_n^2$ values. We simulated the range for the RZ, NRZ and BPSK modulations with the same BER of $10^{-12}$. The atmospheric attenuation coefficient was set to 0.4 dB/km. Table I summarizes the simulated maximum ranges of the FSO link. The results show that the longest range of the FSO link is achieved with BPSK modulation.

VII. CONCLUSION

In this article, the authors focused on the atmospheric influence on Free-Space Optics. Based on real measured values of the atmospheric environment, three models of the refractive index structure parameter (PAMELA, MOS and eMOS) were computed. Each model uses different principles to calculate $C_n^2$; the differences are most pronounced during sunset and sunrise, when the PAMELA model gives the lowest level of turbulence compared with MOS and eMOS. These models were then correlated with the BER for different modulation formats, in our case RZ, NRZ and BPSK. Under the real measured atmospheric conditions, the BPSK modulation showed good error resilience in turbulent conditions compared with RZ and NRZ modulation.
ACKNOWLEDGMENT

The research described in this article could be carried out thanks to the active support of the projects no. SP2016/151, SP2016/149 and VI20152020008. This article was also supported by projects Technology Agency of the Czech Republic TA03020439 and TA04021263. The research has been partially supported by the Ministry of Education, Youth and Sports of the Czech Republic through grant project no. CZ.1.07/2.3.00/20.0217 within the frame of the operation programme Education for competitiveness financed by the European Structural Funds and from the state budget of the Czech Republic.

REFERENCES


A Novel SVPWM Algorithm Considering Neutral-Point Potential Balancing for Three-Level NPC Inverter

Chen Yongchao, Li Yanda, and Zhao Ling.

Abstract—For three-level inverters, the complexity of the control strategy and the neutral-point potential imbalance on the DC side are the bottlenecks restricting their application. To solve this problem, this paper proposes a simplified implementation of three-level space vector pulse width modulation (SVPWM) that considers neutral-point potential balancing. The proposed SVPWM algorithm is based on comparing the three phase voltages and on the voltage-second balance principle; it does not need sine and cosine calculations, and is therefore more convenient and effective than the traditional SVPWM algorithm. The neutral-point potential balancing can also be realized conveniently and effectively. The proposed algorithm not only simplifies the calculation and greatly reduces calculation time, but also achieves the same control effect as traditional SVPWM. It can be used to shorten the sampling time and improve inverter performance. Finally, the proposed SVPWM algorithm is verified by simulation and experimental results.

Index Terms—Three-level Inverter, SVPWM, Voltage-second balance, Neutral-point potential.

Original Research Paper
DOI: 10.7251/ELS1620069Y

I. INTRODUCTION

RECENTLY, industry has begun to demand more and more high-voltage, high-power equipment, such as large AC motor drives, HVDC, UPFC and STATCOM. Therefore, multilevel inverters, particularly three-level ones, have received great attention due to their significant advantages in high-power and high-voltage applications. Compared with a two-level inverter system of the same capacity, three-level inverters can reduce the voltage stress on the switches, diminish the harmonic distortion of the output voltage and increase the voltage and power ratings [1, 2].

The modulation method is a key technology in the research of three-level inverters, because it directly determines the performance of the inverter. Among the many methods, SPWM and SVPWM are used most frequently. The main disadvantage of SPWM is its low DC voltage utilization. SVPWM offers high DC voltage utilization, low ripple and fewer power device switchings [3, 4]. However, as the number of levels increases, a more complicated control strategy is needed to choose proper switching states from the increased number of redundant states. Moreover, the neutral-point (NP) potential variation is another thorny problem for three-level inverters. The traditional SVPWM algorithm needs sine and cosine calculations, and this large amount of complicated calculation is time-consuming, which makes it very difficult to shorten the sampling time. A great deal of work has been done to find a better modulation algorithm, and many simplified SVPWM algorithms have been proposed in the literature. However, these algorithms mostly focus on eliminating unnecessary inverse operations and replacing multiplications with shift operations, often with very little effect [5]. In fact, SVPWM is essentially based on the principle of voltage-second balance. Based on this principle, this paper proposes a simplified SVPWM algorithm that avoids sine and cosine calculations, so that the sampling period can be shortened. This can be used to improve the control performance of three-level inverters.

II. FUNDAMENTAL OF THREE-LEVEL SVPWM

Among all multilevel topologies, the most popular at present is the three-level neutral-point-clamped (NPC) inverter [6, 7]. Fig. 1 presents the scheme of the three-level NPC inverter. Each leg of the NPC inverter consists of four power switches, four freewheeling diodes and two clamping diodes that limit the voltage across each device to half of the input DC-bus voltage. In the three-level NPC inverter, each bridge leg has three switching states, P, O and N, corresponding to three output phase voltage levels.
Let the three-phase reference voltages be:
\[
\begin{align*}
u_a &= E_m \sin \omega t \\
u_b &= E_m \sin\left(\omega t - \frac{2}{3}\pi\right) \\
u_c &= E_m \sin\left(\omega t + \frac{2}{3}\pi\right)
\end{align*}
\]

Define the voltage space vector corresponding to the three-phase reference voltage as:
\[ V_{ref} = \frac{2}{3}\left(u_a + u_b e^{j\frac{2\pi}{3}} + u_c e^{-j\frac{2\pi}{3}}\right) = E_m e^{j\left(\omega t - \frac{\pi}{2}\right)} \]
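The space-vector definition can be checked numerically. The sketch below, with hypothetical helper names, builds \(V_{ref}\) from the three reference voltages using \(a = e^{j2\pi/3}\) (so that \(a^2 = e^{-j2\pi/3}\)) and shows that the result has magnitude \(E_m\) and angle \(\omega t - \frac{\pi}{2}\), which is the relation \(\theta = \omega t - \frac{\pi}{2}\) used later in the paper.

```python
import cmath
import math


def space_vector(ua: float, ub: float, uc: float) -> complex:
    """Space vector 2/3 * (ua + ub*a + uc*a^2) with a = exp(j*2*pi/3)."""
    a = cmath.exp(2j * math.pi / 3)
    return 2.0 / 3.0 * (ua + ub * a + uc * a * a)


def phase_voltages(em: float, wt: float) -> tuple[float, float, float]:
    """Balanced positive-sequence reference voltages u_a, u_b, u_c."""
    return (em * math.sin(wt),
            em * math.sin(wt - 2 * math.pi / 3),
            em * math.sin(wt + 2 * math.pi / 3))


v = space_vector(*phase_voltages(1.0, 2.0))
print(abs(v), cmath.phase(v))  # magnitude E_m = 1, angle 2.0 - pi/2
```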

A three-phase positive-sequence voltage corresponds to a counter-clockwise rotating reference vector. Taking all three phases into consideration, there are \(3^3=27\) switching states, which produce the standard voltage space vectors shown in Fig. 2; because of redundancy, they map onto 19 distinct vectors. According to magnitude, these vectors are classified into four categories: zero vectors, small vectors (vertices of the inner hexagon), medium vectors (midpoints of the sides of the outer hexagon) and large vectors (vertices of the outer hexagon). Each small vector has two redundant switching states.
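The classification of the 27 switching states can be reproduced by enumeration. This sketch assumes that the states P, O and N correspond to pole voltages of +1, 0 and −1 in units of half the DC-link voltage, and it groups the resulting space vectors by magnitude; the names and the magnitude thresholds are illustrative choices.

```python
import cmath
import math
from collections import Counter
from itertools import product

LEVELS = {"P": 1, "O": 0, "N": -1}  # assumed pole voltages, units of Vdc/2
A = cmath.exp(2j * math.pi / 3)


def vector(state: str) -> complex:
    """Space vector of a switching state such as 'PON' (phases a, b, c)."""
    sa, sb, sc = (LEVELS[s] for s in state)
    return 2.0 / 3.0 * (sa + sb * A + sc * A * A)


def category(v: complex) -> str:
    # Magnitudes: zero 0, small 2/3, medium 2/sqrt(3), large 4/3.
    mag = abs(v)
    if mag < 0.1:
        return "zero"
    if mag < 0.9:
        return "small"
    if mag < 1.25:
        return "medium"
    return "large"


states = ["".join(s) for s in product("PON", repeat=3)]
counts = Counter(category(vector(s)) for s in states)
# Redundant states give the same vector; round to merge floating-point twins.
distinct = {(round(vector(s).real, 6), round(vector(s).imag, 6)) for s in states}
print(len(states), dict(counts), len(distinct))
```

Running it reports 3 zero, 12 small, 6 medium and 6 large states, i.e. 19 distinct vectors, with each small vector produced by two redundant states.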

These standard voltage space vectors divide the space vector diagram into six triangular sectors. Starting from the large voltage vector PNN, the whole region is divided every 60 degrees into sectors I, II, … and VI, and each sector is divided into six sub-triangles, as shown in Fig. 3. Sector I is usually analyzed first; the result for the whole 360-degree region can then be obtained from symmetry.

The main idea of SVPWM is to find the sector and sub-triangle into which the target reference vector \(V_{ref}\) falls, and then to synthesize the target reference vector from its nearest three vectors (NTV) according to the voltage-second balance principle [8–10].

III. CONVENTIONAL SVPWM ALGORITHM

A. Location Judgment of Reference Vector

In order to form the target reference vector \(V_{ref}\), we must find out which sector and sub triangle the target reference vector falls into. It is determined by the target vector’s amplitude and phase.

In the conventional SVPWM algorithm, a coordinate transformation is needed to find where \(V_{ref}\) is located. Assuming that the reference vector falls in sector I, we have \(0 \le \theta < \frac{\pi}{3}\).

When the reference vector falls into sector I, the sub angle in which it is located can be determined by the rules listed in Table I.

<table>
<thead>
<tr>
<th>Sub angle</th>
<th>Judgment rule</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>[ \theta &lt; \frac{\pi}{6} \text{ and } m \leq \frac{1}{2\sin(\frac{2\pi}{3}-\theta)} ]</td>
</tr>
<tr>
<td>2</td>
<td>[ \theta \geq \frac{\pi}{6} \text{ and } m \leq \frac{1}{2\sin(\frac{2\pi}{3}-\theta)} ]</td>
</tr>
<tr>
<td>3</td>
<td>[ \theta &lt; \frac{\pi}{6} \text{ and } \frac{1}{2\sin(\frac{2\pi}{3}-\theta)} &lt; m \leq \frac{1}{2\sin(\frac{\pi}{3}-\theta)} ]</td>
</tr>
<tr>
<td>4</td>
<td>[ \theta \geq \frac{\pi}{6} \text{ and } \frac{1}{2\sin(\frac{2\pi}{3}-\theta)} &lt; m \leq \frac{1}{2\sin \theta} ]</td>
</tr>
<tr>
<td>5</td>
<td>[ \theta &lt; \frac{\pi}{6} \text{ and } m &gt; \frac{1}{2\sin(\frac{\pi}{3}-\theta)} ]</td>
</tr>
<tr>
<td>6</td>
<td>[ \theta \geq \frac{\pi}{6} \text{ and } m &gt; \frac{1}{2\sin \theta} ]</td>
</tr>
</tbody>
</table>

In Table I, \(m\) is the modulation degree.
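The rules of Table I translate directly into a small decision function. The sketch below is a hypothetical illustration (the names are not from the paper) for a reference vector in sector I with angle \( \theta \) in radians. For sub angle 5 it uses \( m > \frac{1}{2\sin(\frac{\pi}{3}-\theta)} \), the complement of the upper bound of sub angle 3, which mirrors the rule for sub angle 6.

```python
import math


def sub_angle(theta: float, m: float) -> int:
    """Sub angle (1..6) in sector I following the rules of Table I.

    theta: reference-vector angle in [0, pi/3) rad; m: modulation degree.
    """
    if not 0.0 <= theta < math.pi / 3:
        raise ValueError("theta must lie in sector I, 0 <= theta < pi/3")
    # Boundary of the inner hexagon (shared by sub angles 1 and 2).
    inner = 1.0 / (2.0 * math.sin(2.0 * math.pi / 3.0 - theta))
    if theta < math.pi / 6:  # odd sub angles 1, 3, 5
        outer = 1.0 / (2.0 * math.sin(math.pi / 3.0 - theta))
        if m <= inner:
            return 1
        return 3 if m <= outer else 5
    outer = 1.0 / (2.0 * math.sin(theta))  # even sub angles 2, 4, 6
    if m <= inner:
        return 2
    return 4 if m <= outer else 6


print(sub_angle(0.1, 0.58), sub_angle(0.9, 0.7))
```

Every call still evaluates sine functions, which is exactly the cost that the phase-voltage comparison described next avoids.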
To avoid sine and cosine calculations, the location can be judged directly from the three reference phase voltages.

The waveforms of the three positive-sequence reference voltages are shown in Fig. 4, from which the following rules can be observed: each of the three reference phase voltages is, in turn, the largest for 120 degrees, and within each 120-degree interval the two smaller phase voltages take turns being the smallest for 60 degrees. Sorting the three reference phase voltages by value gives \( P_3^3 = 3! = 6 \) different orderings, which correspond to the 6 sectors into which the reference vector may fall. Since the reference vector is synthesized from the three-phase reference voltages, the positive-sequence voltages make the synthesized vector rotate counter-clockwise. Comparing the values of the phase voltages is therefore equivalent to identifying the sector into which the reference vector falls [11].
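The sorting-based sector identification can be sketched with comparisons only. The mapping from the ordering of the phase voltages to the sector below is derived from \( \theta = \omega t - \frac{\pi}{2} \), with sector I spanning \( 0 \le \theta < 60^\circ \); within sector I, the sign of \( u_b \) additionally separates \( \theta < \frac{\pi}{6} \) from \( \theta \ge \frac{\pi}{6} \). The names are illustrative, and ties at sector boundaries are not treated specially.

```python
import math

# Phases ordered from largest to smallest instantaneous value -> sector number.
SECTOR_BY_ORDER = {
    ("a", "b", "c"): 1,
    ("b", "a", "c"): 2,
    ("b", "c", "a"): 3,
    ("c", "b", "a"): 4,
    ("c", "a", "b"): 5,
    ("a", "c", "b"): 6,
}


def sector(ua: float, ub: float, uc: float) -> int:
    """Sector I..VI (returned as 1..6) using only comparisons of phase voltages."""
    values = {"a": ua, "b": ub, "c": uc}
    order = tuple(sorted(values, key=values.get, reverse=True))
    return SECTOR_BY_ORDER[order]


def odd_sub_angle_in_sector_one(ub: float) -> bool:
    """In sector I, u_b < 0 means theta < pi/6, i.e. sub angle 1, 3 or 5."""
    return ub < 0


wt = math.radians(120)  # theta = 30 degrees, the middle of sector I
print(sector(math.sin(wt), math.sin(wt - 2 * math.pi / 3), math.sin(wt + 2 * math.pi / 3)))
```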

For convenience of discussion, the range of \( \omega t \) is set to \( 90^\circ \sim 150^\circ \) to illustrate the principle of the improved SVPWM. As shown in Fig. 4, sorting the three phase voltages gives \( U_a > U_b > U_c \). At the same time, the phase angle of the reference vector ranges between \( 0^\circ \sim 60^\circ \) because \( \theta = \omega t - \frac{\pi}{2} \), which means that the reference vector falls into sector I. Furthermore, by the symmetry of \( U_b \), this 60-degree interval can be divided into two 30-degree parts, \( U_b \geq 0 \) and \( U_b < 0 \). Therefore, from the values of the three-phase reference voltages, it can be determined whether the reference vector falls into sub angle 1, 3, 5 or 2, 4, 6. To find exactly which sub angle the reference vector falls into, a few more judgments are needed, and new judging rules on the values of the three-phase reference voltages can be derived from TABLE I. For example, when the reference vector falls into sub angle 3, the corresponding judging rules can be deduced as follows.

According to TABLE I, \( \theta < \frac{\pi}{6} \), which is equivalent to \( U_b < 0 \). In addition, the condition \( \frac{1}{2\sin(\frac{2\pi}{3} - \theta)} < m \leq \frac{1}{2\sin(\frac{\pi}{3} - \theta)} \) must hold, which can be replaced by:
In Fig. 5, \( k \) is a scale factor used to adjust the NP potential. Fig. 5 shows that only one phase changes its switch state at each vector transition, and the switch state changes only between P and O or between O and N, which ensures stable operation of the inverter. Moreover, when the reference vector rotates from one sector to another, the three phase switch states remain unchanged, so the transition is much smoother [12].

As can be seen from the pulse sequence, the standard vector \( V_2 \) always uses the switch state OON, while its redundant switch state PPO is never used. This is because, when the reference vector falls into sub angle 3, the distance between \( V_{ref} \) and \( V_2 \) is larger than the distance between \( V_{ref} \) and \( V_1 \).

Thus, the operating time of \( V_2 \) is much shorter than that of \( V_1 \), which means that \( V_2 \) has a weaker ability to adjust the NP potential than \( V_1 \) [13]. Therefore, only the redundant switch state of \( V_1 \) is used to adjust the NP potential.

If the operating time of \( V_2 \) were also split proportionally between the two redundant states OON and PPO, the pulse sequence would have nine segments instead of seven. Although this would make full use of the redundant switching states to adjust the NP potential, it would increase the complexity of the control and the switching losses of the devices.

C. Calculation of Vector Operating Time

SVPWM is fundamentally based on the voltage-second balance principle. Applying this principle to the pulse sequence in Fig. 5 directly gives the following equations:

\[
\begin{align*}
U_{ab} T_s &= E(t_1 + t_4) \\
U_{bc} T_s &= E(t_2 + t_4) \\
U_{ca} T_s &= -E(t_1 + t_2) - 2Et_4
\end{align*}
\]

Combining these with \( t_1 + t_2 + t_4 = T_s \) gives:

\[
\begin{align*}
t_1 &= T_s\left(1 - \frac{U_{bc}}{E}\right) \\
t_2 &= T_s\left(1 - \frac{U_{ab}}{E}\right) \\
t_4 &= -T_s\left(1 + \frac{U_{ca}}{E}\right)
\end{align*}
\]

To verify the correctness of the above formulas, the results can be compared with those of the traditional algorithm. Using the relationship between phase and line-to-line voltages, which gives \( U_{bc} = \sqrt{3}E_m \sin(\omega t - \frac{\pi}{2}) \), and noting that \( \theta = \omega t - \frac{\pi}{2} \), we obtain:

\[
\begin{align*}
t_1 &= T_s\left[1 - \frac{\sqrt{3}E_m \sin(\omega t - \frac{\pi}{2})}{E}\right] \\
    &= T_s\left(1 - \frac{\sqrt{3}E_m \sin \theta}{E}\right) \\
    &= T_s\left(1 - 2m \sin \theta\right)
\end{align*}
\]

It can be seen that formula (21) is the same as formula (8), with \( m = \frac{\sqrt{3}E_m}{2E} \). Similarly, \( t_2 \) and \( t_4 \) can be verified to be exactly equal to the results obtained by the conventional SVPWM. The same verification holds when the reference vector falls into other sub angles and sectors. Thus, the validity of the proposed improved algorithm is verified.
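The closed-form operating times for sub angle 3 can be checked numerically. The sketch assumes a normalized \( E = 1 \) and \( T_s = 1 \), picks an operating point inside sub angle 3, and uses \( 2m = \sqrt{3}E_m/E \), which follows from the chain of expressions for \( t_1 \) above; the helper name is hypothetical.

```python
import math


def dwell_times_sub_angle_3(u_ab: float, u_bc: float, u_ca: float,
                            e: float, ts: float) -> tuple[float, float, float]:
    """Operating times t1, t2, t4 from line voltages only (no sine/cosine)."""
    t1 = ts * (1.0 - u_bc / e)
    t2 = ts * (1.0 - u_ab / e)
    t4 = -ts * (1.0 + u_ca / e)
    return t1, t2, t4


# Example operating point inside sub angle 3 (normalized E = 1, Ts = 1).
E, TS, m, theta = 1.0, 1.0, 0.6, 0.2
em = 2.0 * m * E / math.sqrt(3.0)      # from 2m = sqrt(3) * Em / E
wt = theta + math.pi / 2               # theta = wt - pi/2
ua, ub, uc = (em * math.sin(wt), em * math.sin(wt - 2 * math.pi / 3),
              em * math.sin(wt + 2 * math.pi / 3))
t1, t2, t4 = dwell_times_sub_angle_3(ua - ub, ub - uc, uc - ua, E, TS)
print(t1, t2, t4, t1 + t2 + t4)
```

The three times come out positive and sum to \( T_s \), and \( t_1 \) agrees with \( T_s(1 - 2m\sin\theta) \); in a controller, the sine calls here would be replaced by the sampled phase voltages.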

D. Adjusting of Neutral-Point Potential

As is known from Fig. 1, when the neutral-point current \( i_o \) is not zero, one of the two DC-side capacitors is charged while the other is discharged, which leads to variation of the neutral-point (NP) potential. The NP current is determined by the three-phase load currents of the inverter. When the switch state of a phase is P or N, that phase does not contribute to the NP current; conversely, when the switch state of a phase is O, its phase current flows into the NP current.

When the reference vector is synthesized, the NP potential is not affected by zero vectors and large vectors. Whenever a medium vector takes part in the synthesis, the output load current of one phase is connected to the neutral point, so the NP current equals that phase current and disturbs the NP voltage [14–17]. Imbalance of the NP potential is therefore inevitable when a medium vector is involved in the synthesis. To balance the NP potential, it is thus necessary to choose properly between the two redundant states of the small vectors, which produce the same output voltage but have opposite influences on the NP potential [18–20].

By splitting the operating time of the small vector into two parts according to a scale factor \( k \), the NP potential can be balanced conveniently by adjusting \( k \) [21]. Assume that the NP potential offset \( \Delta U \) and the three phase currents \( i_a, i_b \) and \( i_c \) have been obtained by sampling. To pull the NP potential back to the balanced position within one sampling period, an appropriate current should be injected into the NP so as to produce \( -\Delta U \) and offset the imbalance voltage. Therefore, the average NP current in a sampling period should satisfy the following equation:

\[
I_o^* = -C \frac{\Delta U}{T_s} \tag{22}
\]

According to Fig.5, the average NP current can also be calculated as:

\[
I_o^* = \frac{1}{T_s}\left[(2k-1)t_1 i_a + t_4 i_b - t_2 i_c\right] \tag{23}
\]

Combining (22) and (23) and eliminating \( I_o^* \) gives:

\[
k = \frac{t_1 i_a - t_4 i_b + t_2 i_c - C\Delta U}{2t_1 i_a} \tag{24}
\]

where the scale factor \( k \) ranges between 0 and 1. When the result obtained from (24) falls outside this range, \( k \) is set to the boundary value 0 or 1. In that case, the NP potential cannot be balanced within one sampling period; however, it will move toward the balanced position and will eventually be balanced after several sampling periods.
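The update of \( k \) from (22)–(24), including the clamping to [0, 1], can be sketched as below. The arguments `q_a`, `q_b`, `q_c` are hypothetical names for the three dwell-time/phase-current terms appearing in (23), in the same order and with the same signs; the fallback for a vanishing first term is an assumption of this sketch, not part of the paper.

```python
def np_scale_factor(q_a: float, q_b: float, q_c: float,
                    capacitance: float, delta_u: float) -> float:
    """Scale factor k according to (24), clamped to the admissible range [0, 1].

    q_a, q_b, q_c: the three dwell-time/phase-current terms of (23), in order
    (hypothetical names); capacitance: C; delta_u: sampled NP potential offset.
    """
    if q_a == 0.0:
        # Assumption of this sketch: if the first term vanishes, k has no
        # influence on the NP current, so an even split is returned.
        return 0.5
    k = (q_a - q_b + q_c - capacitance * delta_u) / (2.0 * q_a)
    # Outside [0, 1] the boundary value is used; the NP potential is then
    # only partly corrected in this sampling period.
    return min(1.0, max(0.0, k))


print(np_scale_factor(2.0, 0.5, 0.3, 1.0, 0.4))
```

With these example values \( k \) stays inside [0, 1], and substituting it back into (23) gives an average NP current of \( -C\Delta U/T_s \), as required by (22).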

V. SIMULATION AND EXPERIMENT

A. Simulation

The simulation study is carried out under a symmetrical three-phase load condition with a load resistance of 2.5 Ω. The output line-to-line voltage is 315 V, the output frequency is 50 Hz and the switching frequency is 12.5 kHz. The inverter DC bus consists of two 4100 μF capacitors in series, and the DC bus voltage is 600 V.

Fig. 6 shows the output line-to-line voltage and Fig. 7 shows the three-phase output current of the inverter. These two figures show that the waveform distortion is satisfactory. Fig. 8 shows the dynamic average value of the current \( i_o \), which directly affects the fluctuation of the NP potential. As shown in Fig. 9, the NP potential variation is suppressed successfully. The waveform of the scale factor \( k \) is shown in Fig. 10.
B. Experiment Device and Results

To further verify the correctness of the design method, an experimental platform was set up. The three-level NPC inverter uses a DSP-CPLD combination as the control core. The control and sampling board is placed on the top layer, the main circuit board on the middle layer, and the filter and heat sink on the bottom layer, as shown in Fig. 11.

The DC bus of the inverter is connected to a 600 V DC source, and an islanding detection device is employed to provide a symmetrical three-phase load for the full-load test.

The waveforms of the output phase current and line-to-line voltage are shown in Fig. 12, and the voltage variations of the two DC bus capacitors are shown in Fig. 13. Fig. 13 clearly shows that the NP potential is balanced effectively.

VI. CONCLUSIONS

In this paper, a novel SVPWM algorithm for the three-level neutral-point-clamped inverter has been proposed. Compared with conventional SVPWM, the proposed algorithm is simple to implement and does not need sine and cosine calculations, which makes it more convenient and effective than the conventional SVPWM algorithm. In addition, neutral-point potential balancing can be implemented readily by adjusting the scale factor that splits the operating time between the redundant states of the small vectors. Only the DC-link capacitor voltages and the three-phase load currents are required, which makes the method convenient to apply and well suited to digital implementation. Because the proposed SVPWM is also based on the voltage-second balance principle, its control effect is the same as that of the conventional one. Owing to the low computational overhead of the improved SVPWM, the sampling period can be shortened, which can be used to improve the control performance of three-level inverters.


Hardware Implementation of FTC of Induction Machine on FPGA

S. Boukadida, S. Gdaim, and A. Mtibaa

Abstract—In this paper, a new design method for Direct Torque Control using Space Vector Modulation (DTC-SVM) of an Induction Machine (IM), based on Fault Tolerant Control (FTC), is proposed. Because of its complexity, an FTC implemented on a microcontroller or a Digital Signal Processor (DSP) suffers from a computation delay. To solve this problem, an alternative digital solution based on the Field Programmable Gate Array (FPGA), which offers a fast processing speed, is used. However, as FPGAs increase in size, improved design productivity is needed, which calls for new design flows and tools. Xilinx System Generator (XSG) is a high-level, block-based design tool that offers bit- and cycle-accurate simulation. It can automatically generate VHDL (VHSIC Hardware Description Language) code without tedious manual programming and without approximations, and it allows the behavior of the machine to be visualized before implementation, which is important to avoid damaging the machine. Simulation and experimental results of the FTC-based DTC-SVM, obtained using Hardware In the Loop (HIL), are compared with those of the conventional DTC. The comparison shows a reduction in the torque and stator flux ripples. Our purpose is to demonstrate the efficiency of the algorithm and the performance of the Xilinx Virtex-5 FPGA in terms of execution time.

Index Terms—FPGA, Fault tolerant control, induction motor, parity space, Xilinx System Generator.

Original Research Paper
DOI: 10.7251/ELS1620076B

I. INTRODUCTION

Nowadays, modern controllers for electrical drives must fully satisfy the required performance. Recent studies have shown that digital hardware solutions such as FPGAs are an appropriate alternative to software solutions for meeting these control requirements, and research interest in FPGA implementation has grown considerably over the past few years.

This growth is due to advantages such as reduced execution time through parallel processing and rapid prototyping of digital control.

As the cost of FPGAs decreases, replacing microcontrollers with FPGAs has become a new trend. In recent years, several researchers have used FPGAs to control electrical systems [1-5], most of them developing the algorithm in the VHDL hardware description language. When the prototyping platform uses an FPGA to run the algorithm, a newer simulation tool can be used not only to simulate the hardware exactly but also to automatically generate the VHDL code needed for the implementation. This tool is XSG, developed by Xilinx and integrated into MATLAB/Simulink.

FTC of electrical drives has been a very active research field for many research groups [6-9]. FTC should be able to detect faults and cancel their effects, or attenuate them to an acceptable level. The aim of FTC is to ensure continuous system operation even after faults occur, which increases system availability and reliability. Different types of failures can occur in controlled electrical drives: in the IM, the power converters, the connectors, and the sensors. Failures in the electric motor can have various origins, such as operating conditions that lead to faults and premature degradation, or improper dimensioning and design, which also lead to premature degradation [10]. Current sensors are widely used in the control of electrical drives, and faults in the current measurement chains have been treated for various drives. Research in this field initially focused on the effect of current sensor faults and on the development of Fault Detection and Isolation (FDI) methods [11-14]. In general, FDI methods rely on redundancy, which can be either hardware redundancy or analytical redundancy. The usual approach to fault diagnosis based on hardware redundancy uses a voting technique to decide whether a fault has occurred and to locate it among the redundant system elements. The analytical redundancy approach, by contrast, uses a mathematical model of the monitored system. Because it does not require additional hardware, analytical redundancy is usually more cost-effective than hardware redundancy. However, it is more challenging, because its robustness must be ensured in the presence of model uncertainties, noise, and unknown disturbances. Generally, the analytical redundancy approach can be divided into quantitative model-based methods and
Digital implementations of FTC are mostly carried out on microcontrollers or digital signal processors because of their software flexibility and low cost; the FPGA is a newer digital alternative. In this paper, the authors propose a new, simple current-sensor FTC that allows a three-phase controlled machine drive to keep operating under a current measurement fault. The adopted control method uses three current sensors instead of the usual two. Residuals for FDI are then provided by the analytical redundancy introduced by the third sensor. When a current sensor fault occurs, the faulty sensor is eliminated and the control continues with the two remaining healthy sensors. For the hardware implementation of the FTC of the IM on the FPGA, we use the XSG toolbox developed by Xilinx for Matlab/Simulink. The advantages of XSG are rapid time to market, real-time operation, and portability. Once the design and simulation of the proposed algorithm are complete, the VHDL code can be generated automatically for Xilinx ISE.

This paper is organized as follows. Section II presents the new design methodology for FPGA implementation. Section III introduces the design of the DTC of the IM using SVM. Section IV develops the FTC approach. Sections V and VI present the simulation and implementation results of the proposed architecture. Finally, Section VII concludes the paper.

II. DESIGN METHODOLOGY FOR IMPLEMENTATION ON FPGA

The design in this paper follows a dedicated methodology based on XSG, a modeling tool developed by Xilinx. XSG automatically generates VHDL code without tedious manual programming, eases the implementation of any design, and frees the designer from low-level algorithmic complexity. It uses the Xilinx Integrated Synthesis Environment (ISE) design suite to automate VHDL code generation, and the generated code can be integrated with other designs or used as a stand-alone design. Unlike hand-written VHDL, XSG provides a model-based design interface that uses an extended Simulink library of building blocks to create hardware. The XSG library includes DSP hardware blocks that perform complex functions such as interpolation filters, CORDIC (divider, tan, sin, cos, log), FFT, and FIR filter design. The methodology consists of a set of steps and rules that offer considerable hardware design advantages, as shown in Fig. 1.

System Generator supports hardware co-simulation, which makes it possible to incorporate a design running on an FPGA directly into a Simulink simulation. Hardware co-simulation compilation targets automatically create a bitstream and associate it with a block. HIL, or FPGA in the loop, is a concept that, as the name indicates, places the hardware inside the simulation loop: the stimuli are generated in software on the PC, part of the loop is implemented in hardware, and the response from the hardware is returned to the software. This makes testing easy and shows how the actual plant behaves in hardware.

III. DESIGN OF THE DTC OF IM USING SVM

A. Conventional Direct Torque Control of IM

Fig. 2 shows a typical DTC scheme, in which the IM is the system to be controlled.

The DTC scheme consists of two loops, corresponding to the electromagnetic torque and the stator flux, each with its own hysteresis comparator. The outputs of the stator flux error and torque error hysteresis blocks, together with the position of the stator flux, are used as inputs to the switching table given in Table I. The magnitude of the stator flux $\phi_s$ is given by equation 1, and the developed electromagnetic torque $T_e$ is evaluated by equation 2.

**TABLE I**

<table>
<thead>
<tr>
<th>$E_\phi$ (flux error)</th>
<th>$E_T$ (torque error)</th>
<th>$S_1$</th>
<th>$S_2$</th>
<th>$S_3$</th>
<th>$S_4$</th>
<th>$S_5$</th>
<th>$S_6$</th>
</tr>
</thead>
<tbody>
<tr>
<td rowspan="3">1</td>
<td>1</td>
<td>V2</td>
<td>V3</td>
<td>V4</td>
<td>V5</td>
<td>V6</td>
<td>V1</td>
</tr>
<tr>
<td>0</td>
<td>V7</td>
<td>V0</td>
<td>V7</td>
<td>V0</td>
<td>V7</td>
<td>V0</td>
</tr>
<tr>
<td>-1</td>
<td>V6</td>
<td>V1</td>
<td>V2</td>
<td>V3</td>
<td>V4</td>
<td>V5</td>
</tr>
<tr>
<td rowspan="3">0</td>
<td>1</td>
<td>V3</td>
<td>V4</td>
<td>V5</td>
<td>V6</td>
<td>V1</td>
<td>V2</td>
</tr>
<tr>
<td>0</td>
<td>V0</td>
<td>V7</td>
<td>V0</td>
<td>V7</td>
<td>V0</td>
<td>V7</td>
</tr>
<tr>
<td>-1</td>
<td>V5</td>
<td>V6</td>
<td>V1</td>
<td>V2</td>
<td>V3</td>
<td>V4</td>
</tr>
</tbody>
</table>

The stator flux angle $\theta_s$ is given by equation 3.

$$
\phi_s = \sqrt{\phi_{sa}^2 + \phi_{sb}^2} \tag{1}
$$

$$
T_e = \sqrt{3}\left(\phi_{sa}i_{sb} - \phi_{sb}i_{sa}\right) \tag{2}
$$

$$
\theta_s = \tan^{-1}\left(\frac{\phi_{sb}}{\phi_{sa}}\right) \tag{3}
$$

Where $\phi_{sa}, \phi_{sb}$ and $i_{sa}, i_{sb}$ are the $\alpha$ and $\beta$ components of the stator flux and stator current, respectively, in the stationary $(\alpha, \beta)$ reference frame.
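The estimator of equations 1 to 3 and the switching-table lookup can be sketched in a few lines of Python. This is a floating-point illustration under stated assumptions, not the authors' fixed-point XSG model, and all function names are hypothetical. It keeps the $\sqrt{3}$ coefficient of equation 2 as a parameter `k_t`, pairs the $\alpha$ flux with the $\beta$ current (the usual cross-product form of the torque), uses `atan2` instead of $\tan^{-1}$ so that the angle covers all four quadrants, and assumes the standard Takahashi six-sector table with sector 1 spanning $-30^\circ$ to $+30^\circ$.

```python
import math

# Standard six-sector DTC switching table (Takahashi), assumed here:
# keys are (flux error E_phi, torque error E_T), entries are sectors S1..S6.
SWITCHING_TABLE = {
    (1, 1): ("V2", "V3", "V4", "V5", "V6", "V1"),
    (1, 0): ("V7", "V0", "V7", "V0", "V7", "V0"),
    (1, -1): ("V6", "V1", "V2", "V3", "V4", "V5"),
    (0, 1): ("V3", "V4", "V5", "V6", "V1", "V2"),
    (0, 0): ("V0", "V7", "V0", "V7", "V0", "V7"),
    (0, -1): ("V5", "V6", "V1", "V2", "V3", "V4"),
}


def estimate(phi_a, phi_b, i_a, i_b, k_t=math.sqrt(3)):
    """Flux magnitude, torque and flux angle from (alpha, beta) components."""
    phi_s = math.hypot(phi_a, phi_b)          # equation 1
    t_e = k_t * (phi_a * i_b - phi_b * i_a)   # equation 2, cross-product form
    theta_s = math.atan2(phi_b, phi_a)        # equation 3, four-quadrant arctangent
    return phi_s, t_e, theta_s


def sector(theta_s):
    """Sector number 1..6, sector 1 spanning [-30 deg, +30 deg)."""
    shifted = (theta_s + math.pi / 6) % (2 * math.pi)
    return min(int(shifted // (math.pi / 3)), 5) + 1


def select_vector(e_phi, e_t, theta_s):
    """Voltage vector chosen by the switching table for the given errors and angle."""
    return SWITCHING_TABLE[(e_phi, e_t)][sector(theta_s) - 1]


phi_s, t_e, theta_s = estimate(0.9, 0.0, 0.0, 1.0)
print(phi_s, t_e, theta_s, select_vector(1, 1, theta_s))
```

In hardware the same steps map onto multipliers, a square root, an arctangent (for example a CORDIC block), and a lookup table, which is why the estimator is a natural XSG module.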

B. DTC Drawbacks

DTC is one of the most recently developed control techniques for the IM; it was first proposed by Takahashi. The DTC method is characterized by a fast dynamic response, robustness to rotor parameter variations, and a simple implementation. However, it suffers from several drawbacks, chief among them ripples in the stator flux and electromagnetic torque. These ripples can be transmitted to the drive shaft and damage the mechanical structure. It is well established that these drawbacks are mainly due to the hysteresis torque and flux controllers, and several methods, such as SVM, have been used to overcome them. To address these drawbacks of the conventional DTC, the next section presents and discusses the hardware implementation of DTC using SVM on the FPGA.

C. Direct Torque Control of IM-based SVM

In this section, a DTC-SVM approach is proposed to reduce the torque and flux ripples. Fig. 3 illustrates the proposed DTC-SVM scheme with the IM. DTC is combined with SVM to operate at a fixed switching frequency. The switching table and the two hysteresis controllers are replaced by two PI regulators, a ($d, q$) to ($\alpha, \beta$) transformation block, and an SVM block. The outputs of the PI regulators give the ($d, q$) axis voltages, which are converted into the stator voltage components using equation 4.

$$
\begin{aligned}
V_{sa} &= V_d \cos(\theta_s) - V_q \sin(\theta_s) \\
V_{sb} &= V_d \sin(\theta_s) + V_q \cos(\theta_s)
\end{aligned} \tag{4}
$$

The voltages ($V_d, V_q$) and the stator angle $\theta_s$ are used as reference signals in the SVM approach.
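Equation 4 is an inverse Park rotation. A minimal floating-point sketch (the function name is hypothetical) shows the transformation and also that the rotation preserves the voltage amplitude, which is why $(V_d, V_q)$ together with $\theta_s$ fully specify the reference vector.

```python
import math


def dq_to_alpha_beta(v_d, v_q, theta_s):
    """Equation 4: rotate the (d, q) voltages by theta_s into the stationary frame."""
    c, s = math.cos(theta_s), math.sin(theta_s)
    v_sa = v_d * c - v_q * s
    v_sb = v_d * s + v_q * c
    return v_sa, v_sb


v_sa, v_sb = dq_to_alpha_beta(120.0, 50.0, math.radians(40.0))
# The amplitude is unchanged by the rotation: |(v_sa, v_sb)| == |(v_d, v_q)|.
print(math.hypot(v_sa, v_sb), math.hypot(120.0, 50.0))
```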

The principle of SVM is to project the desired stator voltage vector onto the two adjacent vectors corresponding to two switching states of the inverter. The values of these projections give the required switching times. The coordinates of the voltage vector in the basis formed by the adjacent vectors are calculated using equation 5. To apply the reference vector, the vector $\vec{V}_1$ is applied for a time $T_1$ and the vector $\vec{V}_2$ for a time $T_2$ within the modulation period $T_{mod}$.

$$
\vec{V}_s = V_{sa} + jV_{sb} = \frac{T_1}{T_{mod}}\vec{V}_1 + \frac{T_2}{T_{mod}}\vec{V}_2 \tag{5}
$$
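Equation 5 can be solved directly for $T_1$ and $T_2$ as a 2×2 linear system. The sketch below assumes a two-level inverter whose active vectors have magnitude $2V_{dc}/3$ and lie at multiples of 60°, with sector k spanning $[(k-1)\cdot 60^\circ, k\cdot 60^\circ)$; the paper does not state these conventions (note that this SVM sector numbering differs from the DTC convention), and the function name is hypothetical. The remaining time $T_{mod} - T_1 - T_2$ is assigned to a zero vector, and overmodulation ($T_1 + T_2 > T_{mod}$) is not handled.

```python
import math


def svm_dwell_times(v_sa, v_sb, v_dc, t_mod):
    """Solve equation 5 for the dwell times of the two adjacent active vectors.

    Assumes a two-level inverter: active vectors of magnitude 2*v_dc/3 at
    angles (k-1)*60 deg, and sector k spanning [(k-1)*60 deg, k*60 deg).
    Returns (sector, T1, T2, T0); T0 is the zero-vector time.
    """
    angle = math.atan2(v_sb, v_sa) % (2 * math.pi)
    k = min(int(angle // (math.pi / 3)), 5)        # 0-based sector index
    mag = 2.0 * v_dc / 3.0
    a1, a2 = k * math.pi / 3, (k + 1) * math.pi / 3
    v1x, v1y = mag * math.cos(a1), mag * math.sin(a1)
    v2x, v2y = mag * math.cos(a2), mag * math.sin(a2)
    # Cramer's rule on  T1*V1 + T2*V2 = T_mod*(v_sa, v_sb)
    det = v1x * v2y - v2x * v1y
    t1 = t_mod * (v_sa * v2y - v2x * v_sb) / det
    t2 = t_mod * (v1x * v_sb - v_sa * v1y) / det
    t0 = t_mod - t1 - t2
    return k + 1, t1, t2, t0


# Hypothetical operating point: 600 V DC bus, 100 us modulation period.
print(svm_dwell_times(150.0, 100.0, 600.0, 100e-6))
```

Solving the linear system, rather than using closed-form sine expressions, keeps the sketch a literal reading of equation 5; an FPGA implementation would typically precompute the inverse basis for each sector.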

Fig. 3. Diagram of the DTC-SVM method.
D. Modeling of proposed DTC-SVM scheme using XSG

The XSG tool for the MATLAB/Simulink package is widely used for algorithm development and verification on FPGAs and DSPs. Its main advantage is that it translates the DTC-SVM model directly into a hardware implementation and generates the VHDL code without difficult manual programming. Furthermore, each XSG block has a direct FPGA hardware counterpart, so the implementation time is reduced because the algorithm needs to be modeled and simulated only once. The XSG design of the DTC-SVM is based on a mathematical model, and the logical and arithmetic operations required by the design are organized in a modular and hierarchical fashion. The modules of the proposed design are as follows:

- Torque and flux estimator and angle calculation:

  The XSG design of the torque and flux estimator module and its submodules is shown in Fig. 4. The stator flux magnitude \( \phi_s \) (equation 1), the electromagnetic torque \( T_e \) (equation 2), and the angle \( \theta_s \) (equation 3) are modeled using XSG blocks. The XSG library blocks used in the design include basic elements (multiplexers, delays, registers) and mathematical functions (multiplication, relational, add, negate).

- Two PI regulators

  The XSG design of the (d, q) axis voltage calculation is shown in Fig. 5. The outputs of the regulators provide the (d, q) axis voltages, as shown in Fig. 6.

- Transformation (d, q) to (\( \alpha \), \( \beta \))

  The (d, q) voltages are transformed into (\( \alpha \), \( \beta \)) components using equation 4. The SVM block then generates a series of pulses that are used to produce the control signals of the inverter model, as shown in Fig. 7.

![Fig. 4. Modeling of torque, flux and theta calculating](image1)

![Fig. 5. Calculation of (Vd, Vq)](image2)

![Fig. 6. Design of the components Vd and Vq](image3)
IV. OVERVIEW OF FDI AND RECONFIGURATION TECHNIQUES

In this section, the dynamic redundancy equations based on the Parity Space (PS) approach with the proposed FDI algorithm are first presented. Then the control reconfiguration is presented.

A. Proposed FDI Algorithm

The PS concept uses the information inherent in the mathematical model for FDI. The actual behavior is compared with the behavior expected from the model; deviations indicate faults (or disturbances, noise, or modeling errors). Primary residuals are formed as the difference between the outputs predicted by the model and the actual plant outputs. These residuals are then subjected to a linear transformation to obtain the desired FDI properties.

The aim of this study is to ensure a quasi-instantaneous detection of fault occurrence on the sensor in order to ensure continuous operation.

A brief description of the PS approach is given here:

\[
\frac{dX(t)}{dt} = A\,X(t) + B\,u(t), \qquad A = (a_{j}),\; B = (b_{j}) \tag{9}
\]

\[
y(t) = C\,X(t) + f(t) \tag{10}
\]

Where \(X(t)\) denotes the state vector, \(u(t)\) the control input vector, and \(y(t)\) the measured output vector. All sensor faults are grouped in the term \(f(t)\), which is generally unknown. \(A\), \(B\), and \(C\) are known matrices that depend on the coefficients \(a_j\) and \(b_j\).

In this study, the measurement equation of each sensor is presented as:

\[
y_{i}(t) = C_{i}\,X(t) + f_{i}(t), \qquad C_{i} = (0 \;\cdots\; 1 \;\cdots\; 0) \tag{11}
\]

The discrete form of (9) and (10) is given by (12), where \(T_a\) is the sampling period of the measurement:

\[
\begin{aligned}
X(k+1) &= A_{d}\,X(k) + B_{d}\,u(k) \\
y_{i}(k) &= C_{d}\,X(k) + f_{i}(k)
\end{aligned} \tag{12}
\]

Where:

\[
A_{d} = e^{A T_{a}}, \qquad B_{d} = B\,A_{d}\,T_{a}, \qquad C_{d} = C_{i} \tag{13}
\]
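For a single current channel modeled as a first-order system (an assumption), the state matrix reduces to the scalar pole $-1/\alpha$, where $\alpha$ is the current time constant, so $A_d = e^{-T_a/\alpha}$; its first-order expansion $1 - T_a/\alpha$ is the value of $A$ used in equation 18. The sketch compares the two for the 5 µs and 50 µs sampling periods mentioned in the text. The 1 ms time constant and the function names are hypothetical.

```python
import math


def exact_pole(alpha, t_a):
    """Discrete pole A_d = exp(A*T_a) of the scalar model dx/dt = -x/alpha + b*u."""
    return math.exp(-t_a / alpha)


def first_order_pole(alpha, t_a):
    """First-order expansion 1 - T_a/alpha (the A used in equation 18)."""
    return 1.0 - t_a / alpha


ALPHA = 1e-3  # hypothetical current time constant (1 ms)
for t_a in (5e-6, 50e-6):  # sampling periods quoted in the text
    exact, approx = exact_pole(ALPHA, t_a), first_order_pole(ALPHA, t_a)
    print(f"T_a = {t_a * 1e6:.0f} us: exact {exact:.6f}, "
          f"first-order {approx:.6f}, difference {exact - approx:.2e}")
```

The gap grows roughly as $(T_a/\alpha)^2/2$, so the approximation is tightest for the short sampling periods typical of high-performance drives.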

Using temporal redundancy, that is, stacking \(s + 1\) successive samples, system (12)-(13) is expressed as:

\[
\underbrace{\begin{bmatrix}
y_{i}(k) \\
y_{i}(k+1) \\
\vdots \\
y_{i}(k+s)
\end{bmatrix}}_{Y_{i}(k,s)}
=
\underbrace{\begin{bmatrix}
C_{i} \\
C_{i}A \\
\vdots \\
C_{i}A^{s}
\end{bmatrix}}_{H_{i}(s)}
X(k)
+ G_{i}(s)
\underbrace{\begin{bmatrix}
u(k) \\
u(k+1) \\
\vdots \\
u(k+s)
\end{bmatrix}}_{U(k,s)}
+ F_{i}(k,s) \tag{14}
\]

The residual generation using PS is defined by:

\[
\begin{aligned}
r_{i}(k) &= V_{i}\left(Y_{i}(k,s) - G_{i}(s)U(k,s)\right) \\
&= V_{i}\left(H_{i}(s)X(k) + F_{i}(k,s)\right) \\
&= V_{i}F_{i}(k,s)
\end{aligned} \tag{15}
\]

Where \(H_{i}(s)\) is the observability matrix of order \(s\), \(Y_{i}\) and \(U\) are built from the temporal redundancy of the known outputs and inputs, \(G_{i}(s)\) is the control matrix, \(F_{i}\) is the fault matrix, and \(V_{i}\) is a projection (parity) vector chosen to eliminate the unknown state \(X(k)\), which gives the last equality in (15). It is derived from the relation:

\[
V_{i}.H_{i} = 0 \tag{16}
\]

The parity vector \(V_{i}\) has dimension \(1 \times 3\) (i.e., \(s = 2\)) and is defined by (17). Equation (16) has many solutions, which leaves some freedom in the choice of the parameters of \(V_{i}\). One solution is given by (18).

\[
V_{i} = \begin{bmatrix} V_{1} & V_{2} & V_{3} \end{bmatrix} \tag{17}
\]

\[
V_{i} = \begin{bmatrix} A^{2} & -2A & 1 \end{bmatrix}, \qquad A = 1 - \frac{T_{a}}{\alpha} \tag{18}
\]

Where \( \alpha \) is the current time constant.
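The parity relation (16) and the choice (18) can be checked numerically in the scalar case with $s = 2$, where $H_i = [1, A, A^2]^T$ (taking $C_i = 1$, an assumption for a single current measurement). The sketch, with hypothetical names and a hypothetical 1 ms time constant, verifies $V_i H_i = 0$, shows that a fault-free free response $y(k) = A^k y_0$ gives a zero residual, and shows the limit of $V_i$ as $A \to 1$.

```python
def parity_vector(a):
    """Equation 18: V = [A^2, -2A, 1] for the scalar discrete pole A."""
    return (a * a, -2.0 * a, 1.0)


def observability(a):
    """H for s = 2 with C_i = 1: the column [1, A, A^2]."""
    return (1.0, a, a * a)


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


# Hypothetical current time constant (1 ms) and a 5 us sampling period.
alpha, t_a = 1e-3, 5e-6
a = 1.0 - t_a / alpha                      # A as defined in equation 18

v = parity_vector(a)
print(dot(v, observability(a)))            # V_i H_i, zero up to rounding (equation 16)

# Fault-free free response y(k) = A^k * y0: the residual vanishes.
y0 = 4.2
window = (y0, a * y0, a * a * y0)
print(dot(v, window))

# As T_a / alpha -> 0, A -> 1 and the parity vector tends to [1, -2, 1].
print(parity_vector(1.0))
```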

In electrical drives, the sampling time \(T_a\) can be very small (3 to 5 µs are typical values for high-performance systems), while the current time constant is generally larger than 100 µs. Since \(T_a \ll \alpha\), the approximation \(A \approx 1\) can be applied, and the projection vector \(V_i\) and the residual \(r\) simplify to:
\[ V_i = \begin{bmatrix} 1 & -2 & 1 \end{bmatrix} \]  
\[ r = y_0 - 2y_1 + y_2 \]

To make residual variations when a fault occurs more significant and easier to detect, the absolute value is considered:

\[ r = |y_0 - 2y_1 + y_2| \]

The residual depends only on the sensor outputs; therefore, it can be applied to any system with fast data acquisition, independently of the complexity of the system model.

Unlike most FDI methods in the literature, the proposed approach needs no separate decision algorithm: the residual isolates the faulty sensor directly and does not depend on any system parameters.
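The residual test can be sketched on synthetic data. The sketch assumes a 5 A, 50 Hz sinusoidal phase current sampled every 50 µs (the electrical frequency is hypothetical), adds a 2.5 A offset from a chosen sample onward, and flags the first sample whose absolute second-difference residual exceeds the 0.1 A threshold quoted in Section V. Function names are hypothetical. Because the residual of a step offset returns to a small value a couple of samples after the fault, a practical fault indicator should latch once set; here the first detection index plays that role.

```python
import math


def residuals(samples):
    """Absolute second difference r(k) = |y(k-2) - 2*y(k-1) + y(k)| for k >= 2."""
    return [abs(samples[k - 2] - 2.0 * samples[k - 1] + samples[k])
            for k in range(2, len(samples))]


def first_detection(samples, threshold):
    """Index of the first sample whose residual exceeds the threshold, or None."""
    for k, r in enumerate(residuals(samples), start=2):
        if r > threshold:
            return k
    return None


# Hypothetical test signal: 5 A, 50 Hz phase current sampled every 50 us,
# with a 2.5 A offset fault added from sample K_FAULT onward.
T_A, FREQ, I_MAX, OFFSET, K_FAULT, EPS = 50e-6, 50.0, 5.0, 2.5, 300, 0.1
healthy = [I_MAX * math.sin(2 * math.pi * FREQ * k * T_A) for k in range(600)]
faulty = [y + (OFFSET if k >= K_FAULT else 0.0) for k, y in enumerate(healthy)]

print(max(residuals(healthy)))         # stays far below the 0.1 A threshold
print(first_detection(faulty, EPS))    # index of the first flagged sample
```

For a smooth signal the second difference scales with $(\omega T_a)^2$, which is why fast sampling keeps the healthy residual small and makes a fixed threshold workable.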

**B. Proposed FTC Algorithm**

To obtain high-performance motor drives, a modern control strategy such as DTC-SVM is employed. This technique inherently depends on measurement sensors operating properly. When these sensors fail, the control system must compensate for the failure in order to keep functioning, which requires backup mechanisms that maintain the proper operation of the drive in case of sensor failure.

The adopted control method uses three current sensors instead of the usual two. Residuals for FDI are then provided by the analytical redundancy introduced by the third sensor. When a current sensor fault occurs, the faulty sensor is eliminated and the control continues with the two remaining healthy sensors.

Assuming that the controlled induction motor drive has an isolated neutral \((I_a + I_b + I_c = 0)\), the reconfigured currents \((I_{ar}, I_{br} \text{ and } I_{cr})\) can be computed in different ways, as presented in TABLE II.

**V. XSG Simulation Results of Proposed Scheme**

The main objective of this paper is to design an easy and simple FTC based on PS and to implement it on a Xilinx Virtex-5 XC5VFX50T-1FFG1136 (ML506) FPGA. The first step is to build the control algorithm from XSG blocks. The model is then connected to the simulated power system through the Gateway In and Gateway Out blocks, as shown in Fig. 8. Once the system design is complete and gives the desired simulation results, the VHDL code is generated by the XSG tool. After VHDL generation and synthesis, the bitstream file is generated and used to program the FPGA.


**TABLE II**

<table>
<thead>
<tr>
<th>Faulty current</th>
<th>$I_{ar}$</th>
<th>$I_{br}$</th>
<th>$I_{cr}$</th>
</tr>
</thead>
<tbody>
<tr>
<td>No fault</td>
<td>$I_a$</td>
<td>$I_b$</td>
<td>$I_c$</td>
</tr>
<tr>
<td>$I_a$</td>
<td>$-(I_b + I_c)$</td>
<td>$I_b$</td>
<td>$I_c$</td>
</tr>
<tr>
<td>$I_b$</td>
<td>$I_a$</td>
<td>$-(I_a + I_c)$</td>
<td>$I_c$</td>
</tr>
<tr>
<td>$I_c$</td>
<td>$I_a$</td>
<td>$I_b$</td>
<td>$-(I_a + I_b)$</td>
</tr>
</tbody>
</table>
The control algorithm must be functionally validated before implementation. Hardware co-simulation results are presented to assess the ability of this diagnostic approach to detect, isolate, and reconfigure sensor faults in an IM. To respect the technical constraints of the power inverter, the sampling period is 50 μs. The machine runs at 300 rad/s, and the flux and torque references are 0.91 Wb and 10 N·m, respectively.

Fig. 9 (a and b) shows that, compared with the conventional DTC, the stator flux ripples are reduced.

The simulation results in Fig. 10 (a and b) show that the steady-state torque ripple with DTC-SVM is significantly reduced compared with the conventional DTC.

All the simulations show significantly better behavior, achieving the main objective of this work: reducing the torque ripple while keeping a torque response as good as that of the conventional DTC. The performance of the control system is thus improved.

Fig. 11 shows the evolution of the residual when a current sensor fault occurs in Ia at t = 1.5 s. The measured current changes discontinuously, and the FDI algorithm effectively detects this change as a sensor fault. To obtain clear curves, a relatively large fault magnitude is chosen: a 2.5 A offset error, applied at t = 1.5 s with Imax = 5 A. For an offset fault, the maximum value of the residual is R + 2d ≈ 0.1 + 2 × 2.5 ≈ 5. When a fault occurs on one AC current sensor, Ia, Ib, or Ic, the corresponding residual Ra, Rb, or Rc exceeds the defined threshold ε, which in this case is set to 0.1 A.

Fig. 12 shows the measured current with the faulty sensor, the reconfigured current, and the fault indicator Df. After a fault occurs, the FTC reacts very rapidly to prevent undesirable events such as cascaded failures. As illustrated in Fig. 12, both the fault detection time and the execution time of the backup strategy are very short, which demonstrates the quality and speed of the transition between the healthy and faulty operating modes.

VI. SYNTHESIS RESULTS OF PROPOSED SCHEME

The whole design is developed in the XSG environment for the Xilinx Virtex-5 XC5VFX50T-1FFG1136 FPGA. The proposed architectures are first built from the XSG blocks available in the Simulink library; once the design gives the desired simulation results, XSG generates the VHDL code, which is synthesized to produce the bitstream file used to program the FPGA.

The design is validated by hardware co-simulation, also called hardware in the loop (HIL) or FPGA in the loop: System Generator automatically creates a bitstream, associates it with a Simulink block, and runs the FPGA inside the simulation loop, with the stimuli generated in software on the PC and the hardware response returned to the software. This makes it easy to see how the actual plant behaves in hardware.

Synthesis is the process of transforming one representation in the design abstraction hierarchy into another. In this step, the VHDL code is synthesized into a gate-level view of the FTC architecture. The resource utilization of the FTC implementation on the FPGA is given in Table III, which reports the number of input/output blocks, slice registers, slice LUTs, and DSP blocks. The performance of the FPGA-based hardware solution in terms of execution time is given in Table IV.
Fig. 13 compares the performance of the FPGA-based hardware solution with that of software solutions: a dSPACE digital signal processor board and a microcontroller.

Where $T_{ADC}$ is the analog-to-digital conversion time and $T_{exe}$ is the execution time.

The total execution time of the control algorithm is:

\[
T_{\text{total}} = T_{ADC} + T_{exe}
\]
The work in [21] experimentally validates a reconfiguration strategy for sensor fault-tolerant control of the IM; the active FTC proposed there is implemented on a dSPACE 1103. In [22], the execution time is 300 µs using the dSPACE 1102. With dSPACE, the sampling time is limited to 100 µs because of its sequential processing [23-26]. In this paper, the execution time of the IM control algorithm on the FPGA is 1 to 2 µs, far lower than that of the software solutions.

VII. CONCLUSION

In this paper, we propose a new FTC algorithm that reconfigures the system under certain faulty sensor conditions, with the purpose of reducing safety hazards in case of sensor faults. The design is derived from a PS approach and is based on temporal redundancy. The developed FTC applies to sensors measuring gradually varying quantities; sudden faults are detected, even those of low magnitude. Simulation results show the performance of the proposed control strategy in terms of reduced electromagnetic torque and stator flux ripples. The implementation results show the benefits of the hardware implementation: reduced design time, low execution time, and minimal resource utilization.

REFERENCES


Simulation and performance analysis of Multiple PCS sensors system

Pawan Whig, Syed Naseem Ahmad and Surinder Kumar

Abstract—In this paper, a novel circuit is presented that overcomes a serious limitation of multiple-sensor systems. The design uses only one reference electrode and a few active components, which allows a low-cost system for water quality supervision to be implemented. The Photo Catalytic Sensor (PCS) estimates BOD (Biological Oxygen Demand), a parameter commonly used to assess water quality. The proposed system uses a balanced bridge approach with few electronic components that provides a correlation between the input and output signals of low-cost sensors. A readout circuit is needed because the fluctuation of $O_2$ influences the threshold voltage, an internal parameter of the FET, which can appear as a voltage signal at the output only as a function of the transconductance gain. Since the transconductance is a passive parameter, the sensor must be attached to a readout circuit to derive a voltage or current signal from its fluctuations. The circuit provides high sensitivity to changes in the percentage of $O_2$ in the solution.

Index Terms— Biological Oxygen Demand, Multiple Sensor System, Photo Catalytic Sensor, Environment.

Original Research Paper
DOI: 10.7251/ELS1620085W

I. INTRODUCTION

Water is an elixir of life, and with the development of industries and tanneries, water bodies are becoming polluted. Industrial wastes are discharged into water and degrade its quality. Health issues are a major concern because of the increasing quantities of pollutants in water systems. Pollution reduces the oxygen content of water, which disturbs the balance of aquatic life, so there is a strong need to monitor water quality and prevent further pollution. Chemical Oxygen Demand (COD) is used to determine the amount of organic pollutants in water [1]. The flow injection analysis (FIA) method proposed by Kim determines COD using a photochemical column; however, this approach is very time consuming, requires a complex setup [2], and takes a considerable amount of time to produce results. To address this problem, Whig and Ahmad introduced a SPICE model for PCS that is more user friendly and has a shorter response time [3]. The previously designed circuit is not suitable for multiple-input PCS, mainly because it needs as many reference electrodes as there are input sensors; in other words, it is not suitable for a multiple-input sensor network [4]. To deal with this problem, a new circuit is implemented that supports multiple inputs using only a single reference electrode and a few active components. Its major advantage is that, with a few active components and a grounded reference electrode, it avoids the need for multiple reference electrodes in an array of sensors.

II. PHOTO CATALYSIS PROCESS

Photo catalysis is an efficient method for degrading organic compounds, and the literature describes the mechanisms and equations involved in the process in detail [5-7]. A semiconductor material has two bands, the valence band and the conduction band, separated by an energy gap known as the band gap $E_g$. When light with energy higher than the band gap falls on the semiconductor, electrons from the valence band jump to the conduction band, which may be empty, leaving holes behind in the valence band. On reaching the surface, these holes react with water to give OH radicals that oxidize the organic pollutants. Dissolved molecular oxygen acts as a scavenger of the photogenerated electrons and forms a peroxide radical ion. Titanium dioxide ($TiO_2$) can cause the photo-oxidative destruction of organic pollutants and is non-corrosive, which is why it is used as the catalyst in the process [8-10]. The oxygen content of a given sample can be determined by observing the change in dissolved oxygen concentration during photo catalysis.

In the photo catalysis process, a floating gate electrode is used. Sunlight or UV radiation falls on $TiO_2$, which acts as a catalyst to speed up the photo catalysis process. The PCS senses the changes in oxygen concentration, and its voltage levels change accordingly. The complete photo catalysis process is shown in Figure 1.

Figure 1 Photo catalysis process

III. PHOTO CATALYSIS SENSOR

The SPICE model for PCS is given in [3]. It is essentially a MOSFET with a structural difference: the gate terminal is placed in the solution, and diffusion and quantum capacitances are added to account for the effect of the Helmholtz and diffusion layers [11-13]. The cross section of the PCS is shown in Figure 2.

Figure 2 Cross-section of PCS

The threshold voltage equation for the PCS model is given as:

\[ V_{th(PCS)} = E_{Ref} - \Psi_{sol} - \chi_{sol} - \frac{\Phi_s}{q} - \frac{Q_{ox} + Q_{a} + Q_{B}}{C_{ox}} + 2\Phi_f \tag{1} \]

\( \Psi_{sol} \), the input parameter of the equation, depends on the concentration of \( O_2 \) in the solution, and \( \chi_{sol} \) is the surface dipole potential. \( E_{Ref} \) is the constant reference electrode potential. Different V-I curves of the PCS can be plotted for different \( O_2 \) concentrations: since \( \Psi_{sol} \) is a function of \( O_2 \), the saturation cut-off current \( I_0 \) increases as the oxygen concentration decreases. The circuit for the PCS given in [14] is shown in Figure 3.

Figure 3 Circuit for PCS

Here \( C_M \) is the equivalent capacitance of the oxide capacitance \( C_{ox} \) and the quantum capacitance \( C_q \) connected in series, given as:

\[ \frac{1}{C_M} = \frac{1}{C_q} + \frac{1}{C_{ox}} \] (2)
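Equation (2) is the ordinary series-capacitor rule. The minimal sketch below evaluates it; the per-unit-area values are hypothetical placeholders chosen only to show that $C_M$ is always smaller than the smaller of the two capacitances.

```python
def series_capacitance(c_q: float, c_ox: float) -> float:
    """Equivalent capacitance C_M of C_q and C_ox in series: 1/C_M = 1/C_q + 1/C_ox."""
    if c_q <= 0 or c_ox <= 0:
        raise ValueError("capacitances must be positive")
    return 1.0 / (1.0 / c_q + 1.0 / c_ox)


# Hypothetical per-unit-area values in F/m^2 (illustrative only, not from the paper)
C_OX = 8.6e-3
C_Q = 2.0e-2
print(f"C_M = {series_capacitance(C_Q, C_OX):.3e} F/m^2")
```

When $C_q \gg C_{ox}$ the result approaches $C_{ox}$, so the quantum capacitance matters mainly when it is comparable to the oxide capacitance.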

The drain current equation in non-saturation mode for PCS is given as:

\[ I_{ds} = C_{ox} \mu \frac{W}{L} \left[ (V_{gs} - V_t) V_{ds} - \frac{1}{2}V_{ds}^2 \right] \]

Where

\( C_{ox} = \) Oxide capacitance per unit area,
\( \mu = \) Mobility of electrons in the channel,
\( W = \) Channel width
\( L = \) Length of the Channel

Process parameters, including the channel length and channel width, are chosen according to the 120 nm CMOS process model.
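A minimal sketch of the non-saturation drain-current equation follows. The device parameters are hypothetical values loosely in the range of a 120 nm process, not the authors' model card, and $V_t$ stands in for the oxygen-dependent threshold of Eq. (1).

```python
def drain_current_linear(v_gs, v_ds, v_t, c_ox, mu, width, length):
    """Drain current (A) in the non-saturation region.

    I_ds = C_ox * mu * (W / L) * [(V_gs - V_t) * V_ds - V_ds**2 / 2]
    Valid for V_gs > V_t and 0 <= V_ds <= V_gs - V_t.
    """
    v_ov = v_gs - v_t  # overdrive voltage
    if v_ov <= 0:
        return 0.0  # treated as off; subthreshold conduction is ignored
    if not 0 <= v_ds <= v_ov:
        raise ValueError("V_ds is outside the non-saturation region")
    return c_ox * mu * (width / length) * (v_ov * v_ds - 0.5 * v_ds ** 2)


# Hypothetical parameters (illustrative only, not from the paper)
C_OX = 8.6e-3               # F/m^2
MU_N = 0.04                 # m^2/(V*s)
W, L = 1.2e-6, 0.12e-6      # m

# Sweep V_ds at fixed V_gs = 1.0 V and an assumed V_t = 0.4 V
for v_ds in (0.05, 0.1, 0.2, 0.3):
    i_ds = drain_current_linear(1.0, v_ds, 0.4, C_OX, MU_N, W, L)
    print(f"V_ds = {v_ds:.2f} V -> I_ds = {i_ds * 1e6:.1f} uA")
```

Raising $V_t$ (as a change in $\Psi_{sol}$ would) lowers $I_{ds}$ at the same bias, which is how the model turns a chemical change into an electrical one.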

According to the MOSFET characteristics, with the gate-to-source voltage \( V_{gs} \) (referred to as the reference voltage) held constant, the drain current varies with the drain-to-source voltage. Comparing the PCS with a MOSFET at an \( O_2 \) concentration of \( 1\ \text{mg/l} \), the PCS curve is found to resemble the characteristic Vds/Ids curve of a MOSFET at constant \( V_{gs} \). With the reference voltage held at \( V_{gs}=0 \), different Vds/Ids curves are obtained for different \( O_2 \) concentration levels, as shown in Figure 4. These curves show that the saturation cut-off current \( I_{ds} \) increases as the oxygen concentration decreases. Hence the PCS can be treated as a MOSFET whose chemical input parameter \( \Psi_{sol} \) is a function of the \( O_2 \) concentration (\( \Psi_{sol} = f(Oxygen) \)).

IV. DEVICE DESCRIPTION AND ANALYSIS

The PCS generates a potential proportional to the activity of the detected oxygen ions, measured against the reference electrode. The potentiometric method used previously had a serious limitation: a network of multiple sensors needs multiple reference electrodes [4].
A circuit that measures the change in dissolved oxygen concentration, through the corresponding shift in device threshold voltage, in a multiple-sensor network using a single reference electrode and a few passive components is shown in Figure 5.

The sensing readout circuit detects the ion concentration of the solution using a Constant Voltage Constant Current (CVCC) operation mode and a floating reference electrode, which makes the design simple and robust. In this configuration, a Zener diode stabilizes the bridge network against variations in the supply. The bridge network in turn maintains a constant potential difference across the inverting and non-inverting terminals of the op-amp. The PCS is connected in one arm of the balanced bridge, as shown in Figure 6, so any change at the gate terminal of the PCS is detected at the output of the op-amp. To keep the PCS operating in the linear region, the variation in gate-to-source voltage produced by the shift in PCS threshold voltage should be directly proportional to the variation in dissolved oxygen. The potential difference between the gate sensing membrane and the reference electrode is determined by the ion concentration of the solution. The readout circuit is intended to be implemented as an integrated circuit, and the measured signal is the amplifier output.
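The CVCC idea can be sketched by inverting the non-saturation drain-current equation: if the loop holds $I_{ds}$ and $V_{ds}$ fixed, $V_{gs} - V_t$ must stay constant, so any threshold shift reappears one-for-one in $V_{gs}$. This is a minimal sketch under that assumption; the bias values and the helper name are hypothetical, and the Zener and bridge details of Figure 6 are not modelled.

```python
def gate_source_for_cvcc(i_ds, v_ds, v_t, k):
    """V_gs that keeps I_ds and V_ds constant in the non-saturation region.

    Rearranging I_ds = k * [(V_gs - V_t) * V_ds - V_ds**2 / 2], k = C_ox * mu * W / L,
    gives V_gs = V_t + I_ds / (k * V_ds) + V_ds / 2.
    """
    if v_ds <= 0 or k <= 0:
        raise ValueError("v_ds and k must be positive")
    v_gs = v_t + i_ds / (k * v_ds) + v_ds / 2.0
    if v_ds > v_gs - v_t:
        raise ValueError("operating point would be in saturation")
    return v_gs


# Hypothetical constant-current / constant-voltage bias (not from the paper)
K = 3.44e-3      # A/V^2
I_DS = 100e-6    # A
V_DS = 0.2       # V

for v_t in (0.40, 0.45, 0.50):  # threshold shifts standing in for O2 changes
    print(f"V_t = {v_t:.2f} V -> V_gs = {gate_source_for_cvcc(I_DS, V_DS, v_t, K):.4f} V")
```

Each 50 mV threshold step appears as an identical 50 mV step in $V_{gs}$, which is why a CVCC readout can report the threshold shift directly.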

V. SIMULATION

The simulation analyses carried out on the multiple-sensor network device using a single reference electrode are described as follows.

A. Transient analysis

The transient analysis of the multiple-sensor network device using a single reference electrode was performed in National Instruments Multisim version 11, and the result is shown in Figure 7. The response is highly linear, indicating that the device is stable. Fitting a linear trend line between $V_{out}$ and $V_{in}$ gives a coefficient of determination $R^2$ of $99.7\%$ with a standard error of 0.02. The coefficient of determination is useful because it gives the proportion of the variance (fluctuation) of one variable that is predictable from the other, and hence indicates how much confidence can be placed in predictions made from the model. In other words, $R^2$ measures how well the regression line represents the data: if the regression line passed exactly through every point on the scatter plot, it would explain all of the variation and $R^2$ would equal 1.
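The regression quantities used here and in Table 3 can be computed with a plain least-squares fit. The sketch below returns slope, intercept, $R^2$, Multiple R and the standard error of the estimate, $\sqrt{SS_{res}/(n-2)}$; the sample data are hypothetical, since the paper's raw $V_{in}$/$V_{out}$ points are not given.

```python
import math


def linear_fit_stats(x, y):
    """Ordinary least-squares line y = a + b*x with goodness-of-fit statistics."""
    n = len(x)
    if n != len(y) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 points")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    if ss_tot == 0:
        raise ValueError("y has no variance")
    r_squared = 1.0 - ss_res / ss_tot
    multiple_r = math.sqrt(r_squared)       # equals |Pearson r| for one predictor
    std_error = math.sqrt(ss_res / (n - 2))  # standard error of the estimate
    return slope, intercept, r_squared, multiple_r, std_error


# Hypothetical V_in / V_out samples (illustrative only)
v_in = [0.1, 0.2, 0.3, 0.4, 0.5]
v_out = [0.21, 0.39, 0.62, 0.80, 1.01]
print(linear_fit_stats(v_in, v_out))
```

For a single predictor, Multiple R is simply $\sqrt{R^2}$, which is why Table 3's pairs agree ($0.983^2 \approx 0.966$ and $0.958^2 \approx 0.918$).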
component and propagates it to the output of the circuit, sweeping through the frequency range specified in the analysis dialog box. Noise analysis calculates the noise contribution from each resistor and semiconductor device at the specified output node; each resistor and semiconductor device is treated as a noise generator. Each noise generator’s contribution is calculated and propagated to the output of the circuit by the appropriate transfer function. The total output noise (onoise) at the output node is the RMS (root-sum-square) combination of the individual noise contributions. This result is then divided by the gain from the input source to the output to obtain the equivalent input noise (inoise), which is the amount of noise that, if injected at the input source of a noiseless circuit, would produce the previously calculated noise at the output. The total output noise voltage can be referenced to ground or to another node in the circuit; in the latter case, the total output noise is taken across the two nodes. The onoise and inoise of the device are shown in Table 1.
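The combination rule just described can be sketched in a few lines, assuming the individual contributions are uncorrelated RMS voltages already integrated over the analysis bandwidth. The contribution values and function names are hypothetical; this is not Multisim's internal implementation.

```python
import math


def total_output_noise(contributions_v):
    """Root-sum-square total of uncorrelated RMS noise voltages (onoise)."""
    return math.sqrt(sum(v * v for v in contributions_v))


def input_referred_noise(onoise_v, gain):
    """Equivalent input noise (inoise): output noise divided by the voltage gain."""
    if gain == 0:
        raise ValueError("gain must be non-zero")
    return onoise_v / abs(gain)


# Hypothetical per-device RMS contributions in volts (illustrative only)
contributions = [2.0e-6, 1.5e-6, 0.5e-6]
onoise = total_output_noise(contributions)
print(f"onoise = {onoise:.3e} V, inoise (gain 10) = {input_referred_noise(onoise, 10.0):.3e} V")
```

Because contributions add in quadrature, the largest source dominates the total; halving a minor source barely changes onoise.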

**TABLE 1**

<table>
<thead>
<tr>
<th>Potentiometric Circuit</th>
<th>Integrated Noise Analysis</th>
</tr>
</thead>
<tbody>
<tr>
<td>Onoise_total</td>
<td>0.00876p</td>
</tr>
<tr>
<td>Inoise_total</td>
<td>0.00880p</td>
</tr>
</tbody>
</table>

The noise figure is used to specify the extent of noise in a device. For a transistor, the noise figure is simply a measure of how much noise the transistor adds to the signal during amplification. In a circuit network, the noise figure is used as a figure of merit to compare the noise in the network with that of an ideal, noiseless network; it measures the degradation of the signal-to-noise ratio (SNR) between the input and output ports. Calculating the noise figure requires the noise factor F, the linear (numerical) ratio of input SNR to output SNR; the noise figure is the same quantity expressed in dB, NF = 10 log10 F. Thus,

\[ F = \frac{\text{InputSNR}}{\text{OutputSNR}} \]

The noise figure of the device is found to be 0.0399 dB.
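Converting between noise factor and noise figure uses the standard relation $NF = 10\log_{10} F$. The sketch below applies it to the reported 0.0399 dB; the function names are hypothetical, and the SNR inputs are taken as linear power ratios.

```python
import math


def noise_factor(input_snr: float, output_snr: float) -> float:
    """Noise factor F as a linear ratio of input SNR to output SNR (power ratios)."""
    return input_snr / output_snr


def noise_figure_db(f: float) -> float:
    """Noise figure NF = 10 * log10(F), in dB."""
    return 10.0 * math.log10(f)


def factor_from_figure_db(nf_db: float) -> float:
    """Inverse conversion: F = 10 ** (NF / 10)."""
    return 10.0 ** (nf_db / 10.0)


# The reported 0.0399 dB corresponds to a noise factor only slightly above 1
print(f"F = {factor_from_figure_db(0.0399):.5f}")
print(f"NF for F = 2: {noise_figure_db(2.0):.2f} dB")
```

An NF of 0.0399 dB corresponds to F of roughly 1.009, meaning the network degrades the SNR by under 1% in linear terms.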

**C. Fourier analysis**

Fourier analysis is a method of analysing complex periodic waveforms. It permits any non-sinusoidal periodic function to be resolved into sine or cosine waves and a DC component, which allows further analysis of the effect of combining the waveform with other signals. Each frequency component of the response is produced by the corresponding harmonic of the periodic waveform, and each term is treated as a separate source. By the principle of superposition, the total response is the sum of the responses produced by the individual terms. The amplitude of the harmonics is observed to decrease, in general, as the harmonic order increases, indicating that comparatively few terms yield a good approximation. The Fourier spectrum of the device is shown in Figure 8, and the harmonic magnitudes and phases are listed in Table 2.
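To illustrate harmonic decomposition, the sketch below evaluates a single-bin DFT over one period of a sampled waveform and applies it to an ideal square wave, whose odd harmonics have amplitude $4/(\pi k)$ and whose even harmonics vanish. The square wave is an assumed test signal, not the device's output, and this is not claimed to be how Multisim computes its Fourier analysis.

```python
import cmath
import math


def harmonic_magnitude(samples, k):
    """Peak amplitude of harmonic k (k >= 1) from one period of evenly spaced samples."""
    if k < 1:
        raise ValueError("k must be >= 1; the DC term is just the mean")
    n = len(samples)
    acc = sum(x * cmath.exp(-2j * math.pi * k * i / n) for i, x in enumerate(samples))
    return 2.0 * abs(acc) / n


def square_wave(n):
    """One period of a +/-1 square wave sampled at n points (assumed test signal)."""
    return [1.0 if i < n // 2 else -1.0 for i in range(n)]


wave = square_wave(1000)
for k in range(1, 6):
    print(f"harmonic {k}: {harmonic_magnitude(wave, k):.4f}")
```

The odd-harmonic amplitudes fall off as $1/k$, which is the behaviour the text relies on when it says a few terms give a good approximation.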

**TABLE 2**

<table>
<thead>
<tr>
<th>Harmonics</th>
<th>Frequency(Hz)</th>
<th>Magnitude</th>
<th>Phase</th>
</tr>
</thead>
<tbody>
<tr>
<td>1</td>
<td>1000</td>
<td>0.0855</td>
<td>-92.54</td>
</tr>
<tr>
<td>2</td>
<td>2000</td>
<td>0.0876</td>
<td>-94.58</td>
</tr>
<tr>
<td>3</td>
<td>3000</td>
<td>0.0851</td>
<td>-95.94</td>
</tr>
<tr>
<td>4</td>
<td>4000</td>
<td>0.0847</td>
<td>-97.88</td>
</tr>
<tr>
<td>5</td>
<td>5000</td>
<td>0.0842</td>
<td>-99.87</td>
</tr>
<tr>
<td>6</td>
<td>6000</td>
<td>0.11</td>
<td>-102.89</td>
</tr>
<tr>
<td>7</td>
<td>7000</td>
<td>0.0831</td>
<td>-103.86</td>
</tr>
<tr>
<td>8</td>
<td>8000</td>
<td>0.0825</td>
<td>-105.93</td>
</tr>
<tr>
<td>9</td>
<td>9000</td>
<td>0.081</td>
<td>-108.03</td>
</tr>
</tbody>
</table>

A comparison between the SPICE model and the readings from FIA analysis is shown in Table 3.

**TABLE 3**

<table>
<thead>
<tr>
<th>Parameters</th>
<th>Results obtained from Spice Model</th>
<th>Results obtained from FIA analysis</th>
</tr>
</thead>
<tbody>
<tr>
<td>Multiple R</td>
<td>0.983</td>
<td>0.958</td>
</tr>
<tr>
<td>R²</td>
<td>0.966</td>
<td>0.918</td>
</tr>
<tr>
<td>Standard Error</td>
<td>0.026</td>
<td>0.040</td>
</tr>
<tr>
<td>Complexities</td>
<td>Less complex</td>
<td>More complex</td>
</tr>
<tr>
<td>Cost</td>
<td>Inexpensive</td>
<td>Expensive</td>
</tr>
<tr>
<td>Accuracy</td>
<td>More accurate</td>
<td>Less accurate</td>
</tr>
<tr>
<td>Behavior</td>
<td>Fairly linear</td>
<td>Non linear</td>
</tr>
</tbody>
</table>

**Inference from table**

a. The value of $R^2$ for the SPICE model, which indicates the strength of the linear relationship between the decrease in peak current ($\Delta I$) and the decrease in dissolved oxygen concentration ($\Delta O_2$), is greater than that of the FIA model.

b. The standard error of the SPICE model is smaller, which indicates the better accuracy of the SPICE model.

c. Because the SPICE model is designed with CAD tools, it is less complex, inexpensive, fairly linear and more accurate than the FIA model.

VI. RESULT

Various analyses of the multiple-sensor network device using a single reference electrode reveal that the device has fairly good performance. Power analysis in Tanner Tool shows that the device consumes very low power, of the order of 10 $\mu$W. The slew rate of the device is good. The output observed in Figure 7 is highly linear, indicating that the device is stable. The coefficient of determination $R^2$ is found to be 99.7% with a standard error of 0.02. A significant advantage of the proposed design is that, with only a few active components and a grounded reference electrode, one can overcome the need for multiple reference electrodes as inputs in an array of sensors.

VII. CONCLUSION

In this paper, a simple and powerful approach to developing simulated computer models of a multi-bioelectronic sensor network using a single reference electrode is presented. The device has a simple architecture, which makes it well suited to water quality monitoring applications. This study may be extended to further improve power and size, as well as the wiring and layout characteristics. This technique may be an area of interest for new researchers working in the same field.

REFERENCES


A Novel Purely Active Electronically Controllable Configuration for Simulating Resistance in Floating Form

Mayank Srivastava and Dinesh Prasad

Abstract—This paper proposes a new purely active floating resistance simulation circuit employing two voltage differencing trans-conductance amplifiers (VDTAs). The proposed configuration enjoys the following advantageous features: (i) purely active realization, (ii) electronically tunable resistance, (iii) no requirement of any active/passive component matching constraint, (iv) good non-ideal behavior and (v) low sensitivity values. The influence of VDTA terminal parasitics on the high-frequency behavior of the proposed circuit is also investigated. The workability of the proposed resistor simulator has been verified by an application example of a voltage-mode low-pass filter. To validate the theoretical analysis, SPICE simulations with TSMC 0.18µm CMOS process parameters have been performed.

Index Terms— Active simulation, Electronic control, Floating resistance, VDTA

I. INTRODUCTION

ACTIVE simulation of floating passive elements (resistors, capacitors and inductors) is a fascinating research area for analog circuit designers and researchers. Several floating passive component simulators employing different active building blocks (ABBs) have been proposed in [1]-[18] and the references cited therein. The floating resistor is an integral part of many analog circuits, but from the viewpoint of monolithic integration it is not advisable to use a resistor in floating form: a floating resistor needs more chip area than a grounded resistor, and it is very difficult to fabricate such a resistor with an exact resistance value. Moreover, its resistance is fixed and cannot be changed as required. The active simulation of floating resistors has therefore become a popular research area, in which a floating resistor is realized either by active element(s) together with external resistor(s) or by active element(s) alone. Many synthetic floating resistor configurations using different active elements, such as the operational amplifier (OP-AMP) [6], operational trans-conductance amplifier (OTA) [7], modified current feedback operational amplifier (MCFOA) [8], second generation current conveyor (CCII) [9]-[11], [14], differential difference current conveyor (DDCC) [12], current controlled second generation current conveyor (CCCII) [13], current backward trans-conductance amplifier (CBTA) [15]-[16], current follower trans-conductance amplifier (CFTA) [17] and differential voltage second generation current conveyor (DVCCII) [18], have been reported in the literature, but all of these configurations suffer from one or more of the following disadvantages:

- Use of an excessive number of active elements (more than two) [6], [9], [11], [13].
- Use of external resistor(s) [6]-[11], [14]-[16], [18].
- Non-availability of electronic control [6], [8], [9]-[11], [18].
- Need for active/passive component matching constraint(s) [16]-[17].
- Degraded non-ideal performance [18].

Therefore, the aim of this paper is to propose a new purely active synthetic floating resistor configuration employing VDTAs. The proposed realization employs only two VDTAs and exhibits the following advantageous features: (i) no requirement of any external resistor, (ii) electronically tunable resistance, (iii) no requirement of any matching constraint, (iv) good non-ideal behavior and (v) low active and passive sensitivities.

II. PROPOSED CONFIGURATION

The voltage differencing trans-conductance amplifier (VDTA) is an active element introduced in [19]. It provides currents and voltages at different terminals with electronically controllable trans-conductance gains, which makes the VDTA block very suitable for the synthesis and design of active circuits with electronic control. Fig. 1 shows the symbolic representation of the VDTA, where P and N are input ports, z is an auxiliary port, and X+ and X- are output ports. All ports of the VDTA exhibit high impedance levels. The CMOS implementation of the VDTA [20] is shown in Fig. 2.
The port relations of the ideal VDTA shown in Fig. 1 can be characterized by the following hybrid matrix:

\[
\begin{bmatrix}
I_Z \\
I_{X^+} \\
I_{X^-}
\end{bmatrix} =
\begin{bmatrix}
g_{m_1} & -g_{m_1} & 0 \\
0 & 0 & g_{m_2} \\
0 & 0 & -g_{m_2}
\end{bmatrix}
\begin{bmatrix}
V_P \\
V_N \\
V_Z
\end{bmatrix}.
\] (1)
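Written out, the hybrid matrix of (1) says that the z-port current follows the differential input voltage scaled by $g_{m_1}$, while the two output ports source equal and opposite currents proportional to $V_Z$. A minimal sketch of these ideal port relations (the function name and the numeric values are hypothetical):

```python
def vdta_currents(gm1, gm2, v_p, v_n, v_z):
    """Ideal VDTA port currents from the hybrid matrix (1).

    I_Z  =  gm1 * (V_P - V_N)
    I_X+ =  gm2 * V_Z
    I_X- = -gm2 * V_Z
    """
    i_z = gm1 * (v_p - v_n)
    i_x_plus = gm2 * v_z
    i_x_minus = -gm2 * v_z
    return i_z, i_x_plus, i_x_minus


# Hypothetical gains (S) and port voltages (V), illustrative only
print(vdta_currents(gm1=1e-3, gm2=2e-3, v_p=0.1, v_n=0.0, v_z=0.05))
```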

The trans-conductance gains \(g_{m_1}\) and \(g_{m_2}\) of CMOS VDTA shown in Fig. 2 are given as

\[
g_{m_1} = \frac{g_3 + g_4}{2}
\] (2)

\[
g_{m_2} = \frac{g_6 + g_7}{2}
\] (3)

Where, \(g_n\) is the trans-conductance of \(n^\text{th}\) MOS transistor given as

\[
g_n = \sqrt{\frac{I_{g_n} \mu_n C_{OX} W}{L}}
\] (4)

Where, \(\mu_n\) is carrier mobility, \(C_{OX}\) is capacitance of gate-oxide layer per unit area, \(W\) is MOS transistor’s effective channel width, \(L\) is effective channel length and \(I_{g_n}\) is bias current of \(n^\text{th}\) transistor.
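A minimal sketch of (2)-(4), assuming the square-law form $g_n = \sqrt{I_{g_n}\mu_n C_{OX} W/L}$ (the square root gives units of siemens). The $\mu_n C_{OX}$ and $W/L$ values below are hypothetical, not the TSMC 0.18 µm parameters used by the authors.

```python
import math


def transconductance(i_bias, mu_cox, w_over_l):
    """g_n = sqrt(I_bias * mu_n * C_OX * W / L), in siemens."""
    return math.sqrt(i_bias * mu_cox * w_over_l)


def vdta_gm(g_a, g_b):
    """Average of two transistor transconductances, as in (2) and (3)."""
    return (g_a + g_b) / 2.0


# Hypothetical process values (illustrative only)
MU_COX = 2e-4     # A/V^2
W_OVER_L = 10.0

for i_b in (90e-6, 110e-6, 130e-6, 150e-6):
    g = transconductance(i_b, MU_COX, W_OVER_L)
    print(f"I_b = {i_b * 1e6:.0f} uA -> g = {g * 1e3:.4f} mS")
```

Under this model the transconductance grows as $\sqrt{I_b}$, so quadrupling the bias current only doubles $g_n$; this is what makes the resistance tunable through the bias currents.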

The VDTA has found several applications in the design of analog filters [21]-[23], oscillators [24] and inductor simulators [25], but no application to the simulation of floating resistors has been reported so far. This paper is an effort to fill this void.

The proposed purely active floating resistor simulator is shown in Fig. 3.

Routine circuit analysis of the configuration shown in Fig. 3 yields the short-circuit admittance matrix:

\[
\begin{bmatrix}
I_{in} \\
I_{out}
\end{bmatrix} =
\begin{bmatrix}
g_{m_1} & g_{m_2} \\
g_{m_1} & g_{m_2}
\end{bmatrix}
\begin{bmatrix}
+1 & -1 \\
-1 & +1
\end{bmatrix}
\begin{bmatrix}
V_{in} \\
V_{out}
\end{bmatrix}
\] (5)

This simulates a floating resistor with resistance value

\[
R_{eq} = \frac{g_{m_1}}{g_{m_1} g_{m_2}}
\] (6)

Where \((g_{m_1}, g_{m_2})\) and \((g_{m_3}, g_{m_4})\) are the trans-conductance gains of VDTA-1 and VDTA-2 respectively.

One can observe from (6) that the resistance of the proposed synthetic resistor can be controlled electronically by varying the trans-conductance gains \(g_{m_1}, g_{m_2}, g_{m_3}\) and \(g_{m_4}\). Note that the presented circuit can also simulate a negative floating resistor (-R) by appropriately interchanging the \(p\) and \(n\) and/or \(x^+\) and \(x^-\) ports of the VDTAs. Such a negative resistor simulator can be used for parasitic cancellation.

### III. NON-IDEAL ANALYSIS

In the non-ideal case, the VDTA can be characterized by the following equations:

\[
I_Z = \beta_Z g_{m_1} (V_P - V_N)
\] (7)

\[
I_{X^+} = \beta_{x^+} g_{m_2} V_Z
\] (8)

\[
I_{X^-} = -\beta_{x^-} g_{m_2} V_Z
\] (9)

where \(\beta_Z, \beta_{x^+}\) and \(\beta_{x^-}\) are the non-ideal trans-conductance gain errors.

To check the behaviour of the presented configuration under non-ideal conditions, it is re-analysed using the non-ideal VDTA model described by (7)-(9). Under the influence of VDTA non-idealities, the short-circuit admittance matrix and the floating resistance of the proposed simulator can be re-expressed as

\[
\begin{bmatrix}
I_{in} \\
I_{out}
\end{bmatrix} = \begin{bmatrix}
g_m \beta_{x_1} - \beta_{z_1} & \beta_{z_2} \\
g_m \beta_{x_2} - \beta_{z_1}
\end{bmatrix} \begin{bmatrix}
1 \\
-1
\end{bmatrix} \begin{bmatrix}
V_{in} \\
V_{out}
\end{bmatrix}
\] (10)

and

\[R_{eq} = \frac{g_m \beta_{x_2} - \beta_{z_1}}{g_m \beta_{x_2} - \beta_{z_1}}\] (11)

where \((\beta_{z1}, \beta_{x1+}, \beta_{x1-})\) and \((\beta_{z2}, \beta_{x2+}, \beta_{x2-})\) are the trans-conductance gain errors of VDTA-1 and VDTA-2 respectively.

It is clear from (11) that, even under non-ideal conditions, the proposed configuration simulates a lossless floating resistor.

The sensitivities of the simulated floating resistance with respect to the trans-conductance gains and gain errors are found to be

\[
S_{R_{eq}} = S_{\beta_{x_1}} = -1, S_{R_{eq}} = S_{\beta_{x_2}} = S_{\beta_{z_1}} = 0,
\]

\[
S_{\beta_{z_2}} = S_{\beta_{z_2}} = S_{\beta_{z_2}} = S_{\beta_{z_2}} = -1
\] (12)

Thus, all the sensitivity values are low, none exceeding unity in magnitude.
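The sensitivities in (12) are classical normalized sensitivities, $S_x^{R} = \frac{x}{R}\frac{\partial R}{\partial x}$. The sketch below estimates them numerically with a central difference. Because (6) and (11) are not legible in this copy, the resistance expression used here, $R = g_2/(g_1 g_3)$, is purely a hypothetical stand-in chosen to show how exponents of $\pm 1$ produce sensitivities of $\pm 1$.

```python
def normalized_sensitivity(func, params, name, rel_step=1e-6):
    """S_x^R = (x / R) * dR/dx, estimated with a central difference.

    func is called with keyword arguments taken from params; name selects x.
    """
    x = params[name]
    h = abs(x) * rel_step if x != 0 else rel_step
    up = dict(params, **{name: x + h})
    down = dict(params, **{name: x - h})
    derivative = (func(**up) - func(**down)) / (2.0 * h)
    return x / func(**params) * derivative


def r_hypothetical(g1, g2, g3):
    """Hypothetical resistance expression (NOT the paper's Eq. (6))."""
    return g2 / (g1 * g3)


gains = {"g1": 5e-4, "g2": 5e-4, "g3": 5e-4}  # hypothetical values in siemens
for gain_name in gains:
    s = normalized_sensitivity(r_hypothetical, gains, gain_name)
    print(f"S wrt {gain_name}: {s:+.6f}")
```

For any resistance that is a product of powers of the gains, each sensitivity equals the corresponding exponent, so magnitudes of at most 1 follow whenever every gain appears to the first power.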

IV. EFFECTS OF PARASITIC IMPEDANCES

At high frequencies, the terminal parasitics of the VDTA come into play and affect the performance of VDTA-based circuits. A conventional VDTA along with its port parasitics is shown in Fig. 4.

To study the effects of the VDTA terminal parasitics on the proposed resistance simulator, the configuration was re-examined including the VDTA port parasitics. Fig. 5 shows the proposed resistance simulator with the port parasitics of VDTA-1 and VDTA-2.

Since the proposed configuration has no external passive component, the effects of the port parasitic impedances cannot be alleviated, and these parasitics limit the high-frequency behaviour of the proposed configuration. The maximum usable frequency under the influence of parasitics can be found as:

V. APPLICATION EXAMPLE

The workability of the proposed circuit is also verified by a low-pass filter design example. The passive RC low-pass filter employing a floating resistor and a grounded capacitor is shown in Fig. 6, and the active realization of this low-pass filter using the proposed resistor simulator is shown in Fig. 7.

The voltage-mode transfer function obtained from Fig. 7 is given by (22), which is a low-pass transfer function.

\[
\frac{V_{out}(s)}{V_{in}(s)} = \frac{1}{1 + sC_0g_{m_1}g_{m_2}}
\] (22)

Fig. 6. Passive realization of voltage-mode low-pass filter

Fig. 7. Active realization of voltage-mode low-pass filter shown in Fig. 6 employing proposed floating resistance simulator

The simulations were performed employing the CMOS VDTA (shown in Fig. 2) with supply voltages of ±0.9 VDC and bias currents \(I_{b1} = I_{b2} = I_{b3} = I_{b4} = I_{b5} = I_{b6} = I_{b7} = I_{b8} = I_b = 150 \mu A\), where \(I_{b1}, I_{b2}, I_{b3}\) and \(I_{b4}\) are the bias currents of VDTA1 and \(I_{b5}, I_{b6}, I_{b7}\) and \(I_{b8}\) are the bias currents of VDTA2. The magnitude and phase responses of the impedance of the proposed simulator are shown in Fig. 8 and Fig. 9, respectively. Fig. 8 shows that the simulated magnitude response is approximately the same as the ideal magnitude response up to 292 MHz (the simulated resistance value is 1.562 kΩ, while the ideal value is 1.571 kΩ, up to 292 MHz). The phase responses in Fig. 9 indicate that the simulated phase response matches the ideal phase response up to 34 MHz. The deviation of the simulated responses from the ideal responses at high frequencies is due to the VDTA parasitic impedances. To demonstrate the electronic control of the proposed configuration, simulations have been performed for different sets of bias currents. Fig. 10 illustrates the magnitude responses for \(I_b = 130 \mu A, 110 \mu A\) and \(90 \mu A\). The simulated floating resistance values for these bias currents were 1.667 kΩ, 1.816 kΩ and 2.002 kΩ, respectively, while the ideal values were 1.689 kΩ, 1.836 kΩ and 2.029 kΩ. Hence, the deviation between the simulated and ideal values is not more than 1.5% within the limited frequency region. The low-pass filter shown in Fig. 7 was also simulated using the CMOS VDTA with a supply voltage of ±0.9 VDC, with the capacitor \(C_0\) chosen as 0.1 nF. The SPICE-simulated frequency response of this filter is shown in Fig. 11.
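The 1.5% claim can be checked directly from the reported resistance values. The sketch below also compares the ideal values with a $1/\sqrt{I_b}$ scaling from the 150 µA baseline. That scaling is an assumption here (it holds if $R_{eq} \propto 1/g_m$ with equal gains and $g_m \propto \sqrt{I_b}$), not a formula stated by the authors.

```python
import math

# (I_b in uA, simulated R in kOhm, ideal R in kOhm), as reported in the text
RESULTS = [
    (150, 1.562, 1.571),
    (130, 1.667, 1.689),
    (110, 1.816, 1.836),
    (90, 2.002, 2.029),
]


def percent_deviation(simulated, ideal):
    """Absolute deviation of the simulated value from the ideal value, in percent."""
    return abs(simulated - ideal) / ideal * 100.0


def sqrt_scaling_estimate(ib_ua, r_ref=1.571, ib_ref=150.0):
    """Assumed scaling R ~ 1/sqrt(I_b), anchored at the 150 uA ideal value."""
    return r_ref * math.sqrt(ib_ref / ib_ua)


for ib, sim, ideal in RESULTS:
    print(f"I_b = {ib} uA: deviation {percent_deviation(sim, ideal):.2f} %, "
          f"sqrt-scaling estimate {sqrt_scaling_estimate(ib):.3f} kOhm vs ideal {ideal} kOhm")
```

All four deviations stay below 1.5%, and the assumed square-root scaling reproduces the ideal values to within about 0.1%, which is consistent with bias-current control through $g_m \propto \sqrt{I_b}$.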

Fig. 8. Magnitude response of input impedance of proposed simulator

Fig. 9. Phase response of input impedance of proposed simulator

Fig. 10. Magnitude responses for different bias currents

Fig. 11. Frequency response of low-pass filter shown in Fig. 7
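For a rough sense of where the filter's corner should lie, the sketch below applies the first-order RC relation $f_c = 1/(2\pi R C)$ of the passive prototype in Fig. 6, substituting the ideal $R_{eq}$ = 1.571 kΩ (at $I_b$ = 150 µA) and $C_0$ = 0.1 nF from the text. This is an estimate under the assumption that the active filter behaves like its RC prototype; it is not derived from (22) and is not a simulated result. Fig. 11 remains the reference.

```python
import math


def rc_cutoff_hz(r_ohm: float, c_farad: float) -> float:
    """-3 dB frequency of a first-order RC low-pass filter: 1 / (2*pi*R*C)."""
    return 1.0 / (2.0 * math.pi * r_ohm * c_farad)


# Ideal R_eq at I_b = 150 uA and C_0 = 0.1 nF, both taken from the text
print(f"estimated f_c = {rc_cutoff_hz(1.571e3, 0.1e-9) / 1e6:.3f} MHz")

# Lower bias currents raise R_eq and therefore lower the estimated corner
for r_kohm in (1.689, 1.836, 2.029):
    print(f"R = {r_kohm} kOhm -> f_c = {rc_cutoff_hz(r_kohm * 1e3, 0.1e-9) / 1e6:.3f} MHz")
```

The estimate, about 1 MHz, sits well below the 34 MHz limit quoted for the phase response, so the filter's passband should be largely unaffected by the VDTA parasitics.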

VII. CONCLUSION

A new purely active floating resistor simulator employing two VDTAs has been presented. To the best of the authors’ knowledge, no purely active floating resistor simulator employing VDTA(s) has previously been reported in the literature. The proposed configuration offers electronically tunable resistance, low sensitivity values, no requirement of any component matching constraint and excellent non-ideal behavior. The effects of the VDTA port parasitics on the proposed circuit have also been investigated to define its high-frequency limitation. The application of the proposed resistor simulator in the design of a low-pass filter has been proposed and verified. The mathematical analysis has been verified by SPICE simulations with TSMC 0.18µm CMOS process parameters.

REFERENCES


Instruction for Authors

Editorial objectives

In the journal "Electronics", the scientific papers from different fields of electronics and electrical engineering in the broadest sense are published. Main topics are electronics, automatics, telecommunications, computer techniques, power engineering, nuclear and medical electronics, analysis and synthesis of electronic circuits and systems, new technologies and materials in electronics etc. The main emphasis of papers should be on methods and new techniques, or the application of existing techniques in a novel way.

The reviewing process

Each manuscript submitted is subjected to the following review procedures:

- It is reviewed by the editor for general suitability for this publication;
- If it is judged suitable, two reviewers are selected and a double-blind review process takes place;
- Based on the recommendations of the reviewers, the editor then decides whether the particular paper should be accepted as it is, revised or rejected.

Submissions Process

The manuscripts are to be delivered to the editor by e-mail: electronics@etfbl.net.

Manuscripts have to be prepared in accordance with the instructions given in the template for paper preparation that can be found on the journal’s web page (www.electronics.etfbl.net).

Authors should note that proofs are not supplied prior to publication and ensure that the paper submitted is complete and in its final form.

Copyright

Articles submitted to the journal should be original contributions and should not be under consideration for any other publication at the same time. Authors submitting articles for publication warrant that the work is not an infringement of any existing copyright and will indemnify the publisher against any breach of such warranty. For ease of dissemination and to ensure proper policing of use, papers and contributions become the legal copyright of the publisher unless otherwise agreed.
TECHNOLOGY-DEPENDENT OPTIMIZATION OF FIR FILTERS BASED ON CARRY-SAVE MULTIPLIER AND 4:2 COMPRESSOR UNIT .. 43
Burhan Khurshid and Roohie Naaz

TELESCOPIC OP-AMP OPTIMIZATION FOR MDAC CIRCUIT DESIGN ................................................................. 55
Abdelghani Dendouga and Slimane Oussalah

REAL MEASUREMENTS AND EVALUATION OF THE INFLUENCE OF ATMOSPHERIC PHENOMENA ON FSO COMBINED WITH MODULATION FORMATS .................................................................................................................. 62
Jan Latal, Lukas Hajek, Ales Vanderka, Jan Vitasek, Petr Koudelka, Stanislav Hejduk

A NOVEL SVPWM ALGORITHM CONSIDERING NEUTRAL-POINT POTENTIAL BALANCING FOR THREE-LEVEL NPC INVERTER ...... 69
Chen Yongchao, Li Yanda, and Zhao Ling

HARDWARE IMPLEMENTATION OF FTC OF INDUCTION MACHINE ON FPGA................................................................. 76
S. Boukadida, S. Gdaim and A. Mtibaa

SIMULATION AND PERFORMANCE ANALYSIS OF MULTIPLE PCS SENSORS SYSTEM ................................................. 85
Pawan Whig, Syed Naseem Ahmad and Surinder Kumar

A NOVEL PURELY ACTIVE ELECTRONICALLY CONTROLLABLE CONFIGURATION FOR SIMULATING RESISTANCE IN FLOATING FORM.................................................................................................................................................. 90
Mayank Srivastava and Dinesh Prasad