Volume 15, Issue 1 (6-2018)                   JSDP 2018, 15(1): 127-138


Akbarzadeh N, Timarchi S. Modulo 2^n+1 Multiply and MAC Units Specified for DSPs. JSDP 2018; 15(1): 127-138
URL: http://jsdp.rcisp.ac.ir/article-1-543-en.html
Shahid Beheshti University
Abstract:

Digital signal processors (DSPs) are well suited to real-time image and video processing in embedded multimedia applications, not only because of their superior signal processing performance but also because of their high level of integration and very low power consumption. Filtering, which consists of repeated addition and multiplication operations, is one of the most fundamental operations of DSPs. There is therefore a need for an addition unit immediately after the multiplication unit in DSPs. Combining the multiply and add units yields a structure called the MAC (Multiply and ACcumulate) unit. The Residue Number System (RNS) can improve the speed and power consumption of arithmetic circuits because it allows arithmetic operations to be performed in parallel on each modulus and confines carry propagation to each modulus. RNS can therefore be used to improve the efficiency of the MAC unit.
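The idea of running a MAC independently in each RNS channel can be illustrated with a minimal Python sketch. It is a functional model only, not the hardware structure proposed in the article. The moduli (7, 8, 9), the function names and the sample filter data are assumptions chosen for illustration.

```python
from math import prod

# Hypothetical pairwise-coprime moduli, chosen only for illustration.
MODULI = (7, 8, 9)

def to_rns(x, moduli):
    """Forward conversion: one small residue per modulus."""
    return tuple(x % m for m in moduli)

def rns_mac(acc, a, b, moduli):
    """Multiply-accumulate performed independently in every channel.
    No carry crosses from one modulus to another."""
    return tuple((r + x * y) % m for r, x, y, m in zip(acc, a, b, moduli))

def from_rns(residues, moduli):
    """Reverse conversion with the Chinese Remainder Theorem."""
    big_m = prod(moduli)
    total = 0
    for r, m in zip(residues, moduli):
        m_i = big_m // m
        total += r * m_i * pow(m_i, -1, m)  # modular inverse (Python 3.8+)
    return total % big_m

# A three-tap filter output computed as a chain of MAC operations.
coeffs = [3, 1, 4]
samples = [5, 9, 2]
acc = to_rns(0, MODULI)
for h, x in zip(coeffs, samples):
    acc = rns_mac(acc, to_rns(h, MODULI), to_rns(x, MODULI), MODULI)
result = from_rns(acc, MODULI)
print(result)
```

The result matches ordinary integer arithmetic as long as the true sum stays below the product of the moduli (504 in this sketch).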
RNS divides large numbers into smaller numbers, called residues, according to a moduli set and allows arithmetic operations to be performed on each modulus independently. The moduli set {2^n-1, 2^n, 2^n+1} is the most widely used because of its simple and efficient implementation. Within this moduli set, the modulo 2^n+1 circuits form the critical path because of their (n+1)-bit-wide data path, whereas the other two moduli have n-bit-wide operands. To overcome the problem of (n+1)-bit operands, three representations have been suggested: diminished-1, Signed-LSB and Stored-Unibit. Although various multipliers have been proposed for the diminished-1 representation, no multiplication structure has been proposed for the latter two. Modulo 2^n+1 multipliers fall into three categories according to the types of their inputs and outputs: (1) both operands use the standard (weighted) representation; (2) one input uses the standard representation while the other uses the diminished-1 representation; (3) both inputs use the diminished-1 representation. Although several multiply-and-add units have been proposed for the first two categories, no MAC unit has been proposed for the multipliers of the third category, which outperform those of the other categories. In this article, a modulo 2^n+1 MAC unit for the third category is first proposed; pipelining and multi-voltage techniques are then applied for further improvement. The pipelined structure enables a trade-off between power consumption and delay. When high performance with minimum delay is required, the nominal supply voltage can be chosen (high-performance mode); otherwise, the supply voltage is reduced to the level at which the pipelined circuit matches the performance of the non-pipelined circuit, which decreases power consumption significantly (low-power mode).
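The arithmetic behind a third-category MAC (all operands in diminished-1 form) can be sketched in Python as a reference model. This is an assumption-laden illustration, not the circuit proposed in the article: a nonzero residue A is stored as A-1, which fits in n bits, and zero is modelled here with `None` in place of a separate zero-indication signal. All function names are hypothetical.

```python
from typing import Optional

def to_dim1(x: int, n: int) -> Optional[int]:
    """Encode a residue modulo 2^n+1 in diminished-1 form (x - 1).
    None stands in for the zero operand (an assumption of this sketch)."""
    x %= (1 << n) + 1
    return None if x == 0 else x - 1

def from_dim1(d: Optional[int]) -> int:
    """Decode a diminished-1 value back to an ordinary residue."""
    return 0 if d is None else d + 1

def mac_dim1(a: Optional[int], b: Optional[int], c: Optional[int],
             n: int) -> Optional[int]:
    """Return the diminished-1 form of (A*B + C) mod (2^n+1), where
    a, b, c are the diminished-1 forms of A, B, C."""
    m = (1 << n) + 1
    if a is None or b is None:
        return c  # the product is zero, so the result is just C
    # A*B = (a+1)(b+1) = a*b + a + b + 1
    product = a * b + a + b + 1
    addend = 0 if c is None else c + 1
    r = (product + addend) % m
    return None if r == 0 else r - 1

# Exhaustive check for n = 4 (modulus 17) against plain integer arithmetic.
N = 4
MOD = (1 << N) + 1
mismatches = [
    (A, B, C)
    for A in range(MOD) for B in range(MOD) for C in range(MOD)
    if from_dim1(mac_dim1(to_dim1(A, N), to_dim1(B, N), to_dim1(C, N), N))
    != (A * B + C) % MOD
]
print(len(mismatches))
```

Every nonzero result lies in the range 0 to 2^n-1, which is why the diminished-1 form avoids the (n+1)-bit data path for nonzero operands.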
Simulations are performed in two phases. In the first phase, the proposed MAC unit without the pipeline structure is described in VHDL and synthesized with the Synopsys Design Vision tool. The results indicate that the proposed structure improves the PDP (Power-Delay Product) by up to 39% compared with state-of-the-art MAC units. In the second phase, a CMOS transistor-level implementation is provided in two modes, low power and high performance, using Cadence Design Systems tools. The simulation results indicate that, in low-power mode, the proposed pipelined MAC unit yields 71% power savings compared with existing circuits without degrading efficiency. In high-performance mode, although power consumption increases, a delay reduction of up to 54% yields 39% PDP savings for the proposed pipelined MAC unit.
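Because PDP is the product of power and delay, the high-performance figures can be cross-checked with a short calculation. The sketch below assumes that the 54% delay reduction and the 39% PDP savings refer to the same design point. The abstract reports the delay figure as an upper bound, so the output is only a rough consistency check, not a reported result. The function name is hypothetical.

```python
def implied_power_ratio(delay_reduction: float, pdp_savings: float) -> float:
    """PDP = power * delay, so
    power_new / power_old = (PDP_new / PDP_old) / (delay_new / delay_old)."""
    return (1.0 - pdp_savings) / (1.0 - delay_reduction)

# Figures quoted in the abstract for the high-performance mode.
ratio = implied_power_ratio(delay_reduction=0.54, pdp_savings=0.39)
print(f"implied power ratio: {ratio:.2f}")
```

Under that assumption, the figures imply a power increase of roughly one third in high-performance mode, which is consistent with the statement that power consumption rises while PDP still improves.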
 

Type of Study: Applicable | Subject: Paper
Received: 2017/03/1 | Accepted: 2017/10/25 | Published: 2018/06/13 | ePublished: 2018/06/13


Rights and permissions
This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.

© 2015 All Rights Reserved | Signal and Data Processing