BSEE, California Institute of Technology, 2012<br>

MSEE, Stanford University, 2015

Admitted to Ph.D. Candidacy: 2013-2014

Email: yanglita AT stanford DOT edu<br>


Research: Energy-Efficient, Approximate Memory for Machine Learning Hardware Accelerators


The success of convolutional neural networks (ConvNets) has led to impressive performance in a wide range of cloud-centric applications, including image classification, speech recognition, and text analysis. To reduce latency and the high energy cost of communicating with the cloud, our recent work focuses on highly energy-efficient ConvNet ASICs for edge computing [1-3], with the additional benefit that local computation provides stronger privacy guarantees on each user device [4]. Unfortunately, the performance of ConvNets is often directly linked to the massive number of parameters needed to encode the network and to the availability of representative training datasets. Deploying ConvNets in resource-constrained Internet of Everything (IoE) systems therefore remains a challenge [5]: network storage requirements and the resulting data movement drive high memory energy consumption.


Recently, interest has surged in approximate computing, which trades off algorithm performance against hardware energy consumption by reducing precision. ConvNets are one class of algorithms shown to be inherently error resilient, motivating extensive studies on the effect of noise in ConvNets with the goal of decreasing compute energy through approximate computing [6]. Similarly, the error resilience of ConvNets can be leveraged on the memory side: accepting bit errors at reduced supply voltages saves memory energy (approximate memory). Few implementations exploit this, however, because it is poorly understood how bit errors affect the classification performance of ConvNets.


Approach and Contributions

Motivated by the need to reduce memory energy consumption in hardware ConvNets, and by the limited understanding of ConvNet tolerance to bit errors, we present the first silicon-validated study of the efficacy of memory voltage scaling in SRAMs on the MNIST and CIFAR-10 datasets [8]. Using a hardware-software co-design approach, we demonstrate that the SRAM supply voltage in MNIST ConvNets can be scaled well below Vmin; furthermore, with re-training that accounts for the resulting SRAM bit errors, we demonstrate additional improvements in classification accuracy and energy savings [7]. We further show that a uniform bit error model is sufficient to achieve classification accuracies very close to those obtained by training with the physical SRAM in the loop [7].
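To make the uniform bit error model concrete, the sketch below injects independent, uniformly distributed bit flips into stored weight words, the kind of fault model that low-voltage SRAM operation induces. This is a minimal illustration, not the study's actual code; the function name, the 8-bit word assumption, and the NumPy representation are all choices made here for clarity.

```python
import numpy as np

def inject_bit_errors(weights, ber, rng=None):
    """Flip each stored bit of an 8-bit weight array independently
    with probability `ber` (uniform bit error model).

    `weights` is assumed to be a uint8 array, emulating weight words
    stored in SRAM. Names and representation are illustrative only.
    """
    rng = np.random.default_rng() if rng is None else rng
    w = weights.astype(np.uint8)
    # Draw an independent Bernoulli(ber) flip decision for every bit.
    flips = rng.random((w.size, 8)) < ber
    # Pack the 8 per-bit decisions of each word into a byte-wide XOR mask.
    mask = np.packbits(flips, axis=1, bitorder='little')[:, 0]
    # XOR applies the flips; a set mask bit inverts the stored bit.
    return (w ^ mask.reshape(w.shape)).astype(weights.dtype)
```

During error-aware re-training, a mask like this would be applied to the quantized weights on every forward pass, so the network learns to tolerate the fault statistics of the scaled-voltage memory.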

Using this framework, we extend these methods to a multi-layer binarized ConvNet performing a more complex image classification task (CIFAR-10), demonstrating that significant errors can accumulate in the network with little to no degradation in classification accuracy [8]. Furthermore, we show that additional energy savings are possible by leveraging the different bit error tolerances of weights versus activations, and of the different layers of the network [8]. Finally, we compare the required bit error tolerances of our MNIST and CIFAR-10 implementations, demonstrating that the CIFAR-10 network is less error resilient but still tolerates bit error rates significantly higher than conventional memory applications allow [8]. Our findings and proposed methods form a framework that can be applied to the design of custom memories (e.g., hybrid 8T/6T or larger bitcells) and emerging memory technologies (e.g., RRAM, PCM) for ConvNet applications [9].
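In a binarized network each weight or activation occupies a single stored bit, so a memory bit error simply flips its sign. The sketch below shows how per-layer bit error rates could be applied in simulation to explore the layer-wise tolerance differences described above; it is a simplified illustration under assumed +1/-1 NumPy tensors, and the function names are hypothetical.

```python
import numpy as np

def flip_binary(tensor, ber, rng):
    """Flip the sign of each +/-1 element with probability `ber`.
    In a binarized network, one sign flip corresponds to one bit error."""
    flips = rng.random(tensor.shape) < ber
    return np.where(flips, -tensor, tensor)

def inject_layerwise(layers, ber_per_layer, seed=0):
    """Apply a distinct bit error rate to each layer's binary weights,
    mimicking per-layer voltage (and hence error-rate) assignment."""
    rng = np.random.default_rng(seed)
    return [flip_binary(w, b, rng) for w, b in zip(layers, ber_per_layer)]
```

Sweeping `ber_per_layer` while monitoring test accuracy is one way to locate how far each layer's memory voltage can be scaled before classification degrades.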


References

1. Daniel Bankman, Lita Yang, Bert Moons, Marian Verhelst, and Boris Murmann, “An Always-On 3.8 µJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28 nm CMOS,” IEEE ISSCC Dig. Tech. Papers, Feb. 2018.

2. Daniel Bankman, Lita Yang, Bert Moons, Marian Verhelst, and Boris Murmann, “An Always-On 3.8 µJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28 nm CMOS,” IEEE J. Solid-State Circuits, vol. 54, no. 1, Jan. 2019.

3. Bert Moons, Daniel Bankman, Lita Yang, Boris Murmann, and Marian Verhelst, “BinarEye: An Always-On Energy-Accuracy-Scalable Binary CNN Processor with All Memory on Chip in 28 nm CMOS,” Proc. IEEE CICC, Apr. 2018.

4. (Invited Paper) Lita Yang and Boris Murmann, “Approximate SRAM for Energy-Efficient, Privacy-Preserving Convolutional Neural Networks,” IEEE Computer Society Annual Symposium on VLSI (ISVLSI), Jul. 2017.

5. (Invited Paper) Van Tam Nguyen, Nhan Nguyen-Thanh, Lita Yang, Duy H. N. Nguyen, Chadi Jabbour, and Boris Murmann, “Cognitive Computation and Communication: A Complement Solution to Cloud for IoT,” Int. Conference on Advanced Technologies for Communications (ATC), Oct. 2016.

6. Boris Murmann, Daniel Bankman, Elaina Chai, Daisuke Miyashita, and Lita Yang, “Mixed-Signal Circuits for Embedded Machine-Learning Applications,” Asilomar Conference on Signals, Systems and Computers, Nov. 2015.

7. Lita Yang and Boris Murmann, “SRAM Voltage Scaling for Energy-Efficient Convolutional Neural Networks,” Int. Symposium on Quality Electronic Design (ISQED), Mar. 2017.

8. Lita Yang, Daniel Bankman, Bert Moons, Marian Verhelst, and Boris Murmann, “Bit Error Tolerance of a CIFAR-10 Binarized Convolutional Neural Network Processor,” IEEE Int. Symposium on Circuits and Systems (ISCAS), May 2018.

9. Gage Hills, Daniel Bankman, Bert Moons, Lita Yang, Jake Hillard, Alex Kahng, Rebecca Park, Marian Verhelst, Boris Murmann, Max M. Shulaker, H.-S. Phillip Wong, and Subhasish Mitra, “TRIG: Hardware Accelerator for Inference-Based Applications and Experimental Demonstration Using Carbon Nanotube FETs,” Design Automation Conference (DAC), San Francisco, CA, Jun. 2018, pp. 1-10.