# Danny Bankman

S.B., EE, Massachusetts Institute of Technology, 2012

M.S., EE, Stanford University, 2015

Ph.D., EE, Stanford University, 2019

**Email**: dbankman AT stanford DOT edu

**In January 2020, I will join the Department of Electrical and Computer Engineering at Carnegie Mellon University as an Assistant Professor.**

**Google Scholar Profile:** scholar.google.com/citations

**Research**: *Mixed-Signal Processing for Machine Learning*

**Motivation**

Deep learning has several emerging applications that require very low latency and very high bandwidth, such as conversational agents and augmented reality [1]. These applications have created demand for hardware capable of running deep-learning-based inference on-device, rather than as a cloud-based service. Because deep neural networks (DNNs) with the capacity for practical tasks require millions of parameters and perform billions of arithmetic operations per inference, they can quickly drain a battery when run on a general-purpose microprocessor, which is designed for programmability first and energy efficiency second. My research focuses on custom circuits and microarchitectures for deep-learning-based inference, with energy efficiency as the first priority.

**Microarchitecture**

A key challenge in DNN hardware architecture is managing the energy cost of memory access, which can exceed the energy cost of arithmetic operations by an order of magnitude for on-chip memory and three orders of magnitude for off-chip memory [2]. For small-scale deep learning applications with memory-efficient DNN architectures, such as image classification with binarized convolutional neural networks, all memory can be integrated on chip [3]. We have demonstrated a weight-stationary, parallel-processing architecture for binary CNNs (Fig. 1) that overcomes the memory energy bottleneck by integrating all memory on chip and amortizing the energy cost of access across many computations [4, 5]. The remaining energy bottleneck is arithmetic computation.
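The amortization argument can be made concrete with a back-of-the-envelope model. The sketch below is illustrative only: the function name `energy_per_op` and the normalized energy values are assumptions chosen to match the order-of-magnitude ratios quoted above, not measurements from the processor.

```python
def energy_per_op(e_arith, e_mem_access, ops_per_access):
    """Average energy per arithmetic operation when one memory access
    is shared (amortized) across `ops_per_access` operations."""
    if ops_per_access < 1:
        raise ValueError("ops_per_access must be at least 1")
    return e_arith + e_mem_access / ops_per_access


# Normalized, hypothetical energies (assumptions, not measured data):
E_ARITH = 1.0        # one arithmetic operation
E_ON_CHIP = 10.0     # about one order of magnitude above arithmetic
E_OFF_CHIP = 1000.0  # about three orders of magnitude above arithmetic

if __name__ == "__main__":
    for reuse in (1, 10, 100, 1000):
        on_chip = energy_per_op(E_ARITH, E_ON_CHIP, reuse)
        off_chip = energy_per_op(E_ARITH, E_OFF_CHIP, reuse)
        print(f"reuse={reuse:5d}  on-chip={on_chip:8.2f}  off-chip={off_chip:8.2f}")
```

Under these assumed numbers, the per-operation cost approaches the arithmetic cost once each on-chip access is reused across many operations, which is why arithmetic becomes the remaining bottleneck.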

**Mixed-Signal Circuits**

In the low-SNR regime where neural networks operate, mixed-signal processing can operate at lower energy consumption than digital processing [6, 7, 8]. In our binary CNN processor, we demonstrated a switched-capacitor (SC) neuron array that operates at an order of magnitude lower energy consumption than an equivalent digital neuron array designed with an RTL-to-GDSII flow [9]. The SC neuron (Fig. 2) achieves these energy savings owing to the precise matching between metal-oxide-metal fringe capacitors in the 28 nm technology, which allows a unit capacitor as small as 1 fF to stay within the statistical variation that BinaryNet can tolerate without degrading accuracy. With this unit capacitor size, the SC neuron consumes significantly lower dynamic energy than the equivalent digital neuron. The binary CNN processor has additionally been used to study how bit errors due to memory voltage over-scaling affect top-level CNN accuracy, and to develop digital circuit techniques that render the CNN robust to the variations exhibited by carbon nanotube FETs [10, 11].
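A behavioral sketch can illustrate why capacitor matching matters for such a neuron. The model below is a simplification under stated assumptions, not the circuit in [4, 5]: each XNOR output is assumed to drive one unit capacitor, the shared node is assumed to settle to the capacitance-weighted average, bias and threshold terms are omitted, and mismatch is modeled as a Gaussian spread `sigma` on each normalized capacitor. All function names are hypothetical.

```python
import random


def xnor(x, w):
    """XNOR of two +/-1 values, returned as +/-1 (same as the product x*w)."""
    return 1 if x == w else -1


def ideal_neuron(inputs, weights):
    """Digital reference: sign of the sum of XNOR products."""
    total = sum(xnor(x, w) for x, w in zip(inputs, weights))
    return 1 if total >= 0 else -1


def sc_neuron(inputs, weights, caps):
    """Assumed behavioral model: capacitance-weighted average of the XNOR
    outputs, resolved to one bit by a comparator at zero."""
    charge = sum(c * xnor(x, w) for x, w, c in zip(inputs, weights, caps))
    v = charge / sum(caps)
    return 1 if v >= 0 else -1


def mismatched_caps(n, sigma, rng):
    """Unit capacitors normalized to 1.0 with Gaussian relative mismatch."""
    return [rng.gauss(1.0, sigma) for _ in range(n)]


def disagreement_rate(n_inputs, sigma, trials, seed=0):
    """Fraction of random input/weight vectors for which the mismatched
    SC model and the ideal neuron produce different output bits."""
    rng = random.Random(seed)
    caps = mismatched_caps(n_inputs, sigma, rng)
    mismatches = 0
    for _ in range(trials):
        x = [rng.choice((-1, 1)) for _ in range(n_inputs)]
        w = [rng.choice((-1, 1)) for _ in range(n_inputs)]
        if sc_neuron(x, w, caps) != ideal_neuron(x, w):
            mismatches += 1
    return mismatches / trials


if __name__ == "__main__":
    for sigma in (0.0, 0.01, 0.05, 0.2):
        rate = disagreement_rate(n_inputs=1023, sigma=sigma, trials=2000)
        print(f"sigma={sigma:.2f}  disagreement rate={rate:.4f}")
```

In this model, the output bit can flip only when the ideal sum lies close to zero, so tighter capacitor matching leaves fewer inputs for which the mixed-signal and digital results differ.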

Fig. 1. Weight-stationary architecture with input reuse and binary-CNN-specific sliced datapath.

Fig. 2. SC neuron multiplies with XNOR gates, adds using voltage superposition, and resolves its 1-bit output using a voltage comparator.

**References**

[1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," Nature, vol. 521, pp. 436-444, May 2015.

[2] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2014, pp. 10-14.

[3] M. Courbariaux and Y. Bengio, "BinaryNet: Training deep neural networks with weights and activations constrained to +1 or -1," CoRR, vol. abs/1602.02830, 2016.

[4] D. Bankman, L. Yang, B. Moons, M. Verhelst and B. Murmann, "An Always-On 3.8μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28nm CMOS," ISSCC Dig. Tech. Papers, San Francisco, CA, Feb. 2018, pp. 222-223.

[5] D. Bankman, L. Yang, B. Moons, M. Verhelst and B. Murmann, "An Always-On 3.8 μJ/86% CIFAR-10 Mixed-Signal Binary CNN Processor with All Memory on Chip in 28 nm CMOS," IEEE J. Solid-State Circuits, vol. 54, no. 1, Jan. 2019.

[6] B. Murmann, D. Bankman, E. Chai, D. Miyashita, and L. Yang, "Mixed-Signal Circuits for Embedded Machine-Learning Applications," Asilomar Conference on Signals, Systems and Computers, Asilomar, CA, Nov. 2015.

[7] D. Bankman and B. Murmann, "Passive charge redistribution digital-to-analogue multiplier," Electronics Letters, vol. 51, no. 5, pp. 386-388, March 5 2015.

[8] D. Bankman and B. Murmann, "An 8-Bit, 16 Input, 3.2 pJ/op Switched-Capacitor Dot Product Circuit in 28-nm FDSOI CMOS," Proc. IEEE Asian Solid-State Circuits Conf., Toyama, Japan, Nov. 2016, pp. 21-24.

[9] B. Moons, D. Bankman, L. Yang, B. Murmann, and M. Verhelst, "BinarEye: An Always-On Energy-Accuracy-Scalable Binary CNN Processor With All Memory On Chip In 28nm CMOS," Proc. CICC, San Diego, CA, Apr. 2018.

[10] L. Yang, D. Bankman, B. Moons, M. Verhelst, and B. Murmann, "Bit Error Tolerance of a CIFAR-10 Binarized Convolutional Neural Network Processor," in Proc. IEEE Int. Symp. Circuits Syst., Florence, Italy, May 2018.

[11] G. Hills, D. Bankman, B. Moons, L. Yang, J. Hillard, A. B. Kahng, R. Park, M. Verhelst, B. Murmann, M. Shulaker, H.-S. P. Wong, and S. Mitra, "TRIG: Hardware Accelerator for Inference-Based Applications and Experimental Demonstration Using Carbon Nanotube FETs," Design Automation Conference (DAC), San Francisco, CA, Jun. 2018.