====== Alex_Omid-Zohoor ======

{{wiki:Alex O.jpg|Alex_Omid_Zohoor}} 

<br> 

BSEE, Stanford University, 2010<br>MSEE, Stanford University, 2011<br>Admitted to Ph.D. Candidacy: 2011-2012<br> 

<br> 

**Towards Always-On Mobile Object Detection: Energy vs. Performance Tradeoffs for Embedded HOG Feature Extraction** 

Recent vision applications such as augmented reality and advanced driver assistance systems (ADAS) require real-time object detection. As stated in [[1]], energy efficiency is crucial for both applications due to the limited battery life of mobile devices and the heat dissipation limits of automotive systems. To identify promising areas for energy reduction, we must analyze object detection at the system level. We focus on Histograms of Oriented Gradients (HOG) features [[2]], which capture localized pixel gradient information, as they are suitable for hardware implementation and have been shown to achieve high object detection performance. <br> 
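
As a rough illustration of what "localized pixel gradient information" means, the sketch below computes per-cell gradient-orientation histograms with NumPy. It is a simplified version of the idea in [[2]], not the pipeline used in this project: it omits block normalization and vote interpolation, and the function name, 8-pixel cell size, and 9 unsigned orientation bins are assumptions chosen for illustration.

```python
import numpy as np

def hog_cell_histograms(image, cell_size=8, num_bins=9):
    """Magnitude-weighted orientation histograms per cell (illustrative only)."""
    img = np.asarray(image, dtype=np.float64)

    # Centered differences [-1, 0, 1]; border pixels keep a zero gradient.
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:-1] = img[:, 2:] - img[:, :-2]
    gy[1:-1, :] = img[2:, :] - img[:-2, :]

    magnitude = np.hypot(gx, gy)
    # Unsigned orientation in [0, 180) degrees.
    orientation = np.degrees(np.arctan2(gy, gx)) % 180.0
    bin_width = 180.0 / num_bins
    # Guard against rounding that lands exactly on the upper edge.
    bin_index = np.minimum((orientation // bin_width).astype(int), num_bins - 1)

    rows = img.shape[0] // cell_size
    cols = img.shape[1] // cell_size
    hist = np.zeros((rows, cols, num_bins))
    for r in range(rows):
        for c in range(cols):
            cell = (slice(r * cell_size, (r + 1) * cell_size),
                    slice(c * cell_size, (c + 1) * cell_size))
            # Each pixel votes for its orientation bin, weighted by magnitude.
            hist[r, c] = np.bincount(bin_index[cell].ravel(),
                                     weights=magnitude[cell].ravel(),
                                     minlength=num_bins)
    return hist

# Horizontal ramp: every interior gradient points along x, so only bin 0 fills.
ramp = np.tile(np.arange(16, dtype=np.float64), (16, 1))
histograms = hog_cell_histograms(ramp)
print(histograms.shape)
```

Each cell is summarized by only `num_bins` values, which is why histogram generation also acts as a form of data compression.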

Object detection can be divided into three main steps: image capture, feature extraction, and detection. Conventionally, these steps are partitioned as shown in Fig. 1 (a). Since the feature extraction and detection steps performed by the backend digital processor are computationally complex, memory intensive, and highly parallelizable, significant energy savings can be achieved through custom ASIC design. Reference [[1]] presents such an ASIC, which performs HOG feature extraction and detection on 1080HD 60 fps video and consumes only 45.3 mW (0.36 nJ/pixel). However, the lowest-power commercial 1080HD 60 fps image sensor currently consumes nearly twice as much power, at 86.7 mW (0.70 nJ/pixel) [[3]]. Therefore, additional system-level energy savings may be achieved by reducing the energy requirements of frontend image capture.<br> 
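
The per-pixel energy figures quoted above follow from dividing power by pixel throughput. The short sketch below reproduces them, assuming that 1080HD means 1920 × 1080 pixels per frame; the function name is hypothetical.

```python
def energy_per_pixel_nj(power_mw, width=1920, height=1080, fps=60):
    """Energy per pixel in nanojoules for a given power draw in milliwatts."""
    pixels_per_second = width * height * fps  # assumed 1080HD frame size
    joules_per_pixel = (power_mw * 1e-3) / pixels_per_second
    return joules_per_pixel * 1e9

asic_nj = energy_per_pixel_nj(45.3)    # HOG ASIC from reference [1]
sensor_nj = energy_per_pixel_nj(86.7)  # image sensor from reference [3]

print(f"ASIC:   {asic_nj:.2f} nJ/pixel")
print(f"Sensor: {sensor_nj:.2f} nJ/pixel")
print(f"Sensor / ASIC power ratio: {86.7 / 45.3:.2f}")
```

Under this assumption, the image sensor accounts for roughly two thirds of the combined 132 mW, which is the motivation for targeting the image capture step.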

It has been shown in [[4]] that typical commercial mobile CMOS image sensors consume over 200 mW of power, of which the two dominant sources are analog-to-digital conversion (70-85%) and chip-to-chip I/O (10-15%). Reducing pixel bit depth below the conventional 12-bit value and compressing the output data by performing partial feature extraction (histogram generation) on-chip could reduce both ADC and I/O energy in the image capture step. Furthermore, partial on-chip feature extraction could reduce the memory, computation, and energy requirements of the backend digital processor, as shown in Fig. 1 (b) and demonstrated in [[5]].<br> 

Leveraging the fact that HOG features are based on gradients, we go one step further and propose the pipeline shown in Fig. 1 (c), which uses analog memory to store 3 rows of pixel values and a ratio-to-digital converter (RDC) to digitize ratios of neighboring pixels (representing gradients) rather than absolute pixel values. This approach allows scenes of a given dynamic range to be represented with fewer bits per pixel than in a standard imager, further reducing I/O energy, backend computation, and memory requirements, without sacrificing object detection performance.<br> 
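
One way to see why ratios need fewer bits than absolute values is that the ratio of neighboring pixels does not change when the overall illumination is scaled. The sketch below is a numerical illustration only, not a model of the RDC circuit: the log2 encoding, the 4-bit uniform quantizer, its ±4 range, and all names are assumptions made for this example.

```python
import numpy as np

def neighbor_log_ratios(row):
    """log2 of each pixel divided by its left neighbor."""
    row = np.asarray(row, dtype=np.float64)
    return np.log2(row[1:] / row[:-1])

def quantize(values, num_bits, max_abs):
    """Uniform quantizer over [-max_abs, max_abs]; returns integer codes."""
    levels = 2 ** num_bits
    clipped = np.clip(values, -max_abs, max_abs)
    scaled = (clipped + max_abs) / (2 * max_abs) * (levels - 1)
    return np.round(scaled).astype(int)

# Same scene pattern under dim and bright illumination (1000x apart).
pattern = np.array([2.0, 8.0, 4.0, 1.0])
dim_row = 4.0 * pattern
bright_row = 4000.0 * pattern

codes_dim = quantize(neighbor_log_ratios(dim_row), num_bits=4, max_abs=4.0)
codes_bright = quantize(neighbor_log_ratios(bright_row), num_bits=4, max_abs=4.0)

# Absolute values differ by 1000x, but the ratio codes are identical.
print(dim_row.max(), bright_row.max())
print(codes_dim, codes_bright)
print(np.array_equal(codes_dim, codes_bright))
```

In this toy case the absolute pixel values would need about ten extra bits to cover both illumination levels, while the 4-bit ratio codes are unaffected.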

To simulate the object-detection performance of our proposed system, we have created a database of over 4,000 annotated RAW images, modeled after the PASCAL VOC database [[6]]. Unlike processed JPEG images, RAW images are composed of 12-bit photosensor outputs, which approximate analog scene illumination levels. We plan to open source this database to the academic community through the Stanford Digital Repository. <br> 

<br> 

{{wiki:Fig1.jpg}}<br> 

Fig. 1. Conventional object detection pipeline (a), generic object detection pipeline with embedded feature extraction (b), and proposed object detection pipeline with ratio-to-digital converter (RDC) and embedded feature extraction (c). <br> 

<br> 

References<br><br>[[1]] A. Suleiman and V. Sze, “Energy-efficient HOG-based object detection at 1080HD 60 fps with multi-scale support,” 2014 IEEE Workshop on Signal Processing Systems (SiPS), Belfast, 2014.<br>[[2]] N. Dalal and B. Triggs, “Histograms of Oriented Gradients for Human Detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognition, Jun. 2005, pp. 886–893.<br>[[3]] OmniVision OV2740. http://www.ovt.com/products/sensor.php?id=153<br>[[4]] LiKamWa et al., “Energy Characterization and Optimization of Image Sensing Toward Continuous Mobile Vision,” Proc. Conf. Mobile Syst. Applicat. and Services, pp. 69–82, Jun. 2013.<br>[[5]] J. Choi, J. Cho, S. Park, and E. Yoon, “A 3.4 µW Object-Adaptive CMOS Image Sensor with Embedded Feature Extraction Algorithm for Motion-Triggered Object-of-Interest Imaging,” IEEE J. Solid-State Circuits, vol. 49, no. 1, pp. 289–300, Jan. 2014.<br>[[6]] PASCAL VOC. http://pascallin.ecs.soton.ac.uk/challenges/VOC/ 

We would like to acknowledge undergraduate researcher David Ta (dta255@stanford.edu) for his contributions to this project. <br> 

**Email**: [[mailto:alexoz@stanford.edu|alexoz AT stanford DOT edu]]<br>