Embedded ADAS Algorithm Optimization with High-Performance DSP IP and CV Software Library: Page 4 of 6

January 19, 2017 //By Charles Qi, Han Lin, Cadence
Embedded ADAS Algorithm Optimization with High-Performance DSP IP and CV Software Library
Recently computer vision (CV) technology has seen a rapidly increasing rate of adoption in the application of autonomous driving. CV algorithms are very compute intensive. Deployment of these algorithms often requires specialized high-performance DSPs or GPUs to achieve real-time performance while maintaining flexibility.

The Vision DSP integrates a 2D-capable scatter-gather iDMA and two banks of high-speed local RAMs to overcome the memory access bottleneck. This advanced architecture is demonstrated in Figure 4.


Figure 4 Cadence Tensilica Vision DSP Architecture
(for full resolution click here)

While the architecture of Vision DSP is designed to support high-performance vision computing, the effort to port generic C code to the DSP to utilize the compute capability is not trivial. The Vision DSP is distributed with a high-performance compiler that can infer and extract parallelism from the generic C code. Nonetheless, it is often required to develop the vision-computing kernel functions using hand-optimized C intrinsic code in order to maximize the performance. A rich set of vision-computing kernels have been implemented into a production-quality, OpenCV-like software library called XI to reduce the porting and optimization cycle.

In this study, we leveraged significant number of XI library functions in all the processing steps to perform perspective transform, image filtering, equalization, thresholding, Canny edge detection, and Hough transform, etc., as shown in Figure 5. The utilization of the XI library functions significantly reduces the effort to port and optimize the lane-detection algorithm to the Vision P5/6 Vision DSP. Real-time computing performance can be achieved in the instruction set simulator (ISS) within one to two months.


Figure 5 Mapping Lane Detection Algorithm Steps to XI Library Functions

Throughout the entire flow of the lane detection algorithm, the image data is processed using a technique called tiling with the facilitation of the iDMA. The wide SIMD data processing requires image data to be accessed in tight computation loops from the high-performance wide local RAMs using vectorized load/store instructions. The tiling scheme allows a small portion of the image to be brought into the local RAMs from the much slower system memory, using block data transfer supported by the iDMA.

Design category: 

Vous êtes certain ?

Si vous désactivez les cookies, vous ne pouvez plus naviguer sur le site.

Vous allez être rediriger vers Google.