
The Vision DSP integrates a 2D-capable scatter-gather iDMA and two banks of high-speed local RAM to overcome the memory access bottleneck. Figure 4 shows this architecture.

While the architecture of the Vision DSP is designed to support high-performance vision computing, porting generic C code to the DSP so that it exploits this compute capability is not trivial. The Vision DSP is distributed with a high-performance compiler that can infer and extract parallelism from generic C code. Nonetheless, maximizing performance often requires developing the vision-computing kernel functions in hand-optimized C intrinsic code. To shorten the porting and optimization cycle, a rich set of vision-computing kernels has been implemented in a production-quality, OpenCV-like software library called XI.
In this study, we used a significant number of XI library functions across all the processing steps, including perspective transform, image filtering, equalization, thresholding, Canny edge detection, and Hough transform, as shown in Figure 5. Using the XI library functions significantly reduces the effort needed to port and optimize the lane-detection algorithm for the Vision P5/P6 DSP. Real-time performance in the instruction set simulator (ISS) can be achieved within one to two months.

Throughout the lane-detection flow, the image data is processed using a technique called tiling, with the help of the iDMA. Wide SIMD processing requires that image data be accessed in tight computation loops from the wide, high-performance local RAMs using vectorized load/store instructions. The tiling scheme brings one small portion of the image at a time from the much slower system memory into the local RAMs, using the block data transfers supported by the iDMA.
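
The tiling scheme can be illustrated with a minimal sketch in portable C. It is not the XI library or the iDMA API: the names `copy_2d`, `threshold_tile`, and `threshold_image_tiled`, and the 64x32 tile size, are hypothetical and chosen only for illustration. A plain row-by-row `memcpy` stands in for the 2D block transfer that the iDMA would perform in hardware, and a static array stands in for a local RAM bank.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical tile dimensions; real values depend on local RAM size. */
#define TILE_W 64
#define TILE_H 32

/* Software stand-in for a 2D block transfer: copies `rows` rows of
 * `row_bytes` bytes each, with separate source and destination pitches.
 * On the DSP this job would be handed to the iDMA instead. */
static void copy_2d(uint8_t *dst, size_t dst_pitch,
                    const uint8_t *src, size_t src_pitch,
                    size_t row_bytes, size_t rows)
{
    for (size_t r = 0; r < rows; ++r)
        memcpy(dst + r * dst_pitch, src + r * src_pitch, row_bytes);
}

/* Example kernel: binary threshold applied in place to one tile.
 * A real kernel would use vectorized loads and stores here. */
static void threshold_tile(uint8_t *tile, size_t pitch,
                           size_t w, size_t h, uint8_t t)
{
    for (size_t y = 0; y < h; ++y)
        for (size_t x = 0; x < w; ++x)
            tile[y * pitch + x] = (tile[y * pitch + x] > t) ? 255 : 0;
}

/* Process the whole image tile by tile: fetch a tile from (slow) system
 * memory into the local buffer, run the kernel, write the result back. */
void threshold_image_tiled(uint8_t *dst, const uint8_t *src,
                           size_t width, size_t height, size_t pitch,
                           uint8_t t)
{
    static uint8_t local_tile[TILE_H * TILE_W]; /* stands in for local RAM */

    for (size_t y0 = 0; y0 < height; y0 += TILE_H) {
        /* Clamp the tile height at the bottom edge of the image. */
        size_t h = (height - y0 < TILE_H) ? height - y0 : TILE_H;
        for (size_t x0 = 0; x0 < width; x0 += TILE_W) {
            /* Clamp the tile width at the right edge of the image. */
            size_t w = (width - x0 < TILE_W) ? width - x0 : TILE_W;

            copy_2d(local_tile, TILE_W, src + y0 * pitch + x0, pitch, w, h);
            threshold_tile(local_tile, TILE_W, w, h, t);
            copy_2d(dst + y0 * pitch + x0, pitch, local_tile, TILE_W, w, h);
        }
    }
}
```

This sketch transfers and computes sequentially. With two local RAM banks, an implementation could plausibly overlap the transfer of the next tile with computation on the current one, but that scheduling is not shown here.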