A large number of embedded multimedia applications are characterized by high instruction-level parallelism (ILP), especially in the most critical inner loop bodies. Application Specific Instruction Set Processors (ASIPs) based on Very Long Instruction Word (VLIW) architectures are best suited to exploit such parallelism. Fast design space exploration and optimization of a VLIW architecture for a specific target application is increasingly becoming the crucial factor in achieving more efficient designs in a relatively short time.
In this paper we present an example of application-driven optimization of a VLIW architecture using the VEX (VLIW Example) system. This environment defines a 32-bit clustered, configurable VLIW architecture, where configurable means both scalable and customizable. In addition, it provides a flexible C compiler and a fast compiled architectural simulator, thus enabling complete hardware-software exploration and optimization.
A typical image processing application, the Imaging Pipeline, which converts a JPEG image into a half-toned image, was chosen as our benchmark. Dynamic and static profiling of the benchmark were performed first, in order to identify the most frequently executed functions and operations, respectively. This allowed us to define a meaningful architecture design space. The results of an exhaustive exploration are presented in detail.
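One common way to read the outcome of an exhaustive exploration is to keep only the configurations that no other configuration beats on both area and performance. The following is a minimal sketch of such a filter, not the procedure used in this work: the `DesignPoint` type, its `area` and `cycles` fields, and the `pareto_front` function are hypothetical names introduced for illustration, and the caller is assumed to supply values measured elsewhere (for example, from a simulator run).

```c
#include <stddef.h>

/* Hypothetical record for one explored configuration.
   Both metrics are "lower is better". */
typedef struct {
    double area;    /* estimated area of the configuration */
    double cycles;  /* simulated execution cycles of the benchmark */
} DesignPoint;

/* Returns 1 if p is at least as good as q on both metrics
   and strictly better on at least one of them. */
static int dominates(const DesignPoint *p, const DesignPoint *q)
{
    return p->area <= q->area && p->cycles <= q->cycles &&
           (p->area < q->area || p->cycles < q->cycles);
}

/* Writes the indices of all non-dominated points into out_idx
   (which must have room for n entries) and returns how many were written.
   Quadratic in n, which is acceptable for a small exhaustive sweep. */
size_t pareto_front(const DesignPoint *pts, size_t n, size_t *out_idx)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        int dominated = 0;
        for (size_t j = 0; j < n && !dominated; j++) {
            if (j != i && dominates(&pts[j], &pts[i]))
                dominated = 1;
        }
        if (!dominated)
            out_idx[count++] = i;
    }
    return count;
}
```

The surviving indices are the candidates for an area-performance tradeoff; choosing among them still requires a design decision that the filter does not make.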
Fine-tuned code optimization was also applied to one of the best architectures in terms of area-performance tradeoff. In particular, we show the performance enhancements obtained by changing the amount of loop unrolling and by introducing custom instructions.
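To illustrate what changing the amount of loop unrolling means at the source level, the sketch below shows a generic multiply-accumulate loop and a hand-unrolled version with an unrolling factor of 4. The kernel and the function names `dot_rolled` and `dot_unrolled4` are hypothetical and are not taken from the Imaging Pipeline; in practice the compiler may perform this transformation itself, and the best factor depends on the target configuration.

```c
#include <stddef.h>
#include <stdint.h>

/* Baseline: one multiply-accumulate per iteration, with a single
   accumulator that serializes the additions. */
int32_t dot_rolled(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc = 0;
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)a[i] * b[i];
    return acc;
}

/* Unrolled by 4 with independent accumulators. The four
   multiply-accumulates in the body do not depend on each other,
   so a wide-issue machine can schedule them in parallel. */
int32_t dot_unrolled4(const int16_t *a, const int16_t *b, size_t n)
{
    int32_t acc0 = 0, acc1 = 0, acc2 = 0, acc3 = 0;
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        acc0 += (int32_t)a[i]     * b[i];
        acc1 += (int32_t)a[i + 1] * b[i + 1];
        acc2 += (int32_t)a[i + 2] * b[i + 2];
        acc3 += (int32_t)a[i + 3] * b[i + 3];
    }

    /* Remainder loop for trip counts that are not a multiple of 4. */
    for (; i < n; i++)
        acc0 += (int32_t)a[i] * b[i];

    return acc0 + acc1 + acc2 + acc3;
}
```

A larger unrolling factor exposes more independent operations per iteration but also increases code size and register pressure, which is why the factor is worth tuning per architecture.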