Intel has radically changed the relevance of its processors in the digital signal processing (DSP) arena with the addition of Advanced Vector Extensions (AVX) in its second generation Core i7 multicore platforms.
With this move, the company has doubled the width of the processor's vector engine from 128 to 256 bits.
Through this improvement in floating point processing capability, the processors formerly code-named ‘Sandy Bridge’ become compelling choices for military and aerospace applications such as radar, sonar, signal processing, and intelligence, surveillance and reconnaissance (ISR), across a wide range of manned and unmanned platforms.
For many years, the heavy lifting in DSP has been done by DSP chips and FPGAs, but these solutions brought their own burdens in power consumption and programming effort. Placing AVX in each core of the 32 nm second generation Core i7 processor eases many of the size, weight and power (SWaP) issues of those earlier DSP solutions, and also lets developers work within the familiar programming environment of a general purpose processor.
The resulting GFLOPS capability puts Intel processors into a DSP realm they have never before enjoyed. Intel Hyper-Threading technology is also supported, letting a single core appear as two virtual cores, which can boost performance by 25-30% in some cases. Benchmarks using Synthetic Aperture Radar code have demonstrated more than twice the performance on the second generation Core i7 compared with first generation processors (Sandy Bridge versus Arrandale) at the same clock frequency with two threads of execution. With four threads (on four cores for the second generation Core i7, or on two cores with Hyper-Threading for the first generation Core i7), the improvement exceeds 4X.
This article looks at the potential impact that the addition of AVX technology in the second generation Intel Core i7 processors may have on DSP applications in the military and aerospace arena.
Genesis of AVX technology
The second generation Core i7 processor breaks new ground as the first Intel processor offering to be truly effective at floating point (FP) vector processing for DSP. The earliest processors from the company, the 8086 and its immediate successors, could not perform these calculations without help from software emulation or a dedicated x87 FP coprocessor such as the 8087.
MultiMedia eXtension (MMX) technology improved matters somewhat when it was introduced in the Pentium processor line. The MMX instructions and execution units were intended for encoding and decoding audio and video streams, but could be called upon for some types of military and aerospace DSP provided the work could be done in integer arithmetic, because MMX operates only on integer data. As MMX evolved into the Streaming SIMD Extensions (SSE), which added single instruction, multiple data (SIMD) floating point operations, FP calculations and DSP became more achievable.
DSP developers had a brief fling with another Intel offering, starting in the late 1980s, in the form of the i860 RISC microprocessor. This CPU and graphics accelerator found its way into everything from microcomputers to supercomputers during its roughly six-year run, and military and aerospace contractors were particularly taken with its ability to handle DSP operations.
The i860 was soon surpassed by other RISC processors, which in turn were displaced by the ARM-based XScale processors. That left the DSP world without another dramatic technology update beyond SSE until the January 2011 release of the second generation Core i7 with AVX. AMD is also set to introduce AVX-enabled processors during 2011.
Breaking through the vector engine ceiling
AVX extends the functionality of SSE by doubling the width of the SSE vector registers from 128 bits to 256 bits and adding instructions that operate on this wider data. The 256-bit vector engine allows eight single-precision (32-bit) FP operations or four double-precision (64-bit) FP operations to be performed at the same time, up from four single-precision or two double-precision operations with SSE. This matters in DSP applications, where the same operation frequently must be performed many times across a large data set.
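To make the lane counts concrete, here is a minimal sketch in C using Intel's AVX intrinsics from `<immintrin.h>`; the function name `vec_add_f32` is hypothetical. When compiled with AVX enabled (for example `-mavx` on GCC or Clang), each `_mm256_add_ps` adds eight single-precision values held in one 256-bit register. Without AVX, the `#ifdef` falls back to a plain scalar loop, so the sketch still builds on other targets.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical example: out[i] = a[i] + b[i] for i in [0, n). */
void vec_add_f32(const float *a, const float *b, float *out, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL && out != NULL));
    size_t i = 0;
#ifdef __AVX__
    /* A 256-bit register holds eight 32-bit floats, so each
       _mm256_add_ps performs eight single-precision additions. */
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);  /* unaligned load of 8 floats */
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
#endif
    /* Scalar loop: handles the tail, or everything when AVX is unavailable. */
    for (; i < n; ++i)
        out[i] = a[i] + b[i];
}
```

The double-precision equivalent uses `__m256d` and `_mm256_add_pd`, which process four lanes per instruction.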
Doubling the vector width also translates almost directly into higher FP vector throughput. Whereas previous processors relied on higher clock rates and smaller die geometries for gains in FP vector performance, AVX uses SIMD parallelism to achieve greater gains than those routes now allow. This is significant because clock rates above 2 or even 3 GHz are reaching an efficiency ceiling due to their greater power consumption. Similarly, increased leakage has eroded some of the potential gains of smaller die geometries.
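The power ceiling can be illustrated with the standard first-order model of CMOS dynamic power, where α is the switching activity, C the switched capacitance, V the supply voltage and f the clock frequency. The assumption that voltage must rise roughly in proportion to frequency is a simplification, and the model ignores leakage:

```latex
\begin{align*}
P_{\mathrm{dyn}} &\approx \alpha\, C\, V^{2} f,
  \qquad V \propto f \;\Rightarrow\; P_{\mathrm{dyn}} \propto f^{3} \\
\frac{P_{\mathrm{dyn}}(2f)}{P_{\mathrm{dyn}}(f)} &\approx 2^{3} = 8
  \qquad\text{versus}\qquad
  \frac{P_{\mathrm{dyn}}(2C)}{P_{\mathrm{dyn}}(C)} = 2
\end{align*}
```

Under these assumptions, doubling throughput by doubling the clock could cost roughly eight times the dynamic power, whereas doubling the vector width at a fixed clock and voltage roughly doubles only the capacitance switched by the vector datapath.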
The ability to perform one instruction on eight discrete sets of data, however, is not bound by the same ceiling. Each of the two or four cores of the second generation Core i7 processors has a dedicated AVX unit. This lets the new processors execute twice as many FP operations per clock cycle as their predecessors, with quad-core parts able to perform up to 64 single-precision FP operations per clock cycle.
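The 64 operations per clock figure follows from simple arithmetic, sketched below in C. The sketch assumes that each core can issue one 256-bit add and one 256-bit multiply per cycle; the helper names and the 2,1 GHz clock used in the usage note are illustrative assumptions, not figures from this article.

```c
#include <assert.h>

/* Peak single-precision FLOPs per clock for one core, assuming one
   vector add and one vector multiply can issue every cycle. */
unsigned sp_flops_per_cycle_per_core(unsigned vector_bits)
{
    assert(vector_bits % 32 == 0);
    unsigned lanes = vector_bits / 32;  /* 8 lanes for AVX, 4 for SSE */
    return 2u * lanes;                  /* add unit + multiply unit */
}

/* Peak GFLOPS = processors x cores x FLOPs per cycle per core x clock (GHz). */
double peak_gflops(unsigned processors, unsigned cores,
                   unsigned flops_per_cycle_per_core, double clock_ghz)
{
    return (double)processors * cores * flops_per_cycle_per_core * clock_ghz;
}
```

With `vector_bits = 256` a core reaches 16 single-precision FLOPs per cycle (twice the 8 of a 128-bit SSE core), so a quad-core part reaches 64. At an assumed 2,1 GHz, `peak_gflops(1, 4, 16, 2.1)` gives 134,4 GFLOPS and two such processors give 268,8 GFLOPS. Double precision halves each figure.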
The memory subsystem of a quad-core second generation Core i7 processor provides, per core, a 32 KB four-way first-level instruction cache, a 32 KB eight-way first-level data cache and a 256 KB eight-way second-level unified cache. In addition, up to 6 MB of 16-way third-level cache is shared by all of the cores. The integrated memory controller drives two DDR3 memory channels with up to 21,35 GBps of peak memory bandwidth. Each clock cycle, the memory unit can service two 16-byte read requests and one 16-byte write request, which helps prevent pipeline stalls caused by inadequate data feeds.
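The peak DRAM figure can be checked with the usual formula: channels × bytes per transfer × transfer rate. This sketch assumes DDR3-1333 memory (about 1333 million transfers per second) on two 64-bit (8-byte) channels; the article does not state the memory speed, and the function names are hypothetical.

```c
#include <assert.h>

/* Peak DRAM bandwidth in GB/s (10^9 bytes/s):
   channels x bytes per transfer x million transfers per second / 1000. */
double dram_peak_gbps(unsigned channels, unsigned bytes_per_transfer,
                      double mega_transfers_per_s)
{
    assert(channels > 0 && bytes_per_transfer > 0);
    return channels * bytes_per_transfer * mega_transfers_per_s / 1000.0;
}

/* Peak first-level data cache traffic per cycle for one core. */
unsigned l1_bytes_per_cycle(unsigned loads, unsigned stores, unsigned bytes_each)
{
    return (loads + stores) * bytes_each;
}
```

`dram_peak_gbps(2, 8, 1333.33)` gives about 21,3 GBps, in line with the figure above, while `l1_bytes_per_cycle(2, 1, 16)` gives 48 bytes per cycle for each core. At an assumed 2,1 GHz clock, DRAM supplies only about 10 bytes per cycle to the whole chip, which is why keeping DSP working sets in cache matters so much.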
Thus, the second generation Core i7 processors add strong FP performance to Intel's already robust integer capabilities. Future micro-architectures promise to bring integer vector performance up to match FP, and to introduce fused multiply-add (FMA), an operation that performs the multiply and the add with a single rounding step instead of two, improving both numerical accuracy and speed.
The military and aerospace DSP arena can now capitalise on this newfound melding of capabilities through ruggedised commercial off-the-shelf (COTS) single board computers (SBCs) based on the second generation Core i7 processor family. One of the first commercial offerings was the 6U VPXcel6 SBC624 from GE Intelligent Platforms. This successor to the company's SBC620 and SBC622, which are based on the Intel Core 2 Duo and first generation Core i7 processors respectively, is available in five levels of ruggedisation ranging from benign to fully rugged.
GE has subsequently announced three other SBCs based on the second generation Core i7: the 3U VPXcel3 SBC324, the 6U XVR14 rugged VME SBC and the 6U XCR14 rugged CompactPCI SBC. All employ serial switched fabrics to optimise board-to-board data transfers and so enhance DSP capability. GE is also expected to announce soon a 6U VPX multiprocessor platform that promises high performance density.
GE has since announced its fifth second generation Core i7 platform, the DSP280 dual-node multiprocessor, designed specifically for defence and aerospace applications requiring the highest levels of DSP and multiprocessing capability. With its dual processor configuration, this fully rugged 6U OpenVPX multiprocessor platform will be capable of more than 260 GFLOPS peak performance per card slot.
The DSP280 also offers up to 21 GBps of main memory bandwidth per CPU node, with error checking and correction. This high-performance embedded computing architecture can scale to teraFLOPS performance levels within a single chassis via RDMA-enabled 10 Gigabit Ethernet and double data rate InfiniBand dual port network interface controllers, delivering data rates of as much as 1,8 GBps per channel with approximately 1 μs memory-to-memory latency.
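A simple latency-plus-bandwidth model (often attributed to Hockney, and not something the article itself invokes) helps interpret the 1,8 GBps and 1 μs figures: the time to move a message is the fixed latency plus its size divided by the bandwidth. The sketch below takes those two figures as inputs; the function names are hypothetical.

```c
#include <assert.h>

/* Latency-bandwidth model of one transfer, in microseconds:
   time = latency + bytes / bandwidth. 1 GB/s is 1000 bytes per microsecond. */
double transfer_time_us(double bytes, double latency_us, double gbytes_per_s)
{
    assert(bytes >= 0.0 && gbytes_per_s > 0.0);
    return latency_us + bytes / (gbytes_per_s * 1000.0);
}

/* Message size at which the achieved rate is half the peak bandwidth,
   i.e. where the latency term equals the streaming term. */
double half_bandwidth_bytes(double latency_us, double gbytes_per_s)
{
    return latency_us * gbytes_per_s * 1000.0;
}
```

With 1 μs latency and 1,8 GBps, `half_bandwidth_bytes(1.0, 1.8)` gives 1800 bytes. Messages much smaller than about 2 KB are dominated by latency, while a 1 MB block (`transfer_time_us(1048576.0, 1.0, 1.8)`) takes about 584 μs and is dominated by bandwidth.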
Intel has promised a seven-year parts lifecycle for the second generation Core i7 family. GE has chosen ball grid array devices from Intel's Performance Mobile chipset family, which can be soldered down for greater resistance to shock and vibration, rather than socketed land grid array devices, which are less secure in such environments.
Maximising AVX development
Taking full advantage of the added performance of the second generation Core i7 processor family presents a challenge to the developer. Each AVX unit can be programmed using intrinsic functions (intrinsics) that can be called from C or other high-level languages. Although this is no more complex than assembly programming, getting good performance at this level is not a trivial task.
Many factors must be taken into account when coding to avoid pipeline stalls and resource contention. Compilers offer some help: several already support AVX, with varying degrees of automatic vectorisation, in which source code is analysed and, where possible, loops are mapped to SIMD operations. This allows unmodified code to take advantage of AVX to some extent. If code modification is acceptable, or the code already uses library calls, math libraries offer a good alternative.
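As an illustration of why hand-coding at this level takes care, here is a minimal AVX dot-product sketch in C using the standard intrinsics from `<immintrin.h>`; the function name is hypothetical. Even this small kernel must handle a scalar tail and a horizontal reduction across the eight lanes, and lane-wise accumulation changes the order of the additions, so results can differ slightly from a plain scalar loop. Without AVX enabled at compile time, the sketch falls back to scalar code.

```c
#include <assert.h>
#include <stddef.h>
#ifdef __AVX__
#include <immintrin.h>
#endif

/* Hypothetical example: single-precision dot product of a and b. */
float dot_f32(const float *a, const float *b, size_t n)
{
    assert(n == 0 || (a != NULL && b != NULL));
    size_t i = 0;
    float sum = 0.0f;
#ifdef __AVX__
    __m256 acc = _mm256_setzero_ps();           /* eight partial sums */
    for (; i + 8 <= n; i += 8) {
        __m256 prod = _mm256_mul_ps(_mm256_loadu_ps(a + i),
                                    _mm256_loadu_ps(b + i));
        acc = _mm256_add_ps(acc, prod);         /* separate mul and add: no FMA */
    }
    /* Horizontal reduction: fold the upper 128 bits onto the lower,
       then add the remaining four lanes pairwise. */
    __m128 lo = _mm256_castps256_ps128(acc);
    __m128 hi = _mm256_extractf128_ps(acc, 1);
    __m128 s  = _mm_add_ps(lo, hi);
    s = _mm_hadd_ps(s, s);
    s = _mm_hadd_ps(s, s);
    sum = _mm_cvtss_f32(s);
#endif
    for (; i < n; ++i)                           /* scalar tail */
        sum += a[i] * b[i];
    return sum;
}
```

A production kernel would typically use several independent accumulators so that consecutive vector adds do not wait on one another; this sketch uses a single accumulator for clarity.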
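The kind of loop an auto-vectoriser handles well is sketched below in C: a simple counted loop with no dependence between iterations, and `restrict`-qualified pointers that promise the compiler the arrays do not overlap. With GCC, for example, `-O3 -mavx` enables vectorisation for AVX and `-fopt-info-vec` reports which loops were vectorised; other compilers use different flags. The `saxpy` name follows the BLAS convention, but this is a standalone function, not a BLAS call.

```c
#include <assert.h>
#include <stddef.h>

/* y[i] = alpha * x[i] + y[i]. The loop has no cross-iteration
   dependence, and restrict tells the compiler x and y do not alias,
   so a vectorising compiler may map it onto 8-wide AVX operations
   without any source changes. */
void saxpy(size_t n, float alpha, const float *restrict x, float *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] = alpha * x[i] + y[i];
}
```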
Intel produces the Integrated Performance Primitives (IPP) and the Math Kernel Library (MKL), which are highly tuned for AVX by Intel's own experts. Algorithm coverage is broad and performance is hard to beat. However, these libraries are proprietary to Intel (AMD has its own equivalent). For this reason, some programs turn to more open application programming interfaces (APIs) such as the Vector Signal and Image Processing Library (VSIPL) API, sponsored by DARPA as a cross-platform, cross-vendor standard, and its C++ sibling, VSIPL++. These libraries help isolate applications from the intricacies of the underlying hardware architecture.
For many years, GE Intelligent Platforms has supported the VSIPL standard API across multiple architectures (PowerPC/AltiVec, GPGPU/CUDA and Intel/SSE) with its AXISLib product suite. The company has just announced the latest addition to this product family, AXISLib-AVX, which includes the full VSIPL Core 1.0+ profile. These libraries are hand-optimised for the second generation Core i7 platform, with support for AVX and multithreading, so that developers can extract maximum performance from the new Intel processors in SWaP-sensitive sensor processing applications.
The AXISLib-AVX library includes more than 600 high-performance DSP and vector math functions for advanced real-time embedded signal processing applications, and can be used on its own or as an integral software module within the AXIS Advanced Multiprocessor Integrated Software environment.
© Technews Publishing (Pty) Ltd | All Rights Reserved