The latest mobile phones already operate multi-band and multi-mode on cellular networks. They increasingly feature different communication pipes (Figure 1), for Wi-Fi connections, digital TV, digital audio broadcast and GPS satellite reception, among others. Soon it will be UWB (Ultra WideBand) for wireless USB, and WiMAX for mobile Internet access. Convergence in mobile portable devices means that many different combinations of these RF communication/broadcast standards will also appear in PDAs, laptop computers and game consoles. In all these consumer products, space, cost and power consumption constraints will no longer make it viable to have a dedicated wireless transceiver for each standard. Software-defined radio (SDR), implemented using an advanced programmable digital signal processor (DSP), such as an embedded-vector processor (EVP), promises the solution—a single module block capable of handling all these standards.
CONVERGENCE DRIVES RF ARCHITECTURES
One of the reasons why dedicated wireless transceiver modules such as Bluetooth modules or Wi-Fi modules have gained market success is that these wireless communications modules were mostly add-on options rather than standard features. Therefore, solutions that allow manufacturers to configure an otherwise standard chassis simply by plugging in appropriate modules, including the necessary RF and baseband processing, have obvious attractions.
However, as combinations of these wireless communication channels become standard in equipment, the continued use of such dedicated modules becomes questionable. Not only will the aggregate size of the required modules become difficult to accommodate, their combined power consumption will threaten battery life and the increased silicon area will adversely impact product cost. In situations where several communications channels will need to be active at the same time, it will also become increasingly difficult to ensure coexistence because of the number of mutually interfering antennas required—which the industry refers to as the "porcupine problem".
Reducing size, cost, power consumption and antenna interference all point to an architecture in which all or part of the RF and baseband functionality is shared by different RF communication channels.
For example, in an integrated solution, channels operating in the same frequency band, such as Bluetooth and IEEE 802.11b/g, could intelligently share RF hardware such as an antenna, low-noise amplifier and mixer. Channels that utilize similar modulation schemes could share a single programmable modem. This will lead to new multiband, multimode RF architectures, in which RF is integrated with RF and modem with modem, preferably with a standardized digital interface in between. Allowing a single hardware modem to service several different wireless communications channels will necessitate the use of highly flexible software-programmable modem engines.
In practice, modem engines represent one of the best places for manufacturers to differentiate themselves in the marketplace, because these engines are a key area where wireless performance can be enhanced. The air-interface for any mobile communications standard is rigidly defined, offering manufacturers little opportunity to enhance RF front-end performance other than choosing the best RF technology in which to implement it—for example, using an RF CMOS, BiCMOS or GaAs process technology as appropriate. Likewise, the codec at the other end of the modem pipe is also well defined in terms of the type of algorithm needed to implement it. However, the all-important modem that sits between the RF front-end and the codec is where innovative use of proprietary IP can be used to process and condition the modulated/demodulated signal before it enters the codec in order to achieve a lower bit error rate (BER) or a reduction in transmit/receive power for the same BER.
Because it must be adaptive to local signal conditions, such as multi-path fading and interference, this signal processing/conditioning should ideally be performed by DSP algorithms executed on a high-end software-programmable DSP. Having such a programmable approach enables adaptation to changing standards and field test results. It also allows the addition of new, smarter algorithms (e.g. to improve signal/noise ratio or SNR), which is difficult to do afterwards in hardware-based solutions without a silicon re-spin.
Because of the complexity of these algorithms, processors used in modem pipe applications must be capable of impressive performance, typically in excess of 10 GOPS (giga operations per second). However, the battery-powered, portable nature of the devices for which they are designed also means that they must consume very little power, typically on the order of a few hundred milliwatts at most. Using advanced low-power/low-leakage CMOS fabrication technology limits the clock speed available for processors to below 300 MHz. To achieve the required GOPS ratings at these clock speeds, these processors therefore need to exploit a very high level of parallelism, for example by performing vector-wide processing.
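As a rough check on the 10 GOPS and 300 MHz figures quoted above, the parallelism needed follows from dividing the target throughput by the clock rate. The sketch below is only that arithmetic; the function name is illustrative and not taken from any vendor tool.

```python
import math

def required_ops_per_cycle(target_ops_per_second: float, clock_hz: float) -> int:
    """Smallest whole number of operations per clock cycle that reaches the target."""
    if target_ops_per_second <= 0 or clock_hz <= 0:
        raise ValueError("throughput and clock rate must be positive")
    return math.ceil(target_ops_per_second / clock_hz)

# Figures quoted in the text: at least 10 GOPS at a clock of at most 300 MHz.
ops_needed = required_ops_per_cycle(10e9, 300e6)
print(ops_needed)  # 34
```

In other words, a processor clocked at 300 MHz would have to complete at least 34 operations in every cycle, and more if the clock is lower, which is why a scalar datapath cannot meet the target.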
Examples of algorithms that can be vectorized to run on vector processors include algorithms for signal conditioning functions such as equalization, interference cancellation and multi-path correlation (rake receiver) and algorithms for signal processing functions such as synchronization, QAM (Quadrature Amplitude Modulation) mapping/demapping and FFTs for OFDM demodulation.
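QAM mapping and demapping vectorize well because every symbol is handled independently of its neighbors. The following is a minimal plain-Python sketch of 16-QAM mapping and hard-decision demapping. The Gray-coded level table and the function names are assumptions chosen for illustration, not the constellation of any particular standard.

```python
# Gray-coded amplitude levels for one axis (illustrative, not from a specific standard).
GRAY_TO_LEVEL = {(0, 0): -3, (0, 1): -1, (1, 1): 1, (1, 0): 3}
LEVEL_TO_GRAY = {level: bits for bits, level in GRAY_TO_LEVEL.items()}
LEVELS = (-3, -1, 1, 3)

def qam16_map(bits):
    """Map a flat bit list (length a multiple of 4) to complex 16-QAM symbols."""
    if len(bits) % 4 != 0:
        raise ValueError("bit count must be a multiple of 4")
    symbols = []
    for i in range(0, len(bits), 4):
        i_level = GRAY_TO_LEVEL[(bits[i], bits[i + 1])]      # in-phase axis
        q_level = GRAY_TO_LEVEL[(bits[i + 2], bits[i + 3])]  # quadrature axis
        symbols.append(complex(i_level, q_level))
    return symbols

def _nearest_level(x):
    """Hard decision: pick the constellation level closest to the received value."""
    return min(LEVELS, key=lambda level: abs(x - level))

def qam16_demap(symbols):
    """Recover the bit list from (possibly noisy) received symbols."""
    bits = []
    for s in symbols:
        bits.extend(LEVEL_TO_GRAY[_nearest_level(s.real)])
        bits.extend(LEVEL_TO_GRAY[_nearest_level(s.imag)])
    return bits

tx_bits = [0, 0, 1, 0, 1, 1, 0, 1]
tx_symbols = qam16_map(tx_bits)
rx_symbols = [s + complex(0.4, -0.4) for s in tx_symbols]  # small channel disturbance
rx_bits = qam16_demap(rx_symbols)
```

Because the per-symbol work has no data dependency between symbols, a vector processor can apply the same table lookup or threshold comparison to many symbols at once.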
There are, of course, other advantages to software programmability. It gives OEMs the ability to differentiate themselves in the marketplace using a single freely-available silicon platform, and it enables future shifts to new and more advanced algorithms. DSP-based modems are also much more flexible when it comes to upgrading modem performance and/or adding new features during the design-in process.
What are alternatives to a programmable architecture? So far, two other approaches are being pursued: hardwired dedicated building blocks, and reprogrammable/reconfigurable hardware (i.e. FPGAs).
Hardwired building blocks are currently used in handsets that have to implement only a relatively small number of (fixed) standards. While this approach is cost-efficient for a few standards, silicon area increases rapidly with the number of standards.
In fact, in a recent analysis by NXP Semiconductors of available solutions, a device capable of handling EDGE, R'99, HSDPA and HSUPA using the current dedicated-block approach had an area 50 percent to 120 percent larger than a programmable solution. One such programmable solution is NXP's EVP (see sidebar: NXP's EVP Architecture).
The main reason for this area difference is that the standards differ significantly, and the engineering time required to implement effective resource sharing across standards in a hardware solution is simply too high to optimize to this level. In addition, a programmable solution allows the addition of new, smarter algorithms without having to go through a new tape-out, as well as the flexibility to adapt to changing standards and field-test results.
The other common solution is the use of programmable/reconfigurable hardware, such as FPGAs—a common approach for 3G basestations. However, even though resource reuse can be even higher than in a programmable solution, current FPGAs are still relatively expensive with respect to silicon area, as effective gate area is significantly lower compared to a fixed implementation (either dedicated hardware or a programmable architecture). Also, due to the larger area, leakage current might be an issue, as this will have direct impact on the standby time of a phone.
Thus, from an area/cost perspective the programmable architecture is an optimal solution, while power consumption compared to a hardwired solution is slightly higher. From an overall system perspective, however, this is acceptable. For instance, in system studies, NXP saw a reduction in standby power, as the programmable approach allowed implementation of smarter algorithms to reduce active time during standby periods.
SOFTWARE-DEFINED RADIO
In implementing SDR, a "vector processor" is recommended as an extension to the classical SIMD type of processing. Adding "intra-vector processing" gives the processor the capability to have interaction between elements within a vector. This allows, for example, arbitrary re-orderings of data within a vector as needed for FFT butterflies, pilot channel removal and other computations common in communication signal processing. This significantly increases the computational efficiency compared to pure SIMD, where in such cases a fall-back to sequential processing is often the only solution.
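To make the role of data re-ordering concrete, here is a minimal plain-Python sketch of a radix-2 FFT in which the input is first shuffled by a bit-reversal pattern and each stage then applies butterflies to pairs of elements. It models the idea only: the helper names are hypothetical, and nothing here reflects an actual vector instruction set.

```python
import cmath

def bit_reverse_permutation(n):
    """Index pattern for the bit-reversal shuffle of a length-n vector (n a power of two)."""
    bits = n.bit_length() - 1
    return [int(format(i, f"0{bits}b")[::-1], 2) for i in range(n)]

def permute(vector, pattern):
    """Intra-vector shuffle: output element k takes input element pattern[k]."""
    return [vector[p] for p in pattern]

def fft(vector):
    """Iterative decimation-in-time FFT built from one shuffle plus butterfly stages."""
    n = len(vector)
    if n < 2 or n & (n - 1):
        raise ValueError("length must be a power of two, at least 2")
    data = permute(vector, bit_reverse_permutation(n))
    half = 1
    while half < n:
        # Twiddle factors for this stage: exp(-2*pi*i*k / (2*half)).
        twiddles = [cmath.exp(-1j * cmath.pi * k / half) for k in range(half)]
        out = list(data)
        for start in range(0, n, 2 * half):
            for k in range(half):
                a = data[start + k]
                b = data[start + k + half] * twiddles[k]
                out[start + k] = a + b          # butterfly upper output
                out[start + k + half] = a - b   # butterfly lower output
        data = out
        half *= 2
    return data

spectrum = fft([1, 2, 3, 4, 0, -1, -2, -3])
```

Each butterfly combines elements that sit `half` positions apart in the same vector, so a pure lane-by-lane SIMD unit cannot compute it without some means of moving data between lanes.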
With their ability to implement highly adaptive modem functions for many different communications standards and to negotiate smooth hand-over from one standard to another (Figure 2), programmable EVPs are a key enabler for SDR.
In addition to providing the very high GOPS ratings required, such processors also meet the silicon area and cost requirements of battery-powered mobile portable products. Their very high level of programmability not only accommodates the multiplicity of wireless communications systems that are appearing in mobile portable devices, it also allows manufacturers to keep pace with the evolution and usage scenarios of these standards and the evolution of new algorithms. They will also allow manufacturers to repair or upgrade their products "over-the-air" in order to reduce field returns or provide consumers with an enhanced user experience, such as wider coverage or higher bit rates for data download.
However, the software programmability of vector processors should only be applied when and where it provides real value in terms of reduced time to market, greater product differentiation or reduced cost. There are still some areas of baseband processing where software programmability may not be the answer and where more hardwired, more dedicated sub-modules are the most appropriate.
Codecs, for example, involve functions such as Viterbi and turbo coding/decoding that would consume significant processing resources on a software programmable codec engine, especially at the high bit-rates involved (typically in excess of 100Mbps). Yet these functions do not really require software programmability, as variations between standards are small. It therefore makes more sense to hardware-accelerate these functions in a reconfigurable codec solution than to implement them in a software-programmable one. The same is true for channel filtering.
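The cost of Viterbi decoding in software can be seen in a small sketch. The example below assumes a rate-1/2, constraint-length-3 convolutional code with generators 7 and 5 (octal), chosen purely for brevity and not taken from any standard named here. It uses hard decisions and illustrative function names.

```python
def _parity(x):
    """Parity (XOR of all bits) of a non-negative integer."""
    return bin(x).count("1") & 1

def _branch_output(u, state):
    """Encoder output pair and next state for input bit u from a 2-bit state."""
    reg = (u << 2) | state  # shift register: newest bit on the left
    return _parity(reg & 0b111), _parity(reg & 0b101), reg >> 1

def conv_encode(bits):
    """Rate-1/2 convolutional encoder starting from the all-zero state."""
    state, out = 0, []
    for u in bits:
        e0, e1, state = _branch_output(u, state)
        out += [e0, e1]
    return out

def viterbi_decode(received):
    """Hard-decision Viterbi decoder; returns the most likely input bit sequence."""
    if len(received) % 2:
        raise ValueError("received length must be even")
    inf = float("inf")
    metrics = [0, inf, inf, inf]       # encoder is known to start in state 0
    paths = [[], [], [], []]
    for i in range(0, len(received), 2):
        r0, r1 = received[i], received[i + 1]
        new_metrics = [inf] * 4
        new_paths = [None] * 4
        for state in range(4):          # add-compare-select over every state
            if metrics[state] == inf:
                continue
            for u in (0, 1):
                e0, e1, nxt = _branch_output(u, state)
                cost = metrics[state] + (e0 != r0) + (e1 != r1)  # Hamming distance
                if cost < new_metrics[nxt]:
                    new_metrics[nxt] = cost
                    new_paths[nxt] = paths[state] + [u]
        metrics, paths = new_metrics, new_paths
    best = min(range(4), key=lambda s: metrics[s])
    return paths[best]

message = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]  # last two zeros flush the encoder
coded = conv_encode(message)
corrupted = list(coded)
corrupted[3] ^= 1                          # inject a single bit error
decoded = viterbi_decode(corrupted)
```

The inner add-compare-select loop runs for every state and every received bit pair. Practical codes use many more states than the four shown here, so this fixed, repetitive work is a natural candidate for a hardware accelerator.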
SDR is therefore unlikely to be a fully software-programmable solution. Instead, it will be a mix of programmability and software-controlled reconfigurability in the RF front-end in which embedded microcontrollers, digital signal processors, vector processors and hardware accelerators all play a part.
With analog-to-digital and digital-to-analog conversion shifted firmly up to the intermediate frequency (IF) stage, SDR is also likely to influence the future partitioning of multi-mode, multi-channel RF transceivers. Channel filtering, modem and codec functionality are likely to move either into the host's baseband chip, or to be aggregated into a separate connectivity modem engine. This will not only reduce chip count. It will allow modem and baseband functionality to be quickly migrated from one CMOS process technology to the next, thereby enabling rapid cost-down. At the same time it will continue to allow RF front-ends and power amplifiers to be implemented in the technology providing the right performance.
For 2G, 2.5G and 3G mobile phone transceivers this is still likely to be a BiCMOS or III-V process for some time to come, although migration to RFCMOS is already happening in lower application segments.
SIDEBAR
NXP's EVP Architecture
The VD3204x EVP provides a parallel-processing architecture for communications processing. Targeted at the inner receiver, which addresses the modulation/demodulation part of a communication pipe, this processor permits programmability in areas previously implemented by hardwired accelerators. The VD3204x family is optimized for low-power SoC use, focusing on 3G+ cellular handsets and other battery-operated applications, where low-power operation and cost effectiveness are crucial.
The core of the VD32040 is a SIMD-like datapath, augmented by "intra-vector" operations that allow interaction between elements within a vector (permutations, additions, max/min search, etc.), expanding the range of algorithms that can be mapped onto this core. Operating on vectors of typically 256 bits, it can handle 16 operations on 16-bit data elements per functional unit. Due to the added VLIW parallelism, this adds up to 100 operations per cycle, thereby offering a significant amount of computational performance without having to run at GHz frequencies. This allows a power-friendly implementation style, still using standard library synthesis.
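The difference between a lane-wise SIMD operation and an intra-vector operation can be modeled in a few lines. The sketch below uses the 256-bit vector and 16-bit element sizes quoted above. Everything else, including the function names and the tree-shaped max search, is an assumption for illustration and does not represent the EVP instruction set or its C intrinsics.

```python
VECTOR_BITS = 256
ELEMENT_BITS = 16
LANES = VECTOR_BITS // ELEMENT_BITS  # 16 elements per vector

def lanewise_add(a, b):
    """Pure SIMD step: each lane is combined only with the same lane of the other vector."""
    if len(a) != LANES or len(b) != LANES:
        raise ValueError("operands must each hold exactly LANES elements")
    return [x + y for x, y in zip(a, b)]

def intra_vector_max(vector):
    """Max search across lanes by repeated halving.

    Returns (maximum value, a lane index holding it, number of halving steps).
    """
    if len(vector) != LANES:
        raise ValueError("operand must hold exactly LANES elements")
    candidates = [(value, lane) for lane, value in enumerate(vector)]
    steps = 0
    while len(candidates) > 1:
        half = len(candidates) // 2
        lower, upper = candidates[:half], candidates[half:]
        # Compare the lower half against the upper half, keeping the larger of each pair.
        candidates = [u if u[0] > l[0] else l for l, u in zip(lower, upper)]
        steps += 1
    value, lane = candidates[0]
    return value, lane, steps

samples = list(range(LANES))
samples[5] = 100
peak = intra_vector_max(samples)  # (100, 5, 4)
```

In this model the search over 16 lanes finishes in 4 halving steps rather than 15 sequential comparisons, which illustrates the kind of gain that cross-lane operations are meant to provide.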
Programming the VD3204x is accomplished in ANSI-C with intrinsics handling the vector functions, and is supported by a C++ host emulation library, a compiler allowing a higher level of abstraction compared to the classical assembly programming, plus a linker, cycle- and bit-accurate simulator and debugger/profiler.
CAPTIONS
Figure 1: Generic picture of a communication pipe, plus the mapping of the blocks on underlying hardware as envisioned by NXP.
Figure 2: Software-defined radio is likely to remain a mix of microcontroller, DSP, vector processing and hardware acceleration.