Signal processing is deployed in a wide range of equipment, such as radars, cell phones, missiles and spacecraft buses. Basic mathematical operations can be performed using discrete digital components; for example, addition can be performed by a full adder circuit, and subtraction can be implemented with the same circuit by representing the subtrahend in two's complement. Digital Signal Processing (DSP) platforms are integrated circuits designed to optimize the implementation of algorithms and mathematical operations such as sums and multiplications. The internal architecture of a floating-point DSP is more complex than that of a fixed-point device. In some optimized libraries, fixed-point operations are preferred because this format reduces the number of DSP clock cycles needed for a given calculation, at the cost of converting any existing floating-point variables to fixed point inside the code. Currently, powerful DSP hardware makes it possible to implement complex systems that require a large number of calculations in real-time applications, where processing speed is critical.
Orthogonal Frequency Division Multiplexing (OFDM) is a transmission technique currently available in commercial applications, such as wireless networks (Wi-Fi, IEEE 802.11) and cellular systems (LTE). In the early days of OFDM, analog implementations of this technique were highly complex because they required banks of discrete sinusoidal oscillators and coherent demodulators. The technique gained popularity once the digital approach adopted the Inverse Discrete Fourier Transform (IDFT) and the Discrete Fourier Transform (DFT) to modulate and demodulate the OFDM signal over N subcarriers, respectively. These operations can be computed efficiently with the Inverse Fast Fourier Transform (IFFT) and Fast Fourier Transform (FFT) algorithms.
Among the three impairments associated with the wireless radio channel, namely path loss, shadowing and fading, the OFDM system offers robustness against fading, provided that the number of subcarriers is large enough for the flat fading condition to be achieved on each subcarrier. The delay spread that occurs in multipath channels can cause intersymbol interference (ISI). To eliminate this effect, OFDM uses the Cyclic Prefix (CP). Adding the CP is straightforward in a digital implementation, because it amounts to a simple vector concatenation.
In the literature, several works discuss implementation aspects of specific parameters of the OFDM system. Dongwen et al. demonstrated the effectiveness of a PAPR reduction algorithm implemented on a C6713 DSP. Lee et al. implemented an optimization of FFT operations on an iPROVE FPGA board and compared the results with the Carmel DSP and the TMS320C62X. Jeng et al. implemented narrowband interference cancellation algorithms for multiband OFDM ultra-wideband systems on a TMS320C6713 DSP and discussed some implementation parameters.
This paper deals with the implementability of an OFDM transmission chain. The TMS320 DSP platform receives data from a computer, processes and modulates the data with an OFDM modulator, generates the channel coefficients and applies them to the signal, demodulates the OFDM symbols, and sends the result back to the computer. The TMS320C6678 DSP deployed is a high-performance multicore processor with 8 cores, each running at up to 1.25 GHz and supporting fixed- and floating-point operations. It provides I2C and SPI interfaces, a 64-bit DDR3 interface, sixteen 64-bit timers and 16 GPIO pins. The evaluation board available at UEL's laboratory is a TMDSEVM6678L, which has an RS232 UART for serial communication, 512 MB of DDR3 memory, an Ethernet interface and other features.
This work is structured as follows. Section 2 presents the theory of the OFDM system, including block diagrams, mathematical relations and some OFDM design parameters. Section 3 describes implementation details and the tools used. Section 4 presents the simulations and discusses the results. Section 5 highlights the conclusions of the work.
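Notation: $\mathcal{F}\{\cdot\}$ and $\mathcal{F}^{-1}\{\cdot\}$ represent the direct and inverse Fourier transform, respectively; $\circledast$ represents the circular convolution operator; $\Re\{\cdot\}$ represents the real part of a complex number; $X \sim \mathcal{U}(a, b)$ and $X \sim \mathcal{N}(\mu, \sigma^2)$ mean that X follows a Uniform and a Gaussian statistical distribution, respectively.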
2. OFDM System
A block diagram of an OFDM transmitter is depicted in Figure 1. Information bits are mapped into constellation symbols (for example, BPSK, QPSK, or QAM) and passed through a serial-to-parallel (S/P) converter; the IDFT is then performed, the cyclic prefix (CP) is added, and the data are converted from parallel to serial (P/S), converted to an analog signal (D/A), up-converted to passband at the carrier frequency $f_c$ and finally sent to the receiver over a wireless fading channel.
The OFDM receiver is depicted in Figure 2. The signal that reaches the receiver antenna is converted to baseband (coherent multiplication by a local oscillator at the carrier frequency $f_c$, followed by filtering) and discretized by an analog-to-digital (A/D) converter; the cyclic prefix is then removed and the data are parallelized. After that, the DFT is calculated, the elements are converted back to serial and finally demapped according to the adopted constellation to obtain the demodulated bits.
Figure 1. OFDM transmitter block diagram.
Figure 2. OFDM receiver block diagram.
OFDM systems use a large number of orthogonal subcarriers to transmit information; hence, the data rate on each subchannel is much lower than the system data rate. Mathematically, two signals $\phi_k(t)$ and $\phi_l(t)$ are orthonormal if they satisfy the orthogonality condition:
$$\frac{1}{T_s}\int_{0}^{T_s} \phi_k(t)\,\phi_l^{*}(t)\,dt = \begin{cases} 1, & k = l \\ 0, & k \neq l \end{cases} \qquad (1)$$
where $T_s$ represents an OFDM symbol period. So, two subcarriers $\phi_k(t)$ and $\phi_l(t)$, with $k \neq l$, are orthogonal if the integral of their product over one symbol period is zero. It has been proven in the literature that the subcarrier spectra can overlap by 50% without loss of orthogonality.
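The orthogonality condition can be checked numerically. The sketch below is illustrative only and is written in Python for brevity (the DSP code of this work is in C); it assumes complex exponential subcarriers with an integer number of cycles per symbol period, sampled uniformly, and the function name and the number of samples are hypothetical choices.

```python
import cmath


def subcarrier_inner_product(k, l, n_samples=64):
    """Discrete approximation of (1/T) * integral over one symbol period of
    phi_k(t) * conj(phi_l(t)), with phi_k(t) = exp(j*2*pi*k*t/T) sampled at
    t = n*T/n_samples (assumed subcarrier model)."""
    total = 0j
    for n in range(n_samples):
        phi_k = cmath.exp(2j * cmath.pi * k * n / n_samples)
        phi_l = cmath.exp(2j * cmath.pi * l * n / n_samples)
        total += phi_k * phi_l.conjugate()
    return total / n_samples
```

For equal indices the result is close to one, and for distinct indices it is close to zero, which is the behavior the orthogonality condition requires.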
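The transmitted OFDM signal can be represented in the time domain as:
$$s(t) = \Re\left\{\frac{1}{N}\sum_{k=0}^{N-1} x[k]\, e^{j 2\pi f_k t}\right\}, \qquad 0 \le t < T_s, \qquad (2)$$
where $x[k]$ represents the $k$-th modulation symbol, $T_s$ the OFDM symbol period, $f_c$ the carrier frequency, $f_k = f_c + k/T_s$ the $k$-th subcarrier frequency, with adjacent subcarriers separated by $1/T_s$, and $N$ is the number of subcarriers. In the discrete baseband form, the OFDM signal can be represented as:
$$s[n] = \frac{1}{N}\sum_{k=0}^{N-1} x[k]\, e^{j 2\pi k n / N}, \qquad n = 0, \ldots, N-1. \qquad (3)$$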
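The discrete form of the OFDM signal is the IDFT of the modulation symbols, and the receiver recovers them with the DFT. The following Python sketch is a direct O(N^2) evaluation for illustration only; it assumes the 1/N scaling on the IDFT side, and the function names are hypothetical. The DSP implementation relies on optimized FFT/IFFT routines instead.

```python
import cmath


def ofdm_modulate(symbols):
    """IDFT of the modulation symbols x[k], with 1/N scaling (assumed convention)."""
    n_sub = len(symbols)
    return [
        sum(symbols[k] * cmath.exp(2j * cmath.pi * k * n / n_sub) for k in range(n_sub)) / n_sub
        for n in range(n_sub)
    ]


def ofdm_demodulate(samples):
    """DFT of the time-domain samples s[n], recovering x[k]."""
    n_sub = len(samples)
    return [
        sum(samples[n] * cmath.exp(-2j * cmath.pi * k * n / n_sub) for n in range(n_sub))
        for k in range(n_sub)
    ]
```

With four QPSK symbols as input, demodulating the modulated vector returns the original symbols up to floating-point rounding.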
In Figure 3, Equation (2) is sketched in the frequency domain for N = 4 subcarriers, with the x-axis representing frequency and the y-axis the Power Spectral Density (PSD). Substituting the index k with its range of values (k = 0, ..., 3), we can see the spectra of the 4 overlapping subcarriers. As depicted in this figure, the carrier frequency is not at the center of the spectrum occupied by the transmitted OFDM symbol, and adjacent subcarriers are separated by $1/T_s$, the inverse of the OFDM symbol period.
Figure 3. Subcarriers for N = 4. Non-centered OFDM spectrum with 50% overlap.
The received signal is corrupted by multiplicative noise. Considering only the effect of the fading channel, i.e., assuming that the additive white Gaussian noise can be neglected (high SNR regime), the received signal $r[n]$ in the discrete-time domain can be expressed as the circular convolution of the signal s[n] and the channel impulse response h[n]:
$$r[n] = s[n] \circledast h[n]. \qquad (4)$$
Hence, the original signal s[n] can be recovered in the frequency domain using frequency equalization:
$$\hat{s}[n] = \mathcal{F}^{-1}\left\{\frac{\mathcal{F}\{r[n]\}}{\mathcal{F}\{h[n]\}}\right\}, \qquad (5)$$
where $\mathcal{F}\{\cdot\}$ and $\mathcal{F}^{-1}\{\cdot\}$ represent the direct and inverse Fourier transform, respectively.
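The frequency equalization step can be illustrated with a small noiseless example. The Python sketch below is an assumption-laden illustration, not the DSP code: it uses a direct DFT, a hypothetical three-tap channel, perfect channel knowledge, and it assumes that the channel frequency response has no zeros.

```python
import cmath


def dft(x):
    n_len = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / n_len) for n in range(n_len))
            for k in range(n_len)]


def idft(x_freq):
    n_len = len(x_freq)
    return [sum(x_freq[k] * cmath.exp(2j * cmath.pi * k * n / n_len) for k in range(n_len)) / n_len
            for n in range(n_len)]


def circular_convolution(s, h):
    """r[n] = sum_m h[m] * s[(n - m) mod N], assuming len(h) <= len(s)."""
    n_len = len(s)
    return [sum(h[m] * s[(n - m) % n_len] for m in range(len(h))) for n in range(n_len)]


def equalize(r, h):
    """Recover s[n] by dividing the DFT of r[n] by the DFT of the zero-padded h[n]."""
    n_len = len(r)
    h_padded = list(h) + [0] * (n_len - len(h))
    r_freq = dft(r)
    h_freq = dft(h_padded)
    return idft([r_freq[k] / h_freq[k] for k in range(n_len)])
```

Because circular convolution in time corresponds to a per-subcarrier product in frequency, one complex division per subcarrier is enough to undo the channel in this idealized setting.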
To obtain satisfactory performance, the OFDM system must operate under the flat fading condition. In other words, the channel coherence bandwidth $(\Delta B)_c$ needs to be considerably greater than the bandwidth of each subchannel:
$$(\Delta B)_c \gg \frac{W}{N}, \qquad (6)$$
where W represents the OFDM system bandwidth and N the number of subcarriers. Note that the right side of Equation (6) considers a system without spectral overlap. The number of subcarriers is therefore a design parameter of the OFDM system and can be increased to achieve the flat fading condition.
Another channel parameter that influences the OFDM system design is the delay spread, which is related to the power delay profile (PDP). In a multipath channel, the signal is reflected by many surfaces, resulting in delayed versions of the original signal and causing ISI. The PDP represents the average power of each received replica as a function of its delay. There are different models for the PDP, for example the uniform model or decreasing exponential models, such as the one adopted for IEEE 802.11.
To eliminate the ISI caused by the multipath channel, the CP is added to the signal vector. Considering h[n] the discrete channel impulse response with length $L$, and s[n] the discrete form of the information signal, the CP addition is performed by copying the last $N_{cp}$ samples of s[n] to the beginning of the vector. Figure 4 shows a graphic representation of this operation: the last samples are concatenated to the beginning of the vector to create the discrete OFDM symbol, ready to be transmitted. If the CP length covers the channel memory, i.e., $N_{cp} \geq L - 1$, the ISI can be eliminated at the receiver side.
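The CP addition and removal can be sketched as plain list operations. The Python example below is illustrative only; the function names and the test channel are hypothetical. It also includes a direct linear convolution so that the effect of the CP can be verified: after the prefix is removed, the output of the linear channel equals the circular convolution of one symbol with the channel, provided the prefix is at least as long as the channel memory.

```python
def add_cyclic_prefix(symbol, n_cp):
    """Copy the last n_cp samples of the symbol to its beginning."""
    if not 0 <= n_cp <= len(symbol):
        raise ValueError("n_cp must be between 0 and the symbol length")
    return symbol[len(symbol) - n_cp:] + symbol


def remove_cyclic_prefix(received, n_cp):
    """Drop the first n_cp samples of the received vector."""
    return received[n_cp:]


def linear_convolution(x, h):
    """Direct-form linear convolution, modelling the multipath channel."""
    y = [0] * (len(x) + len(h) - 1)
    for i, x_i in enumerate(x):
        for m, h_m in enumerate(h):
            y[i + m] += x_i * h_m
    return y


def circular_convolution(s, h):
    """Reference result: sum_m h[m] * s[(n - m) mod N]."""
    n_len = len(s)
    return [sum(h[m] * s[(n - m) % n_len] for m in range(len(h))) for n in range(n_len)]
```

For a channel with three taps, a prefix of two samples is sufficient in this model; a shorter prefix leaves residual interference in the first samples of the symbol.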
3. DSP Implementation
Figure 4. Cyclic Prefix (CP) addition on the OFDM symbol.
In this work, the Code Composer Studio (CCS) software was used as the Integrated Development Environment (IDE) to write, compile and debug the entire code. The BIOS Multicore Software Development Kit (MCSDK) was also installed; it provides boot utilities, chip support libraries, drivers and basic platform utilities. The code was written in C, and the serial interface was used for communication between the TMDSEVM6678 DSP board and the personal computer (PC), as represented in Figure 5. The information bits from the PC are passed to the DSP platform, where the data are processed to form the OFDM symbols and the channel distortions are applied; the OFDM symbols are then demodulated and sent back to the computer. The noise samples were generated inside the DSP platform via the Box-Muller transform, which is described in Section 3.1. The modulation used was M-QAM with Gray coding and a conventional slicer for the hard-decision regions.
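A Gray-coded square M-QAM mapper with a hard-decision slicer can be sketched as follows. This Python example is only an illustration under stated assumptions: it is not the C code executed on the DSP, it assumes an unnormalized constellation with odd-integer amplitudes, it assigns the first half of each bit group to the in-phase axis and the second half to the quadrature axis, and it implements the slicer by rounding to the nearest level rather than by explicit region tests. All function names are hypothetical.

```python
import math


def gray_encode(index):
    """Binary index -> Gray code."""
    return index ^ (index >> 1)


def gray_decode(code):
    """Gray code -> binary index."""
    index = 0
    while code:
        index ^= code
        code >>= 1
    return index


def _bits_per_axis(m_order):
    bits_per_symbol = m_order.bit_length() - 1
    if m_order != 1 << bits_per_symbol or bits_per_symbol == 0 or bits_per_symbol % 2:
        raise ValueError("m_order must be an even power of two (4, 16, 64, 256, ...)")
    return bits_per_symbol // 2


def qam_modulate(bits, m_order):
    """Map a list of 0/1 bits to complex symbols with amplitudes in {..., -3, -1, 1, 3, ...}."""
    half = _bits_per_axis(m_order)
    levels = 1 << half
    if len(bits) % (2 * half):
        raise ValueError("number of bits must be a multiple of log2(m_order)")
    symbols = []
    for start in range(0, len(bits), 2 * half):
        chunk = bits[start:start + 2 * half]
        amplitudes = []
        for axis_bits in (chunk[:half], chunk[half:]):
            code = 0
            for bit in axis_bits:
                code = (code << 1) | bit
            amplitudes.append(2 * gray_decode(code) - (levels - 1))
        symbols.append(complex(amplitudes[0], amplitudes[1]))
    return symbols


def qam_demodulate(symbols, m_order):
    """Hard decision: slice each axis to the nearest level, then undo the Gray mapping."""
    half = _bits_per_axis(m_order)
    levels = 1 << half
    bits = []
    for symbol in symbols:
        for amplitude in (symbol.real, symbol.imag):
            index = math.floor((amplitude + levels - 1) / 2 + 0.5)
            index = max(0, min(levels - 1, index))  # clamp outer decision regions
            code = gray_encode(index)
            bits.extend((code >> shift) & 1 for shift in range(half - 1, -1, -1))
    return bits
```

With Gray coding, neighboring amplitude levels differ in a single bit, so the most likely symbol errors produce only one bit error each.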
Some functions are implemented in an optimized way for specific DSP platforms. For the TMDSEVM6678 platform, the TI C6000 DSPLIB library provides optimized signal processing routines. In this work, the FFT, IFFT and convolution operations were performed with the DSPLIB optimized functions. According to the DSPLIB documentation, the input vectors of the FFT and IFFT functions used in this work must be of type short in Q15 format, i.e., one sign bit and 15 fractional bits, representing values in the range [-1, 1). The user should be careful about overflow, whose likelihood depends on the FFT order.
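The conversion between floating-point values and 16-bit fixed-point values with 15 fractional bits can be sketched as below. This Python example only illustrates the number format; it does not reproduce any DSPLIB routine, and the helper names, the rounding rule and the saturation behavior are assumptions made for the illustration.

```python
Q15_SCALE = 1 << 15   # 2^15 = 32768
Q15_MAX = 32767       # largest 16-bit signed value, just below +1.0
Q15_MIN = -32768      # smallest 16-bit signed value, exactly -1.0


def saturate_q15(value):
    """Clamp an integer to the 16-bit signed range."""
    return max(Q15_MIN, min(Q15_MAX, value))


def float_to_q15(x):
    """Scale by 2^15, round to the nearest integer and saturate."""
    return saturate_q15(int(round(x * Q15_SCALE)))


def q15_to_float(q):
    """Interpret a 16-bit signed integer as a fraction in [-1, 1)."""
    return q / Q15_SCALE


def q15_multiply(a, b):
    """Fixed-point product: the 30-fractional-bit result is shifted back to 15 bits."""
    return saturate_q15((a * b) >> 15)
```

Values outside [-1, 1) saturate in this sketch, which is one reason why the input data are usually scaled down before a fixed-point FFT of large order.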
To measure the DSP clock cycles consumed by a piece of code, CCS provides the necessary tools. The user can obtain the clock information by enabling the clock in the clock menu or by using the Profiler tool. Another method is to read the time-stamp counter from the code: the header c6x.h exposes the TSCL and TSCH counter registers and the intrinsic _itoll, which combines them into a 64-bit cycle count that can be stored in a variable.
3.1. Random Number Generation
Figure 5. Block diagram representing the communication between the computer and the DSP platform.
Random variables with Gaussian distribution, zero mean and unit standard deviation were generated with the Marsaglia polar form of the Box-Muller transformation. This method considers two random variables (r.v.) $u_1, u_2 \sim \mathcal{U}(-1, 1)$ and $w = u_1^2 + u_2^2$, keeping only the pairs for which $0 < w < 1$, and applies the transformation:
$$z_1 = u_1\sqrt{\frac{-2\ln w}{w}}, \qquad z_2 = u_2\sqrt{\frac{-2\ln w}{w}}, \qquad (7)$$
in which the resulting random variables follow a normalized Gaussian distribution: $z_1, z_2 \sim \mathcal{N}(0, 1)$.
Finally, to obtain different values of mean ($\mu$) and variance ($\sigma^2$) from $z \sim \mathcal{N}(0, 1)$, a simple linear transformation is applied:
$$x = \mu + \sigma z. \qquad (8)$$
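The Marsaglia polar method and the final linear transformation can be sketched as follows. This Python example is an illustration and not the C routine executed on the DSP; it relies on the standard library uniform generator, and the function names are hypothetical.

```python
import math
import random


def marsaglia_polar(rng):
    """Return two independent N(0, 1) samples using the Marsaglia polar method."""
    while True:
        u1 = rng.uniform(-1.0, 1.0)
        u2 = rng.uniform(-1.0, 1.0)
        w = u1 * u1 + u2 * u2
        if 0.0 < w < 1.0:  # reject points outside the unit circle (and the origin)
            factor = math.sqrt(-2.0 * math.log(w) / w)
            return u1 * factor, u2 * factor


def gaussian_sample(rng, mean, std):
    """Linear transformation of a normalized sample: x = mean + std * z."""
    z, _ = marsaglia_polar(rng)
    return mean + std * z
```

A usage example is `gaussian_sample(random.Random(1), 0.0, 1.0)`. The rejection step discards on average about 21% of the uniform pairs, in exchange for avoiding the trigonometric functions of the basic Box-Muller form.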
4. Results and Discussion
In this section, some experiments using the DSP platform are described and analyzed. First, the 256-QAM OFDM algorithm is validated by comparing the bit error rate (BER) values obtained via Monte Carlo simulation with the theoretical curves of 256-QAM modulation. Under the flat fading condition, the 256-QAM OFDM BER performance should be as close as possible to the conventional 256-QAM BER performance. After this validation of the Tx-Rx OFDM code, an experiment sending image data to the DSP platform was performed to corroborate the effectiveness of the proposed DSP-based OFDM implementation. The image data generated at the PC were sent to the DSP platform, modulated and converted to OFDM symbols, and sent to the receiver through a simulated wireless fading channel in the discrete-time domain. Additive thermal noise was included at the receiver input; the OFDM symbols were then demodulated and sent back to the PC, where the original data were compared with the recovered data to determine the average BER. As a figure of merit of implementability, the DSP resources allocated to the Tx-Rx OFDM execution were determined in terms of memory occupation and DSP clock cycles.
The OFDM system performance was measured by verifying the bit error rate (BER) in two different channels: a) a multipath fading channel; and b) a pure additive white Gaussian noise (AWGN) channel, used only for noise power calibration. The simulation parameters are summarized in Table 1. On the receiver side, perfect channel state information (CSI) was assumed and the channel fading effect was removed using frequency equalization, as indicated in Equation (5).
The BER performance simulated on the DSP platform and the theoretical 256-QAM curves are presented in Figure 6. As expected, for both the fading and the AWGN channels, the 256-QAM OFDM performance converges to that of conventional 256-QAM over the corresponding channel. The BER results obtained were consistent with the theory.
Table 1. OFDM BER performance simulation parameters.
Figure 6. DSP BER performance of a 256-QAM OFDM system.
4.2. Image Data
The image sent in this experiment was Lenna, an image commonly used in image processing. The parameters used are the same as those presented in Table 1, except for the channel condition: only the flat fading channel was considered. The Lenna image was recovered at the receiver side for two different values of the signal-to-noise relation, 0 and 18 dB; the quality of the recovered images is shown in Figure 7(a) and Figure 7(b), respectively. The BER values measured after the channel effects were 0.2424 and 0.0203, respectively. These values were consistent with the ones obtained via Monte Carlo simulation in Figure 6 for the corresponding noise levels. In Figure 7, the image on the left, with the higher BER, is visibly more degraded than the image on the right, with the lower BER.
4.3. DSP Resources
The implementability of the DSP-based OFDM system was verified by measuring the DSP platform resources allocated to the execution of the entire code. To perform this task, the number of DSP clock cycles was counted. For the modulation, the 256-QAM modulation, the IFFT calculation and the CP addition were taken into account. For the demodulation, the CP removal, the FFT calculation, the channel equalization and the 256-QAM demodulation were considered. The mean values measured are presented in Table 2.
Figure 7. Same image recovered in the DSP platform after the fading channel emulation, considering two different values of noise power; the respective BER is highlighted. (a) 0 dB, BER = 0.2424; (b) 18 dB, BER = 0.0203.
Table 2. Mean values of DSP cycles count for 256 QAM OFDM with FFT size N = 256.
The number of cycles consumed varied from one OFDM symbol to another because of the way the QAM modulation and demodulation were implemented. To demodulate the signal, the implemented code divides the constellation map into decision regions and checks whether or not the received symbol lies inside a specific region in order to demap the symbol. The modulator uses two "for" loops to convert the information into in-phase and quadrature components, and this part of the code was responsible for a large share of the total cycles consumed in the modulation process.
Even though the demodulation involves more processing blocks, notably the channel equalization, the number of cycles consumed in the OFDM demodulation process was smaller than the number required by the OFDM modulation.
To measure the code size, the file with extension .map generated during the build was inspected. The memory parameters are presented in Table 3.
According to the consulted documentation, the memory address 0x0C000000 corresponds to the DDR3 memory with a size of 256 Mbytes. In Table 3, the MSMCSRAM memory shows a smaller size, 0x00200000, which was caused by definitions in the linker.cmd file inside the CCS project. Converted to decimal, the memory used corresponds to 457,490 bytes. The percentage consumed was very low compared with the total 256 Mbytes available at the address 0x0C000000, so memory was not a critical parameter for the system implementation.
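The memory occupation can be expressed as a percentage with a short calculation. The Python sketch below uses only the figures quoted in the text (457,490 bytes used, a section size of 0x00200000 and a total of 256 Mbytes) and assumes that one Mbyte means 2^20 bytes; the function and constant names are hypothetical.

```python
USED_BYTES = 457_490                 # memory used, as read from the .map file
SECTION_BYTES = 0x00200000           # size configured in linker.cmd (2,097,152 bytes)
TOTAL_BYTES = 256 * 1024 * 1024      # 256 Mbytes, assuming 1 Mbyte = 2**20 bytes


def occupation_percent(used_bytes, available_bytes):
    """Percentage of the available memory that is occupied."""
    return 100.0 * used_bytes / available_bytes


section_percent = occupation_percent(USED_BYTES, SECTION_BYTES)
total_percent = occupation_percent(USED_BYTES, TOTAL_BYTES)
```

Under these assumptions, the code occupies roughly 22% of the configured section and less than 0.2% of the 256 Mbytes quoted in the text.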
Table 3. DSP Memory addresses and memory occupation obtained consulting the .map file.
5. Conclusions
Signal processing has played an important role in simplifying the implementation of OFDM: in multicarrier systems, the FFT and IFFT are used to modulate the signal instead of a large number of discrete analog oscillators. In this work, a 256-QAM DSP-based baseband OFDM system was implemented and analyzed in terms of BER performance, qualitative recovered image performance, DSP clock cycles and DSP memory requirement.
The BER performance of the 256-QAM OFDM system was obtained via Monte Carlo simulation, and the values were corroborated by the theoretical curve of a conventional single-carrier 256-QAM system. Under the flat fading condition, the M-QAM OFDM performance converged to that of M-QAM modulation, and the code implemented on the DSP behaved as expected.
Regarding the clock cycles, the number of DSP cycles consumed to modulate the OFDM signal was greater than the number used to demodulate it. The reason is that the M-QAM modulation function was responsible for a large share of the cycles used. In future work, assembly-level optimizations should be implemented and the resulting gain in DSP clock cycles quantified.
Regarding the memory requirement of the entire baseband OFDM implementation, the allocated memory amounted to 457,490 bytes of the memory block at the address 0x0C000000, which was not a critical issue for the deployed DSP platform.
With the growth of new technologies such as the Internet of Things, in which countless chips communicate with each other, there is great interest in investigating the viability of implementing sophisticated but efficient wireless transmission and modulation schemes on chips with scarce resources. This paper provides a multi-functional analysis of the implementability of an OFDM baseband system on a robust DSP platform, which can be extended to other DSP, FPGA or microprocessor platforms.
The authors thank Raul Ambrosio Valente Neto for the explanation on random number generation. Colleagues at the T & SP Lab at UEL are gratefully acknowledged for the conceptual discussions and hints.
 Dongwen, N., Baohui, Z., Dong, L. and Bo, Z.Q. (2011) Implementation of Algorithm for Reducing the PAPR of OFDM System Based on DSP. 46th International Universities’ Power Engineering Conference, Soest, 5-8 September 2011, 1-3.
 Lee, J.H., Moon, J.H., Heo, K.L., Sunwoo, M.H., Oh, S.K. and Kim, I.H. (2004) Implementation of Application-Specific DSP for OFDM Systems. 2004 IEEE International Symposium on Circuits and Systems (IEEE Cat. No.04CH37512), Vol. 3, III-665-8.
 Jeng, L.-D., Meng, F.-W. and Yang, C.-C. (2009) DSP Implementation of Narrowband Interference Cancellation Algorithms in MB-OFDM Based Ultra-Wideband Communication System. 2009 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Kanazawa, 7-9 January 2009, 146-149.