Abstract — A new behavioral model is proposed which can provide similar accuracy to the memory polynomial model (MPM) but is shown to have a more efficient architecture for digital hardware implementation. These features have been achieved by the proposed formulation, in which the nonlinear weights of the memory terms are a function of only the present input sample. The new approach is evaluated and compared with the conventional MPM using a Wideband Code Division Multiple Access (WCDMA) signal applied to two different amplifier circuits.

Index Terms — behavioral modeling; memory polynomial; digital predistortion; linearization; memory effects.

I. INTRODUCTION

Modern wireless communication industries are developing at a rapid pace. The increasing demand for high data rates is resulting in increasing bandwidths of modulation schemes used in telecommunication standards. Very often widened signal bandwidths and efficient PA architectures lead to inevitable memory effects which considerably degrade the quality of the modulation on the transmitted signal. Moreover, these modern standards have highly varying envelope signals with high peak-to-average power ratios (PAPR), even more than 10 dB. This large variation makes it challenging to have both a linear and energy efficient system.

One main approach to overcome the aforementioned problems is using digital predistortion to compensate for the memory effects and to linearize the system. The predistorter can be made by inverse modeling. The model should include memory and nonlinearity effects [1], [2]. For this purpose, numerous methods have been proposed in the literature [1]-[3]. Among them, Volterra-based models have been extensively used, thanks to their simplicity and good performance [4]. One of the most well-established classes of Volterra-based models is the memory polynomial model (MPM) [5] and those originating from it [4]. MPM has been extensively used as the reference model for comparison with new methods, because of its relatively good accuracy and mathematically compact form. The model can be further improved by including cross terms at the cost of increased complexity [4].

In this paper, a novel model is proposed as a modification of the MPM. With the suggested formulation, the new cross-memory polynomial model (CMPM) partly benefits from the effect of cross terms in the generalized memory polynomial model (GMPM), although it has the same number of coefficients as the MPM. The model is evaluated and compared with the MPM. The results confirm its suitable performance for a wide bandwidth and over a wide range of power levels. Optimized architectures for both MPM and CMPM are also proposed to reduce the number of floating-point operations (FLOPs) [6]. Based on this architecture, the CMPM is more efficient than the MPM in FPGA implementation.

The paper is organized as follows. In section II, the approach is introduced. Section III discusses the evaluation results, and the conclusion is made in section IV.

II. MODELING APPROACH

A conventional MPM is formulated as [5]

\[ y(n) = \sum_{m=0}^{M-1} g_m(|x(n-m)|)\, x(n-m) \tag{1} \]

where \( g_m(|x(n-m)|) \) is the nonlinearity term, and \( M \) is the memory depth. This equation shows that the output is the weighted sum of the input samples at recent time instants. The weight of each sample is a nonlinear function of the same sample, defined by

\[ g_m(|x(n-m)|) = \sum_{k=1}^{K} a_{km} |x(n-m)|^{k-1} \tag{2} \]

where \( K \) is the nonlinearity order, and \( a_{km} \)'s are the coefficients which should be identified. In this paper, \( g_m \) is called the nonlinear weight. The nonlinear function indicates the nonlinearity of the memory effect.
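
Equations (1) and (2) can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function name `mpm_output`, the coefficient layout `a[k-1, m]`, and the assumption that samples before the start of the record are zero are all choices made for this example.

```python
import numpy as np

def mpm_output(x, a):
    """Memory polynomial output following eqs. (1)-(2).

    x : 1-D complex array of input samples x(n).
    a : array of shape (K, M); a[k-1, m] plays the role of a_{km}.
    Samples before the start of the record are assumed to be zero.
    """
    x = np.asarray(x, dtype=complex)
    a = np.asarray(a, dtype=complex)
    K, M = a.shape
    y = np.zeros_like(x)
    for m in range(M):
        # Delayed input x(n-m), zero-padded at the start of the record.
        xd = np.concatenate((np.zeros(m, dtype=complex), x[:len(x) - m]))
        # Nonlinear weight g_m(|x(n-m)|) = sum_k a_{km} |x(n-m)|^(k-1).
        g = sum(a[k, m] * np.abs(xd) ** k for k in range(K))
        y += g * xd
    return y
```

Note that the weight of tap `m` is evaluated on the delayed magnitude `np.abs(xd)`, so each tap needs its own set of magnitude powers.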

MPM yields very good results in modeling nonlinear circuits with memory. Its performance can even be improved if the cross terms are included in the formulation [4], [6]. The modified model is the generalized memory polynomial, denoted as follows

\[ y(n) = \sum_{m=0}^{M-1} g_m \left( |x(n)|, |x(n-1)|, \ldots, |x(n-(M-1))| \right) x(n-m) \tag{3} \]

\[ g_m \left( |x(n)|, |x(n-1)|, \ldots, |x(n-(M-1))| \right) = \sum_{k=1}^{K} a_{km}^{(0)} |x(n)|^{k-1} + \sum_{k=1}^{K} a_{km}^{(1)} |x(n-1)|^{k-1} + \ldots + \sum_{k=1}^{K} a_{km}^{(M-1)} |x(n-(M-1))|^{k-1} \tag{4} \]

As can be clearly seen, the number of coefficients in the generalized formulation has been increased. This rise leads to a higher computational load.
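
As a rough illustration of this increase, the coefficients can be counted directly from the two formulations as written. The symbols \( N_{\mathrm{MPM}} \) and \( N_{\mathrm{GMPM}} \) are introduced here only for this count, and \( M = 4 \), \( K = 7 \) are illustrative values:

\[ N_{\mathrm{MPM}} = KM, \qquad N_{\mathrm{GMPM}} = KM^{2}, \]

since each of the \( M \) generalized weights contains \( M \) sums of \( K \) terms. For \( M = 4 \) and \( K = 7 \) this gives 28 coefficients for the memory polynomial against 112 for the generalized form.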

This paper proposes a new model which has the simplicity of the memory polynomial formulation but partly benefits from the cross terms, as in the GMPM. The idea is to use the magnitude of the present input sample to compute the nonlinear weights of the memory terms. This modification stems from the fact that the system has memory, so its recent samples are linked together. Evaluating the validity of this assumption is the subject of this paper. The suggested new model is written as

\[ y(n) = \sum_{m=0}^{M-1} g_m(|x(n)|)\, x(n-m) \tag{5} \]

\[ g_m(|x(n)|) = \sum_{k=1}^{K} a_{km} |x(n)|^{k-1} \tag{6} \]

The number of coefficients is the same as in MPM. It should be mentioned that, although only the first order of the delayed input is used in the formulation, the method still models nonlinear memory effects. That is because the weight of each term is still nonlinear, even though it is now a nonlinear function of only the current sample.
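
The proposed formulation, in which every weight \( g_m \) is a function of the present magnitude \( |x(n)| \) and multiplies the delayed sample \( x(n-m) \), can be sketched as below. As before, the function name `cmpm_output`, the coefficient layout `a[k-1, m]`, and the zero initial conditions are assumptions made for this illustration.

```python
import numpy as np

def cmpm_output(x, a):
    """Cross-memory polynomial output following eqs. (5)-(6).

    x : 1-D complex array of input samples x(n).
    a : array of shape (K, M); a[k-1, m] plays the role of a_{km}.
    Samples before the start of the record are assumed to be zero.
    """
    x = np.asarray(x, dtype=complex)
    a = np.asarray(a, dtype=complex)
    K, M = a.shape
    # Powers |x(n)|^0 ... |x(n)|^(K-1) of the PRESENT sample only;
    # they are computed once and shared by all memory taps.
    mag_pow = np.abs(x)[None, :] ** np.arange(K)[:, None]  # shape (K, N)
    y = np.zeros_like(x)
    for m in range(M):
        # Delayed input x(n-m), zero-padded at the start of the record.
        xd = np.concatenate((np.zeros(m, dtype=complex), x[:len(x) - m]))
        g = a[:, m] @ mag_pow  # nonlinear weight g_m(|x(n)|)
        y += g * xd
    return y
```

The only magnitude powers needed are those of the present sample, which is the property the text relates to the reduced implementation complexity.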

Since the memory term is at a different time from the variable of the nonlinear weight for \( m \neq 0 \), eq. (5) has cross terms which endow the model with an improved characterization performance compared to those found in eq. (3), and which can compensate for the simplification of the nonlinear function in eq. (6). This feature is discussed in the next section. Moreover, the absolute input value only at the present time is considered for the nonlinear gain computation. Hence the complexity, especially for FPGA implementation, can be reduced and the model is more computationally and energy efficient although the number of the coefficients is the same as the memory polynomial. This gain of efficiency can be illustrated by referring to table I [6] and Fig. 1. In all diagrams of Fig. 1, the complex-complex multipliers are illustrated in dark gray, while the light gray circles are complex-real multipliers and the white circles indicate real-real multipliers. If a standard architecture for the FPGA implementation is used as in Fig. 1a, more FLOPS will be utilized than if the proposed architecture in Fig. 1b is used. This is mainly due to the higher number of complex-complex multiplications in Fig. 1a. It is worth mentioning that the propagation delay difference due to the branch in Fig. 1b connecting the input to the last multiplier is not a critical issue and can be compensated, if required, by inserting a delay block in that branch.

![Figure 1](https://example.com/figure1.png)

**Figure 1.** a) Standard architecture of MPM, b) proposed architecture for MPM, and c) proposed architecture for CMPM.

If the same optimized architecture is applied to the CMPM, fewer delay blocks are required than in the MPM, while the rest of the circuit remains the same, as shown in Fig. 1c. The overall complexity of the FPGA architecture is therefore reduced by using the CMPM. As with the MPM, the equations for determining the coefficients are linear, so common methods such as recursive least squares can be used to find them.
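
Because the model output is linear in the coefficients, identification reduces to a linear regression. The sketch below builds a regression matrix for the CMPM and solves it with a batch least-squares call (`numpy.linalg.lstsq`) instead of the recursive method named in the text; the helper names `cmpm_regressors` and `identify_cmpm`, the column ordering, and the zero initial conditions are assumptions made for this example.

```python
import numpy as np

def cmpm_regressors(x, K, M):
    """Regression matrix whose columns are |x(n)|^(k-1) * x(n-m).

    Column index is m*K + (k-1). Samples before the record start are zero.
    """
    x = np.asarray(x, dtype=complex)
    mag = np.abs(x)
    cols = []
    for m in range(M):
        xd = np.concatenate((np.zeros(m, dtype=complex), x[:len(x) - m]))
        for k in range(K):
            cols.append(mag ** k * xd)
    return np.column_stack(cols)

def identify_cmpm(x, y, K, M):
    """Batch least-squares estimate of the coefficients, shape (K, M)."""
    Phi = cmpm_regressors(x, K, M)
    coeffs, *_ = np.linalg.lstsq(Phi, np.asarray(y, dtype=complex), rcond=None)
    # Undo the m*K + (k-1) column ordering: result[k-1, m] ~ a_{km}.
    return coeffs.reshape(M, K).T
```

A call such as `identify_cmpm(x_in, y_meas, K=7, M=4)` would return the estimated coefficient array, where `x_in` and `y_meas` are hypothetical arrays holding the baseband input and measured output records.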

<table>
<thead>
<tr>
<th>Operation</th>
<th>Number of FLOPs</th>
</tr>
</thead>
<tbody>
<tr>
<td>Conjugate</td>
<td>0</td>
</tr>
<tr>
<td>Delay</td>
<td>0</td>
</tr>
<tr>
<td>Real addition</td>
<td>1</td>
</tr>
<tr>
<td>Real multiplication</td>
<td>1</td>
</tr>
<tr>
<td>Complex addition</td>
<td>2</td>
</tr>
<tr>
<td>Complex-real multiplication</td>
<td>2</td>
</tr>
<tr>
<td>Complex-complex multiplication</td>
<td>6</td>
</tr>
<tr>
<td>Square-root</td>
<td>6–8</td>
</tr>
</tbody>
</table>

TABLE I. NUMBER OF FLOPS FOR DIFFERENT OPERATIONS [6]
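
The multiplication entries in Table I are consistent with a direct count of real operations. The expansion below is standard complex arithmetic, given here as an illustration with generic real numbers \( a, b, c, d, r \) rather than as the derivation used in [6]:

\[ (a + jb)(c + jd) = (ac - bd) + j(ad + bc) \quad\Rightarrow\quad 4 \text{ real multiplications} + 2 \text{ real additions} = 6 \text{ FLOPs}, \]

\[ r\,(a + jb) = ra + j\,rb \quad\Rightarrow\quad 2 \text{ real multiplications} = 2 \text{ FLOPs}. \]

Replacing a complex-complex multiplier by a complex-real one therefore saves 4 FLOPs per output sample under this count.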

III. EVALUATION RESULTS AND DISCUSSION

The new model is evaluated by comparing its performance with the MPM, as well as with a memoryless model, which is defined by $M=1$ in eq. (5). The model has been identified by applying a WCDMA single-carrier signal to the amplifier. Only 3000 time samples were used for parameter estimation. The performance is then evaluated using a much longer sequence comprising about 50000 samples. The carrier frequency in both cases is 2.14 GHz. The recursive least squares method is used to solve the set of linear equations for the coefficients of eq. (6). The performance of the models is examined by testing two amplifier evaluation boards.
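
The paper names the recursive least squares method but does not list its update equations. The sketch below uses the textbook exponentially weighted form for a complex-valued linear model; the function name `rls_identify`, the forgetting factor `lam`, and the initialization constant `delta` are assumptions, and `Phi` is a hypothetical regression matrix whose row n holds the terms \( |x(n)|^{k-1} x(n-m) \).

```python
import numpy as np

def rls_identify(Phi, y, lam=1.0, delta=1e6):
    """Recursive least squares for the linear model y(n) ~ Phi[n, :] @ w.

    lam   : forgetting factor (1.0 weights all samples equally).
    delta : initial scale of the inverse correlation matrix P.
    """
    Phi = np.asarray(Phi, dtype=complex)
    y = np.asarray(y, dtype=complex)
    n_coeff = Phi.shape[1]
    w = np.zeros(n_coeff, dtype=complex)
    P = delta * np.eye(n_coeff, dtype=complex)
    for u, d in zip(Phi, y):
        Pu = P @ u.conj()
        gain = Pu / (lam + u @ Pu)       # gain vector for this sample
        err = d - u @ w                  # a priori prediction error
        w = w + gain * err               # coefficient update
        P = (P - np.outer(gain, u @ P)) / lam
    return w
```

In the setting described above, the coefficients would be estimated from the 3000-sample identification record and then held fixed while the model is evaluated on the longer validation sequence.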

The first device under test, referred to as Amp #1, is a Sirenza amplifier evaluation board, and the second one, hereafter indicated by Amp #2, is a Freescale amplifier. Both devices are tested such that their instantaneous power level exceeds the 1 dB compression point. The test setup is illustrated in Fig. 2. It consists of an arbitrary waveform generator (AWG) and a vector signal analyzer (VSA), both connected to a PC. The baseband digital signal is generated in the PC and uploaded into the AWG. The AWG then converts the signal into an RF time-domain waveform with a 2.14 GHz carrier frequency. The AWG output signal passes through the amplifier and is captured by the VSA, which converts the waveform back into the baseband digital domain using its internal ADCs. The digitized output is fed back to the PC for processing in MATLAB. Note that the aim of these experimental evaluations is to show the accuracy of the model. A PC has therefore been used for the computations, and the FPGA architectures have not been implemented yet.

The measurement and the corresponding modeling results for a single-carrier WCDMA signal applied to Amp #1 are shown in Fig. 3. In this evaluation, the CMPM and the MPM have the same memory depth and polynomial order, namely 4 and 7, respectively. The figure shows that the proposed model performs well even though the instantaneous power level of the amplifier reaches the 2 dB compression point.

To show the consistency of the CMPM, the Amp #2 output has been measured and modeled. The results are shown in Fig. 4. In this case, the MPM and the CMPM have the same memory depth and polynomial order, namely 3 and 9, respectively. This evaluation, which also uses a WCDMA signal reaching the 2 dB compression point, again confirms the performance of the CMPM.

The assessment of the performance is made easier by table II. The criterion used in table II is the normalized mean square error (NMSE), defined by

$$NMSE(dB) = 10 \log_{10} \left( \frac{1}{N} \sum_{n=1}^{N} |\hat{y}(n) - y(n)|^2 \right)$$

Figure 2. Measurement set-up for modeling.

Figure 3. Modeling performance of MPM and CMPM for Amp #1 with a measured 1xWCDMA signal reaching the 2 dB compression point of the DUT.

The modeling capability of both the MPM and the CMPM has an upper limit at high output power. This can be seen when the output power of Amp #2, for instance, is increased to reach the 2.5 dB operating point. Although both models in this case still have NMSE values higher than -39 dB, their prediction of adjacent channel power levels starts to degrade, as shown in Fig. 5. Therefore, as with any other model, care has to be taken when working with the CMPM and MPM at power levels close to saturation.

All these results confirm the reliable performance of CMPM, although the nonlinear weight is simplified into a function of only the present input sample. The model has the capability to model the behavior of a microwave power amplifier with power levels close to saturation.

IV. CONCLUSION

A novel formulation for modeling RF circuits is proposed. It is a modification to the memory polynomial with memory weights as a nonlinear function of the present input sample, implying a simpler and more energy-efficient implementation in FPGA. The performance of the proposed model is assessed using WCDMA signals on two different devices. The experimental results show that the method can suitably model nonlinear circuits with memory effects in wideband applications and up to high power levels close to saturation.

ACKNOWLEDGMENTS

The authors acknowledge the financial support by FWO-Flanders, KU Leuven GOA projects, and the Centre for Telecommunications Research (CTVR) and Science Foundation Ireland Grant 10/CE/I1853.

REFERENCES


TABLE II. PERFORMANCE OF CMPM AND MPM FOR MEASURED WCDMA SIGNAL REACHING THE 2 dB COMPRESSION POINT

<table>
<thead>
<tr>
<th>Amp</th>
<th>ML</th>
<th>MPM</th>
<th>CMPM</th>
</tr>
</thead>
<tbody>
<tr>
<td>#1</td>
<td>-27.4</td>
<td>-38.5</td>
<td>-39.1</td>
</tr>
<tr>
<td>#2</td>
<td>-27.8</td>
<td>-39.1</td>
<td>-39.6</td>
</tr>
</tbody>
</table>

In the NMSE definition, \( \hat{y}(n) \) and \( y(n) \) are the estimated and the measured output, respectively, and \( N \) is the number of samples.
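
The NMSE criterion can be computed with a few lines of NumPy. The sketch below assumes the common convention in which the error energy is divided by the energy of the measured output; this coincides with the averaged squared error in the definition when the measured signal has unit average power. The function name `nmse_db` is chosen for this example.

```python
import numpy as np

def nmse_db(y_est, y_meas):
    """NMSE in dB between estimated and measured output records.

    Assumption: the squared error is normalized by the energy of the
    measured signal, which is the usual convention for this metric.
    """
    y_est = np.asarray(y_est, dtype=complex)
    y_meas = np.asarray(y_meas, dtype=complex)
    err_energy = np.sum(np.abs(y_est - y_meas) ** 2)
    ref_energy = np.sum(np.abs(y_meas) ** 2)
    return 10.0 * np.log10(err_energy / ref_energy)
```

For instance, an estimate that overshoots every measured sample by 10 % in amplitude has an error-to-signal energy ratio of 0.01, which corresponds to -20 dB.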

A quantitative comparison is shown in table II, where ML stands for the memoryless model and all values are NMSE in dB. According to the table, the CMPM performs slightly better than the MPM. Note that the large gap between the NMSE of the memoryless model and that of both the MPM and the CMPM indicates the presence of memory effects.