# Advanced Equalization Circuit Techniques for High-Speed 4-Level Pulse Amplitude Modulation (PAM-4) Serial Links

by

Can WANG

A Thesis Submitted to

The Hong Kong University of Science and Technology
in Partial Fulfillment of the Requirements for
the Degree of Master of Philosophy
in the Department of Electronic and Computer Engineering

July 2020, Hong Kong

# **Authorization**

I hereby declare that I am the sole author of the thesis.

I authorize the Hong Kong University of Science and Technology to lend this thesis to other institutions or individuals for purpose of scholarly research.

I further authorize the Hong Kong University of Science and Technology to reproduce the thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

Can WANG


July 2020

### Advanced Equalization Circuit Techniques for High-Speed 4-Level Pulse Amplitude Modulation (PAM-4) Serial Links

by

#### Can WANG

This is to certify that I have examined the above MPhil thesis and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the thesis examination committee have been made.

Prof. C. Patrick YUE, ECE Department (Thesis Supervisor)

Prof. Bertram SHI (Head of ECE Department)

#### Thesis Examination Committee:

- 1. Prof. Volkan KURSUN (Chairperson) Dept. of Electronic and Computer Engineering
- 2. Prof. C. Patrick YUE (Supervisor)
  - Dept. of Electronic and Computer Engineering
- 3. Prof. Jie George YUAN Dept. of Electronic and Computer Engineering

Department of Electronic and Computer Engineering The Hong Kong University of Science and Technology July 2020


### **Abstract**

Global IP traffic is predicted to triple within five years from 2016. The coming 5G mobile network deployment, which aims to provide higher bandwidth to each user, together with the Internet of Things, places stringent demands on Ethernet infrastructure. Big data and cloud computing, driven by AI and machine learning, generate tremendous data traffic. Together, these trends push the data rate beyond 100 Gb/s at every interface. A higher data rate brings severe channel loss at the Nyquist frequency and demands more power to provide enough gain. The increased power consumption not only adds operating cost but also reduces reliability. In data centers especially, where the cooling system consumes a significant share of the power, there is strong pressure to reduce power consumption so as to improve energy efficiency and reliability. The mainstream Ethernet standard is evolving from 100 Gb/s to the next-generation 200-400 Gb/s standard. Since existing electrical/optical interconnect chips supporting 100 Gb/s cannot meet the speed requirements of the next-generation Ethernet standard, industry and academia are researching and formulating solutions for 200-400 Gb/s Ethernet. Therefore, there is an urgent need to research and design electrical/optical interconnect chips that support the 200-400 Gb/s standard.

Non-return-to-zero (NRZ) signaling has been the mainstream modulation scheme for data rates below 40 Gb/s due to its simplicity. As the data rate goes beyond 50 Gb/s, NRZ signaling faces severe channel loss. 4-level pulse amplitude modulation (PAM-4) becomes popular at data rates above 50 Gb/s because it transmits 2 bits per symbol, which lowers the Nyquist frequency of the signal. PAM-4 signaling, with its doubled bandwidth efficiency, is a promising solution for energy-efficient 200-400 Gb/s transceiver design. This thesis presents an energy-efficient continuous-time linear equalizer (CTLE) operating at 52 Gb/s and a 56 Gb/s directly modulated laser (DML) driver front-end that uses a piece-wise feed-forward equalizer to compensate for laser nonlinearities.

The first work presents a low-power PAM-4 receiver for very-short-reach (VSR) applications, enabled by the proposed single-stage multiple-peaking CTLE and fabricated in a TSMC 40-nm CMOS process. A wide-bandwidth phase-locked loop (WBW-PLL) is used to avoid free-running frequency drift and to suppress the intrinsic phase noise of the ring oscillator, providing low-jitter clocking at low power consumption. The data phase is recovered by a voltage-controlled delay line (VCDL) adjusted by a bang-bang phase detector and a charge pump. The receiver IC achieves a bit efficiency of 0.92 pJ/bit while compensating for 7.3 dB of channel loss at 13 GHz.

The second work presents a novel transmitter-side piece-wise feed-forward equalization circuit that compensates for laser/modulator nonlinearities. PAM-4 is very sensitive to inter-symbol interference caused by laser diode nonidealities such as limited bandwidth and dynamic bandwidth variation with signal level. The method uses three unary data signals, combined by a summer at the final output stage, to synthesize one PAM-4 signal. Feed-forward equalization coefficients are generated according to each unary data pattern, and the peaking frequency for each unary data stream can be adjusted independently. Under a linear superposition assumption, this generates all the equalization coefficients for the different transitions. The piece-wise equalization is simulated in a TSMC 40-nm CMOS process with a compact Verilog-A nonlinear DFB laser model.

Dedicated to my family

# **Acknowledgments**

Looking back on the years spent at the Hong Kong University of Science and Technology, I have been surrounded by a group of wonderful people. Although there were failures and regrets, in the end I enjoyed every moment at HKUST.

First and foremost, I have to thank my thesis supervisor and life mentor, Professor C. Patrick YUE, for his support and encouragement in research and life. Meetings with Prof. YUE were very valuable, as he was always generous with suggestions that helped me improve my work and presentation skills. Besides, I have been very honored to help with the funding applications, through which I have learned a lot. I feel incredibly fortunate to have him as my supervisor.

I would like to thank Professor George YUAN and Professor Volkan KURSUN for serving on my thesis examination committee during such an unusual period. My thanks also extend to Professor Qianneng ZHOU for his help and support in times of need.

My gratitude also goes to my lab mates, Dr. Guang ZHU, Dr. Zhao ZHANG, Dr. Xiangyu MENG, Dr. Duona LUO, Dr. Weimin SHI, Li, Babar, Milad, Xuan, Fredrick, Jian and others. I recall the talks and discussions with Guang and Zhao as especially inspiring. I shall remember those moments forever.

Throughout my entire life, my family has always been supportive of me. I hereby dedicate this thesis to my family for their unconditional love and care through the years. I was also lucky to meet and fall in love with Joanna in the last year of my MPhil study in Hong Kong, which made the time I spent in Hong Kong much more memorable than it would otherwise have been.

# **Contents**

- CHAPTER 1. INTRODUCTION
  - 1.1 Research Background
  - 1.2 Challenges
    - 1.2.1 Energy efficiency
    - 1.2.2 Laser/modulator Nonlinearities
  - 1.3 Thesis Organization
  - References
- CHAPTER 2. ENERGY EFFICIENT ANALOG EQUALIZER
  - 2.1 Introduction
    - 2.1.1 Dielectric Loss
    - 2.1.2 Skin Effect Loss
    - 2.1.3 Loss in Transmission Line
  - 2.2 Receiver System Design
  - 2.3 Prior Arts of MP-CTLE
    - 2.3.1 Cascade-stage CTLEs
    - 2.3.2 CTLEs with Feedback/feedforward High-pass Amplifier
  - 2.4 Proposed Single-stage MP-CTLE
    - 2.4.1 Design Consideration and Schematics
    - 2.4.2 Transfer Function and Approximation Methods
  - 2.4 Experiment Results
  - 2.5 References
- CHAPTER 3. PIECE-WISE EQUALIZATION FOR TRANSMITTER
  - 3.1 Introduction
  - 3.2 Compact Laser Model
  - 3.3 Proposed Piece-wise Equalization
    - 3.3.1 System Design
  - 3.4 Simulation Results
  - 3.5 Reference
- CHAPTER 4. CONCLUSION AND FUTURE WORK
  - 4.1 Conclusion
  - 4.2 Future Work
- CHAPTER 5. Appendix

# **List of Figures**

- Fig. 1 Power consumption breakdown of data centers [1]
- Fig. 2 Energy consumption breakdown for single addition operation in CPUs [2]
- Fig. 3 OIF CEI-56G-VSR-PAM4 typical applications [8]
- Fig. 4 Cross-section of a cylindrical conductor showing the skin depth $\delta$ [10]
- Fig. 5 Circulating eddy currents cancelling the current flow in the conductor center [10]
- Fig. 6 Schematic representation of infinitesimal segment of transmission line
- Fig. 7 Loss decomposition of a typical PCB-based transmission line
- Fig. 8 System diagram of the source synchronous PAM-4 receiver
- Fig. 9 Current-mode summer of the FFE block
- Fig. 10 Timing diagram of the S/H, FFE and slicing point
- Fig. 11 PAM-4 transitions (left) and transition selection phase detector (right)
- Fig. 12 Schematics of the CMOS delay cell
- Fig. 13 Simulated delay versus control voltage VC cross corners
- Fig. 14 Measured delay line tuning range
- Fig. 15 System diagram of wide bandwidth ring oscillator based PLL
- Fig. 16 Conventional cascade-stage MP-CTLEs for 12.5 Gbps NRZ
- Fig. 17 Conventional cascade two stage CTLE
- Fig. 18 Feedback enabled CML high-pass amplifier
- Fig. 19 Simulated frequency response of the two-stage CTLE
- Fig. 20 Two-stage CTLE with our proposed single-stage MP-CTLE
- Fig. 21 Simplified system diagram of the single-stage MP-CTLE
- Fig. 22 Frequency response of single-stage MP-CTLE with LF-CTLE on/off
- Fig. 23 Compensated lossy channel response with and without LF-CTLE
- Fig. 24 Integral of ISI magnitude with LF-CTLE turned off
- Fig. 25 Integral of ISI magnitude with LF-CTLE turned on
- Fig. 26 Simulated 25 Gbps NRZ eye diagram without LF-CTLE
- Fig. 27 Simulated 25 Gbps NRZ eye diagram with LF-CTLE
- Fig. 28 Measured S21 of a Rogers PCB transmission line
- Fig. 29 3-inch Rogers PCB transmission line
- Fig. 30 Corrupted 56 Gbps PAM-4 signal after passing through the measured PCB S21 response
- Fig. 31 Recovered 56 Gbps PAM-4 signal with LF-CTLE
- Fig. 32 Recovered 56 Gbps PAM-4 signal without LF-CTLE
- Fig. 33 Numerical simulation of the original and approximated system transfer function
- Fig. 34 52-Gbps PAM-4 receiver chip die photo
- Fig. 35 Test bench setup for the BER bathtub curve measurement
- Fig. 36 Test bench diagram
- Fig. 37 Measurement PCB board layout and signal input
- Fig. 38 Measured S21 of the coaxial cable
- Fig. 39 Measured corrupted 52 Gbps PAM-4 signal after the lossy channel
- Fig. 40 The decoded PRBS-7 quarter-rate MSB and LSB data transient waveform
- Fig. 41 Quarter-rate recovered data shown in the sampling scope: MSB (left) and LSB (right)
- Fig. 42 Measured BER bathtub of PRBS-7, PRBS-9 with/without LF-CTLE
- Fig. 43 Measured divide by 2 output clock time-domain jitter
- Fig. 44 Measured phase noise profile of the recovered clock
- Fig. 45 System power consumption breakdown
- Fig. 46 Bit efficiency versus channel loss trend
- Fig. 47 Typical VCSEL cross-section [12]
- Fig. 48 Basic compact VCSEL laser model [14]
- Fig. 49 Measured S21 of VCSEL laser under different bias condition [20]
- Fig. 50 Thermometer code decomposition of PAM-4 signal
- Fig. 51 Consecutive symbol transitions table
- Fig. 52 Piece-wise equalization concept
- Fig. 53 Schematics of the Verilog-A compact VCSEL model
- Fig. 54 Measured VCSEL laser S21 response (left), simulated VCSEL laser S21 response under different bias condition [12]
- Fig. 55 Simulated 25 Gbps eye diagram under 3 mA bias (left); Measured 25 Gbps eye diagram under 3 mA bias (right)
- Fig. 56 Simulated 25 Gbps eye diagram under 6 mA bias (left); Measured 25 Gbps eye diagram under 6 mA bias (right)
- Fig. 57 Proposed 56-Gbps PAM-4 DML driver with piece-wise equalization
- Fig. 58 Differential binary to thermometer code converter
- Fig. 59 Rising and falling selector
- Fig. 60 Pulse generator diagram with programmable delay element
- Fig. 61 Red solid line stands for the current data symbol, whereas the green line stands for rising pulse and the brown curve stands for falling pulse
- Fig. 62 Undershoot with the falling edge equalization turned on
- Fig. 63 Overshoot with the rising edge equalization turned on
- Fig. 64 Overshoot and undershoot on both top eye and bottom eye
- Fig. 65 Transmitter Optical Sub-Assembly dimensions [23]
- Fig. 66 Loading network of the laser driver using TOSA package
- Fig. 67 Behavioral dynamic bandwidth versus current relationship
- Fig. 68 56-Gbps PAM-4 signal after the dynamic bandwidth behavioral model
- Fig. 69 PAM-4 signal compensated by the conventional binary code equalization
- Fig. 70 PAM-4 signal compensated by the proposed piece-wise equalization

#### **CHAPTER 1. INTRODUCTION**

### 1.1 Research Background

The ever-growing data traffic poses more and more challenges for wireline and optical circuit and system design. Cloud computing, big data, and 5G in particular have greatly enlarged the scale of data centers and warehouses. Owing to this unprecedented scale, cooling has become one of the most critical problems of the data center. High-speed communication links between computational devices and data storage are badly needed, and electricity and cooling energy consumption have become a huge burden. Therefore, the demand for energy-efficient communication systems is increasing.

In addition, the cost of optical devices grows in proportion to scale. In particular, as the communication distance increases, optical devices such as DFB lasers, MZ modulators, and EA-DFB lasers become increasingly expensive. Advanced modulation schemes require linear, high-bandwidth, and coherent devices, whereas cheap laser devices are often more nonlinear, slower, and have a wider FWHM. Electrical compensation methods that improve laser bandwidth, linearity, and dispersion are therefore of great interest.

To achieve high bandwidth efficiency, the advanced modulation scheme of 4-level pulse amplitude modulation (PAM-4) is used. Compared to NRZ, a PAM-4 signal transmits two bits per symbol, doubling the data rate while the Nyquist frequency of the baseband signal remains the same. In other words, PAM-4 trades signal-to-noise ratio for bandwidth efficiency. Therefore, PAM-4 signals are more vulnerable to inter-symbol interference (ISI).
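The trade-off described above can be put into numbers with a minimal sketch. It assumes an ideal M-level signal with equal level spacing and the same peak-to-peak swing as NRZ; the function names are illustrative and not part of the thesis.

```python
import math


def nyquist_frequency_ghz(data_rate_gbps: float, bits_per_symbol: int) -> float:
    """Nyquist frequency is half the symbol (baud) rate."""
    baud_rate = data_rate_gbps / bits_per_symbol
    return baud_rate / 2.0


def eye_height_penalty_db(levels: int) -> float:
    """Vertical eye penalty relative to NRZ for the same peak-to-peak swing.

    Assumption: equally spaced levels, so the swing is split into
    (levels - 1) stacked eyes.
    """
    return 20.0 * math.log10(levels - 1)


nrz_nyquist = nyquist_frequency_ghz(56.0, bits_per_symbol=1)
pam4_nyquist = nyquist_frequency_ghz(56.0, bits_per_symbol=2)
pam4_penalty = eye_height_penalty_db(4)

print(f"56 Gb/s NRZ   Nyquist: {nrz_nyquist:.1f} GHz")
print(f"56 Gb/s PAM-4 Nyquist: {pam4_nyquist:.1f} GHz")
print(f"PAM-4 eye-height penalty vs NRZ: {pam4_penalty:.2f} dB")
```

At the same 56 Gb/s data rate, PAM-4 halves the Nyquist frequency from 28 GHz to 14 GHz, at the cost of each eye being one third of the NRZ eye height (about 9.5 dB).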

### 1.2 Challenges

### 1.2.1 Energy efficiency



Fig. 1 Power consumption breakdown of data centers [1]

As the pie chart in Fig. 1 shows, the cooling system of a data center accounts for up to 38% of the total energy consumption. Within the processor (computing) energy consumption, the energy spent on communication is substantial, as shown by the energy breakdown of a CPU addition operation in Fig. 2: cache and register access consume 31 pJ of the 70 pJ total energy (around 44.3%).




Fig. 2 Energy consumption breakdown for single addition operation in CPUs [2]

Therefore, improving the energy efficiency of the communication links can substantially reduce the power consumption of both the cooling system and the computation system.

#### 1.2.2 Laser/modulator Nonlinearities

Popular laser devices in data centers include VCSELs, DFB lasers, and EA-DFB lasers. All of these devices exhibit some degree of nonlinearity. For directly modulated lasers such as VCSELs and DFB lasers, the injection efficiency decreases at high bias current and temperature. Also, owing to the nature of diode lasers, the 3-dB bandwidth changes dynamically with the injection current [3].

Advanced modulation formats such as PAM-4 are sensitive to the ISI and signal distortion caused by insufficient bandwidth and laser nonlinearities.

### 1.3 Thesis Organization

This thesis is organized as follows:

Chapter Two covers the design of a low-power 52-Gbps PAM-4 receiver, with an emphasis on how the proposed single-stage multiple-peaking CTLE (MP-CTLE) saves energy and provides low-frequency equalization. The chip is fabricated in TSMC 40-nm technology. Experimental results show improvements in energy efficiency and bit error rate performance.

Chapter Three introduces laser nonlinearity, compact laser modeling in Verilog-A, and the proposed piece-wise equalization. The dynamic bandwidth variation is captured by a behavioral model, and the proposed piece-wise equalization scheme is applied to compensate for both the insufficient bandwidth and its dynamic variation. Simulation results are presented.

Chapter Four draws the conclusions and introduces future work. Some source code examples are presented in the Appendix.

### References

- [1] http://large.stanford.edu/courses/2018/ph240/mangu2/
- [2] M. Horowitz, "1.1 Computing's energy problem (and what we can do about it)," 2014 IEEE International Solid-State Circuits Conference Digest of Technical Papers (ISSCC), San Francisco, CA, 2014, pp. 10-14.
- [3] L. A. Coldren, S. W. Corzine, and M. L. Mashanovitch, Diode Lasers and Photonic Integrated Circuits, vol. 218. John Wiley & Sons, 2012.

### CHAPTER 2. ENERGY EFFICIENT ANALOG EQUALIZER

### 2.1 Introduction

The data boom has been emphasized many times in this thesis. 25-28 Gb/s NRZ links are still the mainstream in industrial applications, but to keep up with the data boom, the data rate of next-generation I/O will exceed 50 Gb/s. However, designing power-efficient 50-Gb/s NRZ I/Os is quite challenging even in advanced 16-nm CMOS FinFET technology. The advent of PAM4 signaling relaxes the design challenges by halving the Nyquist frequency relative to an NRZ transceiver at the same data rate. 56-Gb/s PAM4 I/O is becoming the research focus and will be the mainstream in the near future. Recently, the Optical Internetworking Forum (OIF) released the 56-Gb/s PAM4 I/O standards, OIF CEI-56G-PAM4, among which OIF-CEI-56G-VSR-PAM4 targets chip-to-module communication with a channel loss of 10 dB at the Nyquist frequency (14 GHz), as shown in Fig. 3.

ADC-based PAM4 receivers and mixed-signal PAM4 receivers are two popular topologies that have been implemented extensively [1-7]. For middle-reach and long-reach applications, the channel usually has a loss of > 30 dB at the Nyquist frequency. To compensate for such channels, mixed-signal receivers have to employ a DFE with many taps, which increases circuit complexity and power consumption; the ADC-based topology is more suitable because more advanced equalization and PAM4-to-NRZ decoding are convenient to implement in the digital domain. For very-short-reach and short-reach applications with moderately lossy channels (< 20 dB), the mixed-signal topology is preferred because analog circuit techniques can be fully exploited to achieve good power efficiency. In this chapter, a 56-Gb/s mixed-signal PAM4 receiver targeting VSR applications is presented.

A PAM4 receiver faces new design challenges compared with its NRZ counterpart: 1) the PAM4 signal has four voltage levels and is very sensitive to bandwidth effects, including limited bandwidth and over-peaking; 2) the small eye opening of the PAM4 signal causes a larger slicing delay, so the timing constraint on the first DFE tap is more stringent; 3) PAM4 has 16 kinds of transitions between neighboring symbols, so the PD design in the CDR must be considered carefully. The receiver in this chapter addresses these issues.



Fig. 3 OIF CEI-56G-VSR-PAM4 typical applications [8]

The CEI-56G-VSR-PAM4 standard defines the connection between chip and module, with a loss budget of 7.3 dB host PCB loss, 1.2 dB connector loss, and 1.5 dB module PCB loss. In general, the VSR standard can cover up to 10 cm. In this work, the receiver was designed for 10 dB of loss; however, because of noise coupled from the power supply and the internal slicer modules in the absence of on-chip LDOs/regulators, the fabricated receiver compensates for 7.3 dB of channel loss at 13 GHz.

#### 2.1.1 Dielectric Loss

Dielectric loss quantifies a dielectric material's inherent dissipation of electromagnetic energy (e.g., as heat). It can be parameterized in terms of either the loss angle $\delta$ or the corresponding loss tangent $\tan \delta$. Both refer to the phasor in the complex plane whose real and imaginary parts are the resistive (lossy) component of an electromagnetic field and its reactive (lossless) counterpart.

When an alternating EM wave propagates through a transmission line, a microstrip line, or a waveguide, the interaction between the wave and the material is described by Maxwell's equations subject to the boundary conditions of the particular geometry. The material properties are represented by the permittivity $\varepsilon$, permeability $\mu$, and conductivity $\sigma$.

In a typical wireline application scenario, the dielectric material is often the mechanical supporting or separation material between conductors. The permittivity can have both a real part and an imaginary part, denoted

$$\varepsilon = \varepsilon' - j\varepsilon''.$$

The real part $\varepsilon'$ of the permittivity is the lossless permittivity, given by the product of the free-space permittivity and the relative permittivity. The imaginary part $\varepsilon''$ represents the loss associated with bound-charge and dipole oscillations. Intuitively, any form of energy loss must be accompanied by an energy exchange, which does not happen instantaneously. There must therefore be a phase delay along with the energy exchange, and the imaginary part describes this delay in the interaction between the EM wave and the dielectric material.

With zero conduction current assumption, the dielectric loss can be quantified by the loss tangent

$$\tan \delta = \frac{\varepsilon''}{\varepsilon'}$$

For small loss, $\delta \ll 1$ and $\tan \delta \approx \delta$. The power decays along the propagation direction z according to [9]:

$$P = P_0 e^{-\delta kz},$$

where $P_0$ is the initial power, $k = 2\pi/\lambda$ is the wavenumber, and $\lambda$ is the wavelength in the dielectric material.

Because $k$ is proportional to frequency, the exponent $\delta k z$, and therefore the dielectric loss expressed in dB, is linearly dependent on frequency.
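The linear frequency dependence can be checked numerically with a short sketch of the decay equation. The material values below (an FR-4-like loss tangent and relative permittivity, and a 10 cm line) are hypothetical and chosen only for illustration; they are not taken from this thesis.

```python
import math

C0 = 299_792_458.0  # speed of light in vacuum, m/s


def dielectric_loss_db(freq_hz, length_m, loss_tangent, rel_permittivity):
    """Loss in dB from P = P0 * exp(-delta * k * z), using delta ~ tan(delta)."""
    # Wavelength inside the dielectric.
    wavelength = C0 / (freq_hz * math.sqrt(rel_permittivity))
    k = 2.0 * math.pi / wavelength
    power_ratio = math.exp(-loss_tangent * k * length_m)  # P / P0
    return -10.0 * math.log10(power_ratio)


# Hypothetical material and geometry (assumptions for illustration only).
TAN_DELTA = 0.02
EPS_R = 4.0
LENGTH_M = 0.10

for f_ghz in (1.0, 7.0, 14.0):
    loss = dielectric_loss_db(f_ghz * 1e9, LENGTH_M, TAN_DELTA, EPS_R)
    print(f"{f_ghz:5.1f} GHz: {loss:.2f} dB")
```

Doubling the frequency doubles the loss in dB, which is the linear dependence stated above.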

#### 2.1.2 Skin Effect Loss

Skin effect is the tendency of an AC current to concentrate near the surface of a conductor: the current density is largest near the surface and decreases with depth. This phenomenon causes the equivalent resistance of the conductor to increase as the signal frequency increases.



Fig. 4 Cross-section of a cylindrical conductor showing the skin depth  $\delta$  [10]

Fig. 4 shows the distribution of current flow in the cross-section of a cylindrical conductor. For alternating current, the current density decreases exponentially from the surface towards the inside. The skin depth, $\delta$, is defined as the depth at which the current density is just 1/e (about 37%) of its value at the surface; it depends on the frequency of the current and on the electrical and magnetic properties of the conductor.



Fig. 5 Circulating eddy currents cancelling the current flow in the conductor center [10]

The red circle in Fig. 5 denotes the eddy current induced by the alternating magnetic field in and around the conductor. A change in the alternating current I changes the magnetic field, which in turn creates an electric field that opposes the current I.

The most significant consequence of the skin effect is the increase in conductor resistance with frequency. The AC resistance of a cylindrical conductor can be formulated as follows:

$$R \approx \frac{L\rho}{\pi D\delta}$$

where $L$ is the conductor length, $D$ is the diameter of the cylindrical conductor, $\delta$ is the skin depth, and $\rho$ is the resistivity of the conductor.

The skin depth at frequencies much below  $1/\rho\epsilon$  can be approximately expressed as:

$$\delta = \sqrt{\frac{2\rho}{\omega\mu}}$$

Substituting this expression for $\delta$ into the AC resistance gives $R \propto \sqrt{\omega}$, so the loss caused by the skin effect is proportional to the square root of frequency.
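As a rough numerical illustration of the two formulas above, the sketch below evaluates the skin depth and AC resistance of a copper wire. The copper resistivity is a standard room-temperature value; the wire length and diameter are hypothetical, and the function names are illustrative.

```python
import math

MU_0 = 4.0e-7 * math.pi   # permeability of free space, H/m
RHO_COPPER = 1.68e-8      # resistivity of copper at room temperature, ohm*m


def skin_depth_m(freq_hz, resistivity=RHO_COPPER, mu=MU_0):
    """delta = sqrt(2 * rho / (omega * mu))."""
    omega = 2.0 * math.pi * freq_hz
    return math.sqrt(2.0 * resistivity / (omega * mu))


def ac_resistance_ohm(freq_hz, length_m, diameter_m, resistivity=RHO_COPPER):
    """R ~ L * rho / (pi * D * delta); valid when delta is much smaller than D."""
    delta = skin_depth_m(freq_hz, resistivity)
    return length_m * resistivity / (math.pi * diameter_m * delta)


# Hypothetical wire: 10 cm long, 0.5 mm in diameter (illustration only).
for f_ghz in (1.0, 4.0, 16.0):
    f = f_ghz * 1e9
    print(f"{f_ghz:5.1f} GHz: delta = {skin_depth_m(f) * 1e6:.2f} um, "
          f"R = {ac_resistance_ohm(f, 0.10, 0.5e-3):.3f} ohm")
```

Each fourfold increase in frequency halves the skin depth and doubles the resistance, consistent with the square-root dependence.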

#### 2.1.3 Loss in Transmission Line

A transmission line is the most commonly used structure for confining EM energy.

At radio frequencies, wires can no longer be treated with the simple lumped model used in low-frequency applications, because the wave nature of the RF signal must be considered.

In wireline applications, transmission lines are often constructed from two conductors, as in coaxial cables, microstrip lines, and coplanar waveguides. The telegrapher's equations are a pair of coupled, linear partial differential equations that describe the voltage and current along the transmission line. A classic transmission line model is shown in Fig. 6 [11].



Fig. 6 Schematic representation of infinitesimal segment of transmission line

This is the distributed-element model, an infinitesimal representation of the transmission line. Here dx denotes an infinitesimal length, and R, L, G, and C are the per-unit-length series resistance, series inductance, shunt conductance, and shunt capacitance of the transmission line. The line voltage V(x) and current I(x) can be expressed in the following form:

$$\frac{\partial V(x)}{\partial x} = -(R + j\omega L)I(x)$$

$$\frac{\partial I(x)}{\partial x} = -(G + j\omega C)V(x)$$

This is a pair of coupled differential equations in which V(x) and I(x) cannot be solved for independently. To decouple them, we differentiate the first equation with respect to x and substitute the second equation into it:

$$\frac{d^2V}{dx^2} = (R + j\omega L)(G + j\omega C)V$$

Let $\gamma = \sqrt{(R + j\omega L)(G + j\omega C)} = \alpha + j\beta$, where $\alpha$ is the attenuation constant and $\beta$ is the phase constant. The solution of the above homogeneous differential equation is:

$$V(x) = V^+ e^{-\gamma x} + V^- e^{+\gamma x}$$

The solution for I(x) takes the same form, and $\gamma = \alpha + j\beta$ is the propagation constant. For copper wires, the attenuation constant can be approximated by the following expression, which is exact for a distortionless line with $R/L = G/C$:

$$\alpha = \sqrt{RG}$$

Relating this back to dielectric loss and skin effect loss, R accounts for the skin effect loss and G accounts for the dielectric loss. Because the skin effect loss is proportional to the square root of frequency while the dielectric loss is proportional to frequency, the skin effect loss dominates at low frequency and the dielectric loss takes over at high frequency. One example is shown in Fig. 7.



Fig. 7 Loss decomposition of a typical PCB-based transmission line

The conductor loss (mainly skin effect loss) dominates the loss characteristic below ~1.1 GHz. Above that frequency, the dielectric loss surpasses the conductor loss and dominates the high-frequency loss behavior.
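The crossover between conductor loss and dielectric loss can be reproduced qualitatively with a small sketch that evaluates $\gamma$ directly from R, L, G, and C. All per-unit-length values below are hypothetical (a 50-ohm trace with skin-effect R scaling as $\sqrt{f}$ and dielectric G scaling as f) and are not fitted to Fig. 7. The split into conductor and dielectric terms uses the standard low-loss approximation $\alpha \approx R/(2Z_0) + GZ_0/2$, which is not derived in this thesis.

```python
import cmath
import math

# Hypothetical per-unit-length parameters of a 50-ohm PCB trace
# (illustrative values, not taken from the thesis or from Fig. 7).
L_PER_M = 3.3e-7      # series inductance, H/m
C_PER_M = 1.32e-10    # shunt capacitance, F/m
R_AT_1GHZ = 10.0      # series resistance at 1 GHz, ohm/m
TAN_DELTA = 0.005     # dielectric loss tangent


def rlgc(freq_hz):
    """Return (R, L, G, C) with skin-effect R ~ sqrt(f) and dielectric G ~ f."""
    omega = 2.0 * math.pi * freq_hz
    r = R_AT_1GHZ * math.sqrt(freq_hz / 1e9)
    g = omega * C_PER_M * TAN_DELTA
    return r, L_PER_M, g, C_PER_M


def propagation_constant(freq_hz):
    """gamma = sqrt((R + jwL)(G + jwC)) = alpha + j*beta."""
    r, l, g, c = rlgc(freq_hz)
    omega = 2.0 * math.pi * freq_hz
    return cmath.sqrt((r + 1j * omega * l) * (g + 1j * omega * c))


def low_loss_split(freq_hz):
    """Low-loss estimate of (conductor, dielectric) attenuation in Np/m."""
    r, l, g, c = rlgc(freq_hz)
    z0 = math.sqrt(l / c)
    return r / (2.0 * z0), g * z0 / 2.0


NP_TO_DB = 20.0 / math.log(10.0)  # 1 Np is about 8.686 dB

for f_ghz in (0.1, 1.0, 10.0):
    f = f_ghz * 1e9
    alpha_c, alpha_d = low_loss_split(f)
    alpha = propagation_constant(f).real
    print(f"{f_ghz:5.1f} GHz: conductor {alpha_c * NP_TO_DB:.3f} dB/m, "
          f"dielectric {alpha_d * NP_TO_DB:.3f} dB/m, "
          f"total {alpha * NP_TO_DB:.3f} dB/m")
```

With these assumed values the conductor term is larger at 0.1 GHz and the dielectric term is larger at 10 GHz; the exact crossover frequency depends entirely on the chosen parameters.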

### 2.2 Receiver System Design



Fig. 8 System diagram of the source synchronous PAM-4 receiver

Fig. 8 shows the proposed PAM-4 receiver topology, which comprises four parts: 1) a two-stage CTLE that includes the proposed single-stage MP-CTLE; 2) four data and edge paths; 3) a transition selection PD/CP; and 4) a WBW-PLL serving as the multi-phase clock generator (MPCG). The two-stage CTLE opens the data eye before the sample/hold (S/H) stages. The compensated signal is then sampled by the quarter-rate clocks CKD<sub>x</sub>/CKE<sub>x</sub> and decoded to RD<sub>x</sub>. The edge information is sampled by the slicer to RE<sub>x</sub>. The data samples and edge samples control the PD, which drives the charge pump to adjust the voltage-controlled delay line (VCDL) in the reference clock path so that the sampling clock is aligned with the input data.

For a wireline receiver, equalization and clocking are the most critical blocks affecting BER performance and power consumption. By removing the extra stage of a conventional cascaded multiple-peaking CTLE, the single-stage MP-CTLE provides similar performance while saving a significant amount of power. The power-saving characteristic of the proposed MP-CTLE will be explained in the following paragraph using single-stage and cascaded CML amplifiers as examples. The frequency response of the MP-CTLE is discussed in Section 2.4. This section covers the transition selection PD and the design of the WBW-PLL and VCDL.



Fig. 9 Current-mode summer of the FFE block.



Fig. 10 Timing diagram of the S/H, FFE and slicing point.

After the CTLE and sample/hold, a feed-forward equalizer (FFE) is employed to further boost the high-frequency energy. The FFE summer, with SD1 and SD2 as the inputs and SSD2 as the output, is depicted in Fig. 9. The FFE coefficient is adjusted by the tail current source I<sub>FFE</sub>. Each bit is extended to 2.5 UI after the S/H. After subtracting the 1-UI overlap between consecutive samples, the FFE takes effect within a 1.5-UI window. Thus, we set the slicing point at the first half UI, as shown in Fig. 10.



Fig. 11 PAM-4 transitions (left) and transition selection phase detector (right)

The PAM-4 signal possesses complicated data transition patterns that create pattern-dependent input jitter, as illustrated in Fig. 11 (left). The red transition lines have their crossing points centered uniformly, while the crossing points of the green lines deviate from the center due to their vertically asymmetrical positions. This deviation of the crossing points causes pattern-dependent jitter. The transition selection PD acts like a filter, so that only the red transitions are valid for controlling the charge pump. Fig. 11 (right) shows the simplified diagram of the transition selection phase detector. If only the MSB is considered, the crossing points of the green lines translate into clock phase wandering that deteriorates the bit error rate performance. To avoid this, the LSB toggle detection branch (red dotted box) is added so that the early-late information is considered valid only when the LSB and MSB toggle at the same time.
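The selection rule can be checked with a small behavioral sketch. It assumes a natural binary mapping of the four levels (0 = 00, 1 = 01, 2 = 10, 3 = 11), ideal linear ramps between levels, and an MSB threshold at the middle of the eye stack. These modeling choices and the function names are illustrative assumptions, not a description of the circuit in Fig. 11.

```python
from itertools import permutations

LEVELS = (0, 1, 2, 3)   # PAM-4 levels, natural binary mapping assumed
MSB_THRESHOLD = 1.5     # MSB slicer threshold at the middle of the eye stack


def msb(level):
    return level >> 1


def lsb(level):
    return level & 1


def crossing_time(a, b):
    """Fraction of the transition time at which an ideal linear ramp
    from level a to level b crosses the MSB threshold."""
    return (MSB_THRESHOLD - a) / (b - a)


def is_selected(a, b):
    """Valid for the PD only when MSB and LSB toggle at the same time."""
    return msb(a) != msb(b) and lsb(a) != lsb(b)


ALL_TRANSITIONS = list(permutations(LEVELS, 2))
SELECTED = [t for t in ALL_TRANSITIONS if is_selected(*t)]
# MSB toggles but LSB does not: these would be used by an MSB-only PD.
REJECTED_MSB_ONLY = [
    (a, b) for a, b in ALL_TRANSITIONS
    if msb(a) != msb(b) and not is_selected(a, b)
]

if __name__ == "__main__":
    for a, b in SELECTED:
        print("selected", (a, b), "crossing at", crossing_time(a, b))
    for a, b in REJECTED_MSB_ONLY:
        print("rejected", (a, b), "crossing at", crossing_time(a, b))
```

Under these assumptions, every selected transition crosses the threshold exactly halfway through the transition, whereas the rejected MSB-only transitions cross early or late, which is the pattern-dependent offset the PD is meant to filter out.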



Fig. 12 Schematics of the CMOS delay cell



Fig. 13 Simulated delay versus control voltage VC cross corners



Fig. 14 Measured delay line tuning range



Fig. 15 System diagram of wide bandwidth ring oscillator based PLL

The ring oscillator (RO) based wide-bandwidth (WBW) phase-locked loop (PLL) in this design, shown in Fig. 15, employs a 4-stage ring oscillator. The RO uses six of its phases for data and edge sampling. The other two phases are fed back to the PFD/CP without division. There are two major advantages of this RO-based WBW-PLL MPCG. Firstly, the PLL bandwidth is designed to be 300 MHz to suppress the phase noise of the ring oscillator. Since the intrinsic phase noise of the ring oscillator is inversely proportional to its power consumption, the large loop bandwidth allows the ring oscillator to be somewhat noisier in order to save power. Secondly, the open-loop injection MPCG used in [5, 10] suffers from poor phase accuracy due to the unbalanced loading between the injection node and the other nodes. This RO-based WBW-PLL is fully symmetric, and the phase accuracy is improved further by proper layout design. The measured frequency locking range is around 500 MHz at 6.5 GHz.

In Fig. 8, the frequency-synchronous external clock CKREF works as the reference clock of the ring-oscillator-based PLL after passing through a VCDL. Since one of the output clocks of the PLL aligns with the delayed CKREF, the delay adjustment of the VCDL translates to a phase adjustment of the PLL output clocks. The schematic of the VCDL delay cell is shown in Fig. 12. The delay cell adopts a CMOS logic style, and its delay is controlled by tuning the loading resistance. The cross-coupled PMOS pair exhibits negative resistance, whereas the PMOS controlled by the control voltage VC provides positive resistance. Thus, the loading resistance can be adjusted through VC. To guarantee correct clock phase recovery, the tunable delay range of the VCDL should be at least 1 UI (~38.46 ps). The designed tunable delay range should also have enough margin to cover PVT variations and the accumulated phase shift of the input data. A delay chain is employed, and the simulated tunable delay range under different PVT conditions is plotted with respect to the control voltage VC in Fig. 13. Under the ff corner at -40°C, the minimum tunable delay range is 58 ps, which leaves enough margin to tolerate PVT variations. The tunable delay range was measured by sweeping VC; as shown in Fig. 14, the actual delay range reaches 66 ps. The WBW-PLL with the VCDL draws only roughly 10 mW from the supply, while providing good phase noise performance to ensure proper clocking for 52-Gb/s data.
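The 1-UI requirement and the resulting margin follow from simple arithmetic, sketched below. The data rate and the two delay ranges are taken from the text. The helper names are hypothetical, and expressing the margin as a fraction of one UI is one possible convention rather than the one necessarily used in the design.

```python
DATA_RATE_BPS = 52e9      # 52-Gb/s PAM-4 (from the text)
BITS_PER_SYMBOL = 2       # PAM-4 carries two bits per symbol

SIMULATED_MIN_RANGE_PS = 58.0   # ff corner, -40 degC (from the text)
MEASURED_RANGE_PS = 66.0        # measured by sweeping VC (from the text)


def unit_interval_ps(data_rate_bps, bits_per_symbol):
    """One unit interval in picoseconds for the given line rate."""
    symbol_rate_baud = data_rate_bps / bits_per_symbol
    return 1e12 / symbol_rate_baud


def range_margin_ui(delay_range_ps, ui_ps):
    """Delay range in excess of the required 1 UI, expressed in UI."""
    return delay_range_ps / ui_ps - 1.0


UI_PS = unit_interval_ps(DATA_RATE_BPS, BITS_PER_SYMBOL)

if __name__ == "__main__":
    print(f"1 UI = {UI_PS:.2f} ps")
    print(f"simulated margin: {range_margin_ui(SIMULATED_MIN_RANGE_PS, UI_PS):.2f} UI")
    print(f"measured margin:  {range_margin_ui(MEASURED_RANGE_PS, UI_PS):.2f} UI")
```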

### 2.3 Prior Arts of MP-CTLE

### 2.3.1 Cascade-stage CTLEs



Fig. 16 Conventional cascade-stage MP-CTLEs for 12.5 Gbps NRZ

The diagram of the conventional cascade-stage MP-CTLE is shown in Fig. 16. Each CTLE stage is designed to provide peaking at a different frequency, and the stages are cascaded to obtain a multiple-peaking response in the frequency domain.

This design offers very good flexibility and independent tuning capability. The conventional source-degenerated differential pair is used repeatedly, and each stage provides only one peaking. Due to the cascaded design, the 3-dB bandwidth of each stage needs to be extended by burning more current.



Fig. 17 Conventional cascade two stage CTLE

### 2.3.2 CTLEs with Feedback/feedforward High-pass Amplifier



Fig. 18 Feedback enabled CML high-pass amplifier

This MP-CTLE is designed in a two-stage fashion (Fig. 18). The LF-EQ is realized in the second stage. The main body is a current-mode logic differential pair, whose output is buffered by a degenerated differential pair followed by a low-pass RC filter with a $G_m$ cell.

This approach is similar to the previous conventional cascade CTLEs, with only a small variation in the second stage. Thus, it also suffers from excessive power consumption due to the one extra stage.

### 2.4 Proposed Single-stage MP-CTLE

#### 2.4.1 Design Consideration and Schematics

In this section we cover the design considerations for the single-stage MP-CTLE, which is the major novelty of this thesis. The frequency response of our two-stage CTLE is shown in Fig. 19.



Fig. 19 Simulated frequency response of the two-stage CTLE.

A CTLE creates peaking by degenerating the transconductance of a CML differential pair at low frequency through an RC source network, such that the amplifier's frequency response "peaks" at high frequency. In other words, its high-frequency behavior is similar to that of a CML amplifier. It is therefore legitimate to draw an analogy between the CML amplifier and the CTLE in terms of the power and bandwidth trade-off, owing to their similar high-frequency characteristics. In Table I, BW denotes the design target for the overall bandwidth of both the single-stage CML amplifier and the cascade CML amplifier. For the same overall bandwidth target, the 3-dB bandwidth of each stage in the cascade CML amplifier is $\sqrt{N}/0.9$ times that of the single-stage counterpart, which means that a larger transconductance is needed to provide the same voltage gain, since the loading resistor must be reduced to meet the bandwidth requirement. As a result, the cascade-stage amplifier consumes $\sqrt{N}/0.9$ times the power of the single-stage amplifier, where N is an integer and $N \geq 2$. For N = 2, the total current consumption of the cascade-stage amplifier is around 1.57 times that of the single-stage amplifier for the same overall 3-dB bandwidth. This result agrees well with the simulated comparison between the single-stage MP-CTLE and the conventional cascade MP-CTLE presented later in this section.
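The $\sqrt{N}/0.9$ factor can be evaluated numerically with the short sketch below. For comparison it also evaluates the exact bandwidth-shrinkage expression for N identical single-pole stages. That exact expression is a standard result brought in here as an assumption about the stage model, since stages with shunt peaking would deviate from it, and the function names are hypothetical.

```python
import math


def per_stage_bw_factor_approx(n_stages):
    """Per-stage 3-dB bandwidth relative to the overall bandwidth,
    using the sqrt(N)/0.9 factor from Table I."""
    return math.sqrt(n_stages) / 0.9


def per_stage_bw_factor_exact(n_stages):
    """Same ratio for N identical single-pole stages (assumed model):
    the overall bandwidth is w0*sqrt(2**(1/N) - 1), so each stage needs
    1/sqrt(2**(1/N) - 1) times the overall bandwidth."""
    return 1.0 / math.sqrt(2.0 ** (1.0 / n_stages) - 1.0)


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n,
              round(per_stage_bw_factor_approx(n), 3),
              round(per_stage_bw_factor_exact(n), 3))
```

For N = 2 the approximate factor evaluates to about 1.57, matching the figure quoted above, and the single-pole expression gives a value within a few percent of it.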



Fig. 20 Two-stage CTLE with our proposed single-stage MP-CTLE

The two-stage CTLE is shown in Fig. 20, where the second stage is our proposed single-stage MP-CTLE. The simulated overall frequency response of the two-stage CTLE is shown in Fig. 19. The two-stage CTLE provides a dc offset suppression of 28 dB and a high-frequency boost of 8 dB at 19 GHz, while the low-frequency zero (denoted by the red circle) provides $\sim$1.2-dB compensation starting from 500 MHz. The proposed single-stage MP-CTLE realizes multiple peaking by feeding the output signal back through a low-pass filter (LPF) and subtracting it from the input signal. Its major advantage is the power saved by eliminating the extra cascade stage required for a second CTLE/CML feedback amplifier in prior works. Intuitively, the proposed single-stage MP-CTLE reuses the transconductance of $M_2$ in the feedback loop to suppress the low-frequency voltage gain. Once the LPF kicks in, the loop gain begins to decrease, which creates the LF-EQ zero. At the high-frequency peaking, the LPF attenuates the feedback signal severely enough to break the loop; in other words, the feedback loop does not compromise the HF-EQ peaking amplitude. Thus, by providing HF-EQ and LF-EQ at the same time, the single-stage MP-CTLE is more energy efficient than the conventional cascade-stage MP-CTLE.



Fig. 21 Simplified system diagram of the single-stage MP-CTLE

With the added low-frequency zero, the overall system transfer function becomes less explicit. The denominator consists of two complex poles and two real poles; hence, the undetermined coefficient method cannot be used to calculate the pole positions [9]. Therefore, we propose an approximation approach to obtain a simplified overall transfer function and the pole positions of this system. In Fig. 21, $G_m$ is the degenerated transconductance of $M_2$, and $Z_L$ is the output loading of the single-stage MP-CTLE, which consists of the shunt-peaking inductor $L_2$, the loading resistor $R_{D2}$ and the parasitic capacitance $C_{L2}$ (refer to Fig. 2). The output of the single-stage MP-CTLE goes through an LPF, denoted as $1/2\pi R_Z C_Z$. The gain of the feedback network is assumed to be 1. In real cases this is not always true, since it depends on $g_{mFB}$ and the loading of the previous stage; in our case it is close to unity. We can write the complete closed-loop transfer function as follows:

$$\frac{Vout(s)}{Vin(s)} = \frac{G_m(s) \cdot Z_L(s)}{1 + G_m(s) \cdot Z_L(s) \frac{1}{2\pi R_Z C_Z}} \tag{1}$$

where

$$G_m(s) = \frac{g_m(R_S C_S s + 1)}{1 + R_S C_S s + \frac{g_m R_S}{2}}$$

$$Z_{L}(s) = \frac{L_{p}s + R_{D}}{1 + R_{D}C_{D}s + L_{p}C_{D}s^{2}}$$

This complete closed-loop transfer function can be simplified using the fact that $R_D/(1+R_DC_Ds+L_pC_Ds^2)$ dominates $Z_L(s)$. This approximation gives a simplified version of the system transfer function. However, such an approximation introduces errors over a relatively wide range at high frequency. Fig. 33 shows that there is no significant difference at low frequency (< 2 GHz), while the pole locations at high frequency remain relatively unchanged compared to the original system transfer function. It is therefore appropriate to write the simplified system transfer function and the approximated pole locations as follows:

$$\frac{Vout(s)}{Vin(s)} \approx \frac{G_m(s) \cdot Z_L(s)}{1 + G_m(s) \cdot \frac{R_D}{2\pi R_Z C_Z}}$$

$$=\frac{g_m(R_SC_Ss+1)(R_ZC_Zs+1)}{\frac{2\tau_S\tau_Z}{2+g_mR_S+2g_mR_D}s^2+\frac{2\tau_S+2\tau_Z+g_mR_S\tau_Z+2g_mR_D\tau_S}{2+g_mR_S+2g_mR_D}s+1}\cdot\frac{L_ps+R_D}{1+R_DC_Ds+L_pC_Ds^2}$$

where  $\tau_X = R_X C_X$ .

From the simplified transfer function, assuming that  $2R_ZC_Z + g_mR_SR_ZC_Z \gg 2R_sC_s + 2g_mR_DR_sC_s$ , the poles can be written as follows:

$$P_1 = -(1 + \frac{2g_m R_D}{2 + g_m R_S}) \cdot \frac{1}{2\pi R_Z C_Z} \tag{2}$$

$$P_2 = -(1 + \frac{g_m R_S}{2}) \cdot \frac{1}{2\pi R_S C_S} \tag{3}$$
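As a numerical sanity check of this approximation, the sketch below compares the exact roots of the second-order factor in the simplified denominator with the closed forms (2) and (3). All component values are hypothetical and are chosen only so that the stated inequality holds; they are not the values used in the design. The poles are computed in rad/s and would be divided by $2\pi$ to obtain the Hz form used in (2) and (3).

```python
import math

# Hypothetical component values (assumptions for illustration only).
G_M = 20e-3     # transconductance g_m, S
R_S = 200.0     # degeneration resistor, ohm
C_S = 100e-15   # degeneration capacitor, F
R_D = 50.0      # load resistor, ohm
R_Z = 10e3      # LPF resistor, ohm
C_Z = 100e-15   # LPF capacitor, F


def denominator_coefficients(g_m, r_s, c_s, r_d, r_z, c_z):
    """Coefficients a, b of a*s^2 + b*s + 1 in the simplified transfer function."""
    tau_s = r_s * c_s
    tau_z = r_z * c_z
    k = 2 + g_m * r_s + 2 * g_m * r_d
    a = 2 * tau_s * tau_z / k
    b = (2 * tau_s + 2 * tau_z + g_m * r_s * tau_z + 2 * g_m * r_d * tau_s) / k
    return a, b


def exact_poles(a, b):
    """Real roots of a*s^2 + b*s + 1 = 0 in rad/s, low-frequency pole first."""
    disc = b * b - 4 * a
    if disc < 0:
        raise ValueError("poles are complex for these values")
    root = math.sqrt(disc)
    return (-b + root) / (2 * a), (-b - root) / (2 * a)


def approx_poles(g_m, r_s, c_s, r_d, r_z, c_z):
    """Closed forms (2) and (3), expressed in rad/s."""
    p1 = -(1 + 2 * g_m * r_d / (2 + g_m * r_s)) / (r_z * c_z)
    p2 = -(1 + g_m * r_s / 2) / (r_s * c_s)
    return p1, p2


VALUES = (G_M, R_S, C_S, R_D, R_Z, C_Z)
EXACT = exact_poles(*denominator_coefficients(*VALUES))
APPROX = approx_poles(*VALUES)

if __name__ == "__main__":
    for name, exact, approx in zip(("P1", "P2"), EXACT, APPROX):
        print(name, exact / (2 * math.pi), approx / (2 * math.pi), "Hz")
```

With well-separated time constants such as these, the closed forms track the exact roots closely; the agreement degrades as the two sides of the inequality approach each other.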

Unlike the pole positions of the cascade-stage MP-CTLE, which are affected only by the degeneration resistors and capacitors, the location of the first pole $P_1$ of the single-stage MP-CTLE is determined by $R_D$ and $R_S$ together. From the above derivation, the major difference between our proposed single-stage MP-CTLE and the cascade-stage MP-CTLE is that the first pole (2) of the proposed single-stage MP-CTLE sits at a higher frequency, since the first pole of the conventional cascade-stage MP-CTLE takes the same form as (3). As shown in Fig. 23, our single-stage MP-CTLE can recover the lossy channel with or without the LF-EQ, although there is a 1.8-dB dip around 1.5 GHz when the LF-EQ is off. The 3-dB bandwidth remains the same whether the LF-EQ is on or off. The recovered PAM-4 eye diagram with LF-EQ in Fig. 31 shows a significant improvement over its counterpart without LF-EQ in Fig. 32, with around 30% improvement in both vertical and horizontal opening. The differences are denoted by the comparison bars in Fig. 31 and Fig. 32, where the red bars stand for the vertical and horizontal openings of the recovered PAM-4 eye diagram and the green bars stand for the improvement obtained by turning on the LF-EQ.

The conventional cascade-stage MP-CTLE exhibits very good design flexibility but poor power efficiency. Due to its cascade structure, the 3-dB bandwidth of each stage needs to be much higher than the overall 3-dB bandwidth. Since the 3-dB bandwidth is inversely proportional to $R_D$, $g_m$ needs to be larger for a higher bandwidth in order to provide enough gain. Thus, the power consumption is proportional to the number of stages/peakings. As shown in Fig. 22, the red curve is the frequency response of the conventional cascade-stage MP-CTLE, which has a peaking of 1.89 dB at 2 GHz and 5.54 dB at 16.5 GHz; the blue curve is the frequency response of the single-stage MP-CTLE, which has a peaking of 1.74 dB at 2 GHz and 6.08 dB at 16 GHz according to the simulations. The power consumption of the conventional cascade-stage MP-CTLE and the single-stage MP-CTLE is summarized in Table I. A saving of 67% in power consumption is achieved by adopting the single-stage MP-CTLE topology instead of the conventional cascade-stage MP-CTLE topology. Thus, it is legitimate to conclude that the proposed single-stage MP-CTLE provides similar performance while saving a significant amount of power.

TABLE I. SINGLE STAGE CML VS CASCADE CML

|                                 | Single-stage CML amplifier      | Cascade CML amplifier (N = number of stages)               |
|---------------------------------|---------------------------------|------------------------------------------------------------|
| $f_{-3dB\_oa}$                  | BW                              | BW                                                         |
| $f_{-3dB}$ (each stage)         | $1 \cdot BW$                    | $\frac{\sqrt{N}}{0.9} \cdot BW$                            |
| Ideal peak gain $A_V$ (decimal) | $\frac{g_m}{2\pi C_L \cdot BW}$ | $\frac{0.9}{\sqrt{N}} \cdot \frac{g_m}{2\pi C_L \cdot BW}$ |
| Current consumption $I_D$       | $A_V V_{OV} \pi C_L \cdot BW$   | $\frac{\sqrt{N}}{0.9} \cdot A_V V_{OV} \pi C_L \cdot BW$   |

where

$f_{-3dB\_oa}$ stands for the overall 3-dB bandwidth of the CML amplifier;

$g_m$ stands for the transconductance of the differential pair;

$f_{-3dB}$ stands for the 3-dB bandwidth of an individual CML stage;

$A_V$ is the ideal peak gain, equal to the undegenerated gain of the CML stage;

$I_D$ is the bias current of the CML amplifier stage;

$R_D$ is the loading resistance of the CML amplifier stage;

$C_L$ is the parasitic capacitance at the output node;

$V_{OV}$ stands for the overdrive voltage of the differential pair.



Fig. 22 Frequency response of single-stage MP-CTLE with LF-CTLE on/off

Fig. 22 shows a comparison of the single-stage MP-CTLE with the LF-CTLE on and off. With the LF-CTLE on, an extra 1.8-dB boost is obtained around 1.5 GHz. This design example provides a 5.8-dB peaking at around 18 GHz.



Fig. 23 Compensated lossy channel response with and without LF-CTLE

Let us run a comparison simulation using the measured S-parameters of a Rogers 4003C 4-layer PCB transmission line. The red curve in Fig. 23 shows the compensated channel response with the LF-CTLE turned on, whereas the blue curve shows it with the LF-CTLE turned off. The 3-dB bandwidths are identical since both cases have the same high-frequency equalization boost. However, the blue curve dips at low frequency (500 MHz to 5 GHz), which generates residual inter-symbol interference (ISI). Intuitively, the signal energy that falls within the non-flat frequency region is not amplified uniformly with respect to the other frequencies; thus, ISI is generated. To further illustrate the amount of ISI, the integrated residual ISI magnitude is compared in the following figures:



Fig. 24 Integral of ISI magnitude with LF-CTLE turned off



Fig. 25 Integral of ISI magnitude with LF-CTLE turned on

The residual ISI with the LF-CTLE turned on is almost half of that with the LF-CTLE turned off.

The effect of the residual ISI on the eye diagram is even more significant, as shown below:



Fig. 26 Simulated 25 Gbps NRZ eye diagram without LF-CTLE



Fig. 27 Simulated 25 Gbps NRZ eye diagram with LF-CTLE

Both the horizontal and vertical eye openings improve by around 100% when the LF-CTLE is turned on.

The measured channel response (S21) is shown in Fig. 28. This is the S21 response of a 3-inch coplanar waveguide (CPWG) type transmission line (Fig. 29), where the 3-dB bandwidth is around 4.3 GHz and the channel loss at 13 GHz is around 6.8 dB.



Fig. 28 Measured S21 of a Rogers PCB transmission line



Fig. 29 3-inch Rogers PCB transmission line



Fig. 30 Corrupted 56 Gbps PAM-4 signal after passing through the measured PCB S21 response

After passing through this lossy channel, the ideal 52-Gbps PAM-4 signal is corrupted so severely that both the horizontal and vertical eye openings are completely closed, as shown in Fig. 30.

This completely closed eye is then compensated by the single-stage MP-CTLE with and without the LF-CTLE. The frequency response of the single-stage MP-CTLE with and without the LF-CTLE is shown in Fig. xx. The difference in frequency response leads to different recovered PAM-4 eye quality, as shown in Fig. 31 and Fig. 32.



Fig. 31 Recovered 56 Gbps PAM-4 signal with LF-CTLE

This is the simulated eye diagram compensated by the single-stage MP-CTLE in the presence of the LF-CTLE. Two characteristics deserve extra attention: 1) the PAM-4 data transitions are clearly separated from each other; 2) the horizontal and vertical openings of this PAM-4 eye diagram indicate a very good signal-to-noise-and-distortion ratio (SNDR). The first observation shows that the compensated PAM-4 data transitions have a clear one-to-one correspondence to the ideal PAM-4 data patterns, which is a strong indication of a less distorted PAM-4 signal. The second observation is that the recovered signal has a decent SNDR, meaning that the separation between logic levels leaves enough margin for the subsequent PAM-4 slicer stages. Thus, an improved bit error rate (BER) can be obtained by enabling the LF-CTLE.



Fig. 32 Recovered 56 Gbps PAM-4 signal without LF-CTLE

This is the simulated eye diagram compensated by the single-stage MP-CTLE without the LF-CTLE. In contrast, the two observations here are: 1) blurred PAM-4 data transitions; 2) significantly compressed horizontal and vertical PAM-4 eye openings. The first observation shows a much more severe signal distortion, as the boundaries of the PAM-4 data transitions are no longer clear. This increase in signal distortion is introduced by the low-frequency dip in the absence of the LF-CTLE. The second observation shows that the horizontal and vertical PAM-4 eye openings are compressed compared to those with the LF-CTLE. The improvement in horizontal and vertical eye opening brought by the LF-CTLE is around 30%, which improves the BER performance of the CDR.

#### 2.4.2 Transfer Function and Approximation Methods




Fig. 33 Numerical simulation of the original and approximated system transfer function

## 2.4 Experiment Results



Fig. 34 52-Gbps PAM-4 receiver chip die photo

The receiver chip is fabricated in a TSMC 40-nm process, and the die photo is shown in Fig. 34. A GSG pad is located at the bottom, where three silver bonding wires are used to connect the incoming 52-Gbps PAM-4 signal. The chip occupies 0.715 mm<sup>2</sup> of silicon area.



Fig. 35 Test bench setup for the BER bathtub curve measurement

The measurement setup is shown in Fig. 35. On the left-hand side, the output MSB at 6.5 Gbps is monitored, and the divided PLL output is shown on the spectrum analyzer. The bit error rate tester (BERT), PRBS generator and sampling scope are stacked in the middle. The power supplies are on the right-hand side.

The test setup diagram is shown in Fig. 36. Our PAM-4 receiver chip with the single-stage MP-CTLE is the device under test (DUT). The PRBS pattern generator uses two time-interleaved 26-Gbps NRZ signals with a wideband power combiner to synthesize a 52-Gbps PAM-4 signal.



Fig. 36 Test bench diagram



Fig. 37 Measurement PCB board layout and signal input

The lossy transmission line consists of a pair of coaxial cables and a pair of short Rogers<sup>TM</sup> PCB CPWG traces. The S21 of the coaxial cable is shown in Fig. 38, where the skin effect contributes around 1.5-dB loss at low frequency and the total loss is 5.8 dB at 13 GHz. With an estimated PCB CPWG loss of 1.5 dB, the combined loss of the lossy channel is around 7.3 dB (5.8 dB + 1.5 dB).



Fig. 38 Measured S21 of the coaxial cable



Fig. 39 Measured corrupted 52 Gbps PAM-4 signal after the lossy channel

After passing through the lossy channel, the 52-Gbps PAM-4 eye is completely closed, as shown in Fig. 39.



Fig. 40 The decoded PRBS-7 quarter-rate MSB and LSB data transient waveform.



Fig. 41 Quarter-rate recovered data shown in the sampling scope: MSB (left) and LSB (right)

The recovered 6.5-Gbps MSB and LSB eye diagrams on the sampling scope are shown in Fig. 41. For the MSB output, a root-mean-square jitter of 2.42 ps is obtained, and for the LSB output the root-mean-square jitter is 5.67 ps. The recovered MSB and LSB are checked with the BERT individually.



Fig. 42 Measured BER bathtub of PRBS-7, PRBS-9 with/without LF-CTLE

The BER bathtub curves are measured under different PRBS pattern lengths (Fig. 42). With the LF-EQ turned on, the BER bathtub curve of the PRBS-7 data pattern has a 0.2-UI margin at BER = $10^{-6}$ and maintains roughly 0.1-UI margin at BER = $10^{-12}$. When the data pattern is extended to PRBS-9, the BER can barely reach $10^{-6}$, with a margin of 0.035 UI, without the help of the LF-EQ of our proposed single-stage MP-CTLE. When the LF-EQ is turned on, the BER bathtub curve improves to error-free operation with a margin of around 0.05 UI at BER = $10^{-12}$ and 0.1 UI at BER = $10^{-6}$.

According to the experiment, the LF-EQ helps to extend the error-free operation (BER = $10^{-12}$) from PRBS-7 to PRBS-9. The margin at BER = $10^{-6}$ is improved by 185% after the LF-CTLE is enabled.
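The 185% figure can be reproduced from the two PRBS-9 margins at BER = $10^{-6}$ reported above (0.035 UI with the LF-EQ off and 0.1 UI with it on). The sketch below is plain arithmetic; the function name is hypothetical.

```python
MARGIN_LF_EQ_OFF_UI = 0.035   # PRBS-9, BER = 1e-6, LF-EQ off (from the text)
MARGIN_LF_EQ_ON_UI = 0.1      # PRBS-9, BER = 1e-6, LF-EQ on (from the text)


def relative_improvement_percent(before, after):
    """Relative increase of `after` over `before`, in percent."""
    return (after - before) / before * 100.0


IMPROVEMENT_PERCENT = relative_improvement_percent(
    MARGIN_LF_EQ_OFF_UI, MARGIN_LF_EQ_ON_UI
)

if __name__ == "__main__":
    print(f"margin improvement: {IMPROVEMENT_PERCENT:.1f} %")
```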



Fig. 43 Measured divide by 2 output clock time-domain jitter.

The output clock of the wide-bandwidth (WBW) ring-oscillator-based PLL is divided by an on-chip clock divider and measured, as shown in Fig. 43. It exhibits a root-mean-square jitter of 380 fs. The integrated jitter from 2 kHz to 2000 MHz is 550 fs, as calculated from the phase noise profile in Fig. 44.



Fig. 44 Measured phase noise profile of the recovered clock



Fig. 45 System power consumption breakdown

The total power consumption breakdown is shown in Fig. 45, where the PLL + VCDL consume 21%, the CTLE + summer + slicer + DAC + CP consume 47.5%, and the clock buffer + BBPD + logic consume 31.5% of the 48-mW total power consumption.
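For convenience, the percentage breakdown can be converted into absolute power per block group, as in the sketch below. The shares and the 48-mW total are taken from the text; the dictionary layout and function name are illustrative.

```python
TOTAL_POWER_MW = 48.0

# Shares of the total power as read from Fig. 45 (percent).
SHARES_PERCENT = {
    "PLL + VCDL": 21.0,
    "CTLE + Summer + Slicer + DAC + CP": 47.5,
    "Clock Buffer + BBPD + Logic": 31.5,
}


def block_power_mw(total_mw, shares_percent):
    """Absolute power of each block group in mW."""
    return {name: total_mw * pct / 100.0 for name, pct in shares_percent.items()}


BLOCK_POWER_MW = block_power_mw(TOTAL_POWER_MW, SHARES_PERCENT)

if __name__ == "__main__":
    for name, power in BLOCK_POWER_MW.items():
        print(f"{name}: {power:.2f} mW")
```

The PLL + VCDL share works out to about 10 mW, consistent with the roughly 10 mW quoted for the WBW-PLL with VCDL in Section 2.2.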

Table II: Performance summary and comparison.

|                                 | JSSC'2017                 | JSSC'2017                           | JSSC'2017            | ISSCC'2017             | ISSCC'2017   | VLSI'2015         | ISSCC'2016           | This work           |
|---------------------------------|---------------------------|-------------------------------------|----------------------|------------------------|--------------|-------------------|----------------------|---------------------|
| Clocking                        | 1/2                       | 1/4                                 | 1/4                  | 1                      | 1/2          | 1/2               | 1/4                  | 1/4                 |
| Functions and Equalization      | Tx-FFE, CTLE, DFE, PI-CDR | Tx-FFE, CTLE, DFE, FFE, ADC, PI-CDR | Tx-FFE, CTLE, PI-CDR | Tx-FFE, CTLE, FFE, ADC | Tx-FFE, CTLE | CTLE, DFE, PI-CDR | Tx-FFE, CTLE, PI-CDR | CTLE, FFE, VCDL-CDR |
| DR (Gb/s)                       | 56                        | 56                                  | 56                   | 64                     | 64           | 56                | 64                   | 52                  |
| PRBS-n                          | 31                        | 31                                  | 31                   | 15                     | 15           | 7                 | NA                   | 7                   |
| CH. Att (dB)                    | 10                        | 32                                  | 7.5                  | 29.5                   | 8.6          | 24                | 16.8                 | 7.3                 |
| BER                             | $10^{-12}$                | $10^{-12}$                          | $10^{-12}$           | $10^{-6}$              | $10^{-4}$    | $10^{-12}$        | $10^{-12}$           | $10^{-12}$          |
| H. W. @ $10^{-6}$ (UI)          | 0.2                       | 0.15                                | 0.18                 | N/A                    | N/A          | 0.31              | 0.19                 | 0.20                |
| Supply (V)                      | 0.9/1.2                   | 0.85/0.9/1.2/1.8                    | 0.85/0.9/1.2/1.8     | 0.9/1.2                | 0.9/1.2      | 1                 | 1                    | 1                   |
| Pwr (mW)                        | 230                       | 450                                 | 270                  | 284                    | 100          | 420               | 180                  | 48                  |
| Eff. (pJ/b)                     | 4.1                       | 8.0                                 | 4.8                  | 4.4                    | 1.6          | 7.5               | 2.8                  | 0.92                |
| Eff. (pJ/b/dB)                  | 0.4                       | 0.25                                | 0.64                 | 0.149                  | 0.186        | 0.313             | 0.167                | 0.126               |
| Process                         | 16nm                      | 16nm                                | 16nm                 | 16nm                   | 16nm         | 40nm              | 28nm-SOI             | 40nm                |
| Area (mm²)                      | 0.36                      | 2.2 (Tx+Rx)                         | 2.2 (Tx+Rx)          | 0.16                   | 0.16         | 1.6               | 0.32                 | 0.72                |

Table II compares this work with other state-of-the-art designs and shows that this work achieves a superior energy efficiency of 0.92 pJ/bit at 52 Gbps while compensating for 7.3-dB channel loss at 13 GHz. The trend of bit efficiency versus channel loss is plotted in Fig. 46. The y-axis stands for bit efficiency and the x-axis for channel loss, and the prior arts are plotted according to their bit efficiency and compensated channel loss. The blue dotted curve is a first-order fitting line of the bit efficiency versus channel loss trend, which is better suited for comparing the energy efficiency across different channel applications. The plane is separated by the blue dotted line: works in the upper half are less energy efficient, and those in the lower half are more energy efficient. Our PAM-4 receiver with the proposed single-stage MP-CTLE stands out for its energy efficiency across different applications.



Fig. 46 Bit efficiency versus channel loss trend
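The efficiency figures and a first-order trend line of this kind can be reproduced from the data rate, channel loss and power entries of Table II, as sketched below. The sketch assumes an ordinary least-squares fit over the prior-art points only. The fit actually drawn in Fig. 46 may use a different point set or method, so the resulting slope and intercept are illustrative and should not be read as the coefficients of the plotted line.

```python
# (data rate in Gb/s, channel loss in dB, power in mW), in Table II column order.
PRIOR_ART = [
    (56.0, 10.0, 230.0),
    (56.0, 32.0, 450.0),
    (56.0, 7.5, 270.0),
    (64.0, 29.5, 284.0),
    (64.0, 8.6, 100.0),
    (56.0, 24.0, 420.0),
    (64.0, 16.8, 180.0),
]
THIS_WORK = (52.0, 7.3, 48.0)


def energy_per_bit_pj(rate_gbps, power_mw):
    """Energy efficiency in pJ/b (mW divided by Gb/s)."""
    return power_mw / rate_gbps


def linear_fit(xs, ys):
    """Ordinary least-squares line y = slope*x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


LOSS_DB = [loss for _, loss, _ in PRIOR_ART]
EFF_PJ_PER_BIT = [energy_per_bit_pj(rate, pwr) for rate, _, pwr in PRIOR_ART]
SLOPE, INTERCEPT = linear_fit(LOSS_DB, EFF_PJ_PER_BIT)

THIS_WORK_EFF = energy_per_bit_pj(THIS_WORK[0], THIS_WORK[2])
TREND_AT_THIS_WORK = SLOPE * THIS_WORK[1] + INTERCEPT

if __name__ == "__main__":
    print(f"this work: {THIS_WORK_EFF:.2f} pJ/b, "
          f"{THIS_WORK_EFF / THIS_WORK[1]:.3f} pJ/b/dB")
    print(f"trend line at {THIS_WORK[1]} dB: {TREND_AT_THIS_WORK:.2f} pJ/b")
```

Under this assumed fit, the point for this work falls below the trend line, which is consistent with the reading of Fig. 46 given above.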

### 2.5 References

- [1] J. Im et al., "A 40-to-56 Gb/s PAM-4 Receiver With Ten-Tap Direct Decision-Feedback Equalization in 16-nm FinFET," in IEEE Journal of Solid-State Circuits, vol. 52, no. 12, pp. 3486-3502, Dec. 2017.
- [2] Y. Frans et al., "A 56Gb/s PAM4 wireline transceiver using a 32-way time-interleaved SAR ADC in 16nm FinFET," 2016 IEEE Symposium on VLSI Circuits (VLSI-Circuits), Honolulu, HI, 2016, pp. 1-2.
- [3] P. Peng et al., "A 56Gb/s PAM-4/NRZ transceiver in 40nm CMOS," in IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers, Feb. 2017, pp. 110-111.
- [4] D. Cui et al., "A 320mW 32Gb/s 8b ADC-based PAM-4 analog front-end with programmable gain control and analog peaking in 28nm CMOS," in IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers, Feb. 2016, pp. 58–59.
- [5] J. Lee et al., "56Gb/s PAM4 and NRZ SerDes transceiver in 40nm CMOS," in IEEE Symp. VLSI Circuits Dig. Tech. Papers, Jun. 2015, pp. 118–119.

- [6] L. Tang et al., "A 32Gb/s 133mW PAM-4 transceiver with DFE based on adaptive clock phase and threshold voltage in 65nm CMOS," in IEEE Int. Solid-State Circuit Conf. Dig. Tech. Papers, Feb. 2018, pp. 114–115.
- [7] T. Toifl et al., "A 22-Gb/s PAM4 receiver in 90-nm CMOS SOI technology," IEEE J. Solid-State Circuits, vol. 41, no. 4, pp. 954-965, Apr. 2006.
- [8] http://www.oiforum.com/wp-content/uploads/50317-FOE-Architecture-Presentation.pdf
- [9] https://en.wikipedia.org/wiki/Dielectric\_loss
- [10] https://en.wikipedia.org/wiki/Skin\_effect
- [11] https://en.wikipedia.org/wiki/Transmission\_line

### CHAPTER 3. PIECE-WISE EQUALIZATION FOR TRANSMITTER

## 3.1 Introduction

A laser diode is a semiconductor device in which a diode is directly driven by electrical energy and generates lasing at the junction. Capable of converting electrical energy into coherent optical light, the laser diode is one of the ideal carriers for transmitting information over a long distance. A typical cross-section of a vertical-cavity surface-emitting laser (VCSEL) is shown in Fig. 47. The injected energy enters the diode from the anode electrode, passes through the p-type distributed Bragg reflector (p-DBR), which is modeled as a series resistor, and reaches the core area of laser operation, the active region, where the gain medium resides. The active region is modeled as a parasitic capacitance $C_a$ and a resistance $R_a$. $C_p$ stands for the pad capacitance.

Fig. 47 Typical VCSEL cross-section [12]

Unlike spontaneous emission, stimulated emission produces light with the same phase, coherence and wavelength. The establishment of stimulated emission is a complex and dynamic process. In short, the injected electrical energy increases the carrier density, and the two-mirror resonant cavity selects a particular wavelength and mode that can sustain the oscillation. The major difficulty to overcome is the loss in the laser cavity, such as carrier leakage, nonradiative recombination and spontaneous recombination.

To describe this dynamic process, a pair of coupled differential equations is used to capture photon generation and carrier recombination in the active region, namely the carrier and photon density *rate equations* [13], in which *N* is the carrier density, *I* is the injection current and $N_p$ is the photon density.

The details of these equations are beyond the scope of this thesis. Because they have no analytical solution, numerical integration is widely adopted to emulate the transient behavior of the semiconductor laser.
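For reference, one common single-mode form of these rate equations, written to match the terms integrated in the appendix Verilog-A model, is sketched below. Apart from $N$, $I$ and $N_p$, the symbols are assumptions taken from the appendix variable names: $V_a$ is the active-region volume, $v_g$ the group velocity, $a_0$ the differential gain, $N_{tr}$ the transparency carrier density, $\varepsilon$ the gain-compression factor, $B$ the bimolecular recombination coefficient, $\Gamma$ the confinement factor (`Tau` in the code), $\tau_p$ the photon lifetime and $\beta$ the spontaneous-emission coefficient.

$$\frac{dN}{dt} = \frac{I}{q V_a} - v_g a_0 \frac{N - N_{tr}}{1 + \varepsilon N_p} N_p - B N^2$$

$$\frac{dN_p}{dt} = \Gamma v_g a_0 \frac{N - N_{tr}}{1 + \varepsilon N_p} N_p - \frac{N_p}{\tau_p} + \Gamma \beta B N^2$$

The first equation balances carrier injection against stimulated and spontaneous recombination, and the second balances photon generation against cavity loss.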



Fig. 48 Basic compact VCSEL laser model [14]

In general, the compact model of a semiconductor laser can be divided into two parts: 1) an electrical input stage that accounts for the small-signal frequency response of the diode; 2) an electrical-to-optical conversion stage (the rate equations), as depicted in Fig. 48.

A laser exhibits different 3-dB bandwidths at different bias levels, as shown in Fig. 49; that is, the 3-dB bandwidth changes with the driving level. For NRZ signaling, this dynamic characteristic is negligible owing to its on-off nature. PAM-4 signaling, however, is much more sensitive to ISI since it trades SNDR for bandwidth efficiency. The stacked eyes are more sensitive to ISI introduced by bandwidth limitation and by over- or under-equalization. Equalization of a PAM-4 signal therefore needs extra care to prevent overshoot and undershoot as well as insufficient bandwidth.

Conventional PAM-4 equalization in transmitter circuits generates the equalization signal from the binary data through a look-up table, which assigns equalization coefficients to the FFE output stage according to the symbol transition [15-19], as shown in Fig. 50. There are 12 transitions in total, so the tap generation module can be rather complicated, and the equalization frequency is fixed for all transitions regardless of signal level. Clearly, complicated coefficient generation and an inseparable equalization frequency setting impede flexible design of the transmitter front-end. Most of the inflexibility comes from the fact that the conventional PAM-4 signal is binary coded, which prevents independent manipulation of the vertically stacked eyes. For example, the equalization of both the top eye and the bottom eye is controlled by the LSB equalization tap generation, even though the laser exhibits different bandwidths in the top and bottom eyes.
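The count of 12 transitions follows from pairing each of the four PAM-4 symbols with the three other symbols it can change to. A minimal sketch that enumerates the look-up table entries is given below; the function name `transition_table` and the `A_prev^cur` label format are illustrative assumptions, not part of the original design.

```python
from itertools import product

LEVELS = ["00", "01", "10", "11"]  # two-bit binary PAM-4 symbols


def transition_table(levels=LEVELS):
    """Return every (previous, current) pair in which the symbol changes."""
    return [(prev, cur) for prev, cur in product(levels, repeat=2) if prev != cur]


if __name__ == "__main__":
    table = transition_table()
    print(len(table), "transitions")
    for prev, cur in table:
        # One FFE coefficient entry per transition in the binary-coded scheme
        print(f"A_{prev}^{cur}")
```

Each printed entry corresponds to one coefficient that a binary-coded look-up table would have to store.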



Fig. 49 Measured S21 of VCSEL laser under different bias conditions [20]

What if we consider a different approach to synthesizing the PAM-4 signal?

Thermometer code can be used as an alternative. In Fig. 51, the PAM-4 signal is decomposed into three stacked NRZ signals by using thermometer code, also called unary code. For example, the two-bit binary PAM-4 data correspond to decimal 0, 1, 2 and 3, whose thermometer code representations are 000, 001, 011 and 111, respectively. Essentially, decimal *N* equals the number of 1s in the thermometer representation. By decomposing the PAM-4 signal into its thermometer code representation, independent equalization with different equalization frequencies and coefficients can be applied to the PAM-4 signal to accommodate the dynamic bandwidth with respect to bias level, as shown in Fig. 52.

Look-up table of the 12 transitions (rows: previous symbol N-1; columns: current symbol N)

| N-1 \ N | 00                            | 01                            | 10                            | 11                            |
|---------|-------------------------------|-------------------------------|-------------------------------|-------------------------------|
| 00      |                               | A<sub>00</sub><sup>01</sup>   | A<sub>00</sub><sup>10</sup>   | A<sub>00</sub><sup>11</sup>   |
| 01      | A<sub>01</sub><sup>00</sup>   |                               | A<sub>01</sub><sup>10</sup>   | A<sub>01</sub><sup>11</sup>   |
| 10      | A<sub>10</sub><sup>00</sup>   | A<sub>10</sub><sup>01</sup>   |                               | A<sub>10</sub><sup>11</sup>   |
| 11      | A<sub>11</sub><sup>00</sup>   | A<sub>11</sub><sup>01</sup>   | A<sub>11</sub><sup>10</sup>   |                               |

Fig. 50 Consecutive symbol transitions table

Fig. 51 Thermometer code decomposition of PAM-4 signal
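The thermometer-code mapping described above can be sketched in a few lines. The helper names `to_thermometer` and `from_thermometer` are hypothetical and only illustrate the encoding rule that the level equals the number of 1s.

```python
def to_thermometer(level, width=3):
    """Encode a PAM-4 level (0..3) as a thermometer string, e.g. 2 -> '011'."""
    if not 0 <= level <= width:
        raise ValueError("level must lie between 0 and width")
    return "0" * (width - level) + "1" * level


def from_thermometer(code):
    """Decode a thermometer string by counting its 1s."""
    return code.count("1")


if __name__ == "__main__":
    for level in range(4):
        code = to_thermometer(level)
        print(level, format(level, "02b"), code, from_thermometer(code))
```

Each of the three thermometer bits behaves as an NRZ stream of its own, which is what allows a separate equalizer setting per eye.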

In this new thermometer-code representation, the PAM-4 signal is viewed as three stacked NRZ signals. Each NRZ signal has two transitions, rising and falling. If we assume linear superposition, these 6 transitions can synthesize all 12 transitions of the original binary representation of the PAM-4 signal. Thus, the circuit complexity is reduced by 50% if asymmetric FFE is adopted.

Fig. 52 Piece-wise equalization concept
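The reduction from 12 transitions to 6 elementary edges can be checked by decomposing every PAM-4 transition into rising and falling edges of the three thermometer slices. This is a minimal sketch under the linear-superposition assumption stated above; the slice names `B`, `M`, `T` and the edge labels are illustrative.

```python
from itertools import product

SLICES = ("B", "M", "T")  # bottom, middle and top NRZ slices


def thermometer(level):
    """Map a PAM-4 level 0..3 to its (B, M, T) thermometer bits."""
    return tuple(1 if level > i else 0 for i in range(3))


def edges(prev, cur):
    """Elementary NRZ edges needed to realize the transition prev -> cur."""
    out = []
    for name, a, b in zip(SLICES, thermometer(prev), thermometer(cur)):
        if b > a:
            out.append(name + "_rise")
        elif b < a:
            out.append(name + "_fall")
    return out


decomposition = {
    (p, c): edges(p, c) for p, c in product(range(4), repeat=2) if p != c
}
elementary = sorted({e for es in decomposition.values() for e in es})

if __name__ == "__main__":
    for (p, c), es in decomposition.items():
        print(f"{p} -> {c}: {' + '.join(es)}")
    print(len(decomposition), "transitions from", len(elementary), "elementary edges")
```

Every one of the 12 entries is a sum of one to three of the same 6 edges, so only 6 independent pulse settings are needed.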

## 3.2 Compact Laser Model

Verilog-A is a popular language for compact modeling of analog circuits. It is well suited to modeling laser behavior owing to the complete and convenient SPICE simulation environment, in which forward integration can be carried out directly during transient simulation by calling the *idt* function of Verilog-A [21].
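Conceptually, the integration performed through *idt* accumulates the right-hand side of each rate equation over the simulator time steps. The sketch below imitates this with a plain forward-Euler loop; the simulator uses its own integration method and step control, and all parameter values here are illustrative assumptions in cm-based units rather than the fitted values of the thesis model.

```python
Q = 1.602e-19  # electron charge (C)

# Illustrative parameters (assumed, not the fitted values of the thesis)
PARAMS = dict(
    Va=2.4e-12,    # active-region volume (cm^3)
    vg=8.5e9,      # group velocity (cm/s)
    a0=2.5e-16,    # differential gain (cm^2)
    Ntr=2.0e18,    # transparency carrier density (cm^-3)
    eps=2.0e-17,   # gain-compression factor (cm^3)
    B=1.0e-10,     # bimolecular recombination coefficient (cm^3/s)
    gamma=0.03,    # confinement factor
    taup=2.0e-12,  # photon lifetime (s)
    beta=1.0e-3,   # spontaneous-emission coefficient
)


def simulate(current, t_end=10e-9, dt=1e-13, p=PARAMS):
    """Forward-Euler integration of carrier (N) and photon (Np) densities."""
    N, Np = 0.0, 0.0
    for _ in range(int(round(t_end / dt))):
        stim = p["vg"] * p["a0"] * (N - p["Ntr"]) / (1.0 + p["eps"] * Np) * Np
        spont = p["B"] * N * N
        dN = current / (Q * p["Va"]) - stim - spont
        dNp = p["gamma"] * stim - Np / p["taup"] + p["gamma"] * p["beta"] * spont
        N += dt * dN
        Np += dt * dNp
    return N, Np


if __name__ == "__main__":
    for i_ma in (1.0, 6.0):
        n, n_p = simulate(i_ma * 1e-3)
        print(f"I = {i_ma} mA: N = {n:.3e} cm^-3, Np = {n_p:.3e} cm^-3")
```

With these assumed values the threshold lies between the two currents, so the photon density at the higher current settles several orders of magnitude above the one at the lower current.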

The laser (VCSEL) compact model is shown in Fig. 53.



Fig. 53 Schematics of the Verilog-A compact VCSEL model

On the left-hand side, the diode RC model is written in Verilog-A, and its element values change with the bias current and temperature settings. The dependence of the resistance and capacitance on bias current and temperature is determined by polynomial curve fitting to measurement data taken at different bias points and temperatures.
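Once fitted, such a polynomial is simply evaluated at the operating point during simulation. The resistance and capacitance coefficients are not listed in the text, so the sketch below instead evaluates two second-order temperature polynomials that do appear in the appendix model (threshold current and injection efficiency). Treating `td` as a temperature in degrees Celsius is an assumption based on the `Temp - 273` term in the appendix.

```python
def polyval(coeffs, x):
    """Evaluate c0 + c1*x + c2*x**2 + ... using Horner's rule."""
    result = 0.0
    for c in reversed(coeffs):
        result = result * x + c
    return result


# Coefficients copied from the appendix model (lowest order first)
ITH_COEFFS_MA = (0.3, 1.79e-5, 4.2e-5)   # threshold current in mA
ETAI_COEFFS = (0.9, -1.8e-3, -8.54e-6)   # injection efficiency


def threshold_current(td):
    """Threshold current in amperes at temperature td."""
    return polyval(ITH_COEFFS_MA, td) / 1000.0


def injection_efficiency(td):
    """Injection efficiency at temperature td."""
    return polyval(ETAI_COEFFS, td)


if __name__ == "__main__":
    for td in (0.0, 25.0, 50.0, 75.0):
        print(td, threshold_current(td), injection_efficiency(td))
```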

The current through the active-region parasitic resistance, $I_{Ra}$, is considered the actual effective injection into the active region. $I_{Ra}$ is sensed by the electrical-to-optical module on the right-hand side and used as the input of the rate equations.

The source code of the Verilog-A rate equation model can be found in the appendix.

The frequency response of this time-domain model is obtained by a single-tone sweep: a sine wave is applied as the input, the response amplitude is recorded, and the frequency is swept to obtain the overall frequency response. The frequency response of the laser under different bias conditions is shown in Fig. 54.
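The single-tone sweep can be sketched as follows. A first-order low-pass filter stands in for the time-domain laser model, and its 18-GHz corner frequency `F3DB` is a hypothetical value chosen only for illustration; the function names are likewise assumptions.

```python
import math


def lowpass_response(signal, dt, f3db):
    """Time-domain first-order low-pass filter (stand-in for the laser model)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * f3db * dt)
    y, out = 0.0, []
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def single_tone_gain(model, freq, cycles=20, samples_per_cycle=400):
    """Drive the model with a unit-amplitude sine and return the settled output amplitude."""
    dt = 1.0 / (freq * samples_per_cycle)
    n = cycles * samples_per_cycle
    tone = [math.sin(2.0 * math.pi * freq * k * dt) for k in range(n)]
    out = model(tone, dt)
    settled = out[-2 * samples_per_cycle:]  # keep only the last two cycles
    return max(abs(v) for v in settled)


def sweep(model, freqs):
    """Return (frequency, gain in dB) pairs for a list of tone frequencies."""
    return [(f, 20.0 * math.log10(single_tone_gain(model, f))) for f in freqs]


F3DB = 18e9  # hypothetical corner frequency of the stand-in model


def model(signal, dt):
    return lowpass_response(signal, dt, F3DB)


if __name__ == "__main__":
    for f, gain_db in sweep(model, [1e9, 5e9, 10e9, 18e9, 30e9]):
        print(f"{f / 1e9:5.1f} GHz  {gain_db:6.2f} dB")
```

Only the last cycles are measured so that the start-up transient of the model does not distort the recorded amplitude.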



Fig. 54 Measured VCSEL laser S21 response (left); simulated VCSEL laser S21 response under different bias conditions (right) [12]



Fig. 55 Simulated 25 Gbps eye diagram under 3 mA bias (left); Measured 25 Gbps eye diagram under 3 mA bias (right).

The measured laser frequency response and the modeled frequency response obtained with the same set of laser parameters are very similar. The simulated 3-dB bandwidths are very close to the measured ones, with a small difference at high bias levels. A more convincing comparison is the time-domain eye diagram.



Fig. 56 Simulated 25 Gbps eye diagram under 6 mA bias (left); Measured 25 Gbps eye diagram under 6 mA bias (right)

As shown in Fig. 55 and Fig. 56, the model accurately reproduces the VCSEL behavior, which greatly helps to enable co-design of the laser driver with the laser device.

## 3.3 Proposed Piece-wise Equalization

### 3.3.1 System Design

The proposed piece-wise equalization is capable of compensating the dynamic bandwidth variation under different bias conditions. The quadratically changing bandwidth can be compensated in a piece-wise fashion by the proposed thermometer-code representation of the PAM-4 signal, owing to its independently controllable equalization frequencies and coefficients.



Fig. 57 Proposed 56-Gbps PAM-4 DML driver with piece-wise equalization

The system diagram for the piece-wise equalization is shown in Fig. 57. The input CTLE compensates for cable loss, PCB trace loss and bonding wire loss. The equalized signal is sliced and retimed by the quarter-rate NRZ receiver front-end [22] and then encoded by the binary-to-thermometer code module. The FFE generation block determines the equalization coefficients. The output stage is simply a conventional cascode differential pair. The loading network of the transmitter system is introduced with the test bench in Section 3.4.

The CTLE reuses the design from the earlier low-power PAM-4 receiver chapter. The binary-to-thermometer code converter is shown in Fig. 58.



Fig. 58 Differential binary to thermometer code converter

$$T = \overline{\overline{MSBP} + \overline{LSBP}} = MSBP \cdot LSBP;$$

$$TN = \overline{MSBP \cdot LSBP} = MSBN + LSBN;$$

$$M = \overline{\overline{MSBP}} = MSBP;$$

$$MN = \overline{\overline{MSBN}} = MSBN;$$

$$B = \overline{\overline{MSBP} \cdot \overline{LSBP}} = MSBP + LSBP;$$

$$BN = \overline{\overline{MSBN} + \overline{LSBN}} = MSBN \cdot LSBN.$$
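The converter logic can be checked exhaustively over the four PAM-4 symbols. The sketch below models the top, middle and bottom outputs as T = MSB AND LSB, M = MSB and B = MSB OR LSB, together with their complements; the function names are illustrative, and single-ended 0/1 values stand in for the differential CML signals.

```python
def binary_to_thermometer(msb, lsb):
    """Return (T, M, B) for one PAM-4 symbol given its MSB and LSB (0 or 1)."""
    t = msb & lsb
    m = msb
    b = msb | lsb
    return t, m, b


def complementary_outputs(msb, lsb):
    """Return (TN, MN, BN) computed from the inverted inputs MSBN and LSBN."""
    msbn, lsbn = 1 - msb, 1 - lsb
    tn = msbn | lsbn
    mn = msbn
    bn = msbn & lsbn
    return tn, mn, bn


if __name__ == "__main__":
    for msb in (0, 1):
        for lsb in (0, 1):
            t, m, b = binary_to_thermometer(msb, lsb)
            print(f"{msb}{lsb} -> T={t} M={m} B={b}")
```

The number of asserted outputs equals the decimal value of the symbol, and each complementary output is the inverse of its positive counterpart.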



Fig. 59 Rising and falling selector

The FFE generation adopts the pulse equalization method. Conventional pre-emphasis takes a sub-UI delayed copy of the signal and subtracts a fraction of its amplitude so as to boost the high-frequency energy. At the same time, it suppresses the peak-to-peak amplitude. To avoid sacrificing signal amplitude, pulse equalization instead adds sharp pulses to the main-tap signal to generate the peaking. Thus, it preserves the original signal amplitude while compensating for high-frequency loss. In addition, the pulse generator can select the rising and falling transitions separately by using a CML-based R/S selector, as shown in Fig. 59. The pulse generation module is depicted in Fig. 60.
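The difference between the two approaches can be illustrated on an oversampled ideal waveform. This is a behavioral sketch only; the oversampling factor, delay `d` and weight `alpha` are hypothetical values, and the functions are not taken from the circuit implementation.

```python
def oversample(bits, samples_per_ui):
    """Repeat each bit to form an ideal rectangular waveform."""
    return [float(b) for b in bits for _ in range(samples_per_ui)]


def delayed(x, d):
    """Delay x by d samples, holding the first value at the start."""
    return [x[0]] * d + x[:len(x) - d]


def pre_emphasis(x, d, alpha):
    """Conventional scheme: subtract a delayed, scaled copy of the signal."""
    return [a - alpha * b for a, b in zip(x, delayed(x, d))]


def pulse_equalization(x, d, alpha):
    """Pulse scheme: add a pulse of width d only where the signal has just changed."""
    return [a + alpha * (a - b) for a, b in zip(x, delayed(x, d))]


if __name__ == "__main__":
    wave = oversample([0, 0, 1, 1, 1, 0, 0], samples_per_ui=8)
    pre = pre_emphasis(wave, d=4, alpha=0.25)
    pulse = pulse_equalization(wave, d=4, alpha=0.25)
    print("settled high level:", pre[30], "(pre-emphasis) vs", pulse[30], "(pulse)")
    print("peak after rising edge:", max(pre), "(pre-emphasis) vs", max(pulse), "(pulse)")
```

In this sketch the settled high level drops to `1 - alpha` with pre-emphasis, whereas the pulse scheme keeps it at 1 and places the extra energy only at the transitions.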



Fig. 60 Pulse generator diagram with programmable delay element

The unary (thermometer code) data from the binary-to-thermometer code converter are split and retimed into delayed and undelayed copies. The delay is produced by delaying the retiming clock through a programmable delay element. Compared with other transmitter-side pre-emphasis designs that directly delay the wideband baseband signal, our design takes advantage of the quarter-rate architecture, since there is enough margin (4 UI) for the retiming clock to be delayed. A high-speed CML MUX selects the rising and falling transitions separately, as shown in Fig. 61. The selection is pre-determined by the current symbol *A*, and the bandwidth requirement is relaxed owing to the quarter-rate architecture. If the current symbol *A* is 1, then any transition that follows must be a falling transition, and vice versa.
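The edge selection rule can be expressed compactly: the symbol that precedes a transition decides whether the rising or the falling pulse weight is used. The sketch below works at one value per UI and uses hypothetical coefficient names `rise_coeff` and `fall_coeff`; it illustrates the asymmetric selection only, not the CML MUX itself.

```python
def edge_pulses(bits, rise_coeff, fall_coeff):
    """Return one pulse weight per UI, selected by the symbol before each transition."""
    pulses = [0.0]  # no transition is defined before the first symbol
    for prev, cur in zip(bits, bits[1:]):
        if prev == cur:
            pulses.append(0.0)          # no transition, no pulse
        elif prev == 1:
            pulses.append(-fall_coeff)  # symbol was 1, so the edge must be falling
        else:
            pulses.append(rise_coeff)   # symbol was 0, so the edge must be rising
    return pulses


if __name__ == "__main__":
    stream = [0, 1, 1, 0, 0, 1]
    print(edge_pulses(stream, rise_coeff=0.3, fall_coeff=0.1))
```

Because the two weights are independent, overshoot and undershoot can be tuned separately for each thermometer slice.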



Fig. 61 Red solid line stands for the current data symbol, whereas the green line stands for rising pulse and the brown curve stands for falling pulse.

There is some leakage of each transition into the opposite one due to the limited on-off time of the MOSFETs. This affects the independence of the piece-wise equalization tuning and is examined in the simulation results that follow.

With the asymmetric pulse equalization and our proposed piece-wise equalization scheme, the three stacking eyes of PAM-4 signal can be independently compensated as shown in Fig. 62, Fig. 63 and Fig. 64.



Fig. 62 Undershoot with the falling edge equalization turned on.



Fig. 63 Overshoot with the rising edge equalization turned on.



Fig. 64 Overshoot and undershoot on both top eye and bottom eye.

These three simulations demonstrate that the design provides independent equalization of different transitions without affecting the other eye openings. It is worth noting, however, that owing to the small leakage of the CML MUX described earlier, some ISI is still introduced by equalization of the neighboring eye; according to the simulation this is acceptable in our case, as shown in Fig. 61.

The output stages of the symbol and FFE paths are both cascode differential pairs. The equalization strength is set by tuning the tail current source $I_{BIAS}$ of the output stage of each branch.

## 3.4 Simulation Results

The test bench emulates a realistic application scenario in which the driver circuits are biased off chip and ac-coupled to the 25-ohm PCB transmission line and then to the flex PCB of the TOSA package (as shown in Fig. 65).



Fig. 65 Transmitter Optical Sub-Assembly dimensions [23]

The Transmitter Optical Sub-Assembly (TOSA) consists of a TO-CAN laser package, an optical alignment structure, a flexible PCB, etc. The diode is directly connected to the cathode and anode pins of the TO-CAN. So, both the DC bias and the AC modulation current need to be delivered through the same pins. A common configuration for such a system is shown in Fig. 66:



Fig. 66 Loading network of the laser driver using TOSA package

The main components of the bias network are a large inductor (18 µH) in series with a ferrite bead. The large inductor presents a high impedance at high frequency, and the resistance of the ferrite bead increases with frequency, which reduces the Q of the bias network and prevents ringing.
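A quick calculation shows why the 18 µH inductor isolates the bias path from the modulation signal. The sketch treats the inductor as ideal and ignores the ferrite bead and all parasitics, which is a simplifying assumption; the 25-ohm value is the PCB line impedance quoted for the test bench.

```python
import math

L_BIAS = 18e-6   # bias inductor from the test bench (H)
Z_LINE = 25.0    # PCB transmission line impedance (ohm)


def inductor_impedance(freq_hz, inductance_h):
    """Magnitude of the impedance of an ideal inductor, |Z| = 2*pi*f*L."""
    return 2.0 * math.pi * freq_hz * inductance_h


def corner_frequency(resistance_ohm, inductance_h):
    """Frequency at which the inductor impedance equals the given resistance."""
    return resistance_ohm / (2.0 * math.pi * inductance_h)


if __name__ == "__main__":
    for f in (1e6, 100e6, 1e9, 10e9):
        print(f"{f:10.3e} Hz  |Z_L| = {inductor_impedance(f, L_BIAS):12.1f} ohm")
    print("corner vs 25 ohm:", corner_frequency(Z_LINE, L_BIAS), "Hz")
```

Under these assumptions the inductor impedance exceeds the line impedance above a few hundred kilohertz, far below the spectral content of the high-speed data.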

Now, let us consider a behavioral dynamic-bandwidth model simulation with the proposed piece-wise equalization.



Fig. 67 Behavioral dynamic bandwidth versus current relationship

The dynamic bandwidth versus bias current relation is depicted in Fig. 67. The model has its highest bandwidth at around 60 mA and its lowest bandwidth at 30 mA. The MID eye is therefore expected to have the largest opening and the BOT eye the smallest.

A 56-Gbps PAM-4 signal after passing through this behavioral model is shown in Fig. 68:



Fig. 68 56-Gbps PAM-4 signal after the dynamic bandwidth behavioral model

The TOP/MID/BOT eyes have different eye openings: 327 mV for the TOP eye, 349 mV for the MID eye and 278 mV for the BOT eye.

If a conventional binary-coded PAM-4 equalization circuit is used to compensate the bandwidth-limited signal, the LSB equalization generation cannot satisfy the different bandwidth deficiencies of the LSB at the BOT and TOP eyes at the same time, as shown in Fig. 69. The TOP eye and BOT eye rely on the same LSB equalization circuit. If we increase the falling-edge equalization strength for the BOT eye, the TOP falling edge undershoots and degrades the MID eye.



Fig. 69 PAM-4 signal compensated by the conventional binary code equalization

Implementing the proposed PAM-4 piece-wise equalization together with asymmetric pulse equalization solves this problem. The simulation results are shown in Fig. 70. With the proposed equalization techniques, the BOT eye can be fully recovered by the transmitter FFE with negligible undershoot.



Fig. 70 PAM-4 signal compensated by the proposed piece-wise equalization

## 3.5 Reference

- [12] B. Wang et al., "Comprehensive vertical-cavity surface-emitting laser model for optical interconnect transceiver circuit design," Optical Engineering, vol. 55, no. 12, 126103, 2016.
- [13] L. A. Coldren, S. W. Corzine and M. L. Mashanovitch, Diode Lasers and Photonic Integrated Circuits. John Wiley & Sons, 2012.
- [14] C. Wang, B. Xu, X. Li, L. Wang and C. P. Yue, "Compact Modeling of Laser Diode for Visible Laser Light Communication (VLLC) Systems," 2018 Conference on Lasers and Electro-Optics Pacific Rim (CLEO-PR), Hong Kong, 2018, pp. 1-2.
- [15] A. Tyagi et al., "A 50 Gb/s PAM-4 VCSEL Transmitter With 2.5-Tap Nonlinear Equalization in 65-nm CMOS," in IEEE Photonics Technology Letters, vol. 30, no. 13, pp. 1246-1249, 1 Jul. 2018.
- [16] J. Chen et al., "An Energy Efficient 56 Gbps PAM-4 VCSEL Transmitter Enabled by a 100 Gbps Driver in 0.25 μm InP DHBT Technology," in Journal of Lightwave Technology, vol. 34, no. 21, pp. 4954-4964, 1 Nov. 2016.
- [17] W. Soenen et al., "40 Gb/s PAM-4 Transmitter IC for Long-Wavelength VCSEL Links," in IEEE Photonics Technology Letters, vol. 27, no. 4, pp. 344-347, 15 Feb. 2015.
- [18] J. Hwang et al., "A 64Gb/s 2.29pJ/b PAM-4 VCSEL Transmitter With 3-Tap Asymmetric FFE in 65nm CMOS," 2019 Symposium on VLSI Circuits, Kyoto, Japan, 2019, pp. C268-C269.
- [19] S. Moazeni et al., "A 40-Gb/s PAM-4 Transmitter Based on a Ring-Resonator Optical DAC in 45-nm SOI CMOS," in IEEE Journal of Solid-State Circuits, vol. 52, no. 12, pp. 3503-3516, Dec. 2017.
- [20] M. Raj, M. Monge and A. Emami, "A Modelling and Nonlinear Equalization Technique for a 20 Gb/s 0.77 pJ/b VCSEL Transmitter in 32 nm SOI CMOS," in IEEE Journal of Solid-State Circuits, vol. 51, no. 8, pp. 1734-1743, Aug. 2016.
- [21] D. FitzPatrick and I. Miller, Analog Behavioral Modeling with the Verilog-A Language. Springer Science & Business Media, 1998.
- [22] C. Wang, G. Zhu, Z. Zhang and C. P. Yue, "A 52-Gb/s Sub-1pJ/bit PAM4 Receiver in 40-nm CMOS for Low-Power Interconnects," 2019 Symposium on VLSI Circuits, Kyoto, Japan, 2019, pp. C274-C275.
- [23] http://www.luxnetcorp.com.tw/datasheet\_files/L-AT-ID31-06\_1.0.pdf

### CHAPTER 4. CONCLUSION AND FUTURE WORK

### 4.1 Conclusion

This thesis focuses on advanced equalization techniques for energy-efficient 4-level pulse amplitude modulation on both the receiver side and the transmitter side.

The proposed single-stage MP-CTLE eliminates the extra stage by taking advantage of the degenerated transconductance and applying feedback to create a second zero that compensates for low-frequency skin-effect loss. It consumes 65% less power than the conventional cascaded-stage MP-CTLE while retaining the ability to extend error-free operation of the receiver from PRBS-7 to PRBS-9 patterns.

The proposed transmitter-side piece-wise equalization is capable of compensating each PAM-4 eye individually. The FFE for all 12 data transitions is synthesized by linear combination of the rising-edge and falling-edge pulse equalization.

### 4.2 Future Work

A low-power single-stage MP-CTLE and a piece-wise equalization for PAM-4 transmitter are proposed to promote PAM-4 communication links.

For the single-stage MP-CTLE, the low-frequency EQ reduces ISI. Unlike the high-frequency EQ, for which an adaptive algorithm is easy to build using a peak detector, the low-frequency EQ is less straightforward to evaluate. More research effort is therefore expected on the adaptation of the low-frequency EQ with the proposed single-stage MP-CTLE. The cascading of single-stage multiple-peaking CTLEs is another challenge with regard to stability.

A potential use of the single-stage multiple-peaking CTLE is as a replacement for other equalization techniques such as FFE and DFE. The limited peaking capability and the weakly coupled zero and pole relation between the main CTLE and the second peaking may impede the generalization of this technique.

The piece-wise equalization decomposes the PAM-4 signal into three stacked NRZ eyes. Thus, various NRZ pre-distortion techniques, such as dispersion compensation, could be implemented with this method. Further experimental work is needed to verify this idea.

## **CHAPTER 5.** Appendix

Compact Verilog-A VCSEL laser rate-equation model code example.

```
// Verilog-A VCSEL model: rate equations
`include "constants.vams"
`include "disciplines.vams"

module Rate_equation_try_0809(p, n, OP, Dp, Dn, Iinjp, Iinjn, DC_opt);
    inout p, n, OP, Dp, Dn, Iinjp, Iinjn, DC_opt;
    electrical p, n, OP, Dp, Dn, Iinjp, Iinjn, DC_opt;
    electrical Opt_Power;   // internal node carrying the optical power

    real Ntr, Tau, Nc, Np, Taup, Tauc, q, Va, Ith, Temp, D, h, h_eV, v, vg;
    real Beta, dox, Rth, a0, B, Etai, Eps, K, Td, P_opt, delta_1;

    analog begin
        @(initial_step) begin
            Nc = 2.1e6;
            // Np = 0;
            Temp = $temperature;
            Td = $temperature;
        end

        // Constants, assigned before first use to avoid division by zero
        Rth  = 2.27*1000;
        q    = 1.6021766208e-19;
        h    = 6.62607004e-34;
        h_eV = 4.135667516e-15;         // coefficient between joules and electronvolts
        v    = 3e14;                    // light frequency: 3x10^8 / 990 nm
        vg   = 84507042.25;             // group velocity
        Tau  = 0.012;                   // cavity volume / active region volume
        Va   = 6.927211801e-19;         // active region volume
        Ntr  = 7.755964884e17;          // carrier density at transparency
        Beta = 4e-3;                    // spontaneous emission coefficient
        Taup = 4.102564103e-12;         // photon lifetime
        Tauc = 2.66666666666666e-14;    // round-trip cavity time
        dox  = 7e-6;

        // Sense branches for diode voltage and injection current
        I(Dp,Dn) <+ V(Dp,Dn) / 1M;
        V(Iinjp,Iinjn) <+ 1*I(Iinjp,Iinjn);

        // Junction temperature used by the temperature-dependent parameters
        Td = (V(Dp,Dn)*I(Iinjp,Iinjn) - V(Opt_Power)) * Rth + (Temp - 273);
        a0   = (1.3e-15 + (-8.3e-18)*Td + (8.7e-21)*pow(Td,2))/10000;
        B    = ((1.3e-10) - (6.2e-13)*Td - (1.3e-15)*pow(Td,2))/1000000;
        Etai = 0.9 - (1.8e-3)*Td - (8.54e-6)*pow(Td,2);
        Ith  = (0.3 + (1.79e-5)*Td + (4.2e-5)*pow(Td,2))/1000;

        // Static (DC) optical power and updated junction temperature
        P_opt = Etai * (3.8/6.5) * (I(p,n) - Ith) * (v*h/q);
        Td = (V(Dp,Dn)*I(Iinjp,Iinjn) - P_opt) * Rth + (Temp - 273) - 23;
        V(DC_opt) <+ P_opt;

        K = (0.58 - (9.95e-5)*Td + (1.37e-5)*pow(Td,2))/(1e9);
        D = (6.02 - (2.58e-2)*Td - (1.6e-5)*pow(Td,2)) * 31622776.6; // D factor in H(s), determines resonance frequency
        Eps = vg*Taup*a0*((K/(4*pow(3.1415926,2)*Taup)) - 1);        // gain compression factor

        // Rate equations: carrier density Nc and photon density Np
        V(Opt_Power) <+ ((Va*Etai*h*v)/(2*Tau*Taup)) * Np;
        Nc = idt(I(p,n)/(q*Va) - vg*a0*((Nc-Ntr)/(1+Eps*Np))*Np - B*pow(Nc,2), 2.1e6);
        Np = idt(Tau*vg*a0*((Nc-Ntr)/(1+Eps*Np))*Np - (Np/Taup) + Tau*Beta*pow(Nc,2)*B, 0.0);

        delta_1 = 1/0.909;              // output DBR transmissivity
        V(OP) <+ V(Opt_Power) * delta_1;
        // $bound_step(2p);
    end
endmodule
```