# **Multi-phase Clock Generator for High-speed Wireline Transceiver Systems**

by

Shaokang ZHAO

A Thesis Submitted to

The Hong Kong University of Science and Technology
in Partial Fulfilment of the Requirements for
the Degree of Master of Philosophy
in the Department of Electronic and Computer Engineering

June 2025, Hong Kong

## Multi-phase Clock Generator for High-speed Wireline Transceiver Systems

by Shaokang ZHAO

Department of Electronic and Computer Engineering

The Hong Kong University of Science and Technology

#### Abstract

The rapid advancements in Artificial Intelligence (AI) have driven a significant surge in demand for high-speed wireline communication systems. This increasing need for higher data rates calls for power- and area-efficient solutions in data transmission. Clocking circuits, which provide timing references to support various functionalities and directly determine the data rate, are a critical component of wireline systems. However, they also account for a substantial portion of the power overhead, consuming approximately one-third of the total system power. To improve the power efficiency of data links, sub-data-rate clocking architectures have gained popularity among designers due to their ability to reduce power consumption and alleviate bandwidth requirements. These architectures necessitate multi-phase clock generation to produce accurate, low-jitter quadrature clocks, meeting stringent requirements for both random jitter (RJ) and deterministic jitter (DJ) in modern wireline systems to ensure high-quality data transmission.

In this thesis, we present the design and implementation of a novel multi-phase clock generator, specifically a quadrature clock generator (QCG), operating within the frequency range of 5 to 10 GHz. The proposed architecture integrates a duty cycle correction (DCC) circuit, a digitally controlled delay line (DCDL), and a two-stage open-loop quadrature error corrector (QEC) to effectively minimize phase errors. Additionally, a finite state machine (FSM) is implemented to perform initial calibration, ensuring optimal QCG performance without introducing extra jitter. The prototype chip occupies a compact area of 0.012 mm<sup>2</sup>. Measurement results demonstrate a phase error of $\leq 0.8^{\circ}$ and an integrated RMS jitter of 61.1 fs, with a power consumption of 10.2 mW at 10 GHz operation.

In conclusion, the proposed design offers an open-loop alternative for QCG, delivering competitive performance in terms of noise contribution, power efficiency, and phase accuracy. This design meets the stringent requirements of modern wireline transceivers, making it a promising solution for high-speed communication systems.

## **Authorization**

I hereby declare that I am the sole author of the thesis.

I authorize the Hong Kong University of Science and Technology to lend this thesis to other institutions or individuals for the purpose of scholarly research.

I further authorize the Hong Kong University of Science and Technology to reproduce the thesis by photocopying or by other means, in total or in part, at the request of other institutions or individuals for the purpose of scholarly research.

Signature redacted

Shaokang ZHAO

June 2025

## Multi-phase Clock Generator for High-speed Wireline Transceiver Systems

by

#### Shaokang ZHAO

This is to certify that I have examined the above MPhil thesis and have found that it is complete and satisfactory in all respects, and that any and all revisions required by the thesis examination committee have been made.

Signature redacted

Prof. Chik Patrick YUE, Thesis Supervisor

Signature redacted

Prof. Andrew Wing On POON, Head of ECE Department

#### Thesis Examination Committee

1. Prof. Wing Hung KI (Chairperson), Department of Electronic and Computer Engineering
2. Prof. Chik Patrick YUE (Supervisor), Department of Electronic and Computer Engineering
3. Prof. Fengbin TU, Department of Electronic and Computer Engineering

Department of Electronic and Computer Engineering

June 2025

#### **ACKNOWLEDGEMENTS**

First and foremost, I would like to express my deepest gratitude to my supervisor, Prof. Chik Patrick Yue, for his invaluable guidance, unwavering support, and insightful feedback throughout my master's journey. His expertise, patience, and encouragement have been instrumental in shaping this thesis and my growth as a researcher.

I am also deeply thankful to the members of my thesis committee, Prof. Wing Hung Ki and Prof. Fengbin Tu, for their time, thoughtful suggestions, and constructive critiques, which have significantly improved the quality of this work.

My sincere appreciation goes to the Optical Wireless Lab research group for the collaborative and inspiring environment we shared. Special thanks to Dr. Li Wang, Dr. Fuzhan Chen, and Dr. Chongyun Zhang for their guidance, stimulating discussions, and support, which have greatly enriched my research experience.

Lastly, I would like to extend my heartfelt thanks to my parents and friends for their unconditional love, encouragement, and understanding throughout this journey. Their unwavering belief in me has been a constant source of motivation and strength.

# TABLE OF CONTENTS

- Title Page
- Abstract
- Authorization
- Signature Page
- Acknowledgements
- Table of Contents
- List of Figures
- List of Tables
- Chapter 1 Introduction
  - 1.1 High-speed wireline transceiver systems
    - 1.1.1 Chip-to-chip communication
    - 1.1.2 On-board interconnects
    - 1.1.3 Modern data centers enabling AI applications
  - 1.2 Multi-phase clock sampling
  - 1.3 Thesis organization
- Chapter 2 Clocking architectures for wireline transceivers
  - 2.1 Clock signals in wireline systems
    - 2.1.1 Random jitter
    - 2.1.2 Deterministic jitter
    - 2.1.3 Effect of RJ and DJ on wireline systems
  - 2.2 Clocking implementation for multi-lane applications
    - 2.2.1 Global clock generation
    - 2.2.2 Local clock generation
  - 2.3 Review on QCG
  - 2.4 Conclusion of this chapter
- Chapter 3 A quadrature clock generator with an open-loop quadrature error corrector
  - 3.1 Proposed QCG with PI-based QEC
    - 3.1.1 Architecture
    - 3.1.2 Open-loop QEC based on phase interpolation
    - 3.1.3 Self-correction on duty cycle distortion
  - 3.2 Circuit implementation
    - 3.2.1 Digitally controlled delay line
    - 3.2.2 Duty cycle correction
    - 3.2.3 Phase interpolator for QEC
    - 3.2.4 Monte Carlo simulation results
    - 3.2.5 Digital calibration
  - 3.3 Design considerations
    - 3.3.1 Non-ideal factors
    - 3.3.2 Jitter contribution
  - 3.4 Measurement results
  - 3.5 Conclusion of this chapter
- Chapter 4 Conclusion and future work
  - 4.1 Conclusion
  - 4.2 Future works
    - 4.2.1 Potential improvements
    - 4.2.2 Complete clocking implementation
    - 4.2.3 Integration with a multi-lane transceiver system
- References
- Appendix A List of Publications

## LIST OF FIGURES

| Figure | Caption |
|-------------|---------|
| Figure 1.1 | Wireline transceiver system |
| Figure 1.2 | High-bandwidth memory with advanced package technology [2]. |
| Figure 1.3 | Increasing data rates of PCI Express standard [3]. |
| Figure 1.4 | Evolving trend of data centers for AI applications [1]. |
| Figure 1.5 | Comparison between half-rate and quarter-rate clocking architectures. |
| Figure 2.1 | Clock signals in: (a) wireline system; (b) wireless/RF systems |
| Figure 2.2 | Measurement of random jitter. (a) Histogram method. (b) Phase noise method. |
| Figure 2.3 | Measurement of deterministic jitter |
| Figure 2.4 | A 32-Gbps quarter-rate transmitter with a 4-to-1 multiplexer using ideal sampling clocks |
| Figure 2.5 | A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 200-fs RMS jitter. |
| Figure 2.6 | A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 45% duty cycle. |
| Figure 2.7 | A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 45% duty cycle and 85° quadrature phase. |
| Figure 2.8 | A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 200-fs RMS jitter, 45% duty cycle and 85° quadrature phase. |
| Figure 2.9 | Clocking implementation in a multi-lane transceiver system |
| Figure 2.10 | Quadrature clock generator. (a) Frequency dividers. (b) Poly-phase filters. (c) Ring-oscillator-based PLL. (d) Open-loop injection-locked ring oscillators. (e) Wide-band injection-locked ring oscillators. (f) Delay-locked loop. (g) Delay line with digital control logic. |
| Figure 3.1 | System architecture of the proposed QCG. |
| Figure 3.2 | Proposed open-loop QCG with PI-based QEC. (a) Block diagram. (b) Vector diagram |
| Figure 3.3 | QEC self correction on duty cycle distortion |
| Figure 3.4 | Digitally controlled delay line |
| Figure 3.5 | Simulation results of DCDL |
| Figure 3.6 | Single-ended-to-differential conversion and duty cycle correction. |
| Figure 3.7 | Schematic of duty cycle correction. |
| Figure 3.8 | Simulation results of DCC |
| Figure 3.9 | Commonly used architectures of phase interpolators. (a) Voltage-mode phase interpolator. (b) Current-mode phase interpolator. (c) Integrating-mode phase interpolator. |
| Figure 3.10 | Integrating-mode phase interpolator for QEC. (a) Operational principle. (b) Schematic of integrating-mode PI. |
| Figure 3.11 | CML to CMOS Converter. (a) Schematic. (b) Simulation results |
| Figure 3.12 | Monte Carlo simulation of QEC. (a) Quadrature phase distribution. (b) Duty cycle distribution. |
| Figure 3.13 | Digital calibration scheme. |
| Figure 3.14 | Duty cycle detection. (a) RC low-pass filter. (b) Operational principle. |
| Figure 3.15 | Schematic of passive mixer for QED. |
| Figure 3.16 | Simulated error detection characteristics. (a) DCD characteristics. (b) DCD zero crossing voltage distribution. (c) QED characteristics. (d) QED zero crossing voltage distribution. |
| Figure 3.17 | Schematic of auto-zeroing comparator with offset cancellation. |
| Figure 3.18 | (a) Calibration without disabling strategy. (b) Calibration with disabling strategy. |
| Figure 3.19 | Flowchart of the FSM |
| Figure 3.20 | AM-PM conversion of C2C converter. |
| Figure 3.21 | Non-constant integrating currents |
| Figure 3.22 | P/N mismatch |
| Figure 3.23 | Duty cycle distortion |
| Figure 3.24 | Jitter contribution of each stage. (a) Simulated output phase noise of each stage. (b) Calculated jitter contribution. |
| Figure 3.25 | Fabricated QCG prototype. (a) Chip micrograph. (b) Power breakdown. |
| Figure 3.26 | Testing setup. (a) Block diagram. (b) Photo of the testing environment. |
| Figure 3.27 | Measurement of quadrature error. (a) Measurement at 10-GHz operation. (b) Measured quadrature errors across different frequencies. |
| Figure 3.28 | Measured phase noise of reference clock and output clocks. |
| Figure 4.1 | Analog feedback loop detecting DC voltage drifting. |

## LIST OF TABLES

Table 3.1: Comparison with recently published QCG

#### CHAPTER 1

#### INTRODUCTION

The performance of wireline communication systems has become increasingly vital due to the rapid advancement of artificial intelligence (AI), which has generated exponentially growing volumes of data and significantly heightened processing and transmission demands [1].

## 1.1 High-speed wireline transceiver systems



Figure 1.1: Wireline transceiver system

Wireline systems are essential components in modern communication networks, enabling the transmission and reception of data over wired connections. These systems typically consist of three main parts: the transmitter, the receiver, and the communication channel, as shown in Figure 1.1. The transmitter converts digital data into electrical signals using a given modulation scheme and sends the modulated signals into the channel. The receiver captures these signals from the channel and performs amplification, clock and data recovery, and demodulation to retrieve accurate digital data while mitigating noise and distortion. The communication channel, whether electrical or optical, carries the modulated signals from the transmitter to the receiver and influences data transmission quality through its bandwidth, attenuation, and noise characteristics.

Wireline systems are widely used in many applications, such as die-to-die communication, on-board interconnects (PCIe, NVLink, etc.), and large-scale data centers, to support the tremendous data throughput demanded in this era of AI.

#### 1.1.1 Chip-to-chip communication



Figure 1.2: High-bandwidth memory with advanced package technology [2].

Driven by surging AI computing demands, data traffic between processors (CPUs/GPUs) and memory has escalated dramatically. To overcome the resulting latency and bandwidth bottlenecks, High Bandwidth Memory (HBM) utilizes memory stacking and Through-Silicon Vias (TSVs) to minimize the physical distance between memory and logic units. This vertical integration is enabled by 2.5D/3D packaging technologies, where a silicon interposer, featuring ultra-dense wiring and TSVs, provides a high-speed communication bridge between stacked dies. While this addresses proximity challenges, scaling monolithic SoCs faces critical yield concerns: as die area grows, the probability that a die contains a defect rises steeply, so yield drops. Chiplets directly mitigate this by disaggregating systems into smaller, specialized dies (<300 mm²), where smaller silicon areas inherently achieve higher manufacturing yields. Defective chiplets can be discarded or binned independently, significantly improving cost efficiency versus scrapping a single large die. To enable robust communication between these heterogeneous chiplets, standards like UCIe (Universal Chiplet Interconnect Express) define high-bandwidth, low-latency die-to-die interfaces. UCIe ensures interoperability across vendors and process nodes, making modular, yield-optimized designs commercially viable for next-generation AI hardware.
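The yield argument above can be illustrated with a first-order Poisson yield model, $Y = e^{-A D_0}$, which is a common textbook approximation rather than a model taken from this thesis. In the sketch below, the defect density `D0` and the two die areas are hypothetical values chosen only to show the trend.

```python
import math


def poisson_yield(die_area_mm2: float, defects_per_cm2: float) -> float:
    """First-order Poisson yield model: Y = exp(-A * D0)."""
    area_cm2 = die_area_mm2 / 100.0  # 1 cm^2 = 100 mm^2
    return math.exp(-area_cm2 * defects_per_cm2)


# Hypothetical defect density in defects/cm^2 (an assumption, not from the thesis).
D0 = 0.1

# Hypothetical areas: a large monolithic die versus a single chiplet.
for area_mm2 in (800.0, 200.0):
    print(f"{area_mm2:.0f} mm^2 die: yield ~ {poisson_yield(area_mm2, D0):.2f}")
```

Under these assumed numbers the 800 mm² die yields roughly 45%, while a 200 mm² chiplet yields roughly 82%, which is the qualitative effect that motivates disaggregation.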

#### 1.1.2 On-board interconnects

[Chart: PCIe data rate (Gbps, axis from 0 to 128) versus generation: Gen1 ('03), Gen2 ('06), Gen3 ('10), Gen4 ('17), Gen5 ('19), Gen6 ('22), Gen7 ('24); CXL 3.0 annotated.]

Figure 1.3: Increasing data rates of PCI Express standard [3].

In modern computing systems spanning personal computers to artificial intelligence (AI) servers, on-board interconnects enable critical module-to-module communication between processors, memory, accelerators, and peripherals. These interconnects fundamentally rely on printed circuit board (PCB) traces, the copper pathways etched onto the system board, which serve as the physical channel for transmitting electrical signals. To overcome the inherent bandwidth and signal integrity limitations of PCB traces, specialized high-speed protocols deliver the massive data throughput required for AI applications: Graphics Double Data Rate (GDDR) interfaces provide ultra-high-bandwidth connections between GPUs and dedicated video memory; Peripheral Component Interconnect Express (PCIe) establishes versatile, high-speed links between CPUs and devices like solid-state drives (SSDs) or accelerators; and NVIDIA's NVLink technology facilitates direct, low-latency GPU-to-GPU communication in multi-GPU AI server configurations. Collectively, these protocols transform passive PCB infrastructure into intelligent data highways, enabling the rapid transfer capabilities essential for complex AI training and inference workloads.

#### 1.1.3 Modern data centers enabling AI applications



Figure 1.4: Evolving trend of data centers for AI applications [1].

Modern data centers supporting AI workloads rely heavily on advanced wireline systems to meet the unprecedented demand for high-speed, low-latency, and reliable connectivity. AI-driven applications, such as machine learning training, real-time inference, and large-scale data processing, require massive data transfer between servers, storage, and accelerators like GPUs and TPUs. This data is exchanged at per-lane rates above 200 Gbps across extended communication distances, from meters within a rack to kilometers between buildings. Optical communication links enable this high-speed transmission with inherently lower signal loss than electrical alternatives, while advanced modulation such as 4-level pulse amplitude modulation (PAM-4) carries two bits per symbol and thus doubles the data rate of NRZ signaling within the same bandwidth. AI's exponential growth has driven demand for these systems, pushing the boundaries of network capacity and efficiency to support the ever-increasing scale of AI models and datasets. As a result, modern data centers are rapidly evolving their wireline infrastructure to ensure seamless, high-performance connectivity, which is essential for sustaining the next generation of AI innovations.
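The bandwidth benefit of multi-level modulation can be made concrete with a small calculation. The sketch below is illustrative only: the function names are hypothetical, coding and FEC overhead are ignored, and the 200 Gbps figure is simply the per-lane rate mentioned in the text.

```python
import math


def symbol_rate_gbaud(bit_rate_gbps: float, levels: int) -> float:
    """Symbol rate needed to carry bit_rate_gbps with `levels` amplitude levels."""
    bits_per_symbol = math.log2(levels)  # NRZ: 1 bit/symbol, PAM-4: 2 bits/symbol
    return bit_rate_gbps / bits_per_symbol


def nyquist_ghz(bit_rate_gbps: float, levels: int) -> float:
    """Nyquist frequency (half the symbol rate), ignoring coding/FEC overhead."""
    return symbol_rate_gbaud(bit_rate_gbps, levels) / 2.0


for name, levels in (("NRZ", 2), ("PAM-4", 4)):
    print(name, symbol_rate_gbaud(200.0, levels), "GBd,",
          nyquist_ghz(200.0, levels), "GHz Nyquist")
```

For the same 200 Gbps lane, PAM-4 halves the symbol rate relative to NRZ, so the channel and the front-end circuits need roughly half the bandwidth, at the cost of smaller level spacing.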

## 1.2 Multi-phase clock sampling



Figure 1.5: Comparison between half-rate and quarter-rate clocking architectures.

In the applications discussed above, circuit designers continually strive to extract maximum performance from available technologies to address the ever-increasing demand for higher data throughput. As a result, the development of power- and area-efficient data links has emerged as a critical priority in modern communication systems.

Traditional half-rate sampling methods benefit from straightforward implementation, providing sufficient sampling edges with one pair of differential clocks, and thus have been widely employed in systems with low-to-medium data rates. However, as data rates continue to grow and exceed 100 Gbps, this approach faces limitations in terms of high power consumption and restricted electrical bandwidth.

Multi-phase sampling architectures in wireline systems enhance efficiency by distributing the sampling process across multiple phases, reducing power consumption and easing bandwidth demands. By interleaving sampling operations across different phases, these architectures achieve lower per-phase sampling rates while maintaining overall system performance. This approach significantly reduces the power burden by minimizing the number of full-rate nodes in the system and lowering the power required for high-frequency clock generation and distribution. As a result, multi-phase sampling not only optimizes power efficiency but also offers a scalable solution for high-speed wireline communication systems, making it a key innovation for modern high-performance networks.

Multi-phase sampling architectures leverage sub-data-rate multi-phase clocks to deliver precise timing control and synchronization. Increasing the number of phases allows for a proportional reduction in clock frequency. However, this method is constrained by practical limitations, as an excessive number of phases introduces challenges in clock distribution, including phase inaccuracies, duty cycle distortion, and increased routing complexity. Consequently, quarter-rate architectures are commonly adopted as a balance between power efficiency and the difficulty of clock distribution.

In general, half-rate architectures are better suited for low-to-medium data-rate systems, such as HBM interfaces or lower-speed PCIe configurations. Conversely, sub-rate multi-phase sampling architectures (e.g., quarter-rate) are optimized for high-data-rate systems like 100G/200G optical interconnects.

### 1.3 Thesis organization

This thesis is structured to provide a comprehensive overview of the design and implementation of a multi-phase clock generator for high-speed wireline systems. The organization of the thesis is as follows:

Chapter 2 discusses the fundamental characteristics and requirements of clock signals in wireline systems. It explores the impact of random jitter and deterministic jitter on system performance and reviews various clocking architectures, including global and local clock generation techniques. The chapter concludes with an assessment of existing quadrature clock generator designs, emphasizing their strengths and weaknesses.

Chapter 3 presents the design and implementation details of the proposed quadrature clock generator (QCG). It describes the architecture, including the digitally controlled delay line (DCDL) and the phase interpolator (PI)-based quadrature error corrector (QEC). The chapter further discusses the calibration techniques employed to enhance performance, along with simulation and measurement results that validate the effectiveness of the proposed solution.

Chapter 4 summarizes the key contributions of the thesis, reflecting on the successful design and performance of the QCG in meeting the stringent requirements of high-speed wireline applications. It also outlines potential avenues for future research, including the integration of the QCG into complete clocking subsystems and its application in multi-lane transceiver systems.

#### **CHAPTER 2**

# CLOCKING ARCHITECTURES FOR WIRELINE TRANSCEIVERS

As previously discussed, clock signals play a crucial role in providing synchronization and enabling data transmission in wireline systems. In this chapter, we focus on the characteristics and requirements of clock signals in such systems, exploring their impact on overall performance. Additionally, typical clocking implementations, including global and local clock generation techniques, will be introduced to illustrate how precise timing control is achieved in high-speed wireline applications.

## 2.1 Clock signals in wireline systems

Clock signals are fundamental to the operation of communication systems, serving as the timing reference that synchronizes data transmission and signal processing. Clock signals are typically periodic and can take the form of either sinusoidal or square-wave waveforms.

In wireline and wireless systems, clock signals play a critical role in synchronization and data transmission, but their implementation differs significantly due to the nature of the communication medium. In wireline systems, square-wave clocks are predominantly used because they provide sharp, well-defined edges, which are ideal for triggering sequential circuits such as the flip-flops and multiplexers shown in Figure (2.1a), ensuring precise timing and minimal jitter. On the other hand, wireless systems often rely on sinusoidal clocks, which are better suited for modulation and transmission over the air due to their continuous and smooth waveform, reducing harmonic interference and improving spectral efficiency. While square-wave clocks are easier to generate and process in digital circuits, their high harmonic content makes them less suitable for wireless applications, where sinusoidal clocks are preferred for their ability to maintain signal integrity in RF environments. For example, Figure (2.1b) presents an up-conversion RF system with a mixer driven by a local oscillator (LO) clock, which is normally sinusoidal to reduce harmonic components in the output signal.



Figure 2.1: Clock signals in: (a) wireline system; (b) wireless/RF systems

In terms of clock generation requirements, wireline and wireless systems exhibit distinct differences. In wireline systems, which typically employ direct baseband modulation and utilize the entire bandwidth of the physical channel, the clock generation circuit must support a wide range of operating frequencies to accommodate varying data rates. In contrast, wireless systems, which often rely on modulation schemes like orthogonal frequency-division multiplexing (OFDM), require clock generation circuits such as frequency synthesizers to provide clock signals with fine frequency resolution within a specific bandwidth, but these circuits do not necessarily cover as wide a frequency range as those in wireline systems.

Furthermore, the requirements for phase noise and jitter performance vary between wireline and wireless systems. Typically, a wireless transceiver shares the spectrum with other transceivers, occupying only a portion of the available bandwidth. To mitigate adjacent channel leakage, phase noise within the relevant bandwidth, usually up to a 100 MHz offset frequency, is of primary concern. In contrast, a wireline transceiver generally utilizes the entire bandwidth of a specific channel, such as a cable or optical fiber, with a focus on real-time data transitions. In this context, real-time clock uncertainty, including deterministic jitter (DJ) and random jitter (RJ) in the sampling clocks, is of utmost importance. The phase noise of the sampling clock is often employed to estimate jitter in the time domain, necessitating integration over a broad frequency range, conservatively extending to half of the carrier frequency [4].

In the following sections, we focus on square-wave clocks and discuss their key performance specifications, including random jitter and deterministic jitter. An example of a wireline transmitter is then used to demonstrate their effect on an actual wireline system.

#### 2.1.1 Random jitter

Random jitter in clock signals refers to the unpredictable variations in the timing of clock edges, which can degrade the performance of communication systems by introducing timing uncertainty [5]. Random jitter is primarily caused by inherent noise processes, such as thermal noise, shot noise, and flicker noise within various electronic devices like transistors and resistors. These noise sources introduce small, random fluctuations in the clock period, making it challenging to predict the exact timing of clock transitions. Random jitter is typically quantified by its root mean square (RMS) value, representing the standard deviation of the timing variations.

To measure random jitter, two primary methods can be employed: the histogram method and the phase noise method, as shown in Figure (2.2). The first approach involves capturing the clock signal using a high-bandwidth oscilloscope and analyzing the timing variations of the clock edges. By constructing a histogram of the edge deviations, the random jitter can be extracted as the standard deviation of the Gaussian distribution fitted to the histogram. This method provides a direct time-domain measurement of jitter, offering intuitive insights into timing variations. The second method leverages phase noise measurements, which characterize the frequency-domain fluctuations of the clock signal. Using a general spectrum analyzer or a dedicated phase noise analyzer, the phase noise plot is measured and then integrated over a specified frequency range. Through established mathematical relationships, the phase noise is converted into random jitter. Both methods provide complementary perspectives, enabling comprehensive characterization of random jitter in clock signals.



Figure 2.2: Measurement of random jitter. (a) Histogram method. (b) Phase noise method.
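The phase noise method described above can be sketched numerically. The commonly used relationship is $\sigma_t = \sqrt{2\int L(f)\,df}\,/\,(2\pi f_c)$, where $L(f)$ is the single-sideband phase noise in linear units and $f_c$ is the carrier frequency. The snippet below is a minimal sketch under stated assumptions: the function names, the piecewise log-log interpolation, and the example phase noise profile are all hypothetical illustrations and are not measured data from this work.

```python
import math

def integrate_phase_noise(points):
    """Integrate L(f) over offset frequency, assuming straight lines on a log-log plot.

    points: list of (offset_hz, dbc_per_hz), sorted by increasing offset.
    Returns the single-sideband integral of L(f) in linear units.
    """
    total = 0.0
    for (f1, l1_db), (f2, l2_db) in zip(points, points[1:]):
        l1 = 10.0 ** (l1_db / 10.0)
        l2 = 10.0 ** (l2_db / 10.0)
        # Slope of the segment in decades per decade (e.g. -1 for a 1/f region).
        slope = math.log(l2 / l1) / math.log(f2 / f1)
        if abs(slope + 1.0) < 1e-12:
            total += l1 * f1 * math.log(f2 / f1)
        else:
            total += l1 * f1 / (slope + 1.0) * ((f2 / f1) ** (slope + 1.0) - 1.0)
    return total

def rms_jitter_seconds(points, carrier_hz):
    """Convert integrated phase noise to RMS jitter in seconds."""
    phase_variance = 2.0 * integrate_phase_noise(points)  # both sidebands, rad^2
    return math.sqrt(phase_variance) / (2.0 * math.pi * carrier_hz)

# Hypothetical profile for a 10-GHz clock, integrated from 10 kHz up to half the carrier.
profile = [(1e4, -110.0), (1e6, -130.0), (1e8, -150.0), (5e9, -150.0)]
print(f"RMS jitter: {rms_jitter_seconds(profile, 10e9) * 1e15:.1f} fs")
```

The upper integration limit strongly affects the result when the noise floor is flat, which is why the integration range should always be reported together with the jitter value.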

#### 2.1.2 Deterministic jitter

Deterministic jitter refers to the predictable, repeatable timing variations in clock edges, which arise from specific, identifiable sources within a system [5, 6]. Unlike random jitter, which is caused by inherent noise processes, deterministic jitter is typically induced by systematic factors such as power supply noise, crosstalk, inter-symbol interference (ISI), or imperfections in the clock generation circuitry. This type of jitter is bounded and can often be characterized by its periodic or data-dependent behavior, making it possible to analyze and mitigate through careful design and signal conditioning. Deterministic jitter is commonly quantified by measuring its peak-to-peak amplitude, which represents the maximum deviation in timing over a given period. To measure deterministic jitter, tools such as oscilloscopes, jitter analyzers, or eye diagram analysis can be used, often in conjunction with techniques like spectral analysis or pattern triggering to isolate and identify its specific sources. Figure (2.3) shows the measurement method with oscilloscopes, where DJ is measured by the time difference of the two distribution peaks. By understanding and addressing deterministic jitter, designers can improve the timing accuracy of clock signals, which is critical for maintaining signal integrity and ensuring reliable operation in high-speed communication systems.



Figure 2.3: Measurement of deterministic jitter

In wireline transceiver systems, two specific clock impairments that result in deterministic jitter are duty cycle distortion and quadrature phase error. Duty cycle distortion occurs when the high and low periods of a clock signal are unequal, often due to asymmetries in the clock generation or amplification stages. This imbalance leads to timing mismatches, which can degrade the performance of digital circuits that rely on precise clock edges. Quadrature phase error, on the other hand, arises in systems where multiple clock signals are required to maintain specific phase relationships, such as in-phase (I) and quadrature-phase (Q) signals in quadrature modulation schemes. Imperfections in multi-phase clock generators, mismatched trace lengths, or propagation delays can cause deviations from the ideal 90-degree phase separation, leading to quadrature phase error. Both duty cycle distortion and quadrature phase error can degrade data transmission quality in wireline systems, particularly those employing multi-phase sampling architectures. Addressing these issues is essential for ensuring accurate timing and optimal performance in high-speed communication systems, often necessitating calibration techniques and error correction circuits.

#### 2.1.3 Effect of RJ and DJ on wireline systems



Figure 2.4: A 32-Gbps quarter-rate transmitter with a 4-to-1 multiplexer using ideal sampling clocks

Figure (2.4) illustrates the effects of RJ and DJ on the signal quality in a wireline system. The system features a 32-Gbps quarter-rate transmitter that serializes 4-way 8-Gbps parallel data into 32-Gbps serial data using quadrature clocks.

In Figure (2.4), the transmitter uses an ideal sampling clock for the 4:1 multiplexer (MUX), free from random jitter, duty cycle distortion, and quadrature phase error. Consequently, the output eye diagram displays high signal integrity, with sharp transitions and a wide decision window for the receiver. In contrast, Figure (2.5) depicts the impact of sampling the input data with clocks exhibiting 200 fs RMS random jitter. The jittery clocks cause the transition distribution to propagate to the output data, resulting in blurred transitions in the eye diagram. Regarding deterministic jitter, Figure (2.6) shows that multiplexing data with clocks affected by duty cycle distortion causes the transitions in the eye diagram to split. This splitting becomes more pronounced when both duty cycle distortion and quadrature phase error are present in the sampling clocks, as shown in Figure (2.7). Finally, Figure (2.8) demonstrates the combined effect of DJ and RJ, where the output eye diagram undergoes significant degradation. The transitions become widely dispersed, and the optimal decision window narrows substantially, making it challenging for the receiver to accurately recover the data.
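To make the transition splitting more concrete, the widths of the four unit intervals (UIs) within one clock period can be estimated with simple arithmetic. The sketch below assumes, purely for illustration, that each UI of the 4:1 MUX is bounded by consecutive edges of the I and Q clocks (I rising, Q rising, I falling, Q falling) and that both clocks share the same duty cycle. The function name and this edge-to-UI mapping are assumptions and do not describe the exact MUX used for the simulations in this section.

```python
def quarter_rate_ui_widths(period_ps, duty, quad_phase_deg):
    """Return the four UI widths (ps) within one clock period.

    Assumes each UI is bounded by consecutive I/Q clock edges:
    I rising, Q rising, I falling, Q falling, then the next I rising.
    """
    t_q = quad_phase_deg / 360.0 * period_ps      # Q rising edge
    edges = sorted([0.0,                          # I rising edge
                    t_q,
                    duty * period_ps,             # I falling edge
                    t_q + duty * period_ps])      # Q falling edge
    edges.append(period_ps)                       # next I rising edge
    return [later - earlier for earlier, later in zip(edges, edges[1:])]

# 32-Gbps quarter-rate transmitter: 8-GHz clocks, 125-ps period, 31.25-ps ideal UI.
ideal = quarter_rate_ui_widths(125.0, 0.50, 90.0)
impaired = quarter_rate_ui_widths(125.0, 0.45, 85.0)
print("ideal    :", [round(w, 2) for w in ideal])
print("impaired :", [round(w, 2) for w in impaired])
```

Under these assumptions, the impaired case produces unequal UIs whose deviations from 31.25 ps appear as distinct transition positions in the eye diagram, which is consistent with the splitting described above.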



Figure 2.5: A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 200-fs RMS jitter.



Figure 2.6: A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 45% duty cycle.



Figure 2.7: A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 45% duty cycle and 85° quadrature phase.



Figure 2.8: A 32-Gbps NRZ quarter-rate wireline transmitter using sampling clocks with 200-fs RMS jitter, 45% duty cycle and 85° quadrature phase.

## 2.2 Clocking implementation for multi-lane applications

As shown in Figure (2.9), the clocking implementation for a multi-lane wireline transceiver typically consists of two main components: global clock generation and local clock generation [7, 8]. The global clock generation is responsible for producing a stable and low-jitter reference clock that is distributed across the system, while the local clock generation focuses on generating multiple clock phases with precise phase relationships to enable high-speed data sampling and transmission.



Figure 2.9: Clocking implementation in a multi-lane transceiver system

### 2.2.1 Global clock generation

Global clock generation is the first step in the clocking implementation process, providing a high-quality reference clock that serves as the foundation for the entire system. This reference clock must exhibit low phase noise and jitter to ensure accurate timing across the system, especially in high-speed wireline applications where timing margins are extremely tight. The global clock is typically generated using very-low-noise LC phase-locked loops (LC-PLLs) with an RMS jitter below 100 fs.

#### 2.2.2 Local clock generation

Local clock generation focuses on creating multiple clock phases with precise phase relationships. This process typically involves phase control circuits that manipulate the global reference clock to generate the required phases. It comprises two topics: quadrature clock generation and phase control.

Quadrature clock generation produces clock signals with precise phase separations, such as 90 degrees for 4-phase clocks (0°, 90°, 180°, 270°) or 45 degrees for 8-phase clocks (0°, 45°, 90°, 135°, 180°, 225°, 270°, 315°) [9–24]. These clocks are typically generated using techniques like ring oscillators, frequency division, or polyphase filters, with the goal of maintaining accurate phase alignment and minimizing phase error.

Quadrature clocks serve two primary purposes in high-speed systems. First, they enable sub-data-rate multi-phase sampling architectures as discussed in Chapter 1. Second, they divide the phase plane into equal sectors, typically 4 or 8, providing reference phases that serve as inputs to phase interpolators, which in turn generate clock signals with precise timing/phase control.

To achieve functions like de-skewing and clock/data recovery, fine phase control is necessary in wireline applications. A voltage-controlled delay line (VCDL) is a common approach to controlling the clock phase. However, due to its limited tuning range, it is difficult for a VCDL to cover a large phase control range over different operating frequencies. Phase interpolators serve as a feasible method for phase control over different frequencies [17, 20–22, 25, 26]. A phase interpolator generates an output clock whose phase lies between those of two input signals with a known phase difference (e.g., 90° or 45°). By combining these signals in specific proportions, intermediate phases between the two input phases can be generated with a certain resolution. For example, if the inputs are 0° and 90°, the output can be set to 45° by weighting the two inputs equally. Theoretically, by choosing the pair of input phases (e.g., two adjacent phases from 0°, 90°, 180° and 270°) and modulating the combining weights, a phase interpolator can produce any phase from 0° to 360°.
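The weighted combination described above can be illustrated with a first-order phasor model. This is a minimal sketch that treats the two inputs as ideal sinusoids of equal amplitude; the function name and the weight convention (weights summing to one) are assumptions made for illustration, and real interpolators driven by square-wave clocks behave differently in detail.

```python
import cmath
import math

def interpolate_phase(phase_a_deg, phase_b_deg, weight_a):
    """Phase (degrees, 0 to 360) of weight_a * A + (1 - weight_a) * B for unit phasors A and B."""
    a = cmath.rect(weight_a, math.radians(phase_a_deg))
    b = cmath.rect(1.0 - weight_a, math.radians(phase_b_deg))
    return math.degrees(cmath.phase(a + b)) % 360.0

# Equal weighting of 0° and 90° yields the mid phase.
print(round(interpolate_phase(0.0, 90.0, 0.5), 3))
# Unequal weights move the output towards the more heavily weighted input.
print(round(interpolate_phase(0.0, 90.0, 0.75), 3))
# A different input pair covers a different sector of the phase plane.
print(round(interpolate_phase(180.0, 270.0, 0.5), 3))
```

In this sinusoidal model the output phase is not a linear function of the weight (a weight of 0.75 does not give exactly 22.5°), which is one reason interpolator linearity is treated as a design concern.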

## 2.3 Review on QCG



Figure 2.10: Quadrature clock generator. (a) Frequency dividers. (b) Poly-phase filters. (c) Ring-oscillator-based PLL. (d) Open-loop injection-locked ring oscillators. (e) Wide-band injection-locked ring oscillators. (f) Delay-locked loop. (g) Delay line with digital control logic.

Various approaches for QCG implementations have been proposed. Frequency dividers are widely used for QCG due to their ability to produce good phase accuracy from a 50%-duty-cycle clock [8, 27–40]. However, frequency dividers suffer from high power consumption in high-frequency clock generation and distribution. Poly-phase filters can generate quadrature clocks but exhibit a narrow operation range, amplitude variation and poor phase accuracy [11, 24, 41–43]. Ring-oscillator-based phase-locked loops (ROPLL) can generate 4-phase or 8-phase clocks with high phase accuracy but struggle with phase noise performance since typical PLL structures usually cannot sufficiently suppress the phase noise from the ring voltage-controlled oscillator (VCO) to meet jitter requirements of wireline systems [44, 45].

Injection-locked methods can improve the jitter performance of ring oscillators for QCG by injecting a low-noise input clock. The open-loop injection-locked structure is widely used for its simple circuitry [12, 46–49]. However, the free-running frequency of a ring oscillator is highly sensitive to process, voltage, and temperature (PVT) variations, leading to a narrow locking range, limited correction performance, and potential degradation in the duty cycles of the output clocks. To address these challenges, designers have attempted cascading multiple stages of injection-locked ring oscillators. While this approach improves phase matching, it comes at the cost of increased power consumption and still suffers from a narrow locking range and duty cycle-related issues.

Wide-band injection-locked (IL) architectures can mitigate those issues by adaptively regulating the self-oscillation frequency of the ring oscillators [14, 15, 17, 20, 22, 50–52]. In [17], an injection-locked 8-phase quadrature-locked loop was implemented with a 4-stage ring oscillator whose supply voltage is regulated by a phase error detection circuit. Nevertheless, the commonly used two-phase injection approach introduces imbalances between injected and non-injected stages, leading to phase mismatch and limited jitter performance [53]. Increasing the number of injectors can solve this problem but necessitates an initial multi-phase clock generator, which increases the power and hardware overhead. [22] presented an injection-locked ring oscillator with 8-phase injection where the initial 8-phase clocks are generated from a quadrature delay-locked loop, showing good noise and phase accuracy performance. However, the extra delay-locked loop doubles the overall power consumption.

Delay-locked loops (DLLs) serve as a viable alternative for 4-phase QCG with low noise contributions and good phase accuracy [20, 22]. On the other hand, the mismatch between different delay stages makes it challenging to achieve 8-phase QCG with DLLs. In [20], a quadrature delay-locked loop can only generate coarse 8-phase clocks with a phase error of up to $6^{\circ}$.

Digitally controlled delay lines (DCDLs) can achieve high-resolution phase tuning and thus serve as a feasible method for QCG or quadrature error correction (QEC) [48, 49, 54]. Inverters loaded with a configurable switched-capacitor bank are usually used to implement the delay cells, which can effectively reject the noise contribution. However, this approach requires high-resolution quadrature error detection and slicing circuits. Moreover, covering a wide tuning range requires a large number of capacitors, which occupy considerable die area.

The QCG systems discussed above can be categorized as either closed-loop or open-loop. Closed-loop QCG systems employ a detection circuit that continuously monitors and corrects phase errors by dynamically adjusting QCG parameters in real time. This feedback mechanism ensures excellent phase matching, though phase accuracy is ultimately limited by loop gain constraints and device mismatches in the detection circuit. However, closed-loop designs face significant challenges, including stability concerns, complex loop analysis, high power consumption, and restricted tracking bandwidth—limitations that make them less attractive for wireline applications, which often demand wide operational ranges.

In contrast, open-loop QCG eliminates the need for a tracking loop by leveraging self-correcting behavior. Among open-loop implementations, injection-locked ring oscillators (ILROs) are widely adopted due to their simplicity and compact design, despite inheriting the drawbacks mentioned earlier. Beyond ILROs, few open-loop QCG or QEC solutions have been proposed in the literature [13, 16].

In the next chapter, we present the design of a novel open-loop QCG system that addresses the common issues of traditional injection-locked methods, such as duty cycle degradation and narrow operation range.

## 2.4 Conclusion of this chapter

In this chapter, we have delved into the critical role of clock signals in wireline transceiver systems, emphasizing their significance in ensuring synchronization and facilitating efficient data transmission. We explored the characteristics and requirements of clock signals, highlighting the distinctions between wireline and wireless systems in terms of clock generation and implementation.

The discussion on RJ and DJ revealed how these timing variations can adversely affect the performance of communication systems. We examined the sources of both types of jitter, as well as their measurement techniques, which are essential for evaluating the integrity of clock signals in high-speed applications.

Furthermore, we analyzed the implementation of clocking architectures tailored for multi-lane wireline applications. The two primary components, global clock generation and local clock generation, were outlined, showcasing how they work together to produce stable and low-jitter clock signals. The chapter also reviewed various QCG designs, assessing their advantages and limitations in terms of phase accuracy, power consumption, and operational range.

Overall, this chapter underscores the importance of advanced clocking architectures in enhancing the performance of wireline transceivers. The insights gained from this exploration lay the foundation for the development of innovative solutions in clock generation, which are crucial for meeting the evolving demands of modern high-speed communication systems.

#### **CHAPTER 3**

# A QUADRATURE CLOCK GENERATOR WITH AN OPEN-LOOP QUADRATURE ERROR CORRECTOR

## 3.1 Proposed QCG with PI-based QEC

#### 3.1.1 Architecture

The overall architecture of the proposed QCG with open-loop QEC is shown in Figure (3.1). The process begins with the duty cycle correction (DCC) circuit, which corrects duty-cycle errors in the input reference clock. Next, a DCDL generates coarse quadrature-phase clocks with relatively large phase errors. These errors are then further reduced by an open-loop two-stage QEC circuit, which utilizes phase interpolation. Additionally, a finite state machine (FSM) is implemented to enable automatic calibration of the DCC and the coarse correction of the delay line.



Figure 3.1: System architecture of the proposed QCG.

#### 3.1.2 Open-loop QEC based on phase interpolation

The concept of the proposed QCG is described in Figure (3.2). The DCDL produces two pairs of differential clock signals with a phase difference of $\theta$, denoted by CK\_1P (0°)/CK\_1N (180°) and CK\_2P ($\theta$)/CK\_2N ($\theta$ + 180°).







Figure 3.2: Proposed open-loop QCG with PI-based QEC. (a) Block diagram. (b) Vector diagram.

The PI combines CK\_1P with CK\_2P, generating CK\_IP with a phase of $\theta/2 + \theta_{D1}$, where $\theta/2$ results from mid-phase generation and $\theta_{D1}$ is the propagation delay induced by the PI and buffers. Similarly, combining CK\_2P and CK\_1N generates CK\_QP with a phase of $\theta/2 + 90^{\circ} + \theta_{D2}$, which ensures a 90° phase difference between CK\_IP and CK\_QP if $\theta_{D2}$ equals $\theta_{D1}$. Here $\theta_{D1} = \theta_{D}(\theta)$ and $\theta_{D2} = \theta_{D}(180^{\circ} - \theta)$, since the two PIs see input phase differences of $\theta$ and $180^{\circ} - \theta$, respectively. As discussed later, $\theta_{D1} = \theta_{D2}$ can be achieved without $\theta$ being exactly 90°. CK\_IN and CK\_QN are produced from the remaining combinations. At this point, two differential clock pairs, CK\_IP ($\theta/2 + \theta_{D1}$)/CK\_IN ($\theta/2 + 180^{\circ} + \theta_{D1}$) and CK\_QP ($\theta/2 + 90^{\circ} + \theta_{D2}$)/CK\_QN ($\theta/2 + 270^{\circ} + \theta_{D2}$), with a quadrature phase difference (90°) are produced.

In Figure (3.2b), a vector diagram is used to demonstrate the proposed idea. In the ideal case,  $\theta_D$  of each path should be identical, thus it is removed from the vector diagram for an intuitive explanation.
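The phase relations of this section can be collected into one short derivation. It only restates the expressions given above, treating each PI as an ideal mid-phase generator; the symbols $\phi_{IP}$ and $\phi_{QP}$ for the output phases of CK\_IP and CK\_QP are introduced here for convenience and do not appear elsewhere in the thesis.

$$
\begin{aligned}
\phi_{IP} &= \frac{0^{\circ} + \theta}{2} + \theta_{D1} = \frac{\theta}{2} + \theta_{D1},\\
\phi_{QP} &= \frac{\theta + 180^{\circ}}{2} + \theta_{D2} = \frac{\theta}{2} + 90^{\circ} + \theta_{D2},\\
\phi_{QP} - \phi_{IP} &= 90^{\circ} + \left(\theta_{D2} - \theta_{D1}\right).
\end{aligned}
$$

The input phase difference $\theta$ cancels in the last line, so in this idealized picture the residual quadrature error is set only by the mismatch between the two propagation delays.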

#### 3.1.3 Self-correction on duty cycle distortion

The proposed open-loop QEC circuit not only corrects quadrature phase errors but also autonomously mitigates duty cycle distortion in the input clocks, a feature that surpasses other open-loop QEC methods, such as injection-locked ring oscillators, which may inadvertently degrade the duty cycle.

This feature is illustrated in Figure (3.3). Consider two pairs of differential clocks generated by the DCDL with an initial delay of t (equivalent to  $\theta$  in the phase domain) and a duty cycle of 48%. Transition timings are marked to calculate the delay and duty cycle. After the first stage of QEC, the time delay between the I-phase and Q-phase clocks is calculated at 0.24T (86.4°), with the I-phase clock maintaining a duty cycle of 48% and the Q-phase clock achieving a duty cycle of 50%. When the generated I and Q clocks are fed into the second QEC stage, the delay remains at 0.24T, while the duty cycles of the I-phase and Q-phase clocks adjust to 49% and 51%, respectively. Notably, the original duty cycle distortion is reduced by half. In fact, the duty cycle distortion and quadrature phase/time error are halved every two QEC stages. However, endless cascading of QEC stages is impractical due to power overhead and induced noise. Consequently, the limited duty cycle correction performance of the QEC necessitates the use of an initial DCC for the input clocks to ensure optimal performance.











Figure 3.3: QEC self-correction on duty cycle distortion
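The numbers in the example above can be reproduced with a simple edge-timing model. The sketch below assumes that each PI outputs the average of its two input edge times, that each complementary clock is the exact inverse of its counterpart, and that propagation delays are identical and can be dropped. These are idealizations for illustration, and all function names are hypothetical. Times are expressed in units of the clock period T.

```python
def complement(clk):
    """Inverted clock: it rises where the original falls and falls one period after the original rises."""
    rise, fall = clk
    return (fall, rise + 1.0)

def interpolate(clk_a, clk_b):
    """Ideal equal-weight PI: average the rising edges and the falling edges."""
    return ((clk_a[0] + clk_b[0]) / 2.0, (clk_a[1] + clk_b[1]) / 2.0)

def qec_stage(ck_1p, ck_2p):
    """One QEC stage: I from (CK_1P, CK_2P), Q from (CK_2P, CK_1N)."""
    ck_ip = interpolate(ck_1p, ck_2p)
    ck_qp = interpolate(ck_2p, complement(ck_1p))
    return ck_ip, ck_qp

def duty(clk):
    return clk[1] - clk[0]

def delay(clk_a, clk_b):
    return clk_b[0] - clk_a[0]

# Input clocks: 48% duty cycle, arbitrary initial delay t = 0.2T (illustrative value).
stage_in = ((0.0, 0.48), (0.2, 0.68))
for stage in (1, 2):
    i_clk, q_clk = qec_stage(*stage_in)
    print(f"stage {stage}: delay = {delay(i_clk, q_clk):.2f}T, "
          f"duty I = {duty(i_clk):.2%}, duty Q = {duty(q_clk):.2%}")
    stage_in = (i_clk, q_clk)
```

With these assumptions, the I-to-Q delay after the first stage does not depend on the initial delay t, because it equals half of the input pulse width.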

## 3.2 Circuit implementation

#### 3.2.1 Digitally controlled delay line

The DCDL is used to generate a coarse 90° phase shift, which is then fine-tuned by the subsequent PI-based QEC. A DCDL with a load capacitor bank is widely used because of its simple design and minimal jitter contribution. In this design, instead of placing all the load capacitors at one stage, which could lead to longer transition times and thus higher sensitivity to noise, the capacitors are distributed across multiple stages as Figure (3.4) presents. This distribution reduces the transition time at each stage, helping to minimize noise interference while maintaining a wide tuning range. The 4-bit DCDL is designed with a tuning step of 1.2 ps, covering a delay range of 20 ps. The simulated delays generated from the DCDL under different process corners are illustrated in Figure (3.5).
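For orientation, the DCDL step and range quoted above can be converted to phase at the two ends of the 5 to 10 GHz operating range. This is plain unit conversion; the helper name is hypothetical, and the calculation covers only the tunable part of the delay, since the fixed (code-independent) delay of the line is not given here.

```python
def delay_to_phase_deg(delay_ps, freq_ghz):
    """Convert a time delay to phase in degrees at the given clock frequency."""
    cycles = delay_ps * 1e-3 * freq_ghz   # ps * GHz = 1e-3 cycles
    return cycles * 360.0

for freq_ghz in (5.0, 10.0):
    quarter_period_ps = 250.0 / freq_ghz  # T/4 in ps
    print(f"{freq_ghz:4.1f} GHz: T/4 = {quarter_period_ps:.1f} ps, "
          f"1.2-ps step = {delay_to_phase_deg(1.2, freq_ghz):.2f} deg, "
          f"20-ps range = {delay_to_phase_deg(20.0, freq_ghz):.1f} deg")
```

The step of a few degrees per code is what makes the DCDL a coarse stage, with the residual error left to the PI-based QEC.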



Figure 3.4: Digitally controlled delay line



Figure 3.5: Simulation results of DCDL

#### 3.2.2 Duty cycle correction

To ensure that the QEC produces an accurate 90° phase shift, it is important to minimize duty cycle distortion in the input clock signal. For this purpose, a DCC circuit is included in the system.

In this design, an off-chip single-ended clock is buffered, modified by DCC and then converted to differential clocks. As shown in Figure (3.6), the DCC circuit consists of an inverter-based buffer and configurable pull-up and pull-down current sources. When the pull-up current is enabled, the falling transition time of the clock waveform becomes longer, while the rising transition time becomes shorter, reducing the duty cycle of the output clock. On the other hand, enabling the pull-down current increases the duty cycle of the output clock.

To allow fine adjustments to the duty cycle, a 6-bit digitally controlled current mirror is used. This enables the pull-up and pull-down currents to be adjusted in small steps of 0.15%, providing a tuning range of up to 10% in duty cycle. The simulated DCC performance under different process corners is illustrated in Figure (3.8).



Figure 3.6: Single-ended-to-differential conversion and duty cycle correction.



Figure 3.7: Schematic of duty cycle correction.



Figure 3.8: Simulation results of DCC

#### 3.2.3 Phase interpolator for QEC

The output phase accuracy of the proposed QCG relies significantly on the linearity of the PI. Commonly used PIs include the voltage-mode PI (VMPI) and the current-mode PI (CMPI) [18, 20, 25].

The VMPI employs a straightforward wire-AND logic connection between two inverter outputs, representing the simplest implementation of a PI. However, this approach suffers from highly nonlinear phase combining, making the output phase vulnerable to mismatch and PVT variations.

The CMPI utilizes a common-source combiner with two transconductance (gm) stages, enabling linear phase combination when properly biased with an optimal input common-mode voltage. While this architecture is well-suited for low-swing sinusoidal clocks, its performance degrades with rail-to-rail square-wave clocks due to their rich harmonic content and large voltage swings, which introduce significant nonlinearities at the output.



Figure 3.9: Commonly used architectures of phase interpolators (a) Voltage-mode phase interpolator. (b) Current-mode phase interpolator. (c) Integrating-mode phase interpolator.

Unlike the VMPI and CMPI, the integrating-mode PI (IMPI) offers simple circuitry, excellent linearity, low noise contribution, and full compatibility with square-wave clocks [25, 26], and thus serves as a good candidate for our QCG implementation. The IMPI achieves phase interpolation by combining two periodically bi-directional current sources, as Figure (3.10a) shows. In [25], the combining weights of the two current sources are modulated to achieve phase shifting. In our QCG design, the current sources are equally weighted to produce an averaged phase. Two trapezoid-shaped waveforms (VX) with a phase difference of 90° are generated on the load capacitors. CML-to-CMOS (C2C) converters then compare the VX waveforms with a threshold and convert them to rail-to-rail signals while maintaining the 90° difference.
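The operating principle can be checked with a small behavioral simulation. The sketch below is an idealized model only: it assumes unit-amplitude current sources that charge for half a period and discharge for the other half, a unit load capacitor, an ideal mid-level threshold, and no mismatch or finite output resistance. The function names and parameters are hypothetical, and times are in units of the clock period.

```python
def bidirectional_current(t, period):
    """Unit current source: charging (+1) in the first half period, discharging (-1) in the second."""
    return 1.0 if (t % period) < period / 2.0 else -1.0

def impi_rising_crossing(delay_a, delay_b, period=1.0, steps=20000):
    """Integrate two equally weighted currents on a unit capacitor and return the
    time (modulo one period) at which the voltage rises through its mid level."""
    dt = period / steps
    v = 0.0
    samples = [v]                       # samples[k] is the voltage at time k * dt
    for k in range(2 * steps):          # simulate two periods
        t = (k + 0.5) * dt              # midpoint of the integration step
        v += (bidirectional_current(t - delay_a, period)
              + bidirectional_current(t - delay_b, period)) * dt
        samples.append(v)
    last_period = samples[steps:]
    mid = (max(last_period) + min(last_period)) / 2.0
    for k in range(steps + 1, 2 * steps + 1):
        low, high = samples[k - 1], samples[k]
        if low < mid <= high:           # upward crossing inside this step
            frac = (mid - low) / (high - low)
            return ((k - 1 + frac) * dt) % period
    raise ValueError("no rising crossing found")

theta = 0.2                                     # input delay between CK_1P and CK_2P, in periods
t_i = impi_rising_crossing(0.0, theta)          # PI driven by CK_1P and CK_2P
t_q = impi_rising_crossing(theta, 0.5)          # PI driven by CK_2P and CK_1N
print(f"I crossing = {t_i:.3f}T, Q crossing = {t_q:.3f}T, difference = {t_q - t_i:.3f}T")
```

In this model both crossings lag the mid phase of their inputs by a quarter period, so the I-to-Q spacing is a quarter period even though the input delay is not. The two trapezoids do differ in height, which suggests why the threshold behavior of the following C2C stage matters in practice.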

The schematic of IMPI is shown in Figure (3.10b). The charging and discharging currents are implemented with PMOS and NMOS transistors respectively. M1,2,7,8 are switch transistors used to enable current sources at different operating phases, whose gates are driven by PI's input clocks. M3,4,5,6 are current source transistors, controlled by a bias generation circuit.

Figure (3.11a) shows the schematic of the C2C converter, which consists of an AC-coupling capacitor and a self-biased inverter. Note that the discrepancy between $\theta_{D1}$ and $\theta_{D2}$ mainly comes from the C2C converters and associated buffers. Figure (3.11b) presents the simulated phase transfer characteristics of the C2C converters. When the input $\theta$ is near 90°, denoted as the target region, the mismatch between $\theta_{D1}$ and $\theta_{D2}$ is small enough to be negligible, while the phase mismatch grows quickly outside the target region. Therefore, it is necessary to calibrate the initial input $\theta$ to the target region.

The proposed PI-based QEC achieves an error suppression ratio of 1/6 in post-layout simulation. However, substantial residual phase errors may still exist in the output clocks due to nonideal factors, such as current mismatch and duty cycle distortion. Cascading multiple QEC stages can improve performance, though the added power and noise must be considered. This work implements a 2-stage QEC to balance these factors.
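The benefit of cascading can be illustrated with a minimal sketch. It assumes, as an idealization not verified in this work, that every stage applies the same 1/6 suppression ratio independently and that mismatch, duty cycle distortion, and noise are ignored. The function name `residual_error_deg` is hypothetical.

```python
def residual_error_deg(input_error_deg, suppression_ratio, num_stages):
    """Ideal residual quadrature error after cascaded QEC stages.

    Assumes each stage scales the incoming error by the same ratio
    (an idealization; real stages add their own mismatch-induced error).
    """
    return input_error_deg * suppression_ratio ** num_stages


# Example: 5-degree input error, 1/6 suppression per stage (post-layout value).
one_stage = residual_error_deg(5.0, 1 / 6, 1)
two_stage = residual_error_deg(5.0, 1 / 6, 2)
print(f"1 stage: {one_stage:.3f} deg, 2 stages: {two_stage:.3f} deg")
```

Under this idealization a second stage reduces the systematic error by a further factor of six, so the residual error in practice is expected to be dominated by the nonideal factors listed above rather than by the input phase error.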



Figure 3.10: Integrating-mode phase interpolator for QEC. (a) Operational principle. (b) Schematic of integrating-mode PI.



Figure 3.11: CML to CMOS Converter. (a) Schematic. (b) Simulation results

#### 3.2.4 Monte Carlo simulation results

A 500-run Monte Carlo simulation was performed to evaluate the 2-stage QEC performance under mismatch. The input quadrature clocks have a 5° phase error and a 50% duty cycle. The phase and duty cycle distributions of the output quadrature clocks are presented in Figure (3.12). The output quadrature phase shows a standard deviation of 0.75° and a mean value of 90.05°, while the duty cycle shows a standard deviation of 0.32% and a mean value of 50.0%.



Figure 3.12: Monte Carlo simulation of QEC. (a) Quadrature phase distribution. (b) Duty cycle distribution.

#### 3.2.5 Digital calibration



Figure 3.13: Digital calibration scheme.

The duty cycle error of the input clock and the initial quadrature phase error need to be detected and calibrated. As shown in Figure (3.13), a digital calibration scheme is implemented in this work with duty cycle detection (DCD) and quadrature error detection (QED) circuits. RC low-pass filters detect the duty cycle mismatch of the differential clocks by extracting the DC components of the positive and negative signals. As illustrated in Figure (3.14), when the duty cycle deviates from 50%, VP and VN show a voltage difference, which a comparator slices to a digital 0 or 1. The quadrature phase error can likewise be detected and converted to a DC voltage using a passive mixer, whose schematic is shown in Figure (3.15). The simulated DCD and QED transfer characteristics are shown in Figure (3.16), with detection sensitivities of 20.1 mV/% and 10.1 mV/°, respectively. Figure (3.16b) and Figure (3.16d) present the distributions of the output voltage with well-matched inputs. They show standard deviations of 8.2 mV and 13.2 mV for DCD and QED respectively, corresponding to a 0.4% duty cycle error and a 1.3° phase error.
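The conversion from a detector output voltage to an input-referred error is a division by the detection sensitivity. The short sketch below reproduces that arithmetic for the simulated values quoted in this subsection; the function name `input_referred_error` is hypothetical, and a linear detector characteristic around the zero crossing is assumed.

```python
DCD_SENSITIVITY_MV_PER_PCT = 20.1  # simulated DCD sensitivity, mV per % duty cycle
QED_SENSITIVITY_MV_PER_DEG = 10.1  # simulated QED sensitivity, mV per degree


def input_referred_error(voltage_mv, sensitivity_mv_per_unit):
    """Refer a detector output voltage back to the detector input.

    Assumes the detector is linear near its zero crossing.
    """
    return voltage_mv / sensitivity_mv_per_unit


# Detector zero-crossing spread (standard deviations from Figure 3.16b/d).
dcd_sigma_pct = input_referred_error(8.2, DCD_SENSITIVITY_MV_PER_PCT)
qed_sigma_deg = input_referred_error(13.2, QED_SENSITIVITY_MV_PER_DEG)

# Comparator offset of 400 uV rms (0.4 mV), as reported later in this subsection.
cmp_res_pct = input_referred_error(0.4, DCD_SENSITIVITY_MV_PER_PCT)
cmp_res_deg = input_referred_error(0.4, QED_SENSITIVITY_MV_PER_DEG)

print(f"DCD sigma: {dcd_sigma_pct:.2f} %, QED sigma: {qed_sigma_deg:.2f} deg")
print(f"Comparator resolution: {cmp_res_pct:.3f} %, {cmp_res_deg:.3f} deg")
```

The comparison shows that the detector spread, not the comparator offset, limits the achievable calibration accuracy.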



Figure 3.14: Duty cycle detection. (a) RC low-pass filter. (b) Operational principle.



Figure 3.15: Schematic of passive mixer for QED.



Figure 3.16: Simulated error detection characteristics. (a) DCD characteristics. (b) DCD zero crossing voltage distribution. (c) QED characteristics. (d) QED zero crossing voltage distribution.

Auto-zeroing (AZ) comparators with offset cancellation are implemented to slice the detectors' outputs to 1 (above target) or 0 (below target) for the FSM to complete calibration [55, 56]. The schematic of the AZ comparator is presented in Figure (3.17); it consists of a preamplifier, a StrongArm latch, an SR latch, offset storage capacitors, and switches. The offset voltage of the comparators is simulated with a dynamic method [57]. The simulated comparator offset voltage shows a standard deviation of $400~\mu\mathrm{V_{rms}}$, corresponding to a duty cycle resolution of 0.02% and a phase resolution of $0.04^\circ$, which are sufficient for this application.



Figure 3.17: Schematic of auto-zeroing comparator with offset cancellation.

The duty cycle of the input clock and the initial delay from the DCDL need to be calibrated for the QEC to produce accurate quadrature clocks. The calibration is performed in a binary-search manner: the comparators output detection results of 1 or 0, indicating that the duty cycle/delay is above or below the optimal value (50%/90°). Based on the detection results, a synthesized FSM updates the DCC/DCDL control codes to decrease or increase the duty cycle/delay.

When the duty cycle/delay approaches the optimal value, the comparator's output starts to toggle between 1 and 0 while the FSM keeps updating the duty cycle and delay. These periodically changing parameters result in spurious tones, or deterministic jitter, in the output clocks, as illustrated in Figure (3.18a).

To address this issue, the FSM employs a pattern-detecting strategy to disable calibration: consecutive comparator outputs are recorded. If the outputs are "00" or "11," the system is still in the binary search phase. Conversely, if the outputs alternate as "01" or "10" (toggling), the parameters are likely near the optimal value. Once M consecutive toggling instances are detected, the calibration is considered complete and is disabled. The flowchart of the FSM is illustrated in Figure (3.19). With this disabling strategy, the calibration terminates properly, avoiding periodic parameter changes and thereby eliminating the output spur issue, as Figure (3.18b) shows.

Additionally, near the optimal region, the small differential input voltage may confuse the comparator due to its hysteresis and metastability, causing it to output 0 or 1 randomly. To account for this, the detection depth is increased to 3 for better robustness.
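The pattern-detecting disabling strategy described above can be sketched behaviorally as follows. This is an illustrative model rather than the synthesized FSM: it assumes a single-LSB code update per comparison, an ideal noiseless comparator, and a hypothetical 6-bit code range. The names `calibrate`, `comparator`, and `depth` are not taken from the design.

```python
def calibrate(comparator, code=0, code_min=0, code_max=63, depth=3, max_steps=1000):
    """Step a control code until `depth` consecutive comparator toggles occur.

    comparator(code) returns 1 if the detected quantity is above target,
    0 if it is below. Returns (final_code, number_of_comparisons).
    """
    prev = None
    toggles = 0
    for step in range(max_steps):
        out = comparator(code)
        if prev is not None and out != prev:
            toggles += 1  # "01" or "10": likely near the optimum
        else:
            toggles = 0   # "00" or "11": still searching
        if toggles >= depth:
            return code, step + 1  # freeze the code; calibration disabled
        prev = out
        # Decrease the code when above target, increase it when below.
        code = max(code_min, min(code_max, code - 1 if out else code + 1))
    return code, max_steps


# Example with a hypothetical optimum located between codes 20 and 21.
ideal_comparator = lambda code: 1 if code > 20.5 else 0
final_code, comparisons = calibrate(ideal_comparator)
print(final_code, comparisons)
```

Resetting the toggle counter on any repeated output is what makes a larger detection depth more robust: an isolated wrong decision from a metastable comparator cannot end the calibration by itself.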





Figure 3.18: (a) Calibration without disabling strategy. (b) Calibration with disabling strategy.



Figure 3.19: Flowchart of the FSM

## 3.3 Design considerations

#### 3.3.1 Non-ideal factors

The performance degradation of the proposed QCG comes from four major factors: (1) amplitude modulation to phase modulation (AM-PM) conversion of the C2C converter, (2) non-constant integrating currents, (3) P/N mismatch, and (4) duty cycle distortion.



Figure 3.20: AM-PM conversion of C2C converter.

AM-PM distortion. From the analysis in Section II, the generation of the 90° phase shift appears independent of the initial phase shift $\theta$ if the duty cycle is exactly 50%. However, if the quadrature phase error from the DCDL is very large, meaning the initial $\theta$ is far from 90°, the voltage swings of the PI output waveforms $V_{X1}$ and $V_{X2}$ will differ considerably, as shown in Figure (3.20). Due to the AM-PM conversion characteristics, the C2C converters driven by $V_{X1}$ and $V_{X2}$ exhibit different propagation delays, introducing extra skew between the I/Q paths. In other words, the $\theta_D$ values of the I/Q paths in Figure (3.2a) are not identical in this case. To minimize this effect, the DCDL should be calibrated to near 90°. Additionally, increasing the swings of $V_{X1}$ and $V_{X2}$ by using a larger integrating current can help mitigate the AM-PM issue [26]. However, this approach may introduce other complications, which are addressed in the following paragraph.



Figure 3.21: Non-constant integrating currents

Non-constant integrating currents. The rising and falling slopes of the PI output waveform, which effectively represent the current-to-capacitor ratio, should be sufficiently high to minimize noise injection. This results in a large voltage swing that can also help mitigate the AM-PM issue discussed previously. However, excessive voltage swings can push the transistors into their triode regions, as illustrated in Figure (3.21), leading to non-constant charging and discharging currents. Consequently, the performance of the QEC degrades due to these unwanted current variations. Therefore, it is crucial to select an appropriate current level that avoids excessively large output swings while still providing adequate noise rejection and minimizing AM-PM conversion.



Figure 3.22: P/N mismatch

P/N mismatch. The mismatch between PMOS and NMOS transistors leads to discrepancies in their charging and discharging currents. When this mismatch occurs in ideal current sources, it generates a skewed waveform with a DC component that continues to either increase or decrease, as illustrated in Figure (3.22). In practical circuits, this trend can push the NMOS or PMOS transistors into their triode regions, causing the waveforms to settle at a specific DC point. Although the AC-coupled C2C converter can level-shift the waveforms, the performance of phase interpolation and QEC still suffers. To address this issue, a feedback loop can be implemented to adjust the bias voltage of the PMOS or NMOS transistors, enabling them to track the DC level to an optimal point.



Figure 3.23: Duty cycle distortion

Duty cycle distortion. In a well-aligned differential pair, duty cycle distortion causes the phase difference between CKP and CKN to deviate from the ideal 180°. As discussed in Section II, the 90° phase shift is derived from averaging 0 and $\theta + 180^{\circ}$. Thus, any distortion in the initial 180° phase will inevitably lead to distortion in the generated 90° phase. Furthermore, when duty cycle distortion affects ideal current sources, it results in a skewed waveform, again causing the DC component to either increase or decrease in one of the waveforms, as shown in Figure (3.23). In real-world circuits, this trend can similarly push the NMOS or PMOS transistors into their triode regions, ultimately leading to settling at a specific DC point. Consequently, the performance of phase interpolation and QEC degrades. While the proposed QEC structure can partially alleviate the duty cycle distortion issue, it remains essential to calibrate the input duty cycle distortion to ensure that the QEC produces minimally mismatched quadrature phases.

#### 3.3.2 Jitter contribution

The primary source of jitter in the proposed QCG system comes from device noise, which accumulates over successive stages. The slopes of the voltage waveforms determine the clock sensitivity to noise, making it essential to maximize rising and falling slopes to minimize jitter injection.

In the DCDL design, capacitor banks are distributed among multiple inverter stages to prevent slow ramps at any single stage. In the PI design, the integrating currents are tuned across frequencies to enhance the slopes while keeping the transistors in the saturation region. The PI current tuning can potentially be integrated with the DCDL calibration.

To evaluate the QCG's jitter performance, Figure (3.24a) presents phase noise plots for each stage in parallel. The input reference clock's phase noise is derived from measured data of an external signal generator. Additionally, Figure (3.24b) summarizes the calculated additive jitter contributions. The QCG introduces a total additive jitter of 42.39 fs<sub>rms</sub>, with each stage's contribution minimized through careful design. However, the DCC exhibits a slightly higher jitter due to additional noise sources from the pull-up/pull-down currents.



|                                      | Reference / Input | DCC   | DCDL  | PI-QEC | Total |
|--------------------------------------|-------------------|-------|-------|--------|-------|
| Output Jitter (fs<sub>rms</sub>)     | 41.43             | 49.74 | 54.39 | 59.28  | 59.28 |
| Additive Jitter (fs<sub>rms</sub>)   | N/A               | 27.53 | 22.01 | 23.58  | 42.39 |

Figure 3.24: Jitter contribution of each stage. (a) Simulated output phase noise of each stage. (b) Calculated jitter contribution.
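The additive jitter values in Figure (3.24b) are consistent with a root-sum-square decomposition of the simulated output jitter. The sketch below reproduces that arithmetic under the assumption that each stage adds noise uncorrelated with the jitter at its input; the variable and function names are hypothetical, and small differences from the tabulated values come from rounding.

```python
import math

# Simulated output jitter after each stage, in fs rms (values from Figure 3.24b).
output_jitter_fs = {"Reference": 41.43, "DCC": 49.74, "DCDL": 54.39, "PI-QEC": 59.28}


def additive_jitter(out_fs, in_fs):
    """Jitter added by a stage, assuming it is uncorrelated with the input jitter."""
    return math.sqrt(out_fs**2 - in_fs**2)


stages = list(output_jitter_fs)
added = {
    stages[i]: additive_jitter(output_jitter_fs[stages[i]], output_jitter_fs[stages[i - 1]])
    for i in range(1, len(stages))
}
# Total additive jitter of the QCG: root-sum-square of the per-stage contributions.
total_added = math.sqrt(sum(v**2 for v in added.values()))

for name, value in added.items():
    print(f"{name}: {value:.2f} fs rms")
print(f"Total additive: {total_added:.2f} fs rms")
```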

## 3.4 Measurement results

The proposed QCG is fabricated in a 28-nm CMOS process with a core area of 0.0121 mm<sup>2</sup>, including the DCC, DCDL, QEC, error detection circuits, comparators, and FSM. The chip micrograph and measured power breakdown are shown in Figure (3.25). The tested chip covers a frequency range from 5 to 10 GHz and consumes 10.2 mW at 10-GHz operation with a 0.9-V supply voltage.





Figure 3.25: Fabricated QCG prototype. (a) Chip micrograph. (b) Power breakdown.

Figure (3.26) presents the measurement setup. A signal generator (R&S SMF100A) provides an input clock, which is split by a power divider into two signals: one serves as the input clock of the DUT, and the other serves as a reference clock to trigger the oscilloscope. A high-bandwidth real-time oscilloscope (Keysight DSAV334A) captures the waveforms of the reference clock and the output clock for measuring the quadrature phase error. A spectrum analyzer (R&S FSW67) is used to plot the phase noise of the reference and output clocks. An off-chip SPI module (SUB-20), controlled by a laptop, is connected to the DUT through the PCB to set the chip configuration.



Figure 3.26: Testing setup. (a) Block diagram. (b) Photo of the testing environment.

An on-chip multiplexer selects between the I and Q clocks. Since the I and Q clocks experience the same skew introduced by the multiplexer, buffers, bonding wires, and off-chip interconnects, the quadrature phase shift can be measured by subtracting the I-to-reference delay from the Q-to-reference delay. Figure (3.27) shows the measurement results of the phase error. The proposed QCG achieves a quadrature error of $\leq 0.8^{\circ}$ from 5 to 10 GHz.
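The delay subtraction described above converts to a phase error as sketched below. The function name and the example delay values are hypothetical and are not measured data; the sketch only assumes that both delays are taken against the same reference edge so that the common path skew cancels.

```python
def quadrature_error_deg(t_i_to_ref_s, t_q_to_ref_s, f_clk_hz):
    """Quadrature error from two delays measured against a common reference.

    The common skew of the shared output path cancels in the subtraction.
    """
    phase_deg = (t_q_to_ref_s - t_i_to_ref_s) * f_clk_hz * 360.0
    return phase_deg - 90.0


# Hypothetical delays at 10 GHz: an ideal quadrature spacing would be 25 ps.
error = quadrature_error_deg(100.0e-12, 125.2e-12, 10e9)
print(f"Quadrature error: {error:.2f} deg")
```

At 10 GHz one degree corresponds to roughly 0.28 ps, which indicates the timing resolution the oscilloscope measurement must resolve.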



Figure 3.27: Measurement of quadrature error. (a) Measurement at 10-GHz operation. (b) Measured quadrature errors across different frequencies.

Figure (3.28) plots the measured phase noise of the reference and output clocks at 10 GHz. The integrated jitter (10 kHz to 1 GHz) of the reference clock is 41.36 fs<sub>rms</sub>. The jitter of the output I and Q clocks differs slightly (I: 59.56 fs<sub>rms</sub>, Q: 61.09 fs<sub>rms</sub>), resulting from the different PI inputs.



Figure 3.28: Measured phase noise of reference clock and output clocks.
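For reference, the integrated RMS jitter is obtained from a single-sideband phase noise profile by integrating the noise power over the offset band and dividing the resulting phase deviation by the angular carrier frequency. The sketch below uses trapezoidal integration; the function name is hypothetical, and the flat -150 dBc/Hz profile is a made-up input for illustration, not the measured data of Figure (3.28).

```python
import math


def integrated_jitter_rms(offsets_hz, pn_dbc_per_hz, f_carrier_hz):
    """RMS jitter in seconds from single-sideband phase noise L(f).

    sigma_t = sqrt(2 * integral of 10^(L(f)/10) df) / (2 * pi * f_carrier),
    with the integral evaluated by the trapezoidal rule on the given points.
    """
    linear = [10 ** (p / 10) for p in pn_dbc_per_hz]
    area = sum(
        0.5 * (linear[i] + linear[i + 1]) * (offsets_hz[i + 1] - offsets_hz[i])
        for i in range(len(offsets_hz) - 1)
    )
    return math.sqrt(2 * area) / (2 * math.pi * f_carrier_hz)


# Hypothetical flat profile from 10 kHz to 1 GHz on a 10-GHz carrier.
jitter_s = integrated_jitter_rms([1e4, 1e9], [-150.0, -150.0], 10e9)
print(f"{jitter_s * 1e15:.1f} fs rms")
```

With measured data, the offset points should be dense enough, typically log-spaced, for the trapezoidal rule to follow the profile.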

Table (3.1) summarizes the performance of the proposed QCG and compares it with recently published multi-phase clock generators. The proposed open-loop QCG achieves competitive performance in jitter, phase error, and power efficiency.

Table 3.1: Comparison with recently published QCGs

|                                        | This Work             | ISSCC'22<br>[21] | ISSCC'21<br>[20] | ISSCC'18<br>[17] | ESSCIRC'16<br>[16]    | CICC'11<br>[13]       |
|----------------------------------------|-----------------------|------------------|------------------|------------------|-----------------------|-----------------------|
| Technology                             | 28-nm<br>CMOS         | 65-nm<br>CMOS    | 65-nm<br>CMOS    | 7-nm<br>FinFET   | 28-nm<br>CMOS         | 65-nm<br>CMOS         |
| Architecture                           | Open loop<br>using PI | DLL              | DLL+IL-<br>QLL   | IL-QLL           | Open loop<br>using PI | Open loop<br>using PI |
| Number of Phases                       | 4                     | 4                | 8                | 8                | 4                     | 8                     |
| Frequency (GHz)                        | 5-10                  | 3.5-11           | 5-8              | 4-16             | 1-2.6                 | 8-12                  |
| Power (mW)                             | 10.2<br>@10 GHz       | 7.8<br>@7 GHz    | 15.6<br>@7 GHz   | 10<br>@16 GHz    | 4.4<br>@2 GHz         | 14.8<br>@10 GHz       |
| Power/Frequency (mW/GHz)               | 1.02                  | 1.11             | 2.23             | 0.63             | 2.2                   | 1.48                  |
| Integrated Jitter (fs <sub>rms</sub> ) | 61.1                  | 48.1             | 65.2             | 80               | 37.6                  | 470                   |
| Integration Band (Hz)                  | 10k-1G                | 10k-1G           | 10k-1G           | 100k-1G          | 10k-100M              | N/A                   |
| Phase Error (°)                        | ≤0.8                  | ≤0.9             | ≤0.5             | ≤1               | ≤5                    | ≤3.1                  |
| Active Area (µm²)                      | 12100                 | 12000            | 21000            | N/A              | 3000                  | 1500                  |
| Supply Voltage (V)                     | 0.9                   | 1.2              | 1.2              | 1.2/0.88         | 1.1                   | 1.1                   |
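The Power/Frequency row of Table (3.1) normalizes each design's power to the frequency at which that power is reported. The sketch below recomputes the row from the Power entries of the table; the dictionary layout and names are illustrative only.

```python
# (power in mW, frequency in GHz at which the power is reported), from Table 3.1.
designs = {
    "This Work": (10.2, 10.0),
    "ISSCC'22 [21]": (7.8, 7.0),
    "ISSCC'21 [20]": (15.6, 7.0),
    "ISSCC'18 [17]": (10.0, 16.0),
    "ESSCIRC'16 [16]": (4.4, 2.0),
    "CICC'11 [13]": (14.8, 10.0),
}

# Normalized power in mW/GHz for each design.
power_per_ghz = {name: p_mw / f_ghz for name, (p_mw, f_ghz) in designs.items()}

for name, value in sorted(power_per_ghz.items(), key=lambda kv: kv[1]):
    print(f"{name}: {value:.3f} mW/GHz")
```

This metric ignores differences in technology node, supply voltage, and number of output phases, so it is only a first-order comparison.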

## 3.5 Conclusion of this chapter

In this chapter, we proposed a novel QCG with an open-loop QEC for wireline applications, composed of a DCDL and a 2-stage PI-based QEC with digital automatic calibration. The inverter-based DCDL and buffers minimize the device jitter contribution. The high-linearity integrating-mode PI circuitry and the 2-stage correction structure enhance the QEC performance. The digital automatic calibration for the DCC and DCDL is performed with a pattern-detecting disabling strategy, eliminating spurs from the output clocks. The proposed QCG generates quadrature clocks with high phase accuracy and low jitter while consuming only a small portion of the system power.

#### **CHAPTER 4**

#### CONCLUSION AND FUTURE WORK

#### 4.1 Conclusion

The exponential growth of AI-driven applications and high-speed communication systems has intensified the demand for precise, low-jitter clock generation in wireline transceivers. This thesis addresses the critical challenge of generating multi-phase clocks with stringent phase accuracy and jitter requirements, proposing a novel open-loop QCG architecture.

The key contribution of this work is the design and implementation of a 5–10 GHz quadrature clock generator with digital automatic calibration. The proposed architecture combines a DCDL and a two-stage PI-based QEC to achieve a quadrature phase error of ≤0.8° across the operating frequency range. By integrating DCC and a pattern-detecting digital calibration strategy, the design mitigates deterministic jitter sources such as duty cycle distortion and quadrature phase error. Fabricated in a 28-nm CMOS process, the QCG occupies 0.012 mm² of core area and consumes 10.2 mW at 10 GHz, demonstrating competitive power efficiency compared with prior works. Measured results confirm 61.1 fs RMS jitter (integrated from 10 kHz to 1 GHz), meeting the stringent requirements of modern wireline systems.

The success of this work lies in its hybrid approach, balancing open-loop simplicity with digital calibration robustness. The inverter-based DCDL minimizes jitter injection, while the integrating-mode PI ensures high linearity and phase accuracy. This architecture provides a scalable solution for high-speed wireline transceivers, particularly in AI-driven data centers and high-bandwidth memory applications.

#### 4.2 Future work

#### 4.2.1 Potential improvements

In Subsection 3.3.1, we discussed several factors that can degrade QCG performance. DCDL and DCC calibration can sufficiently mitigate C2C AM-PM distortion and duty cycle distortion. Additionally, the issues of non-constant integrating currents and P/N current mismatch can be addressed with dedicated circuits.

The non-constant integrating current issue can be resolved by properly calibrating the preset integrating current to limit the voltage swing on the load capacitor. At the same time, the current should be maximized, as a low current results in a slow waveform slope (making it noise-sensitive) and a small voltage swing (leading to AM-PM-induced phase errors). A suitable current value that restricts the voltage swing to approximately  $0.2 \times V_{DD}$  to  $0.8 \times V_{DD}$  provides a good trade-off. Note that the voltage swing is proportional to  $I_U$  and inversely proportional to frequency, implying that for a given swing, the optimal  $I_U$  scales with frequency—exhibiting the same behavior as the optimal DCDL value. Consequently,  $I_U$  calibration can be combined with DCDL calibration.
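The frequency scaling of the integrating current can be illustrated with a first-order sketch. It assumes a constant current $I_U$ charging a load capacitance for a fixed fraction of the clock period, so that the swing is $I_U \cdot \alpha / (f \cdot C)$. The load capacitance of 20 fF and the integration fraction $\alpha = 0.25$ are hypothetical placeholders, not values from this design; only the 0.9-V supply and the 0.2 to 0.8 $V_{DD}$ swing window come from the text.

```python
VDD_V = 0.9                  # supply voltage used in this work
TARGET_SWING_V = 0.6 * VDD_V  # 0.2*VDD to 0.8*VDD peak-to-peak
C_LOAD_F = 20e-15            # hypothetical load capacitance (assumption)
ALPHA = 0.25                 # hypothetical fraction of the period spent integrating


def required_current_a(f_clk_hz, swing_v=TARGET_SWING_V, c_load_f=C_LOAD_F, alpha=ALPHA):
    """Constant integrating current that yields `swing_v` at clock frequency f.

    First-order model: swing = I * (alpha / f) / C, solved for I.
    """
    return swing_v * c_load_f * f_clk_hz / alpha


for f_ghz in (5, 7.5, 10):
    i_ua = required_current_a(f_ghz * 1e9) * 1e6
    print(f"{f_ghz} GHz: {i_ua:.0f} uA")
```

In this model the required current doubles from 5 GHz to 10 GHz, which reflects the proportionality to frequency that motivates combining the current calibration with the DCDL calibration.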

As discussed in Subsection 3.3.1, in addition to duty cycle distortion, P/N current mismatch induces DC voltage drift. An analog loop can be implemented to detect this drift and adjust the PMOS or NMOS gate voltage accordingly. A feasible solution, illustrated in Figure (4.1), employs an RC low-pass filter to extract the DC voltage, which is then compared to a threshold (set to  $V_{DD}/2$ ) using an operational amplifier to modulate the NMOS gate voltage.



Figure 4.1: Analog feedback loop detecting DC voltage drifting.

#### 4.2.2 Complete clocking implementation

The current design focuses on standalone QCG performance. Future work should integrate the proposed QCG into a full clocking subsystem, including global clock distribution networks and PLLs. This would validate its compatibility with system-level timing constraints and enable end-to-end jitter analysis in multi-lane environments.

#### 4.2.3 Integration with a multi-lane transceiver system

To evaluate practical efficacy, the QCG should be embedded within a multi-lane wireline transceiver prototype. Testing under real-world conditions—such as crosstalk, channel loss, and power supply noise—would provide insights into its robustness and guide refinements for industrial adoption.

#### REFERENCES

- [1] J. Q. Wang et al., "7.1 A 2.69pJ/b 212Gb/s DSP-Based PAM-4 Transceiver for Optical Direct-Detect Application in 5nm FinFET," in 2024 IEEE International Solid-State Circuits Conference (ISSCC), vol. 67, 2024, pp. 123–125. DOI: 10.1109/ISSCC49657.2024.10454275 (cit. on pp. 1, 4).
- [2] Samsung, "Samsung Announces Availability of Its Leading-Edge 2.5D Integration 'H-Cube' Solution for High Performance Applications," *Samsung Newsroom*, 2021 (cit. on p. 2).
- [3] Dong-Myung Choi et al., "A 4.6pJ/b 64Gb/s Transceiver Enabling PCIe 6.0 and CXL 3.0 in Intel 3 CMOS Technology," 2024 (cit. on p. 3).
- [4] Yu Zhao et al., "Phase Noise Integration Limits for Jitter Calculation," in 2022 IEEE International Symposium on Circuits and Systems (ISCAS), Austin, TX, USA: IEEE, 2022, pp. 1005–1008. DOI: 10.1109/ISCAS48785.2022.9937231 (cit. on p. 9).
- [5] Agilent Technologies, *Jitter Analysis: The dual-Dirac Model, RJ/DJ, and Q-Scale*, 2004 (cit. on pp. 9, 10).
- [6] Mike Li, *Deterministic Jitter (DJ) Definition and Measurement Methods: An Old Problem Revisited*, 2009 (cit. on p. 10).
- [7] Zhaowen Wang, "Efficient and High-Performance Clocking Circuits for High-Speed Data Links," Ph.D. dissertation, 2022 (cit. on p. 15).
- [8] Jahnavi Sharma et al., "Silicon Photonic Microring-Based 4 × 112 Gb/s WDM Transmitter With Photocurrent-Based Thermal Control in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 4, pp. 1187–1198, 2022. DOI: 10.1109/JSSC.2021.3134221 (cit. on pp. 15, 17).
- [9] K. Yamaguchi et al., "2.5 GHz 4-phase clock generator with scalable and no feedback loop architecture," in 2001 IEEE International Solid-State Circuits Conference. Digest of Technical Papers. ISSCC (Cat. No.01CH37177), San Francisco, CA, USA: IEEE, 2001, pp. 398–399. DOI: 10.1109/ISSCC.2001.912691 (cit. on p. 16).
- [10] K. Yamaguchi et al., "A 2.5-GHz four-phase clock generator with scalable no-feedback-loop architecture," *IEEE Journal of Solid-State Circuits*, vol. 36, no. 11, pp. 1666–1672, 2001. DOI: 10.1109/4.962286 (cit. on p. 16).
- [11] P. Kinget et al., "An injection-locking scheme for precision quadrature generation," *IEEE Journal of Solid-State Circuits*, vol. 37, no. 7, pp. 845–851, 2002. DOI: 10.1109/JSSC.2002.1015681 (cit. on pp. 16, 18).
- [12] Kyu-hyoun Kim et al., "A 2.6mW 370MHz-to-2.5GHz Open-Loop Quadrature Clock Generator," in 2008 IEEE International Solid-State Circuits Conference Digest of Technical Papers, San Francisco, CA, USA: IEEE, 2008, pp. 458–627. DOI: 10.1109/ISSCC.2008.4523255 (cit. on pp. 16, 18).
- [13] Xiaochen Yang et al., "An open-loop 10GHz 8-phase clock generator in 65nm CMOS," in 2011 IEEE Custom Integrated Circuits Conference (CICC), San Jose, CA, USA: IEEE, 2011, pp. 1–4. DOI: 10.1109/CICC.2011.6055348 (cit. on pp. 16, 19).

- [14] Mayank Raj et al., "22.3 A 4-to-11GHz injection-locked quarter-rate clocking for an adaptive 153fJ/b optical receiver in 28nm FDSOI CMOS," in 2015 IEEE International Solid-State Circuits Conference (ISSCC) Digest of Technical Papers, San Francisco, CA, USA: IEEE, 2015, pp. 1–3. DOI: 10.1109/ISSCC.2015.7063097 (cit. on pp. 16, 18).
- [15] Mayank Raj et al., "A Wideband Injection Locked Quadrature Clock Generation and Distribution Technique for an Energy-Proportional 16–32 Gb/s Optical Receiver in 28 nm FDSOI CMOS," *IEEE Journal of Solid-State Circuits*, vol. 51, no. 10, pp. 2446–2462, 2016. DOI: 10.1109/JSSC.2016.2584643 (cit. on pp. 16, 18).
- [16] Michael Kalcher et al., "Self-aligned open-loop local quadrature phase generator," in *ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference*, Lausanne, Switzerland: IEEE, 2016, pp. 351–354. DOI: 10.1109/ESSCIRC.2016.7598314 (cit. on pp. 16, 19).
- [17] Stanley Chen et al., "A 4-to-16GHz Inverter-Based Injection-Locked Quadrature Clock Generator with Phase Interpolators for Multi-Standard I/Os in 7nm FinFET," in 2018 IEEE International Solid-State Circuits Conference (ISSCC), 2018, pp. 390–392. DOI: 10.1109/ISSCC.2018.8310348 (cit. on pp. 16, 18).
- [18] Wei-Chih Chen et al., "A 4-to-18GHz Active Poly Phase Filter Quadrature Clock Generator with Phase Error Correction in 5nm CMOS," in 2020 IEEE Symposium on VLSI Circuits, Honolulu, HI, USA: IEEE, 2020, pp. 1–2. DOI: 10.1109/VLSICircuits18222.2020.9162794 (cit. on pp. 16, 28).
- [19] Michael Kalcher et al., "1–3-GHz Self-Aligned Open-Loop Local Quadrature Phase Generator With Phase Error Below 0.4°," *IEEE Transactions on Microwave Theory and Techniques*, vol. 68, no. 8, pp. 3510–3518, 2020. DOI: 10.1109/TMTT.2020.3001651 (cit. on p. 16).
- [20] Zhaowen Wang et al., "11.4 A High-Accuracy Multi-Phase Injection-Locked 8-Phase 7GHz Clock Generator in 65nm with 7b Phase Interpolators for High-Speed Data Links," in 2021 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2021, pp. 186–188. DOI: 10.1109/ISSCC42613.2021.9365800 (cit. on pp. 16, 18, 28).
- [21] Zhaowen Wang et al., "A 65nm CMOS, 3.5-to-11GHz, Less-Than-1.45LSB-INL<sub>pp</sub>, 7b Twin Phase Interpolator with a Wideband, Low-Noise Delta Quadrature Delay-Locked Loop for High-Speed Data Links," in *2022 IEEE International Solid-State Circuits Conference (ISSCC)*, San Francisco, CA, USA: IEEE, 2022, pp. 292–294. DOI: 10.1109/ISSCC42614.2022.9731649 (cit. on p. 16).
- [22] Zhaowen Wang et al., "Multi-Phase Clock Generation for Phase Interpolation With a Multi-Phase, Injection-Locked Ring Oscillator and a Quadrature DLL," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 6, pp. 1776–1787, 2022. DOI: 10.1109/JSSC.2021.3124486 (cit. on pp. 16, 18).
- [23] Zhaowen Wang et al., "A Very High Linearity Twin Phase Interpolator With a Low-Noise and Wideband Delta Quadrature DLL for High-Speed Data Link Clocking," *IEEE Journal of Solid-State Circuits*, vol. 58, no. 4, pp. 1172–1184, 2023. DOI: 10.1109/JSSC.2022.3197061 (cit. on p. 16).
- [24] Qixuan Luo et al., "A 4-to-8GHz Multi-Phase Injection-Locked Quadrature Clock Generator in 65nm CMOS," in 2024 IEEE MTT-S International Wireless Symposium (IWS), Beijing, China: IEEE, 2024, pp. 1–3. DOI: 10.1109/IWS61525.2024.10713532 (cit. on pp. 16, 18).
- [25] Amit Kumar Mishra et al., "A 9b-Linear 14GHz Integrating-Mode Phase Interpolator in 5nm Fin-FET Process," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2022, pp. 1–3. DOI: 10.1109/ISSCC42614.2022.9731703 (cit. on pp. 16, 28, 29).

- [26] Amit Kumar Mishra et al., "Improving Linearity in CMOS Phase Interpolators," *IEEE Journal of Solid-State Circuits*, vol. 58, no. 6, pp. 1623–1635, 2023. DOI: 10.1109/JSSC.2023.3243305 (cit. on pp. 16, 29, 39).
- [27] James Bailey et al., "A 112-Gb/s PAM-4 Low-Power Nine-Tap Sliding-Block DFE in a 7-nm FinFET Wireline Receiver," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 1, pp. 32–43, 2022. DOI: 10.1109/JSSC.2021.3109167 (cit. on p. 17).
- [28] Yikun Chang et al., "An 80-Gb/s 44-mW Wireline PAM4 Transmitter," *IEEE Journal of Solid-State Circuits*, vol. 53, no. 8, pp. 2214–2226, 2018. DOI: 10.1109/JSSC.2018.2831226 (cit. on p. 17).
- [29] Eric Groen et al., "10-to-112-Gb/s DSP-DAC-Based Transmitter in 7-nm FinFET With Flex Clocking Architecture," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 1, pp. 30–42, 2021. DOI: 10.1109/JSSC.2020.3036981 (cit. on p. 17).
- [30] Kai Sheng et al., "A 4.6pJ/b 200Gb/s Analog DP-QPSK Coherent Optical Receiver in 28nm CMOS," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2022, pp. 282–284. DOI: 10.1109/ISSCC42614.2022.9731797 (cit. on p. 17).
- [31] Ahmad Khairi et al., "A 1.41-pJ/b 224-Gb/s PAM4 6-bit ADC-Based SerDes Receiver With Hybrid AFE Capable of Supporting Long Reach Channels," *IEEE Journal of Solid-State Circuits*, vol. 58, no. 1, pp. 8–18, 2023. DOI: 10.1109/JSSC.2022.3211475 (cit. on p. 17).
- [32] Z. Guo et al., "A 112.5Gb/s ADC-DSP-Based PAM-4 Long-Reach Transceiver with >50dB Channel Loss in 5nm FinFET," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2022, pp. 116–118. DOI: 10.1109/ISSCC42614.2022.9731650 (cit. on p. 17).
- [33] Namik Kocaman et al., "An 182mW 1-60Gb/s Configurable PAM-4/NRZ Transceiver for Large Scale ASIC Integration in 7nm FinFET Technology," in 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2022, pp. 120–122. DOI: 10.1109/ISSCC42614.2022.9731688 (cit. on p. 17).
- [34] Bo Zhang et al., "6.1 A 112Gb/s Serial Link Transceiver With 3-tap FFE and 18-tap DFE Receiver for up to 43dB Insertion Loss Channel in 7nm FinFET Technology," in 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2023, pp. 5–7. DOI: 10.1109/ISSCC42615.2023.10067657 (cit. on p. 17).
- [35] Kihwan Seong et al., "A 4nm 32Gb/s 8Tb/s/mm Die-to-Die Chiplet Using NRZ Single-Ended Transceiver With Equalization Schemes And Training Techniques," in 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2023, pp. 114–116. DOI: 10.1109/ISSCC42615.2023.10067477 (cit. on p. 17).
- [36] Jeonghyu Yang et al., "6.8 A 100Gb/s 1.6V<sub>ppd</sub> PAM-8 Transmitter with High-Swing 3+1 Hybrid FFE Taps in 40nm," in 2023 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2023, pp. 122–124. DOI: 10.1109/ISSCC42615.2023.10067452 (cit. on p. 17).
- [37] Guansheng Li et al., "18.1 A 600Gb/s DP-QAM64 Coherent Optical Transceiver Frontend with 4x105GS/s 8b ADC/DAC in 16nm CMOS," in 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2024, pp. 338–340. DOI: 10.1109/ISSCC49657.2024.10454499 (cit. on p. 17).
- [38] Marco Cusmai et al., "7.2 A 224Gb/s sub pJ/b PAM-4 and PAM-6 DAC-Based Transmitter in 3nm FinFET," in 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2024, pp. 126–128. DOI: 10.1109/ISSCC49657.2024.10454558 (cit. on p. 17).

- [39] Liping Zhong et al., "7.6 A 112Gb/s/pin Single-Ended Crosstalk-Cancellation Transceiver with 31dB Loss Compensation in 28nm CMOS," in 2024 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA: IEEE, 2024, pp. 134–136. DOI: 10.1109/ISSCC49657.2024.10454508 (cit. on p. 17).
- [40] Zeynep Toprak-Deniz et al., "A 128-Gb/s 1.3-pJ/b PAM-4 Transmitter With Reconfigurable 3-Tap FFE in 14-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 1, pp. 19–26, 2020. DOI: 10.1109/JSSC.2019.2939081 (cit. on p. 17).
- [41] Pen-Jui Peng et al., "A 112-Gb/s PAM-4 Voltage-Mode Transmitter With Four-Tap Two-Step FFE and Automatic Phase Alignment Techniques in 40-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 7, pp. 2123–2131, 2021. DOI: 10.1109/JSSC.2020.3038818 (cit. on p. 18).
- [42] Pen-Jui Peng et al., "A 56-Gb/s PAM-4 Transmitter/Receiver Chipset With Nonlinear FFE for VCSEL-Based Optical Links in 40-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 10, pp. 3025–3035, 2022. DOI: 10.1109/JSSC.2022.3192711 (cit. on p. 18).
- [43] Hyosup Won et al., "A 0.87 W Transceiver IC for 100 Gigabit Ethernet in 40 nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 50, no. 2, pp. 399–413, 2015. DOI: 10.1109/JSSC.2014.2369494 (cit. on p. 18).
- [44] Zhao Zhang et al., "A 32-Gb/s 0.46-pJ/bit PAM4 CDR Using a Quarter-Rate Linear Phase Detector and a Self-Biased PLL-Based Multiphase Clock Generator," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 10, pp. 2734–2746, 2020. DOI: 10.1109/JSSC.2020.3005780 (cit. on p. 18).
- [45] Haidang Lin et al., "ADC-DSP-Based 10-to-112-Gb/s Multi-Standard Receiver in 7-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 4, pp. 1265–1277, 2021. DOI: 10.1109/JSSC.2021.3051109 (cit. on p. 18).
- [46] Zhongkai Wang et al., "An Output Bandwidth Optimized 200-Gb/s PAM-4 100-Gb/s NRZ Transmitter With 5-Tap FFE in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 1, pp. 21–31, 2022. DOI: 10.1109/JSSC.2021.3109562 (cit. on p. 18).
- [47] Jay Im et al., "A 112-Gb/s PAM-4 Long-Reach Wireline Transceiver Using a 36-Way Time-Interleaved SAR ADC and Inverter-Based RX Analog Front-End in 7-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 56, no. 1, pp. 7–18, 2021. DOI: 10.1109/JSSC.2020.3024261 (cit. on p. 18).
- [48] Jihwan Kim et al., "A 112 Gb/s PAM-4 56 Gb/s NRZ Reconfigurable Transmitter With Three-Tap FFE in 10-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 54, no. 1, pp. 29–42, 2019. DOI: 10.1109/JSSC.2018.2874040 (cit. on p. 18).
- [49] Jihwan Kim et al., "A 224-Gb/s DAC-Based PAM-4 Quarter-Rate Transmitter With 8-Tap FFE in 10-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 1, pp. 6–20, 2022. DOI: 10.1109/JSSC.2021.3108969 (cit. on p. 18).
- [50] Hao Li et al., "A 100-Gb/s PAM-4 Optical Receiver With 2-Tap FFE and 2-Tap Direct-Feedback DFE in 28-nm CMOS," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 1, pp. 44–53, 2022. DOI: 10.1109/JSSC.2021.3110088 (cit. on p. 18).
- [51] Chi Fung Poon et al., "A 1.24-pJ/b 112-Gb/s (870 Gb/s/Mm) Transceiver for In-Package Links in 7-nm FinFET," *IEEE Journal of Solid-State Circuits*, vol. 57, no. 4, pp. 1199–1210, 2022. DOI: 10.1109/JSSC.2022.3141802 (cit. on p. 18).
- [52] Dirk Pfaff et al., "A 224 Gb/s 3 pJ/bit 40 dB Insertion Loss Transceiver in 3-nm FinFET CMOS," *IEEE Journal of Solid-State Circuits*, vol. 60, no. 1, pp. 9–22, 2025. DOI: 10.1109/JSSC.2024.3466092 (cit. on p. 18).

- [53] Yudong Zhang et al., "Analysis of Injection-Locked Ring Oscillators for Quadrature Clock Generation in Wireline or Optical Transceivers," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 69, no. 8, pp. 3074–3082, 2022. DOI: 10.1109/TCSI.2022.3175111 (cit. on p. 18).
- [54] Yoel Krupnik et al., "112-Gb/s PAM4 ADC-Based SERDES Receiver With Resonant AFE for Long-Reach Channels," *IEEE Journal of Solid-State Circuits*, vol. 55, no. 4, pp. 1077–1085, 2020. DOI: 10.1109/JSSC.2019.2959511 (cit. on p. 18).
- [55] B. Razavi et al., "Design techniques for high-speed, high-resolution comparators," *IEEE Journal of Solid-State Circuits*, vol. 27, no. 12, pp. 1916–1926, 1992. DOI: 10.1109/4.173122 (cit. on p. 35).
- [56] Behzad Razavi, "The strongarm latch [a circuit for all seasons]," *IEEE Solid-State Circuits Magazine*, vol. 7, no. 2, pp. 12–17, 2015. DOI: 10.1109/MSSC.2015.2418155 (cit. on p. 35).
- [57] Achim Graupner, "A methodology for the offset-simulation of comparators," *The Designer's Guide Community*, vol. 1, pp. 1–7, 2006 (cit. on p. 36).

## **APPENDIX A**

## LIST OF PUBLICATIONS

### **Journal Publications**

[1] Shaokang Zhao, Li Wang, and C. Patrick Yue, "A 5–10-GHz Quadrature Clock Generator With Open-Loop Quadrature Error Correction in 28-nm CMOS," *IEEE Solid-State Circuits Letters*, vol. 8, pp. 149–152, 2025. DOI: 10.1109/LSSC.2025.3568061.

### **Conference Publications**

[1] Shaokang Zhao, Li Wang, and C. Patrick Yue, "Design of A 5–10 GHz Open-Loop Quadrature Clock Generator for High-Speed Wireline Systems," in 2025 IEEE 23rd Interregional NEWCAS Conference, 2025.