Re-Architecting SerDes


As implementations evolve to stay relevant, a new technology threatens to overtake SerDes.

Serializer/Deserializer (SerDes) circuits have been helping semiconductors move data around for years, but new process technologies are forcing them to adapt and change in unexpected ways.

Traditionally implemented as an analog circuit, SerDes technology has been difficult to scale, and low voltages, variation, and noise are making it harder to achieve sufficient yield. To remain relevant, it has been architecturally transformed into a complex mixed-signal circuit, which increasingly relies on digital signal processing to deal with imperfections in the semiconductor and in the channel.

Advanced packaging is placing new demands on the SerDes, while also providing new opportunities when heterogeneous dies are involved. Now, the SerDes design can be decoupled from the core design, allowing the optimal choice of process technology for each. But advanced packaging also has created a completely new demand for communication between dies. The jury is still out as to whether this should be a parallel or a serial communications channel, or even whether electrical communication has a long-term role to play.

The one thing that remains constant is that the drive to move data around is not slowing. “We are seeing a huge demand for high speed data transfer,” says Greg Curtis, senior product manager for the analog Fast-SPICE product line at Mentor, a Siemens Business. “There are upward of 2 billion photos uploaded every day. I have seen that video is about 60% of the total downstream traffic, especially as people are working from home. And then you’ve got the push towards autonomous vehicles. All that data has to be transferred from the application to something that can process the data. And that has to go through a transmitter and receiver of the SerDes design. That pipe is becoming the bottleneck for transmitting all that data, requiring higher bandwidth.”

Before getting into some of the tradeoffs between monolithic integration and chiplets, it is beneficial to look at the architectural change that is happening in the SerDes circuit. “Until a few years ago, SerDes were relatively straightforward,” says Jeff Galloway, principal and co-founder of Silicon Creations. “They have now evolved into a high-end, complex PAM4 system. PCIe rev 5 and below run at up to 32 gigabits per second and are two-level SerDes, and advanced processes don’t really help these much. Beyond 32 gigabits per second, most SerDes are PAM4. That distinction makes a lot of difference in the architecture.”

A traditional SerDes is shown in figure 1. “Previous generation SerDes used to be analog, where you have continuous time linear equalization (CTLE) circuitry, which amplifies and partially equalizes the signal,” explains Priyank Shukla, product marketing manager for High Speed SerDes IPs at Synopsys. “This is followed by a comparator that makes 1-bit decisions and a decision feedback equalizer (DFE). The clock and data recovery (CDR) was also mostly implemented in analog.”

Fig. 1: Traditional analog implementation of a SerDes. Source: Synopsys
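To make that loop concrete, below is a minimal behavioral sketch in Python of the 1-bit comparator plus DFE stage described above. It is purely illustrative: the tap count, tap weights, and sample values are hypothetical, not taken from any vendor's design, and the CTLE and clock/data recovery are omitted.

```python
# Minimal behavioral sketch of the 1-bit comparator plus decision-feedback
# equalizer (DFE) described above. Illustrative only: tap count and values
# are hypothetical, and the CTLE and clock/data recovery are not modeled.
import numpy as np

def dfe_receive(samples, taps):
    """Recover NRZ symbols (+1/-1) from CTLE output, one sample per unit interval."""
    decisions = []
    history = [0.0] * len(taps)                    # most recent decision first
    for s in samples:
        # Subtract the post-cursor ISI estimated from previous decisions.
        corrected = s - sum(t * d for t, d in zip(taps, history))
        symbol = 1.0 if corrected > 0 else -1.0    # the 1-bit comparator decision
        decisions.append(symbol)
        history = [symbol] + history[:-1]          # feed the decision back
    return decisions

# Hypothetical CTLE output samples and two post-cursor feedback taps.
symbols = dfe_receive(np.array([0.8, -0.5, 0.9, -0.7]), taps=[0.25, 0.10])
```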

The problem is that in the latest nodes, analog experiences a lot more variables than in the past. “Digital design is more predictable than analog,” says Mentor’s Curtis. “Design teams are going to try to move as much as they can to the digital side, but there is still some functionality that cannot be translated.”

From 16nm onward, and at speeds greater than 56Gbps, the architecture shown in figure 2 is more likely to be used.

Fig. 2: Mixed-signal SerDes block diagram. Source: Synopsys

“The SerDes receiver essentially undoes the channel impairment,” says Synopsys’ Shukla. “Much of this can now be done digitally. The receivers just have an analog-to-digital converter (ADC). This makes n-bit decisions and can be time-interleaved to achieve higher data rates. After that you have digital samples and can use a DSP to do the processing, which scales well with the technology. This includes a feed-forward equalizer (FFE).”
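As a rough illustration of that receive path, the sketch below treats the FFE as a plain FIR filter over the ADC samples and then slices the result to the four PAM4 levels. The tap and sample values are placeholders, not a fitted channel model.

```python
# Rough sketch of the ADC/DSP receive path described above: the ADC delivers
# multi-bit samples, a feed-forward equalizer (FFE) -- just an FIR filter --
# undoes part of the channel, and a slicer maps the result to PAM4 levels.
# Tap and sample values are placeholders, not a fitted channel model.
import numpy as np

def ffe_equalize(adc_samples, ffe_taps):
    """Apply the feed-forward equalizer (an FIR filter) to digitized samples."""
    return np.convolve(adc_samples, ffe_taps, mode="same")

def pam4_slice(equalized):
    """Map each equalized sample to the nearest PAM4 level (-3, -1, +1, +3)."""
    levels = np.array([-3, -1, 1, 3])
    nearest = np.argmin(np.abs(equalized[:, None] - levels[None, :]), axis=1)
    return levels[nearest]

adc_samples = np.array([2.6, -1.2, 0.8, -3.1, 1.1])   # hypothetical ADC output
symbols = pam4_slice(ffe_equalize(adc_samples, ffe_taps=[-0.1, 1.0, -0.2]))
```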

The designs are well-proven. “Our 56G and 112G transceivers have moved to an ADC/DSP-based receive equalization strategy,” says Martin Gilpatric, technical marketing manager for Xilinx. “That took a lot of what would typically be an analog circuit and made it into digital logic. With the move to PAM4 at these highest rates, where margin is super thin, we’re able to work around all of the problems and have a very strong digital receiver.”

The choice of architectures, and the plethora of process nodes actively in use, mean these are good times for SerDes IP providers. “There’s more demand than the industry can fulfill,” says Silicon Creations’ Galloway. “For example, TSMC is adding 22nm variants and low-power variants. Some of the later PCIe standards weren’t mature when 40nm or 28nm was developed. PCIe 5.0 is coming, and we are well past 16nm. There are a lot of design starts on the older technologies, so there is a need to backfill various standards on those older nodes.”

New challenges

The newer nodes are presenting challenges. “The underlying transistors keep getting smaller and lower power, but interconnect keeps getting worse,” says Galloway. “You have complicated layout effects with interconnect resistance and capacitance. They have the effect of limiting the speed and causing extra power dissipation because of the extra parasitics.”

Mentor’s Curtis puts that into numbers. “One of our customers mentioned that interconnect resistance, when moving from 40nm to 5nm, has risen by more than seven times. That is becoming the limit on performance, driven more by the wire than by the design itself.”

But that is not the only place where the numbers are not in your favor. “The number of GDS layers is increasing tremendously,” adds Curtis. “This has increased by 9X when going from 180nm down to 5nm. The impact of that is design rule checking (DRC) runtime. To go from 180nm down to 16nm finFET, it takes about 10X longer to run a DRC check. And then you go down from 16 to 5, it is another 10X.”

Another factor is noise. “Noise associated with the advanced process nodes is becoming a significant challenge,” says Shukla. “Noise is a difficult issue to tackle in an ADC. The architecture of choice is successive approximation register (SAR) type of ADC. It’s a modular approach. You time interleave a lot of slices of this ADC. There are challenges to align different slices of the SAR, but this can be compensated for in digital. So whatever challenges the analog throws up, we have some way to compensate. That’s where a lot of innovation is happening.”
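The time-interleaving idea can be sketched in a few lines. In the toy model below, several slower ADC slices take turns sampling, and each slice's gain and offset error is removed in the digital domain. The correction values are placeholders, assuming calibration has already measured them.

```python
# Toy model of a time-interleaved ADC: several slower converters sample in
# rotation, each with its own gain and offset error, which is then removed
# digitally. The correction values would come from calibration; here they are
# placeholders chosen for illustration.
import numpy as np

def interleaved_adc(signal, offsets, gains):
    """Sample with rotating slices and apply per-slice digital correction."""
    num_slices = len(gains)
    out = np.empty_like(signal, dtype=float)
    for i, x in enumerate(signal):
        s = i % num_slices                      # which slice samples this instant
        raw = gains[s] * x + offsets[s]         # analog imperfection of that slice
        out[i] = (raw - offsets[s]) / gains[s]  # digital compensation
    return out

signal = np.linspace(-1.0, 1.0, 8)
corrected = interleaved_adc(signal,
                            offsets=[0.01, -0.02, 0.0, 0.015],
                            gains=[1.00, 0.98, 1.02, 0.99])
```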

People are creative. “We know that there are clever circuit design techniques that can be used in analog design, and specifically SerDes, to continue to support advanced technologies without compromising performance,” says Ashraf Takla, CEO for Mixel. “For example, stacking of thin oxide transistors while using I/O voltages is a way to continue to design higher-performance SerDes IP in advanced technologies.”

New transistors could present new challenges. “At the latest nodes, if transistor technology switches to Gate-All-Around (GAA), it will be impossible to integrate SerDes in a way that makes sense from an economic perspective,” says Andy Heinig, group leader for advanced system integration and department head for efficient electronics in Fraunhofer IIS’ Engineering of Adaptive Systems Division. “The necessary SerDes area will grow, or at best stay the same, but with the higher cost of a GAA transistor. GAA only makes sense if its scaling is actually used. From our perspective, it makes sense to integrate SerDes in such systems on another chip, in a specialized technology, and combine it with the GAA chip in an advanced packaging technology.”

The case for monolithic integration

A new decision facing design teams is whether they should integrate everything onto a single monolithic die, or whether a multi-chip heterogeneous solution would provide benefits. The dynamic between these two choices is changing almost daily. “Multi-chip packaging is very expensive today,” says Geoff Tate, CEO of Flex Logix. “Until it has been cost-reduced further, cost-sensitive applications will continue to favor a monolithic die. It is certainly cheaper, even if the SerDes may not be optimal.”

Many of the benefits of going to a new node are related to the PPA gains. “If the design is pure analog, then going from 28nm to 16nm to 7nm will not see that big a saving on power unless the supply voltage changes,” says Wendy Wu, director of product marketing at Cadence. “With a DSP-based design, while there is still a pretty big chunk of analog circuitry, 40% to 50% of the circuits are digital. So we can benefit in area and power when going down to a smaller node. If we remained 100% analog, that motivation would be weaker, because you would not see much power or area benefit.”

Sometimes there are specific reasons why a monolithic integration is the only acceptable solution. “We specifically chose a monolithic solution because that’s what gave the best results in terms of lowest latency, managing power consumption, and thermal dissipation,” said Manuel Uhm, director of silicon marketing for Xilinx. “We pioneered chiplets years ago, like having high-bandwidth memory coupled to the FPGA die or having multiple FPGA die coupled together. All those options are on the table for us, but we have definitely not moved away from having SerDes integrated onto the die.”

The case for chiplets

Some companies need to find additional methods to remain competitive. “People in the high-performance computing (HPC) segment continue to drive for higher performance,” says Cadence’s Wu. “They used to rely on the process shrink, but people are really struggling today and are forced to be more creative to get to the next level of performance. Foundries are coming out with new processes every year, and they have to iterate their designs more quickly in order to keep up with the latest process. SerDes is mixed-signal. With digital designs you can just move the same design from 7nm to 5nm by resynthesizing it. If we need to port a SerDes mixed-signal design, then it’s a much longer process. One key motivation is to decouple the SerDes design cycle from the core design cycle by using a chiplet methodology.”

There can be other motivations, too. “In mobile, the more important concern is space,” says Shukla. “A chiplet approach allows them to vertically stack, and that way they can continue to integrate more functionality for the foreseeable future. For designs that already have an interposer, where power distribution is a concern, it offers a cost advantage. So both sets of SerDes adopters have something to gain from these kinds of approaches, where you segregate the die.”

Even with the new SerDes architecture, scaling is slowing. “They are not scaling in area or power much anymore,” says Galloway. “Migrating from node to node doesn’t help traditional 32Gbps and below SerDes. For some of the advanced SerDes that are DSP-based, the scaling is helping to some degree, but it’s certainly not scaling as fast as the digital logic is scaling.”

Cost is a significant factor for many designs. “From our experience, 16/12nm is a process well-suited for analog design,” says Mixel’s Takla. “It has a much higher Ft/Fmax compared to 28nm, with more headroom than 5nm. We also account for the increased complexity of the design, the effort, and the tool cost going from 16/12nm to 7/6/5nm. When 16nm speed is not enough to achieve the target data rates, advanced nodes are needed. Alternatively, we could see more people adopt chiplets. Chiplets can be a great way to enable analog and SerDes blocks to use the most suitable and least-expensive process technology, while allowing the digital blocks to use more advanced nodes. Once the interconnect standardization puzzle is solved, we anticipate wide adoption of chiplets.”

Design effort can be a significant contributor to costs. “Digital scaling from 180nm down to say 5nm is greater than 1,000X, from a scaling perspective,” says Curtis. “From an analog scaling perspective, it’s more on the order of 10X. Analog is also a little bit harder to characterize compared to digital. I’m not saying you can’t characterize it. It’s just harder, and there are a lot more variables. That’s why there’s so much more time spent on PVT corner analysis.”

Breaking the dependence

Splitting a design into multiple pieces is nothing new. “Consider Intel and their Northbridge/Southbridge partitions,” says Shukla. “They have two different chips, which can progress with their individual cadence. Now the same logic is extended, and a lot of SerDes have gone to the ‘Southbridge.’ It is now called the platform controller hub (PCH). So the SerDes portion is pushed into another chip. And now that there is a separate dedicated chip for SerDes, you might want to use a cheaper process.”

Xilinx also has used that approach and sometimes switches between initial prototypes and production. “It boils down to concerns around whether or not the most cutting-edge technology is going to be mature enough to meet the requirements for the highest rate, and have enough margin at those high rates to be successful,” says Gilpatric. “When we were initially showing our Versal devices in test chip form, it was in 16nm. We built it in 16nm because that was the process that the main line products were currently in. As 7nm started coming online, we moved that test chip. There were changes that needed to be made, but we were able to effectively dial it in, regardless of that process, and saw very similar performance numbers between either of those processes.”

But having two dies creates a new problem. How do they communicate? There are two options — use a parallel interface or use a SerDes. “There is a shift back toward a more parallel-like interface, but the interface isn’t the parallel interface of the late 1990s or early 2000s,” says Galloway. “It is not the typical clock with data. Today’s parallel interfaces are a whole lot of very simple SerDes. There are typically hundreds of pins or thousands of pins available, versus a single pair or a handful of pairs. So there definitely is a shift back toward parallel, but it’s a whole bunch of parallel using very simple SerDes.”

However, they have to run on the very latest process nodes. “The SerDes must keep up with the core die process,” says Wu. “If the core die is 5nm and the I/O die is 16nm, you need a 5nm SerDes for the die-to-die. If we’re talking about HPC and data center applications, where the bandwidth between two dies needs to be hundreds of gigabits or even terabits, you will require high-speed SerDes. Because you are not going through those vias and bumps, and then packages, there are fewer discontinuities in the channel and less reflection.”

These SerDes, called short-reach SerDes, do have a simpler problem. “A typical off-chip SerDes can compensate up to 40dB of channel loss,” says Shukla. “If you have 1 volt in the transmitter, then the receiver would receive 10 millivolts, which is two orders of magnitude lower, because when you pass a voltage through this 40dB channel the output will be 100 times smaller. The receiver has to do this heavy lifting from 10 millivolts to receive the complete signal. A die-to-die link probably has 8 to 10dB channels.”
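That arithmetic follows directly from the voltage definition of channel loss in decibels. Below is a quick Python check of the numbers in the quote; the 10dB die-to-die figure is simply the upper end of the range Shukla mentions.

```python
# Attenuation implied by a channel loss in dB, using the voltage-ratio
# definition: loss_dB = 20 * log10(V_tx / V_rx).
def received_amplitude(tx_volts, loss_db):
    return tx_volts / (10 ** (loss_db / 20))

print(received_amplitude(1.0, 40))   # long-reach, 40dB: 0.01 V, i.e. 10 mV
print(received_amplitude(1.0, 10))   # die-to-die, ~10dB: ~0.32 V
```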

That is only part of the problem. “If we’re looking at consolidating and having an ecosystem that supports a chip to chiplet interconnect, we’re at the beginnings of that,” says Gilpatric. “It always starts with an electrical interconnect and then builds a protocol stack on top of that (as shown in figure 3). The OIF has already gone forward, and we have XSR. That is going to be kind of the first blush of having a serial 112 gig interconnect that is facilitating this style of interconnect. It’s very, very short-reach. Once we start seeing the electrical definitions for how these very, very short interconnects can work, with the technologies that can support them, then we can start consolidating on a common set of connections. Once that has happened for a number of different application areas, they can consolidate into what would be a real standard. I’m not aware of anything that really lands in that ballpark just yet, but we’re moving in that direction.”

Fig. 3: Standardization efforts for datalink and protocol. Source: Xilinx

Conclusion

A lot of this is forward thinking. “Die-to-die interfaces are co-designed, typically by the same company today,” says Galloway. “They may even be different instances of the same design, so there’s less of a need for standardization, less of an interoperability issue there. We are in the early days, and there are no real standards in place to address what many customers are trying to do. That works okay for the customer, but has implications for IP availability.”

All the while, the need for speed, whether inside the package or outside it, keeps increasing. “The obvious trend is co-packaged optics,” says Wu. “The intention is to replace long-reach SerDes with optics. Just look at the number of startups working on this. I don’t think 61Tb switches are going to adopt co-packaged optics for commercial production. There may be some prototypes, but it’s probably going to be 100Tb. That is three years down the road. The alignment of the fiber is the biggest issue, and how to do that in production volume.”

But the industry is not ready to give up on copper until necessary. “Do you move to PAM8 electrical, or do you move to some sort of optical off-chip? It’s very much a full industry question in terms of how we’re going to consolidate on to specific technologies,” says Gilpatric.
