2024: Will Chips Change?

Time: 2024-06-18
The chip industry is moving toward domain-specific computing, while artificial intelligence (AI) is moving in the opposite direction, a gap that could force major changes in future chip and system architectures.
Behind this split is the time it takes to design hardware and software. In the 18 months since ChatGPT launched globally, a large number of software startups have explored new architectures and technologies. Given the rapid pace of change in the tasks mapped to them, this trend is likely to continue. But it typically takes more than 18 months to produce a custom chip.
In a world of stable standards, where the software doesn't change much over time, it can be worthwhile to customize hardware to the exact needs of an application or workload. This is one of the main drivers behind RISC-V, where processor ISAs can be designed specifically for a given task. However, with the many variations of AI, the hardware may be obsolete by the time it reaches mass production. As a result, hardware optimized for a specific application is unlikely to come to market fast enough to be useful unless the specification is continually updated.
As a result, there is a greater risk that a domain-specific AI chip will miss the mark on its first attempt, and generative AI will have continued to evolve while the problems are being fixed.
But that doesn't mean the end of custom silicon. "Data centers are deploying more and more processing architectures, each of which is better than a single general-purpose CPU for a given task," said Steve Roddy, chief marketing officer at Quadric. "As data center AI workloads continue to grow and proliferate, even that last bastion of general-purpose compute has fallen, as data center chips and systems are forced to adapt to the fast-moving landscape."
But it does point to architectures that balance ultra-fast, low-power silicon with more general-purpose or smaller chips.
"In AI, there‘s a strong push to make things as general-purpose and programmable as possible, because no one knows when the next LLM thing is going to come along and revolutionize the way they do things," said Elad Alon, CEO of Blue Cheetah. "The more you get bogged down, the more likely you are to miss the trends. At the same time, it‘s clear that it‘s almost impossible to meet the computing power, and therefore the power and energy requirements, needed to use a fully general-purpose system. There is a strong demand to customize hardware to make it more efficient at the specific things known today."
The challenge is how to efficiently map software onto such heterogeneous processor arrays, something the industry has not yet fully mastered. The more processor architectures coexist, the harder the mapping problem becomes. "Modern chips have a GPU, a neural processing unit, and general-purpose cores," Frank Schirrmeister, vice president of solutions and business development at Arteris (now executive director of strategic programs and system solutions at Synopsys), said in an interview. "You have at least three compute options, and you have to decide where to put things and set up the appropriate abstraction layers. We used to call that hardware/software co-design. When you port an algorithm, or part of an algorithm, to an NPU or GPU, you retool the software to move more of the execution to a more efficient implementation. There's still a common computing component that supports the different elements."
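The mapping problem Schirrmeister describes can be pictured as a placement decision per operator. Below is a deliberately tiny Python sketch of that idea; the engine names, op kinds, and efficiency numbers are invented for illustration and are not from the article or any vendor toolchain.

```python
# Toy sketch of the mapping decision described above: given an operator,
# pick the engine that is assumed to run it most efficiently.
# Engine names and efficiency numbers are invented for illustration.

from dataclasses import dataclass

@dataclass
class Op:
    name: str
    kind: str    # e.g. "matmul", "conv", "control"

# Assumed relative efficiency of each engine per op kind (CPU is the 1.0 baseline).
EFFICIENCY = {
    "npu": {"matmul": 8.0, "conv": 8.0},
    "gpu": {"matmul": 4.0, "conv": 4.0, "elementwise": 3.0},
}

def place(op: Op) -> str:
    """Return the engine with the highest assumed efficiency for this op."""
    best, best_eff = "cpu", 1.0          # general-purpose cores as the fallback
    for engine, table in EFFICIENCY.items():
        eff = table.get(op.kind, 0.0)
        if eff > best_eff:
            best, best_eff = engine, eff
    return best

graph = [Op("attention", "matmul"), Op("tokenize", "control"), Op("activation", "elementwise")]
for op in graph:
    print(f"{op.name} -> {place(op)}")   # attention -> npu, tokenize -> cpu, activation -> gpu
```

Real toolchains make this decision with far richer cost models (data movement, memory residency, scheduling), which is precisely why the abstraction layers Schirrmeister mentions matter.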

Chasing the leaders
AI emerged thanks to the processing power of GPUs, and the functionality required for graphics processing is very close to what the core computations of AI require. In addition, a software toolchain that lets non-graphics functions be mapped onto the architecture has made NVIDIA GPUs the easiest processors to target.
"When someone becomes a market leader, they may be the only competitor in the market and everyone tries to react to them," said Chris Mueth, Keysight‘s business manager for new opportunities. "But that doesn‘t mean it‘s the optimal architecture. We may not know that for a while. GPUs are suited to certain applications, such as performing repetitive math operations, and it‘s hard to beat them in that regard. If you optimize your software to work with GPUs, it can be very fast."
Being the leader in general-purpose accelerators carries its own burden. "If you're going to build a general-purpose accelerator, you need to think about future-proofing it," said Russell Klein, senior director of integrated programs at Siemens EDA. "When NVIDIA sits down to build a GPU, they have to make sure it caters to as broad a market as possible, which means anyone who conceives of a new neural network needs to be able to drop it into that accelerator and run it. If you're building something for one application, there's little need to think about future-proofing it. I might want to build in a little flexibility so I have room to fix problems. But if it's fixed to a specific implementation that does one job really well, someone will come up with a whole new algorithm in another 18 months. The good news is that I'll be ahead of everyone else with my custom implementation until they can catch up with their own. There's a limit to what we can do with off-the-shelf hardware."
But specificity can also be built up in layers. "Part of the IP delivery is the hardware abstraction layer, which is exposed to software in a standardized way," Schirrmeister says. "Without middleware, the graphics core is useless. Application specificity moves up in the abstraction. If you look at CUDA, the NVIDIA core itself is fairly generic in its computational capabilities. CUDA is the abstraction layer, and on top of that sit libraries for all sorts of domains, such as biology. That's great because application specificity rises to a much higher level."
These abstraction layers have been important in the past. According to Sharad Chole, chief scientist and co-founder of Expedera, "Arm integrated the software ecosystem on top of the application processor. Since then, heterogeneous computing has allowed everyone to build their own add-ons on top of that software stack. For example, Qualcomm's stack is completely independent of Apple's stack. If you extend it, there's an interface available for better performance or better power distribution. Then there's room for coprocessors. These coprocessors let you differentiate beyond just building with heterogeneous computing, because you can add or remove one, or build a newer coprocessor without starting a new application processor, which is much more expensive."
The economic factor is an important one. "The proliferation of fully programmable devices that accept C++ or other high-level languages, as well as function-specific GPUs, GPNPUs, and DSPs, has reduced the need for dedicated, fixed-function, and financially risky hardware acceleration modules in new designs," says Quadric's Roddy.
This is as much a technical issue as it is a business one, says Blue Cheetah's Alon. "Someone might say, 'I'm going to target this very specific application, in which case I know the handful of things I need to do in the AI stack or other stacks,' and then you just have to make those work. If that market is big enough, that could be an interesting option for a company. But for an AI accelerator or AI chip startup, it's a trickier bet. If there isn't enough of a market to justify the entire investment, then you have to anticipate the capabilities needed for a market that doesn't yet exist. It's really a mix of what business model you're pursuing and what bets you're making, and therefore what technology strategy you can take to optimize as much as possible."
The case for specialized hardware
Hardware implementations require choices, says Expedera's Chole. "If we could standardize neural networks and say that's all we're going to do, you would still have to consider the number of parameters, the number of operations required, and the latency required. But that's never been the case, especially for AI. From the beginning, we started with 224 x 224 postage-stamp images, then moved to HD, and now we're moving to 4K. The same is true for LLMs. We started with 300-megabyte models (e.g., BERT), and now we're moving toward billions and even trillions of parameters. Initially we started with only language translation models (i.e., token prediction models). Now we have multimodal models that can support language, vision, and audio at the same time. The workload keeps evolving, and that's the game of chase that is happening."
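A back-of-the-envelope calculation makes the scale of that jump concrete. The parameter counts and precisions below are illustrative assumptions, not figures quoted by Chole:

```python
# Back-of-the-envelope weight-memory arithmetic. Model sizes and precisions
# are illustrative assumptions, not figures from the article.

def weight_footprint_gb(params: float, bytes_per_param: float) -> float:
    return params * bytes_per_param / 1e9

# A ~110M-parameter encoder (BERT-base class) stored in fp32:
print(f"{weight_footprint_gb(110e6, 4):.2f} GB")   # ~0.44 GB

# A hypothetical 70B-parameter LLM stored in fp16:
print(f"{weight_footprint_gb(70e9, 2):.0f} GB")    # ~140 GB of weights alone
```

Activations, KV caches, and optimizer state only widen the gap, which is why a design sized for the earlier workload cannot simply be reused.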
There are many aspects of existing architectures that are questionable. "A key part of designing a good system is finding the significant bottlenecks in system performance and finding ways to accelerate them," said Dave Fick, CEO and co-founder of Mythic. "Artificial intelligence is an exciting and far-reaching technology. However, it requires performance levels of trillions of operations per second, along with memory bandwidth that standard cache and DRAM architectures are simply unable to support. This combination of utility and challenge makes AI a prime candidate for specialized hardware."
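A quick arithmetic-intensity estimate shows why the bandwidth side of that claim bites. All numbers here are assumptions chosen for illustration, not figures from Fick:

```python
# Rough arithmetic-intensity estimate. The throughput target, bytes-per-op
# ratio, and DRAM bandwidth are assumptions chosen for illustration only.

target_ops_per_s = 10e12   # 10 TOPS, a modest inference target
bytes_per_op = 0.5         # assume each op moves half a byte of weights/activations on average

required_bw = target_ops_per_s * bytes_per_op          # bytes per second
print(f"required bandwidth ~ {required_bw / 1e12:.0f} TB/s")   # ~5 TB/s

dram_bw = 64e9             # roughly what a couple of DDR5 channels deliver
print(f"shortfall vs. commodity DRAM ~ {required_bw / dram_bw:.0f}x")   # ~78x
```

Unless on-chip reuse drives the bytes moved per operation far below this, a conventional cache-plus-DRAM hierarchy falls well short, which is the gap that specialized memory hierarchies and compute-in-memory designs target.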
The sheer shortage of general-purpose devices may be what forces the industry to start adopting more efficient hardware. "The generative AI space is moving very fast," Chole said. "There is nothing out there that meets the hardware requirements in terms of cost and power. Nothing. Not even enough GPUs are being shipped. There are orders, but not enough shipments. That's the problem everyone sees. There's not enough compute to really support generative AI workloads."
Chiplets may help alleviate this problem. "The coming tsunami of chiplets will accelerate this shift in the data center," Roddy said. "The ability to mix and match fully programmable CPUs, GPUs, GPNPUs (general-purpose programmable NPUs), and other processing engines for specific tasks will impact data centers first, as chiplet-based packages replace monolithic ICs, and then slowly radiate out to higher-volume, more cost-sensitive markets."

Multiple markets, multiple tradeoffs
While most of the attention is focused on the large data centers that train the new models, the ultimate benefit will go to the devices that use these models for inference. These devices can't afford the huge power budgets used for training. "The hardware used to train AI is somewhat standardized," says Marc Swinnen, director of product marketing at Ansys. "You buy NVIDIA chips and that's how you train AI. But once you've built the model, how do you execute that model in the final application, perhaps at the edge? That's usually a chip that's tailored to a specific implementation of that AI algorithm. The only way to get high-speed, low-power AI models is to build custom chips for them. Artificial intelligence is going to be a huge driver of custom hardware that executes those models."
Those designers will have to make a similar set of decisions. "Not every AI accelerator is the same," says Mythic's Fick. "There are a lot of great ideas about how to address the memory and performance challenges that AI presents. In particular, there are new data types that go all the way down to 4-bit floating point or even 1-bit precision. Analog computation can be used to get very high memory bandwidth, which improves performance and energy efficiency. Others are considering streamlining neural networks down to the most critical bits to save memory and computation. All of these technologies will produce hardware that is strong in some areas and weak in others. This means greater hardware/software co-optimization and the need for an ecosystem with a variety of AI processing options."
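To make the low-precision idea concrete, here is a minimal sketch of symmetric 4-bit integer quantization. It is a simplified stand-in for the 4-bit floating-point and 1-bit formats Fick mentions, not how any particular accelerator implements them:

```python
# Minimal, simplified sketch of symmetric 4-bit integer quantization.
# This illustrates low-precision storage in general; real accelerators and
# the formats named in the article use more sophisticated schemes.

def quantize_int4(weights):
    """Map floats to integers in [-7, 7] plus one shared scale factor."""
    scale = max(abs(w) for w in weights) / 7 or 1.0
    q = [max(-7, min(7, round(w / scale))) for w in weights]
    return q, scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.42, -1.3, 0.07, 0.9]
q, s = quantize_int4(w)
print(q, [round(x, 2) for x in dequantize(q, s)])
# 4-bit storage cuts weight memory (and bandwidth) roughly 4x vs. fp16,
# at the cost of quantization error the model must be able to tolerate.
```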
This is where the interests of AI and RISC-V intersect, says Sigasi CEO Dieter Therssen. "Software tasks such as LLMs will dominate enough to drive new hardware architectures, but they won't stop differentiation altogether, at least not in the short term. Even RISC-V customization is based on the need to do some CNN or LLM processing. A key factor here is how AI gets deployed. At the moment, there are too many ways to do that, so envisioning convergence remains out of reach."
Conclusion
AI is so new and evolving so quickly that no one can give definitive answers. What is the best architecture for existing applications? Will future applications look similar enough that existing architectures only need to be extended? Betting that they will may seem naive, but today it may be the best choice for many companies.
The GPU, and the software abstractions built on top of it, enabled the rapid rise of AI. It has provided an adequate framework for the scaling we've seen, but that doesn't mean it's the most efficient platform. Model development has, to some extent, been forced in the direction of the hardware that is available, but as more architectures emerge, AI and model development may diverge based on the hardware resources available and their power requirements. Power is likely to be the factor that dominates both, because current projections suggest AI will soon consume a significant portion of the world's power generation capacity. That cannot continue.
