Using AI In Chip Manufacturing
Coventor’s CTO drills down into predictive maintenance, the impact of variation, and what this means for yield and future technology.
David Fried, CTO at Coventor, a Lam Research Company, sat down with Semiconductor Engineering to talk about how AI and Big Data techniques will be used to improve yield and quality in chip manufacturing. What follows are excerpts of that conversation.
SE: We used to think about manufacturing data in terms of outliers, but as tolerances become tighter at each new node that data may need to be examined even within what is considered the normal range. What’s the impact of that on manufacturing?
Fried: When I started in CMOS at 200mm, there was some data on the tools in the fab, but by and large, we were losing it as soon as it was created. When we went to 300mm, we got better at putting sensors on tools, generating data, and in some cases looking at it.
SE: So you had a mass of data and no one really wanted to go through it if they didn’t have to, right?
Fried: Yes, and we graduated to the point where if [processed] lots come out of the fab and they don’t work, you go back and do standard characterization operations and figure out if it was ‘process 26’ that caused this failure. You find the process valve on process 26 is oddly out of position compared with every other lot you processed on that tool, so it is probably a throttle valve issue. Then you put in fault detection on that throttle valve signal to not let it happen again. But it’s a very reactive data flow. You already have the failure, and then you go out and find the cause. But if you think about the level of complexity we’re getting to in innovation and in the individual processes themselves, this reactive data flow will never catch up.
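The throttle-valve interlock Fried describes can be sketched very simply: compare each new valve reading against the tool's historical baseline and flag statistical outliers. This is a hypothetical illustration of that kind of fault detection; the function name, the readings, and the 3-sigma threshold are all made up here, not taken from any real tool.

```python
# Hypothetical fault-detection sketch: flag a lot when the throttle-valve
# position deviates from the tool's historical baseline. Threshold and
# readings are illustrative only.

def valve_fault(history, reading, n_sigmas=3.0):
    """Return True if `reading` is a statistical outlier vs. `history`."""
    mean = sum(history) / len(history)
    var = sum((x - mean) ** 2 for x in history) / len(history)
    std = var ** 0.5
    return abs(reading - mean) > n_sigmas * std

# Baseline valve positions (degrees open) from previously processed lots.
baseline = [42.1, 41.8, 42.3, 42.0, 41.9, 42.2, 42.1, 41.7]
print(valve_fault(baseline, 42.0))  # False: in-family reading
print(valve_fault(baseline, 47.5))  # True: out-of-position valve
```

The point of the example is the reactive-versus-proactive distinction: this check only exists because a failure was already traced back to the valve once.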
SE: So where do you go next?
Fried: The equipment has sensors that are analyzing data from operations of the tool and monitoring of the wafer process. For instance, sensors and data logs are picking up information about which wafer went to which chamber, where the robot arm is at any point in time, etc. All that data has to go into a system where it can be harvested and analyzed in real-time. And that’s just from one piece of equipment. In a fab, you have fleets of that kind of equipment, and then you have all sorts of other equipment and different processes. This is a massive big-data challenge, and what you really want to start doing is learning on that data.
SE: So you’re trying to map trends about what’s changing rather than fixing a single problem, right?
Fried: Yes, and being predictive. But you have to do this for the whole fab.
SE: How far forward in the manufacturing process do you have to go before you can understand there’s a problem at the beginning of the process?
Fried: That’s the art involved in the problem. You’re integrating additional data. We can harvest every bit that comes off every tool, but then you also want to integrate that with every piece of inline metrology, inline defect inspection classification, inline electrical test, all the way down to full functional test. So you have all these different data sources. The first class of problem is a big data problem—getting it all into a structure and format that’s usable because we are dealing with a massive amount of data from a massive number of sources with different formats. Solving the format problem doesn’t sound that difficult, but think about a temperature sensor in a deposition tool versus a slurry pH monitor in a slurry tank feeding a CMP tool. It’s a different type of data sampled in a different way using a different set of units. Just putting that into a format where you can operate on the data set is a massive big data problem. Let’s assume for a minute all that data is harvestable and accessible. Then you can start basic machine learning in real time, actively coupling electrical test data and metrology data back to things. Trends and patterns start emerging, and algorithms can be put in place to guard against or compensate for deviation.
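The format-unification problem Fried raises, a deposition-tool temperature sensor versus a slurry pH monitor, comes down to wrapping very different signals in one common record so learning can operate on a single data set. A minimal sketch of that idea follows; the field names are illustrative, not any vendor's actual schema.

```python
# Sketch of format unification: readings from very different sources
# (a deposition-chamber thermocouple, a CMP slurry pH probe) funnel into
# one common record. Field names are illustrative, not a real schema.
from dataclasses import dataclass

@dataclass
class SensorRecord:
    tool_id: str       # which piece of equipment
    sensor: str        # which signal on that tool
    timestamp: float   # seconds since epoch
    value: float       # reading, in `unit`
    unit: str          # canonical unit for this signal

# Different data types, sample rates, and units, one structure.
records = [
    SensorRecord("DEP-07", "chamber_temp", 1_700_000_000.0, 412.5, "C"),
    SensorRecord("CMP-02", "slurry_ph",    1_700_000_003.2, 10.4,  "pH"),
]

# Once unified, one query can span the fleet:
dep_signals = [r for r in records if r.tool_id == "DEP-07"]
print(len(dep_signals))  # 1
```

In a real fab this structure would also have to carry lot, wafer, and chamber context so electrical test and metrology data can be joined back to it, which is where the "massive big data problem" lives.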
SE: So actually applying machine learning to improve the manufacturing process?
Fried: Yes. Trends are emerging, and you fit an algorithm to those. There’s a big trend to move from machine learning to real AI. Assuming we can solve this big data challenge, the really exciting future is going from simple machine learning, which already is out there, to the bigger, broader, fully integrated data set, and getting deeper into AI, where you understand the objective of the factory line from a yield, throughput, and device-performance perspective, and you can learn all the right signals to tune for the whole path.
SE: And that’s about fine-tuning this whole process beyond what has been done before, right? But you can’t do this in pieces. Every piece has to have context.
Fried: You can fine-tune pieces by looking at pieces of the data, and that has inherent value. We would not have gotten to 7nm without doing that kind of work. There are two challenges beyond that. One is putting the pieces of data together. People dismiss that aspect of it, but it’s incredibly complex. There are some really serious big data challenges in assembling all of this data. And then, the learning and the algorithms and the computational piece of optimization through that data, that’s where AI is starting to drive.
SE: Does the data you’re dealing with come through consistently, or is it ad hoc pieces in different formats?
Fried: It depends. You look at certain pieces of equipment in the fab, and they retain data for maybe 7 days. There is certain data that goes up to the host, and then the host decides how long it wants to store it. There is certain data that stays on the tool, and the tool can decide how long it needs to keep that. And there is certain data you want to prioritize. You may keep some data for 30 days, some data for 7 days, and some you want to expunge immediately after the process. Otherwise, we’d have data warehouses for each piece of equipment. But if you really want to get good at this, maybe we need to keep some data longer and warehouse it in a different way.
SE: How does variation fit in here? Does it alter the data?
Fried: There are two pieces to this. Variation is both a data source and a data influence. You’re detecting variation as you go with metrology. But as that variation hits the next tools, it may cause sensors to record data differently or actual processes to change. If I put a silicon wafer into a tool and run a process, I will get certain readings. If I put a different kind of wafer into that tool, I’m probably going to get different sensor readings. The question is whether incoming, inherent, fully processed wafer differences will result in different sensor data. As the sensors become more advanced and the signals from processing those wafers become more advanced, variation is going to control the nature of the data you’re getting.
SE: That completely changes the way we think about chip design through manufacturing.
Fried: We used to think about designing for nominal and containing the variation around it. You shoot for the bull’s-eye. You design everything around that. And you build enough margin into your design specification, your factory, to contain the expected variation. Given the complexity and the fact that the variation is now on the scale of the nominal dimensions in the first place, you can’t think that way anymore. Nobody is ever going to hit the bull’s-eye. What we found with some of the work we did years ago with modeling is that you have to design your technology, your integration, your process flow and process capabilities from the total allowable window of shippable goods. If you design your process for the window, you don’t care where the bull’s-eye is. And the really cool part is that if you design for the window, you can get a nice yielding process or flow where the nominal target that’s down the center doesn’t need to look good.
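The "design for the window, not the bull's-eye" idea can be illustrated with a toy check: accept a process if its whole expected variation band fits inside the allowable window of shippable goods, regardless of where the nominal center lands. All numbers here are invented for illustration.

```python
# Toy illustration of window-based design: a process passes if the full
# band [nominal - variation, nominal + variation] lies inside the
# allowable window of shippable goods. Numbers are made up.

def fits_window(nominal, variation, window):
    """True if the whole variation band lies inside `window`."""
    lo, hi = window
    return lo <= nominal - variation and nominal + variation <= hi

window = (18.0, 26.0)   # allowable dimension window, arbitrary units

# An off-center process still ships if its variation band fits...
print(fits_window(23.5, 2.0, window))  # True: off the bull's-eye, yields
# ...while a perfectly centered process with too much variation fails.
print(fits_window(22.0, 5.0, window))  # False: hits the bull's-eye, fails
```

This captures the counterintuitive point in the interview: the nominal target down the center doesn't need to look good, as long as every realizable outcome stays inside the window.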
SE: We’re seeing that with a lot of designs, because the way the final design looks is not what anyone expected.
Fried: It’s a counterintuitive representation of perfection. If I design properly for the window, I don’t care what the center point is. If I find all possible variations, I can start tightening. But nominal becomes a concept that no longer matters.
SE: As we get down to the most advanced nodes, we’re starting to encounter noise that didn’t have to be accounted for in the past. It’s partly the dielectric and the distance between devices; the electrons can’t be contained.
Fried: That happens every generation. There’s new physics every generation, new mechanisms, new data and new issues we never thought we had to worry about. We never had to worry about gate leakage or poly depletion. But we got to the point where we needed to, we learned the effect, we built it into predictive models, and we built an industry around that technology. This is no different.
SE: We’re hearing for the first time in more than a decade about a memory bottleneck. Basic pieces are being rethought again for the first time in decades.
Fried: It goes back to the workload. There are newer and novel workloads that have an insatiable demand on memory. Those are very specific applications. There are applications still driving single-thread CPU performance and others driving GPU core parallelization. Depending on whether it’s cloud or IoT, you’re going to pick a different configuration for that workload. AI, machine learning, big data are the applications that are driving a lot of the data back and forth between memory and the processor. We’ve made massive progress in memory density, cost per bit and memory scaling. But now we need a pipe to get the data in and out. The demand for memory and the technology in memory outpace the infrastructure.
SE: The other piece is people can’t just send all of this data to the cloud because it’s too much data being generated.
Fried: There also are architectural decisions. When you think about IoT, if you’re going to put hundreds of sensors in each section of a city, you’re going to get tons of data. If a guy robs a bank and you’re going to follow him, one approach is to pump all of the HD video from each of the thousands of traffic signal cameras to some remote data center and you start processing that data as hard as you can. It’s a grossly inefficient allocation of resources. But if you put some low-level compute power in each one of those cameras, you can say, ‘He turned right, look at the camera one block down that way.’ So you put a whole bunch of compute power at the edge of the network and now you have a much more efficient system architecture.
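The bank-robber example is an architecture decision: push a cheap local check to each camera and forward only a tiny handoff message to the neighbor, instead of streaming raw HD video to a data center. The sketch below is a stand-in for that idea; the camera IDs, grid, and detection format are all hypothetical.

```python
# Sketch of the edge-compute handoff: each camera runs local logic and
# emits only a small message, not an HD video stream. The city grid and
# detection format here are stand-ins, not a real API.

GRID = {  # camera id -> neighbor camera in each direction of travel
    "cam_5th_main": {"right": "cam_5th_oak"},
    "cam_5th_oak": {},
}

def on_frame(camera_id, detections):
    """Local per-camera logic: emit a handoff only when the target moves."""
    for obj, direction in detections:
        if obj == "suspect" and direction in GRID[camera_id]:
            # 'He turned right, look at the camera one block down.'
            return {"watch": GRID[camera_id][direction], "for": obj}
    return None  # nothing worth sending upstream

msg = on_frame("cam_5th_main", [("suspect", "right")])
print(msg)  # {'watch': 'cam_5th_oak', 'for': 'suspect'}
```

The efficiency win is that almost every frame produces `None`, so the network carries handoff messages measured in bytes rather than continuous video from thousands of cameras.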
SE: And that’s where the big change is, right? It’s the system architecture, so designs are no longer done in isolation. They’re done in context, but with an eye to how they will evolve over time.
Fried: With this concept of tracking a bank robber through a city, the decision of how high up you put the decision power and the compute power is really important.
SE: Five years ago, this was science fiction.
Fried: Yes, and now we’re making detailed decisions, such as the best bus structure, to actually implement these things. It’s not science fiction anymore.