
Multi-Fidelity Training: The Key to Affordable and Accurate AI Models

Robert Seif, M.S. student in the Department of Engineering

Artificial Intelligence seems like it’s everywhere these days. From writing emails to stopping shoplifters, it has worked its way into almost every part of our daily lives, and many of us don’t even know it. Even as I write this article, a small AI agent acts to ensure that I spell the word ‘discombobulated’ without getting the letters…well, discombobulated. My research is focused on computer vision systems, the ones you see guiding autonomous cars or recognizing your face to log you onto your phone or computer. But how do we train these AI models? Data, and lots of it.

Let’s use the example of teaching a toddler to explain training an AI. If you wanted to teach a toddler what a cat looks like, you’d most likely print out a picture of one and show it to them. Imagine this cat in the photo has a long tail, four legs, and a nice black coat of fur. The issue is, when our toddler sees a dog later that day with all of those features, our trainee might confuse it with one of its feline friends. That’s why we’re going to train our student with more data, a lot more data. Pictures of cats of all sizes and colors; big ones, small ones, hairless ones, maybe even a few with funny sweaters expressing their disdain for Mondays. Only after all this training can we be confident that our cat-identifying professional is fit for the role. It’s the same with AI-based image recognition systems. The COCO dataset, one of the most widely used training datasets, comes with over 140,000 images that in total train a single AI to recognize only 80 types of objects. That’s a lot of data for engineers to gather for their next computer vision project.

So, what if our entire data source weren’t all real photos of our target? If we trained our toddler on a few cartoon kitties, it would still get the point across, right? Simulated data is significantly easier, cheaper, and faster to generate and manipulate than real-world data, and it will play a big role in training AI models in the future. This is the essence of multi-fidelity training and testing.

Multi-fidelity. It’s the term the industry uses when AI trainers combine high-fidelity (high-quality) data with low-fidelity (low-cost!) data to bring a new AI into creation. The question is, how much low-fidelity data can we substitute into high-fidelity training sets before our identification abilities greatly suffer? In my research with the Design Engineering Lab at Purdue (DELP), we explored this issue in an effort to ensure safety in the next generation of autonomous vehicles being released worldwide.
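The mixing step at the heart of this idea can be sketched in a few lines of code. The sketch below is purely illustrative and not the DELP codebase: the function name, the `(image_id, source)` tuples standing in for labeled images, and the 30% mixing ratio are all assumptions chosen for the example.

```python
import random

def build_multifidelity_set(high_fidelity, low_fidelity, low_fraction, seed=0):
    """Build a training set the same size as `high_fidelity`, replacing a
    fraction of the real (high-fidelity) samples with simulated ones."""
    rng = random.Random(seed)  # fixed seed so the mix is reproducible
    n_total = len(high_fidelity)
    n_low = int(n_total * low_fraction)        # simulated samples to include
    n_high = n_total - n_low                   # real samples kept
    mixed = rng.sample(high_fidelity, n_high) + rng.sample(low_fidelity, n_low)
    rng.shuffle(mixed)  # interleave so batches see both fidelities
    return mixed

# Hypothetical samples: (image_id, source) pairs standing in for labeled images.
real = [(f"real_{i}", "high") for i in range(1000)]
sim = [(f"sim_{i}", "low") for i in range(1000)]

train = build_multifidelity_set(real, sim, low_fraction=0.3)
print(len(train))                                  # 1000
print(sum(1 for _, src in train if src == "low"))  # 300
```

Sweeping `low_fraction` from 0 toward 1 and re-measuring accuracy at each point is one simple way to find how much cheap simulated data a training set can absorb before performance suffers.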

The cost of controlling classified data is estimated to be over $50 billion a year for the US government alone. Meanwhile, the cost of finding a clip from almost any highway in the world is having to watch a 15-second ad on YouTube. That contrast is what makes it the right decision to cut a significant percentage of potentially controlled (and costly) data from my training set in exchange for only a small decrease in accuracy, roughly 4% in our experiments. The simulated data is significantly cheaper and faster to acquire than the alternatives, and with the funds saved, the lost 4% can be overcome elsewhere. This is the value of these multi-fidelity datasets.

AI has become an indispensable tool for a variety of industries, from education to automotive to defense. The key to these models is data, more than we could ever fathom. These datasets need to be robust and obtainable if we are to build these models before the next tech craze takes over. My research, in collaboration with the DELP, shows how multi-fidelity training can transform even the most critical of tasks. Embracing this multi-fidelity method will allow us to significantly reduce the costs of training without sacrificing the end accuracy and safety of the project. Applications for this approach are limitless, and as the technology advances, multi-fidelity training will undoubtedly play a pivotal role in shaping the future of AI.

About the Author: 

Robert Seif was a master’s student in the Department of Mechanical Engineering, having earned his undergraduate degree in Mechanical Engineering at Purdue as well. His research on creating methods to safely test and evaluate ground-breaking AI systems to ensure their best use won ASME's Best Paper award for AI/ML approaches in 2024. Outside of his work with Purdue University, he is a co-founder of and advisor to both AI and non-AI startups, and he currently lives in the San Francisco Bay Area looking for his next big project.

Engineering

April 08, 2024
