Explained: Generative Artificial Intelligence | MIT News

A quick scan of the headlines makes it seem like generative AI is everywhere these days. In fact, some of these headlines may have already been written by generative AI, like OpenAI’s ChatGPT, a chatbot that has shown an uncanny ability to produce text that appears to have been written by a human.

But what do people really mean when they say “generative AI”?

Before the generative AI boom of the past few years, when people talked about AI, they were usually talking about machine learning models that could learn how to make predictions based on data. For example, such models are trained, using millions of examples, to predict whether a particular X-ray shows signs of a tumor or whether a particular borrower is likely to default on a loan.

Generative AI can be considered a machine learning model that is trained to generate new data, rather than to make predictions on a specific data set. A generative AI system is a system that learns how to create more objects that are similar to the data it was trained on.

“When it comes to the actual machinery underlying generative AI and other types of AI, the distinctions can be a little bit blurry,” says Phillip Isola, associate professor of electrical engineering and computer science at MIT, and a member of the Computer Science and Artificial Intelligence Laboratory (CSAIL). “Oftentimes, the same algorithms can be used for both.”

Despite the hype surrounding the release of ChatGPT and its counterparts, the technology itself is not entirely new. These powerful machine learning models are based on research and computational advances dating back more than 50 years.

Increase in complexity

An early example of generative AI is a much simpler model known as a Markov chain. This technique is named after Andrei Markov, a Russian mathematician who introduced this statistical method in 1906 to model the behavior of random processes. In machine learning, Markov models have long been used in next-word prediction tasks, such as the autocomplete function in an email program.

In text prediction, a Markov model generates the next word in a sentence by looking at the previous word or a few previous words. But because these simple models can only look back so far, they aren’t good at generating plausible text, says Tommi Jaakkola, the Thomas Siebel Professor of Electrical Engineering and Computer Science at MIT, who is also a member of CSAIL and the Institute for Data, Systems, and Society (IDSS).

“We were generating things before the last decade, but the key difference here is in terms of the complexity of the things we can generate and the scale at which we can train these models,” he explains.
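The limited look-back that Jaakkola describes can be made concrete with a toy next-word Markov model. This is a minimal illustrative sketch, not a production system; the corpus, function names, and one-word context are invented for the example:

```python
import random
from collections import defaultdict

def build_markov_model(text, order=1):
    """Record which words follow each (order)-word context in the text."""
    words = text.split()
    model = defaultdict(list)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        model[context].append(words[i + order])
    return model

def generate(model, seed, length=10, rng=None):
    """Generate text by repeatedly sampling a next word given the context."""
    rng = rng or random.Random(0)
    out = list(seed)
    context = tuple(seed)
    for _ in range(length):
        choices = model.get(context)
        if not choices:          # context never seen in training text
            break
        out.append(rng.choice(choices))
        context = tuple(out[-len(seed):])
    return " ".join(out)

corpus = "the cat sat on the mat the cat ate the fish"
model = build_markov_model(corpus, order=1)
print(generate(model, ("the",), length=5))
```

Because the context is only one word, the model happily stitches together locally plausible but globally incoherent sequences, which is exactly the weakness Jaakkola notes.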

Just a few years ago, researchers tended to focus on finding a machine learning algorithm that made the best use of a specific data set. But this focus has changed somewhat, and many researchers now use larger datasets, perhaps containing hundreds of millions or even billions of data points, to train models that can produce impressive results.

The base models underlying ChatGPT and similar systems work in much the same way as a Markov model. But one big difference is that ChatGPT is far larger and more complex, with billions of parameters. And it has been trained on an enormous amount of data, in this case, much of the publicly available text on the internet.

In this huge body of text, words and sentences appear in sequences with certain dependencies. This repetition helps the model understand how to break up text into statistical chunks that have some predictability. It learns the patterns of these blocks of text and uses this knowledge to suggest what might come next.

More powerful architectures

While larger data sets are one of the catalysts that have led to the boom in generative AI, a variety of key research developments have also led to more complex deep learning architectures.

In 2014, researchers at the University of Montreal proposed a machine learning architecture known as a generative adversarial network (GAN). GANs use two models that work in tandem: one learns how to generate a target output (such as an image) and the other learns to distinguish between real data and generator output. The generator tries to fool the discriminator, and in the process learns how to provide more realistic output. The StyleGAN image generator is based on these types of models.
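The generator-versus-discriminator setup can be sketched structurally. The following is a minimal, untrained illustration in numpy: the tiny linear "networks", shapes, and toy data are invented for the example, and a real GAN would alternate gradient updates that lower these two losses:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "generator": maps a 4-D noise vector to a 1-D sample via a linear layer.
G_w, G_b = rng.normal(size=(4, 1)), np.zeros(1)

# Toy "discriminator": maps a 1-D sample to a probability of being real.
D_w, D_b = rng.normal(size=(1, 1)), np.zeros(1)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def generator(z):
    return z @ G_w + G_b            # fake samples

def discriminator(x):
    return sigmoid(x @ D_w + D_b)   # estimated P(sample is real)

z = rng.normal(size=(8, 4))                   # a batch of noise vectors
fake = generator(z)                           # generator output
real = rng.normal(loc=3.0, size=(8, 1))       # stand-in "real" data

d_real, d_fake = discriminator(real), discriminator(fake)

# Discriminator is trained to push d_real toward 1 and d_fake toward 0;
# the generator is trained to push d_fake toward 1 (to fool the discriminator).
d_loss = -np.mean(np.log(d_real) + np.log(1.0 - d_fake))
g_loss = -np.mean(np.log(d_fake))
```

The adversarial dynamic lives entirely in those two opposing losses: each improvement in the discriminator forces the generator to produce more realistic output.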

Diffusion models were introduced a year later by researchers at Stanford University and the University of California at Berkeley. By iteratively refining their output, these models learn to generate new data samples that resemble samples in the training dataset, and have been used to create realistic-looking images. A diffusion model is at the heart of the text-to-image generation system Stable Diffusion.
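The iterative idea can be illustrated with the forward (noising) half of a diffusion process, which gradually destroys a data point; a trained model learns to reverse these steps, refining pure noise back into a sample. This sketch assumes a simple linear variance schedule, an illustrative choice:

```python
import numpy as np

rng = np.random.default_rng(0)

# Linear variance schedule over T diffusion steps (illustrative values).
T = 100
betas = np.linspace(1e-4, 0.02, T)
alphas = 1.0 - betas
alpha_bar = np.cumprod(alphas)   # cumulative fraction of signal retained

def q_sample(x0, t, noise):
    """Jump directly to the noised sample x_t from clean data x0 (closed form)."""
    return np.sqrt(alpha_bar[t]) * x0 + np.sqrt(1.0 - alpha_bar[t]) * noise

x0 = np.ones(16)                  # a "clean" toy data point
noise = rng.normal(size=16)
x_mid = q_sample(x0, T // 2, noise)   # partially noised
x_end = q_sample(x0, T - 1, noise)    # mostly noise
```

Since `alpha_bar` shrinks monotonically toward zero, later steps retain less and less of the original signal; generation runs this corruption in reverse, denoising step by step.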

In 2017, researchers at Google introduced the transformer architecture, which has been used to develop large language models, like those that power ChatGPT. In natural language processing, a transformer encodes each word in a corpus of text as a token and then generates an attention map, which captures each token’s relationships with all other tokens. This attention map helps the transformer understand context when it generates new text.
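The attention map described above can be sketched for a toy case. This simplified version omits the learned query, key, and value projections of a real transformer and just compares raw token embeddings, which is an assumption made for brevity:

```python
import numpy as np

def attention_map(X):
    """Scaled dot-product self-attention weights for token embeddings X."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                 # pairwise token affinities
    scores -= scores.max(axis=-1, keepdims=True)  # for numerical stability
    weights = np.exp(scores)
    return weights / weights.sum(axis=-1, keepdims=True)  # softmax per row

# Four tokens, each with an 8-dimensional embedding (random toy values).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))
A = attention_map(X)   # A[i, j] = how much token i attends to token j
```

Each row of the map is a probability distribution over all tokens, so every token's representation can be updated as a weighted mix of the others, which is how context enters the model.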

These are just a few of the many approaches that can be used for generative AI.

A host of applications

What all of these methods have in common is that they convert inputs into a set of tokens, which are numerical representations of chunks of data. As long as your data can be converted into this standard token format, then in theory, you could apply these methods to generate new data that look similar.
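A minimal word-level tokenizer illustrates the idea; real systems typically use learned subword vocabularies such as byte-pair encoding, and the vocabulary here is invented for the example:

```python
def tokenize(text, vocab):
    """Map each word to an integer token id; unknown words get id 0."""
    return [vocab.get(word, 0) for word in text.split()]

def detokenize(ids, vocab):
    """Map token ids back to words."""
    inverse = {i: w for w, i in vocab.items()}
    return " ".join(inverse.get(i, "<unk>") for i in ids)

vocab = {"<unk>": 0, "the": 1, "cat": 2, "sat": 3}
ids = tokenize("the cat sat", vocab)   # [1, 2, 3]
```

Once any data type, text, pixels, or protein residues, is expressed as such integer sequences, the same sequence-modeling machinery can be applied to it.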

“Your mileage may vary, depending on how noisy your data are and how difficult the signal is to extract, but it is really getting closer to the way a general-purpose CPU can take in any kind of data and start processing it in a unified way,” says Isola.

This opens up a wide range of applications for generative AI.

For example, Isola’s group uses generative AI to create synthetic image data that can be used to train another intelligent system, such as teaching a computer vision model how to recognize objects.

Jaakkola’s group uses generative AI to design novel protein structures or valid crystal structures that specify new materials. He explains that in the same way a generative model learns the dependencies of language, if it is shown crystal structures instead, it can learn the relationships that make structures stable and realizable.

But although generative models can achieve amazing results, they are not the best choice for all types of data. For tasks that involve making predictions on structured data, such as tabular data in a spreadsheet, generative AI models tend to be outperformed by traditional machine learning methods, says Devavrat Shah, the Andrew and Erna Viterbi Professor of Electrical Engineering and Computer Science at MIT, who is a member of IDSS and the Laboratory for Information and Decision Systems (LIDS).

“Its highest value, in my opinion, is in becoming this amazing interface to machines that are human friendly. Previously, humans had to talk to machines in the language of machines to make things happen. Now, this interface has figured out how to talk to both humans and machines,” says Shah.

Raising red flags

Generative AI chatbots are now being used in call centers to field questions from human customers, but this application underscores one potential red flag of implementing these models: worker displacement.

Additionally, generative AI can inherit and propagate biases found in training data, or amplify hate speech and false statements. Models have the potential to plagiarize, and can create content that appears as if it was produced by a specific human creator, raising potential copyright issues.

On the other hand, Shah suggests that generative AI can empower artists, who can use generative tools to help them create creative content that they may not have the means to produce.

In the future, he believes that generative AI will change economics in many disciplines.

One promising future direction Isola sees for generative AI is its use in manufacturing. Instead of the model making a picture of the chair, perhaps a plan could be created for a chair that could be produced.

He also sees future uses for generative AI systems in developing smarter AI agents in general.

“There are differences in how these models work and how we think the human brain works, but I think there are also similarities. We have the ability to think and dream in our heads, to come up with interesting ideas or plans, and I think generative AI is one of the tools that will empower agents to do that, as well.”
