The Best Large Language Models (LLMs) in 2024
Large language models (LLMs) are the main kind of AI model for handling text, and they're becoming ubiquitous. ChatGPT is by far the most famous tool that uses an LLM—it's powered by specially tuned versions of OpenAI's GPT models. However, there are many other chatbots and text generators built on top of LLMs, including Google Gemini and Anthropic's Claude.
LLMs have been evolving in research labs since the late 2010s, but after the release of ChatGPT (which showcased the power of GPT), they've transitioned from the lab to the real world.
Some LLMs have been in development for years, while others have been rapidly created to catch the latest hype. Additionally, there are open research tools. The first generations of large multimodal models (LMMs), capable of handling various input and output modalities like images, audio, and video, as well as text, are also starting to become widely available, complicating the landscape further. Here, I will break down some of the most important LLMs currently available.
The Best LLMs in 2024
There are dozens of major LLMs and hundreds that are significant for various reasons. Listing them all would be nearly impossible and would quickly become outdated due to the rapid development of LLMs. (I'm updating this list a few months after its first publication, and there are already new versions of multiple models and at least one new addition.)
Take the word "best" with a grain of salt here: I've tried to narrow things down by offering a list of the most significant, interesting, and popular LLMs (and LMMs), not necessarily the ones that outperform on benchmarks (though most of these do). I've also mostly focused on LLMs that you can use, rather than ones that are the subjects of interesting research papers, to keep things practical.
One last note before diving in: many AI-powered apps don't disclose the LLMs they rely on. Some we can infer from their marketing materials, but for many, we simply don't know which model is under the hood.
What is an LLM?
An LLM, or large language model, is a general-purpose AI text generator. It's what's behind all AI chatbots and AI writing generators.
LLMs are supercharged auto-complete systems. They take a prompt and generate an answer using a string of plausible follow-on text. The chatbots built on top of LLMs don't look for keywords to answer with a canned response; instead, they try to understand what's being asked and reply appropriately.
This is why LLMs have become so popular: the same models (with or without extra training) can be used to respond to customer queries, write marketing materials, summarize meeting notes, and more.
However, LLMs can only work with text, which is why LMMs are emerging—they can incorporate images, handwritten notes, audio, video, and more. While not as widely available as LLMs, LMMs have the potential to offer much more real-world functionality.
How Do LLMs Work?
Early LLMs, like GPT-1, would generate nonsense after a few sentences, but today's LLMs, like GPT-4, can generate thousands of coherent words.
LLMs are trained on huge corpuses of data. While specifics vary between LLMs depending on how developers acquire the rights to their training materials, generally, they are trained on something akin to the entire public internet and every major book ever published at a minimum. This extensive training enables LLMs to generate text that sounds authoritative on a wide variety of subjects.
From this data, LLMs model the relationships between words (or fractions of words called tokens) using high-dimensional vectors. Every token gets a unique ID, and similar concepts are grouped together, forming a neural network—a multi-layered algorithm based on how the human brain works—at the core of every LLM.
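To make the idea of tokens a little more concrete, here's a minimal sketch using OpenAI's open source tiktoken library (just one tokenizer among many; the exact IDs it prints are an illustration and differ between model families):

```python
# Minimal tokenization sketch using OpenAI's open source tiktoken library
# (pip install tiktoken). Other model families use their own tokenizers.
import tiktoken

# Load the tokenizer used by several recent OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Large language models predict the next token."
token_ids = enc.encode(text)  # each text fragment becomes an integer ID
print(token_ids)
print([enc.decode([t]) for t in token_ids])  # the text fragment behind each ID
```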
The neural network has an input layer, an output layer, and multiple hidden layers, each with numerous nodes. These nodes compute what words should follow from the input, and different nodes have different weights. For example, if the input string contains "Apple," the neural network must decide whether to follow with "Mac" or "iPad," "pie" or "crumble," or something else entirely. The number of parameters in an LLM reflects the size of this network (the weights connecting its layers and nodes), and, generally, more parameters allow the model to understand and generate more complex text.
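As a rough illustration of that idea (a toy, nothing like a real LLM), here's a sketch in which a single randomly initialized weight matrix scores a tiny made-up vocabulary and a softmax turns those scores into a probability distribution over the next word:

```python
# Toy next-word prediction: random weights stand in for the parameters a real
# LLM would learn during training. This only illustrates the shape of the idea.
import numpy as np

vocab = ["Mac", "iPad", "pie", "crumble"]

rng = np.random.default_rng(0)
context_vector = rng.normal(size=8)          # stand-in for the encoded input "Apple..."
weights = rng.normal(size=(8, len(vocab)))   # the "parameters" of this toy model

scores = context_vector @ weights              # one score per candidate next word
probs = np.exp(scores) / np.exp(scores).sum()  # softmax: scores -> probabilities

for word, p in zip(vocab, probs):
    print(f"{word}: {p:.2f}")
```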
LMMs are even more complex as they incorporate data from additional modalities but are typically trained and structured similarly.
An AI model trained on the open internet with little direction could be a nightmare and not very useful, so LLMs undergo further training and fine-tuning to guide them toward generating safe and useful responses. One major method is adjusting the weights of different nodes, among other techniques.
While LLMs are black boxes, their workings aren't magic. Understanding a bit about how they work reveals why they're so good at answering certain questions and why they sometimes make up (or hallucinate) random things.
For example, consider these questions:
What bones does the femur connect to?
What currency does the USA use?
What is the tallest mountain in the world?
LLMs handle these easily because, in the text they were trained on, the correct answers are by far the most likely continuations.
However, questions like these are more challenging:
What year did Margot Robbie win an Oscar for Barbie?
What weighs more, a ton of feathers or a ton of bricks?
Why did China join the European Union?
LLMs might generate odd answers to these because the questions are trick questions or rest on false premises: Margot Robbie didn't win an Oscar for Barbie, a ton of anything weighs a ton, and China never joined the European Union. Instead of pushing back, a model may confidently make something up.
What Can LLMs Be Used For?
LLMs are powerful because they can be generalized to so many different tasks. The same core LLM, sometimes with a bit of fine-tuning, can perform a wide variety of them. While every task ultimately involves generating text, the way the model is prompted is what makes it behave like a different feature.
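Here's a rough sketch of what that looks like in practice. The call_llm function below is a hypothetical stand-in for whichever API or local model you use; notice that only the prompt changes between "features":

```python
# Sketch: the same (stubbed) model call handles different "features" depending
# only on how it's prompted. call_llm is a hypothetical placeholder.

PROMPTS = {
    "translate": "Translate into Spanish: {text}",
    "summarize": "Summarize in one sentence: {text}",
    "sentiment": "Label the sentiment as positive, negative, or neutral: {text}",
}

def call_llm(prompt: str) -> str:
    # Swap in a real API call or local model here.
    return f"[model response to: {prompt!r}]"

text = "The new update fixed the crashes, and the app feels much faster now."
for task, template in PROMPTS.items():
    print(task, "->", call_llm(template.format(text=text)))
```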
Common uses for LLMs include:
General-purpose chatbots (like ChatGPT and Google Gemini)
Customer service chatbots trained on your business's documents and data
Translating text between languages
Converting text into computer code, or one language into another
Generating social media posts, blog posts, and other marketing copy
Sentiment analysis
Moderating content
Correcting and editing writing
Data analysis
And many other tasks. We're in the early days of the current AI revolution.
However, there are tasks LLMs can't do that other AI models can. Examples include:
Interpreting images
Generating images
Converting files between different formats
Searching the web
Performing math and other logical operations
Some LLMs and chatbots seem to do these tasks, but often another AI service steps in to assist, or an LMM is being used.
With this context, let's move on to the LLMs themselves.
The Best LLMs in 2024
GPT
Developer: OpenAI
Parameters: More than 175 billion (likely trillions)
Access: API
OpenAI's Generative Pre-trained Transformer (GPT) models kickstarted the latest AI hype cycle. The main models currently available are GPT-3.5 Turbo, GPT-4, and GPT-4 Turbo, along with GPT-4o, a newer multimodal version. All versions of GPT are general-purpose AI models with an API, used by a diverse range of companies—including Microsoft, Duolingo, Stripe, Descript, Dropbox, and Zapier—to power countless tools. Still, ChatGPT is probably the most popular demo of its powers.
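For a sense of what using that API looks like, here's a minimal sketch with OpenAI's official Python SDK. It assumes you have an OPENAI_API_KEY environment variable set, and the model name is just one example from the current lineup:

```python
# Minimal sketch of calling a GPT model via OpenAI's Python SDK
# (pip install openai). Assumes OPENAI_API_KEY is set in the environment.
from openai import OpenAI

client = OpenAI()  # picks up OPENAI_API_KEY automatically

response = client.chat.completions.create(
    model="gpt-4o",  # example model name; swap in whichever GPT model you use
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "Summarize what an LLM is in one sentence."},
    ],
)
print(response.choices[0].message.content)
```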
You can also connect Zapier to GPT or ChatGPT, enabling you to use GPT directly from other apps in your tech stack. Here's more on how to automate ChatGPT, along with some pre-made workflows to get started.
Gemini
Developer: Google
Parameters: Nano available in 1.8 billion and 3.25 billion versions; others unknown
Access: API
Google Gemini is a family of AI models from Google. The three models—Gemini Nano, Gemini Pro, and Gemini Ultra—are designed to operate on different devices, from smartphones to dedicated servers. While capable of generating text like an LLM, the Gemini models also handle images, audio, video, code, and other kinds of information.
Gemini Pro powers AI features throughout Google's apps, such as Docs and Gmail, as well as Google's chatbot, also called Gemini (formerly Bard). Gemini 1.5 Pro is available to developers through Google AI Studio or Vertex AI, and broader access to Gemini Nano and Ultra is expected later in 2024.
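For a sense of the developer experience, here's a minimal sketch using Google's generative AI Python SDK. It assumes you have an API key from Google AI Studio, and the model name is just one example:

```python
# Minimal sketch of calling Gemini via the google-generativeai package
# (pip install google-generativeai). Assumes an API key from Google AI Studio.
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")  # better: load the key from an environment variable

model = genai.GenerativeModel("gemini-1.5-pro")  # example model name
response = model.generate_content(
    "Explain the difference between an LLM and an LMM in two sentences."
)
print(response.text)
```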
With Zapier's Google Vertex AI and Google AI Studio integrations, you can access Gemini from all your work apps. Here are a few examples to get you started.
Google Gemma
Developer: Google
Parameters: 2 billion and 7 billion
Access: Open
Google Gemma is a family of open AI models from Google based on the same research and technology used to develop Gemini. It’s available in two sizes: 2 billion parameters and 7 billion parameters.
Llama 3
Developer: Meta
Parameters: 8 billion, 70 billion, and 400 billion (unreleased)
Access: Open
Llama 3 is a family of open LLMs from Meta, the parent company of Facebook and Instagram. In addition to powering most AI features throughout Meta's apps, it's one of the most popular and powerful open LLMs. The code is available on GitHub, and you can download the model weights from Meta or Hugging Face after accepting the license. Because it's free for research and commercial use, many other LLMs use Llama 3 as a base.
There are 8 billion and 70 billion parameter versions available now, with a 400 billion parameter version still in training. Meta's previous model family, Llama 2, is still available in 7 billion, 13 billion, and 70 billion parameter versions.
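If you want a feel for what running an open model locally involves, here's a minimal sketch using the Hugging Face transformers library. It assumes you've accepted Meta's license on Hugging Face, installed the supporting packages, and have enough GPU memory for the 8 billion parameter instruct variant:

```python
# Minimal sketch of running Llama 3 8B Instruct locally with Hugging Face
# transformers (pip install transformers torch accelerate). Requires accepting
# Meta's license for the meta-llama repos on Hugging Face and a capable GPU.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    device_map="auto",  # spread the model across available devices
)

output = generator("List three uses for an open source LLM.", max_new_tokens=100)
print(output[0]["generated_text"])
```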
Vicuna
Developer: LMSYS Org
Parameters: 7 billion, 13 billion, and 33 billion
Access: Open
Vicuna is an open chatbot built off Meta's Llama LLM. It’s widely used in AI research and as part of Chatbot Arena, a chatbot benchmark operated by LMSYS.
Claude 3
Developer: Anthropic
Parameters: Unknown
Access: API
Claude 3 is a major competitor to GPT. Its three models—Haiku, Sonnet, and Opus—are designed to be helpful, honest, harmless, and safe for enterprise use. As a result, companies like Slack, Notion, and Zoom have partnered with Anthropic.
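As with GPT, access is through an API. Here's a minimal sketch using Anthropic's Python SDK; it assumes an ANTHROPIC_API_KEY environment variable, and the model name is one example from the Claude 3 family:

```python
# Minimal sketch of calling Claude 3 via Anthropic's Python SDK
# (pip install anthropic). Assumes ANTHROPIC_API_KEY is set in the environment.
import anthropic

client = anthropic.Anthropic()  # picks up ANTHROPIC_API_KEY automatically

message = client.messages.create(
    model="claude-3-sonnet-20240229",  # example model name from the Claude 3 family
    max_tokens=200,
    messages=[
        {"role": "user", "content": "Give one example of an enterprise use case for an LLM."},
    ],
)
print(message.content[0].text)
```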
Stable Beluga and StableLM 2
Developer: Stability AI
Parameters: 1.6 billion, 7 billion, 12 billion, 13 billion, and 70 billion
Access: Open
Stability AI, known for Stable Diffusion, has released several open LLMs, including Stable Beluga (based on Llama) and StableLM 2, though they're far less popular than the company's image generator.
Coral
Developer: Cohere
Parameters: Unknown
Access: API
Like Claude 3, Cohere's Coral LLM is designed for enterprise users. It similarly offers an API and allows organizations to train versions of its model on their own data, enabling accurate responses to specific queries from employees and customers.
Falcon
Developer: Technology Innovation Institute
Parameters: 1.3 billion, 7.5 billion, 40 billion, and 180 billion
Access: Open
Falcon is a family of open LLMs that have consistently performed well in AI benchmarks. It has models with up to 180 billion parameters and can outperform older models like GPT-3.5 in some tasks. It’s released under a permissive Apache 2.0 license, making it suitable for commercial and research use.
DBRX
Developer: Databricks and Mosaic
Parameters: 132 billion
Access: Open
Databricks' DBRX LLM, the successor to Mosaic's MPT-7B and MPT-30B LLMs, is one of the most powerful open LLMs. Interestingly, it’s not based on Meta's Llama model, unlike many other open models.
DBRX surpasses or equals previous-generation closed LLMs like GPT-3.5 on most benchmarks while being available under an open license.
Mixtral 8x7B and 8x22B
Developer: Mistral
Parameters: 45 billion and 141 billion
Access: Open
Mistral's Mixtral 8x7B and 8x22B models use a mixture-of-experts architecture (routing each token through a small subset of specialized sub-networks) to punch above their weight. Despite activating only a fraction of their parameters for each token, they can beat models like Llama 2 and GPT-3.5 in some benchmarks. They're released under an Apache 2.0 license.
Mistral has also released a direct GPT competitor called Mistral Large, available through cloud computing platforms.
XGen-7B
Developer: Salesforce
Parameters: 7 billion
Access: Open
Salesforce's XGen-7B isn’t especially powerful or popular but highlights how many large tech companies have AI and machine learning departments capable of developing and launching their own LLMs.
Grok
Developer: xAI
Parameters: Unknown
Access: Chatbot and open
Grok, an AI model and chatbot trained on data from X (formerly Twitter), doesn’t merit a place on this list based on performance. However, it’s developed by xAI, the AI company founded by Elon Musk, and gets media coverage, making it worth knowing about.
Why Are There So Many LLMs?
Until recently, LLMs were limited to research labs and tech demos. Now, they power countless apps and chatbots, with hundreds of models available for use. How did we get here?
Several factors are at play:
OpenAI's GPT-3 and ChatGPT demonstrated AI research’s practical potential, prompting other companies to follow suit.
LLMs require significant computing power to train but can be developed in weeks or months.
Many open models can be fine-tuned or adapted into new ones without training a model from scratch.
The influx of investment in AI companies incentivizes skilled developers to create new LLMs.
What to Expect from LLMs in the Future
We can expect to see many more LLMs in the near future, especially from major tech companies. Amazon, IBM, Intel, and NVIDIA all have LLMs under development, in testing, or available for customers. While they may not generate as much buzz as the models listed above and regular users might not interact with them directly, it's reasonable to anticipate that large enterprises will start deploying them widely, both internally and for customer support.
Additionally, more efficient LLMs tailored to run on smartphones and other lightweight devices will likely become more common. Google has already hinted at this with Gemini Nano, which runs some features on the Google Pixel 8 Pro. Developments like Mistral's mixture-of-experts Mixtral models showcase techniques that let leaner LLMs compete with much larger ones.
The next big advancement will likely be large multimodal models (LMMs), which combine text generation with other modalities, like images and audio. These models allow users to ask a chatbot about the contents of an image or receive responses in audio format. GPT-4o and Google's Gemini models are among the first LMMs expected to be widely deployed, but more are undoubtedly on the horizon.
In summary, the landscape of LLMs is rapidly evolving, with new models and capabilities continually emerging. As AI technology advances, we can anticipate more specialized and efficient LLMs and LMMs that will significantly enhance various applications and industries.
Here's a summary of the models covered above:
| Model | Developer | Parameters | Access | Notes |
| --- | --- | --- | --- | --- |
| GPT | OpenAI | More than 175 billion (likely trillions) | API | Includes GPT-3.5 Turbo, GPT-4, GPT-4 Turbo, and GPT-4o. |
| Gemini | Google | Nano: 1.8 billion and 3.25 billion; others unknown | API | Includes Gemini Nano, Gemini Pro, and Gemini Ultra. |
| Google Gemma | Google | 2 billion and 7 billion | Open | Open family of models based on the same research as Gemini. |
| Llama 3 | Meta | 8 billion, 70 billion, and 400 billion (unreleased) | Open | 8 billion and 70 billion parameter versions available; 400 billion parameter version in training. |
| Vicuna | LMSYS Org | 7 billion, 13 billion, and 33 billion | Open | Built on Meta's Llama LLM. |
| Claude 3 | Anthropic | Unknown | API | Designed for enterprise use; models are Haiku, Sonnet, and Opus. |
| Stable Beluga and StableLM 2 | Stability AI | 1.6 billion, 7 billion, 12 billion, 13 billion, and 70 billion | Open | Open LLMs from the maker of Stable Diffusion; Stable Beluga is based on Llama. |
| Coral | Cohere | Unknown | API | Designed for enterprise users. |
| Falcon | Technology Innovation Institute | 1.3 billion, 7.5 billion, 40 billion, and 180 billion | Open | Released under an Apache 2.0 license. |
| DBRX | Databricks and Mosaic | 132 billion | Open | Successor to Mosaic's MPT-7B and MPT-30B LLMs. |
| Mixtral 8x7B and 8x22B | Mistral | 45 billion and 141 billion | Open | Mixture-of-experts models that outperform many larger models. |
| XGen-7B | Salesforce | 7 billion | Open | Highlights Salesforce's AI capabilities. |
| Grok | xAI | Unknown | Chatbot and open | Developed by Elon Musk's xAI; trained on data from X. |