What is an AI token?

Google recently announced that Gemini 1.5 Pro’s context window would increase from 1 million tokens to 2 million. That sounds impressive, but what exactly is a token?

Chatbots need help processing the text they receive before they can work with concepts and communicate with you in a human-like fashion. In the generative AI space, this is accomplished with a token system that breaks data down into pieces that AI models can digest more easily.

What is an AI token?

An AI token is the smallest unit that text is broken down into when it is processed by a large language model (LLM). Tokens can be words, subwords, or punctuation marks, which allows models to efficiently analyze and interpret text and, subsequently, generate content in a similar unit-based fashion. This is loosely similar to how a computer converts data into binary zeros and ones for easier processing. Tokens allow a model to pick up on patterns and relationships between words and phrases so it can predict the terms likely to come next and respond in the context of your prompt.

When you input a prompt, a chatbot cannot interpret the raw text as is; it must be broken down into smaller pieces before the LLM can even process the request. The text is converted into tokens, then the request is submitted and analyzed, and a response is returned to you.

The process of turning text into tokens is called tokenization. There are many tokenization methods, which can differ based on factors such as the vocabulary (dictionary) they use, how they handle word combinations, and the language involved. For example, the space-based tokenization method splits text up based on the spaces between words. The phrase “It’s raining outside” would be split into the tokens ‘It’s’, ‘raining’, ‘outside’.
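To make the space-based method concrete, here is a minimal Python sketch. It splits a phrase on whitespace and then gives each distinct token an integer ID, since a model ultimately works with numbers rather than raw text. The function names and the ID scheme are illustrative assumptions, not the tokenizer of any particular model; production LLMs generally use learned subword vocabularies instead.

```python
def whitespace_tokenize(text):
    """Split text into tokens wherever whitespace occurs."""
    return text.split()


def build_vocabulary(tokens):
    """Assign each distinct token an integer ID, in order of first appearance.

    This ID scheme is a toy assumption for illustration only.
    """
    vocab = {}
    for token in tokens:
        if token not in vocab:
            vocab[token] = len(vocab)
    return vocab


tokens = whitespace_tokenize("It's raining outside")
vocab = build_vocabulary(tokens)
token_ids = [vocab[token] for token in tokens]

print(tokens)     # ["It's", 'raining', 'outside']
print(token_ids)  # [0, 1, 2]
```

Note that punctuation stays attached to its word here (“It’s” is one token), which is one reason real systems rarely rely on spaces alone.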

How do AI tokens work?

As a general rule of thumb in the generative AI space, one token equals approximately four characters in English, or about 3/4 of a word, so 100 tokens equals approximately 75 words. Other common approximations suggest that one to two sentences equal about 30 tokens, one paragraph equals about 100 tokens, and 1,500 words equal about 2,048 tokens.

Whether you’re a general user, a developer, or an enterprise, the AI program you’re using employs tokens to perform its tasks. Once you begin paying for generative AI services, you’re often paying for tokens, since usage is commonly metered by the number of tokens processed.
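These conversions are simple arithmetic, and a short Python sketch can apply them. The constants come straight from the rule of thumb in the text; the function names are hypothetical, and the results are estimates only, since the true count depends on the tokenizer a given model uses.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb from the text: 100 tokens ~ 75 words
CHARS_PER_TOKEN = 4     # rule of thumb from the text: 1 token ~ 4 English characters


def estimate_tokens_from_words(word_count):
    """Estimate a token count from a number of English words."""
    return round(word_count / WORDS_PER_TOKEN)


def estimate_tokens_from_text(text):
    """Estimate a token count from the character length of a string."""
    return round(len(text) / CHARS_PER_TOKEN)


print(estimate_tokens_from_words(75))                     # 100
print(estimate_tokens_from_words(1500))                   # 2000
print(estimate_tokens_from_text("It's raining outside"))  # 5
```

The estimate for 1,500 words comes out at 2,000 rather than the 2,048 quoted in the text, which shows how loose these approximations are.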

Most generative AI brands also have basic rules around how tokens function on their AI models. Many companies have token limits, which cap the number of tokens that can be processed in one turn. If a request is larger than the token limit on an LLM, the tool won’t be able to complete it in a single turn. For example, if you input a 10,000-word article for translation into a GPT model with a 4,096-token limit, it won’t be able to process the article fully, because at roughly 3/4 of a word per token such a request would require around 13,000 tokens for the input alone.

However, companies have quickly been advancing the capabilities of their LLMs, raising the token limit with new versions. Google’s research-based BERT model had a maximum input length of 512 tokens. OpenAI’s GPT-3.5 LLM, which runs the free version of ChatGPT, has a max of 4,096 input tokens, while its GPT-4 LLM, which runs the paid version of ChatGPT, has a max of 32,768 input tokens. This equates to approximately 25,000 words or 50 pages of text.

Google’s Gemini 1.5 Pro, which provides audio functionality to the brand’s AI Studio, has a standard 128,000-token context window. The Claude 2.1 LLM has a limit of up to 200,000 context tokens. This equates to approximately 150,000 words or 500 pages of text.
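The translation example can be checked with a few lines of Python. This is a rough sketch that assumes the 3/4-of-a-word-per-token rule of thumb given earlier; the function names are hypothetical, and the calculation ignores the tokens a model also needs for instructions and for its output, so real requests would need even more room.

```python
import math


def estimate_tokens(word_count):
    """Estimate tokens at roughly 3/4 of a word per token (4 tokens per 3 words)."""
    return math.ceil(word_count * 4 / 3)


def fits_in_one_turn(word_count, token_limit):
    """Return True if the estimated input fits within the token limit."""
    return estimate_tokens(word_count) <= token_limit


def turns_needed(word_count, token_limit):
    """Estimate how many limit-sized pieces the input must be split into."""
    return math.ceil(estimate_tokens(word_count) / token_limit)


print(estimate_tokens(10_000))          # 13334
print(fits_in_one_turn(10_000, 4_096))  # False
print(turns_needed(10_000, 4_096))      # 4
```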
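For a side-by-side view, the sketch below converts each limit named above into an approximate word count, assuming the rough 3/4-of-a-word-per-token rule of thumb from earlier in the article. The dictionary labels and the function name are illustrative, and the figures are estimates rather than vendor specifications.

```python
# Token limits as quoted in the text above.
CONTEXT_LIMITS = {
    "BERT": 512,
    "GPT-3.5": 4_096,
    "GPT-4": 32_768,
    "Gemini 1.5 Pro (standard)": 128_000,
    "Claude 2.1": 200_000,
}


def approx_words(token_count):
    """Convert tokens to words at roughly 3 words per 4 tokens."""
    return token_count * 3 // 4


for model, limit in CONTEXT_LIMITS.items():
    print(f"{model}: {limit:,} tokens ~ {approx_words(limit):,} words")
```

By this estimate, a 32,768-token limit holds about 24,576 words and a 200,000-token limit about 150,000 words.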

What are the different types of AI tokens?

There are several types of tokens used in the generative AI space that allow LLMs to identify the smallest units available for analysis. Here are some of the main token types an AI model works with.

  • Word Tokens are whole words that stand as single units on their own, such as “bird,” “house,” or “television.”
  • Sub-word Tokens are pieces of longer words that have been split into smaller units, such as splitting “Tuesday” into “Tues” and “day.”
  • Punctuation Tokens represent punctuation marks, including commas (,), periods (.), and others.
  • Number Tokens represent numerical figures, such as the number “10.”
  • Special Tokens carry instructions or markers rather than ordinary text, and are used both when executing queries and in training data.
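As a rough illustration of three of these categories, the Python sketch below splits a sentence into word, number, and punctuation tokens with a regular expression and labels each one. The pattern and the labels are assumptions made for this example; it handles only basic English letters, and it does not attempt sub-word or special tokens, which depend on a model’s learned vocabulary.

```python
import re

# Match, in order of preference: a run of digits, a word (optionally with an
# apostrophe, as in "It's"), or a single punctuation character.
TOKEN_PATTERN = re.compile(r"\d+|[A-Za-z]+(?:'[A-Za-z]+)?|[^\w\s]")


def classify(token):
    """Label a token as a number, a word, or punctuation."""
    if token.isdigit():
        return "number"
    if token[0].isalpha():
        return "word"
    return "punctuation"


def tokenize_with_types(text):
    """Return (token, type) pairs for each token found in the text."""
    return [(token, classify(token)) for token in TOKEN_PATTERN.findall(text)]


for token, kind in tokenize_with_types("The bird ate 10 seeds."):
    print(f"{token!r}: {kind}")
```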

What are the benefits of tokens?

There are several benefits to tokens in the generative AI space. Primarily, they act as a bridge between human language and the numerical form that computers work with in LLMs and other AI processes. Tokens help models process large amounts of data at once, which is especially beneficial in enterprise spaces that use LLMs. Companies can work with token limits to optimize the performance of AI models. As future LLM versions are introduced, higher token limits, or context windows, will give models a larger working memory.

Other benefits of tokens lie in the training of LLMs. Because tokens are small units, they make it easier to optimize how quickly data is processed. Because models are trained to predict tokens, they pick up concepts and get better at producing coherent sequences over the course of training. Tokens also assist in bringing multimodal content such as images, video, and audio into LLMs alongside text, including in text-to-speech chatbots.

Tokens also have cost-efficiency benefits: text is reduced to compact units, and because usage is commonly counted in tokens, shortening a long text directly reduces the amount of processing required. Data security benefits are sometimes attributed to tokens as well, though that generally refers to a separate technique in which sensitive values are replaced with stand-in tokens.

Fionna Agomuoh
Fionna Agomuoh is a technology journalist with over a decade of experience writing about various consumer electronics topics…