Training Large Language Models

December 8, 2023
Posted by: Aanchal Iyer
Category: Artificial Intelligence

Large Language Models (LLMs), such as Google’s PaLM or OpenAI’s GPT-4, have taken the world of Artificial Intelligence (AI) by storm. Yet, most companies cannot train these models themselves and depend on a handful of large tech firms as technology providers. When it comes to LLMs, there are multiple training mechanisms with different means, goals, and requirements. As they serve different purposes, it is essential not to confuse them with each other and to know which scenarios each one applies to.

Why Train your LLMs?

One of the most common questions is “Why train your models?” There are many reasons why an organization decides to train its LLMs. The reasons vary from data security and privacy to better control over the updates and improvements. The following are the main benefits of training your LLMs:

Customization

Training a custom model allows for the personalization of the model for specific requirements, including platform-specific capabilities, terminology, and context that is not well-covered in general-purpose models such as GPT-4 or code-specific models such as Codex. For example, a custom model can be trained to do a better job with specific web-based languages, including TypeScript React (TSX) and JavaScript React (JSX).

Reduced Dependency

While it makes sense to use the model that is most appropriate for the task at hand, an organization should not depend completely on a handful of AI providers.

Cost Efficiency

Although costs continue to go down, LLMs are still expensive to use for much of the global developer community. By training their own LLMs, organizations can build custom models that are smaller and more efficient and can be hosted at drastically reduced cost. Let us look at the different ways of training LLMs.

Different Ways to Train LLM Models

Pretraining

Pretraining is the most basic form of training and corresponds to what is simply called training in other Machine Learning (ML) domains. Here, you begin with an untrained model and train it to predict the next token given a sequence of previous tokens. To do this, a large number of sentences are collected from various sources and fed to the model in small chunks.
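The next-token objective can be illustrated with a deliberately tiny sketch. It assumes whitespace tokenization and replaces the neural network with a simple table of follower counts, so it shows only how training pairs are formed from text and how a prediction is read off, not how a real LLM is trained. The function names are hypothetical.

```python
from collections import Counter, defaultdict

def next_token_pairs(tokens, context_size):
    """Build (context, target) pairs: each target is the token that follows its context."""
    pairs = []
    for i in range(1, len(tokens)):
        context = tuple(tokens[max(0, i - context_size):i])
        pairs.append((context, tokens[i]))
    return pairs

def train_bigram(corpus):
    """Count which token follows which (a stand-in for gradient-based training)."""
    counts = defaultdict(Counter)
    for sentence in corpus:
        tokens = sentence.lower().split()  # assumption: whitespace tokenization
        for context, target in next_token_pairs(tokens, context_size=1):
            counts[context[-1]][target] += 1
    return counts

def predict_next(counts, token):
    """Return the most frequent follower of `token`, or None if it was never seen."""
    followers = counts.get(token)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

corpus = [
    "the cat sat on the mat",
    "the cat ate the fish",
    "the cat sat down",
]
model = train_bigram(corpus)
print(predict_next(model, "cat"))  # sat
```

A real pretraining run uses the same kind of (context, target) pairs, but with subword tokens, much longer contexts, and a neural network whose parameters are updated to make the observed target more likely.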

Finetuning

Although a pretrained LLM can perform various tasks, it has two main drawbacks: the structure of its output, and the absence of knowledge that was not encoded in its training data in the first place.

An LLM always predicts the next tokens given the sequence of tokens before them. For continuing a story that may be fine, but in other scenarios it may not be what you want. If you need a different kind of output, there are two main ways to get it: either write your prompts so that the model’s ability to predict the next tokens solves your task (prompt engineering), or change the output of the last layer to reflect the task, as you would in any other ML model. In finetuning, you take a trained model and continue training it on different data. This requires only a fraction of the resources of the initial training and can be done much faster.
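The idea of continuing training from an already trained model can be sketched with a toy that has nothing LLM-specific in it. The example below assumes a two-parameter linear model fitted by plain gradient descent; the data, step counts, and learning rate are made up for illustration. The point is only that finetuning starts from the pretrained parameters rather than from scratch, and runs for far fewer steps.

```python
def predict(params, x):
    w, b = params
    return w * x + b

def mse(params, data):
    """Mean squared error of the model on (x, y) pairs."""
    return sum((predict(params, x) - y) ** 2 for x, y in data) / len(data)

def train(params, data, steps, lr=0.05):
    """Run gradient descent on the mean squared error, starting from `params`."""
    w, b = params
    n = len(data)
    for _ in range(steps):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / n
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / n
        w, b = w - lr * grad_w, b - lr * grad_b
    return w, b

# Hypothetical "general" data for the initial training and "domain" data for finetuning.
general_data = [(x, 2.0 * x) for x in (0.0, 1.0, 2.0, 3.0)]
domain_data = [(x, 2.0 * x + 1.0) for x in (0.0, 1.0, 2.0, 3.0)]

pretrained = train((0.0, 0.0), general_data, steps=500)  # long initial training
finetuned = train(pretrained, domain_data, steps=50)     # short continuation on new data

print("domain error before finetuning:", mse(pretrained, domain_data))
print("domain error after finetuning: ", mse(finetuned, domain_data))
```

The same `train` function is used for both phases; the only differences are the starting parameters, the data, and the number of steps.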

RLHF Finetuning

A special case of finetuning is Reinforcement Learning from Human Feedback (RLHF). This is the main difference between a GPT model and a chatbot like ChatGPT. With this kind of finetuning, the model is trained toward generating the outputs that humans find most useful in their conversation with the model. RLHF is often used to make LLM outputs more conversation-like or to avoid undesired behavior, such as a model being mean, insulting, or intrusive.

Adapters

Using adapters means adding additional layers to a trained model. During finetuning, only those adapters are trained while the rest of the model’s parameters remain unchanged. These layers are much smaller than the layers the model comes with, which makes them easier to tune. Additionally, they can be inserted at different positions in the model, and not only at the very end.
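A minimal structural sketch of an adapter follows, under several assumptions that are not from the text above: a single frozen layer without bias, a bottleneck adapter (down-projection, ReLU, up-projection) added through a residual connection, and made-up sizes. Only the forward pass and the parameter counts are shown; a training loop that updates just the two adapter matrices is omitted.

```python
import random

def matvec(matrix, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * v for w, v in zip(row, vec)) for row in matrix]

def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

def count_params(matrix):
    return sum(len(row) for row in matrix)

HIDDEN, BOTTLENECK = 64, 4  # hypothetical sizes
rng = random.Random(0)

# Layer of the trained model: frozen, never updated during finetuning.
frozen_layer = rand_matrix(HIDDEN, HIDDEN, rng)

# Adapter: the only part that would be trained.
adapter_down = rand_matrix(BOTTLENECK, HIDDEN, rng)
adapter_up = [[0.0] * BOTTLENECK for _ in range(HIDDEN)]  # zero init: adapter starts as a no-op

def forward(x):
    h = matvec(frozen_layer, x)                         # output of the frozen layer
    z = [max(0.0, v) for v in matvec(adapter_down, h)]  # down-projection + ReLU
    delta = matvec(adapter_up, z)                       # up-projection
    return [hi + di for hi, di in zip(h, delta)]        # residual connection

x = [rng.uniform(-1.0, 1.0) for _ in range(HIDDEN)]
frozen_params = count_params(frozen_layer)                               # 4096
adapter_params = count_params(adapter_down) + count_params(adapter_up)   # 512
print(frozen_params, adapter_params)
```

Because the up-projection starts at zero, the model with the adapter initially behaves exactly like the original model, and the adapter here holds one eighth as many parameters as the layer it is attached to.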

Prompting

Prompting means creating instructions that precede the actual input to the model. In particular, with few-shot prompting, you provide examples to the LLM within the prompt. This is very similar to training, which also consists of presenting examples to a model, although here the model’s parameters are not updated.
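As an illustration, a few-shot prompt can be assembled with plain string handling. The task, the "Review:" and "Sentiment:" labels, and the helper name below are hypothetical choices, not a required format; the resulting string would be sent to whichever model you use.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble an instruction, worked examples, and the actual input into one prompt."""
    lines = [instruction, ""]
    for text, label in examples:
        lines.append(f"Review: {text}")
        lines.append(f"Sentiment: {label}")
        lines.append("")
    # The actual input comes last, with the answer left open for the model to complete.
    lines.append(f"Review: {query}")
    lines.append("Sentiment:")
    return "\n".join(lines)

instruction = "Classify the sentiment of each review as positive or negative."
examples = [
    ("The battery lasts all day.", "positive"),
    ("It stopped working after a week.", "negative"),
]
prompt = build_few_shot_prompt(instruction, examples, "Setup was quick and painless.")
print(prompt)
```

The prompt ends with an open "Sentiment:" line, so a model that simply predicts the next tokens is steered toward producing a label in the same style as the examples.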

Prompt Tuning

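In prompt tuning, the prompt itself is trained: instead of hand-written text (a hard prompt), a small set of prompt embeddings (a soft prompt) is learned while the model’s parameters stay unchanged. A big advantage of prompt tuning is that you can train multiple prompts for different tasks and use them with the same model. Just as in hard prompting, where you can construct different prompts for text summarization, sentiment analysis, and text classification but use all of them with the same model, you can tune three soft prompts and use them with the same model.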
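The sketch below shows how several tuned prompts can share one model. It assumes that a tuned prompt is a short list of learned vectors (a "soft" prompt) placed in front of the embedded input tokens; the sizes, the tiny vocabulary, and the function names are made up for illustration. The training step that would update each prompt is omitted, so the prompt vectors here are just random placeholders.

```python
import random

EMBED_DIM = 4    # hypothetical embedding size
PROMPT_LEN = 3   # hypothetical number of prompt vectors per task
rng = random.Random(0)

vocab = {"the": 0, "movie": 1, "was": 2, "great": 3}

# Embedding table of the trained model: frozen and shared by all tasks.
embedding_table = [[rng.uniform(-1.0, 1.0) for _ in range(EMBED_DIM)] for _ in vocab]

def new_soft_prompt():
    """Create PROMPT_LEN vectors; in prompt tuning these are the only trained values."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(EMBED_DIM)] for _ in range(PROMPT_LEN)]

# One prompt per task, all used with the same model.
soft_prompts = {
    task: new_soft_prompt()
    for task in ("summarization", "sentiment", "classification")
}

def model_input(task, text):
    """Prepend the task's prompt vectors to the embedded tokens of the text."""
    token_vectors = [embedding_table[vocab[word]] for word in text.split()]
    return soft_prompts[task] + token_vectors

seq = model_input("sentiment", "the movie was great")
print(len(seq))  # 7: three prompt vectors followed by four token vectors
```

Switching tasks only swaps the first `PROMPT_LEN` vectors; the embedded text and the model behind it stay the same.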

Wrapping Up

LLMs are an important tool for different natural language processing tasks. By learning the general characteristics of a language, these models generate language-based outputs that can power different applications. Training lets a model adapt its parameters and improve its performance. This enables it to capture the intricacies, nuances, and domain-specific patterns that are necessary for producing precise and context-aware outputs.