Fine-tuning LLMs

Fine-tuning LLMs with instructions

Limitations of in-context learning

  • May not work for smaller models

  • Examples take up space in the context window

Using prompts to fine-tune LLMs with instructions

Example instruction prompts for other tasks: "Summarize the following text:", "Translate this sentence to..."
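
A small illustrative sketch of how plain task data might be wrapped into instruction prompts before fine-tuning. The template strings and helper names here are hypothetical, not from any specific library:

```python
# Illustrative instruction-prompt templates for turning raw task data
# into instruction-style training examples.
SUMMARIZE_TEMPLATE = "Summarize the following text:\n\n{text}\n\nSummary: {summary}"
TRANSLATE_TEMPLATE = "Translate this sentence to {language}:\n\n{sentence}\n\nTranslation: {translation}"

def build_summarization_example(text: str, summary: str) -> str:
    """Wrap a (text, summary) pair in an instruction prompt."""
    return SUMMARIZE_TEMPLATE.format(text=text, summary=summary)

def build_translation_example(sentence: str, translation: str, language: str) -> str:
    """Wrap a (sentence, translation) pair in an instruction prompt."""
    return TRANSLATE_TEMPLATE.format(
        language=language, sentence=sentence, translation=translation
    )

print(build_summarization_example("LLMs are large neural networks...", "LLMs are big models."))
```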

LLM fine-tuning process

The resulting fine-tuned LLM is known as an instruct LLM

Fine-tuning with instruction prompts is the most common way to fine-tune LLMs these days. From this point on, when you hear or see the term fine-tuning, you can assume it means instruction fine-tuning.
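
At its core, instruction fine-tuning is supervised next-token-prediction training on instruction-formatted (prompt, completion) text. A minimal sketch with Hugging Face transformers, where the checkpoint name, examples, and hyperparameters are placeholders:

```python
# Minimal, illustrative instruction fine-tuning loop.
import torch
from torch.optim import AdamW
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder; any causal LM checkpoint works
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# Instruction-formatted examples: prompt and desired completion in one string.
examples = [
    "Summarize the following text:\n\nLLMs are large neural networks...\n\nSummary: LLMs are big models.",
    "Translate this sentence to French:\n\nGood morning.\n\nTranslation: Bonjour.",
]

optimizer = AdamW(model.parameters(), lr=5e-5)
model.train()
for epoch in range(3):
    for text in examples:
        batch = tokenizer(text, return_tensors="pt", truncation=True, max_length=512)
        # For causal LM fine-tuning, the labels are the input ids themselves;
        # the model shifts them internally to compute the next-token loss.
        outputs = model(**batch, labels=batch["input_ids"])
        outputs.loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```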

For a single task, often only 500-1,000 examples are needed to fine-tune the model

Limitations of fine-tuning on a single task

  • Catastrophic forgetting: Fine-tuning can significantly increase the performance of a model on a specific task, but can lead to a reduction in its ability on other tasks.

How to avoid catastrophic forgetting

  • First, note that we might not need to: if we only care about reliable performance on the single fine-tuned task, forgetting other tasks may be acceptable

  • Fine-tune on multiple tasks at the same time (see the sketch after this list)

  • Consider Parameter-Efficient Fine-Tuning (PEFT)
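
A toy sketch of the multi-task option: mix instruction-formatted examples from several tasks into one training set so the model sees all tasks during fine-tuning rather than only one. The example strings are placeholders:

```python
import random

# Illustrative multi-task mixing to mitigate catastrophic forgetting.
summarization = ["Summarize the following text:\n\n...\n\nSummary: ..."]
translation = ["Translate this sentence to French:\n\n...\n\nTranslation: ..."]
sentiment = ["Classify the sentiment of this review:\n\n...\n\nSentiment: positive"]

mixed_dataset = summarization + translation + sentiment
random.shuffle(mixed_dataset)  # interleave tasks instead of training on them one after another
```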

Parameter-Efficient Fine-Tuning (PEFT)

PEFT trade-offs

  • Parameter efficiency

  • Memory efficiency

  • Model performance

  • Training speed

  • Inference costs

PEFT methods (Scaling Down to Scale Up: A Guide to Parameter-Efficient Fine-Tuning)

  1. Selective: Select a subset of the initial LLM parameters to fine-tune (see the sketch after this list)

  2. Reparameterization: Reparameterize model weights using a low-rank representation

    1. LoRA

  3. Additive: Add trainable layers or parameters to the model

    1. Adapters

    2. Soft Prompts: Prompt tuning
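
To illustrate the selective category: one way is to freeze everything except a chosen subset of parameters, for example only the bias terms (similar in spirit to BitFit). The checkpoint name below is a placeholder:

```python
from transformers import AutoModelForCausalLM

# Illustrative "selective" PEFT: train only a small subset of existing parameters.
model = AutoModelForCausalLM.from_pretrained("gpt2")  # placeholder checkpoint

for name, param in model.named_parameters():
    param.requires_grad = "bias" in name  # train only bias parameters, freeze the rest

trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of {total:,} ({100 * trainable / total:.2f}%)")
```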

LoRA

LoRA freezes the original weights and represents the weight update to a large weight matrix as the product of two much smaller rank-decomposition matrices, training only those instead of the full weights. For inference, the product of these smaller matrices is added to the original frozen weights.
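
A minimal PyTorch sketch of the idea. The rank, scaling factor, and initialization are illustrative choices; libraries such as Hugging Face peft provide production implementations:

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Illustrative LoRA wrapper: freeze a pretrained linear layer and
    learn a low-rank update B @ A that is added to its output."""

    def __init__(self, base_layer: nn.Linear, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.base = base_layer
        self.base.weight.requires_grad = False  # frozen pretrained weights
        if self.base.bias is not None:
            self.base.bias.requires_grad = False
        in_features = base_layer.in_features
        out_features = base_layer.out_features
        # Two small rank-decomposition matrices; only these are trained.
        self.lora_A = nn.Parameter(torch.randn(rank, in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(out_features, rank))
        self.scaling = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Output = frozen W x + scaled low-rank update (B @ A) x
        return self.base(x) + self.scaling * (x @ self.lora_A.T @ self.lora_B.T)

# Usage: wrapping a 512x512 layer adds only 2 * 512 * 8 trainable parameters
# instead of training the full 512 * 512 weight matrix.
layer = LoRALinear(nn.Linear(512, 512), rank=8)
out = layer(torch.randn(4, 512))
print(out.shape)  # torch.Size([4, 512])
```

At inference time the learned product B @ A can be merged into the frozen weight matrix, so no extra latency is added.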

Prompt tuning

A soft prompt refers to a set of trainable tokens that are added to a prompt. Unlike the tokens that represent language, these tokens can take on any value within the embedding space. The token values may not be interpretable by humans, but are located in the embedding space close to words related to the language prompt or task to be completed.
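
A sketch of the idea in PyTorch, assuming a Hugging Face causal LM ("gpt2" is just a placeholder checkpoint): the soft prompt embeddings are the only trainable parameters; they are prepended to the token embeddings and the frozen model runs as usual.

```python
import torch
import torch.nn as nn
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
for param in model.parameters():
    param.requires_grad = False  # the base model stays frozen

num_virtual_tokens = 20
embed_dim = model.get_input_embeddings().embedding_dim
# The soft prompt: trainable vectors that live in embedding space but do
# not correspond to any real vocabulary token.
soft_prompt = nn.Parameter(torch.randn(num_virtual_tokens, embed_dim) * 0.02)

def forward_with_soft_prompt(text: str):
    input_ids = tokenizer(text, return_tensors="pt")["input_ids"]
    token_embeds = model.get_input_embeddings()(input_ids)           # (1, seq, dim)
    prompt_embeds = soft_prompt.unsqueeze(0)                         # (1, n_virtual, dim)
    inputs_embeds = torch.cat([prompt_embeds, token_embeds], dim=1)  # prepend soft prompt
    return model(inputs_embeds=inputs_embeds)

logits = forward_with_soft_prompt("Summarize the following text: ...").logits
print(logits.shape)
```

During training, only `soft_prompt` (not the model weights) would be passed to the optimizer.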

Evaluation Metrics

  • ROUGE

    • used for text summarization

  • BLEU score

    • used for text translation
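
A small sketch of computing both metrics with the Hugging Face `evaluate` library, assuming it and the metric backends (rouge_score, nltk) are installed:

```python
import evaluate

predictions = ["the cat sat on the mat"]
references = ["a cat was sitting on the mat"]

# ROUGE: n-gram overlap between generated and reference summaries.
rouge = evaluate.load("rouge")
print(rouge.compute(predictions=predictions, references=references))
# e.g. {'rouge1': ..., 'rouge2': ..., 'rougeL': ..., 'rougeLsum': ...}

# BLEU: modified n-gram precision for translations; it allows several
# reference translations per prediction, hence the nested list.
bleu = evaluate.load("bleu")
print(bleu.compute(predictions=predictions, references=[references]))
# e.g. {'bleu': ..., 'precisions': [...], ...}
```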

