Fine-Tune LLMs in 2025 with Hugging Face

Tutorial • Fine-Tuning

How to Fine-Tune Open LLMs in 2025 🚀

These days, fine-tuning open-source LLMs such as Llama 3, Mistral, and Gemma has become very accessible. The process lets you customize these powerful models for your own specific tasks. In this tutorial, we will learn how to fine-tune LLMs efficiently using Hugging Face's modern libraries (TRL, PEFT, bitsandbytes).

Step 1: Environment Setup 🛠️

First, we need to install the required libraries. They will help us with training, model optimization, and dataset handling.

pip install -U transformers datasets accelerate peft trl bitsandbytes

What each library does:

- transformers: loading models and tokenizers
- datasets: loading and processing training data
- accelerate: hardware-agnostic training (multi-GPU, mixed precision)
- peft: parameter-efficient fine-tuning methods such as LoRA
- trl: training utilities like SFTTrainer
- bitsandbytes: 4-bit/8-bit quantization

Step 2: Preparing the Dataset 📚

For fine-tuning, your dataset needs to be in a specific format. For chat models, the "chat template" format works best: each example contains roles (user, assistant) and their content.
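As a minimal sketch, a chat-format record and one naive way to flatten it into a single training string could look like this (the record below is invented for illustration; it is not taken from the guanaco dataset, which ships pre-formatted text):

```python
# An invented chat-format record (illustrative only).
example = {
    "messages": [
        {"role": "user", "content": "What is LoRA?"},
        {"role": "assistant", "content": "LoRA adds small trainable matrices to a frozen model."},
    ]
}

def flatten_chat(messages):
    # Join role/content turns into one training string. In practice,
    # tokenizer.apply_chat_template() produces the model's own template.
    return "\n".join(f"{m['role']}: {m['content']}" for m in messages)

print(flatten_chat(example["messages"]))
```

In real training you would let the tokenizer's chat template do this formatting, since every model family expects its own special tokens and turn markers.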

Here we will use the sample dataset "mlabonne/guanaco-llama2-1k".

from datasets import load_dataset

# Load the dataset
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")

Step 3: Fine-Tuning the Model (SFT) 🏋️‍♂️

Now we start the training process. We will use QLoRA (Quantized Low-Rank Adaptation), which delivers strong results with low memory usage. The SFTTrainer from the TRL library makes this very simple.
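To get a feel for why 4-bit loading matters, some back-of-the-envelope arithmetic on weight memory for an 8B-parameter model (these are lower bounds; real usage adds quantization constants, the KV cache, and activations):

```python
# Approximate weight memory for an 8B-parameter model.
params = 8_000_000_000

fp16_gb = params * 2 / 1e9    # 2 bytes per parameter
int4_gb = params * 0.5 / 1e9  # 4 bits = 0.5 bytes per parameter

print(f"fp16 weights: ~{fp16_gb:.0f} GB, 4-bit weights: ~{int4_gb:.0f} GB")
```

This is the difference between needing an A100 just to hold the weights and fitting the model on a single consumer GPU.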

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig
from trl import SFTTrainer
from transformers import TrainingArguments

# Model ID (a gated repo: accept the license on the Hub and log in
# with `huggingface-cli login` before downloading)
model_id = "meta-llama/Meta-Llama-3-8B"

# BitsAndBytesConfig for 4-bit quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16
)

# LoRA config
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
)
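To see why only a tiny fraction of weights is trained, we can count the parameters LoRA adds: for a (d_out, d_in) weight matrix, LoRA trains A (r × d_in) and B (d_out × r), i.e. r · (d_in + d_out) parameters. The projection shapes below are an assumption for illustration (Llama-3-8B-style: hidden size 4096, grouped-query attention with 1024-dim k/v projections, 32 decoder layers):

```python
# Trainable parameters LoRA adds per adapted weight matrix.
def lora_params(d_in, d_out, r):
    return r * (d_in + d_out)

r = 16
# Assumed Llama-3-8B-style projection shapes -- illustrative.
shapes = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
}
per_layer = sum(lora_params(d_in, d_out, r) for d_in, d_out in shapes.values())
total = per_layer * 32  # 32 decoder layers
print(f"trainable LoRA params: ~{total / 1e6:.1f}M (vs. ~8B frozen weights)")
```

Roughly 14M trainable parameters against 8 billion frozen ones, which is why the optimizer state stays small and training fits in modest VRAM.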

# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 tokenizer has no pad token by default

# Training arguments
training_args = TrainingArguments(
    output_dir="./llama3-8b-finetuned",
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    learning_rate=2e-4,
    num_train_epochs=3,
    logging_steps=10,
)
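One detail worth noting in the arguments above: the effective batch size is the per-device batch size times the gradient accumulation steps (times the number of GPUs). A quick check, assuming a single-GPU run:

```python
# Effective batch size = per-device batch * accumulation steps * GPUs.
per_device_train_batch_size = 4
gradient_accumulation_steps = 4
num_gpus = 1  # assumption: a single-GPU run

effective_batch_size = (per_device_train_batch_size
                        * gradient_accumulation_steps
                        * num_gpus)
print(effective_batch_size)  # -> 16
```

If you hit out-of-memory errors, lower per_device_train_batch_size and raise gradient_accumulation_steps to keep the effective batch size the same.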

# SFTTrainer (note: in newer TRL releases, dataset_text_field and
# max_seq_length move into SFTConfig and the `tokenizer` argument is
# renamed `processing_class`; the call below matches TRL ~0.8/0.9)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    peft_config=lora_config,
    dataset_text_field="text",  # name of the text column in the dataset
    tokenizer=tokenizer,
    args=training_args,
    max_seq_length=512,
)

# Start training
trainer.train()

Explanation: This code loads the Llama 3 8B model in 4-bit precision, applies the LoRA configuration, and then fine-tunes it on the dataset using SFTTrainer. Only the adapter weights are trained, not the full model.

💡 Pro Tips
