How to Fine-Tune Open LLMs in 2025 🚀
These days, fine-tuning open-source LLMs like Llama 3, Mistral, and Gemma has become very easy.
With this process, you can customize these powerful models for your own specific tasks.
In this tutorial, we'll learn how to fine-tune LLMs efficiently using Hugging Face's modern libraries (TRL, PEFT, bitsandbytes).
Step 1: Environment Setup 🛠️
First, we need to install the required libraries. They will help us with training, model optimization, and dataset handling.
pip install -U transformers datasets accelerate peft trl bitsandbytes
What each library does:
- transformers: loads Hugging Face models and tokenizers.
- datasets: loads and processes training data.
- accelerate: handles device placement and distributed training.
- peft: Parameter-Efficient Fine-Tuning (LoRA).
- trl: simplifies Supervised Fine-Tuning (SFT).
- bitsandbytes: quantizes the model (QLoRA) to save memory.
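To verify that everything installed correctly, a quick version check can help (a minimal sketch; the exact version numbers on your machine will differ):
# Sanity check: print the installed version of each key library
import transformers, datasets, accelerate, peft, trl, bitsandbytes
for lib in (transformers, datasets, accelerate, peft, trl, bitsandbytes):
    print(lib.__name__, lib.__version__)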
Step 2: Preparing the Dataset 📚
For fine-tuning, your dataset must be in a specific format. For chat models, the "chat template" format works best: each example contains roles (user, assistant) and their content.
Here we'll use the sample dataset "mlabonne/guanaco-llama2-1k".
from datasets import load_dataset
# Load the dataset
dataset = load_dataset("mlabonne/guanaco-llama2-1k", split="train")
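It's worth printing one example to see the expected format. This dataset ships with a single pre-formatted text column; if you are starting from raw chat data instead, the tokenizer's apply_chat_template method produces a similarly formatted string (a minimal sketch; the messages below are a made-up example, and the Instruct tokenizer is assumed because the base model has no chat template):
# Print the start of one pre-formatted training example
print(dataset[0]["text"][:300])
# For raw chat data, format it with the tokenizer's chat template
from transformers import AutoTokenizer
tok = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [
    {"role": "user", "content": "What is fine-tuning?"},
    {"role": "assistant", "content": "Adapting a pretrained model to a new task."},
]
print(tok.apply_chat_template(messages, tokenize=False))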
Step 3: Fine-Tuning the Model (SFT) 🏋️‍♂️
Now we'll start the training process. We'll use QLoRA (Quantized Low-Rank Adaptation), which delivers strong performance with a small memory footprint.
The SFTTrainer from the TRL library makes this job very simple.
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    BitsAndBytesConfig,
    TrainingArguments,
)
from peft import LoraConfig
from trl import SFTTrainer
# Model ID
model_id = "meta-llama/Meta-Llama-3-8B"
# BitsAndBytesConfig for 4-bit quantization (QLoRA)
bnb_config = BitsAndBytesConfig(
load_in_4bit=True,
bnb_4bit_quant_type="nf4",
bnb_4bit_compute_dtype=torch.bfloat16
)
# LoRA config
lora_config = LoraConfig(
r=16,
lora_alpha=32,
lora_dropout=0.05,
target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
task_type="CAUSAL_LM",
)
# Load the model and tokenizer
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config, device_map="auto")
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama 3 ships without a pad token; training needs one
# Training arguments
training_args = TrainingArguments(
output_dir="./llama3-8b-finetuned",
per_device_train_batch_size=4,
gradient_accumulation_steps=4,
learning_rate=2e-4,
num_train_epochs=3,
logging_steps=10,
)
# SFTTrainer (note: in newer TRL releases, dataset_text_field, tokenizer,
# and max_seq_length move into trl.SFTConfig instead of being passed here)
trainer = SFTTrainer(
model=model,
train_dataset=dataset,
peft_config=lora_config,
dataset_text_field="text", # Name of the text column in the dataset
tokenizer=tokenizer,
args=training_args,
max_seq_length=512,
)
# Start training
trainer.train()
Explanation: This code loads the Llama 3 8B model in 4-bit precision, applies the LoRA configuration, and then fine-tunes it on the dataset with SFTTrainer. Only the adapter weights are trained, not the whole model.
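After training, only the small adapter weights need to be saved. To get a standalone model for deployment, the adapter can be merged back into the base weights with PEFT (a minimal sketch; it reloads the base model in bfloat16 because merging into a 4-bit model is not supported):
# Save only the LoRA adapter weights (small compared to the full model)
trainer.save_model("./llama3-8b-finetuned")
# Optional: merge the adapter into the base model for standalone deployment
from peft import PeftModel
base = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)
merged = PeftModel.from_pretrained(base, "./llama3-8b-finetuned").merge_and_unload()
merged.save_pretrained("./llama3-8b-merged")
tokenizer.save_pretrained("./llama3-8b-merged")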
💡 Pro Tips
- Model Selection: Choose the right base model for your task and hardware (e.g., Llama 3 8B, Mistral 7B).
- Dataset Quality: Fine-tuning results depend on the quality of your dataset. Use clean, relevant data.
- LoRA Parameters: Adjust r (rank) and lora_alpha to control the adapter's capacity. r=16 with alpha=32 is a good starting point.
- Hardware: With QLoRA, you can fine-tune a model like Llama 3 8B even on a single consumer GPU (e.g., an NVIDIA T4 or any GPU with 24 GB of VRAM).
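To wrap up, a quick generation test confirms that the fine-tuned model responds as expected (a minimal sketch, run in the same session as training; the [INST] prompt format matches this guanaco dataset):
# Smoke test: generate a reply from the fine-tuned model
from transformers import pipeline
pipe = pipeline("text-generation", model=trainer.model, tokenizer=tokenizer)
prompt = "<s>[INST] What is QLoRA in one sentence? [/INST]"
print(pipe(prompt, max_new_tokens=100)[0]["generated_text"])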