How to Fine-Tune DeepSeek-R1 for Your Custom Dataset (Step-by-Step)
In the rapidly evolving field of natural language processing, fine-tuning pre-trained language models has become an essential technique for adapting models to specific tasks and domains. This guide provides a comprehensive walkthrough of fine-tuning the DeepSeek-R1 model using Unsloth, an optimized framework that enables efficient fine-tuning even on resource-constrained hardware.
Introduction to Model Fine-Tuning
Fine-tuning allows practitioners to adapt pre-trained language models to specific tasks or datasets by training them on new examples. While traditionally performed using Hugging Face's Transformers library, this process often requires substantial computational resources. Unsloth addresses these challenges by:
- Reducing memory usage
- Speeding up training and model downloads (Unsloth ships pre-quantized 4-bit checkpoints)
- Implementing Low-Rank Adaptation (LoRA) for efficient parameter tuning (the quick parameter count below illustrates the savings)
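To get a feel for why LoRA is so cheap, consider a single 4096 × 4096 projection matrix (a typical hidden size for an 8B Llama-family model, used here purely for illustration). Full fine-tuning would update all 4096 × 4096 ≈ 16.8M weights of that layer, whereas a rank-16 LoRA adapter trains only two small factors of shape 4096 × 16 and 16 × 4096, i.e. 2 × 4096 × 16 ≈ 131K parameters, under 1% of the layer. This is what makes the adapter setup in Step 3 feasible on a single consumer GPU.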
Step 1: Environment Setup
Begin by installing the necessary libraries:
%%capture
# Install Unsloth, then force-reinstall the latest commit from GitHub (skipping dependencies)
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
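Before loading the model, it is worth confirming that a CUDA GPU is visible, since the 4-bit loading in the next step needs one. A minimal sanity check (not part of the original notebook) could be:
import torch

# The 4-bit quantized model in Step 2 requires a CUDA-capable GPU.
assert torch.cuda.is_available(), "No CUDA GPU detected - switch to a GPU runtime"
print(torch.cuda.get_device_name(0))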
Step 2: Model Initialization
Load the DeepSeek-R1 model with optimized configurations:
from unsloth import FastLanguageModel
import torch
# Configuration parameters
max_seq_length = 2048  # maximum context length used during training
dtype = None           # None lets Unsloth auto-detect (bfloat16 on newer GPUs, float16 otherwise)
load_in_4bit = True    # load 4-bit quantized weights to cut GPU memory usage
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit,
)
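To confirm how much GPU memory the 4-bit weights actually take, the returned model should still expose the standard transformers get_memory_footprint() helper (this assumes Unsloth keeps the stock transformers model class):
# Rough size of the quantized model in GPU memory (transformers helper method).
print(f"Model footprint: {model.get_memory_footprint() / 1024**3:.2f} GiB")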
Step 3: Implementing LoRA Adapters
Apply Low-Rank Adaptation for efficient fine-tuning:
model = FastLanguageModel.get_peft_model(
    model,
    r=16,                                  # LoRA rank: size of the low-rank update matrices
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],  # attention and MLP projections to adapt
    lora_alpha=16,                         # scaling factor applied to the LoRA updates
    lora_dropout=0,                        # 0 is the setting Unsloth optimizes for
    bias="none",
    use_gradient_checkpointing="unsloth",  # memory-efficient checkpointing for long contexts
    random_state=3407,
    use_rslora=False,
    loftq_config=None,
)
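As an optional sanity check (not in the original guide), you can count how many parameters the LoRA setup actually leaves trainable:
# Count trainable (LoRA) parameters versus the total.
# Note: 4-bit layers report packed parameter counts, so the total is only approximate.
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"Trainable parameters: {trainable:,} of ~{total:,} ({100 * trainable / total:.2f}%)")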
Step 4: Dataset Preparation
Load and preprocess the training dataset:
from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

# ShareGPT-style mental health counselling conversations
dataset = load_dataset("Sulav/mental_health_counseling_conversations_sharegpt", split="train")
dataset = standardize_sharegpt(dataset)  # normalize to the role/content schema Unsloth expects
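One important detail: the SFTTrainer in the next step reads a text column (dataset_text_field="text"), but standardize_sharegpt only normalizes the conversations field. Following the pattern used in Unsloth's conversational notebooks, you can render each conversation into that column with the Llama 3.1 chat template (the formatting_prompts_func name below is just a label for this sketch):
from unsloth.chat_templates import get_chat_template

# Attach the Llama 3.1 chat template so conversations render with the right special tokens.
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

def formatting_prompts_func(examples):
    # Turn each conversation into a single training string stored under the "text" column.
    convos = examples["conversations"]
    texts = [tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
             for convo in convos]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)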
Step 5: Training Configuration
Set up the training parameters using SFTTrainer:
from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",          # the "text" column created during dataset preparation
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,                   # short demonstration run
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),   # pick precision based on GPU support
        logging_steps=1,
        optim="adamw_8bit",             # 8-bit optimizer to save memory
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)
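With per_device_train_batch_size=2 and gradient_accumulation_steps=4, gradients are accumulated over four micro-batches, giving an effective batch size of 2 × 4 = 8 sequences per optimizer step. max_steps=60 therefore covers roughly 480 training examples, which is fine for a quick demonstration; for a serious fine-tune you would typically drop max_steps and set num_train_epochs instead.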
Step 6: Model Training
Mask the loss so that only the assistant responses contribute to training, then launch the run:
from unsloth.chat_templates import train_on_responses_only
trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)
trainer_stats = trainer.train()
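trainer.train() returns a summary object; its metrics dictionary (the standard transformers Trainer output) gives a quick view of run time and final loss:
# Basic statistics from the training run (standard transformers Trainer metrics).
print(f"Training time: {trainer_stats.metrics.get('train_runtime', 0):.1f} s")
print(f"Final training loss: {trainer_stats.metrics.get('train_loss')}")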
Step 7: Model Inference
Generate responses using the fine-tuned model:
from unsloth.chat_templates import get_chat_template
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")  # safe to re-apply if inference runs in a fresh session
tokenizer.pad_token = tokenizer.eos_token  # Llama 3.1 has no dedicated pad token, so reuse EOS
FastLanguageModel.for_inference(model)     # switch Unsloth to its faster inference mode
messages = [{"role": "user", "content": "I am sad because I failed my Maths test today"}]
inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    padding=True,
).to("cuda")
attention_mask = (inputs != tokenizer.pad_token_id).long()  # explicit mask, since pad == eos
outputs = model.generate(
    input_ids=inputs,
    attention_mask=attention_mask,
    max_new_tokens=64,
    use_cache=True,
    temperature=0.6,
    min_p=0.1,
)
text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
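For longer replies it is often nicer to stream tokens as they are generated. transformers provides a TextStreamer (also used in Unsloth's own notebooks), so a streaming variant of the call above could look like this:
from transformers import TextStreamer

# Print tokens to stdout as they are generated instead of waiting for the full reply.
streamer = TextStreamer(tokenizer, skip_prompt=True)
_ = model.generate(
    input_ids=inputs,
    attention_mask=attention_mask,
    streamer=streamer,
    max_new_tokens=256,
    use_cache=True,
    temperature=0.6,
    min_p=0.1,
)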
Step 8: Model Persistence
Save the fine-tuned model for future use:
my_model = "MindSeek-8B"
model.save_pretrained(my_model)
tokenizer.save_pretrained(my_model)
# Push to the Hugging Face Hub (requires logging in with a Hub access token)
model.push_to_hub("your_name/your_model_name")
tokenizer.push_to_hub("your_name/your_model_name")
# Save in GGUF format and push to the Hub (run in its own notebook cell so %%capture stays the first line)
%%capture
model.push_to_hub_gguf(my_model, tokenizer, quantization_method="q4_k_m")
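To reuse the fine-tuned adapters in a fresh session, they can be loaded back through the same FastLanguageModel.from_pretrained call by pointing model_name at the local folder (or your Hub repo); this mirrors the loading code from Step 2:
from unsloth import FastLanguageModel

# Reload the saved LoRA adapters on top of the base model for inference.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="MindSeek-8B",   # local folder saved above, or "your_name/your_model_name"
    max_seq_length=2048,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)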
Best Practices
When working with DeepSeek-R1 models, consider these recommendations, illustrated with a sample generation call after the list:
- Maintain temperature settings between 0.5 and 0.7 for optimal response quality
- Incorporate all necessary instructions directly within user prompts
- For mathematical tasks, include explicit step-by-step instructions
- Conduct multiple test runs and average results for reliable performance evaluation
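Applied to the generate call from Step 7, those recommendations translate into sampling settings along these lines (inputs and attention_mask come from Step 7; the top_p value is an assumption, not something the guide prescribes):
# Illustrative sampling settings that follow the recommendations above.
outputs = model.generate(
    input_ids=inputs,
    attention_mask=attention_mask,
    max_new_tokens=512,   # leave room for step-by-step reasoning
    do_sample=True,
    temperature=0.6,      # inside the recommended 0.5-0.7 range
    top_p=0.95,           # assumed value; tune for your use case
    use_cache=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))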
Conclusion
This guide has demonstrated an efficient approach to fine-tuning the DeepSeek-R1 model using Unsloth. By following these steps, practitioners can adapt large language models to specific use cases while optimizing resource utilization. For further exploration, consult the Unsloth Documentation and GitHub repository.
About the Author
Dawood Shahzad is a machine learning engineer specializing in computer vision, deep learning, and AI applications. A graduate in Artificial Intelligence from Bahria University, Dawood has developed innovative solutions in areas including chatbot development, predictive modeling, and AI-powered medical tools. He is the winner of the GDG Kolachi Software House AI Hackathon and has contributed to projects like PsyScribe and Novik AI Assistant. Connect with Dawood on LinkedIn or explore his work on GitHub.