Fine-Tuning DeepSeek-R1 for Custom Datasets: A Step-by-Step Guide

In the rapidly evolving field of natural language processing, fine-tuning pre-trained language models has become an essential technique for adapting models to specific tasks and domains. This guide provides a comprehensive walkthrough of fine-tuning the DeepSeek-R1 model using Unsloth, an optimized framework that enables efficient fine-tuning even on resource-constrained hardware.

Introduction to Model Fine-Tuning

Fine-tuning allows practitioners to adapt pre-trained language models to specific tasks or datasets by training them on new examples. While traditionally performed using Hugging Face's Transformers library, this process often requires substantial computational resources. Unsloth addresses these challenges by:

  • Reducing memory usage
  • Accelerating download speeds
  • Implementing Low-Rank Adaptation (LoRA) for efficient parameter tuning (illustrated in the sketch after this list)
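To make the LoRA point concrete, here is a small illustrative calculation (plain arithmetic, not part of the Unsloth API) comparing the trainable weights in full fine-tuning with a rank-16 LoRA update for a single 4096 × 4096 projection matrix:

# Rough, illustrative comparison for one weight matrix of size d x d.
d, r = 4096, 16

full_params = d * d       # full fine-tuning: every weight is trainable
lora_params = 2 * d * r   # LoRA: only the low-rank factors A (d x r) and B (r x d)

print(f"Full fine-tuning: {full_params:,} trainable weights per matrix")
print(f"LoRA (r=16):      {lora_params:,} trainable weights per matrix")
print(f"Reduction:        ~{full_params / lora_params:.0f}x fewer trainable weights")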

Step 1: Environment Setup

Begin by installing the necessary libraries:

%%capture
!pip install unsloth
!pip install --force-reinstall --no-cache-dir --no-deps git+https://github.com/unslothai/unsloth.git
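Before loading the model, it can help to confirm that a CUDA GPU is visible and how much memory it has. This optional check uses only standard PyTorch calls:

import torch

# Quick sanity check of the available hardware (optional).
print("CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    print("GPU:", props.name)
    print("VRAM (GB):", round(props.total_memory / 1024**3, 1))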

Step 2: Model Initialization

Load the DeepSeek-R1 model with optimized configurations:

from unsloth import FastLanguageModel
import torch

# Configuration parameters
max_seq_length = 2048   # maximum context length used for training and inference
dtype = None            # auto-detect: bfloat16 on newer GPUs, float16 otherwise
load_in_4bit = True     # load the weights in 4-bit quantization to reduce VRAM usage

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/DeepSeek-R1-Distill-Llama-8B",
    max_seq_length=max_seq_length,
    dtype=dtype,
    load_in_4bit=load_in_4bit
)
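Optionally, you can check how much GPU memory the 4-bit model occupies once loaded; this rough sketch relies only on PyTorch's memory counters:

import torch

# Report reserved GPU memory after loading the 4-bit model (optional).
gpu = torch.cuda.get_device_properties(0)
reserved_gb = round(torch.cuda.max_memory_reserved() / 1024**3, 2)
print(f"{gpu.name}: {reserved_gb} GB reserved after loading the model")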

Step 3: Implementing LoRA Adapters

Apply Low-Rank Adaptation for efficient fine-tuning:

model = FastLanguageModel.get_peft_model(
    model,
    r=16,                          # LoRA rank: dimensionality of the low-rank update matrices
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj", "gate_proj", "up_proj", "down_proj"],
    lora_alpha=16,                 # scaling factor applied to the LoRA updates
    lora_dropout=0,
    bias="none",
    use_gradient_checkpointing="unsloth",  # Unsloth's memory-efficient gradient checkpointing
    random_state=3407,
    use_rslora=False,
    loftq_config=None
)
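To confirm how small the trainable footprint is, you can print the parameter counts. The print_trainable_parameters() method comes from the underlying PEFT wrapper that get_peft_model returns, so treat this as a sketch:

# Reports trainable vs. total parameters; with r=16 on an 8B model this is
# typically well under 1% of the weights.
model.print_trainable_parameters()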

Step 4: Dataset Preparation

Load and preprocess the training dataset:

from datasets import load_dataset
from unsloth.chat_templates import standardize_sharegpt

dataset = load_dataset("Sulav/mental_health_counseling_conversations_sharegpt", split="train")
dataset = standardize_sharegpt(dataset)
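The SFTTrainer in Step 5 reads a plain text column (dataset_text_field="text"), so the standardized ShareGPT conversations still need to be rendered into training strings. The sketch below follows the usual Unsloth pattern and assumes the dataset exposes a conversations column after standardize_sharegpt; the helper name formatting_prompts_func is just illustrative:

from unsloth.chat_templates import get_chat_template

# Attach the Llama 3.1 chat template so each conversation is rendered with the
# <|start_header_id|> ... <|end_header_id|> markers used later in Step 6.
tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")

def formatting_prompts_func(examples):
    convos = examples["conversations"]
    texts = [
        tokenizer.apply_chat_template(convo, tokenize=False, add_generation_prompt=False)
        for convo in convos
    ]
    return {"text": texts}

dataset = dataset.map(formatting_prompts_func, batched=True)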

Step 5: Training Configuration

Set up the training parameters using SFTTrainer:

from trl import SFTTrainer
from transformers import TrainingArguments, DataCollatorForSeq2Seq
from unsloth import is_bfloat16_supported

trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
    data_collator=DataCollatorForSeq2Seq(tokenizer=tokenizer),
    dataset_num_proc=2,
    packing=False,
    args=TrainingArguments(
        per_device_train_batch_size=2,
        gradient_accumulation_steps=4,
        warmup_steps=5,
        max_steps=60,
        learning_rate=2e-4,
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
        logging_steps=1,
        optim="adamw_8bit",
        weight_decay=0.01,
        lr_scheduler_type="linear",
        seed=3407,
        output_dir="outputs",
        report_to="none",
    ),
)
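Note how per_device_train_batch_size and gradient_accumulation_steps interact: together they determine how many examples each optimizer update actually sees, which is the number to adjust when trading memory for gradient stability:

# Effective batch size per optimizer step with the settings above.
per_device_train_batch_size = 2
gradient_accumulation_steps = 4
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 8 examples per weight update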

Step 6: Model Training

Execute the training process. The train_on_responses_only helper masks the loss on the user turns so the model is optimized only on the assistant's responses:

from unsloth.chat_templates import train_on_responses_only

trainer = train_on_responses_only(
    trainer,
    instruction_part="<|start_header_id|>user<|end_header_id|>\n\n",
    response_part="<|start_header_id|>assistant<|end_header_id|>\n\n",
)

trainer_stats = trainer.train()
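The returned trainer_stats object is the standard transformers TrainOutput, so its metrics can be inspected after the run; a small sketch (field names follow the usual transformers convention):

# Inspect runtime and final loss from the training run.
print(f"Training runtime: {trainer_stats.metrics['train_runtime']:.1f} s")
print(f"Final training loss: {trainer_stats.training_loss:.4f}")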

Step 7: Model Inference

Generate responses using the fine-tuned model:

from unsloth.chat_templates import get_chat_template

tokenizer = get_chat_template(tokenizer, chat_template="llama-3.1")
tokenizer.pad_token = tokenizer.eos_token
FastLanguageModel.for_inference(model)

messages = [{"role": "user", "content": "I am sad because I failed my Maths test today"}]

inputs = tokenizer.apply_chat_template(
    messages,
    tokenize=True,
    add_generation_prompt=True,
    return_tensors="pt",
    padding=True,
).to("cuda")

attention_mask = inputs != tokenizer.pad_token_id

outputs = model.generate(
    input_ids=inputs,
    attention_mask=attention_mask,
    max_new_tokens=64,
    use_cache=True,
    temperature=0.6,
    min_p=0.1,
)

text = tokenizer.decode(outputs[0], skip_special_tokens=True)
print(text)
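For longer replies, you can stream tokens as they are generated instead of waiting for the full decode; this variant reuses the inputs and attention_mask prepared above with transformers' TextStreamer:

from transformers import TextStreamer

# Stream the reply token by token, skipping the prompt and special tokens.
streamer = TextStreamer(tokenizer, skip_prompt=True, skip_special_tokens=True)
_ = model.generate(
    input_ids=inputs,
    attention_mask=attention_mask,
    streamer=streamer,
    max_new_tokens=256,
    use_cache=True,
    temperature=0.6,
    min_p=0.1,
)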

Step 8: Model Persistence

Save the fine-tuned model for future use:

my_model = "MindSeek-8B"

# Save the LoRA adapters and tokenizer locally
model.save_pretrained(my_model)
tokenizer.save_pretrained(my_model)

# Push to the Hugging Face Hub (requires being logged in with a write-access token)
model.push_to_hub("your_name/your_model_name")
tokenizer.push_to_hub("your_name/your_model_name")

# Export a GGUF-quantized version (q4_k_m) and push it to the Hub
model.push_to_hub_gguf(my_model, tokenizer, quantization_method="q4_k_m")
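To use the fine-tuned adapters later, they can be reloaded the same way the base model was loaded in Step 2; this sketch assumes the local MindSeek-8B directory saved above (a Hub repository id works the same way):

from unsloth import FastLanguageModel

# Reload the fine-tuned adapters from the local save directory.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="MindSeek-8B",   # or "your_name/your_model_name" on the Hub
    max_seq_length=2048,
    dtype=None,
    load_in_4bit=True,
)
FastLanguageModel.for_inference(model)  # switch to inference mode before generating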

Best Practices

When working with DeepSeek-R1 models, consider these recommendations:

  1. Maintain temperature settings between 0.5 and 0.7 for optimal response quality
  2. Incorporate all necessary instructions directly within user prompts
  3. For mathematical tasks, include explicit step-by-step instructions
  4. Conduct multiple test runs and average results for reliable performance evaluation (see the sketch after this list)
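To apply recommendations 1 and 4 in practice, you can sample the same prompt several times at a temperature within the suggested range and compare the outputs; this small sketch reuses the inputs and attention_mask prepared in Step 7:

# Generate a few independent samples at temperature 0.6 and inspect them together.
responses = []
for _ in range(3):
    out = model.generate(
        input_ids=inputs,
        attention_mask=attention_mask,
        max_new_tokens=128,
        do_sample=True,
        temperature=0.6,
        min_p=0.1,
    )
    responses.append(tokenizer.decode(out[0], skip_special_tokens=True))

for i, r in enumerate(responses, 1):
    print(f"--- Sample {i} ---\n{r}\n")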

Conclusion

This guide has demonstrated an efficient approach to fine-tuning the DeepSeek-R1 model using Unsloth. By following these steps, practitioners can adapt large language models to specific use cases while optimizing resource utilization. For further exploration, consult the Unsloth Documentation and GitHub repository.


About the Author
Dawood Shahzad is a machine learning engineer specializing in computer vision, deep learning, and AI applications. A graduate in Artificial Intelligence from Bahria University, Dawood has developed innovative solutions in areas including chatbot development, predictive modeling, and AI-powered medical tools. He is the winner of the GDG Kolachi Software House AI Hackathon and has contributed to projects like PsyScribe and Novik AI Assistant. Connect with Dawood on LinkedIn or explore his work on GitHub.
