Qwen3-30B-A3B-Base inference speed issue (Python version problem)

https://huggingface.co/Qwen/Qwen3-30B-A3B-Base

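The link above is the model card. I was working on building an LLM that uses MCP. Qwen models make MCP easy to use. They have a switch that turns chain-of-thought (CoT) reasoning on or off during inference. As an A3B model, this one activates only about 3B of its roughly 30B parameters per token. My setup was two H100 GPUs with LoRA, and it really did not go smoothly.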

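While troubleshooting, I noticed that my Python virtual environment was on 3.12.8. With that version, inference took as long as 8 minutes, which puzzled me. After repeated attempts, I found that speed is back to normal from Python 3.12.9 onward.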
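
The slowdown appeared on 3.12.8 and went away from 3.12.9 in this setup. One cheap safeguard is to check the interpreter version before starting a long job. This is a minimal sketch. `MIN_PYTHON` and `check_python` are hypothetical names. The 3.12.9 cutoff is the empirical threshold observed here, not a documented CPython requirement.

```python
import sys

# Empirical cutoff from this post (3.12.8 was slow, 3.12.9+ was normal);
# not an official CPython or PyTorch requirement.
MIN_PYTHON = (3, 12, 9)


def check_python(version_info=sys.version_info, minimum=MIN_PYTHON) -> bool:
    """Return True if (major, minor, micro) of the interpreter is at least `minimum`."""
    return tuple(version_info[:3]) >= minimum


if __name__ == "__main__":
    current = ".".join(str(part) for part in sys.version_info[:3])
    if check_python():
        print(f"Python {current}: OK")
    else:
        print(f"Python {current} is older than 3.12.9; inference may be unexpectedly slow")
```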

On H100, the following steps should work:

conda create --name Qwen_3_12 "python>=3.12.9,<3.13"   # 3.12.9 or later; 3.12.8 was slow in this setup

conda activate Qwen_3_12

pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128   # PyTorch wheels built for CUDA 12.8

pip install "transformers>=4.51.0"   # or simply: pip install --upgrade transformers

pip install accelerate
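
To confirm that the installs above landed in the active environment, you can run a small check. It uses only the standard library's `importlib.metadata` to print the installed versions and verify the `transformers>=4.51.0` requirement. This is a sketch. `parse_version` is a hypothetical helper that reads only the leading `X.Y.Z` of a version string, not the full PEP 440 grammar.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_version(text: str) -> tuple:
    """Parse the leading X.Y[.Z] of a version string, e.g. '2.7.0+cu128' -> (2, 7, 0)."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", text)
    if match is None:
        raise ValueError(f"unrecognised version string: {text!r}")
    return tuple(int(part) for part in match.groups(default="0"))


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def report(packages=("torch", "transformers", "accelerate")) -> dict:
    """Print installed versions and whether transformers meets the >= 4.51.0 requirement."""
    found = {name: installed_version(name) for name in packages}
    for name, ver in found.items():
        print(f"{name:>12}: {ver or 'not installed'}")
    tf = found.get("transformers")
    ok = tf is not None and parse_version(tf) >= (4, 51, 0)
    print("transformers >= 4.51.0:", "yes" if ok else "NO")
    return found


if __name__ == "__main__":
    report()
```

On the training machine, it is also worth printing `torch.version.cuda` and `torch.cuda.device_count()`. These confirm that the cu128 wheel is installed and that both H100s are visible.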

- Fine-tuning with LoRA -
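
The script below expects each JSON record to have three keys. `input` holds the knowledge-graph triples, as a string or a JSON-serializable list. `question` and `answer` hold the text pair. Step 2 flattens these into a single `text` field. The snippet reproduces that formatting on one made-up record so you can see the resulting prompt layout. The record contents are purely hypothetical.

```python
import json


def make_llm_text(example):
    # Same formatting as step 2 of the script: serialize triples, build the prompt, append the answer
    triples = example["input"]
    if not isinstance(triples, str):
        triples = json.dumps(triples, ensure_ascii=False)
    question = example["question"].strip()
    prompt = (
        "Given the following knowledge graph triples:\n"
        f"{triples}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return {"text": prompt + " " + example["answer"].strip()}


# Hypothetical record, for illustration only
sample = {
    "input": [["Seoul", "capital_of", "South Korea"]],
    "question": " What is Seoul the capital of? ",
    "answer": "South Korea",
}
print(make_llm_text(sample)["text"])
```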

import json
import torch
from datasets import Dataset, DatasetDict
from unsloth import FastModel
from trl import SFTConfig, SFTTrainer
from transformers import EarlyStoppingCallback
############################################
# 1) JSON Dataset 직접 로드
############################################
train_path = r"/home/work/for_train/train_data/llm_model_train.json"
valid_path = r"/home/work/for_train/train_data/llm_model_validation.json"

with open(train_path, 'r', encoding='utf-8') as f:
    train_list = json.load(f)
with open(valid_path, 'r', encoding='utf-8') as f:
    valid_list = json.load(f)

ds = DatasetDict({
    "train": Dataset.from_list(train_list),
    "validation": Dataset.from_list(valid_list)
})

############################################
# 2) Build the "text" field
############################################

def make_llm_text(example):
    triples   = example["input"]
    if not isinstance(triples, str):
        triples = json.dumps(triples, ensure_ascii=False)
    question  = example["question"].strip()
    prompt    = (
        f"Given the following knowledge graph triples:\n"
        f"{triples}\n\n"
        f"Question: {question}\n"
        f"Answer:"
    )
    label     = example["answer"].strip()
    return {"text": prompt + " " + label}

ds = ds.map(make_llm_text, remove_columns=["input","question","answer"])

############################################
# 3) The rest follows the existing script…
############################################
# Load Gemma-3 → set up LoRA → build SFTTrainer → trainer.train(), etc.
model_name = "./local_gemma3"  # path to the base Gemma-3 model

# Alternative: plain transformers 4-bit (bitsandbytes NF4) load. It is commented out because
# FastModel.from_pretrained below loads the model again and overwrites `model`,
# so running both would load the weights twice.
# from transformers import AutoModelForCausalLM, BitsAndBytesConfig
# bnb_config = BitsAndBytesConfig(
#     load_in_4bit=True,
#     bnb_4bit_use_double_quant=True,
#     bnb_4bit_quant_type="nf4",
#     bnb_4bit_compute_dtype=torch.bfloat16,
# )
# model = AutoModelForCausalLM.from_pretrained(
#     model_name,
#     quantization_config=bnb_config,
#     device_map="auto",                  # ← important: spread across all available GPUs
#     torch_dtype=torch.bfloat16,
# )
############################################
# 3) Load the Gemma-3 model
############################################

model, tokenizer = FastModel.from_pretrained(
    model_name,
    dtype=torch.bfloat16,
    load_in_4bit=False,
    load_in_8bit=False,
    full_finetuning=False,
    max_seq_length=2048,
    device_map="auto",                # ← spread automatically across all GPUs
    max_memory={                       # ← per-GPU memory cap
        0: "60GB",
        1: "60GB"
    }
)

model.config.use_cache = False
if hasattr(model.config, "gradient_checkpointing"):
    model.config.gradient_checkpointing = False

############################################
# 4) LoRA 적용
############################################
model = FastModel.get_peft_model(
    model,
    finetune_vision_layers     = False,  # text only; vision layers are not trained
    finetune_language_layers   = True,
    finetune_attention_modules = True,
    finetune_mlp_modules       = True,
    r           = 256,
    lora_alpha  = 32,
    lora_dropout= 0.05,
    bias        = "none",
)

############################################
# 5) SFTTrainer 구성
############################################
train_args = SFTConfig(
    dataset_text_field           = "text",
    output_dir                   = "FT_gemma3_KGQA_llm_27b",
    per_device_train_batch_size  = 2,
    gradient_accumulation_steps  = 4,
    learning_rate                = 2e-4,
    num_train_epochs             = 3,
    logging_steps                = 100,
    save_strategy                = "steps",
    eval_strategy                = "steps",         # evaluate on a step schedule
    eval_steps                   = 500,             # evaluate every 500 steps
    save_steps                   = 500,             # save a checkpoint every 500 steps
    load_best_model_at_end       = True,            # reload the best checkpoint when training ends
    metric_for_best_model        = "eval_loss",     # metric used to pick the best checkpoint
    greater_is_better            = False,           # lower eval_loss is better
    warmup_ratio                 = 0.1,
    fp16                         = False,
    bf16                         = True,
    report_to                    = "none",
)

trainer = SFTTrainer(
    model          = model,
    tokenizer      = tokenizer,
    train_dataset  = ds["train"],
    eval_dataset   = ds["validation"],
    args           = train_args,
    callbacks      = [EarlyStoppingCallback(early_stopping_patience=2)]
)

############################################
# 6) Train & save
############################################
trainer.train()
trainer.save_model()   # for a LoRA (PEFT) model this saves the adapter weights
print("✅ LLM (KGQA) LoRA fine-tuning complete!")
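
The hyperparameters in steps 4 and 5 imply a few numbers worth checking before a long run on the H100s. The plain-Python sketch below computes them:

- the LoRA update scale (standard LoRA multiplies B·A by lora_alpha / r, assuming `use_rslora` is left off);
- the adapter size for a hypothetical 4096 × 4096 projection;
- the effective batch size;
- the approximate number of optimizer steps and evaluations.

Because `device_map="auto"` splits one model across both GPUs inside a single process, this is model parallelism rather than data parallelism. The sketch therefore assumes a data-parallel world size of 1. `evals_before_stop` is a simplified model of `EarlyStoppingCallback(early_stopping_patience=2)` with its default zero threshold, not the callback's actual code. The dataset size and loss values are made up.

```python
import math


# --- Step 4: LoRA (standard scaling, assuming use_rslora is off) ---
def lora_scaling(r: int, lora_alpha: int) -> float:
    # The low-rank update B @ A is multiplied by lora_alpha / r
    return lora_alpha / r


def lora_params(d_out: int, d_in: int, r: int) -> int:
    # A has shape (r, d_in) and B has shape (d_out, r)
    return r * (d_in + d_out)


# --- Step 5: batch size and evaluation schedule ---
def effective_batch_size(per_device: int, grad_accum: int, world_size: int = 1) -> int:
    # Samples per optimizer step; world_size counts data-parallel processes, not GPUs
    return per_device * grad_accum * world_size


def planned_steps(num_examples: int, batch: int, epochs: int) -> int:
    # Approximate optimizer steps for the whole run (last partial batch rounded up)
    return math.ceil(num_examples / batch) * epochs


def evals_before_stop(eval_losses, patience=2):
    # Simplified patience rule: stop once `patience` evaluations in a row
    # fail to beat the best eval_loss seen so far
    best, bad = None, 0
    for count, loss in enumerate(eval_losses, start=1):
        if best is None or loss < best:
            best, bad = loss, 0
        else:
            bad += 1
        if bad >= patience:
            return count
    return len(eval_losses)


scale = lora_scaling(r=256, lora_alpha=32)
added = lora_params(4096, 4096, r=256)            # hypothetical 4096 x 4096 projection
batch = effective_batch_size(2, 4)                # per_device=2, grad_accum=4, one process
steps = planned_steps(10_000, batch, epochs=3)    # 10,000 examples is a made-up size
stop = evals_before_stop([1.20, 1.00, 1.05, 1.01, 0.90])  # made-up eval_loss values
print(scale, added, batch, steps, steps // 500, stop)      # 0.125 2097152 8 3750 7 4
```

With these settings the update scale is 0.125. That is well below the often-used alpha = r or alpha = 2r ratios, so `lora_alpha` is one knob to revisit if the adapter learns slowly. In the made-up loss sequence, training stops at the fourth evaluation. Because `load_best_model_at_end=True`, the checkpoint from the second evaluation would be reloaded before `trainer.save_model()`.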

1원장자
