baichuan2模型微调

结论

全参数训练要求配置 4*A100 80G stage 3 勉强可以跑通issue202
轻量化微调在RTX 3090 单卡上需要花费318+小时

前言

因deepspeed不支持在windows，可进行源码安装、但是仅支持部分功能，只有推理能力DeepSpeed

因此，没有办法后使用wsl进行相应的操作.

wsl 设置

升级到wsl2

打开powershell,执行

wsl -l -v

可以看到WSL版本号。如果WSL版本号是1，则需要进行一些操作，才能升级到WSL2。

# 版本升级
wsl --update
# 关闭
wsl --shutdown
# 默认为WSL2，如果不是，可以通过下面命令切换默认为WSL2
wsl --set-default-version 2
# 进入wsl
wsl

挂载硬盘(仅作记录，实际过程中使用不到该步骤)

在wsl中创建挂载锚点

mkdir /mnt/e

创建挂载

sudo mount -t drvfs E: /mnt/e

永久生效

sudo vi /etc/fstab
# 添加新一行
E: /mnt/e drvfs defaults 0 0

重新加载

sudo mount -a

至此挂载搞定，可像普通目录一样访问 ls /mnt/e

注意路径中的磁盘要小写对应

安装ubuntu

wsl --install

自动安装wsl内核以及默认的ubuntu系统

更改WSL2的存放路径

安装系统默认存在地址是c盘，把c盘占满了、所以进行一下迁移.

已安装的WSL的名称和版本

# Powershell中输入以下命令
wsl -l --all -v

导出系统到指定位置

# wsl --export <系统名> <导出目录>
wsl --export Ubuntu D:\wsl-ubuntu.tar

删除当前C盘中的wsl系统

wsl --unregister Ubuntu

导入系统到指定位置

# wsl --import <系统名> <安装位置> <tar文件目录> WSL版本号
wsl --import Ubuntu D:\WSL2_Ubuntu D:\wsl-ubuntu.tar --version 2

配置之前的默认登录用户

# ubuntu.exe config --default-user <用户名>
ubuntu.exe config --default-user xs

windows打开wsl文件系统

\\wsl$

安装conda

下载在官方选择版本下载

可直接在ubuntu里wget下载，也可以在windows下载后复制到ubuntu里

安装

sh Anaconda3-2023.09-0-Linux-x86_64.sh

安装过程中回车 yes(同意协议) yes(conda初始化，添加环境变量)进行安装。

安装完成后，如果存在conda无法使用、可重启ubuntu后查看是否可用

创建环境

conda create -n baichuan2 python=3.10
# 查看环境列表
conda env list
# 激活环境
conda activate baichuan2
# 退出环境
conda deactivate

安装CUDA

访问官网选择对应的版本官方会提供安装命令，执行即可

wget https://developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/cuda-wsl-ubuntu.pin
sudo mv cuda-wsl-ubuntu.pin /etc/apt/preferences.d/cuda-repository-pin-600
wget https://developer.download.nvidia.com/compute/cuda/12.3.1/local_installers/cuda-repo-wsl-ubuntu-12-3-local_12.3.1-1_amd64.deb
sudo dpkg -i cuda-repo-wsl-ubuntu-12-3-local_12.3.1-1_amd64.deb
sudo cp /var/cuda-repo-wsl-ubuntu-12-3-local/cuda-*-keyring.gpg /usr/share/keyrings/
sudo apt-get update
sudo apt-get -y install cuda-toolkit-12-3

nvidia-smi

nvcc和nvidia-smi区别

nvcc 属于CUDA的编译器，将程序编译成可执行的二进制文件，nvidia-smi 全称是 NVIDIA System Management Interface ，是一种命令行实用工具，旨在帮助管理和监控NVIDIA GPU设备。

CUDA有runtime api和 driver api，两者都有对应的CUDA版本，nvcc –version 显示的就是前者对应的CUDA版本，而 nvidia-smi显示的是后者对应的CUDA版本。

模型运行

安装依赖

pip install -r requirements.txt

模型下载由于huggingface下载速度过慢，直接运行程序下载的话还需要翻墙。所以使用https://hf-mirror.com/

可以参照文档进行下载对应的模型。

安装时碰见的其他问题见附录
网页启动

streamlit run web_demo.py

正常启动后，尝试进行下一步模型微调

模型微调

依赖安装

git clone https://github.com/baichuan-inc/Baichuan2.git
cd Baichuan2/fine-tune
pip install -r requirements.txt

如需使用 LoRA 等轻量级微调方法需额外安装 peft
如需使用 xFormers 进行训练加速需额外安装 xFormers

单机训练

hostfile=""
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-13B-Chat" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True

多机训练多机训练只需要给一下 hostfile ，内容类似如下：

ip1 slots=8
ip2 slots=8
ip3 slots=8
ip4 slots=8

同时在训练脚本里面指定 hosftfile 的路径：

hostfile="/path/to/hostfile"
deepspeed --hostfile=$hostfile fine-tune.py  \
    --report_to "none" \
    --data_path "data/belle_chat_ramdon_10k.json" \
    --model_name_or_path "baichuan-inc/Baichuan2-13B-Chat" \
    --output_dir "output" \
    --model_max_length 512 \
    --num_train_epochs 4 \
    --per_device_train_batch_size 16 \
    --gradient_accumulation_steps 1 \
    --save_strategy epoch \
    --learning_rate 2e-5 \
    --lr_scheduler_type constant \
    --adam_beta1 0.9 \
    --adam_beta2 0.98 \
    --adam_epsilon 1e-8 \
    --max_grad_norm 1.0 \
    --weight_decay 1e-4 \
    --warmup_ratio 0.0 \
    --logging_steps 1 \
    --gradient_checkpointing True \
    --deepspeed ds_config.json \
    --bf16 True \
    --tf32 True

轻量化微调代码已经支持轻量化微调如 LoRA，如需使用仅需在上面的脚本中加入以下参数：

--use_lora True

LoRA 具体的配置可见 fine-tune.py 脚本。使用 LoRA 微调后可以使用下面的命令加载模型：

from peft import AutoPeftModelForCausalLM
model = AutoPeftModelForCausalLM.from_pretrained("output", trust_remote_code=True)

碰见问题

要求安装jax

地址：https://github.com/google/jax#installation

Hardware	Instructions
CPU	`pip install -U "jax[cpu]"`
NVIDIA GPU on x86_64	`pip install -U "jax[cuda12_pip]" -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html`
Google TPU	`pip install -U "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html`
AMD GPU	Use Docker or build from source.
Apple GPU	Follow Apple’s instructions.

报错got an unexpected keyword argument ‘enable’

TypeError: _set_gradient_checkpointing() got an unexpected keyword argument 'enable'

降级transformers==4.33.3

The current device_map had weights offloaded to the disk

修改web_demo.py文件里的init_model方法

model = AutoModelForCausalLM.from_pretrained(
        "baichuan-inc/Baichuan2-13B-Chat",
        torch_dtype=torch.float16,
        device_map="auto",
        trust_remote_code=True
    )

在初始化模型时，增加

offload_folder="offload",
offload_state_dict=True,

This modeling file requires the following packages that were not found in your environment: bitsandbytes. Run pip install bitsandbytes

运行了一下: python -m bitsandbytes 发现报错 No module named ‘scipy’ 再pip install scipy

RuntimeError: “addmm_impl_cpu_” not implemented for ‘Half’

修改web_demo.py文件，把torch_dtype=torch.float16,修改成torch_dtype=torch.bfloat16,

gpu未生效

torch.cuda.is_available() returns False

pip uninstall torch

Go to https://pytorch.org/get-started/locally/ and select your system configurations(as shown in the figure)

下载量化模型

from huggingface_hub import snapshot_download
snapshot_download(repo_id="baichuan-inc/Baichuan2-13B-Chat-4bits", local_dir="model")
# pip install -U sentence-transformers

bitsandbytes windows库

pip install https://github.com/jllllll/bitsandbytes-windows-webui/releases/download/wheels/bitsandbytes-0.41.1-py3-none-win_amd64.whl

AttributeError: ‘list’ object has no attribute ‘as_dict’

pip install bitsandbytes==0.41.0

AttributeError: ‘BaichuanTokenizer’ object has no attribute ‘sp_model’ 降级transformers==4.33.3