DiffusionNFT environment setup: a collection of pitfalls
-
conda-pack fails with "Files managed by conda were found to have been deleted/overwritten"
Force-reinstall the affected packages:
conda install --force-reinstall [pkg1] [pkg2]
-
pip install flash-attn fails with "No module named 'torch'"
Install torch first, then disable build isolation so the build can see it:
pip install flash-attn --no-build-isolation
-
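xformers and PyTorch version mismatch: prebuilt xformers releases are tied to a specific PyTorch version. Check the GitHub releases page and the write-up below: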
https://github.com/facebookresearch/xformers/releases?page=2
https://blog.csdn.net/cainiaoshileyuan/article/details/148000602
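
Before looking up a matching xformers release, it helps to know which PyTorch version is installed and which CUDA version it was built against. A minimal sketch, assuming python3 on PATH is the interpreter of the target environment; the function name torch_versions is made up for illustration:

```shell
# Print the installed PyTorch version and the CUDA version it was built against.
# Falls back to a message if torch cannot be imported (or python3 is missing).
torch_versions() {
  python3 -c 'import torch; print(torch.__version__, torch.version.cuda)' 2>/dev/null \
    || echo "torch is not importable in this environment"
}

torch_versions
```

Compare the two printed values with the PyTorch and CUDA versions named in the xformers release notes.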
-
Using the Tsinghua (TUNA) mirror for a single command:
pip install -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple some-package
conda install xxx -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
-
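Installing cuda-toolkit with conda: different package versions correspond to different CUDA versions (e.g. 12.3 for CUDA 12.4), so check the package page before choosing one: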
https://anaconda.org/channels/nvidia/packages/cuda-toolkit/overview
-
Using a Hugging Face mirror site
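
The note above does not say which mirror was used. As a hedged sketch: the huggingface_hub library reads the HF_ENDPOINT environment variable, so a mirror can be selected by exporting it before starting Python. The URL below (https://hf-mirror.com) is an assumption, not something taken from these notes; substitute the mirror you actually use.

```shell
# Assumption: https://hf-mirror.com is the mirror; replace it with your own.
# HF_ENDPOINT must be set before the Python process that downloads models starts.
export HF_ENDPOINT="https://hf-mirror.com"
echo "HF_ENDPOINT=$HF_ENDPOINT"
```

Put the export line in your shell profile or job script if every run should go through the mirror.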
-
The cmake installed by apt is too old; install it with pip instead. cmake is needed when installing pyarrow.
pip install --upgrade cmake
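
To confirm whether the cmake on PATH is new enough, the version can be compared in the shell. A minimal sketch: the helper version_ge and the REQUIRED value are hypothetical (take the real minimum from the build error message), and sort -V assumes GNU coreutils.

```shell
# Return success if version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# REQUIRED is a placeholder minimum, not a value taken from pyarrow.
REQUIRED="3.25"
if command -v cmake >/dev/null 2>&1; then
  # First line of `cmake --version` looks like: cmake version 3.22.1
  CURRENT="$(cmake --version | head -n 1 | awk '{print $3}')"
  if version_ge "$CURRENT" "$REQUIRED"; then
    echo "cmake $CURRENT is new enough"
  else
    echo "cmake $CURRENT is older than $REQUIRED"
  fi
else
  echo "cmake not found on PATH"
fi
```

If the pip-installed cmake is not the one reported, check the order of directories on PATH.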
-
nvcc command not found
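Install cuda-toolkit with conda (see the cuda-toolkit item above). If the nvcc version then differs from the CUDA version you need, install a different cuda-toolkit version.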
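
A quick way to see whether nvcc is visible and which CUDA release it reports, as a minimal sketch (the function name nvcc_release is made up for illustration):

```shell
# Print the CUDA release reported by nvcc, or a message if nvcc is missing.
nvcc_release() {
  if command -v nvcc >/dev/null 2>&1; then
    # The matching line looks like: Cuda compilation tools, release 12.4, V12.4.131
    nvcc --version | grep -i "release"
  else
    echo "nvcc not found on PATH"
  fi
}

nvcc_release
```

Run it inside the activated conda environment, since conda puts its own nvcc on PATH only while the environment is active.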
-
libGL.so.1: cannot open shared object file: No such file or directory
apt-get install libgl1
-
Downloading CUDA builds of torch from the Nanjing University mirror
pip3 install torch torchvision torchaudio --index-url https://mirrors.nju.edu.cn/pytorch/whl/cu126
https://zhuanlan.zhihu.com/p/1909015652892644209
Not sure whether the Tsinghua mirror also works for this; see: https://blog.csdn.net/cyy0789/article/details/131137525
-
No module named 'mmcv._ext'
When installing mmcv, use --no-build-isolation.
-
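When installing mmcv from a git clone, rename the cloned folder; otherwise import mmcv often fails with module-not-found errors, most likely because a source folder named mmcv shadows the installed package.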
-
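PyTorch distributed RuntimeError: Address already in use
Another process is already using the default master port (29500); pass a different one with --master_port: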
torchrun --nproc_per_node=1 --master_port=29501 ./train_nft_sd3.py --config config/nft.py:sd3_geneval
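
Instead of guessing a port, the OS can be asked for an unused one. A minimal sketch, assuming python3 is on PATH; the function name free_port is made up for illustration, and the port could in principle be taken again before torchrun starts.

```shell
# Ask the OS for a currently unused TCP port by binding to port 0.
# Best-effort only: the port is released again before torchrun uses it.
free_port() {
  python3 -c 'import socket
s = socket.socket()
s.bind(("", 0))
print(s.getsockname()[1])
s.close()'
}

PORT="$(free_port)"
echo "free port: $PORT"
# Then launch with:
# torchrun --nproc_per_node=1 --master_port="$PORT" ./train_nft_sd3.py --config config/nft.py:sd3_geneval
```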