DiffusionNFT environment setup: a collection of pitfalls

  1. conda-pack error: "Files managed by conda were found to have been deleted/overwritten"

    Reference (CSDN blog post): fixing the conda-pack "Files managed by conda were found to have been deleted/overwritten" error

    Fix: force-reinstall the packages named in the error message:

     conda install --force-reinstall [pkg1] [pkg2]
    
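  2. pip install flash-attn fails with "No module named 'torch'"

    torch must already be installed in the environment. Disable pip's build isolation so that the build can see it:

     pip install flash-attn --no-build-isolation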
    
  3. xformers / PyTorch version mismatch: check the GitHub releases page below for which xformers release matches which PyTorch version

    https://github.com/facebookresearch/xformers/releases?page=2

    https://blog.csdn.net/cainiaoshileyuan/article/details/148000602

  4. Using the Tsinghua (TUNA) mirror for a single command:

     pip install -i https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple some-package
     conda install xxx -c https://mirrors.tuna.tsinghua.edu.cn/anaconda/pkgs/main
    

    https://mirror.tuna.tsinghua.edu.cn/help/pypi/

    https://mirrors.tuna.tsinghua.edu.cn/help/anaconda/
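
When several pip commands in one shell session should go through the mirror, an environment variable avoids repeating `-i` each time. The sketch below assumes pip's standard `PIP_INDEX_URL` variable (the environment-variable form of `--index-url`). It only affects the current shell, so it is still a temporary setting.

```shell
# Route every pip command in this shell session through the TUNA mirror.
# PIP_INDEX_URL is pip's environment-variable equivalent of -i / --index-url.
export PIP_INDEX_URL="https://mirrors.tuna.tsinghua.edu.cn/pypi/web/simple"

# Later installs in the same shell need no -i flag, for example:
#   pip install some-package

# To go back to the default index:
#   unset PIP_INDEX_URL
```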

  5. Installing cuda-toolkit with conda: different package versions correspond to different CUDA versions, so pick the matching one (12.3 for CUDA 12.4)

    https://anaconda.org/channels/nvidia/packages/cuda-toolkit/overview

  6. Using a Hugging Face mirror site

    https://zhuanlan.zhihu.com/p/663712983
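
As a rough illustration of what using a mirror usually looks like: the `huggingface_hub` library reads the `HF_ENDPOINT` environment variable to decide which host to download from. The mirror URL below (`https://hf-mirror.com`) is an assumption, since these notes do not name the mirror; substitute whichever one the linked article recommends.

```shell
# Assumption: hf-mirror.com is the mirror in use; replace it with your own if different.
# HF_ENDPOINT must be set before the Python process starts, because
# huggingface_hub reads it at import time.
export HF_ENDPOINT="https://hf-mirror.com"

# Anything launched from this shell now downloads through the mirror, for example:
#   python train_nft_sd3.py ...
```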

  7. The cmake installed by apt is too old; install it with pip instead. cmake is needed when installing pyarrow.

    pip install --upgrade cmake

    https://askubuntu.com/questions/355565/how-do-i-install-the-latest-version-of-cmake-from-the-command-line
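
Before reinstalling, it can help to check which cmake is actually picked up and whether it is new enough. This is a minimal sketch: `version_ge`, `cmake_version`, and the `MIN_CMAKE` value are illustrative names and placeholders, not something these notes or pyarrow prescribe. It assumes `sort -V` (GNU coreutils) is available.

```shell
# Succeeds when version $1 is greater than or equal to version $2.
# Relies on `sort -V` (version sort) from GNU coreutils.
version_ge() {
    [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# Prints the version of the first cmake on PATH; fails if cmake is absent.
cmake_version() {
    command -v cmake >/dev/null 2>&1 || return 1
    cmake --version | sed -n 's/^cmake version \([0-9][0-9.]*\).*/\1/p'
}

# MIN_CMAKE is a placeholder; use the minimum version the failing build asks for.
MIN_CMAKE="3.16"
if current="$(cmake_version)" && version_ge "$current" "$MIN_CMAKE"; then
    echo "cmake $current is new enough"
else
    echo "cmake missing or older than $MIN_CMAKE; try: pip install --upgrade cmake"
fi
```

If the apt version still wins after the pip install, `command -v cmake` shows which binary comes first on PATH.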

  8. nvcc command not found

    Install cuda-toolkit with conda (see item 5). If the nvcc version does not match the CUDA version, fix it by installing a different cuda-toolkit version.

    https://zhuanlan.zhihu.com/p/21676609751
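
To see whether nvcc is visible and which CUDA release it reports, a check along these lines may be useful. `parse_nvcc_release` is a hypothetical helper name; it assumes the usual `release X.Y` wording in the `nvcc --version` output.

```shell
# Extracts the "release X.Y" number from `nvcc --version` output read on stdin.
parse_nvcc_release() {
    sed -n 's/.*release \([0-9][0-9]*\.[0-9][0-9]*\).*/\1/p'
}

if command -v nvcc >/dev/null 2>&1; then
    echo "nvcc found at $(command -v nvcc), release $(nvcc --version | parse_nvcc_release)"
else
    echo "nvcc is not on PATH; install cuda-toolkit into the active conda environment"
fi
```

The reported release can then be compared with the CUDA version PyTorch was built for, which `python -c "import torch; print(torch.version.cuda)"` prints.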

  9. ImportError: libGL.so.1: cannot open shared object file: No such file or directory

    apt-get install libgl1

    https://itsmycode.com/importerror-libgl-so-1-cannot-open-shared-object-file-no-such-file-or-directory/

  10. Downloading CUDA builds of torch from the Nanjing University mirror

     pip3 install torch torchvision torchaudio --index-url https://mirrors.nju.edu.cn/pytorch/whl/cu126
    

    https://zhuanlan.zhihu.com/p/1909015652892644209

    Not sure whether the Tsinghua mirror also works for this; see: https://blog.csdn.net/cyy0789/article/details/131137525

  11. No module named 'mmcv._ext'

    When installing mmcv, use --no-build-isolation.

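  12. When installing mmcv from a git clone, rename the cloned folder afterwards; otherwise importing things from mmcv often fails with module-not-found errors (likely because the local source folder shadows the installed package).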

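  13. PyTorch distributed RuntimeError: Address already in use

    Another process is already using the master port (29500 by default); pass a different one with --master_port: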

    torchrun --nproc_per_node=1 --master_port=29501 ./train_nft_sd3.py --config config/nft.py:sd3_geneval
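
Instead of guessing a port by hand, a free one can be picked from the list of ports that already have a listener. This is a minimal sketch under some assumptions: `listening_ports` and `next_free_port` are hypothetical helper names, and `listening_ports` assumes the `ss` tool from iproute2 is installed.

```shell
# Prints the local TCP ports that currently have a listener, one per line.
# Assumes `ss` from iproute2 (-l listening, -t TCP, -n numeric, -H no header).
listening_ports() {
    ss -ltnH | awk '{ n = split($4, parts, ":"); print parts[n] }'
}

# Prints the first port >= $1 that is not in the whitespace-separated list $2.
# The body runs in a subshell so `port` does not leak into the caller.
next_free_port() (
    port="$1"
    # $2 is left unquoted on purpose so each port lands on its own line.
    while printf '%s\n' $2 | grep -qx "$port"; do
        port=$((port + 1))
    done
    echo "$port"
)

# Usage (not executed here):
#   port="$(next_free_port 29500 "$(listening_ports)")"
#   torchrun --nproc_per_node=1 --master_port="$port" ./train_nft_sd3.py --config config/nft.py:sd3_geneval
```

There is a small window between the check and the launch in which another process could take the port, so this reduces collisions rather than ruling them out.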