Q&A
RuntimeError when training InternVL:
RuntimeError:
CUDA Setup failed despite GPU being available. Please run the following command to get more information:
python -m bitsandbytes
Inspect the output of the command and see if you can locate CUDA libraries. You might need to add them
to your LD_LIBRARY_PATH. If you suspect a bug, please take the information from python -m bitsandbytes
and open an issue at: https://github.com/TimDettmers/bitsandbytes/issues
Reason: this is most likely caused by an incompatibility among CUDA, PyTorch, and bitsandbytes. Run python -m bitsandbytes for more details.
Solution:
Remove the version constraint on bitsandbytes in requirements/internvl_chat.txt in the InternVL root directory, so that the wrong version is not installed again when the environment is set up. Then reinstall it with pip uninstall bitsandbytes && pip install bitsandbytes.
If the above solution does not work, reinstall a PyTorch build that is compatible with the CUDA version of your GPU, and repeat the step above until python -m bitsandbytes outputs SUCCESS.
After that, flash-attn needs to be reinstalled as well.
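The "remove the version constraint" step can also be scripted. The following is a minimal sketch, not part of InternVL: the helper name drop_version_pin and the sample file contents (including the version numbers) are hypothetical, and it only handles plain requirements-style lines such as those in requirements/internvl_chat.txt, not pyproject.toml.

```python
import re


def drop_version_pin(requirements_text: str, package: str) -> str:
    """Return requirements text with the version specifier removed from `package`.

    Only lines that start with the package name are touched; other lines,
    and any trailing comment on the matched line, are left as they are.
    """
    pattern = re.compile(
        # Group 1: optional indentation, the package name, optional extras.
        rf"^([ \t]*{re.escape(package)}(?:\[[^\]]*\])?)"
        # A version operator followed by the shortest text that ends right
        # before a trailing comment or the end of the line.
        r"[ \t]*(?:==|>=|<=|~=|!=|<|>)[^#\n]*?(?=[ \t]*(?:#|$))",
        re.IGNORECASE | re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1), requirements_text)


# Hypothetical file contents; the real pins in the repository may differ.
sample = "torch>=2.0\nbitsandbytes==0.41.0  # hypothetical pin\ndeepspeed==0.13.5\n"
cleaned = drop_version_pin(sample, "bitsandbytes")
```

To apply it to the real file, read requirements/internvl_chat.txt, pass its text through the helper, and write the result back before running the pip commands above.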
AssertionError when training InternVL:
AssertionError: It is illegal to call Engine.step() inside no_sync context manager
Solution: downgrade deepspeed to 0.15.4, and remove the version constraint on deepspeed in both requirements/internvl_chat.txt and pyproject.toml in the InternVL root directory.
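After reinstalling, it can help to confirm which versions actually ended up in the environment. The sketch below uses only the standard library; the function names and the list of distributions checked are assumptions made for illustration.

```python
from importlib import metadata


def installed_version(dist_name: str):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def report(names=("torch", "bitsandbytes", "flash-attn", "deepspeed")):
    """Map each distribution name to its installed version (None if missing)."""
    return {name: installed_version(name) for name in names}


if __name__ == "__main__":
    for name, version in report().items():
        print(f"{name}: {version or 'not installed'}")
```

If the deepspeed entry is not 0.15.4 after the downgrade, a remaining version constraint in one of the two files has probably caused pip to install a different version again.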
java not found when evaluating InternVL:
Solution: install Java and make sure the java executable is on your PATH.
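A quick way to check whether the java executable is visible to the Python process that runs the evaluation is sketched below. The helper names are hypothetical; the only assumption about Java itself is that java -version writes its banner to stderr, which is the usual behavior.

```python
import shutil
import subprocess


def find_java():
    """Return the full path of the `java` executable on PATH, or None."""
    return shutil.which("java")


def java_version():
    """Return the text printed by `java -version`, or None if java is missing."""
    path = find_java()
    if path is None:
        return None
    result = subprocess.run(
        [path, "-version"], capture_output=True, text=True, check=False
    )
    # The version banner normally goes to stderr; fall back to stdout.
    return (result.stderr or result.stdout).strip()


if __name__ == "__main__":
    print(java_version() or "java not found on PATH")
```

If this prints "java not found on PATH" right after installing Java, the shell or environment that launches the evaluation likely needs to be restarted so that it picks up the updated PATH.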