Error
torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model
torchrun : The term 'torchrun' is not recognized as the name of a cmdlet, function, script file, or operable program. Check
the spelling of the name, or if a path was included, verify that the path is correct and try again.
At line:1 char:1
+ torchrun --nproc_per_node 1 example_text_completion.py --ckpt_dir lla ...
+ ~~~~~~~~
+ CategoryInfo : ObjectNotFound: (torchrun:String) [], CommandNotFoundException
+ FullyQualifiedErrorId : CommandNotFoundException
Resolution
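Run the launcher as a Python module instead of calling the torchrun executable. torchrun is a console script that wraps torch.distributed.run, so this error typically means the directory containing the script is not on PATH in the current shell, or that PyTorch is installed in a different environment. The equivalent command is:
python -m torch.distributed.run --nproc_per_node 1 example_text_completion.py --ckpt_dir llama-2-7b/ --tokenizer_path tokenizer.model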
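If the same launch has to work on machines where torchrun may or may not be on PATH, the fallback can be chosen programmatically. The following is a minimal sketch, not part of the original fix: build_launch_command is a hypothetical helper name, and it assumes PyTorch is installed for the interpreter that runs it.

```python
import shutil
import sys


def build_launch_command(script_args, which=shutil.which):
    """Return an argv list that starts the PyTorch distributed launcher.

    Hypothetical helper for illustration. Uses the `torchrun` executable
    when it is found on PATH; otherwise falls back to running the
    `torch.distributed.run` module with the current interpreter.
    """
    if which("torchrun"):
        return ["torchrun", *script_args]
    return [sys.executable, "-m", "torch.distributed.run", *script_args]


if __name__ == "__main__":
    # Same arguments as the failing command in the error output.
    args = [
        "--nproc_per_node", "1",
        "example_text_completion.py",
        "--ckpt_dir", "llama-2-7b/",
        "--tokenizer_path", "tokenizer.model",
    ]
    # Print the command; pass the list to subprocess.run() to execute it.
    print(" ".join(build_launch_command(args)))
```

Using sys.executable for the fallback keeps the launch tied to the interpreter that is already active, which avoids picking up a different Python installation that happens to come first on PATH.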
References
https://stackoverflow.com/a/72463935/3361311