No module named 'megatron_patch' #26

@gintmr

I ran into the following problem while setting up the training environment according to the QuickStart documentation in the RoboBrain 2.0 repository:

  1. I completed the environment setup, model conversion, and data preparation as described in the documentation, and I believe these steps were done correctly.
  2. I also applied the module patches as instructed in the QuickStart documentation of the RoboBrain 2.0 repository.

However, when I start training, it fails with the following error:
ModuleNotFoundError: No module named 'megatron_patch'
I haven’t found a working solution so far.

Could you please advise on what might be causing this issue? Are there any other configuration items or dependencies I should check?

Below is the detailed error message:

[default2]:[rank2]: Traceback (most recent call last):
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/./flagscale/train/train_qwen2_5_vl.py", line 797, in <module>
[default2]:[rank2]:     pretrain(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/flagscale/train/train.py", line 1056, in pretrain
[default2]:[rank2]:     build_train_valid_test_data_iterators(train_valid_test_dataset_provider)
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/flagscale/train/train.py", line 3247, in build_train_valid_test_data_iterators
[default2]:[rank2]:     train_dataloader, valid_dataloader, test_dataloader = build_train_valid_test_data_loaders(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/flagscale/train/train.py", line 3211, in build_train_valid_test_data_loaders
[default2]:[rank2]:     train_ds, valid_ds, test_ds = build_train_valid_test_datasets(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/flagscale/train/train.py", line 3180, in build_train_valid_test_datasets
[default2]:[rank2]:     return build_train_valid_test_datasets_provider(train_valid_test_num_samples)
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/./flagscale/train/train_qwen2_5_vl.py", line 694, in train_valid_test_dataloaders_provider
[default2]:[rank2]:     train_ds, valid_ds1, test_ds = datasets_provider(worker_config)
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/./flagscale/train/train_qwen2_5_vl.py", line 605, in datasets_provider
[default2]:[rank2]:     train_dataset = get_train_dataset(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/third_party/Megatron-LM/megatron/energon/task_encoder/loader.py", line 93, in get_train_dataset
[default2]:[rank2]:     blend_mode, datasets = loader.get_datasets(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/third_party/Megatron-LM/megatron/energon/metadataset/dataset_loader.py", line 94, in get_datasets
[default2]:[rank2]:     self.get_dataset(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/third_party/Megatron-LM/megatron/energon/metadataset/dataset_loader.py", line 68, in get_dataset
[default2]:[rank2]:     return get_dataset_from_config(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/third_party/Megatron-LM/megatron/energon/dataset_config.py", line 92, in get_dataset_from_config
[default2]:[rank2]:     dataset: BaseCoreDatasetFactory[T_sample] = load_config(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/third_party/Megatron-LM/megatron/energon/dataset_config.py", line 48, in load_config
[default2]:[rank2]:     return parser.raw_to_instance(data, default_type)
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/third_party/Megatron-LM/megatron/energon/typed_converter.py", line 154, in raw_to_instance
[default2]:[rank2]:     cls = self._resolve_object(
[default2]:[rank2]:   File "/data2/user/RoboBrain_Project/FlagScale/third_party/Megatron-LM/megatron/energon/typed_converter.py", line 79, in _resolve_object
[default2]:[rank2]:     module = importlib.import_module(module_name)
[default2]:[rank2]:   File "/home/share/.conda/envs/flagscale-train-wxr/lib/python3.10/importlib/__init__.py", line 126, in import_module
[default2]:[rank2]:     return _bootstrap._gcd_import(name[level:], package, level)
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 992, in _find_and_load_unlocked
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
[default2]:[rank2]:   File "<frozen importlib._bootstrap>", line 1004, in _find_and_load_unlocked
[default2]:[rank2]: ModuleNotFoundError: No module named 'megatron_patch'
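
For reference, the last non-importlib frame in the traceback is `importlib.import_module(module_name)` inside `_resolve_object`, reached from `load_config` in `dataset_config.py`, so the module name appears to come from the dataset configuration being loaded rather than from the training script itself. The following is a minimal diagnostic sketch, not part of FlagScale or Megatron-LM, that checks whether the top-level package can be found by the interpreter. It assumes it is run with the same conda environment and `PYTHONPATH` as the training job; `locate_module` is a hypothetical helper name introduced here for illustration.

```python
import importlib.util
import sys


def locate_module(name: str):
    """Return where a top-level module would be loaded from, or None if it cannot be found.

    Hypothetical helper for illustration only. It uses importlib.util.find_spec,
    which looks the module up without executing it.
    """
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    # Regular modules and packages expose a file path in spec.origin.
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no origin; report their search locations instead.
    return list(spec.submodule_search_locations or [])


if __name__ == "__main__":
    name = "megatron_patch"  # the module named in the error message
    location = locate_module(name)
    if location is None:
        print(f"{name!r} is not importable; sys.path entries searched:")
        for entry in sys.path:
            print("  ", entry)
    else:
        print(f"{name!r} resolves to {location}")
```

If the package is reported as not importable, the printed `sys.path` entries show which directories the interpreter actually searched, which can then be compared against the directory where the patched modules were placed.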
