@malay-nagda malay-nagda commented Dec 23, 2024

What does this PR do?

Adds llama3 pre-training recipes with performance optimizations.

Collection: [llm]

Changelog

  • Add specific line by line info of high level changes in this PR.

Usage

python3 scripts/llm/performance/llama3_8b.py -a <slurm_account> -p <slurm_partition> -i nvcr.io/nvidia/nemo:24.09
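The command above passes three short flags: `-a` (Slurm account), `-p` (Slurm partition), and `-i` (container image). As a rough illustration only, a launcher could parse them with `argparse` as sketched below. The destination names (`account`, `partition`, `container_image`) and help strings are hypothetical and are not taken from the actual script in this PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser for the three flags shown in the usage line.

    NOTE: the dest names and help texts are illustrative assumptions;
    only the short flags -a, -p and -i appear in the PR description.
    """
    parser = argparse.ArgumentParser(description="Sketch of a perf-script launcher CLI")
    parser.add_argument("-a", dest="account", required=True, help="Slurm account")
    parser.add_argument("-p", dest="partition", required=True, help="Slurm partition")
    parser.add_argument("-i", dest="container_image", required=True, help="Container image")
    return parser


# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(
    ["-a", "my_account", "-p", "my_partition", "-i", "nvcr.io/nvidia/nemo:24.09"]
)
print(args.account, args.partition, args.container_image)
```

Marking all three flags as required is a design choice for the sketch; the real script may supply defaults.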

GitHub Actions CI

The Jenkins CI system has been replaced by GitHub Actions self-hosted runners.

The GitHub Actions CI will run automatically when the "Run CICD" label is added to the PR.
To re-run CI remove and add the label again.
To run CI on an untrusted fork, a NeMo user with write access must first click "Approve and run".

Before your PR is "Ready for review"

Pre checks:

  • Make sure you read and followed Contributor guidelines
  • Did you write any new necessary tests?
  • Did you add or update any necessary documentation?
  • Does the PR affect components that are optional to install? (Ex: Numba, Pynini, Apex etc)
    • Reviewer: Does the PR have correct import guards for all optional libraries?

PR Type:

  • New Feature
  • Bugfix
  • Documentation

If you haven't finished some of the above items you can still open "Draft" PR.

Who can review?

Anyone in the community is free to review the PR once the checks have passed.
Contributor guidelines contains specific people who can review PRs to various areas.

Additional Information

  • Related to # (issue)

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@malay-nagda malay-nagda changed the title from "perf scripts llama3 8b" to "NeMo2.0 perf scripts" Dec 23, 2024
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
malay-nagda and others added 5 commits December 23, 2024 16:18
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
@malay-nagda malay-nagda requested a review from erhoo82 December 23, 2024 15:07
malay-nagda and others added 2 commits December 23, 2024 21:02
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
@malay-nagda malay-nagda marked this pull request as ready for review December 23, 2024 15:35
@malay-nagda malay-nagda changed the title from "NeMo2.0 perf scripts" to "NeMo2.0 llama3 perf scripts" Dec 23, 2024
@malay-nagda malay-nagda self-assigned this Dec 23, 2024
erhoo82 commented Dec 23, 2024

Maybe we should disable the TensorBoard logger by default and document how to enable it?
The TensorBoard logger causes performance overhead.
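One way to make a logger opt-in is a boolean flag that defaults to off, so benchmark runs pay no logging cost unless the user asks for it. The sketch below is only an illustration of that idea: the `--tensorboard` flag, the `make_tensorboard_config` helper, and the `tb_logs` directory are all hypothetical names, and the code does not use or represent NeMo's actual logger API.

```python
import argparse
from typing import Optional


def make_tensorboard_config(enabled: bool, log_dir: str = "tb_logs") -> Optional[dict]:
    """Return logger settings only when logging was explicitly requested.

    Hypothetical helper: returning None stands for "no logger attached".
    """
    if not enabled:
        return None
    return {"save_dir": log_dir}


parser = argparse.ArgumentParser()
# store_true makes the flag default to False, i.e. logging is off unless requested.
parser.add_argument("--tensorboard", action="store_true", help="Enable TensorBoard logging")

off = make_tensorboard_config(parser.parse_args([]).tensorboard)
on = make_tensorboard_config(parser.parse_args(["--tensorboard"]).tensorboard)
print(off, on)
```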

@github-actions

[🤖]: Hi @malay-nagda 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot, so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

Signed-off-by: Malay Nagda <malayn@nvidia.com>
@github-actions github-actions bot removed the NLP label Dec 24, 2024
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
malay-nagda and others added 3 commits December 25, 2024 00:18
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
malay-nagda and others added 5 commits December 31, 2024 20:33
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: Malay Nagda <malayn@nvidia.com>
@github-actions

beep boop 🤖: 🙏 The following files have warnings. In case you are familiar with these, please try helping us to improve the code base.


Your code was analyzed with PyLint. The following annotations have been identified:

************* Module nemo.lightning.run.plugins
nemo/lightning/run/plugins.py:80:0: C0301: Line too long (145/119) (line-too-long)
nemo/lightning/run/plugins.py:91:0: C0301: Line too long (201/119) (line-too-long)
nemo/lightning/run/plugins.py:96:0: C0301: Line too long (200/119) (line-too-long)
nemo/lightning/run/plugins.py:97:0: C0301: Line too long (127/119) (line-too-long)
nemo/lightning/run/plugins.py:181:0: C0301: Line too long (174/119) (line-too-long)
nemo/lightning/run/plugins.py:257:0: C0301: Line too long (150/119) (line-too-long)
nemo/lightning/run/plugins.py:70:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/run/plugins.py:105:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/run/plugins.py:159:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/run/plugins.py:204:4: C0116: Missing function or method docstring (missing-function-docstring)
nemo/lightning/run/plugins.py:267:4: C0116: Missing function or method docstring (missing-function-docstring)

-----------------------------------
Your code has been rated at 9.19/10

Mitigation guide:

  • Add sensible and useful docstrings to functions and methods
  • For trivial methods like getters/setters, consider adding # pylint: disable=C0116 inside the function itself
  • To disable the check for multiple functions/methods at once, put a # pylint: disable=C0116 before the first and a # pylint: enable=C0116 after the last.

By applying these rules, we reduce the occurrence of this message in the future.
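The three mitigation options can be illustrated on a small class. This is a minimal sketch that follows the comment placement described in the guide; the `Config` class and its methods are made up for the example and do not come from `nemo/lightning/run/plugins.py`.

```python
class Config:
    """Holds a single integer setting."""

    def __init__(self, value: int) -> None:
        self._value = value

    def get_value(self) -> int:
        # pylint: disable=C0116
        # Option 2: trivial getter, check disabled inside the function itself.
        return self._value

    # Option 3: disable before the first method of a group ...
    # pylint: disable=C0116
    def set_value(self, value: int) -> None:
        self._value = value

    def reset(self) -> None:
        self._value = 0

    # ... and re-enable after the last one.
    # pylint: enable=C0116

    def scaled(self, factor: int) -> int:
        """Option 1: a real docstring. Return the stored value times ``factor``."""
        return self._value * factor


cfg = Config(3)
cfg.set_value(5)
print(cfg.scaled(2))
```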

Thank you for improving NeMo's documentation!

@malay-nagda malay-nagda enabled auto-merge (squash) December 31, 2024 19:06
github-actions bot commented Jan 1, 2025

[🤖]: Hi @malay-nagda 👋,

We wanted to let you know that a CICD pipeline for this PR just finished successfully

So it might be time to merge this PR or get some approvals

I'm just a bot, so I'll leave it to you what to do next.

//cc @pablo-garay @ko3n1g

@malay-nagda malay-nagda merged commit 91471f0 into main Jan 1, 2025
193 of 196 checks passed
@malay-nagda malay-nagda deleted the malay/perf_scripts branch January 1, 2025 01:45
abhinavg4 pushed a commit that referenced this pull request Jan 30, 2025
* perf scripts llama3 8b

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* copyright

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* llama3 70b

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* 405b recipe

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* doc strings

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* remove tb logging and formatting

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* disable default tb and profiling

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* num steps per epoch

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* correct filepaths

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* Apply isort and black reformatting

Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>

* remove param

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* README

Signed-off-by: Malay Nagda <malayn@nvidia.com>

* updated param

Signed-off-by: Malay Nagda <malayn@nvidia.com>

---------

Signed-off-by: Malay Nagda <malayn@nvidia.com>
Signed-off-by: malay-nagda <malay-nagda@users.noreply.github.com>
Co-authored-by: malay-nagda <malay-nagda@users.noreply.github.com>
Signed-off-by: Abhinav Garg <abhgarg@nvidia.com>
youngeunkwon0405 pushed a commit to youngeunkwon0405/NeMo that referenced this pull request Feb 10, 2025
Signed-off-by: Youngeun Kwon <youngeunk@nvidia.com>
@malay-nagda malay-nagda restored the malay/perf_scripts branch May 6, 2025 06:37