HDDS-13870. Bare Metal Ozone Installer with support for multiple node deployment. #9247
Open: ssulav wants to merge 14 commits into apache:master from ssulav:HDDS-13870-installer
Commits
- aa1f697 HDDS-13870: Ozone Installer Script with service user (ssulav)
- aa6b86e Move the XML files (ssulav)
- 93533f6 Code changes tested on EC2 (ssulav)
- a9e51fa Default service user and group to ozone (ssulav)
- 57db653 Added README.md (ssulav)
- 2cf486f Add Recon service and few other enhancements (ssulav)
- aa8827e Moved from shell to ansible (ssulav)
- 1a1c545 Merge branch 'apache:master' into HDDS-13870-installer (ssulav)
- f040bdc Minor config changes (ssulav)
- a79650b Added resume option with local snapshot deployment (ssulav)
- fdba78c Add ozone_smoke in same playbook and JAVA_TOOL_OPTIONS (ssulav)
- 26cd904 Add resume option with delegates (ssulav)
- 036de52 Merge ha and non-ha service tasks (ssulav)
- a818772 Add click for user inputs and loger for log file generation (ssulav)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,102 @@ | ||
| # Ozone Installer (Ansible) | ||
|
|
||
| ## Software Requirements | ||
|
|
||
| - Controller: Python 3.10–3.12 (prefer 3.11) | ||
| - Ansible Community 9.x (ansible-core 2.16.x) | ||
| - rsync installed on controller and all target hosts (required for local snapshot tarball copy) | ||
| - Debian/Ubuntu: `sudo apt-get install -y rsync` | ||
| - RHEL/CentOS/Rocky: `sudo yum install -y rsync` or `sudo dnf install -y rsync` | ||
| - SUSE: `sudo zypper in -y rsync` | ||
|
|
||
| ### Controller node requirements | ||
| - Can be local or remote. | ||
| - Must be on the same network as the target hosts. | ||
| - Requires SSH access (key or password). | ||
|
|
||
| ### Run on the controller node | ||
| ```bash | ||
| pip install -r requirements.txt | ||
| ``` | ||
|
|
||
| ## File structure | ||
|
|
||
| - `ansible.cfg` (defaults and logging) | ||
| - `inventories/dev/hosts.ini` + `inventories/dev/group_vars/all.yml` | ||
| - `playbooks/` (`cluster.yml`) | ||
| - `roles/` (ssh_bootstrap, ozone_user, java, ozone_layout, ozone_fetch, ozone_config, ozone_service_non_ha, ozone_service_ha) | ||
|
|
||
| ## Usage (two options) | ||
|
|
||
| 1) Python wrapper (orchestrates Ansible for you) | ||
|
|
||
| ```bash | ||
| # Non-HA upstream | ||
| python3 ozone_installer.py -H host1.domain -v 2.0.0 | ||
|
|
||
| # HA upstream (3+ hosts) - mode auto-detected | ||
| python3 ozone_installer.py -H "host{1..3}.domain" -v 2.0.0 | ||
|
|
||
| # Local snapshot build | ||
| python3 ozone_installer.py -H host1 -v local --local-path /path/to/share/ozone-2.1.0-SNAPSHOT | ||
|
|
||
| # Cleanup and reinstall | ||
| python3 ozone_installer.py --clean -H "host{1..3}.domain" -v 2.0.0 | ||
|
|
||
| # Notes on cleanup | ||
| # - During a normal install, you'll be asked whether to clean up an existing install (if present). Default is No. | ||
| # - Use --clean to clean up without prompting before reinstall. | ||
| ``` | ||
|
|
||
| ### Resume last failed task | ||
|
|
||
| ```bash | ||
| # Python wrapper (picks task name from logs/last_failed_task.txt) | ||
| python3 ozone_installer.py -H host1.domain -v 2.0.0 --resume | ||
| ``` | ||
|
|
||
| ```bash | ||
| # Direct Ansible | ||
| ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \ | ||
| --start-at-task "$(head -n1 logs/last_failed_task.txt)" | ||
| ``` | ||
|
|
||
| 2) Direct Ansible (run playbooks yourself) | ||
|
|
||
| ```bash | ||
| # Non-HA upstream | ||
| ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=non-ha" | ||
|
|
||
| # HA upstream | ||
| ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=ha" | ||
|
|
||
| # Cleanup only (run just the cleanup role) | ||
| ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \ | ||
| --tags cleanup -e "do_cleanup=true" | ||
| ``` | ||
|
|
||
| ## Inventory | ||
|
|
||
| Edit `inventories/dev/hosts.ini` and group vars in `inventories/dev/group_vars/all.yml`: | ||
|
|
||
| - Groups: `[om]`, `[scm]`, `[datanodes]`, `[recon]` | ||
| - Key vars: `ozone_version`, `install_base`, `data_base`, `jdk_major`, `service_user`, `start_after_install` | ||
|
|
||
| ## Non-HA | ||
|
|
||
| ```bash | ||
| ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=non-ha" | ||
| ``` | ||
|
|
||
| ## HA cluster | ||
|
|
||
| ```bash | ||
| ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=ha" | ||
| ``` | ||
|
|
||
| ## Notes | ||
|
|
||
| - Idempotent where possible; runtime `ozone` init/start guarded with `creates:`. | ||
| - JAVA_HOME and OZONE_HOME are exported in `/etc/profile.d/ozone.sh`. | ||
| - Local snapshot mode archives from controller and uploads to targets. | ||
|
|
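
The README's resume flow takes the task name from the first line of `logs/last_failed_task.txt` and passes it to `--start-at-task`. A minimal sketch of that step follows, assuming the file layout shown in the README. The helper names `read_last_failed_task` and `build_playbook_command` are hypothetical and are not taken from `ozone_installer.py`.

```python
from pathlib import Path
from typing import List, Optional


def read_last_failed_task(path: Path) -> Optional[str]:
    """Return the task name stored on the first line, or None if unavailable."""
    try:
        with open(path, encoding="utf-8") as f:
            first = f.readline().strip()
    except OSError:
        # Missing or unreadable file: nothing to resume from.
        return None
    return first or None


def build_playbook_command(
    task_name: Optional[str],
    inventory: str = "inventories/dev/hosts.ini",
    playbook: str = "playbooks/cluster.yml",
) -> List[str]:
    """Build the ansible-playbook argv, resuming at task_name when given."""
    cmd = ["ansible-playbook", "-i", inventory, playbook]
    if task_name:
        # Same flag as the "Direct Ansible" resume example in the README.
        cmd += ["--start-at-task", task_name]
    return cmd
```

The returned list can be handed to `subprocess.run` with `ANSIBLE_CONFIG=ansible.cfg` set in the environment, mirroring the shell examples above. Passing the arguments as a list avoids shell quoting problems when a task name contains spaces.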
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,26 @@ | ||
| [defaults] | ||
| inventory = inventories/dev/hosts.ini | ||
| stdout_callback = yaml | ||
| retry_files_enabled = False | ||
| gathering = smart | ||
| forks = 20 | ||
| strategy = free | ||
| timeout = 30 | ||
| roles_path = roles | ||
| log_path = logs/ansible.log | ||
| bin_ansible_callbacks = True | ||
| callback_plugins = callback_plugins | ||
| callbacks_enabled = timer, profile_tasks, last_failed ; for execution time profiling and resume hints | ||
| deprecation_warnings = False | ||
| host_key_checking = False | ||
| remote_tmp = /tmp/.ansible-${USER} | ||
|
|
||
| [privilege_escalation] | ||
| become = True | ||
| become_method = sudo | ||
|
|
||
| [ssh_connection] | ||
| pipelining = True | ||
| ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,57 @@ | ||
| from __future__ import annotations | ||
|
|
||
| import os | ||
| from pathlib import Path | ||
| from ansible.plugins.callback import CallbackBase | ||
|
|
||
|
|
||
| class CallbackModule(CallbackBase): | ||
| CALLBACK_VERSION = 2.0 | ||
| CALLBACK_TYPE = 'notification' | ||
| CALLBACK_NAME = 'last_failed' | ||
| CALLBACK_NEEDS_WHITELIST = False | ||
|
|
||
| def __init__(self): | ||
| super().__init__() | ||
| # Write to installer logs dir | ||
| self._out_dir = Path(__file__).resolve().parents[1] / "logs" | ||
| self._out_file = self._out_dir / "last_failed_task.txt" | ||
| try: | ||
| os.makedirs(self._out_dir, exist_ok=True) | ||
| except Exception: | ||
| pass | ||
|
|
||
| def _write_last_failed(self, result): | ||
| try: | ||
| task_name = result._task.get_name() # noqa | ||
| task_path = getattr(result._task, "get_path", lambda: None)() # noqa | ||
| lineno = getattr(result._task, "get_lineno", lambda: None)() # noqa | ||
| role_name = None | ||
| if task_path and "/roles/" in task_path: | ||
| try: | ||
| role_segment = task_path.split("/roles/")[1] | ||
| role_name = role_segment.split("/")[0] | ||
| except Exception: | ||
| role_name = None | ||
| host = getattr(result, "_host", None) | ||
| host_name = getattr(host, "name", "unknown") if host else "unknown" | ||
| line = f"{task_name}\n# host: {host_name}\n" | ||
| if task_path: | ||
| line += f"# file: {task_path}\n" | ||
| if lineno: | ||
| line += f"# line: {lineno}\n" | ||
| if role_name: | ||
| line += f"# role: {role_name}\n" | ||
| with open(self._out_file, "w", encoding="utf-8") as f: | ||
| f.write(line) | ||
| except Exception: | ||
| # Best effort only; never break the run | ||
| pass | ||
|
|
||
| def v2_runner_on_failed(self, result, ignore_errors=False): | ||
| self._write_last_failed(result) | ||
|
|
||
| def v2_runner_on_unreachable(self, result): | ||
| self._write_last_failed(result) | ||
|
|
||
|
|
||
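
The callback above writes the task name on the first line, followed by `# key: value` comment lines for host, file, line, and role. A reader for that format could look like the sketch below. It assumes only the layout produced by `_write_last_failed`; the function name `parse_last_failed` is hypothetical, and the task name in the usage example is made up.

```python
from typing import Dict, Tuple


def parse_last_failed(text: str) -> Tuple[str, Dict[str, str]]:
    """Split the file content into the task name and its '# key: value' metadata."""
    lines = text.splitlines()
    task_name = lines[0].strip() if lines else ""
    meta: Dict[str, str] = {}
    for raw in lines[1:]:
        if not raw.startswith("# "):
            continue  # ignore anything that is not a metadata comment
        # Split on the first ": " only, so values may themselves contain colons.
        key, sep, value = raw[2:].partition(": ")
        if sep:
            meta[key] = value
    return task_name, meta
```

For example, `parse_last_failed("Start OM\n# host: host1.domain\n")` yields the task name `Start OM` and a dictionary with a single `host` entry.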
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,41 @@ | ||
| --- | ||
| # Global defaults | ||
| cluster_mode: "non-ha" # non-ha | ha | ||
|
|
||
| # Source selection | ||
| ozone_version: "2.0.0" # "2.0.0" | "local" | ||
| dl_url: "https://dlcdn.apache.org/ozone" | ||
|
|
||
| # Local snapshot settings | ||
| local_shared_path: "" | ||
| local_ozone_dirname: "" | ||
|
|
||
| # Install and data directories | ||
| install_base: "/opt/ozone" | ||
| data_base: "/data/ozone" | ||
|
|
||
| # Java settings | ||
| jdk_major: 17 | ||
| ozone_java_home: "" # autodetected if empty | ||
|
|
||
| # Service user/group | ||
| service_user: "ozone" | ||
| service_group: "ozone" | ||
|
|
||
| # Runtime and behavior | ||
| use_sudo: true | ||
| start_after_install: true | ||
| ozone_opts: "-Xmx1024m -XX:ParallelGCThreads=8" | ||
| service_command_timeout: 300 # seconds for service init/start commands | ||
| ansible_remote_tmp: "/tmp/.ansible-{{ ansible_user_id }}" | ||
|
|
||
| # SSH bootstrap | ||
| allow_cluster_ssh_key_deploy: false | ||
| ssh_public_key_path: "" # optional path on controller to a public key to install | ||
| ssh_private_key_path: "" # optional path to private key to copy for cluster identity | ||
|
|
||
| # Markers for profile management | ||
| JAVA_MARKER: "Apache Ozone Installer Java Home" | ||
| ENV_MARKER: "Apache Ozone Installer Env" | ||
|
|
||
|
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,17 @@ | ||
| [om] | ||
| # om1.example.com | ||
|
|
||
| [scm] | ||
| # scm1.example.com | ||
|
|
||
| [datanodes] | ||
| # dn1.example.com | ||
| # dn2.example.com | ||
|
|
||
| [recon] | ||
| # recon1.example.com | ||
|
|
||
| [all:vars] | ||
| cluster_mode=non-ha | ||
|
|
||
|
|
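
The README passes host patterns such as `host{1..3}.domain` to the wrapper and says HA mode is auto-detected for 3+ hosts, while `hosts.ini` groups hosts under `[om]`, `[scm]`, `[datanodes]`, and `[recon]`. The sketch below shows one way these pieces could fit together. It is an illustration, not the PR's implementation: the function names are hypothetical, only a plain numeric range without zero padding is expanded, and the host-to-group assignment is left to the caller because the PR page does not show how the installer assigns roles.

```python
import re
from typing import Dict, List

# Matches a single numeric brace range such as {1..3}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_hosts(pattern: str) -> List[str]:
    """Expand numeric brace ranges in a host pattern into concrete host names."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    step = 1 if end >= start else -1
    hosts: List[str] = []
    for i in range(start, end + step, step):
        expanded = pattern[:match.start()] + str(i) + pattern[match.end():]
        # Recurse in case the pattern contains more than one range.
        hosts.extend(expand_hosts(expanded))
    return hosts


def detect_cluster_mode(hosts: List[str]) -> str:
    """Pick 'ha' for three or more hosts, per the README's '3+ hosts' note."""
    return "ha" if len(hosts) >= 3 else "non-ha"


def render_inventory(groups: Dict[str, List[str]], cluster_mode: str) -> str:
    """Render INI text with the same group layout as inventories/dev/hosts.ini."""
    sections = []
    for name in ("om", "scm", "datanodes", "recon"):
        sections.append("\n".join([f"[{name}]", *groups.get(name, [])]))
    sections.append("\n".join(["[all:vars]", f"cluster_mode={cluster_mode}"]))
    return "\n\n".join(sections) + "\n"
```

A caller would expand the pattern once, choose the mode from the host count, and then decide which hosts go into each group before rendering.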
Use a logger instead of writing directly to the file.
This is a persisted file that the next rerun reads, so I don't think a logger is the right choice for this use case.
I don't see any problem with using `with open` to write a persisted file.
Let me know if you still see an issue with writing the file this way.
If you want to implement a state machine, please use a state machine framework.
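
On the plain `with open` write discussed in this thread: because the next rerun reads `last_failed_task.txt`, one possible hardening is to write to a temporary file and rename it into place, so an interrupted run never leaves a half-written file behind. The sketch below illustrates that idea. It is not part of the PR, and the helper name `write_atomically` is hypothetical.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(path: Path, text: str) -> None:
    """Write text to path via a temp file in the same directory, then rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file must live on the same filesystem for the rename to be atomic.
    fd, tmp_name = tempfile.mkstemp(
        dir=str(path.parent), prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # push the data to disk before the rename
        os.replace(tmp_name, path)  # replaces any existing file in one step
    except BaseException:
        # Clean up the temp file if anything failed before the rename.
        try:
            os.unlink(tmp_name)
        except FileNotFoundError:
            pass
        raise
```

In the callback, a call such as `write_atomically(self._out_file, line)` could stand in for the `with open(...)` block. The surrounding best-effort `try`/`except` would still be needed, since this helper re-raises on failure.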