4 changes: 4 additions & 0 deletions .gitignore
@@ -41,3 +41,7 @@ dev-support/ci/bats-support

.mvn/.gradle-enterprise/
.mvn/.develocity/

# Installer logs
tools/installer/logs/**
**/__pycache__
102 changes: 102 additions & 0 deletions tools/installer/README.md
@@ -0,0 +1,102 @@
# Ozone Installer (Ansible)

## Software Requirements

- Controller: Python 3.10–3.12 (prefer 3.11)
- Ansible Community 9.x (ansible-core 2.16.x)
- rsync installed on controller and all target hosts (required for local snapshot tarball copy)
  - Debian/Ubuntu: `sudo apt-get install -y rsync`
  - RHEL/CentOS/Rocky: `sudo yum install -y rsync` or `sudo dnf install -y rsync`
  - SUSE: `sudo zypper in -y rsync`

### Controller node requirements
- Can be local or remote.
- Must be on the same network as the target hosts.
- Requires SSH access (key or password).

### Run on the controller node
```bash
pip install -r requirements.txt
```

## File structure

- `ansible.cfg` (defaults and logging)
- `inventories/dev/hosts.ini` + `inventories/dev/group_vars/all.yml`
- `playbooks/` (`cluster.yml`)
- `roles/` (ssh_bootstrap, ozone_user, java, ozone_layout, ozone_fetch, ozone_config, ozone_service_non_ha, ozone_service_ha)

## Usage (two options)

1) Python wrapper (orchestrates Ansible for you)

```bash
# Non-HA upstream
python3 ozone_installer.py -H host1.domain -v 2.0.0

# HA upstream (3+ hosts) - mode auto-detected
python3 ozone_installer.py -H "host{1..3}.domain" -v 2.0.0

# Local snapshot build
python3 ozone_installer.py -H host1 -v local --local-path /path/to/share/ozone-2.1.0-SNAPSHOT

# Cleanup and reinstall
python3 ozone_installer.py --clean -H "host{1..3}.domain" -v 2.0.0

# Notes on cleanup
# - During a normal install, you'll be asked whether to clean up an existing install (if present). Default is No.
# - Use --clean to clean up without prompting before reinstalling.
```
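
The quoted `"host{1..3}.domain"` argument reaches the wrapper unexpanded, so something has to turn it into a host list and pick the mode. The wrapper's actual code is not part of the lines shown here; the following is only a minimal sketch of how that could work, assuming numeric `{a..b}` ranges and the "3+ hosts means HA" rule from the comment above. The names `expand_hosts` and `detect_mode` are hypothetical.

```python
import re

# Matches a numeric brace range such as {1..3}.
_RANGE = re.compile(r"\{(\d+)\.\.(\d+)\}")


def expand_hosts(pattern: str) -> list[str]:
    """Expand every {a..b} numeric range in a host pattern (hypothetical helper)."""
    match = _RANGE.search(pattern)
    if match is None:
        return [pattern]
    start, end = int(match.group(1)), int(match.group(2))
    step = 1 if end >= start else -1
    hosts: list[str] = []
    for n in range(start, end + step, step):
        expanded = pattern[:match.start()] + str(n) + pattern[match.end():]
        # Recurse so patterns with more than one range are fully expanded.
        hosts.extend(expand_hosts(expanded))
    return hosts


def detect_mode(hosts: list[str]) -> str:
    """Assumed rule: three or more hosts selects HA, otherwise non-HA."""
    return "ha" if len(hosts) >= 3 else "non-ha"


if __name__ == "__main__":
    hosts = expand_hosts("host{1..3}.domain")
    print(hosts, detect_mode(hosts))
```

Keeping expansion and mode selection as separate pure functions would make the rule easy to unit-test without touching Ansible.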

### Resume last failed task

```bash
# Python wrapper (picks task name from logs/last_failed_task.txt)
python3 ozone_installer.py -H host1.domain -v 2.0.0 --resume
```

```bash
# Direct Ansible
ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \
--start-at-task "$(head -n1 logs/last_failed_task.txt)"
```
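
For reference, the `last_failed` callback in this change writes the task name on the first line of `logs/last_failed_task.txt`, followed by `# host:`, `# file:`, `# line:` and `# role:` comment lines. A rough sketch of reading that file back and building the resume command might look like this; `parse_last_failed` and `resume_command` are illustrative names, not functions from the installer.

```python
from __future__ import annotations


def parse_last_failed(text: str) -> dict[str, str]:
    """Parse the contents of last_failed_task.txt into a dict (hypothetical helper)."""
    lines = text.splitlines()
    info = {"task": lines[0].strip() if lines else ""}
    for line in lines[1:]:
        # Metadata lines look like "# host: dn1.example.com".
        if line.startswith("# ") and ": " in line:
            key, value = line[2:].split(": ", 1)
            info[key.strip()] = value.strip()
    return info


def resume_command(
    task: str,
    inventory: str = "inventories/dev/hosts.ini",
    playbook: str = "playbooks/cluster.yml",
) -> list[str] | None:
    """Build the ansible-playbook argv for resuming, or None if no task is recorded."""
    if not task:
        return None
    return ["ansible-playbook", "-i", inventory, playbook, "--start-at-task", task]


if __name__ == "__main__":
    sample = "Install Java\n# host: dn1.example.com\n# role: java\n"
    print(resume_command(parse_last_failed(sample)["task"]))
```

The returned list can be passed to `subprocess.run` with `ANSIBLE_CONFIG=ansible.cfg` added to the environment, mirroring the shell command above.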

2) Direct Ansible (run playbooks yourself)

```bash
# Non-HA upstream
ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=non-ha"

# HA upstream
ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "ozone_version=2.0.0 cluster_mode=ha"

# Cleanup only (run just the cleanup role)
ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml \
--tags cleanup -e "do_cleanup=true"
```

## Inventory

Edit `inventories/dev/hosts.ini` and group vars in `inventories/dev/group_vars/all.yml`:

- Groups: `[om]`, `[scm]`, `[datanodes]`, `[recon]`
- Key vars: `ozone_version`, `install_base`, `data_base`, `jdk_major`, `service_user`, `start_after_install`
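
As an illustration of the layout, the sketch below renders an INI inventory with the four groups listed above plus the `[all:vars]` section that the sample `hosts.ini` uses. It is a hedged example only: `render_inventory` is a hypothetical helper, and the host names are placeholders.

```python
def render_inventory(groups: dict[str, list[str]], cluster_mode: str = "non-ha") -> str:
    """Render an Ansible INI inventory with the om/scm/datanodes/recon groups."""
    parts: list[str] = []
    for name in ("om", "scm", "datanodes", "recon"):
        parts.append(f"[{name}]")
        parts.extend(groups.get(name, []))  # a group may be left empty
        parts.append("")                    # blank line between sections
    parts.append("[all:vars]")
    parts.append(f"cluster_mode={cluster_mode}")
    return "\n".join(parts) + "\n"


if __name__ == "__main__":
    print(render_inventory({
        "om": ["om1.example.com"],
        "scm": ["scm1.example.com"],
        "datanodes": ["dn1.example.com", "dn2.example.com"],
        "recon": ["recon1.example.com"],
    }))
```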

## Non-HA

```bash
ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=non-ha"
```

## HA cluster

```bash
ANSIBLE_CONFIG=ansible.cfg ansible-playbook -i inventories/dev/hosts.ini playbooks/cluster.yml -e "cluster_mode=ha"
```

## Notes

- Tasks are idempotent where possible; the runtime `ozone` init/start commands are guarded with `creates:`, so they are skipped once the named path exists.
- `JAVA_HOME` and `OZONE_HOME` are exported in `/etc/profile.d/ozone.sh`.
- Local snapshot mode archives the build on the controller and uploads it to the targets.
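
The `creates:` guard mentioned above follows a simple pattern: if a given path already exists, the command is skipped. The Python below is only an analogy for that behaviour, not code from the roles; `run_unless_exists` and the `scm.initialized` marker name are made up for the example.

```python
import tempfile
from pathlib import Path
from typing import Callable


def run_unless_exists(marker: Path, action: Callable[[], object]) -> bool:
    """Run action only when marker is absent; return True if the action ran.

    As with Ansible's creates:, the action itself is expected to produce the path.
    """
    if marker.exists():
        return False
    action()
    return True


def demo() -> list[bool]:
    """Call the guarded step twice; only the first call should do any work."""
    with tempfile.TemporaryDirectory() as tmp:
        marker = Path(tmp) / "scm.initialized"  # hypothetical marker file
        init = lambda: marker.write_text("done", encoding="utf-8")
        return [run_unless_exists(marker, init), run_unless_exists(marker, init)]


if __name__ == "__main__":
    print(demo())
```

This is why rerunning the playbook after a successful init does not reinitialize the services.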

26 changes: 26 additions & 0 deletions tools/installer/ansible.cfg
@@ -0,0 +1,26 @@
[defaults]
inventory = inventories/dev/hosts.ini
stdout_callback = yaml
retry_files_enabled = False
gathering = smart
forks = 20
strategy = free
timeout = 30
roles_path = roles
log_path = logs/ansible.log
bin_ansible_callbacks = True
callback_plugins = callback_plugins
callbacks_enabled = timer, profile_tasks, last_failed ; for execution time profiling and resume hints
deprecation_warnings = False
host_key_checking = False
remote_tmp = /tmp/.ansible-${USER}

[privilege_escalation]
become = True
become_method = sudo

[ssh_connection]
pipelining = True
ssh_args = -o ControlMaster=auto -o ControlPersist=60s -o StrictHostKeyChecking=no -o UserKnownHostsFile=/dev/null


57 changes: 57 additions & 0 deletions tools/installer/callback_plugins/last_failed.py
@@ -0,0 +1,57 @@
from __future__ import annotations

import os
from pathlib import Path
from ansible.plugins.callback import CallbackBase


class CallbackModule(CallbackBase):
    CALLBACK_VERSION = 2.0
    CALLBACK_TYPE = 'notification'
    CALLBACK_NAME = 'last_failed'
    CALLBACK_NEEDS_WHITELIST = False

    def __init__(self):
        super().__init__()
        # Write to installer logs dir
        self._out_dir = Path(__file__).resolve().parents[1] / "logs"
        self._out_file = self._out_dir / "last_failed_task.txt"
        try:
            os.makedirs(self._out_dir, exist_ok=True)
        except Exception:
            pass

    def _write_last_failed(self, result):
        try:
            task_name = result._task.get_name()  # noqa
            task_path = getattr(result._task, "get_path", lambda: None)()  # noqa
            lineno = getattr(result._task, "get_lineno", lambda: None)()  # noqa
            role_name = None
            if task_path and "/roles/" in task_path:
                try:
                    role_segment = task_path.split("/roles/")[1]
                    role_name = role_segment.split("/")[0]
                except Exception:
                    role_name = None
            host = getattr(result, "_host", None)
            host_name = getattr(host, "name", "unknown") if host else "unknown"
            line = f"{task_name}\n# host: {host_name}\n"
            if task_path:
                line += f"# file: {task_path}\n"
            if lineno:
                line += f"# line: {lineno}\n"
            if role_name:
                line += f"# role: {role_name}\n"
            with open(self._out_file, "w", encoding="utf-8") as f:


Use a logger instead of writing directly to the file.

Contributor Author


This is a persisted file that the next rerun reads, so I don't think a logger is the right choice for this use case.
I don't see any problem with using `with open` to handle a persisted file.

Let me know if you still see an issue with writing the file this way.


If you want to implement a state machine, please use a state machine framework.

                f.write(line)
        except Exception:
            # Best effort only; never break the run
            pass

    def v2_runner_on_failed(self, result, ignore_errors=False):
        self._write_last_failed(result)

    def v2_runner_on_unreachable(self, result):
        self._write_last_failed(result)


41 changes: 41 additions & 0 deletions tools/installer/inventories/dev/group_vars/all.yml
@@ -0,0 +1,41 @@
---
# Global defaults
cluster_mode: "non-ha" # non-ha | ha

# Source selection
ozone_version: "2.0.0" # "2.0.0" | "local"
dl_url: "https://dlcdn.apache.org/ozone"

# Local snapshot settings
local_shared_path: ""
local_ozone_dirname: ""

# Install and data directories
install_base: "/opt/ozone"
data_base: "/data/ozone"

# Java settings
jdk_major: 17
ozone_java_home: "" # autodetected if empty

# Service user/group
service_user: "ozone"
service_group: "ozone"

# Runtime and behavior
use_sudo: true
start_after_install: true
ozone_opts: "-Xmx1024m -XX:ParallelGCThreads=8"
service_command_timeout: 300 # seconds for service init/start commands
ansible_remote_tmp: "/tmp/.ansible-{{ ansible_user_id }}"

# SSH bootstrap
allow_cluster_ssh_key_deploy: false
ssh_public_key_path: "" # optional path on controller to a public key to install
ssh_private_key_path: "" # optional path to private key to copy for cluster identity

# Markers for profile management
JAVA_MARKER: "Apache Ozone Installer Java Home"
ENV_MARKER: "Apache Ozone Installer Env"


17 changes: 17 additions & 0 deletions tools/installer/inventories/dev/hosts.ini
@@ -0,0 +1,17 @@
[om]
# om1.example.com

[scm]
# scm1.example.com

[datanodes]
# dn1.example.com
# dn2.example.com

[recon]
# recon1.example.com

[all:vars]
cluster_mode=non-ha

