Open
48 commits
aaf8274
HDDS-9864. Add overview/docs landing page
kerneltime Mar 26, 2025
3dad30c
HDDS-9864. Add overview/docs landing page
kerneltime Mar 26, 2025
e4d655a
Updates to architecture and overview page
kerneltime Mar 27, 2025
f1a72ef
checkpoint: Made more icons
kerneltime Mar 28, 2025
941d184
Update docusaurus.config.js for GitHub Pages deployment
kerneltime Mar 28, 2025
b6c06ea
Update docusaurus config with trailingSlash and fixed URL regex
kerneltime Mar 28, 2025
193b8df
Set broken links to warn instead of throw
kerneltime Mar 28, 2025
681fe0c
Add plugin to create .nojekyll file in build output
kerneltime Mar 28, 2025
2d4b4b6
Fix image and link paths in homepage to use baseUrl for GitHub Pages …
kerneltime Mar 28, 2025
115adb5
Fix integration icon paths using better baseUrl approach
kerneltime Mar 28, 2025
3a0f870
Fix integration icons paths to work correctly with GitHub Pages baseUrl
kerneltime Mar 28, 2025
1b6d7e9
Update heading styles with lighter font-weight and adjusted sizes
kerneltime Mar 28, 2025
dfa2f33
Fix social icons in footer by applying baseUrl prefix
kerneltime Mar 28, 2025
7cdd4d6
Fix import for useDocusaurusContext
kerneltime Mar 28, 2025
0a0cd83
Update typography with comprehensive heading size hierarchy
kerneltime Mar 28, 2025
de8a3d5
Increase heading contrast with bolder weights and improved color hier…
kerneltime Mar 28, 2025
0f110d1
Final adjustments to heading contrast and styles
kerneltime Mar 28, 2025
1654bf0
Increase size of container SVG images on architecture page
kerneltime Mar 28, 2025
552c7ea
Fix navbar style configuration
kerneltime Mar 28, 2025
ddd411d
Change navbar style to dark
kerneltime Mar 28, 2025
b2ec4de
Fix text for read write page
kerneltime Mar 28, 2025
1d1566f
Use consistent GitHub logo in top bar with theme colors
kerneltime Mar 28, 2025
1f370a7
Enhance website documentation with new SVG icons and layouts
kerneltime Mar 29, 2025
a3e7213
Add comprehensive documentation for storage containers
kerneltime Mar 29, 2025
4e9db69
Add comprehensive documentation for write pipelines
kerneltime Mar 29, 2025
49cec99
Lint fixes and minor updates
kerneltime Mar 31, 2025
e0c136b
Fix boxes alignment
kerneltime Mar 31, 2025
73ac385
update architecture document
kerneltime Apr 1, 2025
569027a
Fix component interaction diagram
kerneltime Apr 1, 2025
893102d
Reorganize documentation structure for core concepts
kerneltime Apr 8, 2025
f075671
Add comprehensive documentation for Ozone web UIs
kerneltime Apr 8, 2025
227b168
Add detailed documentation for Ozone web UIs
kerneltime Apr 8, 2025
bfcaedc
Fix S3 Gateway port information
kerneltime Apr 8, 2025
b065e81
Fix HTTPFS configuration properties
kerneltime Apr 9, 2025
f277edf
gemini updates
kerneltime Apr 11, 2025
a4bc194
Add theme
kerneltime Apr 11, 2025
25e3937
Initial stubs from gemini
kerneltime Apr 11, 2025
c7ce514
More gemini content
kerneltime Apr 11, 2025
54bf681
Gemini updates
kerneltime Apr 11, 2025
34b480c
Logo updates for client interfaces and integrations
kerneltime Apr 11, 2025
d398874
Gemini + Claude updates
kerneltime Apr 13, 2025
97f20ed
updates
kerneltime Apr 17, 2025
41bf564
Add S3 security documentation and fix card components
kerneltime Apr 23, 2025
e5d6fad
Add comprehensive deployment architecture documentation
kerneltime Apr 23, 2025
76b41e4
Split deployment architecture and hardware documentation
kerneltime Apr 24, 2025
d18ea78
Add note about single-node deployment capability
kerneltime Apr 24, 2025
da76cbb
S3 doc improvements
kerneltime Apr 24, 2025
c47a178
checkpoint
kerneltime Jul 9, 2025
74 changes: 38 additions & 36 deletions docs/01-overview.md
@@ -20,58 +20,52 @@ Ozone includes features relevant to large-scale storage requirements:

### Scale

Ozone's architecture separates metadata management from data storage. The Ozone Manager (OM) and
Storage Container Manager (SCM) handle metadata operations, while Datanodes manage the physical storage of data blocks.
Ozone's architecture separates metadata management from data storage. The Ozone Manager (OM) and
Storage Container Manager (SCM) handle metadata operations, while Datanodes manage the physical storage of data blocks.
This design allows for independent scaling of these components and supports incremental cluster growth.

### Flexible Durability

Ozone offers configurable data durability options per bucket or per object:

- **Replication (RATIS):** Uses 3-way replication via the [Ratis (Raft)](https://ratis.apache.org) consensus protocol for high availability.
- **Erasure Coding (EC):** Supports various EC codecs (e.g., Reed-Solomon) to reduce storage overhead compared to replication while maintaining specified durability levels.
* **Replication (RATIS):** Uses 3-way replication via the [Ratis (Raft)](https://ratis.apache.org) consensus protocol for high availability.
* **Erasure Coding (EC):** Supports various EC codecs (e.g., Reed-Solomon) to reduce storage overhead compared to replication while maintaining specified durability levels.
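
As a rough illustration of the overhead difference, the ratio of raw to usable storage for a scheme with `d` data units and `p` redundant units is `(d + p) / d`. The sketch below assumes a Reed-Solomon layout with 6 data and 3 parity blocks purely as an example; the helper name `storage_overhead` is hypothetical and not part of Ozone.

```bash
# Ratio of raw bytes stored to logical bytes written.
# Args: number of data units, number of redundant units (parity blocks or extra replicas).
storage_overhead() {
  awk -v d="$1" -v p="$2" 'BEGIN { printf "%.2f\n", (d + p) / d }'
}

storage_overhead 1 2   # 3-way replication: 1 copy + 2 extra replicas, prints 3.00
storage_overhead 6 3   # assumed RS(6,3): 6 data + 3 parity blocks, prints 1.50
```

Under these assumptions the erasure-coded layout stores half as many raw bytes per logical byte as 3-way replication.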

### Secure

Security features are integrated at multiple layers:

- **Authentication:** Supports Kerberos integration for user and service authentication.
- **Authorization:** Provides Access Control Lists (ACLs) for managing permissions at the volume, bucket, and key levels. Supports Apache Ranger integration for centralized policy management.
- **Encryption:** Supports TLS/SSL for data in transit and Transparent Data Encryption (TDE) for data at rest.
- **Tokens:** Uses delegation tokens and block tokens for access control in distributed operations.
* **Authentication:** Supports Kerberos integration for user and service authentication.
* **Authorization:** Provides Access Control Lists (ACLs) for managing permissions at the volume, bucket, and key levels. Supports Apache Ranger integration for centralized policy management.
* **Encryption:** Supports TLS/SSL for data in transit and Transparent Data Encryption (TDE) for data at rest.
* **Tokens:** Uses delegation tokens and block tokens for access control in distributed operations.

### Performance

Ozone's design considers performance for different access patterns:

- **Throughput:** Intended for streaming reads and writes of large files. Data can be served directly from Datanodes after initial metadata lookup.
- **Latency:** Metadata operations are managed by OM and SCM, designed for low-latency access.
- **Small File Handling:** Includes mechanisms for managing metadata and storage for large quantities of small files.
* **Throughput:** Intended for streaming reads and writes of large files. Data can be served directly from Datanodes after initial metadata lookup.
* **Latency:** Metadata operations are managed by OM and SCM, designed for low-latency access.
* **Small File Handling:** Includes mechanisms for managing metadata and storage for large quantities of small files.

### Multiple Protocols

Applications can access data stored in Ozone through several interfaces:

- **S3 Protocol:** Provides an S3-compatible REST API, allowing use with S3-native applications and tools.
- **Hadoop Compatible File System (ofs):** Offers the `ofs://` scheme for integration with Hadoop ecosystem tools (e.g., Iceberg, Spark, Hive, Flink, MapReduce).
- **Native Java Client API:** A client library for Java applications.
- **Command Line Interface (CLI):** Provides tools for administrative tasks and data interaction.
* **S3 Protocol:** Provides an S3-compatible REST API, allowing use with S3-native applications and tools.
* **Hadoop Compatible File System (OFS):** Offers the `ofs://` scheme for integration with Hadoop ecosystem tools (e.g., Iceberg, Spark, Hive, Flink, MapReduce).
* **Native Java Client API:** A client library for Java applications.
* **Command Line Interface (CLI):** Provides tools for administrative tasks and data interaction.
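
To make the `ofs://` scheme concrete, the sketch below composes a path from its parts: the OM host (or service ID), then volume, bucket, and key. The helper `ofs_uri` and all names passed to it are hypothetical placeholders.

```bash
# Compose an ofs:// URI from its parts.
# Args: OM host or service ID, volume, bucket, key path.
ofs_uri() {
  printf 'ofs://%s/%s/%s/%s\n' "$1" "$2" "$3" "$4"
}

ofs_uri om.example.com vol1 bucket1 dir1/file1
# prints ofs://om.example.com/vol1/bucket1/dir1/file1
```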

### Efficient Storage Use

Ozone includes features aimed at optimizing storage utilization:

- **Erasure Coding:** Can reduce the physical storage footprint compared to 3x replication.
- **Small File Handling:** Manages metadata and block allocation for small files.
- **Containerization:** Groups data blocks into larger Storage Containers, which can simplify management and disk I/O.
* **Erasure Coding:** Can reduce the physical storage footprint compared to 3x replication.
* **Small File Handling:** Manages metadata and block allocation for small files.
* **Containerization:** Groups data blocks into larger Storage Containers, which can simplify management and disk I/O.

### Storage Management

Ozone uses a hierarchical namespace and provides management tools:

- **Namespace:** Organizes data into Volumes (often mapped to tenants) and Buckets (containers for objects), which hold Keys (objects/files).
- **Quotas:** Administrators can set storage quotas at the Volume and Bucket levels.
- **Snapshots:** Supports point-in-time, read-only snapshots of buckets for data protection and versioning.
* **Namespace:** Organizes data into Volumes (often mapped to tenants) and Buckets (containers for objects), which hold Keys (objects/files).
* **Quotas:** Administrators can set storage quotas at the Volume and Bucket levels.
* **Snapshots:** Supports point-in-time, read-only snapshots of buckets for data protection and versioning.
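
The Volume, Bucket, and Key hierarchy can be sketched with the `ozone sh` command line. This is a minimal sketch: the function names and the `vol1`/`bucket1` names are hypothetical, and the `OZONE` variable is only a convenience so the commands can be printed with `OZONE=echo` instead of executed.

```bash
OZONE="${OZONE:-ozone}"   # set OZONE=echo to print the commands instead of running them

# Create a volume and a bucket inside it.
# Args: volume name, bucket name.
create_namespace() {
  "$OZONE" sh volume create "/$1"
  "$OZONE" sh bucket create "/$1/$2"
}

# Upload a local file as a key.
# Args: volume name, bucket name, key name, local file path.
put_key() {
  "$OZONE" sh key put "/$1/$2/$3" "$4"
}
```

For example, `create_namespace vol1 bucket1` followed by `put_key vol1 bucket1 key1 ./README.md` would store the file under `/vol1/bucket1/key1`.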

### Strong Consistency

@@ -84,20 +78,28 @@ The design of Ozone leads to certain characteristics relevant for large-scale da
### Storage Costs

Factors influencing storage costs include:

- **Storage Efficiency:** Erasure Coding can reduce physical storage requirements.
- **Hardware:** Designed to run on commodity hardware.
- **Licensing:** Apache Ozone is open-source software under the Apache License 2.0.
- **Scalability:** Clusters can be expanded by adding nodes or racks. Data rebalancing mechanisms help manage utilization.
* **Storage Efficiency:** Erasure Coding can reduce physical storage requirements.
* **Hardware:** Designed to run on commodity hardware.
* **Licensing:** Apache Ozone is open-source software under the Apache License 2.0.
* **Scalability:** Clusters can be expanded by adding nodes or racks. Data rebalancing mechanisms help manage utilization.

### Operations

Aspects related to storage administration include:

- **Unified Storage:** Can potentially serve as a common storage layer for different types of workloads.
- **Management Tools:** Includes the Recon web UI for monitoring and CLI tools for administration.
- **Maintenance:** Supports features like rolling upgrades, node decommissioning, and data balancing.
* **Unified Storage:** Can potentially serve as a common storage layer for different types of workloads.
* **Management Tools:** Includes the Recon web UI for monitoring and CLI tools for administration.
* **Maintenance:** Supports features like rolling upgrades, node decommissioning, and data balancing.

### Hybrid Cloud Scenarios

Ozone's S3 compatibility allows applications developed for S3 to run on-premises using Ozone. This can be relevant for hybrid cloud strategies or migrating workloads between on-premises and cloud environments.

## Dive Deeper

To learn more about Ozone, refer to the following sections:

* **New to Ozone?** Try the **[Quick Start Guide](./02-quick-start/README.mdx)** to set up a cluster.
* **Want to understand the internals?** Read about the **[Core Concepts](./03-core-concepts/README.mdx)** (architecture, replication, security).
* **Need to use Ozone?** Check the **[User Guide](./04-user-guide/README.mdx)** for client interfaces and integrations.
* **Managing a cluster?** Consult the **[Administrator Guide](./05-administrator-guide/README.mdx)** for installation, configuration, and operations.
* **Running into issues?** The **[Troubleshooting Guide](./06-troubleshooting/README.mdx)** may provide assistance.
107 changes: 105 additions & 2 deletions docs/02-quick-start/01-installation/02-kubernetes.md
@@ -2,6 +2,109 @@
sidebar_label: Kubernetes
---

# Try Ozone With Kubernetes
# Deploy Ozone on Kubernetes

**TODO:** File a subtask under [HDDS-9856](https://issues.apache.org/jira/browse/HDDS-9856) and complete this page or section.
Apache Ozone can be deployed on a Kubernetes cluster using the official [Helm](https://helm.sh) chart or by applying raw Kubernetes manifests. Helm is the recommended approach for most users because it simplifies installation and configuration management.

## Using Helm (Recommended)

This is the quickest way to get an Ozone cluster running on Kubernetes.

### Prerequisites

* A running Kubernetes cluster (v1.29+ recommended).
* [Helm v3](https://helm.sh/docs/intro/install/) installed on your client machine.
* `kubectl` configured to interact with your cluster.

### Installation Steps

1. **Add the Apache Ozone Helm repository:**
```bash
helm repo add ozone https://apache.github.io/ozone-helm-charts/
helm repo update
```

2. **Install the Ozone chart:**
This command installs Ozone with the release name `ozone` into the default namespace using default configuration values.
```bash
helm install ozone ozone/ozone
```
Wait for all the pods (SCM, OM, Datanodes, S3 Gateway, Recon) to become ready. You can monitor the status using:
```bash
kubectl get pods -w
```
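
As an alternative to watching the pod list interactively, a script can block until the pods report Ready. The sketch below uses `kubectl wait`; the function name, the default namespace, and the 600-second timeout are assumptions, and `KUBECTL=echo` prints the command instead of running it.

```bash
KUBECTL="${KUBECTL:-kubectl}"   # set KUBECTL=echo for a dry run

# Block until every pod in the namespace is Ready, or fail after the timeout.
# Args: namespace (default "default"), timeout (default "600s").
wait_for_ozone_pods() {
  "$KUBECTL" wait pod --all --for=condition=Ready \
    --namespace "${1:-default}" --timeout "${2:-600s}"
}
```

Calling `wait_for_ozone_pods` with no arguments waits on all pods in the `default` namespace, so it is best suited to a namespace that contains only the Ozone release.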

### Configuration

The default installation creates a basic Ozone cluster. You can customize the deployment by overriding values in the `values.yaml` file.

* **View default values:**
```bash
helm show values ozone/ozone
```
* **Customize installation:** Create a custom `my-values.yaml` file with your overrides and install using:
```bash
helm install ozone ozone/ozone -f my-values.yaml
```

* **Persistence:** By default, the Helm chart does *not* enable persistence, meaning all data will be lost if pods are restarted. For storing actual data, enable persistence in your `my-values.yaml`:
```yaml
# Example: my-values.yaml
datanode:
  persistence:
    enabled: true
    # Optional: specify storage class, size, accessModes
    # size: 50Gi
    # storageClassName: my-storage-class
om:
  persistence:
    enabled: true
    # size: 10Gi
scm:
  persistence:
    enabled: true
    # size: 10Gi
# Persistence might also be needed for Recon if you use it actively.
```
Ensure your Kubernetes cluster has a default StorageClass or specify one that supports `ReadWriteOnce` access mode for the components requiring persistence.
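
Before installing with overrides, it can help to render the chart locally and, after installing, to confirm that the claims were bound. The following is a sketch under stated assumptions: the function names are hypothetical, the release is assumed to be named `ozone`, and `HELM=echo` / `KUBECTL=echo` print the commands instead of running them.

```bash
HELM="${HELM:-helm}"
KUBECTL="${KUBECTL:-kubectl}"

# Render the manifests locally with the overrides applied, without touching the cluster.
# Args: values file (default "my-values.yaml").
preview_manifests() {
  "$HELM" template ozone ozone/ozone -f "${1:-my-values.yaml}"
}

# Show the overrides in effect, the available StorageClasses, and the claims created.
# Args: namespace (default "default").
show_persistence_state() {
  "$HELM" get values ozone --namespace "${1:-default}"
  "$KUBECTL" get storageclass
  "$KUBECTL" get pvc --namespace "${1:-default}"
}
```

Searching the rendered output of `preview_manifests` for `volumeClaimTemplates` is one way to check that the persistence settings took effect before anything is deployed.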

### Accessing Ozone Services

Once installed, Ozone services are typically exposed only within the cluster as `ClusterIP` services. To reach them from outside the cluster, use `kubectl port-forward` or configure an Ingress.

* **Example: Accessing S3 Gateway:**
```bash
# Find the S3 Gateway service name (e.g., ozone-s3g)
kubectl get svc

# Forward a local port (e.g., 9878) to the S3 Gateway port
kubectl port-forward service/ozone-s3g 9878:9878
```
You can now access the S3 Gateway at `http://localhost:9878`.
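
With the port-forward running, an S3 client can be pointed at the local endpoint. The sketch below uses the AWS CLI and assumes it is installed with some credentials configured; on an unsecured test cluster placeholder credentials are assumed to be sufficient. The function names and bucket names are hypothetical, and `AWS=echo` prints the commands instead of running them.

```bash
AWS="${AWS:-aws}"
S3_ENDPOINT="${S3_ENDPOINT:-http://localhost:9878}"

# Create a bucket through the S3 Gateway.
# Args: bucket name.
s3_create_bucket() {
  "$AWS" --endpoint-url "$S3_ENDPOINT" s3api create-bucket --bucket "$1"
}

# Upload a local file and list the bucket contents.
# Args: local file path, bucket name.
s3_upload_and_list() {
  "$AWS" --endpoint-url "$S3_ENDPOINT" s3 cp "$1" "s3://$2/"
  "$AWS" --endpoint-url "$S3_ENDPOINT" s3 ls "s3://$2/"
}
```

For example, `s3_create_bucket bucket1` followed by `s3_upload_and_list ./README.md bucket1` exercises a write and a read through the gateway.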

### Uninstalling

To remove the Ozone deployment installed via Helm:
```bash
helm uninstall ozone
```
This deletes the Kubernetes resources that Helm manages for the release. If you enabled persistence, the PersistentVolumeClaims (PVCs) may remain after the uninstall and need manual deletion; whether the underlying volumes are then removed depends on the reclaim policy.
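
A minimal sketch for inspecting and removing leftover claims after an uninstall follows. The function names are hypothetical, the claim names must be taken from the listing rather than guessed, and `KUBECTL=echo` prints the commands instead of running them.

```bash
KUBECTL="${KUBECTL:-kubectl}"

# List the claims still present in a namespace.
# Args: namespace (default "default").
list_leftover_pvcs() {
  "$KUBECTL" get pvc --namespace "${1:-default}"
}

# Delete one claim by name. This is destructive: the stored data may be lost.
# Args: claim name, namespace (default "default").
delete_pvc() {
  "$KUBECTL" delete pvc "$1" --namespace "${2:-default}"
}
```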

## Using Raw Manifests

For advanced users or specific customization needs, Ozone provides example Kubernetes manifests. These are located within the Ozone source distribution under the `hadoop-ozone/dist/src/main/k8s/examples/` directory (or the corresponding path in the `source` directory if you checked out the code).

The `getting-started` example provides basic manifests for deploying Ozone components (OM, SCM, Datanodes, etc.) as StatefulSets and Services. You can apply these using `kubectl apply -f <directory>`, or `kubectl apply -k <directory>` when the directory contains a `kustomization.yaml`.

```bash
# Example using kubectl and kustomize (if available)
# Navigate to the examples directory
cd hadoop-ozone/dist/src/main/k8s/examples/getting-started

# Apply the manifests
kubectl apply -k .
```

This method requires more manual configuration and management compared to using the Helm chart.

*(Refer to the [Administrator Guide > Installation](/docs/05-administrator-guide/01-installation/01-deployment-architecture.md) for more advanced Kubernetes deployment topics like High Availability, security, and detailed configuration.)*