diff --git a/Makefile b/Makefile
index ec3399a..d6c150e 100644
--- a/Makefile
+++ b/Makefile
@@ -1,10 +1,15 @@
-.PHONY: build-docs docs preview help
+.PHONY: build-docs docs preview clean help
 
 help:
	@echo "Available targets:"
	@echo "  build-docs - Build the Sphinx documentation"
	@echo "  docs       - Build and serve documentation with auto-reload"
-	@echo "  preview    - Build and preview documentation in browser"
+	@echo "  preview    - Clean, build and preview documentation in browser"
+	@echo "  clean      - Remove the documentation build directory"
+
+clean:
+	@echo "Cleaning documentation build directory..."
+	rm -rf docs/_build
 
 build-docs:
	@echo "Building documentation..."
@@ -15,7 +20,7 @@ docs: build-docs
	@echo "Press Ctrl+C to stop the server"
	cd docs/_build/html && uv run python -m http.server 8001
 
-preview: build-docs
+preview: clean build-docs
	@echo "Starting documentation server with auto-reload..."
	@echo "Documentation will be available at http://127.0.0.1:8001"
	uv run sphinx-autobuild docs docs/_build/html --port 8001 --open-browser
diff --git a/docs/_static/custom.css b/docs/_static/custom.css
index 9b07293..d855453 100644
--- a/docs/_static/custom.css
+++ b/docs/_static/custom.css
@@ -5,7 +5,6 @@
 /* Enhanced code blocks */
 .highlight {
     border-radius: 8px;
-    margin: 1.5em 0;
 }
 
 div.highlight pre {
diff --git a/docs/api-quick-reference.md b/docs/api-quick-reference.md
index 1418a1e..58e5f38 100644
--- a/docs/api-quick-reference.md
+++ b/docs/api-quick-reference.md
@@ -273,6 +273,5 @@ ls .ml-dash/project/experiment/files/
 
 ## See Also
 
-- [Getting Started](getting-started.md)
 - [Complete Examples](complete-examples.md)
 - [Runnable Examples](examples.md)
diff --git a/docs/getting-started.md b/docs/getting-started.md
deleted file mode 100644
index 631c4cc..0000000
--- a/docs/getting-started.md
+++ /dev/null
@@ -1,324 +0,0 @@
-# Getting Started with ML-Dash
-
-This guide will help you get started with ML-Dash.
-
-## Installation
-
-```bash
-# Install from source (for now)
-cd ml-dash_python_sdk
-pip install -e .
-```
-
-## Core Concepts
-
-### Experiments
-
-A **Experiment** represents a single experiment run or training experiment. Experiments contain:
-- Logs (structured logging)
-- Parameters (hyperparameters and configuration)
-- Metrics (time-series metrics like loss, accuracy)
-- Files (models, datasets, artifacts)
-
-### Projects
-
-A **Project** is a container for organizing related experiments. Think of it as a project or team project.
-
-### Local vs Remote Mode
-
-ML-Dash operates in two modes:
-
-- **Local Mode**: Data stored in filesystem (`.ml-dash/` directory)
-- **Remote Mode**: Data stored in MongoDB + S3 via API
-
-## Your First Experiment
-
-ML-Dash supports **three usage styles**. Choose the one that fits your workflow best:
-
-### Style 1: Decorator (Recommended for ML Training)
-
-Perfect for wrapping training functions:
-
-```python
-from ml_dash import ml_dash_experiment
-
-@ml_dash_experiment(
-    name="hello-ml-dash",
-    project="tutorials",
-    local_prefix="./my_experiments"
-)
-def my_first_experiment(experiment):
-    """Experiment is automatically injected as a parameter"""
-    # Log a message
-    experiment.log("Hello from ML-Dash!", level="info")
-
-    # Metric a parameter
-    experiment.parameters().set(message="Hello World")
-
-    print("Experiment created successfully!")
-    return "Done!"
-
-# Run the experiment - experiment is managed automatically
-result = my_first_experiment()
-```
-
-### Style 2: Context Manager (Recommended for Scripts)
-
-The most common and Pythonic approach:
-
-```python
-from ml_dash import Experiment
-
-# Create a experiment in local mode
-with Experiment(
-    name="hello-ml-dash",
-    project="tutorials",
-    local_prefix="./my_experiments",
-    local_path=".ml-dash"
-) as experiment:
-    # Log a message
-    experiment.log("Hello from ML-Dash!", level="info")
-
-    # Metric a parameter
-    experiment.parameters().set(message="Hello World")
-
-    print("Experiment created successfully!")
-    print(f"Data stored in: {experiment._storage.root_path}")
-```
-
-### Style 3: Direct Instantiation (Advanced)
-
-For fine-grained control:
-
-```python
-from ml_dash import Experiment
-
-# Create an experiment
-experiment = Experiment(
-    name="hello-ml-dash",
-    project="tutorials",
-    local_prefix="./my_experiments",
-    local_path=".ml-dash"
-)
-
-# Explicitly open
-experiment.open()
-
-try:
-    # Log a message
-    experiment.log("Hello from ML-Dash!", level="info")
-
-    # Metric a parameter
-    experiment.parameters().set(message="Hello World")
-
-    print("Experiment created successfully!")
-finally:
-    # Explicitly close
-    experiment.close()
-```
-
-Save this as `hello_ml-dash.py` and run it:
-
-```bash
-python hello_ml-dash.py
-```
-
-You should see:
-```
-Experiment created successfully!
-Data stored in: ./my_experiments
-```
-
-## What Just Happened?
-
-1. **Experiment Created**: A new experiment named "hello-ml-dash" was created in the "tutorials" project
-2. **Log Written**: A log message was written to `.ml-dash/tutorials/hello-ml-dash/logs.jsonl`
-3. **Parameter Saved**: The parameter was saved to `.ml-dash/tutorials/hello-ml-dash/parameters.json`
-4. **Auto-Closed**: The `with` statement automatically closed the experiment
-
-## Inspecting Your Data
-
-Let's check what was created:
-
-```bash
-# View the directory structure
-tree ./my_experiments/.ml-dash
-
-# View logs
-cat ./my_experiments/.ml-dash/tutorials/hello-ml-dash/logs.jsonl
-
-# View parameters
-cat ./my_experiments/.ml-dash/tutorials/hello-ml-dash/parameters.json
-```
-
-## Experiment Context Manager
-
-ML-Dash uses Python's context manager pattern (`with` statement) to ensure proper cleanup:
-
-```python
-# ✓ Good - Automatic cleanup
-with Experiment(name="my-experiment", project="test", local_prefix="./data",
-                local_path=".ml-dash") as experiment:
-    experiment.log("Training started")
-    # ... do work ...
-# Experiment automatically closed here
-
-# ✗ Manual cleanup (not recommended)
-experiment = Experiment(name="my-experiment", project="test", local_prefix="./data",
-                        local_path=".ml-dash")
-experiment.open()
-try:
-    experiment.log("Training started")
-finally:
-    experiment.close()
-```
-
-## Experiment Metadata
-
-You can add metadata to your experiments:
-
-```python
-with Experiment(
-    name="mnist-baseline",
-    project="computer-vision",
-    local_prefix="./experiments",
-    description="Baseline CNN for MNIST classification",
-    tags=["mnist", "cnn", "baseline"],
-    folder="/experiments/mnist",
-    local_path=".ml-dash"
-) as experiment:
-    experiment.log("Experiment created with metadata")
-```
-
-## Error Handling
-
-Experiments handle errors gracefully:
-
-```python
-from ml_dash import Experiment
-
-try:
-    with Experiment(
-        name="test-experiment",
-        project="test",
-        local_prefix="./data",
-        local_path=".ml-dash"
-    ) as experiment:
-        experiment.log("Starting work...")
-        # Your code here
-        raise Exception("Something went wrong!")
-except Exception as e:
-    print(f"Error occurred: {e}")
-    # Experiment is still properly closed
-```
-
-## Next Steps
-
-Now that you understand the basics, explore:
-- [Experiments](experiments.md) - Advanced experiment management
-- [Logging](logging.md) - Structured logging
-- [Parameters](parameters.md) - Parameter metricing
-- [Metrics](metrics.md) - Time-series metrics
-- [Files](files.md) - File uploads
-
-## Quick Reference
-
-### Three Usage Styles
-
-```python
-from ml_dash import Experiment, ml_dash_experiment
-
-# ========================================
-# Style 1: Decorator (ML Training)
-# ========================================
-@ml_dash_experiment(
-    name="experiment-name",
-    project="project-name",
-    local_prefix="./path/to/data"
-)
-def train(experiment):
-    experiment.log("Training...")
-
-train()  # Experiment managed automatically
-
-# ========================================
-# Style 2: Context Manager (Scripts)
-# ========================================
-# Local mode (filesystem)
-with Experiment(
-    name="experiment-name",
-    project="project-name",
-    local_prefix="./path/to/data",
-    local_path=".ml-dash"
-) as experiment:
-    pass
-
-# Remote mode (API + S3) - with username
-with Experiment(
-    name="experiment-name",
-    project="project-name",
-    remote="https://cu3thurmv3.us-east-1.awsapprunner.com",
-    user_name="your-username"
-) as experiment:
-    pass
-
-# Remote mode (API + S3) - with API key (advanced)
-with Experiment(
-    name="experiment-name",
-    project="project-name",
-    remote="https://cu3thurmv3.us-east-1.awsapprunner.com",
-    api_key="your-api-key"
-) as experiment:
-    pass
-
-# ========================================
-# Style 3: Direct Instantiation (Advanced)
-# ========================================
-experiment = Experiment(
-    name="experiment-name",
-    project="project-name",
-    local_prefix="./path/to/data",
-    local_path=".ml-dash"
-)
-experiment.open()
-try:
-    # Do work
-    pass
-finally:
-    experiment.close()
-```
-
-### All Styles Work With Remote Mode
-
-```python
-# Decorator + Remote
-@ml_dash_experiment(
-    name="experiment-name",
-    project="project-name",
-    remote="https://cu3thurmv3.us-east-1.awsapprunner.com",
-    user_name="your-username"
-)
-def train(experiment):
-    pass
-```
-
-**Note**: Using `user_name` is simpler for development - it automatically generates an API key from your username.
-
----
-
-## See Also
-
-Now that you know the basics, explore these guides:
-
-- **[Architecture](architecture.md)** - Understand how ML-Dash works internally
-- **[Deployment Guide](deployment.md)** - Deploy your own ML-Dash server
-- **[API Quick Reference](api-quick-reference.md)** - Cheat sheet for common patterns
-- **[Complete Examples](complete-examples.md)** - End-to-end ML workflows
-- **[FAQ & Troubleshooting](faq.md)** - Common questions and solutions
-
-**Feature-specific guides:**
-- [Experiments](experiments.md) - Experiment lifecycle and management
-- [Logging](logging.md) - Structured logging with levels
-- [Parameters](parameters.md) - Hyperparameter metricing
-- [Metrics](metrics.md) - Time-series metrics
-- [Files](files.md) - File upload and management
diff --git a/docs/index.md b/docs/index.md
index f0a601f..0bafa5e 100644
--- a/docs/index.md
+++ b/docs/index.md
@@ -1,53 +1,126 @@
 # Welcome to ML-Dash
 
-## Installation
+You can install the package with uv or pip:
 
 ```shell
 uv add ml-dash
 ```
 
-or using pip
+or
 
 ```shell
 pip install ml-dash
 ```
 
-## Quick Example
+The core of ML-Dash is the `Experiment` class. It supports logging, upload, and download of
+metrics, training hyperparameters, and binary files. The following example shows how to use
+it in a simple training script.
 
 ```python
 from ml_dash import Experiment
 
-with Experiment(name="my-experiment", project="my-project", local_path=".ml_dash") as experiment:
-    # Log messages
-    experiment.log("Training started")
+dxp = Experiment()
 
-    # Metric parameters
-    experiment.parameters().set(learning_rate=0.001, batch_size=32)
+dxp.run.start("You can log any message here, to mark the start of the run")
 
-    # Metric metrics
-    experiment.metric("loss").append(value=0.5, epoch=1)
+# You can log the training hyperparameters. These will be indexed and searchable.
+dxp.parameters.set(learning_rate=0.001, batch_size=32)
+
+# Log metrics.
+dxp.metric.append(loss=0.001, accuracy=0.5)
+
+# You can also namespace the metrics by calling the metric writer.
+dxp.metric("eval").append(loss=0.001, accuracy=0.5)
+
+# You can upload files under a prefix.
+dxp.file("checkpoints").save(fname="model.pth")
+
+# Finally, you can mark the run as complete.
+dxp.run.complete("This is over!")
+
+```
+
+Each experiment has a current "Run", which doubles as a context manager that
+automatically manages the start and end of the current execution:
+
+```python
+with dxp.run("Training"):
+    # training logic
+    pass
 ```
+
+```{admonition} Dash Experiment Run Lifecycle 🔄
+:class: note
+
+Each experimental run has the following lifecycle stages:
+
+- created: when the experimental run has been registered in the zaku job queue.
+- running: when the run has been pulled, hydrated, and initialized.
+- on-hold: when the context receives a pause trigger event to put it on hold.
+- complete: when the run finished without error. The job can sometimes linger here due to ongoing file uploads in the background.
+- failed: when the run failed due to an error.
+- aborted: when the run was aborted by the user.
+- deleted: when the run was deleted by the user (soft delete).
+
+```
+
+## A More Complete Example
+
+```python
+import torch.nn as nn
+
+# We typically access the dash experiment through the singleton import.
+from ml_dash.auto import dxp
+
+
+# This `auto` module creates a new experiment instance upon import.
+# The default experiment path is https://dash.ml/@user/scratch/date-and-time,
+# and this template can be changed by setting the ml_dash.run.Run template.
+def train(lr=0.001, batch_size=32, n_steps=10):
+    net = nn.Sequential(nn.Linear(784, 128), nn.ReLU(), nn.Linear(128, 10))
+
+    for step in range(n_steps):
+        # Training logic (placeholder values)
+        loss = 0.001
+        accuracy = 0.5
+
+        # Log the training metrics.
+        dxp.metrics.append(loss=loss, accuracy=accuracy, step=step)
+
+        # Then you can log the evaluation metrics.
+        eval_loss, eval_accuracy = 0.001, 0.02
+        dxp.metrics("eval").append(loss=eval_loss, accuracy=eval_accuracy, step=step)
+
+        # This uploads the checkpoint file.
+        dxp.files.save_torch(net, "model_last.pt")
+
+```
+
+Refer to the [Quick Start Guide](quickstart.md) and the Examples section for more detailed usage examples:
+
+- [Basic Training](basic-training.md) - Simple training loop with ML-Dash
+- [Hyperparameter Search](hyperparameter-search.md) - Running parameter sweeps
+- [Model Comparison](model-comparison.md) - Comparing multiple model runs
+- [Complete Examples](complete-examples.md) - Full end-to-end examples
+
 ```{toctree}
 :maxdepth: 2
-:caption: Getting Started
+:caption: Introduction
 :hidden:
 
-overview
 quickstart
-getting-started
 ```
 
 ```{toctree}
 :maxdepth: 2
-:caption: Tutorials
+:caption: Core Concepts
 :hidden:
 
-experiments
-logging
-parameters
-metrics
-files
+Experiment Configuration
+The Run Life-cycle
+Parameters & Hyperparameters
+Metrics & Time Series
+Message Logging
+Files & Artifacts
 ```
 
 ```{toctree}
diff --git a/docs/overview.md b/docs/overview.md
deleted file mode 100644
index d0d11b7..0000000
--- a/docs/overview.md
+++ /dev/null
@@ -1,35 +0,0 @@
-# Overview
-
-ML-Dash is a lightweight Python SDK for metricing machine learning experiments and storing experiment data. It provides a simple, intuitive API for logging, parameter metricing, metrics monitoring, and file management.
-
-**Start in 60 seconds.** Install, import, and start metricing - no configuration needed.
-
-## Key Features
-
-**Zero Setup** - Start metricing experiments instantly with filesystem-based storage. No server configuration, no database setup.
-
-**Dual Modes** - Choose local (filesystem) or remote (server with MongoDB + S3) based on your needs. Switch between them easily.
-
-**Fluent API** - Clean, chainable syntax that feels natural:
-
-```{code-block} python
-:linenos:
-
-experiment.log("Training started")
-experiment.parameters().set(learning_rate=0.001, batch_size=32)
-experiment.metric("loss").append(value=0.5, epoch=1)
-experiment.file(file_prefix="model.pth", prefix="/models").save()
-```
-
-
-## Core Concepts
-
-**Experiment** - Represents a single experiment run containing logs, parameters, metrics, and files.
-
-**Project** - A container for organizing related experiments, like a project folder.
-
-**Upsert Behavior** - Experiments can be reopened and updated, perfect for resuming training after crashes or iterative development.
-
----
-
-**Ready to start?** Check out the [Quickstart](quickstart.md) guide.