Commit 0c37da5

full documentation of high-level and conceptarium
1 parent 6b7731b commit 0c37da5

33 files changed: +6,733 −705 lines

conceptarium/README.md

Lines changed: 63 additions & 76 deletions

````diff
@@ -12,21 +12,12 @@
 
 - **Experiment tracking**: Integrated <img src="../doc/_static/img/logos/wandb.svg" width="20px" align="center"/> [Weights & Biases](https://wandb.ai/) logging for monitoring and reproducibility
 
-- [Quick Start](#quick-start)
-- [Installation](#installation)
-- [Configuration](#configuration)
-- [Running Experiments](#running-experiments)
-- [Custom configurations](#custom-configurations)
-- [Output Structure](#output-structure)
-- [Configuration Details](#configuration-details)
-- [Configuration Structure](#configuration-structure)
-- [Dataset Configuration](#dataset-configuration-datasetyaml)
-- [Model Configuration](#model-configuration-modelyaml)
-- [Implementation](#implementation)
-- [Implementing Your Own Model](#implementing-your-own-model)
-- [Implementing Your Own Dataset](#implementing-your-own-dataset)
-- [Contributing](#contributing)
-- [Cite this library](#cite-this-library)
+📚 **Full Documentation**: See the [comprehensive Conceptarium guide](../doc/guides/using_conceptarium.rst) for detailed documentation on:
+- Configuration system and hierarchy
+- Dataset and model configuration
+- Custom losses and metrics
+- Advanced usage patterns
+- Troubleshooting
 
 ---
 
````

````diff
@@ -63,20 +54,21 @@ hydra:
   name: my_experiment
   sweeper:
     params:
-      model: cbm  # One or more models (blackbox, cbm, cem, cgm, c2bm, etc.)
-      dataset: celeba, cub  # One or more datasets (celeba, cub, MNIST, alarm, etc.)
-      seed: 1,2,3,4,5  # sweep over multiple seeds for robustness
+      seed: 1,2,3,4,5  # Sweep over multiple seeds for robustness
+      dataset: cub,celeba  # One or more datasets
+      model: cbm_joint  # One or more models (blackbox, cbm_joint)
 
 model:
   optim_kwargs:
-    lr: 0.001
+    lr: 0.01
+
+metrics:
   summary_metrics: true
-  perconcept_metrics: false
+  perconcept_metrics: true
 
 trainer:
-  max_epochs: 500
-  patience: 30
-  monitor: "val_loss"
+  max_epochs: 200
+  patience: 20
 ```
 
 ## Running Experiments
````
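
For reference, the sweep above corresponds to a one-off multirun launch from the shell: Hydra's basic sweeper expands comma-separated values into the full cross product, so the `params` block shown here yields 5 × 2 × 1 = 10 runs. A minimal sketch, assuming `run_experiment.py` is a standard Hydra entry point:

```bash
# Equivalent one-off launch: --multirun (-m) sweeps the cross product of the
# comma-separated overrides (5 seeds x 2 datasets x 1 model = 10 runs).
python run_experiment.py --multirun seed=1,2,3,4,5 dataset=cub,celeba model=cbm_joint
```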

````diff
@@ -97,13 +89,13 @@ python run_experiment.py --config-name your_sweep.yaml
 On top of this, you can also override configurations from command line:
 ```bash
 # Change dataset
-python run_experiment.py dataset=alarm
+python run_experiment.py dataset=cub
 
 # Change learning rate
-python run_experiment.py model.optim_kwargs.lr=0.001
+python run_experiment.py model.optim_kwargs.lr=0.01
 
 # Change multiple configurations
-python run_experiment.py model=cbm dataset=asia,alarm seed=1,2,3
+python run_experiment.py model=cbm_joint dataset=cub,celeba seed=1,2,3
 ```
 
 ## Output Structure
````
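
Before launching a long run, overrides like these can be sanity-checked by printing the composed configuration instead of training. This uses Hydra's standard `--cfg` and `--package` flags and assumes `run_experiment.py` is a stock `@hydra.main` entry point:

```bash
# Print the fully composed job config and exit, without training.
python run_experiment.py model=cbm_joint dataset=cub --cfg job

# Show only the model subtree of the composed config.
python run_experiment.py model=cbm_joint dataset=cub --cfg job --package model
```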

````diff
@@ -133,57 +125,44 @@ Configuration files are organized in `conceptarium/conf/`:
 
 ```
 conf/
-├── _default.yaml      # Base configuration with defaults
-├── sweep.yaml         # Experiment sweep configuration
-├── dataset/           # Dataset configurations
-│   ├── _commons.yaml  # Common dataset parameters
-│   ├── celeba.yaml
-│   ├── cub.yaml
-│   ├── sachs.yaml
-│   └── ...
-└── model/             # Model architectures
-    ├── loss/          # Loss function configurations
-    │   ├── _default.yaml  # Type-aware losses (BCE, CE, MSE)
-    │   └── weighted.yaml  # Weighted type-aware losses
-    ├── metrics/       # Metric configurations
-    │   ├── _default.yaml  # Type-aware metrics (Accuracy, MAE, MSE)
-    │   └── ...
-    ├── _commons.yaml  # Common model parameters
-    ├── blackbox.yaml  # Black-box baseline
-    ├── cbm_joint.yaml # Concept Bottleneck Model (Joint)
-    ├── cem.yaml       # Concept Embedding Model
-    ├── cgm.yaml       # Concept Graph Model
-    └── c2bm.yaml      # Causally Reliable CBM
-```
-    │   ├── default.yaml   # Type-aware metrics (Accuracy, MAE, MSE)
-    │   └── ...
-    ├── _commons.yaml  # Common model parameters
-    ├── blackbox.yaml  # Black-box baseline
-    ├── cbm.yaml       # Concept Bottleneck Model
-    ├── cem.yaml       # Concept Embedding Model
-    ├── cgm.yaml       # Concept Graph Model
-    └── c2bm.yaml      # Causally Reliable CBM
+├── _default.yaml      # Base configuration with defaults
+├── sweep.yaml         # Example sweep configuration
+├── dataset/           # Dataset configurations
+│   ├── _commons.yaml  # Common dataset parameters
+│   ├── cub.yaml       # CUB-200-2011 birds dataset
+│   ├── celeba.yaml    # CelebA faces dataset
+│   └── ...            # More datasets
+├── loss/              # Loss function configurations
+│   ├── standard.yaml  # Standard type-aware losses
+│   └── weighted.yaml  # Weighted type-aware losses
+├── metrics/           # Metric configurations
+│   └── standard.yaml  # Type-aware metrics (Accuracy)
+└── model/             # Model architectures
+    ├── _commons.yaml  # Common model parameters
+    ├── blackbox.yaml  # Black-box baseline
+    ├── cbm.yaml       # Alias for CBM Joint
+    └── cbm_joint.yaml # Concept Bottleneck Model (Joint)
 ```
 
 
 ## Dataset Configuration (`dataset/*.yaml`)
 
-Dataset configurations specify the dataset class to instantiate, all data-specific parameters, and all necessary preprocessing parameters. An example configuration for the CUB dataset is provided below:
+Dataset configurations specify the dataset class to instantiate, all data-specific parameters, and all necessary preprocessing parameters. An example configuration for the CUB-200-2011 birds dataset is provided below:
 
 ```yaml
 defaults:
   - _commons
   - _self_
 
-_target_: torch_concepts.data.datamodules.CUBDataModule  # the path to your datamodule class
+_target_: torch_concepts.data.datamodules.CUBDataModule
 
 name: cub
 
 backbone:
-  _target_: "path.to.your.backbone.ClassName"
-  # ... (backbone arguments)
+  _target_: torchvision.models.resnet18
+  pretrained: true
 
-precompute_embs: true  # precompute input to speed up training
+precompute_embs: true  # precompute embeddings to speed up training
 
 default_task_names: [bird_species]
 
````
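
Since the backbone is itself an instantiable config node, it can in principle be swapped from the command line with a dot-notation override; the `resnet50` value below is purely illustrative, not something this commit adds:

```bash
# Hypothetical override of the CUB backbone via Hydra dot notation.
python run_experiment.py dataset=cub dataset.backbone._target_=torchvision.models.resnet50
```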

````diff
@@ -197,7 +176,8 @@ label_descriptions:
 
 ### Common Parameters
 
-Default parameters, common to all dataset, are in `_commons.yaml`:
+Default parameters, common to all datasets, are in `_commons.yaml`:
+
 - **`batch_size`**: Training batch size (default: 256)
 - **`val_size`**: Validation set fraction (default: 0.15)
 - **`test_size`**: Test set fraction (default: 0.15)
````
212192
```yaml
213193
defaults:
214194
- _commons
215-
- loss: _default
216-
- metrics: _default
217195
- _self_
218196

219-
_target_: "torch_concepts.nn.ConceptBottleneckModel_Joint"
197+
_target_: torch_concepts.nn.ConceptBottleneckModel_Joint
220198

221199
task_names: ${dataset.default_task_names}
222200

223201
inference:
224-
_target_: "torch_concepts.nn.DeterministicInference"
202+
_target_: torch_concepts.nn.DeterministicInference
225203
_partial_: true
226204

227205
summary_metrics: true # enable/disable summary metrics over concepts
228206
perconcept_metrics: false # enable/disable per-concept metrics
229207
```
230208
231-
### Common Parameters
209+
### Model Common Parameters
232210
233211
From `_commons.yaml`:
212+
234213
- **`encoder_kwargs`**: Encoder architecture parameters
235214
- **`hidden_size`**: Hidden layer dimension in encoder
236215
- **`n_layers`**: Number of hidden layers in encoder
237216
- **`activation`**: Activation function (relu, tanh, etc.) in encoder
238217
- **`dropout`**: Dropout probability in encoder
239-
- **`variable_distributions`**: Probability distributions with which concepts are modeled:
218+
- **`variable_distributions`**: Probability distributions with which concepts are modeled
240219
- **`optim_class`**: Optimizer class
241220
- **`optim_kwargs`**:
242221
- **`lr`**: 0.00075
243222

244223
and more...
245224

246-
### Loss Configuration (`model/loss/_default.yaml`)
225+
### Loss Configuration (`loss/standard.yaml`)
247226

248227
Type-aware losses automatically select appropriate loss functions based on variable types:
249228

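
With `loss/` and `metrics/` promoted to top-level config groups in this commit, they should compose like any other Hydra group. A sketch, with group names taken from the new tree above but the wiring not verified against `run_experiment.py`:

```bash
# Swap the standard type-aware losses for the weighted variant.
python run_experiment.py model=cbm_joint dataset=cub loss=weighted metrics=standard
```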

````diff
@@ -264,7 +243,7 @@ fn_collection:
 # ... not supported yet
 ```
 
-### Metrics Configuration (`model/metrics/_default.yaml`)
+### Metrics Configuration (`metrics/standard.yaml`)
 
 Type-aware metrics automatically select appropriate metrics based on variable types:
 
````

````diff
@@ -306,32 +285,40 @@ This involves the following steps:
 - Run experiments using your model.
 
 If your model is compatible with the default configuration structure, you can run experiments directly as follows:
+
 ```bash
-python run_experiment.py model=your_model dataset=...
+python run_experiment.py model=your_model dataset=cub
 ```
-Alernatively, create your own sweep file `conf/your_sweep.yaml` containing your mdoel and run:
+
+Alternatively, create your own sweep file `conf/your_sweep.yaml` containing your model and run:
+
 ```bash
-python run_experiment.py --config-file your_sweep.yaml
+python run_experiment.py --config-name your_sweep
 ```
 
 ---
 
 ## Implementing Your Own Dataset
+
 Create your dataset in Conceptarium by following the guidelines given in [torch_concepts/examples/contributing/dataset.md](../examples/contributing/dataset.md).
 
 This involves the following steps:
+
 - Create the dataset (`your_dataset.py`).
 - Create the datamodule (`your_datamodule.py`) wrapping the dataset.
 - Create configuration file in `conceptarium/conf/dataset/your_dataset.yaml`, targeting the datamodule class.
-- Run experiments using your dataset.
+- Run experiments using your dataset.
 
 If your dataset is compatible with the default configuration structure, you can run experiments directly as follows:
+
 ```bash
-python run_experiment.py dataset=your_dataset model=...
+python run_experiment.py dataset=your_dataset model=cbm_joint
 ```
+
 Alternatively, create your own sweep file `conf/your_sweep.yaml` containing your dataset and run:
+
 ```bash
-python run_experiment.py --config-name your_sweep.yaml
+python run_experiment.py --config-name your_sweep
 ```
 
 ---
````
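
Putting the two contribution paths together, a custom sweep file can be written and launched in one go. The `defaults` list below is an assumption (mirroring the presumed inheritance of `sweep.yaml` from `_default.yaml`), and `your_dataset` is the hypothetical config from the steps above:

```bash
# Sketch: write a minimal sweep config, then launch it by name.
# The defaults list and key layout are assumptions, not taken from this commit.
cat > conf/your_sweep.yaml <<'EOF'
defaults:
  - _default
  - _self_

hydra:
  sweeper:
    params:
      seed: 1,2,3
      dataset: your_dataset
      model: cbm_joint
EOF

python run_experiment.py --config-name your_sweep
```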

doc/guides/using.rst

Lines changed: 2 additions & 1 deletion

````diff
@@ -78,7 +78,7 @@ Pick the best entry point based on your experience:
         Start from the High-Level API to use pre-defined models with one line of code.
 
     .. grid-item-card:: :octicon:`beaker;1em;sd-text-primary` No experience with programming?
-        :link: modules/conceptarium
+        :link: using_conceptarium
         :link-type: doc
         :shadow: lg
         :class-card: sd-border-primary
@@ -138,3 +138,4 @@ Need Help?
    using_mid_level_proba
    using_mid_level_causal
    using_high_level
+   using_conceptarium
````
