mctigger
diff --git a/docs/tensor_distribution/development.md b/docs/tensor_distribution/development.md
Lines changed: 93 additions & 0 deletions
diff --git a/refactored.md b/refactored.md
Lines changed: 0 additions & 175 deletions
diff --git a/src/tensorcontainer/tensor_distribution/beta.py b/src/tensorcontainer/tensor_distribution/beta.py
Lines changed: 2 additions & 9 deletions
diff --git a/src/tensorcontainer/tensor_distribution/continuous_bernoulli.py b/src/tensorcontainer/tensor_distribution/continuous_bernoulli.py
Lines changed: 12 additions & 24 deletions
@@ -0,0 +1,93 @@
+# TensorDistribution Development Guide
+
+This document outlines the design requirements and implementation patterns for [`TensorDistribution`](/src/tensorcontainer/tensor_distribution/base.py) and its subclasses in the [`src/tensorcontainer/tensor_distribution/`](/src/tensorcontainer/tensor_distribution/) module.
+
+## Objective
+
+The [`tensor_distribution`](/src/tensorcontainer/tensor_distribution/) module provides probability distributions to which tensor operations such as `.to()` and `.expand()` can be applied directly. It keeps complete signature compatibility with `torch.distributions` and obtains its [`TensorContainer`](/src/tensorcontainer/tensor_container.py) functionality by inheriting from [`TensorAnnotated`](/src/tensorcontainer/tensor_annotated.py).
+
+## Architecture Requirements
+
+### Signature Compatibility
+
+All classes in [`tensorcontainer.tensor_distribution`](/src/tensorcontainer/tensor_distribution/) must maintain exact signature compatibility with their corresponding `torch.distributions` classes. This compatibility is enforced through automated testing using [`tests/tensor_distribution/conftest.py::assert_init_signatures_match`](/tests/tensor_distribution/conftest.py).
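The helper in `conftest.py` is not reproduced here. As a rough, stdlib-only sketch of what such a check can compare, `init_signatures_match` below (hypothetical, not the project's `assert_init_signatures_match`) requires identical parameter names, order, kinds and defaults, and compares annotations only where the reference class has one. The toy classes are stand-ins for a `torch.distributions` class and its wrapper.

```python
import inspect
from typing import Optional


def init_signatures_match(reference: type, wrapper: type) -> bool:
    """Compare __init__ parameters by name, order, kind, default and annotation.

    Assumption: a missing annotation on the reference side accepts any
    annotation on the wrapper, since the wrapper's hint then comes from the
    reference class's docstring.
    """
    ref_params = list(inspect.signature(reference.__init__).parameters.values())
    wrap_params = list(inspect.signature(wrapper.__init__).parameters.values())
    if [p.name for p in ref_params] != [p.name for p in wrap_params]:
        return False
    for ref, wrap in zip(ref_params, wrap_params):
        if ref.kind != wrap.kind or ref.default != wrap.default:
            return False
        if ref.annotation is not inspect.Parameter.empty and ref.annotation != wrap.annotation:
            return False
    return True


class ReferenceBernoulli:  # hypothetical stand-in for an unannotated torch class
    def __init__(self, probs=None, logits=None, validate_args=None):
        pass


class MatchingWrapper:  # same names, order and defaults; adds type hints
    def __init__(
        self,
        probs: Optional[float] = None,
        logits: Optional[float] = None,
        validate_args: Optional[bool] = None,
    ):
        pass


class RenamedWrapper:  # renames a parameter, so the check fails
    def __init__(self, p=None, logits=None, validate_args=None):
        pass


print(init_signatures_match(ReferenceBernoulli, MatchingWrapper))  # True
print(init_signatures_match(ReferenceBernoulli, RenamedWrapper))   # False
```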
+
+**Implementation Requirement**: When `torch.distributions` classes lack proper type annotations for `__init__` parameters, implementers **must** consult the class docstring to determine correct type hints.
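To find which parameters need this docstring lookup, the reference `__init__` can be inspected. A minimal sketch, where `unannotated_init_params` and `LegacyDistribution` are hypothetical stand-ins rather than project or PyTorch code:

```python
import inspect
from typing import List, Optional


def unannotated_init_params(cls: type) -> List[str]:
    """Names of __init__ parameters (excluding self) that carry no annotation."""
    params = inspect.signature(cls.__init__).parameters
    return [
        name
        for name, param in params.items()
        if name != "self" and param.annotation is inspect.Parameter.empty
    ]


class LegacyDistribution:  # hypothetical under-annotated reference class
    """Args:
        rate (float or Tensor): rate parameter
        validate_args (bool, optional): whether to validate arguments
    """

    def __init__(self, rate, validate_args: Optional[bool] = None):
        pass


# `rate` has no hint, so its type must be taken from the docstring above.
print(unannotated_init_params(LegacyDistribution))  # ['rate']
```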
+
+### Distribution Delegation Pattern
+
+[`TensorDistribution`](/src/tensorcontainer/tensor_distribution/base.py) subclasses **must not** implement distribution-specific logic. Instead, each subclass **must** implement a `dist()` method that constructs and returns the equivalent `torch.distributions` object using the instance's parameters.
+
+**Implementation Requirement**: The `dist()` method **must** return the raw `torch.distributions` instance, not one wrapped in another distribution (e.g., `Independent`).
+
+**Design Principle**: [`TensorDistribution`](/src/tensorcontainer/tensor_distribution/base.py) serves as a parameter management wrapper around `torch.distributions`, delegating all distribution operations to the underlying implementation via `self.dist()` calls.
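A stdlib-only sketch of the delegation pattern, assuming scalar floats can stand in for tensors. `ToyNormal` plays the role of `torch.distributions.Normal`; `ToyTensorDistribution` and `ToyTensorNormal` are hypothetical simplifications of the base class and a subclass. The wrapper only stores parameters, `dist()` returns the raw distribution, and every operation goes through `self.dist()`.

```python
import math
from typing import Optional


class ToyNormal:
    """Hypothetical stand-in for torch.distributions.Normal (scalars only)."""

    def __init__(self, loc: float, scale: float, validate_args: Optional[bool] = None):
        # Validate unless validation is explicitly disabled.
        if validate_args is not False and scale <= 0:
            raise ValueError("scale must be positive")
        self.loc = loc
        self.scale = scale

    @property
    def mean(self) -> float:
        return self.loc

    def log_prob(self, value: float) -> float:
        z = (value - self.loc) / self.scale
        return -0.5 * z * z - math.log(self.scale) - 0.5 * math.log(2 * math.pi)


class ToyTensorDistribution:
    """Hypothetical base class: holds parameters and delegates to dist()."""

    def __init__(self, validate_args: Optional[bool] = None):
        self._validate_args = validate_args
        self.dist()  # build once so the wrapped class validates the parameters

    def dist(self):
        raise NotImplementedError

    @property
    def mean(self):
        return self.dist().mean

    def log_prob(self, value):
        return self.dist().log_prob(value)


class ToyTensorNormal(ToyTensorDistribution):
    def __init__(self, loc: float, scale: float, validate_args: Optional[bool] = None):
        self._loc = loc  # set parameters before the base class calls dist()
        self._scale = scale
        super().__init__(validate_args)

    def dist(self) -> ToyNormal:
        # Return the raw distribution; no Independent-style wrapping.
        return ToyNormal(self._loc, self._scale, validate_args=self._validate_args)


d = ToyTensorNormal(0.0, 1.0)
print(d.mean)           # 0.0
print(d.log_prob(0.0))  # equals -0.5 * log(2 * pi)
```

Building the distribution once in `__init__` mirrors the Validation Strategy below: invalid parameters fail at construction unless validation is disabled.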
+
+### Parameter Broadcasting Requirements
+
+Many `torch.distributions` constructors accept parameters of type `Union[Number, Tensor]` or any specialization of `Number` (e.g. `float`). However, [`TensorContainer`](/src/tensorcontainer/tensor_container.py) and [`TensorDistribution`](/src/tensorcontainer/tensor_distribution/base.py) can only process `Union[Tensor, TensorContainer]` objects and require all parameters to have compatible shapes for broadcasting.
+
+**Implementation Rule**: When the constructor signature contains parameters typed as `Union[Number, Tensor]` or as a specialization of `Number`, implementations **must** use `torch.distributions.utils.broadcast_all` to:
+1. Convert scalar numbers to tensors
+2. Broadcast all parameters to a common shape
+
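+This preprocessing ensures that every stored parameter is a tensor and that all parameters share one shape, so the shape and device managed by the [`TensorAnnotated`](/src/tensorcontainer/tensor_annotated.py) framework can be read from any single parameter.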
+
+**Decision Criterion**: If the constructor signature contains neither `Union[Number, Tensor]` nor `Number`-typed parameters, store the tensor parameters directly instead, as in the Annotated Attribute Pattern example below.
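Implementations call `broadcast_all` itself; the stdlib-only sketch below reproduces just the shape rule it applies (the same rule as `torch.broadcast_shapes`), so the resulting batch shape can be predicted. `broadcast_shapes` here is a hypothetical re-implementation, not a project function. It also shows why the `beta.py` change can use `shape = self._concentration1.shape` unconditionally: two scalar concentrations broadcast to the empty shape `()`.

```python
from typing import Tuple


def broadcast_shapes(*shapes: Tuple[int, ...]) -> Tuple[int, ...]:
    """Common shape under NumPy/PyTorch broadcasting rules.

    Shapes are aligned on the right; in each position the sizes must be equal
    or 1, and a missing dimension counts as 1. A Python scalar has shape ().
    """
    ndim = max((len(shape) for shape in shapes), default=0)
    result = []
    for i in range(1, ndim + 1):  # walk dimensions from the rightmost one
        sizes = {shape[-i] for shape in shapes if len(shape) >= i} - {1}
        if len(sizes) > 1:
            raise ValueError(f"shapes {shapes} cannot be broadcast together")
        result.append(sizes.pop() if sizes else 1)
    return tuple(reversed(result))


print(broadcast_shapes((), ()))        # ()  two scalar parameters
print(broadcast_shapes((), (3,)))      # (3,)  scalar and vector
print(broadcast_shapes((2, 1), (3,)))  # (2, 3)
```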
+
+### Validation Strategy
+
+[`TensorDistribution`](/src/tensorcontainer/tensor_distribution/base.py) accepts a `validate_args` parameter during initialization and stores it as the `_validate_args` attribute of the base class. Subclasses must pass this value to the underlying `torch.distributions` object (if the constructor supports it).
+
+**Validation Policy**: Parameter validation in [`TensorDistribution`](/src/tensorcontainer/tensor_distribution/base.py) subclasses is generally unnecessary: `TensorDistribution.__init__` constructs the underlying distribution once via `self.dist()`, which runs the `torch.distributions` argument checks (unless validation is disabled, e.g. with `validate_args=False`).
+
+**Exception Handling**: Implementations should raise validation errors only when the parameters needed to infer shape and device are missing or invalid (for example, when neither or both of `probs` and `logits` are given).
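A minimal sketch of the exactly-one-of check used in the `continuous_bernoulli.py` change. `select_parameter` is a hypothetical helper; the actual constructor inlines the same comparison.

```python
from typing import Any, Optional, Tuple


def select_parameter(
    probs: Optional[Any] = None, logits: Optional[Any] = None
) -> Tuple[str, Any]:
    """Return which parameter was given and its value; exactly one is required.

    `(probs is None) == (logits is None)` is True both when neither is given
    and when both are given, so one comparison covers both invalid cases.
    """
    if (probs is None) == (logits is None):
        raise ValueError("Either `probs` or `logits` must be specified, but not both.")
    return ("probs", probs) if probs is not None else ("logits", logits)


print(select_parameter(probs=0.3))  # ('probs', 0.3)
for bad in ({}, {"probs": 0.3, "logits": 0.1}):
    try:
        select_parameter(**bad)
    except ValueError as err:
        print("rejected:", err)
```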
+
+### Property Implementation Pattern
+
+Following the `torch.distributions.Distribution` pattern, basic distribution properties are provided through the [`TensorDistribution`](/src/tensorcontainer/tensor_distribution/base.py) base class via delegation to `self.dist()`.
+
+**Specialization Rule**: Distribution-specific properties **must** be implemented only in the corresponding subclass, maintaining the same delegation pattern to the underlying `torch.distributions` object.
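A sketch of the split between shared and distribution-specific properties. The toy classes are hypothetical stand-ins for `TensorDistribution`, a Bernoulli-style subclass, and `torch.distributions.Bernoulli`; treating `mean` as shared and `probs` as Bernoulli-specific is illustrative. Both properties delegate to `self.dist()`, so `probs` is correct even when only `logits` was given.

```python
import math
from typing import Optional


class ToyBernoulli:
    """Hypothetical stand-in for torch.distributions.Bernoulli (scalars only)."""

    def __init__(self, probs: Optional[float] = None, logits: Optional[float] = None):
        if (probs is None) == (logits is None):
            raise ValueError("Either `probs` or `logits` must be specified, but not both.")
        # Derive probs from logits with the logistic sigmoid when needed.
        self.probs = probs if probs is not None else 1.0 / (1.0 + math.exp(-logits))

    @property
    def mean(self) -> float:
        return self.probs


class ToyBase:
    """Hypothetical base class: properties shared by all distributions."""

    def dist(self):
        raise NotImplementedError

    @property
    def mean(self):
        return self.dist().mean


class ToyTensorBernoulli(ToyBase):
    def __init__(self, probs: Optional[float] = None, logits: Optional[float] = None):
        self._probs = probs
        self._logits = logits

    def dist(self) -> ToyBernoulli:
        return ToyBernoulli(probs=self._probs, logits=self._logits)

    @property
    def probs(self) -> float:
        # Bernoulli-specific, so it lives on the subclass; the value comes from
        # the wrapped distribution instead of being recomputed here.
        return self.dist().probs


d = ToyTensorBernoulli(logits=0.0)
print(d.probs, d.mean)  # 0.5 0.5
```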
+
+## Implementation Patterns
+
+### Annotated Attribute Pattern
+
+All tensor parameters must be declared as annotated class attributes to enable automatic transformation by [`TensorAnnotated`](/src/tensorcontainer/tensor_annotated.py) operations (e.g., `.to()`, `.expand()`).
+
+**Example Pattern**:
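+```python
+from typing import Optional
+
+from torch import Tensor
+from torch.distributions import Distribution, Normal
+
+from tensorcontainer.tensor_distribution.base import TensorDistribution
+
+
+class TensorNormal(TensorDistribution):
+    # Annotated attributes are picked up and transformed by TensorAnnotated.
+    _loc: Tensor
+    _scale: Tensor
+
+    def __init__(self, loc: Tensor, scale: Tensor, validate_args: Optional[bool] = None):
+        self._loc = loc
+        self._scale = scale
+        super().__init__(shape=loc.shape, device=loc.device, validate_args=validate_args)
+
+    def dist(self) -> Distribution:
+        return Normal(self._loc, self._scale, validate_args=self._validate_args)
+```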
+
+Note: If the corresponding `torch.distributions` signature allows scalar values for parameters like `loc` and `scale`, apply the rules described in the "Parameter Broadcasting Requirements" section before assignment so that the stored attributes are tensors.
+
+### Lazy Distribution Creation
+
+The actual `torch.distributions.Distribution` instance is not stored; the `dist()` method creates it on demand from the current parameters. Tensor operations therefore only transform the parameter tensors, and no distribution object has to be rebuilt or kept in sync after each operation.
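A minimal sketch of the idea, with hypothetical toy classes and Python lists standing in for tensors: the wrapper holds only its parameter, `expand` returns a new wrapper with a transformed parameter, and `dist()` builds a fresh distribution object from whatever the parameters currently are.

```python
from typing import List


class ToyDistribution:
    """Hypothetical stand-in for a torch.distributions object."""

    def __init__(self, loc: List[float]):
        self.loc = loc


class ToyLazyWrapper:
    """Keeps only the parameters; no distribution object is stored."""

    def __init__(self, loc: List[float]):
        self._loc = loc

    def dist(self) -> ToyDistribution:
        # Built from the parameters as they are now.
        return ToyDistribution(self._loc)

    def expand(self, size: int) -> "ToyLazyWrapper":
        # Like a tensor operation: transform the stored parameter and return a
        # new wrapper. This sketch only handles single-element parameters.
        if len(self._loc) != 1:
            raise ValueError("this sketch only expands single-element parameters")
        return ToyLazyWrapper(self._loc * size)


wrapper = ToyLazyWrapper([0.5])
expanded = wrapper.expand(3)
print(expanded.dist().loc)  # [0.5, 0.5, 0.5]
print(wrapper.dist().loc)   # [0.5]
```

Had the wrapper cached a distribution object at construction, every such operation would also have to rebuild or transform that cache; keeping only the parameters avoids the bookkeeping.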
+
+### Reconstruction Pattern
+
+The `_unflatten_distribution()` class method reconstructs distribution instances from serialized tensor and metadata attributes. This method is called by `_init_from_reconstructed()` during operations like `.to()` and `.expand()`.
+
+**Customization Requirement**: Subclasses with complex parameter relationships (for example, mutually exclusive `probs` and `logits`) **must** override `_unflatten_distribution()` to implement appropriate reconstruction logic.
+
+**Example Implementation**:
+```python
+@classmethod
+def _unflatten_distribution(cls, attributes: Dict[str, Any]) -> "TensorCategorical":
+    """For TensorCategorical, extract _probs and _logits from attributes."""
+    return cls(
+        probs=attributes.get("_probs"),
+        logits=attributes.get("_logits"),
+        validate_args=attributes.get("_validate_args"),
+    )
+```
@@ -1,11 +1,10 @@
 from __future__ import annotations
 
-from typing import Any, Dict, Optional, get_args
+from typing import Any, Dict, Optional
 
 from torch import Tensor
 from torch.distributions import Beta
 from torch.distributions.utils import broadcast_all
-from torch.types import Number
 
 from .base import TensorDistribution
 
@@ -37,13 +36,7 @@ def __init__(
         self._concentration1, self._concentration0 = broadcast_all(
             concentration1, concentration0
         )
-
-        if isinstance(concentration1, get_args(Number)) and isinstance(
-            concentration0, get_args(Number)
-        ):
-            shape = tuple()
-        else:
-            shape = self._concentration1.shape
+        shape = self._concentration1.shape
 
         device = self._concentration1.device
 
 
@@ -1,8 +1,9 @@
-from typing import Optional, Tuple, Union, get_args
+from typing import Any, Dict, Optional, Tuple, Union
 
 import torch
 from torch import Tensor
 from torch.distributions import ContinuousBernoulli as TorchContinuousBernoulli
+from torch.distributions.utils import broadcast_all
 from torch.types import Number
 
 from .base import TensorDistribution
@@ -21,34 +22,21 @@ def __init__(
         validate_args: Optional[bool] = None,
     ) -> None:
         self._lims = lims
-
-        if probs is not None and logits is not None:
+        if (probs is None) == (logits is None):
             raise ValueError(
                 "Either `probs` or `logits` must be specified, but not both."
             )
-        elif probs is None and logits is None:
-            raise ValueError("Either `probs` or `logits` must be specified.")
-
-        if probs is not None and isinstance(probs, get_args(Number)):
-            self._probs = torch.tensor(probs)
-        else:
-            self._probs = probs
 
-        if logits is not None and isinstance(logits, get_args(Number)):
-            self._logits = torch.tensor(logits)
+        if probs is not None:
+            (self._probs,) = broadcast_all(probs)
+            self._logits = None
         else:
-            self._logits = logits
+            (self._logits,) = broadcast_all(logits)
+            self._probs = None
 
-        if self._probs is not None:
-            batch_shape = self._probs.shape
-            device = self._probs.device
-        elif self._logits is not None:
-            batch_shape = self._logits.shape
-            device = self._logits.device
-        else:
-            # This case should ideally not be reached due to the checks above,
-            # but as a fallback for type inference or future changes.
-            raise ValueError("Either `probs` or `logits` must be specified.")
+        data = self._probs if self._probs is not None else self._logits
+        batch_shape = data.shape  # type: ignore
+        device = data.device  # type: ignore
 
         super().__init__(shape=batch_shape, device=device, validate_args=validate_args)
 
@@ -63,7 +51,7 @@ def dist(self) -> TorchContinuousBernoulli:
     @classmethod
     def _unflatten_distribution(
         cls,
-        attributes: dict,
+        attributes: Dict[str, Any],
     ) -> "TensorContinuousBernoulli":
         return cls(
             probs=attributes.get("_probs"),