bug: Module finder is obsolete if target is created right before load, causing a broken export #410

@bswck

Description of the bug

To Reproduce

While looking for a minimal reproducer for #407, I found another bug.
I imagine this code path could be hit in a sophisticated file-generating system.

Two loaders, one instantiated before the target directory is created and one after, report different results:

import os
import pprint
from contextlib import suppress

from griffe import GriffeLoader

with suppress(FileNotFoundError):
    os.rmdir("ns")

l1 = GriffeLoader()
os.makedirs("ns", exist_ok=False)
l2 = GriffeLoader()

l1_result = l1.load("ns")
l2_result = l2.load("ns")

print("Loader 1 (loader instantiated before creating ns/):")
pprint.pprint(l1_result.as_dict())

print("Loader 2 (loader instantiated after creating ns/):")
pprint.pprint(l2_result.as_dict())

The two results differ, so the outcome depends on when the loader was instantiated:

Loader 1 (loader instantiated before creating ns/):
{'analysis': 'dynamic',
 'filepath': '/home/bswck/Pawamoy/griffe/ns',
 'git_info': {'commit_hash': '0709f0d15411b1c1707fc43f10def8c38b3eba93',
              'remote_url': 'https://github.com/mkdocstrings/griffe',
              'repository': PosixPath('/home/bswck/Pawamoy/griffe'),
              'service': 'github'},
 'kind': <Kind.MODULE: 'module'>,
 'members': {'__doc__': {'analysis': 'dynamic',
                         'docstring': <griffe._internal.models.Docstring object at 0x7f0da260a3c0>,
                         'kind': <Kind.ATTRIBUTE: 'attribute'>,
                         'labels': {'module-attribute'},
                         'name': '__doc__',
                         'runtime': True,
                         'value': 'None'},
             '__file__': {'analysis': 'dynamic',
                          'docstring': <griffe._internal.models.Docstring object at 0x7f0da25f25d0>,
                          'kind': <Kind.ATTRIBUTE: 'attribute'>,
                          'labels': {'module-attribute'},
                          'name': '__file__',
                          'runtime': True,
                          'value': 'None'},
             '__name__': {'analysis': 'dynamic',
                          'docstring': <griffe._internal.models.Docstring object at 0x7f0da25f2990>,
                          'kind': <Kind.ATTRIBUTE: 'attribute'>,
                          'labels': {'module-attribute'},
                          'name': '__name__',
                          'runtime': True,
                          'value': "'ns'"},
             '__package__': {'analysis': 'dynamic',
                             'docstring': <griffe._internal.models.Docstring object at 0x7f0da25b6ea0>,
                             'kind': <Kind.ATTRIBUTE: 'attribute'>,
                             'labels': {'module-attribute'},
                             'name': '__package__',
                             'runtime': True,
                             'value': "'ns'"},
             '__path__': {'analysis': 'dynamic',
                          'docstring': <griffe._internal.models.Docstring object at 0x7f0da25b7230>,
                          'kind': <Kind.ATTRIBUTE: 'attribute'>,
                          'labels': {'module-attribute'},
                          'name': '__path__',
                          'runtime': True,
                          'value': "_NamespacePath(['/home/bswck/Pawamoy/griffe/ns'])"}},
 'name': 'ns',
 'runtime': True}
Loader 2 (loader instantiated after creating ns/):
{'filepath': ['/home/bswck/Pawamoy/griffe/ns'],
 'git_info': {'commit_hash': '0709f0d15411b1c1707fc43f10def8c38b3eba93',
              'remote_url': 'https://github.com/mkdocstrings/griffe',
              'repository': PosixPath('/home/bswck/Pawamoy/griffe'),
              'service': 'github'},
 'kind': <Kind.MODULE: 'module'>,
 'name': 'ns',
 'runtime': True}

This happens because the first loader's finder was built before ns/ existed and became stale as soon as the directory was created.
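
To illustrate the general failure mode, here is a minimal sketch of a finder that snapshots its search path at construction. `SnapshotFinder` is a hypothetical stand-in, and whether griffe's `ModuleFinder` caches in exactly this way is an assumption, not something confirmed above:

```python
import os
import tempfile


class SnapshotFinder:
    """Hypothetical finder that lists its search path once, at construction."""

    def __init__(self, search_path: str) -> None:
        self.search_path = search_path
        # The directory listing is captured here and never refreshed.
        self._contents = set(os.listdir(search_path))

    def find(self, name: str) -> bool:
        """Report whether `name` was present when the snapshot was taken."""
        return name in self._contents


with tempfile.TemporaryDirectory() as root:
    early = SnapshotFinder(root)            # built before ns/ exists
    os.makedirs(os.path.join(root, "ns"))
    late = SnapshotFinder(root)             # built after ns/ exists

    early_found = early.find("ns")
    late_found = late.find("ns")

print(early_found, late_found)  # False True
```

The two finders look at the same directory but disagree, which mirrors the two loaders in the reproducer.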

The stale finder, in turn, causes the first loader to produce a seemingly broken state on export:

Object.from_json(l1_result.as_json())
Traceback (most recent call last):
  File "/home/bswck/Pawamoy/griffe/t.py", line 24, in <module>
    Object.from_json(l1_result.as_json())
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/home/bswck/Pawamoy/griffe/src/griffe/_internal/mixins.py", line 247, in from_json
    obj = json.loads(json_string, **kwargs)
  File "/usr/lib64/python3.13/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ~~~~~~~~~~~~~~~~^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 361, in raw_decode
    obj, end = self.scan_once(s, idx)
               ~~~~~~~~~~~~~~^^^^^^^^
  File "/home/bswck/Pawamoy/griffe/src/griffe/_internal/encoders.py", line 367, in json_decoder
    return _loader_map[kind](obj_dict)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/home/bswck/Pawamoy/griffe/src/griffe/_internal/encoders.py", line 279, in _load_attribute
    lineno=obj_dict["lineno"],
           ~~~~~~~~^^^^^^^^^^
KeyError: 'lineno'

In contrast, Object.from_json() on the second loader's result reproduces #407 as expected:

Traceback (most recent call last):
  File "/home/bswck/Pawamoy/griffe/t.py", line 24, in <module>
    Object.from_json(l2_result.as_json())
    ~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^
  File "/home/bswck/Pawamoy/griffe/src/griffe/_internal/mixins.py", line 247, in from_json
    obj = json.loads(json_string, **kwargs)
  File "/usr/lib64/python3.13/json/__init__.py", line 359, in loads
    return cls(**kw).decode(s)
           ~~~~~~~~~~~~~~~~^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 345, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
               ~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/json/decoder.py", line 361, in raw_decode
    obj, end = self.scan_once(s, idx)
               ~~~~~~~~~~~~~~^^^^^^^^
  File "/home/bswck/Pawamoy/griffe/src/griffe/_internal/encoders.py", line 367, in json_decoder
    return _loader_map[kind](obj_dict)
           ~~~~~~~~~~~~~~~~~^^^^^^^^^^
  File "/home/bswck/Pawamoy/griffe/src/griffe/_internal/encoders.py", line 197, in _load_module
    filepath=Path(obj_dict["filepath"]) if "filepath" in obj_dict else None,
             ~~~~^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/lib64/python3.13/pathlib/_local.py", line 503, in __init__
    super().__init__(*args)
    ~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/lib64/python3.13/pathlib/_local.py", line 132, in __init__
    raise TypeError(
    ...<2 lines>...
        f"not {type(path).__name__!r}")
TypeError: argument should be a str or an os.PathLike object where __fspath__ returns a str, not 'list'
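
The traceback above comes down to a list being passed to `Path()`: the namespace package's `filepath` is serialized as a list, and the decoder treats it as a single path. A minimal sketch of that failure, together with a hypothetical `decode_filepath` helper that accepts both shapes (an illustration only, not griffe's actual decoder or the fix for #407):

```python
from pathlib import Path


def decode_filepath(value):
    """Hypothetical helper: accept a single path, a list of paths, or None."""
    if value is None:
        return None
    if isinstance(value, list):
        # Namespace packages can span several directories.
        return [Path(item) for item in value]
    return Path(value)


# Passing a list straight to Path() fails, as in the traceback.
try:
    Path(["/tmp/ns"])
except TypeError as error:
    failure = type(error).__name__
else:
    failure = None

print(failure)
print(decode_filepath(["/tmp/ns"]))
```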

I applied a thread-unsafe workaround patch that defers creating the finder until first load:

diff --git a/src/griffe/_internal/loader.py b/src/griffe/_internal/loader.py
index 15ea774..cfe2be8 100644
--- a/src/griffe/_internal/loader.py
+++ b/src/griffe/_internal/loader.py
@@ -9,6 +9,7 @@ import sys
 import tempfile
 from contextlib import suppress
 from datetime import datetime, timezone
+from functools import cached_property
 from importlib.util import find_spec
 from pathlib import Path
 from typing import TYPE_CHECKING, ClassVar, cast
@@ -90,13 +91,17 @@ class GriffeLoader:
         """Whether to force inspecting (importing) modules, even when sources were found."""
         self.store_source: bool = store_source
         """Whether to store source code in the lines collection."""
-        self.finder: ModuleFinder = ModuleFinder(search_paths)
-        """The module source finder."""
+        self._search_paths: Sequence[str | Path] | None = search_paths
         self._time_stats: dict = {
             "time_spent_visiting": 0,
             "time_spent_inspecting": 0,
         }
 
+    @cached_property
+    def finder(self) -> ModuleFinder:
+        """The module source finder."""
+        return ModuleFinder(search_paths=self._search_paths)
+
     def load(
         self,
         objspec: str | Path | None = None,

With the patch applied, both loaders report identical results:

Loader 1 (loader instantiated before creating ns/):
{'filepath': ['/home/bswck/Pawamoy/griffe/ns'],
 'git_info': {'commit_hash': '0709f0d15411b1c1707fc43f10def8c38b3eba93',
              'remote_url': 'https://github.com/mkdocstrings/griffe',
              'repository': PosixPath('/home/bswck/Pawamoy/griffe'),
              'service': 'github'},
 'kind': <Kind.MODULE: 'module'>,
 'name': 'ns',
 'runtime': True}
Loader 2 (loader instantiated after creating ns/):
{'filepath': ['/home/bswck/Pawamoy/griffe/ns'],
 'git_info': {'commit_hash': '0709f0d15411b1c1707fc43f10def8c38b3eba93',
              'remote_url': 'https://github.com/mkdocstrings/griffe',
              'repository': PosixPath('/home/bswck/Pawamoy/griffe'),
              'service': 'github'},
 'kind': <Kind.MODULE: 'module'>,
 'name': 'ns',
 'runtime': True}

This works around most cases of the race, but it will still fail if the directory is created after the finder has been lazily instantiated by the first load() call. I'll look into the first loader's output and see if we can fix it properly, since the timing of the finder's creation shouldn't matter in the first place.

For now, we might just merge that patch to prevent 99% of problematic cases... unless load() can be called from multiple threads. Can it? @pawamoy

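functools.cached_property no longer guarantees that the getter runs only once under concurrent access: its internal lock was removed in Python 3.12 because it was held per property rather than per instance, which caused lock contention.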
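
If load() can be called from multiple threads, one option is to guard the lazy initialization with a per-instance lock instead of relying on cached_property. Below is a minimal sketch using double-checked locking. `Finder` and `LazyLoader` are hypothetical stand-ins rather than griffe classes, and the sketch only covers the thread-safety question, not the remaining race where the directory is created after the finder already exists:

```python
import threading
from typing import Optional


class Finder:
    """Hypothetical stand-in for a finder that is costly to build."""

    instances = 0
    _count_lock = threading.Lock()

    def __init__(self) -> None:
        # Count constructions so the example can check there was only one.
        with Finder._count_lock:
            Finder.instances += 1


class LazyLoader:
    """Hypothetical loader that builds its finder once, on first access."""

    def __init__(self) -> None:
        self._finder: Optional[Finder] = None
        self._finder_lock = threading.Lock()  # one lock per loader instance

    @property
    def finder(self) -> Finder:
        if self._finder is None:              # fast path, no lock taken
            with self._finder_lock:
                if self._finder is None:      # re-check while holding the lock
                    self._finder = Finder()
        return self._finder


loader = LazyLoader()
threads = [threading.Thread(target=lambda: loader.finder) for _ in range(8)]
for thread in threads:
    thread.start()
for thread in threads:
    thread.join()

print(Finder.instances)
```

Because the lock lives on the instance, separate loaders do not contend with each other, which avoids the problem that led to the lock being dropped from cached_property.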

Environment information

- __System__: Linux-6.13.10-200.fc41.x86_64-x86_64-with-glibc2.41
- __Python__: cpython 3.13.7 (/home/bswck/Pawamoy/griffe/.venv/bin/python3)
- __Environment variables__:
- __Installed packages__:
  - `griffe` v1.14.1.dev21+g0709f0d
