Skip to content

Conversation

@snejus
Copy link
Member

@snejus snejus commented Dec 24, 2025

Replace python-musicbrainzngs with Custom Lightweight MusicBrainz Client

Core Problem Solved

Before: Beets depended on the external python-musicbrainzngs library (v0.7.1) for all MusicBrainz API interactions. This dependency required separate installation for multiple plugins and introduced an abstraction layer that obscured direct HTTP semantics.

After: Custom lightweight MusicBrainz client (beetsplug.utils.musicbrainz) built directly on requests and requests-ratelimiter, eliminating the external dependency while maintaining full API compatibility.


Architecture Overview

1. New MusicBrainz Client Foundation

Created beetsplug/utils/musicbrainz.py with three core components:

MusicBrainzAPI (base client)
├─ Configuration-driven initialization (from config['musicbrainz'])
├─ Rate-limited session via LimiterTimeoutSession
├─ Generic entity fetching (get_entity, search_entity)
├─ Specialized methods (get_release, get_recording, get_work)
└─ Recursive relation grouping (group_relations)

MusicBrainzUserAPI (authenticated operations)
├─ Extends MusicBrainzAPI with HTTPDigestAuth
├─ Collection management (get_collections)
└─ User-specific API operations

MBCollection (collection manipulation)
├─ Paginated release fetching
├─ Chunked PUT operations (add_releases)
└─ Chunked DELETE operations (remove_releases)

2. Config parsing now centralized in MusicBrainzAPI.__post_init__().

And applies to all plugins.

3. Mixin Pattern for Plugin Integration

# OLD: Plugins directly instantiated musicbrainzngs
import musicbrainzngs
musicbrainzngs.auth(user, pass)
resp = musicbrainzngs.get_recording_by_id(id, includes=['releases'])

# NEW: Plugins inherit MusicBrainzAPIMixin
class MyPlugin(MusicBrainzAPIMixin, BeetsPlugin):
    def some_method(self):
        recording = self.mbapi.get_recording(id, includes=['releases'])

Affected plugins:

  • musicbrainzMusicBrainzAPIMixin
  • listenbrainzMusicBrainzAPIMixin
  • mbcollectionMusicBrainzUserAPIMixin (requires authentication)
  • missingMusicBrainzAPIMixin
  • parentworkMusicBrainzAPIMixin

Plugin-Specific Refactoring

mbcollection: From Procedural to Object-Oriented

Before: Module-level functions (submit_albums, mb_call) with error handling scattered across multiple try-except blocks.

After: MBCollection dataclass encapsulating collection operations:

# OLD approach
collection_id = get_collection()
albums_in_collection = get_albums_in_collection(collection_id)
submit_albums(collection_id, album_ids)

# NEW approach
collection = self.collection  # cached property with validation
collection.add_releases(album_ids)
collection.remove_releases(removed_ids)

Key improvements:

  • Eliminated mb_call error wrapper (errors propagate naturally from RequestHandler)
  • Consolidated pagination logic into MBCollection.releases property
  • Type hints for Library, ImportSession, ImportTask parameters

listenbrainz: Search Query Simplification

Before: Called musicbrainzngs.search_recordings() with constructed query strings.

After: Uses self.mbapi.search_entity() which handles query formatting:

# OLD
resp = musicbrainzngs.search_recordings(
    query=f'track:{track_name} AND release:{album_name}',
    strict=True
)
if resp.get('recording-count') > 1:
    return resp['recording-list'][0].get('id')

# NEW
for recording in self.mbapi.search_entity('recording', 
                                           {'track': track_name, 
                                            'release': album_name}):
    return recording['id']
return None

parentwork: Inline Traversal Logic

Before: Module-level functions (direct_parent_id, work_parent_id, find_parent_work_info) that made sequential API calls.

After: Single method find_parent_work_info that traverses work hierarchy inline:

def find_parent_work_info(self, mbworkid: str) -> tuple[dict, str | None]:
    workdate = None
    parent_id = mbworkid
    
    while parent_id := current_id:  # walrus operator
        workinfo = self.mbapi.get_work(current_id, includes=['work-rels', 'artist-rels'])
        workdate = workdate or extract_composer_date(workinfo)
        parent_id = find_parent_in_relations(workinfo)
    
    return workinfo, workdate

Eliminates three function calls per traversal level.


Dependency & Installation Impact

Package Dependencies

# pyproject.toml
-musicbrainzngs = {version = ">=0.4", optional = true}

# Removed from extras groups
-listenbrainz = ["musicbrainzngs"]
-mbcollection = ["musicbrainzngs"]
-missing = ["musicbrainzngs"]
-parentwork = ["musicbrainzngs"]

Result: Four plugin extras no longer require external package installation beyond requests (already a core dependency).

CI Workflow

# .github/workflows/ci.yaml
-poetry install --extras=parentwork
+poetry install  # parentwork now part of core

The parentwork extra is removed entirely since it had no other dependencies.

Documentation Updates

Removed "Installation" sections from four plugin docs that previously required:

pip install beets[listenbrainz]  # NO LONGER NEEDED
pip install beets[mbcollection]  # NO LONGER NEEDED
pip install beets[missing]       # NO LONGER NEEDED
pip install beets[parentwork]    # NO LONGER NEEDED

Plugins now work out-of-the-box with just plugins: [listenbrainz, ...] in config.


Testing Improvements

New Shared Fixture: requests_mock

Created test/plugins/conftest.py with fixture that disables rate limiting during tests:

@pytest.fixture
def requests_mock(requests_mock, monkeypatch):
    """Use plain session wherever MB requests are mocked."""
    monkeypatch.setattr(
        'beetsplug.utils.musicbrainz.MusicBrainzAPI.session',
        requests.Session,
    )
    return requests_mock

This avoids artificial delays when mocking HTTP responses.

New Plugin Test Suites

Added comprehensive tests for previously untested plugins:

  1. test_listenbrainz.py: Tests recording ID lookup and track info fetching
  2. test_mbcollection.py: Tests collection validation, pagination, and sync operations
  3. test_missing.py: Tests missing album detection logic
  4. test/plugins/utils/test_musicbrainz.py: Tests group_relations transformation

Test Migration

Moved test_group_relations from test_musicbrainz.py to test/plugins/utils/test_musicbrainz.py (84 lines) since group_relations is now a utility function.


Migration Benefits

Aspect Before After
External dependencies python-musicbrainzngs (0.7.1) None (uses existing requests)
Plugin installation pip install beets[plugin] required Works with base install
API surface area ~50 functions in musicbrainzngs ~10 methods tailored to Beets
Error messages Generic exceptions with status codes Full HTTP response text included
Response structure Raw MusicBrainz JSON Normalized with grouped relations
Code ownership External maintenance dependency Direct control over API client
Test speed Rate-limited even with mocks Fixture disables limits for mocked requests
Type safety Minimal type hints in musicbrainzngs Full type hints (JSONDict, list[str])

Backward Compatibility

✅ Fully backward compatible:

  • All existing plugin APIs unchanged from user perspective
  • Configuration keys remain identical (musicbrainz.user, musicbrainz.pass, etc.)
  • MusicBrainz API responses maintain same structure (with additional normalization)
  • Test suite passes without modification to integration tests

Breaking changes: None from end-user perspective.

closes #6265

@snejus snejus requested a review from a team as a code owner December 24, 2025 04:09
@snejus snejus requested review from JOJ0 and semohr December 24, 2025 04:09
Copy link
Contributor

@sourcery-ai sourcery-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hey - I've found 4 issues, and left some high level feedback:

  • In ListenBrainzPlugin.get_mb_recording_id the return type is annotated as int | None but the method returns the string recording['id']; update the annotation (or cast) so the declared and actual types match.
  • The use of an empty-string key in MusicBrainzAPI.search_entity (e.g., in ListenBrainzPlugin.get_mb_recording_id) is a bit obscure; consider adding an explicit parameter or helper for the "bare" query term instead of encoding this with a special key value.
Prompt for AI Agents
Please address the comments from this code review:

## Overall Comments
- In `ListenBrainzPlugin.get_mb_recording_id` the return type is annotated as `int | None` but the method returns the string `recording['id']`; update the annotation (or cast) so the declared and actual types match.
- The use of an empty-string key in `MusicBrainzAPI.search_entity` (e.g., in `ListenBrainzPlugin.get_mb_recording_id`) is a bit obscure; consider adding an explicit parameter or helper for the "bare" query term instead of encoding this with a special key value.

## Individual Comments

### Comment 1
<location> `beetsplug/listenbrainz.py:133-134` </location>
<code_context>
         return tracks

-    def get_mb_recording_id(self, track):
+    def get_mb_recording_id(self, track) -> int | None:
         """Returns the MusicBrainz recording ID for a track."""
-        resp = musicbrainzngs.search_recordings(
</code_context>

<issue_to_address>
**suggestion (bug_risk):** The return type of `get_mb_recording_id` should be `str | None`, not `int | None`.

This function returns MusicBrainz recording MBIDs, which are UUID strings. `self.mb_api.search_entity(...)` produces dicts where `"id"` is a string and you return `recording["id"]` directly, so annotating the return type as `int | None` is incorrect and may confuse callers or type checkers. Please update the signature to `-> str | None` to match the actual return type.

```suggestion
    def get_mb_recording_id(self, track) -> str | None:
        """Returns the MusicBrainz recording ID for a track."""
```
</issue_to_address>

### Comment 2
<location> `beetsplug/_utils/requests.py:159-160` </location>
<code_context>
                 raise beets_error(response=e.response) from e
-            raise
+
+            # re-raise the same exception with the response text
+            raise e.__class__(e.response.text) from e

     def request(self, *args, **kwargs) -> requests.Response:
</code_context>

<issue_to_address>
**issue (bug_risk):** Reconstructing the HTTPError drops the original `response` object, which may break callers relying on `e.response`.

The previous implementation re-raised the original `HTTPError`, keeping its `response` attribute. The new `e.__class__(e.response.text)` call creates a new exception where `response` will usually be `None`, so existing handlers that inspect `e.response` may break. You can still add the response text without replacing the instance, e.g.:

```python
e.args = (*e.args, e.response.text)
raise
```

or by attaching the text another way, while re-raising the original exception to preserve behavior.
</issue_to_address>

### Comment 3
<location> `test/plugins/utils/test_musicbrainz.py:4-13` </location>
<code_context>
         assert candidates[0].album == "hi"
-
-
-def test_group_relations():
-    raw_release = {
-        "id": "r1",
</code_context>

<issue_to_address>
**suggestion (testing):** Extend tests for the new MusicBrainz client to cover query construction and browse helpers, not just relation grouping.

Right now we only verify `_group_relations`. The new `MusicBrainzAPI` also packs in logic that’s currently untested:
- `search_entity` query building (empty-key handling, trimming, lowercasing).
- `get_entity` `includes`/`inc` handling in `params`.
- `browse_*` helpers (e.g. `browse_release_groups`) returning the correct slice from the response.
A few `requests_mock`-based tests (like the plugin tests) that assert `search_entity('recording', {...})` hits `/ws/2/recording` with the expected query string, and that `browse_release_groups(artist='X')` returns `response['release-groups']`, would cover these shared behaviours and better exercise the new client.

Suggested implementation:

```python
from beetsplug._utils.musicbrainz import MusicBrainzAPI
from requests_mock import ANY as RM_ANY


def test_group_relations():
    api = MusicBrainzAPI()

    raw_release = {
        "id": "r1",
        "relations": [
            {"target-type": "artist", "type": "vocal", "name": "A"},
            {"target-type": "artist", "type": "instrument", "name": "B"},
            {"target-type": "url", "type": "streaming", "url": "http://s"},
            {"target-type": "url", "type": "purchase", "url": "http://p"},
            {
                "target-type": "work",
                "type": "performance",
                "work": {
                    "title": "Work title",
                    "relations": [
                        {
                            "target-type": "artist",
                            "type": "composer",
                            "name": "Composer A",
                        },
                        {
                            "target-type": "artist",
                            "type": "lyricist",
                            "name": "Lyricist B",
                        },
                    ],
                },
            },
        ],
    }

    grouped = MusicBrainzAPI._group_relations(raw_release["relations"])

    # Base-level artist relations.
    assert "artist" in grouped
    artist_names = {rel["name"] for rel in grouped["artist"]}
    assert artist_names == {"A", "B"}

    # URL relations grouped by type.
    assert "url" in grouped
    url_types = {rel["type"] for rel in grouped["url"]}
    assert url_types == {"streaming", "purchase"}

    # Nested work relations should be flattened into the same structure.
    assert "work" in grouped
    work_relations = grouped["work"]
    assert any(rel.get("work", {}).get("title") == "Work title" for rel in work_relations)


def test_search_entity_builds_query_and_hits_expected_endpoint(requests_mock):
    api = MusicBrainzAPI()

    # Mock any GET to the recording endpoint.
    requests_mock.get(
        "https://musicbrainz.org/ws/2/recording",
        json={"recordings": []},
    )

    # Mixed/whitespace keys and an empty key that should be ignored.
    query = {
        "Artist": "  Brian Eno  ",
        "title": "  AN Ending (Ascent) ",
        "": "should be ignored",
    }

    api.search_entity("recording", query)

    last = requests_mock.last_request
    # Endpoint path.
    assert last.path.startswith("/ws/2/recording")
    # Query parameters.
    params = last.qs
    # We always expect a single `query` parameter for the search expression.
    assert "query" in params
    [query_value] = params["query"]

    # Empty-key entry should not appear, keys/values should be normalized.
    assert "should be ignored" not in query_value
    assert "artist:brian eno" in query_value
    assert "title:an ending (ascent)" in query_value
    # No uppercase keys should remain in the built query.
    assert query_value == query_value.lower()


def test_get_entity_includes_are_encoded_into_inc_param(requests_mock):
    api = MusicBrainzAPI()

    mbid = "00000000-0000-0000-0000-000000000001"
    includes = ["artists", "aliases", "recordings"]

    requests_mock.get(
        f"https://musicbrainz.org/ws/2/release-group/{mbid}",
        json={"id": mbid},
    )

    api.get_entity("release-group", mbid, includes=includes)

    last = requests_mock.last_request
    assert last.path == f"/ws/2/release-group/{mbid}"

    params = last.qs
    # MusicBrainz uses `inc` for include parameters, separated by `+`.
    assert "inc" in params
    [inc_value] = params["inc"]
    for part in includes:
        assert part in inc_value
    assert "+" in inc_value or len(includes) == 1


def test_browse_release_groups_returns_release_groups_list(requests_mock):
    api = MusicBrainzAPI()

    artist_mbid = "11111111-1111-1111-1111-111111111111"
    payload = {
        "release-groups": [
            {"id": "rg1", "title": "First RG"},
            {"id": "rg2", "title": "Second RG"},
        ],
        "release-group-count": 2,
    }

    requests_mock.get(
        "https://musicbrainz.org/ws/2/release-group",
        json=payload,
    )

    result = api.browse_release_groups(artist=artist_mbid)

    # Helper should return just the `release-groups` slice from the payload.
    assert result == payload["release-groups"]

    last = requests_mock.last_request
    assert last.path == "/ws/2/release-group"
    params = last.qs
    # Browse helper should pass through the browse parameter.
    assert params.get("artist") == [artist_mbid]

```

These tests assume:
1. `MusicBrainzAPI` builds URLs under `https://musicbrainz.org/ws/2` and uses `/ws/2/<entity>` and `/ws/2/<entity>/<mbid>` paths.
2. `search_entity(entity, query_dict, ...)` performs an HTTP GET on `/ws/2/<entity>` with a `query` parameter built by:
   - Dropping empty-string keys,
   - Lowercasing keys and values,
   - Trimming whitespace,
   - Joining as `key:value` pairs in a single query string.
3. `get_entity(entity, mbid, includes=...)` performs an HTTP GET on `/ws/2/<entity>/<mbid>` and encodes `includes` into an `inc` parameter joined with `+`.
4. `browse_release_groups(**params)` performs an HTTP GET on `/ws/2/release-group` and returns `response_json["release-groups"]`.

If `MusicBrainzAPI` differs (e.g. different base URL, it delegates to `musicbrainzngs` rather than using `requests`, or it uses a different `inc` encoding), adjust:
- The mocked URLs in `requests_mock.get(...)`,
- The assertions on `last.path` and `last.qs`,
- The expectations around the `query` and `inc` parameters,
- And, if `browse_release_groups` returns the raw payload, adapt the final assertion accordingly.
</issue_to_address>

### Comment 4
<location> `docs/plugins/listenbrainz.rst:20-21` </location>
<code_context>
+    plugins:
+        - listenbrainz

 You can then configure the plugin by providing your Listenbrainz token (see
 intructions here_) and username:
</code_context>

<issue_to_address>
**issue (typo):** Fix typo in “instructions” and ensure consistent capitalization of “ListenBrainz”.

In this paragraph, update “Listenbrainz” to “ListenBrainz” and correct “intructions” to “instructions”.
</issue_to_address>

Sourcery is free for open source - if you like our reviews please consider sharing them ✨
Help me be more useful! Please click 👍 or 👎 on each comment and I'll use the feedback to improve your reviews.

@codecov
Copy link

codecov bot commented Dec 24, 2025

Codecov Report

❌ Patch coverage is 90.75630% with 22 lines in your changes missing coverage. Please review.
✅ Project coverage is 68.67%. Comparing base (ea2e7bf) to head (a801afd).
⚠️ Report is 18 commits behind head on master.
✅ All tests successful. No failed tests found.

Files with missing lines Patch % Lines
beetsplug/_utils/musicbrainz.py 91.07% 8 Missing and 2 partials ⚠️
beetsplug/parentwork.py 75.00% 4 Missing and 3 partials ⚠️
beetsplug/mbcollection.py 95.00% 3 Missing and 1 partial ⚠️
beetsplug/missing.py 80.00% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #6234      +/-   ##
==========================================
+ Coverage   68.24%   68.67%   +0.43%     
==========================================
  Files         138      138              
  Lines       18815    18532     -283     
  Branches     3167     3061     -106     
==========================================
- Hits        12840    12727     -113     
+ Misses       5302     5151     -151     
+ Partials      673      654      -19     
Files with missing lines Coverage Δ
beetsplug/_utils/requests.py 98.59% <100.00%> (+0.08%) ⬆️
beetsplug/listenbrainz.py 33.83% <100.00%> (+13.98%) ⬆️
beetsplug/mbpseudo.py 79.87% <100.00%> (ø)
beetsplug/missing.py 58.02% <80.00%> (+24.69%) ⬆️
beetsplug/mbcollection.py 92.00% <95.00%> (+66.11%) ⬆️
beetsplug/parentwork.py 70.65% <75.00%> (+3.03%) ⬆️
beetsplug/_utils/musicbrainz.py 91.07% <91.07%> (ø)

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

@snejus snejus force-pushed the drop-musicbrainzngs branch 5 times, most recently from 0c8b87e to d015efa Compare December 24, 2025 04:34
@snejus snejus force-pushed the drop-musicbrainzngs branch 2 times, most recently from 18c7c71 to 599fdd3 Compare December 24, 2025 22:28
@snejus
Copy link
Member Author

snejus commented Dec 25, 2025

@semohr I have now typed MusicBrainzAPI properly and added it to the docs. Try generating docs locally to see how it looks and let me know if I should include anything else.

Copy link
Contributor

@semohr semohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just had another look here, the PR looks good to me and should mostly ready to merge.

It’s a bit difficult for me to test the mbcollection plugin in a live environment, but I trust that you’ve done your due diligence ;)

Copilot AI review requested due to automatic review settings January 5, 2026 22:57
@snejus snejus force-pushed the drop-musicbrainzngs branch from 7f29f40 to 09afc6e Compare January 5, 2026 22:57
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This pull request replaces the external python-musicbrainzngs dependency with a custom lightweight MusicBrainz client implementation, eliminating an external dependency and simplifying plugin installation for users.

Key changes:

  • Created new beetsplug._utils.musicbrainz module with MusicBrainzAPI, MusicBrainzUserAPI, and mixin classes
  • Refactored 6 plugins (musicbrainz, listenbrainz, mbcollection, missing, parentwork, mbpseudo) to use the new API
  • Removed musicbrainzngs from dependencies and plugin extras, enabling plugins to work without separate installation
  • Added comprehensive test coverage for previously untested plugins

Reviewed changes

Copilot reviewed 29 out of 30 changed files in this pull request and generated 7 comments.

Show a summary per file
File Description
beetsplug/_utils/musicbrainz.py New lightweight MusicBrainz client with rate limiting, request normalization, and typed search/browse/lookup methods
beetsplug/_utils/requests.py Enhanced to retry on server errors (500, 502, 503, 504) and added PUT/DELETE methods
beetsplug/musicbrainz.py Refactored to use MusicBrainzAPIMixin, removed direct API initialization
beetsplug/listenbrainz.py Updated to use mixin for MB searches instead of direct musicbrainzngs calls
beetsplug/mbcollection.py Major refactor: introduced MusicBrainzUserAPI with digest auth, MBCollection dataclass for collection operations
beetsplug/missing.py Updated error handling and API calls to use new mixin
beetsplug/parentwork.py Refactored work traversal into inline method, updated to use grouped relations from new API
beetsplug/mbpseudo.py Updated API access to use mb_api property
test/plugins/conftest.py New shared fixture for disabling rate limiting in tests
test/plugins/utils/test_musicbrainz.py New test file for group_relations functionality
test/plugins/test_parentwork.py Converted from unittest mocks to pytest fixtures with requests_mock
test/plugins/test_listenbrainz.py New test suite for recording ID lookup and track info fetching
test/plugins/test_mbcollection.py New comprehensive tests for collection validation and sync operations
test/plugins/test_missing.py New tests for missing album detection
test/plugins/utils/test_request_handler.py Added test case for server error retry behavior
pyproject.toml Removed musicbrainzngs dependency and plugin extras; added sphinx-toolbox for TypedDict docs
poetry.lock Updated with sphinx-toolbox and dependencies, removed musicbrainzngs
docs/plugins/*.rst Removed installation sections from listenbrainz, mbcollection, missing, and parentwork docs
docs/conf.py Added sphinx_toolbox configuration for TypedDict documentation
docs/api/plugin_utilities.rst New documentation page for RequestHandler and MusicBrainzAPI
docs/changelog.rst Documented the dependency replacement for plugin developers
.github/workflows/ci.yaml Removed parentwork extra from installation commands

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

@snejus snejus force-pushed the drop-musicbrainzngs branch 2 times, most recently from 2d758d8 to 7cd0e2b Compare January 5, 2026 23:50
@snejus snejus force-pushed the drop-musicbrainzngs branch from 7cd0e2b to 093b3f0 Compare January 6, 2026 00:53
@snejus snejus requested a review from semohr January 6, 2026 00:57
@snejus snejus force-pushed the drop-musicbrainzngs branch from 093b3f0 to 4e2d519 Compare January 6, 2026 09:45
@snejus snejus force-pushed the drop-musicbrainzngs branch from 4e2d519 to a801afd Compare January 6, 2026 09:54
Copy link
Contributor

@semohr semohr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems to work as expected for me and I'm happy with the code changes.

It also fixes #6265 for me.

@snejus snejus merged commit 1a899cc into master Jan 7, 2026
21 checks passed
@snejus snejus deleted the drop-musicbrainzngs branch January 7, 2026 11:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

parentwork is throwing errors about a user-agent for music brainz

4 participants