Hey!
I am new to the `discogs_client` and API. I am trying to learn how to use it to wrangle together discography information. Fundamentally, I would just like to get all songs by a given artist, but through `deezer`, `spotipy`, `lyricsgenius`, `ytmusicapi` and `BeautifulSoup` I have quickly learned that this data is messy and poorly standardized no matter where I look, e.g. `Trackname about Thing` vs `Trackname About Thing` vs `Trackname (About Thing)` vs `Trackname - About Thing`. While this can be mitigated by thoroughly scrubbing and normalizing names, it gets hard because not everyone separates out what is a remix, cover, alternate version, extended version, etc. All of this makes mapping songs to albums, and even finding the base version of a song, more difficult, especially when combining APIs.
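To make the scrubbing step concrete, here is a minimal sketch of the kind of normalization I mean. The helper name `normalize_title` is made up for illustration, and it only handles case and punctuation differences; it does nothing to tell a remix or extended version apart from the base track.

```python
import re

def normalize_title(title: str) -> str:
    """Reduce a track title to a comparison key (hypothetical helper)."""
    key = title.casefold()
    # Collapse every run of punctuation, brackets, dashes and whitespace
    # into a single space; \W is Unicode-aware, so accented letters survive.
    key = re.sub(r"[\W_]+", " ", key)
    return key.strip()

variants = [
    "Trackname about Thing",
    "Trackname About Thing",
    "Trackname (About Thing)",
    "Trackname - About Thing",
]
# All four spellings end up with the same key.
keys = {normalize_title(v) for v in variants}
```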
I think Discogs has a nice approach. To my understanding (from the official API documentation):
- **Release**: The Release resource represents a particular physical or digital object released by one or more Artists.
- **Master Release**: The Master resource represents a set of similar Releases. Masters (also known as "master releases") have a "main release" which is often the chronologically earliest.
- **Master Release Versions**: Retrieves a list of all Releases that are versions of this master. Accepts Pagination parameters.
- **Artist Releases**: Returns a list of Releases and Masters associated with the Artist.
So regardless of whether it is an EP, single, or album, when a generic entity is released for the first time, a master gets created to represent that unique entity, and a "main version" is immediately created and tethered to it, representing the first release (subsequently, when the master gets new versions, they appear as part of the `versions` list). Accordingly, a single (or a standalone track), even if it is a master, has a `tracklist` that is a list of one element containing track data (which is mostly empty). Notably, `track` objects (in your API) do not have an `id` property.
So my question: if I want to get the full discography from Discogs, should the process look something like this?
- Get all artist alias objects. Artist aliases have a different id and return a different set of releases**!**
- Get all releases (master or otherwise) for all artist aliases (main included).
- Recursively follow all release links (master --> main_release, master --> versions) to make sure we aren't missing any (and that Discogs doesn't accidentally have missing entries in the DB), and keep track of the link tree.
- Create a messy graph mapping "master" tracks to all their versions (including the main release), each of which has one or more albums/releases?
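The first three steps can be sketched as a breadth-first walk. This is only a rough outline under assumptions: `aliases_of`, `releases_of` and `links_of` are hypothetical callables that would wrap the real API lookups (artist aliases, artist releases, and master --> main_release / versions), and the toy dictionaries stand in for actual responses.

```python
from collections import deque

def collect_artists(artist_id, aliases_of):
    """Return the artist id plus every alias reachable from it."""
    seen = {artist_id}
    queue = deque([artist_id])
    while queue:
        current = queue.popleft()
        for alias in aliases_of(current):
            if alias not in seen:
                seen.add(alias)
                queue.append(alias)
    return seen

def crawl(artist_id, aliases_of, releases_of, links_of):
    """Walk aliases, then releases, then release links; return (artists, tree)."""
    artists = collect_artists(artist_id, aliases_of)
    seen, tree, queue = set(), {}, deque()
    for artist in sorted(artists):
        for item in releases_of(artist):
            if item not in seen:
                seen.add(item)
                queue.append(item)
    while queue:
        item = queue.popleft()
        children = list(links_of(item))
        tree[item] = children  # the link tree: parent -> linked releases
        for child in children:
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return artists, tree

# Toy stand-ins for API responses (all ids are invented).
ALIASES = {1: [2], 2: [1]}
RELEASES = {1: [("master", 10)], 2: [("release", 200)]}
LINKS = {("master", 10): [("release", 100), ("release", 101)]}

artists, tree = crawl(
    1,
    lambda a: ALIASES.get(a, []),
    lambda a: RELEASES.get(a, []),
    lambda r: LINKS.get(r, []),
)
```

The `seen` sets keep the walk from looping when aliases point back at each other or when a release is reachable from more than one master.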
For example, the graph from the last step could be flattened into rows like this:
| Track | Version | Album | Artists |
|-------|---------|-------|---------|
| T1 | T1v1 | EP | A1 |
| T1 | T1v1 | EP | A2 |
| T1 | T1v1 | Alb1 | A1 |
| T1 | T1v1 | Alb1 | A2 |
| T1 | T1ext | Alb1 | A1 |
| T1 | T1ext | Alb1 | A2 |
So in this example there is a "core" track T1 that has two versions, T1v1 and T1ext; it has two artists and appears on two different albums (EP and Alb1).
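As a sketch of how such rows could be produced, assuming the graph has already been collected into a nested dict (the `catalog` layout and the `to_rows` helper are invented for illustration, not something the API returns):

```python
from itertools import product

# Hypothetical nested layout: track -> version -> albums and artists.
catalog = {
    "T1": {
        "T1v1": {"albums": ["EP", "Alb1"], "artists": ["A1", "A2"]},
        "T1ext": {"albums": ["Alb1"], "artists": ["A1", "A2"]},
    }
}

def to_rows(catalog):
    """Flatten the nested mapping into (track, version, album, artist) rows."""
    rows = []
    for track, versions in catalog.items():
        for version, info in versions.items():
            # One row per album/artist combination of this version.
            for album, artist in product(info["albums"], info["artists"]):
                rows.append((track, version, album, artist))
    return rows

rows = to_rows(catalog)
```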
Question: How do I tell whether a release is a single or an EP?
I have a master track that has a tracklist of just one entry. Do I just compare titles?
Or do I look at `release.formats`, e.g. `[{'name': 'CDr', 'qty': '1', 'descriptions': ['Single', 'Promo']}]`, and check whether `'Single'` is in `descriptions`?
Not every release object (master / main / version) has a "type" (<-- if master) or "type_" (<-- if track in a tracklist).
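The `descriptions` check I have in mind would look roughly like this. It is a sketch that works on a plain list of dicts shaped like the example above; the helper names are made up, and I am assuming that `'EP'` and `'Album'` can show up in `descriptions` the same way `'Single'` does.

```python
def format_descriptions(formats):
    """Collect every description string across all format entries."""
    found = set()
    for fmt in formats or []:
        # Some entries may have no 'descriptions' key at all.
        found.update(fmt.get("descriptions") or [])
    return found

def guess_release_kind(formats):
    """Return 'Single', 'EP' or 'Album' if tagged that way, else 'unknown'."""
    descriptions = format_descriptions(formats)
    for kind in ("Single", "EP", "Album"):  # assumed description values
        if kind in descriptions:
            return kind
    return "unknown"

example = [{'name': 'CDr', 'qty': '1', 'descriptions': ['Single', 'Promo']}]
kind = guess_release_kind(example)
```

Releases without any matching description fall through to `'unknown'`, which is where comparing titles or track counts would still be needed.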