diff --git a/tutorials/intro.ipynb b/tutorials/intro.ipynb
index d14308bb1..30dff8277 100644
--- a/tutorials/intro.ipynb
+++ b/tutorials/intro.ipynb
@@ -3,28 +3,34 @@
{
"cell_type": "markdown",
"metadata": {
- "colab_type": "text",
"id": "view-in-github"
},
"source": [
" "
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "ZOWusGNiFdLN"
+ },
+ "source": [
+ "# Introduction to `pyannote.audio`"
+ ]
+ },
{
"cell_type": "markdown",
"metadata": {
"id": "1Fs2d8otYnp7"
},
"source": [
- "[`pyannote.audio`](https://github.com/pyannote/pyannote-audio) is an open-source toolkit written in Python for **speaker diarization**.\n",
+ "[`pyannote.audio`](https://github.com/pyannote/pyannote-audio) is an open-source Python toolkit for **speaker diarization** — the task of determining *“who speaks when”* by partitioning an audio conversation into speaker-specific time segments. \n",
"\n",
- "Based on [`PyTorch`](https://pytorch.org) machine learning framework, it provides a set of trainable end-to-end neural building blocks that can be combined and jointly optimized to build speaker diarization pipelines.\n",
+ "Based on the [`PyTorch`](https://pytorch.org) machine learning framework, it offers a collection of trainable, end-to-end neural building blocks. These components can be combined and jointly optimized to create powerful speaker diarization pipelines. \n",
"\n",
- "`pyannote.audio` also comes with pretrained [models](https://huggingface.co/models?other=pyannote-audio-model) and [pipelines](https://huggingface.co/models?other=pyannote-audio-pipeline) covering a wide range of domains for voice activity detection, speaker segmentation, overlapped speech detection, speaker embedding reaching state-of-the-art performance for most of them.\n",
+ "In addition, `pyannote.audio` provides pretrained [models](https://huggingface.co/models?other=pyannote-audio-model) and [pipelines](https://huggingface.co/models?other=pyannote-audio-pipeline) for a wide range of tasks such as voice activity detection, speaker segmentation, overlapped speech detection, and speaker embedding — many of which achieve state-of-the-art performance. \n",
"\n",
- "**This notebook will teach you how to apply those pretrained pipelines on your own data.**\n",
- "\n",
- "Make sure you run it using a GPU (or it might otherwise be slow...)"
+ "**This notebook will show you how to apply these pretrained pipelines to your own audio data.**\n"
]
},
{
@@ -44,137 +50,120 @@
},
"outputs": [],
"source": [
- "!pip install -qq pyannote.audio==3.1.1\n",
+ "# install pyannote.audio 4.0\n",
+ "!pip install -qq git+https://github.com/pyannote/pyannote-audio.git@develop\n",
+ "\n",
+ "# install ipyannote, an interactive visualization tool for pyannote\n",
+ "!pip install -qq ipyannote\n",
+ "\n",
"!pip install -qq ipython==7.34.0"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "qggK-7VBYnp8"
+ "id": "qIzuCWUzFPHU"
},
"source": [
- "# Visualization with `pyannote.core`\n",
+ "**⚠️ If you are running this notebook on Colab, restart the session (Runtime > Restart session) to avoid dependency errors in the rest of this tutorial.**"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "5MclWK2GYnp_"
+ },
+ "source": [
+ "## Hugging Face setup\n",
+ "\n",
+ "Official [pyannote.audio](https://github.com/pyannote/pyannote-audio) pipelines (i.e. those under the [`pyannote` organization](https://hf.co/pyannote) umbrella) are open-source, but gated. This means that you must first accept the user conditions on their respective Hugging Face pages to access the pretrained weights and hyper-parameters.\n",
+ "\n",
+ "For instance, to load the speaker diarization pipelines used in this tutorial, you have to visit [hf.co/pyannote/speaker-diarization-community-1](https://hf.co/pyannote/speaker-diarization-community-1) and accept the terms. Do the same for [hf.co/pyannote/speaker-diarization-community-1-cloud](https://hf.co/pyannote/speaker-diarization-community-1-cloud) and [hf.co/pyannote/speaker-diarization-precision-2](https://hf.co/pyannote/speaker-diarization-precision-2).\n",
"\n",
- "For the purpose of this notebook, we will download and use an audio file coming from the [AMI corpus](http://groups.inf.ed.ac.uk/ami/corpus/), which contains a conversation between 4 people in a meeting room."
+ "Finally, log in using `notebook_login` below:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "id": "uJWoQiJgYnp8"
+ "id": "r5u7VMb-YnqB"
},
"outputs": [],
"source": [
- "!wget -q http://groups.inf.ed.ac.uk/ami/AMICorpusMirror/amicorpus/ES2004a/audio/ES2004a.Mix-Headset.wav\n",
- "DEMO_FILE = {'uri': 'ES2004a.Mix-Headset', 'audio': 'ES2004a.Mix-Headset.wav'}"
+ "from huggingface_hub import notebook_login\n",
+ "notebook_login()"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "EPIapoCJYnp8"
+ "id": "qggK-7VBYnp8"
},
"source": [
- "Because AMI is a benchmarking dataset, it comes with manual annotations (a.k.a *groundtruth*). \n",
- "Let us load and visualize the expected output of the speaker diarization pipeline.\n"
+ "## How to use `pyannote/speaker-diarization-community-1`?"
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {
- "id": "Mmm0Q22JYnp8"
+ "id": "xtZXNtEtOnlZ"
},
- "outputs": [],
"source": [
- "!wget -q https://raw.githubusercontent.com/pyannote/AMI-diarization-setup/main/only_words/rttms/test/ES2004a.rttm"
+ "First, load the pipeline:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 233
- },
- "id": "ToqCwl_FYnp9",
- "outputId": "a1d9631f-b198-44d1-ff6d-ec304125a9f4"
+ "id": "y5w-IGE1Ov25"
},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 10,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
- "# load groundtruth\n",
- "from pyannote.database.util import load_rttm\n",
- "_, groundtruth = load_rttm('ES2004a.rttm').popitem()\n",
+ "from pyannote.audio import Pipeline\n",
+ "import torch\n",
"\n",
- "# visualize groundtruth\n",
- "groundtruth"
+ "pipeline = Pipeline.from_pretrained(\"pyannote/speaker-diarization-community-1\", skip_dependencies=True)\n",
+ "\n",
+ "# send pipeline to GPU (when available)\n",
+ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+ "pipeline.to(torch.device(device))"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "p_R9T9Y5Ynp9"
+ "id": "3ctcF5fVQFdg"
},
"source": [
- "For the rest of this notebook, we will only listen to and visualize a one-minute long excerpt of the file (but will process the whole file anyway)."
+ "Apply the pipeline to an audio file:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 230
- },
- "id": "bAHza4Y1Ynp-",
- "outputId": "c4cc2369-bfe4-4ac2-bb71-37602e7c7a8a"
+ "id": "xyn8ufT1QNIr"
},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 4,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
- "from pyannote.core import Segment, notebook\n",
- "# make notebook visualization zoom on 600s < t < 660s time range\n",
- "EXCERPT = Segment(600, 660)\n",
- "notebook.crop = EXCERPT\n",
+ "from pyannote.audio.sample import SAMPLE_FILE\n",
"\n",
- "# visualize excerpt groundtruth\n",
- "groundtruth"
+ "audio = SAMPLE_FILE[\"audio\"]\n",
+ "\n",
+ "# check https://github.com/pyannote/pyannote-audio/blob/853b2ab42c3ccd9ec898459d0ad24adc65167b3d/pyannote/audio/core/io.py#L45\n",
+ "# to see all accepted input types.\n",
+ "outputs = pipeline(audio)"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "L3FQXT5FYnp-"
+ "id": "a6-rL-_aU0_Z"
},
"source": [
- "This nice visualization is brought to you by [`pyannote.core`](http://pyannote.github.io/pyannote-core/) and basically indicates when each speaker speaks."
+ "We can then visualize `outputs.speaker_diarization`:"
]
},
{
@@ -183,399 +172,231 @@
"metadata": {
"colab": {
"base_uri": "https://localhost:8080/",
- "height": 62
+ "height": 136
},
- "id": "rDhZ3bXEYnp-",
- "outputId": "a82efe4e-2f9c-48bd-94fb-c62af3a3cb43"
+ "id": "OV3sOnhLWULo",
+ "outputId": "94479f85-b5cb-43e6-ca89-690e68016b3e"
},
"outputs": [
{
"data": {
- "text/html": [
- "\n",
- " \n",
- " \n",
- " Your browser does not support the audio element.\n",
- " \n",
- " "
- ],
+ "image/png": "",
"text/plain": [
- ""
+ ""
]
},
- "execution_count": 11,
+ "execution_count": 21,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
- "from pyannote.audio import Audio\n",
- "from IPython.display import Audio as IPythonAudio\n",
- "waveform, sr = Audio(mono=\"downmix\").crop(DEMO_FILE, EXCERPT)\n",
- "IPythonAudio(waveform.flatten(), rate=sr)"
+ "outputs.speaker_diarization"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "hkzox7QIYnp_"
+ "id": "9qQct_dbTd21"
},
"source": [
- "# Processing your own audio file (optional)\n",
- "\n",
- "In case you just want to go ahead with the demo file, skip this section entirely.\n",
+ "Or visualize the output in a more interactive way using the `ipyannote` widget:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "N_XryMONTcue"
+ },
+ "outputs": [],
+ "source": [
+ "from ipyannote import IPyannote\n",
"\n",
- "In case you want to try processing your own audio file, proceed with running this section. It will offer you to upload an audio file (preferably a `wav` file but all formats supported by [`SoundFile`](https://pysoundfile.readthedocs.io/en/latest/) should work just fine)."
+ "IPyannote(audio=audio, annotation=outputs.speaker_diarization)"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "3hmFmLzFYnp_"
+ "id": "_ajQeskZU5Y0"
},
"source": [
- "## Upload audio file"
+ "You can even compare the output with the reference using `ipyannote.Errors`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "id": "xC05jFO_Ynp_"
+ "id": "Ku7Q6pP4VCbP"
},
"outputs": [],
"source": [
- "import google.colab\n",
- "own_file, _ = google.colab.files.upload().popitem()\n",
- "OWN_FILE = {'audio': own_file}\n",
- "notebook.reset()\n",
+ "from ipyannote import Errors\n",
"\n",
- "# load audio waveform and play it\n",
- "waveform, sample_rate = Audio(mono=\"downmix\")(OWN_FILE)\n",
- "IPythonAudio(data=waveform.squeeze(), rate=sample_rate, autoplay=True)"
+ "reference = SAMPLE_FILE[\"annotation\"]\n",
+ "\n",
+ "Errors(audio=audio, reference=reference.rename_tracks(), hypothesis=outputs.speaker_diarization)"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "ctw4nLaPYnp_"
+ "id": "q5KSipqrYiTU"
},
"source": [
- "Simply replace `DEMO_FILE` by `OWN_FILE` in the rest of the notebook.\n",
- "\n",
- "Note, however, that unless you provide a groundtruth annotation in the next cell, you will (obviously) not be able to visualize groundtruth annotation nor evaluate the performance of the diarization pipeline quantitatively"
+ "In the visualizer above, the first line shows the reference, the second one displays the pipeline output, and the last line highlights the errors made by the pipeline compared to the reference."
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "x9AQgDzFYnp_"
+ "id": "P0dB68V3bZsO"
},
"source": [
- "## Upload groundtruth (optional)\n",
- "\n",
- "The groundtruth file is expected to use the RTTM format, with one line per speech turn with the following convention:\n",
- "\n",
- "```\n",
- "SPEAKER {file_name} 1 {start_time} {duration} {speaker_name} \n",
- "```"
+ "For downstream transcription, you might want a speaker diarization that does not contain any overlapping speech turns. This is available as `outputs.exclusive_speaker_diarization`:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "id": "iZaFudpDYnp_",
- "outputId": "981274fa-e654-4091-c838-91c81f921e5d"
+ "id": "ZkkYB14uZfcv"
},
- "outputs": [
- {
- "data": {
- "text/html": [
- "\n",
- " \n",
- " \n",
- " Upload widget is only available when the cell has been executed in the\n",
- " current browser session. Please rerun this cell to enable.\n",
- " \n",
- " "
- ],
- "text/plain": [
- ""
- ]
- },
- "metadata": {},
- "output_type": "display_data"
- },
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "Saving sample.rttm to sample.rttm\n"
- ]
- },
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": null,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
- "groundtruth_rttm, _ = google.colab.files.upload().popitem()\n",
- "groundtruths = load_rttm(groundtruth_rttm)\n",
- "if OWN_FILE['audio'] in groundtruths:\n",
- " groundtruth = groundtruths[OWN_FILE['audio']]\n",
- "else:\n",
- " _, groundtruth = groundtruths.popitem()\n",
- "groundtruth"
+ "IPyannote(audio=audio, annotation=outputs.exclusive_speaker_diarization)"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "5MclWK2GYnp_"
+ "id": "3hmFmLzFYnp_"
},
"source": [
- "# Speaker diarization with `pyannote.pipeline`\n",
- "\n",
- "We are about to run a full speaker diarization pipeline, that includes speaker segmentation, speaker embedding, and a final clustering step. **Brace yourself!**\n",
- "\n",
- "To load the speaker diarization pipeline,\n",
- "\n",
- "* accept the user conditions on [hf.co/pyannote/speaker-diarization-3.1](https://hf.co/pyannote/speaker-diarization-3.1)\n",
- "* accept the user conditions on [hf.co/pyannote/segmentation-3.0](https://hf.co/pyannote/segmentation-3.0)\n",
- "* login using `notebook_login` below"
+ "## A word about hosted `speaker-diarization-community-1-cloud`"
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 301,
- "referenced_widgets": [
- "c8731777ce834e58a76a295076200cfc",
- "859b12a6d95b4c6f987791ca848122b9",
- "94756148d2e94a93ae233baba20af683",
- "ba18cded436e486da34882d821d8f1eb",
- "99898e6ee64a46bd832af112e79b58b7",
- "79184c8c2a6f4b7493bb7f6983f18a09",
- "ea95ffd922c0455d957120f034e541f8",
- "13525aa369a9410a83343952ab511f3c",
- "b2be65e192384c948fb8987d4cfca505",
- "333b42ca7aa44788b1c22724eb11bcc3",
- "0e382d66f09f4958a40baa7ab83c4ccb",
- "6a45ce374e2e47ba9457d02e02522748",
- "765485a1d3f941d28b79782dcffbf401",
- "3499ef4dd9f243d9bef00b396e78ed69",
- "6e56329c30c0441c8d45df3975e75a76"
- ]
- },
- "id": "r5u7VMb-YnqB",
- "outputId": "c714a997-d4f8-417a-e5ad-0a4924333859"
+ "id": "t2Cz6_-7dipH"
},
- "outputs": [
- {
- "data": {
- "application/vnd.jupyter.widget-view+json": {
- "model_id": "6e56329c30c0441c8d45df3975e75a76",
- "version_major": 2,
- "version_minor": 0
- },
- "text/plain": [
- "VBox(children=(HTML(value=' "
- ]
- },
- "execution_count": 14,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
- "diarization"
+ "import os\n",
+ "\n",
+ "token = os.environ[\"PYANNOTEAI_API_KEY\"] # or simply paste your API key here"
]
},
{
"cell_type": "markdown",
"metadata": {
- "id": "DLhErS6wYnqB"
+ "id": "oGNYZJ1AgZOe"
},
"source": [
- "# Evaluation with `pyannote.metrics`\n",
- "\n",
- "Because groundtruth is available, we can evaluate the quality of the diarization pipeline by computing the [diarization error rate](http://pyannote.github.io/pyannote-metrics/reference.html#diarization)."
+ "Then you can load the pipeline and process your audio:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "id": "vNHQRTUIYnqB"
+ "id": "xC05jFO_Ynp_"
},
"outputs": [],
"source": [
- "from pyannote.metrics.diarization import DiarizationErrorRate\n",
- "metric = DiarizationErrorRate()\n",
- "der = metric(groundtruth, diarization)"
+ "import os\n",
+ "cloud_pipeline = Pipeline.from_pretrained(\"pyannote/speaker-diarization-community-1-cloud\", token=token, skip_dependencies=True)\n",
+ "cloud_outputs = cloud_pipeline(audio)\n"
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/"
- },
- "id": "9d0vKQ0fYnqB",
- "outputId": "9a664753-cd84-4211-9153-d33e929bb252"
+ "id": "fTIH0rqWmowD"
},
- "outputs": [
- {
- "name": "stdout",
- "output_type": "stream",
- "text": [
- "diarization error rate = 19.8%\n"
- ]
- }
- ],
"source": [
- "print(f'diarization error rate = {100 * der:.1f}%')"
+ "Visualize the speaker diarization outputs in the same way as before:"
]
},
{
- "cell_type": "markdown",
+ "cell_type": "code",
+ "execution_count": null,
"metadata": {
- "id": "Xz5QJV9nYnqB"
+ "id": "6h9JVnrOmubd"
},
+ "outputs": [],
"source": [
- "This implementation of diarization error rate is brought to you by [`pyannote.metrics`](http://pyannote.github.io/pyannote-metrics/).\n",
- "\n",
- "It can also be used to improve visualization by find the optimal one-to-one mapping between groundtruth and hypothesized speakers."
+ "IPyannote(audio=audio, annotation=cloud_outputs.speaker_diarization)"
]
},
{
- "cell_type": "code",
- "execution_count": null,
+ "cell_type": "markdown",
"metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 230
- },
- "id": "xMLf4mrYYnqB",
- "outputId": "ed08bcc8-24c6-439c-a244-3a673ff480b0"
+ "id": "WIBcxzi1m7kV"
},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 19,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
"source": [
- "mapping = metric.optimal_mapping(groundtruth, diarization)\n",
- "diarization.rename_labels(mapping=mapping)"
+ "Want something even more precise? Try the `pyannote/speaker-diarization-precision-2` pipeline:\n"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
- "colab": {
- "base_uri": "https://localhost:8080/",
- "height": 230
- },
- "id": "Z0ewsLlQYnqB",
- "outputId": "8a8cd040-ee1d-48f7-d4be-eef9e08e9e55"
+ "id": "rtH_51m4nXAJ"
},
- "outputs": [
- {
- "data": {
- "image/png": "",
- "text/plain": [
- ""
- ]
- },
- "execution_count": 20,
- "metadata": {},
- "output_type": "execute_result"
- }
- ],
+ "outputs": [],
"source": [
- "groundtruth"
+ "precision_pipeline = Pipeline.from_pretrained(\"pyannote/speaker-diarization-precision-2\", token=token, skip_dependencies=True)\n",
+ "precision_outputs = precision_pipeline(audio)\n",
+ "\n",
+ "IPyannote(audio=audio, annotation=precision_outputs.speaker_diarization)"
]
},
{
@@ -584,22 +405,23 @@
"id": "MxlrTbyPYnqB"
},
"source": [
- "# Going further\n",
+ "## Going further\n",
"\n",
"We have only scratched the surface in this introduction.\n",
"\n",
- "More details can be found in the [`pyannote.audio` Github repository](https://github.com/pyannote/pyannote-audio).\n"
+ "More details can be found in the [`pyannote.audio` GitHub repository](https://github.com/pyannote/pyannote-audio).\n",
+ "\n",
+ "You can also visit the [`pyannoteAI`](https://www.pyannote.ai/) website to explore our fastest and most advanced solutions.\n"
]
}
],
"metadata": {
"accelerator": "GPU",
"colab": {
- "include_colab_link": true,
"provenance": []
},
"kernelspec": {
- "display_name": "Python 3 (ipykernel)",
+ "display_name": "pyannote-audio",
"language": "python",
"name": "python3"
},
@@ -613,442 +435,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
- "version": "3.10.13"
- },
- "widgets": {
- "application/vnd.jupyter.widget-state+json": {
- "0e382d66f09f4958a40baa7ab83c4ccb": {
- "model_module": "@jupyter-widgets/base",
- "model_module_version": "1.2.0",
- "model_name": "LayoutModel",
- "state": {
- "_model_module": "@jupyter-widgets/base",
- "_model_module_version": "1.2.0",
- "_model_name": "LayoutModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "LayoutView",
- "align_content": null,
- "align_items": null,
- "align_self": null,
- "border": null,
- "bottom": null,
- "display": null,
- "flex": null,
- "flex_flow": null,
- "grid_area": null,
- "grid_auto_columns": null,
- "grid_auto_flow": null,
- "grid_auto_rows": null,
- "grid_column": null,
- "grid_gap": null,
- "grid_row": null,
- "grid_template_areas": null,
- "grid_template_columns": null,
- "grid_template_rows": null,
- "height": null,
- "justify_content": null,
- "justify_items": null,
- "left": null,
- "margin": null,
- "max_height": null,
- "max_width": null,
- "min_height": null,
- "min_width": null,
- "object_fit": null,
- "object_position": null,
- "order": null,
- "overflow": null,
- "overflow_x": null,
- "overflow_y": null,
- "padding": null,
- "right": null,
- "top": null,
- "visibility": null,
- "width": null
- }
- },
- "13525aa369a9410a83343952ab511f3c": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "DescriptionStyleModel",
- "state": {
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "DescriptionStyleModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "StyleView",
- "description_width": ""
- }
- },
- "333b42ca7aa44788b1c22724eb11bcc3": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "DescriptionStyleModel",
- "state": {
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "DescriptionStyleModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "StyleView",
- "description_width": ""
- }
- },
- "3499ef4dd9f243d9bef00b396e78ed69": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "DescriptionStyleModel",
- "state": {
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "DescriptionStyleModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "StyleView",
- "description_width": ""
- }
- },
- "6a45ce374e2e47ba9457d02e02522748": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "ButtonStyleModel",
- "state": {
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "ButtonStyleModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "StyleView",
- "button_color": null,
- "font_weight": ""
- }
- },
- "765485a1d3f941d28b79782dcffbf401": {
- "model_module": "@jupyter-widgets/base",
- "model_module_version": "1.2.0",
- "model_name": "LayoutModel",
- "state": {
- "_model_module": "@jupyter-widgets/base",
- "_model_module_version": "1.2.0",
- "_model_name": "LayoutModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "LayoutView",
- "align_content": null,
- "align_items": null,
- "align_self": null,
- "border": null,
- "bottom": null,
- "display": null,
- "flex": null,
- "flex_flow": null,
- "grid_area": null,
- "grid_auto_columns": null,
- "grid_auto_flow": null,
- "grid_auto_rows": null,
- "grid_column": null,
- "grid_gap": null,
- "grid_row": null,
- "grid_template_areas": null,
- "grid_template_columns": null,
- "grid_template_rows": null,
- "height": null,
- "justify_content": null,
- "justify_items": null,
- "left": null,
- "margin": null,
- "max_height": null,
- "max_width": null,
- "min_height": null,
- "min_width": null,
- "object_fit": null,
- "object_position": null,
- "order": null,
- "overflow": null,
- "overflow_x": null,
- "overflow_y": null,
- "padding": null,
- "right": null,
- "top": null,
- "visibility": null,
- "width": null
- }
- },
- "79184c8c2a6f4b7493bb7f6983f18a09": {
- "model_module": "@jupyter-widgets/base",
- "model_module_version": "1.2.0",
- "model_name": "LayoutModel",
- "state": {
- "_model_module": "@jupyter-widgets/base",
- "_model_module_version": "1.2.0",
- "_model_name": "LayoutModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "LayoutView",
- "align_content": null,
- "align_items": "center",
- "align_self": null,
- "border": null,
- "bottom": null,
- "display": "flex",
- "flex": null,
- "flex_flow": "column",
- "grid_area": null,
- "grid_auto_columns": null,
- "grid_auto_flow": null,
- "grid_auto_rows": null,
- "grid_column": null,
- "grid_gap": null,
- "grid_row": null,
- "grid_template_areas": null,
- "grid_template_columns": null,
- "grid_template_rows": null,
- "height": null,
- "justify_content": null,
- "justify_items": null,
- "left": null,
- "margin": null,
- "max_height": null,
- "max_width": null,
- "min_height": null,
- "min_width": null,
- "object_fit": null,
- "object_position": null,
- "order": null,
- "overflow": null,
- "overflow_x": null,
- "overflow_y": null,
- "padding": null,
- "right": null,
- "top": null,
- "visibility": null,
- "width": "50%"
- }
- },
- "859b12a6d95b4c6f987791ca848122b9": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "HTMLModel",
- "state": {
- "_dom_classes": [],
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "HTMLModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/controls",
- "_view_module_version": "1.5.0",
- "_view_name": "HTMLView",
- "description": "",
- "description_tooltip": null,
- "layout": "IPY_MODEL_ea95ffd922c0455d957120f034e541f8",
- "placeholder": "",
- "style": "IPY_MODEL_13525aa369a9410a83343952ab511f3c",
- "value": " Copy a token from your Hugging Face\ntokens page and paste it below. Immediately click login after copying\nyour token or it might be stored in plain text in this notebook file. "
- }
- },
- "94756148d2e94a93ae233baba20af683": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "PasswordModel",
- "state": {
- "_dom_classes": [],
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "PasswordModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/controls",
- "_view_module_version": "1.5.0",
- "_view_name": "PasswordView",
- "continuous_update": true,
- "description": "Token:",
- "description_tooltip": null,
- "disabled": false,
- "layout": "IPY_MODEL_b2be65e192384c948fb8987d4cfca505",
- "placeholder": "",
- "style": "IPY_MODEL_333b42ca7aa44788b1c22724eb11bcc3",
- "value": ""
- }
- },
- "99898e6ee64a46bd832af112e79b58b7": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "HTMLModel",
- "state": {
- "_dom_classes": [],
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "HTMLModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/controls",
- "_view_module_version": "1.5.0",
- "_view_name": "HTMLView",
- "description": "",
- "description_tooltip": null,
- "layout": "IPY_MODEL_765485a1d3f941d28b79782dcffbf401",
- "placeholder": "",
- "style": "IPY_MODEL_3499ef4dd9f243d9bef00b396e78ed69",
- "value": "\nPro Tip: If you don't already have one, you can create a dedicated\n'notebooks' token with 'write' access, that you can then easily reuse for all\nnotebooks. "
- }
- },
- "b2be65e192384c948fb8987d4cfca505": {
- "model_module": "@jupyter-widgets/base",
- "model_module_version": "1.2.0",
- "model_name": "LayoutModel",
- "state": {
- "_model_module": "@jupyter-widgets/base",
- "_model_module_version": "1.2.0",
- "_model_name": "LayoutModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "LayoutView",
- "align_content": null,
- "align_items": null,
- "align_self": null,
- "border": null,
- "bottom": null,
- "display": null,
- "flex": null,
- "flex_flow": null,
- "grid_area": null,
- "grid_auto_columns": null,
- "grid_auto_flow": null,
- "grid_auto_rows": null,
- "grid_column": null,
- "grid_gap": null,
- "grid_row": null,
- "grid_template_areas": null,
- "grid_template_columns": null,
- "grid_template_rows": null,
- "height": null,
- "justify_content": null,
- "justify_items": null,
- "left": null,
- "margin": null,
- "max_height": null,
- "max_width": null,
- "min_height": null,
- "min_width": null,
- "object_fit": null,
- "object_position": null,
- "order": null,
- "overflow": null,
- "overflow_x": null,
- "overflow_y": null,
- "padding": null,
- "right": null,
- "top": null,
- "visibility": null,
- "width": null
- }
- },
- "ba18cded436e486da34882d821d8f1eb": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "ButtonModel",
- "state": {
- "_dom_classes": [],
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "ButtonModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/controls",
- "_view_module_version": "1.5.0",
- "_view_name": "ButtonView",
- "button_style": "",
- "description": "Login",
- "disabled": false,
- "icon": "",
- "layout": "IPY_MODEL_0e382d66f09f4958a40baa7ab83c4ccb",
- "style": "IPY_MODEL_6a45ce374e2e47ba9457d02e02522748",
- "tooltip": ""
- }
- },
- "c8731777ce834e58a76a295076200cfc": {
- "model_module": "@jupyter-widgets/controls",
- "model_module_version": "1.5.0",
- "model_name": "VBoxModel",
- "state": {
- "_dom_classes": [],
- "_model_module": "@jupyter-widgets/controls",
- "_model_module_version": "1.5.0",
- "_model_name": "VBoxModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/controls",
- "_view_module_version": "1.5.0",
- "_view_name": "VBoxView",
- "box_style": "",
- "children": [
- "IPY_MODEL_859b12a6d95b4c6f987791ca848122b9",
- "IPY_MODEL_94756148d2e94a93ae233baba20af683",
- "IPY_MODEL_ba18cded436e486da34882d821d8f1eb",
- "IPY_MODEL_99898e6ee64a46bd832af112e79b58b7"
- ],
- "layout": "IPY_MODEL_79184c8c2a6f4b7493bb7f6983f18a09"
- }
- },
- "ea95ffd922c0455d957120f034e541f8": {
- "model_module": "@jupyter-widgets/base",
- "model_module_version": "1.2.0",
- "model_name": "LayoutModel",
- "state": {
- "_model_module": "@jupyter-widgets/base",
- "_model_module_version": "1.2.0",
- "_model_name": "LayoutModel",
- "_view_count": null,
- "_view_module": "@jupyter-widgets/base",
- "_view_module_version": "1.2.0",
- "_view_name": "LayoutView",
- "align_content": null,
- "align_items": null,
- "align_self": null,
- "border": null,
- "bottom": null,
- "display": null,
- "flex": null,
- "flex_flow": null,
- "grid_area": null,
- "grid_auto_columns": null,
- "grid_auto_flow": null,
- "grid_auto_rows": null,
- "grid_column": null,
- "grid_gap": null,
- "grid_row": null,
- "grid_template_areas": null,
- "grid_template_columns": null,
- "grid_template_rows": null,
- "height": null,
- "justify_content": null,
- "justify_items": null,
- "left": null,
- "margin": null,
- "max_height": null,
- "max_width": null,
- "min_height": null,
- "min_width": null,
- "object_fit": null,
- "object_position": null,
- "order": null,
- "overflow": null,
- "overflow_x": null,
- "overflow_y": null,
- "padding": null,
- "right": null,
- "top": null,
- "visibility": null,
- "width": null
- }
- }
- }
+ "version": "3.10.18"
}
},
"nbformat": 4,