

@AnnaWegmann

Description

This PR adds the mediasum.rst documentation file and the convert_mediasum-corpus.ipnyb notebook, which contains the script used to convert the MediaSum dataset into a ConvoKit corpus object. The zipped corpus, to be added to your servers, is available here: https://drive.google.com/file/d/1cCaSuVUKN0B3s-GxnWg1gtWwOLNF66n0/view?usp=sharing

Motivation and Context

This adds a new dataset (see the .rst file for details). It is based on
https://aclanthology.org/2021.naacl-main.474.pdf and https://aclanthology.org/2024.emnlp-main.52/

How has this been tested?

See convert_mediasum-corpus.ipnyb for the corpus creation and testing outputs.

Other information

The corpus still needs to be added to your servers: https://drive.google.com/file/d/1cCaSuVUKN0B3s-GxnWg1gtWwOLNF66n0/view?usp=sharing

@cristiandnm added the dataset label ("Use this tag when providing a new dataset for inclusion in ConvoKit.") on Sep 11, 2025.
@seanzhangkx8
Collaborator

Hi Anna, thank you so much for your contribution to ConvoKit. It looks great. I will just add some configuration to support downloading the corpus from ConvoKit directly. After that I will merge the PR into our main branch. Thanks again for your work!
