@seanzhangkx8
Collaborator

Conversational Dynamics Similarity

This PR adds ConvoSimilarity, a new module implementing the framework from the paper A Similarity Measure for Comparing Conversational Dynamics. It provides tools for comparing conversations based on their interaction patterns rather than individual utterances.

Key Features

  • SCDWriter for generating Summaries of Conversation Dynamics (SCD) and extracting Sequences of Patterns (SoP) via LLMs
  • ConDynS and NaiveConDynS measures for similarity computation
  • Baseline methods and utilities for preprocessing, evaluation, and visualization

Example Notebooks


GenAI Module

This PR also introduces GenAI, a unified interface for integrating LLMs into ConvoKit workflows for conversational analysis.

Key Features

  • Abstract LLMClient base class with implementations for OpenAI, Gemini, and a template for local models
  • Centralized configuration via GenAIConfigManager for API keys and settings
  • Factory method for flexible and consistent client instantiation
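As a rough illustration of the base-class-plus-factory pattern described above, here is a minimal sketch. It is not the code in this PR: the `generate` method, the `EchoClient` stand-in provider, the `_PROVIDERS` registry, and the `get_client` function are all hypothetical names chosen for the example, and the actual GenAI signatures may differ.

```python
from abc import ABC, abstractmethod
from typing import Callable, Dict


class LLMClient(ABC):
    """Hypothetical base class; the real ConvoKit class may differ."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's text response for a prompt."""


class EchoClient(LLMClient):
    """Stand-in provider that needs no API key (illustration only)."""

    def __init__(self, prefix: str = "echo: "):
        self.prefix = prefix

    def generate(self, prompt: str) -> str:
        return self.prefix + prompt


# Registry mapping provider names to client constructors (assumed design).
_PROVIDERS: Dict[str, Callable[..., LLMClient]] = {"echo": EchoClient}


def get_client(provider: str, **kwargs) -> LLMClient:
    """Factory: look up a provider by name and build its client."""
    try:
        factory = _PROVIDERS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return factory(**kwargs)


client = get_client("echo", prefix="> ")
print(client.generate("hello"))
```

Callers depend only on the abstract interface, so swapping providers means changing the name passed to the factory rather than the calling code.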

Example Notebook

@cristiandnm
Contributor

@seanzhangkx8 can you add the option to use SCDs that are already in the metadata? The input that currently takes the function for getting the SCD should also accept a string; if it is a string, it is the name of the metadata field containing the SCD to use.
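One possible shape for this, as a sketch only: the `resolve_scd` helper and the plain-dict conversation with a `"meta"` mapping are assumptions made for the example, not ConvoKit's actual Conversation class or the PR's parameter names.

```python
from typing import Callable, Dict, Union

# Stand-in for a conversation: a dict with a "meta" mapping (assumption).
Convo = Dict[str, dict]
SCDSource = Union[str, Callable[[Convo], str]]


def resolve_scd(convo: Convo, source: SCDSource) -> str:
    """Return the SCD for a conversation.

    If `source` is a string, treat it as the name of a metadata field
    that already holds the SCD; otherwise call it to compute the SCD.
    """
    if isinstance(source, str):
        try:
            return convo["meta"][source]
        except KeyError:
            raise KeyError(f"metadata field {source!r} not found") from None
    if callable(source):
        return source(convo)
    raise TypeError("source must be a metadata field name or a callable")


convo = {"meta": {"scd": "Speaker A escalates; Speaker B de-escalates."}}

# Reuse an SCD stored in metadata.
stored = resolve_scd(convo, "scd")

# Or compute one on the fly with a function.
computed = resolve_scd(convo, lambda c: "computed summary")
```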

Collaborator

@vianxnguyen vianxnguyen left a comment

Hi @seanzhangkx8, really great contributions to ConvoKit, thanks for putting this out! Just checking whether there are any changes still pending on the PR (it's currently marked as a draft). I added some comments based on the current version below:

  • There are merge conflicts in convokit/__init__.py and docs/source/analysis.rst that need to be resolved before merging. The branch also looks a bit out of date, so it may be worth rebasing onto the latest master.
  • For documentation structure, all the SCD and ConDynS transformers are currently documented under ConDynS.rst. Since SCD could be useful independently of ConDynS, it might make sense to split this into two separate pages: one for SCD and another for ConDynS that references the SCD page.
  • For naive baselines, I think they are really helpful for initial experimentation, but has there been any discussion on whether we want to include them in ConvoKit from a practical perspective?
  • For GenAI output handling, it could be useful to let users specify a custom function for converting raw LLM text output into a structured format (e.g., JSON, list), depending on their downstream use case.
  • For local GenAI integration, the local client seems to be more of a placeholder/mock template right now; including a simple working example of integrating a local model could be really helpful.
  • For GenAI documentation, it could be helpful to link to specific setup instructions from the providers (OpenAI/Gemini).
  • Currently, LLMPromptTransformer seems to operate on a single "unit" (i.e., one level of the corpus: conversation, speaker, or utterance). I'm wondering whether it would be helpful to have an option for different or multiple units in the same prompt (e.g., prompting an utterance with respect to its conversation) or multiple subunits of the same unit (e.g., prompting different parts of the same conversation).
  • For GenAI error retry logic, it seems to raise an exception when retries are exhausted. If users are running this in a loop over some unit, it might be good to have a marker indicating where they left off, so that when they rerun transform they can resume from the point where the retries ran out.
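To make the output-handling suggestion concrete, here is a minimal sketch of an optional parser hook. Everything in it is hypothetical: `run_prompt`, `parse_bullets`, and the `fake_llm` stand-in are illustrative names, not part of the GenAI module or LLMPromptTransformer.

```python
import json
from typing import Any, Callable, List, Optional


def run_prompt(
    generate: Callable[[str], str],
    prompt: str,
    parser: Optional[Callable[[str], Any]] = None,
) -> Any:
    """Call the model, then apply a user-supplied parser if one is given."""
    raw = generate(prompt)
    return raw if parser is None else parser(raw)


def parse_bullets(raw: str) -> List[str]:
    """Turn '- item' lines into a list of strings, skipping blank lines."""
    return [line.lstrip("-• ").strip() for line in raw.splitlines() if line.strip()]


def fake_llm(prompt: str) -> str:
    """Stand-in for a real client call; always returns the same JSON text."""
    return '{"label": "polite"}'


as_text = run_prompt(fake_llm, "Classify this utterance.")
as_dict = run_prompt(fake_llm, "Classify this utterance.", parser=json.loads)
as_list = run_prompt(lambda p: "- a\n- b\n", "List two items.", parser=parse_bullets)
```

With no parser, the raw text passes through unchanged, so existing callers would not need to change.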

Let me know if anything is still pending, happy to take another look if that would be helpful. Thanks!
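For the "where they left off" point in the last bullet, one simple option is to keep completed results keyed by unit id and skip those ids on a rerun. This is a sketch under that assumption; `transform_resumable`, `RetriesExhausted`, and `flaky_annotate` are hypothetical names, not the module's API.

```python
from typing import Callable, Dict, Iterable, List


class RetriesExhausted(Exception):
    """Raised by the annotator when all retries for one item have failed."""


def transform_resumable(
    ids: Iterable[str],
    annotate: Callable[[str], str],
    done: Dict[str, str],
) -> Dict[str, str]:
    """Annotate each id, recording results in `done` as a resume marker.

    Ids already present in `done` are skipped, so a rerun after a failure
    picks up at the first unfinished item.
    """
    for item_id in ids:
        if item_id in done:
            continue
        done[item_id] = annotate(item_id)
    return done


calls: List[str] = []
fail_once = {"c"}


def flaky_annotate(item_id: str) -> str:
    """Stand-in annotator that fails the first time it sees 'c'."""
    calls.append(item_id)
    if item_id in fail_once:
        fail_once.discard(item_id)
        raise RetriesExhausted(item_id)
    return item_id.upper()


done: Dict[str, str] = {}
units = ["a", "b", "c", "d"]
try:
    transform_resumable(units, flaky_annotate, done)
except RetriesExhausted:
    pass  # "a" and "b" are already recorded in `done`

# The second run skips finished ids and resumes at "c".
transform_resumable(units, flaky_annotate, done)
```

In a corpus setting the same marker could live in each unit's metadata, so that a missing field means "not yet processed".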

@seanzhangkx8 seanzhangkx8 marked this pull request as ready for review October 22, 2025 21:57
@seanzhangkx8
Collaborator Author

Hi Vivian! Thanks for the detailed review.

  1. I resolved the conflicts.
  2. I agree; SCD now has its own documentation page.
  3. I will keep the naive baselines in for now for reproducibility, because I have a notebook comparing ConDynS against the baselines. We can remove them if you think they are unnecessary.
  4. I totally agree with adding that feature! But I think this PR is big enough already, so I will leave the improvement to later development. Here we focus on introducing the modules, and later we can build cool stuff on top of them.
  5. Same as above.
  6. Nice! Added links.
  7. Same as above: I think it would be a nice feature, but maybe a little tricky to develop for multiple levels. We can discuss this later.

Thanks for your review and let me know if you have any other concerns we should address!

Collaborator

@vianxnguyen vianxnguyen left a comment

Hi Sean, thanks for addressing the comments, looks great! Feel free to note if there are any suggested improvements that you would like to defer. Also just approved the PR!

@seanzhangkx8
Collaborator Author

seanzhangkx8 commented Oct 25, 2025

Thanks Vivian! Incrementing the version number and will merge now.

@seanzhangkx8 seanzhangkx8 merged commit 59a68a4 into CornellNLP:master Oct 25, 2025
4 checks passed
3 participants