Conversational Dynamics Similarity (ConDynS) and ConvoKit GenAI Tool #288
…ine methods, as well as validation example notebooks
@seanzhangkx8 can you add the option to use SCDs that are already in the metadata? That is, where the input is currently a function that gets the SCD, you should also accept a string; if a string is passed, it is the name of the metadata field containing the SCD to use.
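A minimal sketch of the requested "function or metadata name" input, assuming a conversation object exposes a `meta` dict (as ConvoKit's `Conversation.meta` does). The helper `resolve_scd` and the stand-in `Convo` class are hypothetical illustrations, not part of this PR's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class Convo:
    # Hypothetical stand-in for a ConvoKit Conversation; only `meta` is used here.
    id: str
    meta: dict = field(default_factory=dict)


# Either a function that produces the SCD, or the name of a metadata field holding it.
ScdSource = Union[str, Callable[[Convo], str]]


def resolve_scd(convo: Convo, scd_source: ScdSource) -> str:
    """Return the SCD for `convo` from a metadata key or a getter function (hypothetical helper)."""
    if isinstance(scd_source, str):
        # A string names the metadata field that already contains a precomputed SCD.
        if scd_source not in convo.meta:
            raise KeyError(f"conversation {convo.id!r} has no metadata field {scd_source!r}")
        return convo.meta[scd_source]
    if callable(scd_source):
        # Otherwise keep the existing behavior: call the function to obtain the SCD.
        return scd_source(convo)
    raise TypeError("scd_source must be a str or a callable")
```

Checking for `str` before `callable` keeps the two cases unambiguous, and a missing field fails loudly instead of silently regenerating the SCD.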
Hi @seanzhangkx8, really great contributions to ConvoKit, thanks for putting this out! Just checking whether there are any changes still pending on the PR (it's currently marked as a draft). I added some comments based on the current version below:
- Merge conflicts in convokit/__init__.py and docs/source/analysis.rst need to be resolved before merging. The branch also looks a bit out of date, so it may be worth rebasing onto the latest master.
- For documentation structure: currently, all the SCD and ConDynS transformers are documented under ConDynS.rst. Since SCD could be useful independently of ConDynS, it might make sense to split this into two separate pages: one for SCD, and another for ConDynS that references the SCD page.
- For naive baselines: I think they are really helpful for initial experimentation, but has there been any discussion on whether we want to include them in ConvoKit from a practical perspective?
- For GenAI output handling: it could be useful to let users specify a custom function for converting raw LLM text output into a structured format (e.g., JSON, a list, etc.), depending on their downstream use case.
- For local GenAI integration: the local client currently seems to be more of a placeholder/mock template; including a simple working example of integrating a local model could be really helpful.
- For GenAI documentation: it could be helpful to link to the providers' specific setup instructions (OpenAI/Gemini).
- Currently, LLMPromptTransformer seems to operate on a single "unit" (e.g., one level of the corpus: conversation, speaker, or utterance). I'm wondering if it would be helpful to have an option to include different or multiple units in the same prompt (e.g., prompting about an utterance with respect to its conversation), or multiple subunits of the same unit (e.g., prompting about different parts of the same conversation).
- For GenAI error retry logic: it seems to raise an exception when retries are exhausted. If users are running this in a loop over some unit, it might be good to have a marker or some other way to indicate where they left off, so that when they rerun transform they can resume from where the retries ran out.
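Two of the suggestions above, a user-supplied parser for raw LLM output and a way to resume after retries are exhausted, could fit together roughly as follows. This is a sketch under assumptions: `run_prompts`, `call_llm`, `parse`, `done`, and `RetriesExhausted` are hypothetical names, not part of `LLMPromptTransformer`, and any callable mapping a prompt string to raw text can stand in for a real client.

```python
import json
import time
from typing import Any, Callable, Dict, Iterable, Optional, Tuple


class RetriesExhausted(Exception):
    """Raised when a unit still fails after all retries; records where to resume."""

    def __init__(self, unit_id: str, cause: Exception):
        super().__init__(f"retries exhausted at unit {unit_id!r}: {cause}")
        self.unit_id = unit_id


def run_prompts(
    units: Iterable[Tuple[str, str]],           # (unit_id, prompt) pairs
    call_llm: Callable[[str], str],             # hypothetical: prompt -> raw LLM text
    parse: Callable[[str], Any] = json.loads,   # user-supplied raw text -> structured output
    done: Optional[Dict[str, Any]] = None,      # checkpoint of results from earlier runs
    max_retries: int = 3,
    backoff_s: float = 0.0,
) -> Dict[str, Any]:
    """Prompt each unit, parse the raw reply, and checkpoint results in `done`."""
    done = {} if done is None else done
    for unit_id, prompt in units:
        if unit_id in done:
            continue  # finished in an earlier run: skip it when resuming
        for attempt in range(max_retries):
            try:
                # Both LLM errors and parse errors count as a failed attempt.
                done[unit_id] = parse(call_llm(prompt))
                break
            except Exception as exc:
                if attempt == max_retries - 1:
                    # `done` still holds every finished unit, so rerunning with it resumes here.
                    raise RetriesExhausted(unit_id, exc) from exc
                time.sleep(backoff_s * (2 ** attempt))  # exponential backoff between attempts
    return done
```

Passing your own dict as `done` is what makes resuming work: if `RetriesExhausted` is raised, that dict still holds the completed units, and calling `run_prompts` again with it skips them. Swapping `parse` (e.g., `lambda s: s.split(",")`) changes the output format without touching the retry loop.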
Let me know if anything is still pending; happy to take another look if that would be helpful. Thanks!
Hi Vivian! Thanks for the detailed review.
Let me know if you have any other concerns we should address!
Hi Sean, thanks for addressing the comments; looks great! Feel free to note any suggested improvements that you would like to defer. I also just approved the PR!
Thanks Vivian! Incrementing the version number, and will merge now.
Conversational Dynamics Similarity
This PR adds ConvoSimilarity, a new module implementing the framework from the paper "A Similarity Measure for Comparing Conversational Dynamics". It provides tools for comparing conversations based on their interaction patterns rather than their individual utterances.
Key Features
- SCDWriter for generating Summaries of Conversation Dynamics (SCD) and extracting Sequences of Patterns (SoP) via LLMs
- ConDynS and NaiveConDynS measures for similarity computation
- Example Notebooks
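To make "comparing conversations by interaction patterns" concrete, here is a deliberately simple toy score that treats each conversation's Sequence of Patterns as a list of short pattern labels and measures their in-order overlap with Python's `difflib.SequenceMatcher`. This is an illustrative stand-in chosen for this sketch, not the ConDynS or NaiveConDynS computation from the paper; the function name and pattern labels are invented.

```python
from difflib import SequenceMatcher
from typing import Sequence


def toy_sop_similarity(sop_a: Sequence[str], sop_b: Sequence[str]) -> float:
    """Toy order-aware similarity between two pattern sequences, in [0, 1].

    Illustrative only: this is NOT how ConDynS or NaiveConDynS score conversations.
    """

    def ratio(x: Sequence[str], y: Sequence[str]) -> float:
        # ratio() = 2 * M / (len(x) + len(y)), where M counts elements in matching blocks.
        return SequenceMatcher(None, list(x), list(y), autojunk=False).ratio()

    # SequenceMatcher.ratio() can depend on argument order, so average both directions.
    return 0.5 * (ratio(sop_a, sop_b) + ratio(sop_b, sop_a))


a = ["question", "answer", "disagreement", "concession"]
b = ["question", "answer", "disagreement", "escalation"]
score = toy_sop_similarity(a, b)  # 3 shared patterns in order: 2 * 3 / 8 = 0.75
```

A purely lexical overlap like this only rewards identical labels, which is one reason a measure over free-text summaries of dynamics needs something richer than string matching.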
GenAI Module
This PR also introduces GenAI, a unified interface for integrating LLMs into ConvoKit workflows for conversational analysis.
Key Features
- LLMClient base class with implementations for OpenAI, Gemini, and a template for local models
- GenAIConfigManager for API keys and settings
- Example Notebook
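As a rough illustration of the client-plus-config pattern listed above, the sketch below defines an abstract client, an offline local stand-in, and an API-key lookup that falls back to an environment variable. The names `BaseLLMClient`, `EchoLocalClient`, `generate`, `get_api_key`, and the `<PROVIDER>_API_KEY` convention are hypothetical; they are not the actual signatures of ConvoKit's `LLMClient` or `GenAIConfigManager`.

```python
import os
from abc import ABC, abstractmethod
from typing import Optional


class BaseLLMClient(ABC):
    """Hypothetical provider-agnostic interface: one prompt in, raw text out."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        ...


class EchoLocalClient(BaseLLMClient):
    """Hypothetical local-model stand-in; a real one would call a locally hosted model here."""

    def __init__(self, model_name: str = "local-test"):
        self.model_name = model_name

    def generate(self, prompt: str) -> str:
        # Deterministic placeholder output so pipelines can be tested offline.
        return f"[{self.model_name}] {prompt}"


def get_api_key(provider: str, explicit: Optional[str] = None) -> str:
    """Prefer an explicitly passed key; otherwise read <PROVIDER>_API_KEY from the environment."""
    if explicit:
        return explicit
    var = f"{provider.upper()}_API_KEY"
    key = os.environ.get(var)
    if not key:
        raise RuntimeError(f"no API key given and {var} is not set")
    return key
```

Because the abstract method cannot be skipped, every concrete client (hosted or local) is forced to expose the same `generate` call, so downstream transformers never need to know which provider is behind it.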