Watts-Lab · amytangzheng · Sep 23, 2024
diff --git a/.gitignore b/.gitignore
@@ -41,6 +41,7 @@ src/team_comm_tools/ipython_notebooks/.ipynb_checkpoints/
 tests/ipython_notebooks/.ipynb_checkpoints/
 tests/data/vector_data/
 tests/test.log
+tests/helper.ipynb
 tests/output/*
 tests/vector_data/*
 src/utils/__pycache__/

diff --git a/docs/build/doctrees/environment.pickle b/docs/build/doctrees/environment.pickle
diff --git a/docs/build/doctrees/examples.doctree b/docs/build/doctrees/examples.doctree
diff --git a/docs/build/doctrees/feature_builder.doctree b/docs/build/doctrees/feature_builder.doctree
diff --git a/docs/build/doctrees/features/index.doctree b/docs/build/doctrees/features/index.doctree
diff --git a/docs/build/doctrees/features/lexical_features_v2.doctree b/docs/build/doctrees/features/lexical_features_v2.doctree
diff --git a/docs/build/doctrees/features/readability.doctree b/docs/build/doctrees/features/readability.doctree
diff --git a/docs/build/doctrees/features/temporal_features.doctree b/docs/build/doctrees/features/temporal_features.doctree
diff --git a/docs/build/doctrees/features/word_mimicry.doctree b/docs/build/doctrees/features/word_mimicry.doctree
diff --git a/docs/build/doctrees/features_conceptual/content_word_accommodation.doctree b/docs/build/doctrees/features_conceptual/content_word_accommodation.doctree
diff --git a/docs/build/doctrees/index.doctree b/docs/build/doctrees/index.doctree
diff --git a/docs/build/doctrees/utils/calculate_chat_level_features.doctree b/docs/build/doctrees/utils/calculate_chat_level_features.doctree
diff --git a/docs/build/doctrees/utils/calculate_conversation_level_features.doctree b/docs/build/doctrees/utils/calculate_conversation_level_features.doctree
diff --git a/docs/build/doctrees/utils/calculate_user_level_features.doctree b/docs/build/doctrees/utils/calculate_user_level_features.doctree
diff --git a/docs/build/doctrees/utils/check_embeddings.doctree b/docs/build/doctrees/utils/check_embeddings.doctree
diff --git a/docs/build/doctrees/utils/preprocess.doctree b/docs/build/doctrees/utils/preprocess.doctree
diff --git a/docs/build/doctrees/utils/summarize_features.doctree b/docs/build/doctrees/utils/summarize_features.doctree
diff --git a/docs/build/html/_sources/examples.rst.txt b/docs/build/html/_sources/examples.rst.txt
diff --git a/docs/build/html/_sources/features/index.rst.txt b/docs/build/html/_sources/features/index.rst.txt
@@ -32,7 +32,11 @@ Utterance-Level features are calculated *first* in the Toolkit, as many conversa
 
 Conversation-Level Features
 ****************************
-Once utterance-level features are computed, we compute conversation-level features; some of these features represent an aggregation of utterance-level information (for example, the "average level of positivity" in a conversation is simply the mean positivity score for each utterance). Other conversation-level features are constructs that are defined only at the conversation-level, such as the level of "burstiness" in a team's communication patterns.
+
+Base Conversation-Level Features
++++++++++++++++++++++++++++++++++++
+
+The following features are constructs that are defined only at the conversation-level, such as the level of "burstiness" in a team's communication patterns. We call these the "base" conversation-level features, and they can be accessed using a property of the ``FeatureBuilder`` object: ``FeatureBuilder.conv_features_base``.
 
 .. toctree::
    :maxdepth: 1
@@ -46,12 +50,17 @@ Once utterance-level features are computed, we compute conversation-level featur
    within_person_discursive_range
    turn_taking_features
 
+Conversation-Level Aggregates
++++++++++++++++++++++++++++++++++++
+Once utterance-level features are computed, we compute conversation-level features; some of these features represent an aggregation of utterance-level information (for example, the "average level of positivity" in a conversation is simply the mean of the positivity scores of its utterances).
+
+By default, all numeric attributes generated at the utterance (chat) level are aggregated using the functions ``mean``, ``max``, ``min``, and ``stdev``. However, this behavior can be customized, with details in the Worked Example (see :ref:`custom_aggregation`).
+
 Speaker- (User) Level Features
 *********************************
 User-level features generally represent an aggregation of features at the utterance- level (for example, the average number of words spoken *by a particular user*). There is therefore limited speaker-level feature documentation, other than a function used to compute the "network" of other speakers that an individual interacts with in a conversation.
 
-You may reference the :ref:`Speaker (User)-Level Features Page <user_level_features>` for more information.
-
+You may reference the :ref:`Speaker (User)-Level Features Page <user_level_features>` for more information, as well as the details in the Worked Example (see :ref:`custom_aggregation`).
 
 .. toctree::
    :maxdepth: 1
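
The default aggregation described in this hunk (``mean``, ``max``, ``min``, and ``stdev`` over every numeric utterance-level attribute) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the toolkit's implementation: the row layout, the ``conversation_num`` key, the output column names, and the use of the sample standard deviation are all hypothetical choices made for the example.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical utterance-level rows; the keys are illustrative only.
utterances = [
    {"conversation_num": "a", "positivity": 0.2, "num_words": 5},
    {"conversation_num": "a", "positivity": 0.4, "num_words": 7},
    {"conversation_num": "a", "positivity": 0.6, "num_words": 9},
    {"conversation_num": "b", "positivity": 0.1, "num_words": 2},
    {"conversation_num": "b", "positivity": 0.3, "num_words": 4},
]


def aggregate_conversations(rows, group_key="conversation_num"):
    """Aggregate every numeric attribute per conversation.

    Assumption: the sample standard deviation is used, and a
    conversation with a single utterance gets a stdev of 0.0.
    """
    # conversation id -> attribute name -> list of values
    grouped = defaultdict(lambda: defaultdict(list))
    for row in rows:
        for name, value in row.items():
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if name != group_key and is_number:
                grouped[row[group_key]][name].append(value)

    result = {}
    for conversation, columns in grouped.items():
        features = {}
        for name, values in columns.items():
            features[f"mean_{name}"] = mean(values)
            features[f"max_{name}"] = max(values)
            features[f"min_{name}"] = min(values)
            features[f"stdev_{name}"] = stdev(values) if len(values) > 1 else 0.0
        result[conversation] = features
    return result


conv_features = aggregate_conversations(utterances)
```

Each conversation ends up with four columns per numeric attribute, which is why customizing the aggregation functions (see ``custom_aggregation`` in the Worked Example) can noticeably change the width of the output.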

diff --git a/docs/build/html/_sources/features_conceptual/content_word_accommodation.rst.txt b/docs/build/html/_sources/features_conceptual/content_word_accommodation.rst.txt
@@ -13,10 +13,16 @@ Citation
 
 Implementation Basics 
 **********************
-To compute the feature, we count the number of shared content words (defined as anything that is not on the function word list) between the current and previous utterance in a conversation, then normalize it by the frequency of the word across all inputs in the dataset. This follows the original authors' method:
+To compute the feature, we count the number of shared content words (defined as anything that is not on the function word list) between the current and previous utterance in a conversation, normalized by the frequency at which the word appears. This follows the original authors' method:
 
 	Content words are defined as any word that is not a function word. For each content word w in a given speaker’s turn, if w also occurs in the immediately preceding turn of the other, we count w as an accommodated content word. The raw count of accommodated content words is be the total number of these accommodated content words over every turn in the conversation side. Because content words vary widely in frequency, we normalized our counts by the frequency of each word.
 
+For completeness, we interpret "the frequency of each word" in two distinct ways:
+
+1. **The frequency of each word across the entire dataset (`content_word_accommodation`)**: here, we normalize non-function words with respect to the language used across all conversations in the dataset. This version of accommodation is useful if the entire dataset consists of similar conversations, or conversations about the same topic. Normalizing with respect to a larger dataset gives better estimates of which words carry meaningful content in a particular domain, so that they can be weighted appropriately.
+
+2. **The frequency of each word within a given conversation (`content_word_accommodation_per_conv`)**: here, we normalize non-function words with respect only to the language in a given conversation. This version of accommodation is useful if the dataset consists of very distinct conversations, for which it may not make sense to assume that the distribution of which words are "important" will hold across different domains.
+
 The feature requires a reference list of function words, which are defined by the original authors as follows.
 
 **Auxiliary and copular verbs**
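
The two normalizations introduced in this hunk (dataset-wide versus per-conversation word frequency) can be contrasted with a small sketch. Everything here is an assumption made for illustration: the tiny ``FUNCTION_WORDS`` subset, whitespace tokenization, the ``accommodation`` helper, and the choice to weight each shared content word by ``1 / frequency``. The toolkit's actual computation may differ.

```python
from collections import Counter

# Tiny illustrative subset; the real reference list is much longer.
FUNCTION_WORDS = {"the", "a", "is", "to", "of", "and", "we", "it"}


def content_words(text):
    """Lowercase, split on whitespace, and drop function words."""
    return [w for w in text.lower().split() if w not in FUNCTION_WORDS]


def accommodation(conversations, per_conversation=False):
    """Score each utterance by the content words it shares with the previous one.

    Assumption: each shared word contributes 1 / frequency, where frequency is
    counted over the whole dataset or over the utterance's own conversation.
    """
    dataset_freq = Counter(
        w for conv in conversations for utt in conv for w in content_words(utt)
    )
    all_scores = []
    for conv in conversations:
        if per_conversation:
            freq = Counter(w for utt in conv for w in content_words(utt))
        else:
            freq = dataset_freq
        scores = [0.0]  # the first utterance has no previous turn
        for prev, curr in zip(conv, conv[1:]):
            shared = set(content_words(curr)) & set(content_words(prev))
            scores.append(sum(1.0 / freq[w] for w in shared))
        all_scores.append(scores)
    return all_scores


# Hypothetical two-conversation dataset.
conversations = [
    ["we need budget approval", "budget approval is pending"],
    ["the budget is tight", "tight budget indeed"],
]

dataset_scores = accommodation(conversations)
per_conv_scores = accommodation(conversations, per_conversation=True)
```

In this toy dataset, "budget" appears in both conversations, so the dataset-wide version down-weights it more than the per-conversation version does. That is the trade-off the two variants are meant to expose.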

diff --git a/docs/build/html/_sources/index.rst.txt b/docs/build/html/_sources/index.rst.txt
@@ -150,10 +150,10 @@ Use the Table of Contents below to learn more about our tool. We recommend that
 
    intro
    basics
-   feature_builder
+   examples
    features/index
    features_conceptual/index
-   examples
+   feature_builder
    utils/index
 
 Indices and Tables