vibcoded partial fit #679

GrachevArtem · 2025-11-15T17:17:11Z

No description provided.

Copilot

Pull Request Overview

This PR implements incremental training support for TMC Shapley valuation using models with partial_fit() capability. The implementation allows for significant performance improvements (2-10x faster) when using compatible models like SGDClassifier, by updating models incrementally as data points are added sequentially, rather than retraining from scratch for each subset.

Introduces PartialFitModelUtility class that extends ModelUtility with incremental training capabilities
Adds PermutationEvaluationStrategyWithPartialFit to manage partial_fit state across permutations
Implements automatic detection and strategy selection for utilities supporting partial_fit

Reviewed Changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 6 comments.

Show a summary per file

File	Description
src/pydvl/valuation/utility/modelutility.py	Adds PartialFitModelUtility class with incremental training logic and state management
src/pydvl/valuation/samplers/permutation.py	Adds PermutationEvaluationStrategyWithPartialFit strategy and automatic detection in make_strategy()
test_partial_fit_simple.py	Provides unit tests for partial_fit functionality including detection, incremental training, and strategy selection
examples/partial_fit_tmc_example.py	Demonstrates usage comparison between standard and partial_fit utilities with TMC Shapley
PARTIAL_FIT_FEATURE.md	Comprehensive documentation covering implementation details, usage, and performance considerations
QUICK_START_PARTIAL_FIT.md	Quick start guide showing simple migration from ModelUtility to PartialFitModelUtility
IMPLEMENTATION_SUMMARY.md	Summary of implementation changes and testing instructions

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot · 2025-11-19T10:58:13Z

src/pydvl/valuation/utility/modelutility.py

+                    if hasattr(self._current_model, "classes_") or not hasattr(
+                        self._current_model, "partial_fit"
+                    ):


The condition not hasattr(self._current_model, 'partial_fit') is redundant since _can_use_partial_fit() already checks self._supports_partial_fit, and we only reach this code when can_use_partial is True. This check will never be False in this context. Consider simplifying to just check for classes_ attribute.

Suggested change

if hasattr(self._current_model, "classes_") or not hasattr(

self._current_model, "partial_fit"

):

if hasattr(self._current_model, "classes_"):

Copilot · 2025-11-19T10:58:14Z

src/pydvl/valuation/utility/modelutility.py

+        if self.training_data is None:
+            raise ValueError("No training data provided")
+
+        new_indices = set(sample.subset) - self._current_indices


Converting to a sorted list and then to an array may produce non-deterministic ordering if the training is run in parallel with different workers processing the same permutations. While the set difference is correct, the sorted order depends on the hash values of integers which are deterministic. However, consider if this ordering matters for reproducibility across different Python versions or implementations.

Suggested change

new_indices = set(sample.subset) - self._current_indices

new_indices = set(sample.subset) - self._current_indices

# Sort indices to ensure deterministic ordering across runs and Python versions

Copilot · 2025-11-19T10:58:14Z

src/pydvl/valuation/samplers/permutation.py

+
+            self.truncation.reset(self.utility)
+            truncated = False
+            curr = prev = float(self.utility(None))


This line is duplicated from the parent class PermutationEvaluationStrategy.process() at line 295. The entire process method is nearly identical except for the added reset_partial_fit_state() call. Consider refactoring to call the parent's process method with a hook for resetting state, or extract the common logic to reduce code duplication.

Copilot · 2025-11-19T10:58:14Z

test_partial_fit_simple.py

+    assert utility._current_model is None
+    assert len(utility._current_indices) == 0


These assertions test private attributes (_current_model and _current_indices), which couples the tests to implementation details. Consider testing the observable behavior instead, such as verifying that calling the utility with a new sample after reset starts fresh training rather than using partial_fit.

Copilot · 2025-11-19T10:58:14Z

examples/partial_fit_tmc_example.py

+        show_warnings=False,
+        progress=False,
+    )
+    valuation_standard.fit(train_data)


[nitpick] The example doesn't demonstrate timing comparisons which would better showcase the performance benefits mentioned in the PR description. Consider adding timing measurements to show the actual speedup achieved with partial_fit.

Copilot · 2025-11-19T10:58:15Z

test_partial_fit_simple.py

+    X, y = make_classification(n_samples=30, n_features=5, random_state=42)
+    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)
+
+    train_data = Dataset.from_arrays(X_train, y_train)


Variable train_data is not used.

Suggested change

train_data = Dataset.from_arrays(X_train, y_train)

GrachevArtem · 2025-11-20T11:52:14Z

Hi
Sorry we supposed to do this PR originaly in our private fork and then put it here. It is not ready yet and I forgot to delete.
Are you interested in something like this?

vibcoded partial fit

7f28530

mdbenito requested a review from Copilot November 19, 2025 10:55

Copilot started reviewing on behalf of mdbenito November 19, 2025 10:55 View session

Copilot finished reviewing on behalf of mdbenito November 19, 2025 10:56

Copilot AI reviewed Nov 19, 2025

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

vibcoded partial fit #679

vibcoded partial fit #679

Uh oh!

GrachevArtem commented Nov 15, 2025

Uh oh!

Copilot AI left a comment

Uh oh!

Copilot AI Nov 19, 2025

Uh oh!

Copilot AI Nov 19, 2025

Uh oh!

Copilot AI Nov 19, 2025

Uh oh!

Copilot AI Nov 19, 2025

Uh oh!

Copilot AI Nov 19, 2025

Uh oh!

Copilot AI Nov 19, 2025

Uh oh!

GrachevArtem commented Nov 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

	new_indices = set(sample.subset) - self._current_indices
	new_indices = set(sample.subset) - self._current_indices
	# Sort indices to ensure deterministic ordering across runs and Python versions

		assert utility._current_model is None
		assert len(utility._current_indices) == 0

vibcoded partial fit #679

Are you sure you want to change the base?

vibcoded partial fit #679

Uh oh!

Conversation

GrachevArtem commented Nov 15, 2025

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull Request Overview

Reviewed Changes

Uh oh!

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

Copilot AI Nov 19, 2025

Choose a reason for hiding this comment

Uh oh!

GrachevArtem commented Nov 20, 2025

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant