Dear Jonas,
Thanks again for your great package, and for making it open source! I have encountered an issue with adaptivetesting on our real-world data, in which numerical overflows seem to lead to incorrectly computed ability estimates. I have created a fully reproducible example here, implemented as a unit test for your package: test_realworld.py
To run it yourself:
cd
git clone https://github.com/dfsp-spirit/adaptivetesting.git adaptivetesting-ts
cd adaptivetesting-ts/
git checkout illustrate_issue
git checkout df77121 # Illustrates the broken state.
uv sync --editable .
uv run python -m unittest
If you run it, you will get output like this:
uv run python -m unittest
........Expected percentage correct: 49.2%
Actual percentage correct: 52.0%
.0.0667121752511823
.First 5 items - Expected probabilities for ability=0:
Item 0: a=1.051, b=-0.560, c=0.060, d=0.814 -> P=0.545
Item 1: a=0.994, b=-0.230, c=0.241, d=0.805 -> P=0.555
Item 2: a=0.991, b=1.559, c=0.150, d=0.898 -> P=0.282
Item 3: a=1.274, b=0.070, c=0.129, d=0.817 -> P=0.457
Item 4: a=0.955, b=0.129, c=0.101, d=0.883 -> P=0.468
.......................Item ID: S0811, Correct Answer: diff, User Answer: same. Score: 0
After item #1 with ID S001: estimated ability and standard error: -0.13013013013013008, 0.9785610828814044
Item ID: S049, Correct Answer: diff, User Answer: same. Score: 0
After item #2 with ID S003: estimated ability and standard error: -0.3303303303303302, 0.3249376308979352
Item ID: S007, Correct Answer: diff, User Answer: same. Score: 0
After item #3 with ID S005: estimated ability and standard error: -0.39039039039039025, 0.26918390077873305
Item ID: S075, Correct Answer: diff, User Answer: same. Score: 0
After item #4 with ID S007: estimated ability and standard error: -0.4904904904904903, 0.21795263455719996
Item ID: S003, Correct Answer: diff, User Answer: same. Score: 0
/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/math/estimators/__functions/__estimators.py:28: RuntimeWarning: overflow encountered in exp
value = c + (d - c) * (np.exp(a * (mu - b))) / \
/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/math/estimators/__functions/__estimators.py:29: RuntimeWarning: overflow encountered in exp
(1 + np.exp(a * (mu - b)))
/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/math/estimators/__functions/__estimators.py:28: RuntimeWarning: invalid value encountered in divide
value = c + (d - c) * (np.exp(a * (mu - b))) / \
/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/math/estimators/__functions/__estimators.py:28: RuntimeWarning: invalid value encountered in scalar divide
value = c + (d - c) * (np.exp(a * (mu - b))) / \
After item #5 with ID S009: estimated ability and standard error: 9.77977977977978, nan
Item ID: S192, Correct Answer: same, User Answer: same. Score: 1
After item #6 with ID S013: estimated ability and standard error: 9.77977977977978, nan
Item ID: S0908, Correct Answer: same, User Answer: same. Score: 1
After item #7 with ID S015: estimated ability and standard error: 9.77977977977978, nan
Item ID: S0712, Correct Answer: same, User Answer: same. Score: 1
After item #8 with ID S017: estimated ability and standard error: 9.77977977977978, nan
// many more lines omitted here
After item #137 with ID S1410: estimated ability and standard error: 9.77977977977978, nan
E......................
======================================================================
ERROR: test_our_issue (adaptivetesting.tests.test_realworld.TestRealWorld.test_our_issue)
----------------------------------------------------------------------
Traceback (most recent call last):
File "/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/tests/test_realworld.py", line 71, in test_our_issue
adaptive_test.run_test_once()
File "/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/implementations/__test_assembler.py", line 231, in run_test_once
return super().run_test_once()
^^^^^^^^^^^^^^^^^^^^^^^
File "/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/models/__adaptive_test.py", line 150, in run_test_once
item = self.get_next_item()
^^^^^^^^^^^^^^^^^^^^
File "/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/implementations/__test_assembler.py", line 157, in get_next_item
item = self.__item_selector(
^^^^^^^^^^^^^^^^^^^^^
File "/home/ts/develop_mpiae/adaptivetesting_myfork/adaptivetesting/math/item_selection/__maximum_information_criterion.py", line 53, in maximum_information_criterion
raise ItemSelectionException("No appropriate item could be selected.")
adaptivetesting.models.__item_selection_exception.ItemSelectionException: No appropriate item could be selected.
----------------------------------------------------------------------
Ran 56 tests in 6.019s
FAILED (errors=1)
Note the RuntimeWarnings: from item #5 onward the estimated ability is always 9.77977977977978 and the SE is NaN, both stay that way until the end of the test, and the test finally fails with an ItemSelectionException.
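A possible reading of the stuck value, offered as a guess rather than something I verified in the code: all printed estimates are points of np.linspace(-10, 10, 1000) (for example -0.13013... = -130/999 and 9.77977... = 9770/999), which suggests the ability is found by a grid search. If that search uses np.argmax, a NaN anywhere in the likelihood wins, because np.argmax returns the index of the first NaN. This sketch uses a made-up log-likelihood; the grid and the position of the NaNs are assumptions chosen to reproduce the printed value.

```python
import numpy as np

# Assumed ability grid: the printed estimates are all points of this grid.
grid = np.linspace(-10, 10, 1000)

# Made-up log-likelihood with a clear maximum near -0.5 ...
log_lik = -(grid + 0.5) ** 2
# ... that turns into NaN from grid index 988 on, standing in for an overflowing exp.
log_lik[988:] = np.nan

naive = grid[np.argmax(log_lik)]      # np.argmax returns the index of the first NaN
robust = grid[np.nanargmax(log_lik)]  # skips NaNs, but would only mask the real bug
print(naive, robust)
```

grid[988] equals 9770/999, the 9.77977977977978 from the log, so this mechanism could explain why the estimate never moves once a NaN appears. Switching to np.nanargmax would hide the symptom without fixing the overflow itself.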
The second commit on the same branch fixes this by using numerically more stable math, which avoids the runtime warnings and thus the wrong ability estimates (commands continued from above):
git checkout dfb1fb43f595c3fc9b258f7b5b7d5abb955ab251
uv run python -m unittest
This shows the expected behavior:
uv run python -m unittest
........Expected percentage correct: 49.2%
Actual percentage correct: 52.0%
.0.06671217525215464
.First 5 items - Expected probabilities for ability=0:
Item 0: a=1.051, b=-0.560, c=0.060, d=0.814 -> P=0.545
Item 1: a=0.994, b=-0.230, c=0.241, d=0.805 -> P=0.555
Item 2: a=0.991, b=1.559, c=0.150, d=0.898 -> P=0.282
Item 3: a=1.274, b=0.070, c=0.129, d=0.817 -> P=0.457
Item 4: a=0.955, b=0.129, c=0.101, d=0.883 -> P=0.468
.......................Item ID: S0811, Correct Answer: diff, User Answer: same. Score: 0
After item #1 with ID S001: estimated ability and standard error: -0.13013013013013008, 0.9785610828814044
Item ID: S049, Correct Answer: diff, User Answer: same. Score: 0
After item #2 with ID S003: estimated ability and standard error: -0.3303303303303302, 0.3249376308979352
Item ID: S007, Correct Answer: diff, User Answer: same. Score: 0
After item #3 with ID S005: estimated ability and standard error: -0.39039039039039025, 0.26918390077873305
Item ID: S075, Correct Answer: diff, User Answer: same. Score: 0
After item #4 with ID S007: estimated ability and standard error: -0.4904904904904903, 0.21795263455719996
Item ID: S003, Correct Answer: diff, User Answer: same. Score: 0
After item #5 with ID S009: estimated ability and standard error: -0.5705705705705704, 0.1670767407810938
Item ID: S081, Correct Answer: diff, User Answer: same. Score: 0
After item #6 with ID S013: estimated ability and standard error: -0.6106106106106104, 0.22674237978014508
Item ID: S151, Correct Answer: same, User Answer: same. Score: 1
After item #7 with ID S015: estimated ability and standard error: -0.5705705705705704, 0.0647880110496236
Item ID: S065, Correct Answer: diff, User Answer: same. Score: 0
After item #8 with ID S017: estimated ability and standard error: -0.5705705705705704, 0.06046497163058291
Item ID: S083, Correct Answer: diff, User Answer: same. Score: 0
After item #9 with ID S019: estimated ability and standard error: -0.5705705705705704, 0.05724104458744014
Item ID: S023, Correct Answer: diff, User Answer: same. Score: 0
After item #10 with ID S021: estimated ability and standard error: -0.6706706706706704, 0.03326211160134737
Item ID: S0406, Correct Answer: diff, User Answer: same. Score: 0
// Many lines omitted here
Item ID: S1401, Correct Answer: same, User Answer: same. Score: 1
After item #138 with ID S1411: estimated ability and standard error: -1.3113113113113108, 0.017204674401946778
.......................
----------------------------------------------------------------------
Ran 56 tests in 11.572s
OK
I think the problem in item_information_function() is that when p_y1 approaches 0 or 1, the denominator p_y1 * (1 - p_y1) approaches 0, causing division by very small numbers and resulting in overflow. In probability_y1(), np.exp(a * (mu - b)) can overflow when the exponent becomes very large (for float64, above roughly 709). The result is inf, and inf / (1 + inf) evaluates to NaN, which matches the "overflow encountered in exp" and "invalid value encountered in divide" warnings above.
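To illustrate both points, here is a minimal sketch. The function names are hypothetical and not the package's API; the parameters mu, a, b, c, d mirror the expression in the warning. The stable variant evaluates the logistic as exp(-log(1 + exp(-z))) via np.logaddexp, and the information function assumes the standard 4PL Fisher information, a^2 (P - c)^2 (d - P)^2 / ((d - c)^2 P (1 - P)), with P clipped away from 0 and 1. The package's actual formulas and the fix in the second commit may differ.

```python
import numpy as np

def probability_naive(mu, a, b, c, d):
    # Direct 4PL formula, as in the line that raised the RuntimeWarning.
    e = np.exp(a * (mu - b))          # overflows to inf once a * (mu - b) > ~709
    return c + (d - c) * e / (1 + e)  # inf / inf -> nan

def probability_stable(mu, a, b, c, d):
    # Same 4PL model; logistic(z) = exp(-log(1 + exp(-z))), and np.logaddexp
    # computes log(exp(0) + exp(-z)) without overflowing for large |z|.
    z = a * (mu - b)
    return c + (d - c) * np.exp(-np.logaddexp(0.0, -z))

def item_information_stable(mu, a, b, c, d, eps=1e-12):
    # Assumed standard 4PL Fisher information; P is clipped so that the
    # denominator P * (1 - P) can never become 0 (relevant when c = 0 or d = 1).
    p = np.clip(probability_stable(mu, a, b, c, d), eps, 1 - eps)
    return a**2 * (p - c) ** 2 * (d - p) ** 2 / ((d - c) ** 2 * p * (1 - p))

# An artificially large exponent, a * (mu - b) = 1000, just to trigger the overflow.
with np.errstate(over="ignore", invalid="ignore"):
    print(probability_naive(1000.0, 1.0, 0.0, 0.1, 0.9))  # nan
print(probability_stable(1000.0, 1.0, 0.0, 0.1, 0.9))      # approximately 0.9
# Ordinary inputs are unaffected: item 0 from the test output at ability 0.
print(round(float(probability_stable(0.0, 1.051, -0.560, 0.060, 0.814)), 3))  # 0.545
```

For ordinary inputs the stable form gives the same values as the direct formula (the last line reproduces the P=0.545 printed for item 0), so the change only matters in the extreme cases.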
One more note: there may be further numerical-stability issues hiding in other math functions. For our use case, changing item_information_function() and probability_y1() as done in the second commit was sufficient, but other data may expose more of them.
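One way to hunt for such spots, offered only as a suggestion, is to run the math functions (or the whole test suite, e.g. via np.seterr(all="raise") in a setUp method) with NumPy's floating-point warnings promoted to exceptions. A minimal sketch using np.errstate; both helper names are hypothetical, and logistic_naive just stands in for any function under suspicion:

```python
import numpy as np

def logistic_naive(z):
    # Hypothetical stand-in for a suspect function: overflows for large z.
    e = np.exp(z)
    return e / (1 + e)

def find_fp_problems(func, inputs):
    # Hypothetical test helper: call func with overflow, invalid-value and
    # divide-by-zero warnings raised as FloatingPointError, and collect the
    # inputs that trigger them.
    failing = []
    with np.errstate(over="raise", invalid="raise", divide="raise"):
        for x in inputs:
            try:
                func(x)
            except FloatingPointError:
                failing.append(x)
    return failing

print(find_fp_problems(logistic_naive, [-5.0, 0.0, 5.0, 800.0]))
```

This turns silent NaN propagation into a hard failure at the first offending call, which makes it much easier to see which function and which inputs are responsible.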