Fix incorrect MultiLevel construction in outlier_frames.compute_deviations#3247
Open
Add a new test module exercising outlier_frames.compute_deviations. Introduces fixtures to build dense and sparse multi-animal DataFrames, plus mocks for SARIMAX fitting and HDF writing. Adds a regression test ensuring sparse layouts preserve only actual streams (not the Cartesian product of non-'coords' levels) and a behavior-preservation test verifying dense layouts match the old product-based column ordering. Tests also assert output shapes, selectable derived-stat levels, zero-distance behavior with a deterministic fake fitter, expected SARIMAX call counts, and that the 'full' storeoutput path attempts persistence.
Pass the --pytest-test-first argument to the name-tests-test pre-commit hook so the hook runs with pytest's test-first behavior when checking test names. This change only updates .pre-commit-config.yaml.
Replace MultiIndex.from_product with a MultiIndex built from the existing keypoint coordinate combinations (preserving their original order). The change detects the 'coords' level, selects base columns for the 'x' coordinate, appends statistical fields (distance, sig, meanx, etc.), and constructs a MultiIndex.from_tuples with an added 'stats' level. This avoids generating invalid/extra keypoint combinations and keeps column ordering consistent when assembling the deviations DataFrame.
Simplify extraction of base columns by replacing manual level-index lookup and droplevel logic with Dataframe.xs("x", axis=1, level="coords", drop_level=True).columns. Updated comment to note that 'y' could be used interchangeably. This makes the code clearer and reduces explicit MultiIndex handling.
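As a rough illustration of the column construction described in the commits above, here is a minimal sketch on a toy frame. The column layout, the individual and bodypart names, and the three-item stats list are made up for the example, and the keypoint-major ordering of the stats is an assumption; only `xs("x", axis=1, level="coords", drop_level=True)` and `MultiIndex.from_tuples` come from the commit descriptions.

```python
import pandas as pd

# Hypothetical sparse multi-animal layout: the "single" individual only
# carries the unique bodypart "corner", so not every combination exists.
columns = pd.MultiIndex.from_tuples(
    [
        ("scorer", "mouse1", "nose", "x"),
        ("scorer", "mouse1", "nose", "y"),
        ("scorer", "mouse1", "tail", "x"),
        ("scorer", "mouse1", "tail", "y"),
        ("scorer", "single", "corner", "x"),
        ("scorer", "single", "corner", "y"),
    ],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
Dataframe = pd.DataFrame(0.0, index=range(3), columns=columns)

# Select the "x" columns and drop the coords level; "y" would give the
# same base columns, since every stream has both coordinates.
base_columns = Dataframe.xs("x", axis=1, level="coords", drop_level=True).columns

# Illustrative subset of the derived stats named in the commit message.
stats = ["distance", "sig", "meanx"]

# Append a "stats" level to each existing stream tuple, in original order
# (assumed here to be keypoint-major: all stats of one stream, then the next).
new_columns = pd.MultiIndex.from_tuples(
    [(*base, stat) for base in base_columns for stat in stats],
    names=[*base_columns.names, "stats"],
)
```

With this toy layout there are three real streams, so `new_columns` has nine entries rather than the eighteen a product over the level values would generate.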
deruyter92 (Collaborator) approved these changes on Mar 20, 2026 and left a comment:
Good fix!
Will be revisited during the keypoint refactoring, ofc. But good to include this fix before we work on that. (Great that you added tests BTW)
Scope
Fixes `extract_outlier_frames(..., outlieralgorithm="fitting")` for multi-animal projects with sparse column layouts (e.g. projects using `uniquebodyparts`).
The current code was rebuilding a MultiIndex over all unique, non-`coords` level values, which would create invalid `individual × bodypart` combinations that do not actually exist, leading to a shape mismatch when constructing the output dataframe.
This occurs in the `fitting` branch of `extract_outlier_frames`.
Fix
Instead of rebuilding the output columns with `MultiIndex.from_product(...)`, derive the base tuples from the existing dataframe columns by selecting the `"x"` columns and dropping the `coords` level, preserving only the valid keypoint combinations in their original order.
We then append the derived stats to those actual stream tuples, keeping the output columns aligned with `np.concatenate(preds, axis=1)`.
Tests
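Two tests were added:
- A behavior-preservation test for dense layouts, where every non-`coords` combination is valid.
- A regression test for sparse layouts, which now build the `MultiIndex` from the actual stream tuples and no longer trigger a shape mismatch.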
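The shape mismatch described in this PR can be reproduced outside the project with a small sketch. Everything here is illustrative: the toy column layout, the zero-filled `preds` arrays standing in for per-stream fit results, and the shortened stats list are assumptions, not the project's actual fixtures or test code.

```python
import numpy as np
import pandas as pd

stats = ["distance", "sig", "meanx"]  # illustrative subset of derived stats
names = ["scorer", "individuals", "bodyparts", "coords"]

# Hypothetical sparse layout: 3 real streams, but 2 individuals x 3 bodyparts.
columns = pd.MultiIndex.from_tuples(
    [
        ("scorer", "mouse1", "nose", "x"), ("scorer", "mouse1", "nose", "y"),
        ("scorer", "mouse1", "tail", "x"), ("scorer", "mouse1", "tail", "y"),
        ("scorer", "single", "corner", "x"), ("scorer", "single", "corner", "y"),
    ],
    names=names,
)

# One (n_frames, n_stats) block per real stream, concatenated column-wise.
n_frames = 4
streams = columns.droplevel("coords").unique()
preds = [np.zeros((n_frames, len(stats))) for _ in streams]
data = np.concatenate(preds, axis=1)  # shape (4, 9)

# Old approach: Cartesian product over the unique values of each level.
levels = [columns.get_level_values(n).unique() for n in names if n != "coords"]
product_index = pd.MultiIndex.from_product(
    [*levels, stats], names=[*streams.names, "stats"]
)

try:
    pd.DataFrame(data, columns=product_index)
    product_failed = False
except ValueError:
    # 18 product columns cannot label 9 data columns.
    product_failed = True

# New approach: extend only the stream tuples that actually exist.
tuple_index = pd.MultiIndex.from_tuples(
    [(*stream, stat) for stream in streams for stat in stats],
    names=[*streams.names, "stats"],
)
deviations = pd.DataFrame(data, columns=tuple_index)
```

For a dense layout the two indexes have the same length, which is why the problem only appeared for sparse projects.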