Fix incorrect MultiLevel construction in outlier_frames.compute_deviations#3247

Open
C-Achard wants to merge 6 commits into main from cy/fix-outlier-fitting-mode

@C-Achard
Collaborator

Scope

Fixes extract_outlier_frames(..., outlieralgorithm="fitting") for multi-animal projects with sparse column layouts (e.g. projects using uniquebodyparts).
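The current code rebuilt the output MultiIndex as the Cartesian product of all unique values of the non-coords levels. In sparse layouts this produces individual × bodypart combinations that do not exist in the data, so the number of output columns no longer matches the computed predictions and constructing the output dataframe fails with a shape mismatch (ValueError: Shape of passed values is ...).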
This occurs in the fitting branch of extract_outlier_frames.
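The failure can be reproduced with plain pandas. The sketch below assumes a simplified four-level column layout (scorer, individuals, bodyparts, coords) with made-up animal and bodypart names and a hypothetical two-entry stats list; it mimics the old product-based construction rather than copying the code of compute_deviations.

```python
import numpy as np
import pandas as pd

# Hypothetical sparse maDLC-style layout: two animals share "snout"/"tail",
# while the unique bodypart "led" only exists under the "single" individual.
streams = [
    ("scorer", "mouse1", "snout"),
    ("scorer", "mouse1", "tail"),
    ("scorer", "mouse2", "snout"),
    ("scorer", "mouse2", "tail"),
    ("scorer", "single", "led"),
]
columns = pd.MultiIndex.from_tuples(
    [(*s, c) for s in streams for c in ("x", "y", "likelihood")],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
df = pd.DataFrame(np.zeros((3, len(columns))), columns=columns)

# Old approach (sketch): Cartesian product of the unique non-coords level values.
non_coords = [n for n in df.columns.names if n != "coords"]
product = pd.MultiIndex.from_product(
    [df.columns.get_level_values(n).unique() for n in non_coords]
)
print(len(product))  # 9 = 1 scorer * 3 individuals * 3 bodyparts
print(len(streams))  # 5 streams actually exist

# Predictions are only computed for real streams, so the column count disagrees.
stats = ["distance", "sig"]  # hypothetical subset of derived stats
preds = np.zeros((3, len(streams) * len(stats)))
bad_cols = pd.MultiIndex.from_tuples([(*t, s) for t in product for s in stats])

error_message = None
try:
    pd.DataFrame(preds, columns=bad_cols)
except ValueError as err:
    error_message = str(err)
print(error_message)
```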

Fix

Instead of rebuilding the output columns with MultiIndex.from_product(...), derive the base tuples from the existing dataframe columns by selecting the "x" columns and dropping the coords level, preserving only the valid keypoint combinations in their original order.
We then append the derived stats to those actual stream tuples, keeping the output columns aligned with np.concatenate(preds, axis=1).
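A minimal sketch of the tuple-based construction, under the same simplified layout assumptions: the per-stream preds blocks are zero placeholders standing in for the fitted values, and the stats names are an illustrative subset (the actual list in compute_deviations may differ).

```python
import numpy as np
import pandas as pd

# Hypothetical sparse layout, as in a maDLC project with unique bodyparts.
streams = [
    ("scorer", "mouse1", "snout"),
    ("scorer", "mouse2", "snout"),
    ("scorer", "single", "led"),
]
columns = pd.MultiIndex.from_tuples(
    [(*s, c) for s in streams for c in ("x", "y", "likelihood")],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
df = pd.DataFrame(
    np.arange(2 * len(columns), dtype=float).reshape(2, -1), columns=columns
)

# Base tuples: one per existing stream, in the original column order.
base = df.xs("x", axis=1, level="coords", drop_level=True).columns

# Illustrative stats names; not necessarily the full list used by DeepLabCut.
stats = ["distance", "sig", "meanx", "meany"]
new_columns = pd.MultiIndex.from_tuples(
    [(*t, s) for t in base for s in stats],
    names=[*base.names, "stats"],
)

# One (n_frames, n_stats) block per stream, concatenated along the columns
# in the same stream order as `base`, so values and labels stay aligned.
preds = [np.zeros((len(df), len(stats))) for _ in base]
out = pd.DataFrame(np.concatenate(preds, axis=1), index=df.index, columns=new_columns)
print(out.shape)  # (2, 12)
```

Because the stats level is named, a single statistic can be selected afterwards with, for example, out.xs("distance", axis=1, level="stats").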

Tests

Two tests were added:

  • Behavior preserved for dense column layouts: verifies that the fixed implementation preserves the previous output layout when every non-coords combination is valid.
  • Regression test for sparse column layouts: verifies that sparse maDLC-style layouts reconstruct the MultiIndex from the actual stream tuples and no longer trigger a shape mismatch.
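The dense-layout test relies on a simple property: when every non-coords combination exists, the stream tuples taken from the "x" columns come out in the same order as the old Cartesian product. A self-contained check of that property, using a hypothetical two-animal, two-bodypart layout, might look like this (it is a sketch, not the test added in this PR).

```python
import pandas as pd

# Hypothetical dense layout: every individual has every bodypart.
levels = {
    "scorer": ["scorer"],
    "individuals": ["mouse1", "mouse2"],
    "bodyparts": ["snout", "tail"],
}
coords = ["x", "y", "likelihood"]
columns = pd.MultiIndex.from_product(
    [*levels.values(), coords], names=[*levels, "coords"]
)
df = pd.DataFrame(0.0, index=range(2), columns=columns)
stats = ["distance", "sig"]  # hypothetical subset of derived stats

# Old construction: product of the unique non-coords level values, then stats.
old = pd.MultiIndex.from_product(
    [df.columns.get_level_values(n).unique() for n in levels] + [stats]
)

# New construction: the streams that actually exist, then stats.
base = df.xs("x", axis=1, level="coords").columns
new = pd.MultiIndex.from_tuples([(*t, s) for t in base for s in stats])

# For dense layouts, both give the same columns in the same order.
assert old.equals(new)
```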

Add a new test module exercising outlier_frames.compute_deviations. Introduces fixtures to build dense and sparse multi-animal DataFrames, plus mocks for SARIMAX fitting and HDF writing. Adds a regression test ensuring sparse layouts preserve only actual streams (not the Cartesian product of non-'coords' levels) and a behavior-preservation test verifying dense layouts match the old product-based column ordering. Tests also assert output shapes, selectable derived-stat levels, zero-distance behavior with a deterministic fake fitter, expected SARIMAX call counts, and that the 'full' storeoutput path attempts persistence.
Pass the --pytest-test-first argument to the name-tests-test pre-commit hook so that the hook expects test files to be named test_*.py (pytest's test-first convention) instead of *_test.py. This change only updates .pre-commit-config.yaml.
Replace MultiIndex.from_product with a MultiIndex built from the existing keypoint coordinate combinations (preserving their original order). The change detects the 'coords' level, selects base columns for the 'x' coordinate, appends statistical fields (distance, sig, meanx, etc.), and constructs a MultiIndex.from_tuples with an added 'stats' level. This avoids generating invalid/extra keypoint combinations and keeps column ordering consistent when assembling the deviations DataFrame.
Simplify extraction of base columns by replacing the manual level-index lookup and droplevel logic with DataFrame.xs("x", axis=1, level="coords", drop_level=True).columns. Update the comment to note that 'y' could be used interchangeably. This makes the code clearer and reduces explicit MultiIndex handling.
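A sketch of why the xs call can replace the manual level lookup; the "manual" variant below is a reconstruction for comparison on a hypothetical layout, not the exact code that was removed.

```python
import numpy as np
import pandas as pd

columns = pd.MultiIndex.from_tuples(
    [
        ("scorer", "mouse1", "snout", "x"),
        ("scorer", "mouse1", "snout", "y"),
        ("scorer", "single", "led", "x"),
        ("scorer", "single", "led", "y"),
    ],
    names=["scorer", "individuals", "bodyparts", "coords"],
)
df = pd.DataFrame(np.zeros((1, 4)), columns=columns)

# Manual variant: find the "coords" level, keep the "x" columns, drop the level.
level_idx = df.columns.names.index("coords")
mask = df.columns.get_level_values(level_idx) == "x"
manual = df.columns[mask].droplevel(level_idx)

# xs variant: select "x" on the "coords" level and drop that level in one call.
via_xs = df.xs("x", axis=1, level="coords", drop_level=True).columns

assert manual.equals(via_xs)
print(list(via_xs))  # [('scorer', 'mouse1', 'snout'), ('scorer', 'single', 'led')]
```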
C-Achard requested a review from deruyter92 March 18, 2026 08:52
C-Achard self-assigned this Mar 18, 2026
C-Achard added the bug fix! fix for a real buggy one... label Mar 18, 2026
C-Achard linked an issue Mar 18, 2026 that may be closed by this pull request
Collaborator

deruyter92 left a comment

Good fix!

Will be revisited during the keypoint refactoring, ofc. But good to include this fix before we work on that. (Great that you added tests BTW)

Labels

bug fix! fix for a real buggy one...

Projects

None yet

Development

Successfully merging this pull request may close these issues.

ValueError: Shape of passed values is... when extracting outlier frames

2 participants