Bug Report
Description
When using populate(reserve_jobs=True) with multiple workers (e.g., SLURM array jobs), multiple workers can successfully "reserve" the same key and call make() simultaneously.
I noticed this when submitting multiple SLURM jobs that populate the same table: their output logs confirmed that 6 of 7 jobs were executing make() on the same key. The job reservation system has always been a valuable DataJoint feature for distributed computing, preventing redundant computation and collisions.
Current workaround
I think this is currently a rare occurrence, probably made somewhat more common by the fact that I access my server over a network (it is not hosted locally) and submit sbatch arrays. My workaround has been to add a random sleep (0-30 s) before populate() in the SLURM script to stagger worker start times. With that in place I have not seen the duplicate make() calls.
I'm not very familiar with how DataJoint manages job reservations, but below is what Claude suggested might be the issue. I'm adding it to this post in case it is useful.
Job.reserve() in DataJoint 2.1 uses a non-atomic SELECT-then-UPDATE pattern that allows multiple workers to reserve the same job simultaneously. This is a regression from 0.13.x, where a plain INSERT either succeeded or failed with a duplicate-key error, making the reservation inherently atomic.
Root Cause
Job.reserve() (jobs.py:430-473) performs a check-then-act without atomicity:
```python
def reserve(self, key: dict) -> bool:
    # Step 1: SELECT — check if job is pending
    job = (self & key & "status='pending'"
           & "scheduled_time <= CURRENT_TIMESTAMP(3)").to_dicts()
    if not job:
        return False
    # Step 2: UPDATE — mark as reserved
    pk = self._get_pk(key)
    update_row = {**pk, "status": "reserved", ...}
    try:
        self.update1(update_row)  # UPDATE ... SET status='reserved' WHERE <pk>
        return True
    except Exception:
        return False
```
The UPDATE's WHERE clause matches only on primary key, not on status='pending'. So if two workers both read the row as 'pending' before either updates, both UPDATEs succeed — the second simply overwrites the first worker's reservation.
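The lost-update interleaving can be reproduced deterministically. The sketch below uses sqlite3 as a stand-in for MySQL; the table and column names (`jobs`, `key_hash`) are simplified assumptions, not DataJoint's actual schema. Both workers run their SELECT check before either runs its UPDATE, and both "win":

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (key_hash TEXT PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO jobs VALUES ('abc123', 'pending')")

def check_pending():
    # Step 1 of reserve(): SELECT — is the job still pending?
    return db.execute(
        "SELECT 1 FROM jobs WHERE key_hash='abc123' AND status='pending'"
    ).fetchone() is not None

def mark_reserved():
    # Step 2 of reserve(): UPDATE matches on primary key only, so it
    # succeeds regardless of the current status
    db.execute("UPDATE jobs SET status='reserved' WHERE key_hash='abc123'")
    return True

# Interleaving seen with concurrent workers: both SELECTs complete
# before either UPDATE runs, so the check passes for both.
a_saw_pending = check_pending()                 # worker A: True
b_saw_pending = check_pending()                 # worker B: True, window still open
a_reserved = a_saw_pending and mark_reserved()  # True
b_reserved = b_saw_pending and mark_reserved()  # True, silently overwrites A
print(a_reserved, b_reserved)  # True True
```

Run sequentially this always reproduces the window; in production the same interleaving only happens when two workers hit the row within milliseconds of each other, which is why staggered start times hide the bug.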
Comparison with 0.13.x
The old JobTable.reserve() (0.13.x) used an atomic INSERT pattern:
```python
def reserve(self, table_name, key):
    job = dict(key, table_name=table_name, status='reserved',
               host=platform.node(), pid=os.getpid(), ...)
    try:
        self.insert1(job)  # INSERT — fails with DuplicateError if row exists
    except DuplicateError:
        return False  # another worker already has this key
    return True
```
This is inherently atomic: the first INSERT wins, all others get DuplicateError. No window exists between check and action.
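The first-INSERT-wins behavior is easy to verify. A minimal sketch, again using sqlite3 in place of MySQL (sqlite3 raises IntegrityError where DataJoint's MySQL driver raises DuplicateError; the schema is a simplified assumption):

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (table_name TEXT, key_hash TEXT, status TEXT, "
           "PRIMARY KEY (table_name, key_hash))")

def reserve(worker):
    # A single INSERT: the primary-key constraint makes "first wins" atomic,
    # with no gap between checking for the row and creating it.
    try:
        db.execute("INSERT INTO jobs VALUES ('my_table', 'abc123', 'reserved')")
    except sqlite3.IntegrityError:  # duplicate key: another worker got there first
        return False
    return True

results = [reserve(w) for w in ("A", "B")]
print(results)  # [True, False]
```

The database's uniqueness constraint does the arbitration, so no interleaving of client-side steps can produce two winners.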
Suggested Fix
Option A — Add a WHERE clause to the UPDATE:
```sql
UPDATE jobs SET status='reserved', ...
WHERE table_name=... AND key_hash=... AND status='pending'
```
Then check affected_rows == 1 to determine success. This is a single atomic operation.
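A sketch of Option A, with sqlite3 standing in for MySQL and a simplified assumed schema: moving status='pending' into the WHERE clause collapses check and act into one statement, and the affected-row count tells each worker whether it won.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE jobs (key_hash TEXT PRIMARY KEY, status TEXT)")
db.execute("INSERT INTO jobs VALUES ('abc123', 'pending')")

def reserve_atomic():
    # Check and act in one statement: the UPDATE only matches while the
    # row is still pending, and row locking serializes concurrent UPDATEs.
    cur = db.execute(
        "UPDATE jobs SET status='reserved' "
        "WHERE key_hash='abc123' AND status='pending'"
    )
    return cur.rowcount == 1  # only the first worker's UPDATE matches a row

results = [reserve_atomic() for _ in range(2)]
print(results)  # [True, False]
```

Even if two UPDATEs arrive concurrently, the database executes them one at a time against the row; the second finds status='reserved', matches zero rows, and reports failure via its row count.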
Option B — Use SELECT ... FOR UPDATE before the check:
```sql
SELECT * FROM jobs WHERE ... AND status='pending' FOR UPDATE
```
This acquires a row-level lock inside the transaction, so a second worker's locking read blocks until the first commits, and then no longer sees the row as pending.
Option C — Restore the INSERT-based approach from 0.13.x, which was atomic by design.
Environment
- datajoint 2.1.0
- Python 3.12
- MySQL 8.0
- SLURM cluster, 7 concurrent array tasks calling populate(reserve_jobs=True)