marcjmcdougall

Atomic updates possible for JSON fields on Eloquent Models?

The Problem

Hi folks -

Running into some strange issues with multi-threaded behavior where I have multiple nested, queued Jobs that each modify a JSON object ($info) on my model.

Each job simply adds a new key-value pair to the array; however, when they do so, they seem to overwrite the entire JSON object at once, despite my using the atomic JSON update syntax found in the Laravel docs.

This problem only emerges when I use Horizon to spin up more than one queue worker. Some of my jobs take much longer than others, and the slower jobs presumably overwrite the newer data on the model with their stale copy of the older data.
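To illustrate the overwrite I think is happening, here is a plain-PHP sketch (no framework; the keys and values are made up to mirror my $info fields). Two "workers" read the same snapshot of the column, each merges one key in memory, and each writes the whole object back, so the slower write clobbers the faster one:

```php
<?php
// Simulate a shared JSON column that two concurrent workers read-modify-write.
$rowInDb = json_encode(['html_size_kb' => 120]);

// Both workers read the column before either one writes.
$workerA = json_decode($rowInDb, true);
$workerB = json_decode($rowInDb, true);

// Worker A finishes first and writes the full object back.
$workerA['image_count'] = 7;
$rowInDb = json_encode($workerA);

// Worker B, still holding the stale snapshot, writes next.
$workerB['messaging_evaluation'] = 'good';
$rowInDb = json_encode($workerB);

// Worker A's key is now gone: only the original key and worker B's key survive.
var_dump(json_decode($rowInDb, true));
```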

Code Examples

My Model configuration:

protected $fillable = [
    'url',
    'email',
    'status',
    'progress',
    'html_content',
    'title',
    'screenshot_path',
    'meta_description',
    'issues',
    'info',
    'info->messaging_evaluation',
    'info->performance_metrics',
    'info->html_size_kb',
    'info->image_count',
    'score',
    'completed_at',
];

My parent Job:

Bus::batch([
    new EvaluateLoadTime($this->quickScan),
    new EvaluateCopy($this->quickScan),
    new EvaluateImages($this->quickScan),
])->then(function (Batch $batch) use ($quickScan) {
    // Now that everything is truly done, inform the user
    dispatch(new Inform($quickScan)); // Email the visitor

    Log::info('✅ All eval jobs completed.');
})->finally(function (Batch $batch) use ($quickScan) {
    // Log::info('⚠️ Batch processing completed (success or failure)');
})->dispatch();

And this is how I am mutating the Job JSON object in the child jobs:

/**
 * Set a key in the info array
 *
 * @param string $key The key to set
 * @param mixed $value The value to set
 * @param array $additional Additional fields to update
 * @return bool Whether the update was successful
 */
public function setInfo($key, $value, array $additional = [])
{
    // Problem: This should update ONLY $info->key
    $updateData = ["info->{$key}" => $value];

    // Add any additional fields
    if ( ! empty($additional)) {
        $updateData = array_merge($updateData, $additional);
    }

    // Perform the atomic update
    return $this->update($updateData);
}

More Info

What I have tried so far:

  • Passing IDs instead of serialized QuickScan Eloquent models.
  • Moving all the fields out of the $info array, and onto the model, but this is not scalable.
  • Refreshing the object before updating in setInfo(): $this->refresh();
  • (This actually works, but it forces a new database query every time I want to update a key-value pair on my $info JSON object, which is not ideal.)

Note: Nowhere in any of the child jobs is the $info object updated any other way. The setInfo() function is always used.

My Question

Am I doing something wrong here? Or is this simply not possible in Laravel?

0 likes
5 replies
krisi_gjika

Where did you read that "info->{$key}" is some sort of atomic JSON update syntax? This still syncs the entire JSON object in memory with your DB.

Although I don't see why you need to split what seems like one processing job into smaller chunks that all write to the same model. I think it would be best to dispatch just 1 job; then you would not have concurrency issues.

If not possible you can either dispatch the Bus as a "chain", instead of a "batch"; or look into https://laravel.com/docs/12.x/queries#pessimistic-locking

1 like
marcjmcdougall

@krisi_gjika -

where did you read that "info->{$key}" is some sort of atomic JSON update syntax? This still syncs the entire JSON object in memory with your DB.

Laracasts won't let me post links, but if you go to the Laravel docs and append this relative URL, you should find it: /docs/5.3/queries#updating-json-columns

Not sure if I'm following the right syntax, and of course this is very old documentation.

Although I don't see why you need to split what seems like a processing job into smaller chunks all writing to the same model. I think it would be best to just dispatch 1 job, and then you would not have concurrency issues.

The reason is that each job takes ~3-5 minutes to resolve, and they were timing out before. I could just extend the timeout duration, but it also seems a bit inefficient to tie up workers for so long.

I'm also broadcasting progress updates in real time to the front-end, and I can move faster by doing the 3 evaluation jobs concurrently.

If not possible you can either dispatch the Bus as a "chain", instead of a "batch"; or look into [pessimistic locking]

Yeah, I can't chain them unfortunately, or it takes the job from ~3 minutes total to ~9 minutes, which is not great for UX.

Will look into pessimistic locking, this might be the key here!
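For anyone landing here later, this is a rough sketch of what pessimistic locking could look like in my setup. It assumes my QuickScan model on a MySQL connection and is untested against the real app; lockForUpdate() and DB::transaction() are the standard Laravel APIs for this.

```php
use Illuminate\Support\Facades\DB;

DB::transaction(function () {
    // Lock the row so concurrent workers queue up instead of racing.
    $scan = QuickScan::whereKey($this->quickScan->id)
        ->lockForUpdate()
        ->first();

    // Safe read-modify-write: no other worker can update this row
    // until the transaction commits.
    $info = $scan->info ?? [];
    $info['image_count'] = 7; // hypothetical key/value

    $scan->update(['info' => $info]);
});
```

The trade-off is that the lock serializes the writes, so the three evaluation jobs would briefly block each other on each setInfo() call.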

1 like
krisi_gjika

@marcjmcdougall I still don't see anything in the docs about this, but anyhow, I think blocking a worker for 10 min is not much different from blocking 3 workers for 3 min each. If you have more jobs than your workers can handle without backlogging, it seems the same to me. Remember you can use custom priorities and/or queues to still keep some workers on your other jobs, like sending emails and whatnot.

Also, your problem would likely be solved if you didn't use JSON. You could push your data into a scan_evaluations table, with columns like ['scan_id', 'evaluation_type', 'started_at', 'completed_at'], so that each job writes to a new row in that table.

1 like
marcjmcdougall

@krisi_gjika -

Also your problem would likely be solved if you didn't use json. You can push your json into a scan_evaluations table, example something like ['scan_id', 'evaluation_type', 'started_at', 'completed_at']. So that each job can write to a new row in this table.

Yeah, this makes sense - I was hoping to use JSON so that I wouldn't have to run a new migration whenever I wanted to add a new field to the info array.

Seems like the solution to my problem is the following:

  1. Provision more server workers or optimize queuing to minimize worker downtime for critical tasks.
  2. Re-evaluate pessimistic locking, i.e. locking the model row for each transaction.
  3. Use a dedicated model for ScanResult, which contains all my fields as dedicated values.

Thank you for taking the time to help me. :)

krisi_gjika

@marcjmcdougall I think I found another solution:

$updateData = ["info->{$key}" => $value];

// Add any additional fields
if ( ! empty($additional)) {
    // Collect the attributes cast to JSON-like types
    $jsonFields = array_keys(array_filter(
        $this->getCasts(),
        fn ($cast) => in_array($cast, ['json', 'array', 'object', 'collection'])
    ));

    // Allow only non-JSON fields into the column update
    $updateData = array_merge($updateData, array_filter(
        $additional,
        fn ($field) => ! in_array($field, $jsonFields),
        ARRAY_FILTER_USE_KEY
    ));
}

$updateQuery = $this->newModelQuery();

$updateQuery->whereKey($this->getKey()) // constrain the query to this model only
    ->update($updateData);

// results in: update `table` set `info` = json_set(`info`, '$."key"', ?), `table`.`updated_at` = ? where `table`.`id` = ?
// whereas $this->update() would result in: update `table` set `info` = ?, `table`.`updated_at` = ? where `id` = ?

The downside is that the in-memory model would no longer be up to date, since the update was performed in a separate query.
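To see the difference concretely, here is a runnable plain-PDO sketch using SQLite, whose built-in json_set behaves like MySQL's for this case. The table name and values are made up; the point is that a json_set-style UPDATE rewrites only the targeted path, so keys written by other workers survive:

```php
<?php
// Demonstrate that json_set only touches the targeted key,
// leaving the rest of the JSON object intact.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);

$pdo->exec("CREATE TABLE quick_scans (id INTEGER PRIMARY KEY, info TEXT)");
$pdo->exec("INSERT INTO quick_scans (id, info) VALUES (1, '{\"html_size_kb\":120}')");

// Same shape as the compiled query above:
// update `quick_scans` set `info` = json_set(`info`, '$."image_count"', ?) where `id` = ?
$stmt = $pdo->prepare(
    "UPDATE quick_scans SET info = json_set(info, '$.image_count', :value) WHERE id = :id"
);
$stmt->bindValue(':value', 7, PDO::PARAM_INT);
$stmt->bindValue(':id', 1, PDO::PARAM_INT);
$stmt->execute();

$info = json_decode(
    $pdo->query("SELECT info FROM quick_scans WHERE id = 1")->fetchColumn(),
    true
);

// Both the pre-existing key and the newly set key are present.
var_dump($info);
```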
