
Fabro's avatar
Level 1

Proper way to loop jobs

Hello. I am fairly new to Laravel, and have been working with Laravel 8 for about half a year now. I was wondering, what is the proper way of handling a job that does hundreds of thousands of database insertions/updates? I've been struggling to decide between

foreach (CustomFacade::getElements() as $element) {
    MyCustomJob::dispatch($element);
}

and

// MyCustomJob class
public function handle() {
    foreach (CustomFacade::getElements() as $element) {
        // perform actions
    }
}

I'm wondering if one is "better" than the other in terms of server load, querying, or any other parameters or concerns that I'm not quite familiar with yet.

Thank you for your time, have a great day!

EDIT: Corrected missing parentheses in the second example.

0 likes
15 replies
kokoshneta's avatar

Apart from the fact that you’re missing the parentheses after getElements in the second example, the only immediate difference I can think of is that there will be a tiny bit of overhead in repeatedly calling the dispatch method.

If CustomFacade::getElements() really can/does return hundreds of thousands or millions of elements, that tiny overhead may add up to something that can actually be felt, but it won’t be much.

Of course, if the ‘actions’ you’re performing are repeated SQL queries, it would make a lot of sense to use a single prepared statement and execute that X times (with different bound parameters for each iteration, of course), rather than using the Laravel query builder to build a new SQL statement for each iteration. That would probably have a very tangible impact on processing time, and it would be easier and cleaner to implement in the second format where you can just create the statement inside the handle() method and bind the parameters in the foreach loop.
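Very roughly, that could look something like this (a minimal sketch, assuming a simple elements table with a hypothetical price column, and that getElements() returns objects with id and price):

$pdo = DB::connection()->getPdo(); // Laravel's underlying PDO instance
$stmt = $pdo->prepare('UPDATE elements SET price = :price WHERE id = :id');

foreach (CustomFacade::getElements() as $element) {
	// Same statement every time; only the bound parameters change
	$stmt->execute([
		'price' => $element->price,
		'id'    => $element->id,
	]);
}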

Fabro's avatar
Level 1

@kokoshneta @snapey Upserts and prepared statements are not possible in the scenario I'm working on.

Let me be clearer about this; perhaps you can point me in a direction I'm not currently looking at, since I don't know all the features of the framework yet.

I have a model of elements. When a customer opens a specific element from their account, Guzzle performs a GET request to an external API endpoint that gives me specifics about that item, such as max price, min price, amount sold, etc., and the model is updated with the new data. The point is, not every customer has every item, and items only get updated when a customer specifically opens them, so if no one opens item 434343, I cannot show updated information about it.

So, CustomFacade::getElements() retrieves elements as JSON from that external API, with a list of all current elements in the API. Those elements are then compared to the database, and all database elements that are no longer in the JSON since the last request are removed (for example, elements that a user hasn't opened in a while and that are no longer available in the API but are still stored in the database). Those elements that are present in the JSON remain in the database, and for each of them I need to redo a request to another endpoint of the same API to get specifics of each element that are not present in the previous request (the first request gives me only the list of elements; the second request gives me max price, min price, amount sold in the last 24 hours, and max items in the same order for the specific element I'm asking about).

After retrieving that data, I compare it to the database: updated_at > 72 hours, or current amount sold in the last 24 hours > 0 and > last amount sold in 24 hours, current max price > last max price, current min price < last min price, etc. All those verifications are done before updating/inserting the model, because if no condition is true, then changes are not needed and the model is not updated.

The CustomFacade allows me to update/delete elements even when users don't open them, and to keep my database relatively up to date with the external API's information. For example, if user 38 opened item 1499 yesterday and I run my job on a schedule every Monday, then item 1499 won't be updated, because it was updated less than 72 hours ago. So, basically, I need to retrieve both the database records and the API data, compare them, remove unused elements, request more API data, compare it to the remaining local data, update only if necessary, and then save.

I am thinking about splitting the elements into batches of around 50k items daily, but I don't know where it would be better to iterate over those 50k elements: send the full array to the job and iterate inside it, or iterate outside the job and create a job for each of the 50k elements.
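In code, the two options I'm weighing look roughly like this (simplified):

// Option A: iterate outside and queue one job per element (50k jobs)
foreach ($batch as $element) {
	MyCustomJob::dispatch($element);
}

// Option B: queue one job with the whole batch and iterate inside handle()
MyCustomJob::dispatch($batch);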

Thanks for your comments. This question perhaps sounds basic, but I'm quite new to the framework, and it has so many tools that it's hard for a newbie to find the specific one to get things done in the best way possible.

kokoshneta's avatar

I don’t see why prepared statements wouldn’t work in your scenario. You can prepare multiple statements for different types of updates, then loop through elements and run the appropriate prepared statement according to what needs updating. I would guess you’d still see a performance boost over individual queries, even if the number of prepared statements is fairly large.
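Just as a sketch (the conditions and columns here are placeholders for whatever you actually check and update):

$priceStmt = $pdo->prepare('UPDATE elements SET max_price = :max_price, min_price = :min_price WHERE id = :id');
$soldStmt = $pdo->prepare('UPDATE elements SET sold_24h = :sold_24h WHERE id = :id');

foreach ($elements as $e) {
	// Run whichever prepared statement matches what changed
	if ($e->priceChanged) {
		$priceStmt->execute(['max_price' => $e->maxPrice, 'min_price' => $e->minPrice, 'id' => $e->id]);
	} elseif ($e->soldChanged) {
		$soldStmt->execute(['sold_24h' => $e->sold24h, 'id' => $e->id]);
	}
}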

@Fabro wrote:

Those elements that are present in the JSON remain in the database, and for each of them I need to redo a request to another endpoint of the same API to get specifics of each element that are not present in the previous request

This doesn’t make sense to me. For each element that is in the JSON, you need to send a request to get specifics of each element that isn’t in the JSON?

Fabro's avatar
Level 1

@kokoshneta I thought that having, for example, 12 prepared statements, then filtering each element, analyzing it, and running the required prepared statement would probably be heavier than a single general query. I'll dig deeper into that if you say it won't be. And yes: the first JSON only has the IDs and a basic description of the elements. After that, I need to make another request to another endpoint of the same API to get the full data of each element (long description, tags, date added, last updated, current price, max price of last month, min price of last month, amount sold, average price of last month), using the ID of the element as data inside the request. One endpoint to get the full list, another endpoint to get the full data of a single element.

kokoshneta's avatar
Level 27

@Fabro Oh, you meant that the specifics aren’t in the first request. Gotcha.

Prepared statements may not end up giving you a performance boost, but I think they will. The overhead of preparing half a dozen statements is very small compared to that of executing hundreds of thousands of statements in sequence.

If you boil it down to just two statements (either delete a record or update all columns in the record), that’s a further optimisation which will also make writing your code easier, since you won’t have to keep track of lots of different statements.

What would definitely not be faster is to have a prepared statement for each column to update, and then run a separate statement for each column (e.g., first update updated_at, then update sold_24h, then update sold_72h, etc.). That would result in way too many queries and would be much slower.

But something roughly along these lines should be fairly optimised – obviously very simplified:

class MyCustomJob {
	public function handle() {
		// In actual code, retrieve your PDO instance from Laravel
		$pdo = new PDO('...');
		$deleteStmt = $pdo->prepare("DELETE FROM elements WHERE id = ?");
		$updateStmt = $pdo->prepare("UPDATE elements SET updated_at = :updated_at, sold_72h = :sold_72h, sold_24h = :sold_24h, ... WHERE id = :id");

		// I’m assuming you get your DB and API elements as collections here
		$elementsInDB = Element::all();
		$elementsInAPI = CustomFacade::getElements();

		// Get all elements present in DB but absent from API as array
		$toDelete = array_diff($elementsInDB->pluck('id')->all(), $elementsInAPI->all());

		// diff() removes by value; forget() would remove by key, which is not what we want here
		$elementsInAPI = $elementsInAPI->diff($toDelete);

		foreach ($toDelete as $id) {
			$deleteStmt->execute([$id]);
		}

		// I’m assuming here that your API endpoint can give you a list of details in a single call, filtered by IDs provided from an array
		$elementsToUpdate = CustomFacade::getElementSpecifics($elementsInAPI->all());

		foreach ($elementsToUpdate as $e) {
			// Do the fancy work here to decide whether the record should be updated or not
			$shouldUpdate = true; // or false, from your comparison logic

			if ($shouldUpdate) {
				// Build an array containing only the keys named in the prepared statement
				$arr = (array) $e;

				$updateStmt->execute($arr);
			}
		}
	}
}
1 like
Fabro's avatar
Level 1

@kokoshneta Great help, thank you very much. I will try to rework my code turning towards your solution. I cannot use your whole solution, because the details endpoint does not recieve an array. It specifically need one item at a time. Throttle is being set up at 100k requests every 24hs, so that's why I want to update around 50k each day. In that case, I think it may be the best solution to do each request and the put them alltogether in an array to send the whole array to the prepared statement, am I correct?

kokoshneta's avatar

@Fabro You can delete all the records that need deleting in one go – and actually, if you do that, there’s no point in preparing the statement first, since you’d only be running the delete query once. You’d then do something like this:

$toDelete = $elementsInDB->pluck('id')->diff($elementsInAPI);
$toUpdate = $elementsInAPI->diff($toDelete);

// Remove unwanted elements
$pdo->query(
	"DELETE FROM elements WHERE id IN ({$toDelete->implode(', ')})"
);

(This may not be much faster than the original way, because the query will end up having a very long IN clause, which can slow down queries a lot – but it definitely shouldn’t be slower, and it’s simpler code.)
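If that long IN clause does turn out to be a problem, one option (just a sketch, using Laravel's collection helpers) is to chunk the deletes:

// 1,000 per query is an arbitrary choice; it just keeps each IN clause short
foreach ($toDelete->chunk(1000) as $chunk) {
	$pdo->query(
		"DELETE FROM elements WHERE id IN ({$chunk->implode(', ')})"
	);
}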

But you can’t update everything at once, because each row needs its own set of values, and a single update query can only apply one set of values. So there you’d still have to go one by one, roughly thus:

foreach ($toUpdate as $e) {
	$specs = CustomFacade::getElementSpecifics($e);

	// Do stuff to decide whether to update or not
	$shouldUpdate = true; // or false, based on your checks

	if ($shouldUpdate) {
		// Make sure $specs is an array with keys matching the statement's named parameters
		$arr = (array) $specs;
		$updateStmt->execute($arr);
	}
}
1 like
sr57's avatar

@Fabro

If your inserts/updates are totally independent, the first solution gives you automatic handling of job failures: each element's job can fail and be retried on its own.

If not, you have to manage your queries inside transaction(s), and you'll need to go partially or fully with the second way.

In terms of workload, the second way will be better, noticeably or not depending on your queries.
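For example (just a sketch, assuming the whole batch has to succeed or fail together, and using Laravel's DB facade):

// Wrap the loop in a transaction inside the single job's handle()
DB::transaction(function () {
	foreach (CustomFacade::getElements() as $element) {
		// perform actions
	}
});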

Fabro's avatar
Level 1

@sr57 Hello, thanks for replying. Yes, they are independent of each other. Besides that, I'm currently doing my best to handle exceptions in my Facade when verifying each element, to prevent the job from failing: I log the error and continue execution with the next item in the loop. But I'm struggling with which is better in terms of server load, queue performance, and the specifics of Laravel jobs and queues that I may not be familiar with yet.
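Roughly what I'm doing now (simplified; this assumes each element has an id, and uses Laravel's Log facade):

foreach (CustomFacade::getElements() as $element) {
	try {
		// verify and update the element
	} catch (\Exception $e) {
		// Log and move on instead of failing the whole job
		Log::error("Element {$element->id} failed: {$e->getMessage()}");
		continue;
	}
}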

sr57's avatar

@Fabro

In terms of workload, the second way will be better, noticeably or not depending on your queries.

1 like
Snapey's avatar

Depends on which queue driver you use.

I would definitely not put single database updates each in their own job when the task itself requires that many updates to the database.

The best solution is to minimise transactions and batch the work up into bulk upserts.
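In Laravel 8 that could look roughly like this (the column names are placeholders, and upsert() needs a unique index to match on):

// One query covering many rows
Element::upsert(
	$rows, // e.g. [['id' => 1, 'max_price' => 9.5, ...], ...]
	['id'], // column(s) that uniquely identify each record
	['max_price', 'min_price', 'sold_24h'] // columns to update on a match
);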

Fabro's avatar
Level 1

@Snapey I tagged you in a previous reply; hope it gives you more specifics about what I'm trying to accomplish.

Fabro's avatar
Level 1

@Snapey Sorry to ask, but how would you manage relationships while making upserts?

EDIT: Never mind. I can't use upserts, because I have duplicated elements. Since users can personalize elements when they own them, I have duplicated IDs in the database for all elements that come from the same base element.

sr57's avatar

@Fabro

I am thinking about ....

Clearly, once your main code is written, these two options are very quick to implement, so you can easily test both ways.

If not, choose the one that is simpler for you, and think/ask about performance improvements if it takes too long.
