Be part of JetBrains PHPverse 2026 on June 9 – a free online event bringing PHP devs worldwide together.

fernandogross's avatar

Problems finding last element of Query Builder cursor()

Hello!

I'm working on an export code and we decided to go with cursor() because we are exporting a massive ammount of data.

The problem is that I want to implement some kind of progress report, and working with cursor I only have access to one item at a time in memory, not knowing the total ammount of rows I am exporting makes it dificult.

I have a Query Builder like this:

$query = DB::connection('mysql_unbuffered')
	->table($table)
	->select($columns)
	->where('account_id, $accountId)
	->groupBy($groupBy)
        ->orderBy($orderBy);

Which I am using to work with a cursor() like this:

foreach ($query->cursor() as $key => $item) {
	// Gotta fire an event and write every row on a csv file
}

I'm working with an API and my plan is fire an event every 10% (not defined yet) of the export so my frontend can catch it with socket, something like this, but before that I need to figure it out how to know the progress of the export.

Do you guys have any ideia how I could proceed here? I thought about use the methods next() and valid() on $item in my foreach but it's an stdClass instance and these methods only work on Generator instances, which is the result of $query->cursor().

0 likes
13 replies
rodrigo.pedra's avatar

Can't you use count before iterating?

From the query builder which will run on the database as a separate query. If the where condition is on indexed fields it should run fast enough even for massive amounts of rows.

$query = DB::connection('mysql_unbuffered')
	->table($table)
	->select($columns)
	->where('account_id', $accountId)
	->groupBy($groupBy)
        ->orderBy($orderBy);

$count = $query->count(); // run count in DB
$threshold = intval(ceil($count / 10)); // each 10%

foreach($query->cursor() as $key => $item) {
    $percentage = $key * 100.0 / $count;

    if ($key % $threshold === 0) {
        // send notification
    }

    // process row
}
fernandogross's avatar

Can't, for some reason it returns a wrong count, maybe because it's just an instance of Builder, no get() was made to know how many models it got.

rodrigo.pedra's avatar

You can compare both counts and they should match.

The count on the builder will make a SQL count operation:

$count =  DB::connection('mysql_unbuffered')->table('users')->count();

// will make a SQL query to the database for counting records
// SELECT COUNT(*) FROM users;

The count after ->get() will count the collection results.

But they should match nevertheless.

One scenario I think they "wouldn't" match is if you are grouping by after fetching (after calling ->get()), which is unlikely as you are using the ->cursor().

EDIT: "wouldn't" is between quotes as the number of records would match, but calling ->count() after grouping by the ->get() results would count the number of groups, not records.

Also see scenario below on using ->distinct().

rodrigo.pedra's avatar

Just thought of it, if you are using ->distinct(['column']) the counts can mismatch too if you are selecting different columns.

For example:

// distinct and select use the same columns
// -> same count
User::distinct(['name'])->select(['name'])->cursor()->count(); // 6
User::distinct(['name'])->select(['name'])->count(); // 6
// distinct and select do not use the same columns
// -> different counts
User::distinct(['name'])->cursor()->count(); // 7
User::distinct(['name'])->count(); // 6

But that wouldn't make sense, as the distinct would have no effect on the ->cursor() result.

Note that calling ->count() after ->cursor() will count the LazyCollection results, similar to issuing a ->get()->count().

fernandogross's avatar

Note that calling ->count() after ->cursor() will count the LazyCollection results, similar to issuing a ->get()->count().

True, and that's not really good, kinda kills the purpose of using cursor().

The workaround I've implemented is fire the event every 1000 columns and at the end fire another event with the rest (with help of a variable which iterates through the foreach), not cool, but somewhat does the trick, will keep looking after it.

rodrigo.pedra's avatar

Note that in my first suggestion I told you to call ->count() before calling the ->cursor() not after it.

The ->count() would issue a single database query for counting (SELECT COUNT(*) FROM table)

And then the ->cursor() would perform a second separated query.

Calling ->count() on the builder object BEFORE calling ->cursor() does not mutate the builder object, so both queries are independent and run on the DB. The ->count() query does not fetch all records for counting.

Maybe I am not expressing myself clearly, but you can try the code from my first sample while listening for DB events.

https://laravel.com/docs/8.x/database#listening-for-query-events

Only two queries will be issued, one for ->count() and one for ->cursor().

Again, on my first sample I did not suggest to call ->count() AFTER calling ->cursor(), I used it as an example on a later response to highlight the difference on calling ->count() BEFORE calling ->cursor().

Maybe what is confusing is that my code sample reuse the same query object. But I based my code sample on yours as you were building the query builder object on a temporary variable before calling ->cursor().

I have a package that logs all the queries, in case you find it easier:

https://github.com/rodrigopedra/laravel-query-logger

fernandogross's avatar

Sure, sure, I've understood that.

I tried using your first suggestion before opening this thread, as I said before, the return doesn't match the reality, I just can't see why.

rodrigo.pedra's avatar
  1. Create a new Laravel app (from default)
  2. Set up database and .env file
  3. Run migrations
  4. Add the query logger package (composer require rodrigopedra/laravel-query-logger)
  5. set APP_DEBUG=false on .env

6 - Add these routes to the project's ./route/web.php

<?php

use App\Models\User;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

function formatBytes($bytes, $precision = 2)
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB'];

    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log(1024));
    $pow = min($pow, count($units) - 1);
    $bytes /= pow(1024, $pow);

    return round($bytes, $precision) . ' ' . $units[$pow];
}

Route::get('/seed', function () {
    User::factory(50000)->create();

    return 'seeded';
});

Route::get('/', function () {
    $query = DB::table('users');

    $count = $query->count(); // run count in DB
    $threshold = intval(ceil($count / 10)); // each 10%

    return response()->stream(function () use ($query, $count, $threshold) {
        echo '<pre>', PHP_EOL;

        foreach ($query->cursor() as $key => $item) {
            $percentage = $key * 100.0 / $count;

            if ($key % $threshold === 0) {
                //
                echo '########## notification:', intval($key / $threshold), '/10', PHP_EOL;
            }

            // process row
            echo $key, ' - ', $item->email, PHP_EOL;
        }

        echo '########## notification: 10/10', PHP_EOL;
        echo 'memory:', formatBytes(memory_get_peak_usage(true)), PHP_EOL;
        echo 'time: ', (microtime(true) - LARAVEL_START), PHP_EOL;
    });
});

7 - Run php artisan serve

7 - Visit

Queries logged:

select count(*) as aggregate from `users`
select * from `users`

Results:

########## notification:0/10
0 - [email protected]
1 - [email protected]
2 - [email protected]
3 - [email protected]
4 - [email protected]
########## notification:1/10
5 - [email protected]
6 - [email protected]
7 - [email protected]
8 - [email protected]
9 - [email protected]
########## notification:2/10
10 - [email protected]
11 - [email protected]
12 - [email protected]
13 - [email protected]
14 - [email protected]
########## notification:3/10
15 - [email protected]
16 - [email protected]
17 - [email protected]
18 - [email protected]
19 - [email protected]
########## notification:4/10
20 - [email protected]
21 - [email protected]
22 - [email protected]
23 - [email protected]
24 - [email protected]
########## notification:5/10
25 - [email protected]
26 - [email protected]
27 - [email protected]
28 - [email protected]
29 - [email protected]
########## notification:6/10
30 - [email protected]
31 - [email protected]
32 - [email protected]
33 - [email protected]
34 - [email protected]
########## notification:7/10
35 - [email protected]
36 - [email protected]
37 - [email protected]
38 - [email protected]
39 - [email protected]
########## notification:8/10
40 - [email protected]
41 - [email protected]
42 - [email protected]
43 - [email protected]
44 - [email protected]
########## notification:9/10
45 - [email protected]
46 - [email protected]
47 - [email protected]
48 - [email protected]
49 - [email protected]
########## notification: 10/10
memory:2 MB
time: 0.0072391033172607

Note memory was 2MB and time 7ms. I tested with APP_DEBUG=false.

I ran with 50,000 (50 thousands) records (seeded from tinker), will post only the memory and time results for brevity:

memory:22.11 MB
time: 0.15305399894714
  • memory: 22.11MB
  • time: 153ms
JeromeFitzpatrick's avatar

Calling count on the query that has a groupBy clause will result in only counting the aggregate result of the first group by clause since the result will return multiple rows E.g.

GroupColumn Count
New York          16
Chicago              4
Orlando             5

So instead of getting 3, you will get 16...

rodrigo.pedra's avatar

Used ->eachById() to run with 500,000:

memory:2 MB
time: 1.5480210781097
<?php

use App\Models\User;
use Illuminate\Support\Facades\DB;
use Illuminate\Support\Facades\Route;

function formatBytes($bytes, $precision = 2)
{
    $units = ['B', 'KB', 'MB', 'GB', 'TB'];

    $bytes = max($bytes, 0);
    $pow = floor(($bytes ? log($bytes) : 0) / log(1024));
    $pow = min($pow, count($units) - 1);
    $bytes /= pow(1024, $pow);

    return round($bytes, $precision) . ' ' . $units[$pow];
}

Route::get('/seed', function () {
    User::factory(50000)->create();

    return 'seeded';
});

Route::get('/', function () {
    $query = DB::table('users');

    $count = $query->count(); // run count in DB
    $threshold = intval(ceil($count / 10)); // each 10%

    return response()->stream(function () use ($query, $count, $threshold) {
        echo '<pre>', PHP_EOL;

        $query->eachById(function ($item, $key) use ($count, $threshold) {
            $percentage = $key * 100.0 / $count;

            if ($key % $threshold === 0) {
                echo '########## notification:', intval($key / $threshold), '/10', PHP_EOL;
            }

            // process row
            echo $key, ' - ', $item->email, PHP_EOL;
        }, 1000, 'id');

        echo '########## notification: 10/10', PHP_EOL;
        echo 'memory:', formatBytes(memory_get_peak_usage(true)), PHP_EOL;
        echo 'time: ', (microtime(true) - LARAVEL_START), PHP_EOL;
    });
});
JeromeFitzpatrick's avatar

There may be a better way of doing it but this works:

        $count = DB::select(
            DB::raw(
                "select count(*) as count from (" . 
                
                  DB::connection('mysql_unbuffered')
	                ->table($table)
	                ->select($columns)
	                ->where('account_id, $accountId)
	                ->groupBy($groupBy)
                        ->orderBy($orderBy) .
                ") as myCountTable"
            )
        )[0]->count;

This issue is that just adding count to the query returns multiple rows and it will only give you the count of the first aggregated row, so the above wraps the results in a sub query and counts the results of the subquery instead.

EDIT: I've named the count column and returned it to a variable for completeness

2 likes

Please or to participate in this conversation.