Alternative to whereIn or optimizing whereIn for large queries

Published 3 months ago by droplister

I have a database of assets, addresses, and balances. I have a query that will fetch, for a given asset, what other assets addresses also have. So, like affinity assets or related assets.

"Database"

addresses
address (pk)

assets
asset_name (pk)

balances
address (index)
asset_name (index)

AssetsController

    // Addresses That Have This Asset
    $addresses = $asset->balances()->distinct('address')->pluck('address');

    // Assets The Addresses Also Have
    $relatedAssets = \App\Balance::where('asset_name', '!=', $asset->asset_name)
        ->whereIn('address', $addresses)
        ->selectRaw('COUNT(*) as count, asset_name')
        ->groupBy('asset_name')
        ->orderBy('count', 'desc')
        ->take(10)
        ->get();

And this works fine, when I have as many as 10,000 addresses, but there is one case where I have 150,000 addresses and it takes a very long time to execute, I would have to change my server settings to allow it to run longer. I'm okay with queries running for a while because I cache the result for a day, but this one is ultra long.

Is there something I can do beyond indexes (which I already have) to make this work better? I suspect this is a bad use of whereIn. Maybe I should I build a pivot table or... ?

Snapey
Snapey
3 months ago (995,255 XP)

indexing the address column is as good as it gets unless you can partition the database in some way.

Searching 150,000 records should be very quick. I just truncated a table with 650,000 records because it was starting to slow down on string searches (5 seconds instead of less than 1 second).

More likely count and group are having more of an impact.

measure it with and without count and with/without GroupBy

Cronix
Cronix
3 months ago (730,880 XP)

It's also faster to specify a column to count instead of just COUNT(*)

droplister

I tried defining a specific count and I dropped the groupby and the orderby to see how that goes, but I'm getting the same error...

Kind of a weird one because I'm not sure why Str is involved.

    return \App\Balance::current()
        ->where('asset_name', '!=', $asset->asset_name)
        ->whereIn('address', $addresses)
        ->selectRaw('COUNT(asset_name) as count, asset_name')
        ->take(10)
        ->get();
/vendor/laravel/framework/src/Illuminate/Support/Str.php
        }
 
        return $subject;
    }
 
    /**
     * Replace the first occurrence of a given value in the string.
     *
     * @param  string  $search
     * @param  string  $replace
     * @param  string  $subject
     * @return string
     */
    public static function replaceFirst($search, $replace, $subject)
    {
        if ($search == '') {
            return $subject;
        }
 
        $position = strpos($subject, $search);
 
        if ($position !== false) {
            return substr_replace($subject, $replace, $position, strlen($search));
        }
 
        return $subject;
    }
 
    /**
     * Replace the last occurrence of a given value in the string.
     *
     * @param  string  $search
     * @param  string  $replace
     * @param  string  $subject
     * @return string
     */
    public static function replaceLast($search, $replace, $subject)
    {
        $position = strrpos($subject, $search);
 
Arguments
"Maximum execution time of 30 seconds exceeded"
droplister

Even this causes the error above:

    \App\Balance::whereIn('address', $addresses)->first();
Cronix
Cronix
3 months ago (730,880 XP)

Kind of a weird one because I'm not sure why Str is involved.

What error is indicating a problem with Str? I'm sure they use the string library to create the actual queries from the query builder, such as

return \App\Balance::

generating SELECT xyz FROM balances to get the plural of Balance from the eloquent model name, etc.

Cronix
Cronix
3 months ago (730,880 XP)

Use explain on your query to see where the bottleneck is. If you don't have debugbar installed, I highly recommend it!

https://github.com/barryvdh/laravel-debugbar

Snapey
Snapey
3 months ago (995,255 XP)

What error?

droplister

https://imgur.com/TvFqytk

And when I increase execution time I get a 504.

Snapey
Snapey
3 months ago (995,255 XP)

have you indexed the address field?

how big is the array $addresses? that could be the culprit

droplister

102,217 addresses. I'm index'd to the hilt!

        Schema::create('assets', function (Blueprint $table) {
            // Columns
            $table->unsignedBigInteger('block_index')->index();
            $table->unsignedBigInteger('message_index')->default(0)->index();
            $table->string('asset_name');
            $table->string('asset_longname')->nullable();
            $table->string('type')->index();
            $table->string('owner')->nullable()->index();
            $table->string('issuer')->nullable()->index();
            $table->text('description')->nullable();
            $table->unsignedBigInteger('issuance')->default(0);
            $table->decimal('issuance_normalized', 27, 8)->default(0);
            $table->boolean('divisible')->default(0);
            $table->boolean('locked')->default(0);
            $table->datetime('confirmed_at')->index();
            $table->timestamps();
            // Indexes
            $table->primary('asset_name');
        });
        Schema::create('addresses', function (Blueprint $table) {
            // Columns
            $table->string('address');
            $table->string('type')->index();
            $table->unsignedBigInteger('options');
            $table->unsignedBigInteger('block_index')->index();
            $table->boolean('burn')->default(0);
            $table->datetime('confirmed_at')->index();
            $table->timestamps();
            // Indexes
            $table->primary('address');
        });
        Schema::create('balances', function (Blueprint $table) {
            // Columns
            $table->increments('id');
            $table->string('address')->index();
            $table->string('asset')->index();
            $table->unsignedBigInteger('quantity');
            $table->unsignedBigInteger('quantity_usd')->default(0);
            $table->unsignedBigInteger('message_index')->index();
            $table->unsignedBigInteger('block_index')->index();
            $table->boolean('current')->index();
            $table->unsignedInteger('quality_score')->default(0)->index();
            $table->datetime('confirmed_at')->index();
            $table->timestamps();
            // Indexes
            $table->index(['address', 'asset']);
        });
Cronix
Cronix
3 months ago (730,880 XP)

He said it could be up to 150k addresses in the first post. That's one hell of a query.

Snapey
Snapey
3 months ago (995,255 XP)

how big is the array $addresses? that could be the culprit

Please sign in or create an account to participate in this conversation.