droplister's avatar

Alternative to whereIn or optimizing whereIn for large queries

I have a database of assets, addresses, and balances. I have a query that will fetch, for a given asset, what other assets addresses also have. So, like affinity assets or related assets.

"Database"

addresses
address (pk)

assets
asset_name (pk)

balances
address (index)
asset_name (index)

AssetsController

    // Addresses That Have This Asset
    $addresses = $asset->balances()->distinct('address')->pluck('address');

    // Assets The Addresses Also Have
    $relatedAssets = \App\Balance::where('asset_name', '!=', $asset->asset_name)
        ->whereIn('address', $addresses)
        ->selectRaw('COUNT(*) as count, asset_name')
        ->groupBy('asset_name')
        ->orderBy('count', 'desc')
        ->take(10)
        ->get();

And this works fine, when I have as many as 10,000 addresses, but there is one case where I have 150,000 addresses and it takes a very long time to execute, I would have to change my server settings to allow it to run longer. I'm okay with queries running for a while because I cache the result for a day, but this one is ultra long.

Is there something I can do beyond indexes (which I already have) to make this work better? I suspect this is a bad use of whereIn. Maybe I should I build a pivot table or... ?

0 likes
13 replies
Snapey's avatar

indexing the address column is as good as it gets unless you can partition the database in some way.

Searching 150,000 records should be very quick. I just truncated a table with 650,000 records because it was starting to slow down on string searches (5 seconds instead of less than 1 second).

More likely count and group are having more of an impact.

measure it with and without count and with/without GroupBy

Cronix's avatar

It's also faster to specify a column to count instead of just COUNT(*)

droplister's avatar

I tried defining a specific count and I dropped the groupby and the orderby to see how that goes, but I'm getting the same error...

Kind of a weird one because I'm not sure why Str is involved.

    return \App\Balance::current()
        ->where('asset_name', '!=', $asset->asset_name)
        ->whereIn('address', $addresses)
        ->selectRaw('COUNT(asset_name) as count, asset_name')
        ->take(10)
        ->get();
/vendor/laravel/framework/src/Illuminate/Support/Str.php
        }
 
        return $subject;
    }
 
    /**
     * Replace the first occurrence of a given value in the string.
     *
     * @param  string  $search
     * @param  string  $replace
     * @param  string  $subject
     * @return string
     */
    public static function replaceFirst($search, $replace, $subject)
    {
        if ($search == '') {
            return $subject;
        }
 
        $position = strpos($subject, $search);
 
        if ($position !== false) {
            return substr_replace($subject, $replace, $position, strlen($search));
        }
 
        return $subject;
    }
 
    /**
     * Replace the last occurrence of a given value in the string.
     *
     * @param  string  $search
     * @param  string  $replace
     * @param  string  $subject
     * @return string
     */
    public static function replaceLast($search, $replace, $subject)
    {
        $position = strrpos($subject, $search);
 
Arguments
"Maximum execution time of 30 seconds exceeded"
droplister's avatar

Even this causes the error above:

    \App\Balance::whereIn('address', $addresses)->first();
Cronix's avatar

Kind of a weird one because I'm not sure why Str is involved.

What error is indicating a problem with Str? I'm sure they use the string library to create the actual queries from the query builder, such as

return \App\Balance::

generating SELECT xyz FROM balances to get the plural of Balance from the eloquent model name, etc.

Snapey's avatar

have you indexed the address field?

how big is the array $addresses? that could be the culprit

droplister's avatar

102,217 addresses. I'm index'd to the hilt!

        Schema::create('assets', function (Blueprint $table) {
            // Columns
            $table->unsignedBigInteger('block_index')->index();
            $table->unsignedBigInteger('message_index')->default(0)->index();
            $table->string('asset_name');
            $table->string('asset_longname')->nullable();
            $table->string('type')->index();
            $table->string('owner')->nullable()->index();
            $table->string('issuer')->nullable()->index();
            $table->text('description')->nullable();
            $table->unsignedBigInteger('issuance')->default(0);
            $table->decimal('issuance_normalized', 27, 8)->default(0);
            $table->boolean('divisible')->default(0);
            $table->boolean('locked')->default(0);
            $table->datetime('confirmed_at')->index();
            $table->timestamps();
            // Indexes
            $table->primary('asset_name');
        });
        Schema::create('addresses', function (Blueprint $table) {
            // Columns
            $table->string('address');
            $table->string('type')->index();
            $table->unsignedBigInteger('options');
            $table->unsignedBigInteger('block_index')->index();
            $table->boolean('burn')->default(0);
            $table->datetime('confirmed_at')->index();
            $table->timestamps();
            // Indexes
            $table->primary('address');
        });
        Schema::create('balances', function (Blueprint $table) {
            // Columns
            $table->increments('id');
            $table->string('address')->index();
            $table->string('asset')->index();
            $table->unsignedBigInteger('quantity');
            $table->unsignedBigInteger('quantity_usd')->default(0);
            $table->unsignedBigInteger('message_index')->index();
            $table->unsignedBigInteger('block_index')->index();
            $table->boolean('current')->index();
            $table->unsignedInteger('quality_score')->default(0)->index();
            $table->datetime('confirmed_at')->index();
            $table->timestamps();
            // Indexes
            $table->index(['address', 'asset']);
        });
Cronix's avatar

He said it could be up to 150k addresses in the first post. That's one hell of a query.

Snapey's avatar

how big is the array $addresses? that could be the culprit

Please or to participate in this conversation.