jjudge's avatar

Handling blobs from the database as streams

I'm wondering if Laravel query builder or eloquent supports this or not.

I am copying some very large blobs from an SQLite database to a MySQL database. The BLOBs need some conversion in PHP, and the thrown into MySQL as a geometry object.

At the moment I select the source BLOB - that's a chunk of memory. Then I convert teh BLOB (by taking off a wrapper layer it has) to another format BLOB. Then I need to push the result into a MySQL table. Each of these stages holds massive binary objects in memory and I quickly run out.

My PHP BLOB conversion works on streams, so that itself does not eat into memory. Now, is there also a way to use streams at each database end? Can I pull the BLOB from the SQLite database into a stream rather than into a variable in memory? Can I feed a stream into the MySQL column rather than passing it a massive binary object in a string?

I'm kind of looking for a big data pipeline, SQLite->PHP converter->MySQL using streams instead of memory.

Any ideas? I would like to avoid using CLI tools to export to temporay files if I can help it.

0 likes
8 replies
jjudge's avatar

Chunking is great when handling multiple rows in chunks of rows (fetching a handful at a time). However, this a different probem.

What I have is a memory problem when fetching and writing a single row. These BLOBs can be 64MByte and some are bigger than that. They start in one database, and need to end up in another database, as well a going through a parsing and uwrapping process in the middle.

This is geographic data, and there is not much I can do about the size. It needs to go into MySQL so they can be handled with the OpenGEO functionality built into MySQL 8. The source database is nearly 4GByte in size, and it's amazing eloquent can read it while hardly breaking a sweat. I don't want to import the whole databaase, but just the geographical data for a throusand rows or thereabouts.

jjudge's avatar

It seems that PDO can bind BLOBs to streams using the PDO::PARAM_LOB option when binding. That's both for select and insert/update. So the underying database layer does support it. Now, in theory eloquent can be made to support it. If I can't see a way to do it, I'll just resort to plain PDO. Does eloquent expose the PDO connection to use directly?

Eloquent supports casting of columns to and from specific datatypes. Perhaps that is a way in to solving this. It's probably not though. I suspect get casting happens too far from the PDO data binding.

jjudge's avatar

Okay, the PDO object is easily fetched from the DB facade. I have tried this:

            $pdo = DB::getPdo();

            $statement = $pdo->prepare('update geographic_regions set geometry = ST_GeometryFromWKB(:wkb) where id = :id');

            $statement->bindParam(':wkb', $wkbStream, \PDO::PARAM_LOB);
            $statement->bindParam(':id', $id, \PDO::PARAM_INT);

            $statement->execute();

where $id is the ID of the record I'm updating, and $wkbStream is a stream to the 25MByte WKB binary.

Still running out of memory at this point, which is bizarre, because any additional memory needed to update the geometry should be in the database engine, not the PHP environment, surely? Some of the smaller blobs go in okay, but not the larger ones.

Allowed memory size of 134217728 bytes exhausted (tried to allocate 62566040 bytes)

It's allocating a block of memory big enough to hold the whole of the WKB or geometry on execute, when it should not need to if it's streaming it.

jjudge's avatar

If this were just a file, then without a doubt it belongs on storage outside of the relational database, and I would always advocate that.

What makes this different, is that this blob is a geoigraphic geometry object. MySQL will parse it, index its contents, and then provide many useful geographic functions for finding, retrieving and manipulating that data. That is why it needs to be inside the database.

An increase in memory is probably going to be what I have to do.

jjudge's avatar

Ah, someone else with my exact same problem:

https://stackoverflow.com/questions/60454008/why-does-pdoparam-lob-cause-an-allowed-memory-size-exhausted-error

Turns out it is possibly a 13 year old PDO bug, thoiugh that bug only refers to selecting blobs into a writable stream rather than updating a blob through a readable stream. The bug is that the stream is basically turned into a string in memory before being passed to the database engine. That would certainly do it! I also need to make sure I don't have emulation mode on (that mode builds queries in memory and expands the bindings, rather than passing the bindings to the database engine to put the query togather).

jjudge's avatar

Raising memory to 512MByte did the trick. The 64MByte geometrty took some time to go into the database - about 30 seconds to update - but I guess the database engine had a lot of indexing to do. It's an infrequent event, so does not need to be fast. Everything is running in containers, so we have total control of the environment.

tl;dr: streaming in large blobs was all nice in theory, but a decade-old bug in PHP PDO means it simply does not work, even if PDO is designed on the surface to support it. Throw some more memory at tge problem instead, and move on. Will revisit in 13 years time to see if the bug has been tackled.

Please or to participate in this conversation.