philkoury's avatar

Stripping Invisible Characters before passing to DB

I recently upgraded a project from Laravel 3 to Laravel 11 (I know). I also upgraded the database and my tables use InnoDB with utf8mb4_unicode_ci.

Everything was running pretty well for a while, but I just had this issue pop up and I'm not entirely sure of the best way to handle it.

I have invisible characters that are sometimes posted by users (they don't know - they just copy and paste from Word docs) such as a Zero Width Non-Joiner (<0x200c> or U+200C) and they throw this exception:

SQLSTATE[HY000]: General error: 3988 Conversion from collation utf8mb4_unicode_ci into latin1_swedish_ci impossible for parameter

I wasn't sure how to handle this yet, so I temporarily wrapped the input in str()->ascii() and that stripped the offending characters. I want a more appropriate solution though since this seems like a very hack-ish way to fix this. Is there some way to globally sanitize these things, or is there something I missed in Laravel docs?

LaryAI's avatar

To handle the issue of invisible characters being posted by users and causing database errors, you can create a middleware in Laravel to sanitize the input before it reaches your database. This way, you can globally sanitize the input and remove any unwanted characters.

Here's a step-by-step solution:

  1. Create a Middleware: First, create a new middleware using the Artisan command:

    php artisan make:middleware SanitizeInput
    
  2. Implement the Middleware: Open the newly created middleware file located at app/Http/Middleware/SanitizeInput.php and implement the logic to remove invisible characters.

    <?php
    
    namespace App\Http\Middleware;
    
    use Closure;
    use Illuminate\Http\Request;
    
    class SanitizeInput
    {
        /**
         * Handle an incoming request.
         *
         * @param  \Illuminate\Http\Request  $request
         * @param  \Closure  $next
         * @return mixed
         */
        public function handle(Request $request, Closure $next)
        {
            $input = $request->all();
    
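            array_walk_recursive($input, function (&$value) {
                // Only touch strings; nulls, numbers, and uploaded files pass through unchanged
                if (is_string($value)) {
                    // Remove zero-width space, non-joiner, joiner, and the byte order mark.
                    // preg_replace() returns null on invalid UTF-8, so fall back to the original.
                    $value = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $value) ?? $value;
                }
            });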
    
            $request->merge($input);
    
            return $next($request);
        }
    }
    
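  3. Register the Middleware: Laravel 11 no longer ships app/Http/Kernel.php, so register the middleware in bootstrap/app.php instead. You can append it to the global middleware stack or to a specific group (e.g., web or api).

    ->withMiddleware(function (Middleware $middleware) {
        // Append to the global stack so every request passes through it
        $middleware->append(\App\Http\Middleware\SanitizeInput::class);
    })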
    
  4. Test the Middleware: Ensure that the middleware is working correctly by testing your application. Any input containing invisible characters should now be sanitized before being processed by your application and stored in the database.

This approach ensures that all incoming requests are sanitized globally, preventing the issue of invisible characters causing database errors.
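
For step 4, a quick check that runs outside Laravel might look like the sketch below. `stripInvisible` is a hypothetical helper name, not something Laravel provides, and the sample input is made up. It applies the same pattern as the middleware to a nested array and confirms that non-string values are left alone.

```php
<?php

/**
 * Hypothetical helper: strips U+200B..U+200D and U+FEFF from every
 * string in a (possibly nested) array, leaving other values untouched.
 */
function stripInvisible(array $input): array
{
    array_walk_recursive($input, function (&$value) {
        if (is_string($value)) {
            // Fall back to the original value if preg_replace() fails
            $value = preg_replace('/[\x{200B}-\x{200D}\x{FEFF}]/u', '', $value) ?? $value;
        }
    });

    return $input;
}

// Made-up request data containing a zero-width non-joiner and a zero-width space
$dirty = [
    'title' => "Hello\u{200C}World",
    'tags'  => ["a\u{200B}b", 42, null],
];

$clean = stripInvisible($dirty);

var_dump($clean['title'] === 'HelloWorld');
var_dump($clean['tags'] === ['ab', 42, null]);
```

Both checks should report true. Note that this only removes the four listed code points; other characters the column's character set cannot store would still trigger the same exception.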

RicardoAugustoTulio's avatar

As the forum's AI suggested, I think you should use middleware. With it you can handle all inputs globally.

Since you are creating middleware to handle inputs anyway, you can also add the ezyang/htmlpurifier package to prevent script injection and related attacks.

Something like:

// Imports needed at the top of the middleware file
use Closure;
use HTMLPurifier;
use HTMLPurifier_Config;
use Illuminate\Support\Str;

public function handle($request, Closure $next) {

    // purifier (optional): strips scripts and other unsafe markup
    $purifier = new HTMLPurifier(HTMLPurifier_Config::createDefault());

    $input = $request->except('_token');

    // Walk nested arrays too, and only touch string values.
    // Note: Str::ascii() transliterates every non-ASCII character,
    // not just the invisible ones.
    array_walk_recursive($input, function (&$value) use ($purifier) {
        if (is_string($value)) {
            $value = $purifier->purify(Str::ascii($value));
        }
    });

    $request->merge($input);

    return $next($request);
}
philkoury's avatar

Thanks for the reply. I feel like I’m missing something here entirely though. While it might be a solution to the problem, it doesn’t feel like I am addressing the correct problem.

This issue can’t be incredibly uncommon so creating a middleware makes me feel like there is a different approach entirely that isn’t being considered. Or is everyone creating a middleware as part of their initial install of Laravel?

MohamedTammam's avatar

In the config/database.php set charset and collation values to match your database, like utf8mb4 and utf8mb4_unicode_ci in the corresponding database driver you're using.

philkoury's avatar

@MohamedTammam This is from my config/database.php. I'm using MySQL.

'charset' => 'utf8mb4',
'collation' => 'utf8mb4_unicode_ci',
philkoury's avatar

@MohamedTammam I know it's been a while but I just wanted to pop in and say that this is correct, but also, the columns needed to be updated with the right charset and collation as well.
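
For anyone landing here later, a rough sketch of that column fix follows. It assumes the affected tables were still on latin1 (which is what the latin1_swedish_ci in the error suggests) and that the target is utf8mb4 with utf8mb4_unicode_ci, as in the config above. The helper name `utf8mb4ConversionSql` and the table names `posts` and `comments` are placeholders, not names from the actual project.

```php
<?php

/**
 * Hypothetical helper: builds the MySQL statement that converts a table,
 * including its existing text columns, to utf8mb4 / utf8mb4_unicode_ci.
 */
function utf8mb4ConversionSql(string $table): string
{
    // Double any backticks so the identifier cannot break out of its quotes
    $quoted = '`' . str_replace('`', '``', $table) . '`';

    return "ALTER TABLE {$quoted} CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci";
}

// Placeholder table names; replace with the tables that still use latin1
foreach (['posts', 'comments'] as $table) {
    echo utf8mb4ConversionSql($table), ";\n";
}
```

Each generated statement can be run in a MySQL client, or passed to DB::statement() inside a migration. CONVERT TO rewrites the table, so it is worth taking a backup and trying it on a copy first.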
