vitnasinec's avatar

Locale aware collection sorting

Hi, how do I sort strings alphabetically when they contain accents and other special characters? For me it's primarily Czech. This is the correct order:

$characters = ['a', 'á', 'c', 'č', 'd', 'ď', 'e', 'é', 'ě', 'i', 'í', 'n', 'ň', 'o', 'ó', 'r', 'ř', 's', 'š', 't', 'ť', 'u', 'ú', 'ů', 'z', 'ž'];

but

collect($characters)->sort(SORT_LOCALE_STRING);

sorts all the accented characters last

['a', 'c', 'd', 'e', 'i', 'n', 'o', 'r', 's', 't', 'u', 'z', 'á', 'é', 'í', 'ó', 'ú', 'č', 'ď', 'ě', 'ň', 'ř', 'š', 'ť', 'ů', 'ž']

I have set the app.locale config value to cs, and I have tried setting the locale with

setlocale(LC_ALL, 'cs_CZ');

Nothing works. The only way I could find that works is to use Collator:

$collator = new Collator(config('app.locale'));
$collator->sort($characters);

I ended up creating a collection macro for that, but still, ->sort(SORT_LOCALE_STRING) should work, shouldn't it?

Thanks

LaryAI's avatar

You've run into a common issue with locale-specific sorting in PHP. Sorting with the SORT_LOCALE_STRING flag only gives the expected order when the requested locale is actually installed and active on the server, which is often not the case for non-English locales.

The Collator class from the intl extension is indeed the recommended way to handle locale-aware sorting in PHP. This class provides a much more reliable sorting mechanism for strings based on locale settings. Here's how you can use it effectively:

  1. Ensure the intl extension is enabled in your PHP installation.
  2. Use the Collator class to sort your array as you've already discovered.
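The two steps above can be sketched in plain PHP as follows. This is a minimal illustration, not a definitive implementation: sortForLocale is a hypothetical helper name, and the word list and the 'cs_CZ' locale are chosen only for the example.

```php
<?php
// Hypothetical helper: returns a copy of $items sorted by the collation
// rules of $locale. Requires the intl extension (it provides Collator).
function sortForLocale(array $items, string $locale): array
{
    $collator = new Collator($locale);
    // Collator::sort() sorts in place and reindexes the array from 0.
    $collator->sort($items);
    return $items;
}

// Step 1: make sure intl is available before touching Collator.
if (!extension_loaded('intl')) {
    echo "The intl extension is not loaded.\n";
} else {
    // Step 2: sort with Collator.
    print_r(sortForLocale(['žába', 'car', 'čaj', 'zebra', 'cesta'], 'cs_CZ'));
}
```

In a Laravel application the locale argument could come from config('app.locale') instead of a hard-coded string.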

Since you mentioned that you've created a collection macro for this, I'll provide a more detailed example of how you might implement such a macro for a Laravel collection. This will encapsulate the sorting logic, making it reusable across your application:

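use Collator; // global class from the intl extension; needed in a namespaced file
use Illuminate\Support\Collection;

// Register a macro for locale-aware sorting
Collection::macro('localeSort', function () {
    // Build a collator for the application's configured locale (e.g. "cs")
    $collator = new Collator(config('app.locale'));
    $items = $this->all();
    // Collator::sort() sorts in place and reindexes the array from 0
    $collator->sort($items);
    return new static($items);
});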

// Usage
$characters = ['a', 'á', 'c', 'č', 'd', 'ď', 'e', 'é', 'ě', 'i', 'í', 'n', 'ň', 'o', 'ó', 'r', 'ř', 's', 'š', 't', 'ť', 'u', 'ú', 'ů', 'z', 'ž'];
$sortedCharacters = collect($characters)->localeSort();

print_r($sortedCharacters->all());

The localeSort macro can be registered in a service provider (typically in its boot() method) or in a bootstrap file, so that it is loaded early in your application's lifecycle. Once defined, you can call localeSort on any collection instance to sort it according to the locale specified in your application's configuration, Czech in your case.
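One design point worth noting: Collator::sort() reindexes the array, so the macro does not preserve keys. If the keys matter, or if you need to sort records by one field, Collator::compare() can serve as a comparison callback. The sketch below uses plain PHP with uasort(); sortByNameForLocale and the sample records are hypothetical, made up for illustration.

```php
<?php
// Hypothetical helper: sorts records by their 'name' field using the
// collation rules of $locale, keeping the original array keys.
function sortByNameForLocale(array $records, string $locale): array
{
    $collator = new Collator($locale);
    // uasort() preserves keys; Collator::compare() returns a negative,
    // zero, or positive integer like any other comparison callback.
    uasort($records, function (array $a, array $b) use ($collator) {
        return $collator->compare($a['name'], $b['name']);
    });
    return $records;
}

if (extension_loaded('intl')) {
    $people = [
        10 => ['name' => 'Šimon'],
        11 => ['name' => 'Cyril'],
        12 => ['name' => 'Čeněk'],
        13 => ['name' => 'Zdeněk'],
    ];
    print_r(sortByNameForLocale($people, 'cs_CZ'));
}
```

A comparator like this could likely also be passed to a collection's sort() method, which accepts a callback, but check that against the Laravel version you are running.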

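As for why ->sort(SORT_LOCALE_STRING) does not work as expected: that flag compares strings using the C library locale selected by setlocale(), not Laravel's app.locale setting, so changing the config value has no effect on it. setlocale() returns false when the requested locale is not installed on the server, and on many systems the Czech locale is only available under a name with an encoding suffix, such as cs_CZ.UTF-8. The result you posted is the plain byte order of UTF-8 strings, which suggests the locale was never applied. The Collator class does not depend on the system's installed locales; it uses the collation rules from the Unicode CLDR data bundled with ICU, which is why it is generally preferred for precise locale-aware sorting.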
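To check whether the system locale is the culprit, you can inspect the return value of setlocale(). The following is a diagnostic sketch only; selectFirstAvailableLocale is a hypothetical helper, and the candidate names are common spellings of the Czech locale that may or may not exist on your server.

```php
<?php
/**
 * Hypothetical helper: tries each candidate name for LC_COLLATE and
 * returns the one that was accepted, or false if none is installed.
 *
 * @param string[] $candidates
 * @return string|false
 */
function selectFirstAvailableLocale(array $candidates)
{
    // setlocale() accepts an array and uses the first locale that works.
    return setlocale(LC_COLLATE, $candidates);
}

$selected = selectFirstAvailableLocale(['cs_CZ.UTF-8', 'cs_CZ.utf8', 'cs_CZ']);

if ($selected === false) {
    echo "No Czech locale is installed, so SORT_LOCALE_STRING cannot use it.\n";
} else {
    echo "Using locale: {$selected}\n";
    $letters = ['z', 'č', 'a', 'c'];
    sort($letters, SORT_LOCALE_STRING);
    print_r($letters);
}
```

If this prints that no locale is installed, the locale has to be generated at the operating-system level before SORT_LOCALE_STRING can use it, whereas Collator works without that step.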
