martinszeltins

I'm seeing Chinese characters after converting UTF-16 to UTF-8

I have a UTF-16 encoded .csv file. The field delimiter is a comma (,) and the string delimiter is a double quote ("). Changing the encoding outside of PHP is not an option. I need to turn the file into an associative array with PHP, but I can't get PHP to handle the UTF-16 encoding correctly.

Here are the contents of numbers.csv:

"Phone number"
"+1 55500718"
"+1 55551919"

I have successfully uploaded the file like this:

$file = $request->file('numbers-file');
$filename = $file->getPathName();

I also have a custom function, csvToAssocArray(), that turns it into an associative array. It uses mb_convert_encoding to convert each line from UTF-16 to UTF-8.

function csvToAssocArray($filename)
{
    $csvAsArray = array_map(function ($data) {
        return str_getcsv(mb_convert_encoding($data, 'UTF-8', 'UTF-16'), ",");
    }, file($filename));

    $header = array_shift($csvAsArray);

    $csv = array();

    foreach ($csvAsArray as $row) {
        $csv[] = array_combine($header, $row);
    }

    return $csv;
}
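A possible fix, sketched under assumptions: file() splits the raw bytes on 0x0A before any decoding happens, which can cut a two-byte UTF-16 character in half. Decoding the whole file first and only then parsing it avoids that. utf16CsvToAssocArray() below is a hypothetical name, and treating a file without a byte order mark as little-endian is an assumption (it is what the dumps in this thread seem to point to), not something PHP decides for you.

```php
<?php

// Hypothetical replacement for csvToAssocArray(): decode the whole file
// first, then parse it, so no UTF-16 code unit is ever split in half.
// Assumption: the file is UTF-16; without a BOM it is treated as UTF-16LE.
function utf16CsvToAssocArray(string $filename): array
{
    $raw = file_get_contents($filename);
    if ($raw === false) {
        throw new RuntimeException("Cannot read $filename");
    }

    // Use the byte order mark if there is one, and strip it.
    $encoding = 'UTF-16LE';
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        $raw = substr($raw, 2);
    } elseif (strncmp($raw, "\xFE\xFF", 2) === 0) {
        $encoding = 'UTF-16BE';
        $raw = substr($raw, 2);
    }

    $text = mb_convert_encoding($raw, 'UTF-8', $encoding);

    // Parse the UTF-8 text with fgetcsv() from an in-memory stream; this also
    // copes with quoted fields that contain line breaks.
    $stream = fopen('php://temp', 'r+');
    fwrite($stream, $text);
    rewind($stream);

    $header = fgetcsv($stream, 0, ',', '"', '\\');
    $csv = [];
    if ($header === false) {
        fclose($stream);
        return $csv; // empty file
    }

    while (($row = fgetcsv($stream, 0, ',', '"', '\\')) !== false) {
        if ($row === [null]) {
            continue; // fgetcsv() returns [null] for a blank line
        }
        $csv[] = array_combine($header, $row);
    }
    fclose($stream);

    return $csv;
}
```

In the controller it would be called as utf16CsvToAssocArray($request->file('numbers-file')->getPathName()). The escape argument is passed explicitly because newer PHP versions warn when it is left to its default.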

But the end result of dd(csvToAssocArray($filename)) looks like this:

array:3 [▼
  0 => array:1 [▼
    "∀倀栀漀渀攀 渀甀洀戀攀爀∀" => "+1 55500718"
  ]

  1 => array:1 [▼
    "∀倀栀漀渀攀 渀甀洀戀攀爀∀" => "+1 55551919"
  ]
]

What is happening here?

martinszeltins

Here is the output I got after running the code below:

https://pastebin.com/dpwsUbBH

$text = file_get_contents($file->getPathName());

foreach (mb_list_encodings() as $chr) {
    echo mb_convert_encoding($text, 'UTF-8', $chr) . " : " . $chr . "<br>";
}
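Instead of trying every encoding, it may be quicker to inspect the first bytes. The sketch below uses a hypothetical helper, guessUtf16ByteOrder() (not a PHP or Laravel function). A byte order mark settles the question: FF FE means little-endian and FE FF means big-endian. Without one, the helper relies on a heuristic that only holds for mostly-ASCII text like this CSV: the zero byte of each two-byte code unit sits at odd offsets in UTF-16LE and at even offsets in UTF-16BE.

```php
<?php

// Hypothetical diagnostic helper: guess the byte order of a UTF-16 buffer.
// Returns 'UTF-16LE', 'UTF-16BE', or null when it cannot tell.
function guessUtf16ByteOrder(string $raw): ?string
{
    // 1. A byte order mark settles it.
    if (strncmp($raw, "\xFF\xFE", 2) === 0) {
        return 'UTF-16LE';
    }
    if (strncmp($raw, "\xFE\xFF", 2) === 0) {
        return 'UTF-16BE';
    }

    // 2. No BOM: count zero bytes at even and odd offsets in a sample.
    //    Only meaningful for text that is mostly ASCII.
    $sample = substr($raw, 0, 1024);
    $evenZeros = 0;
    $oddZeros = 0;
    for ($i = 0, $n = strlen($sample); $i < $n; $i++) {
        if ($sample[$i] === "\0") {
            if ($i % 2 === 0) {
                $evenZeros++;
            } else {
                $oddZeros++;
            }
        }
    }

    if ($oddZeros > $evenZeros) {
        return 'UTF-16LE';
    }
    if ($evenZeros > $oddZeros) {
        return 'UTF-16BE';
    }
    return null; // no zero bytes, or a tie: probably not ASCII-range UTF-16
}
```

With the uploaded file, the result could be passed straight to mb_convert_encoding($text, 'UTF-8', $encoding) after checking it is not null.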
martinszeltins

Also, if I switch to UTF-16LE, the left side (the header) is correct, but the numbers come out as Chinese characters:

array:3 [▼
  0 => array:1 [▼
    "Phone number" => "∀⬀㌀㜀㄀ ㈀㘀㄀  㜀㄀㠀∀਀"
  ]
  1 => array:1 [▼
    "Phone number" => "∀⬀㌀㜀㄀ ㈀㔀㄀㔀㄀㤀㄀㤀∀਀"
  ]
  2 => array:1 [▼
    "Phone number" => null
  ]
]
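A likely explanation, sketched with a small made-up sample and assuming the file is UTF-16LE without a BOM and with plain \n line endings (the trailing ਀, U+0A00, in the dump above points that way). file() splits on the single byte 0x0A, but in UTF-16LE a newline is the two bytes 0A 00. The 00 half ends up at the start of the next "line". The first line therefore stays in little-endian order, while every later line is shifted by one byte and reads like big-endian. That matches both dumps: with 'UTF-16', which mbstring evidently reads as big-endian here, only the header is garbled, and with 'UTF-16LE' only the values are. The lone trailing 00 byte is probably what produces the extra third row.

```php
<?php

// Hypothetical two-line sample, encoded the way the uploaded file seems to be:
// UTF-16LE, no BOM, "\n" line endings.
$utf8 = "\"Phone number\"\n\"+1 55500718\"\n";
$utf16le = mb_convert_encoding($utf8, 'UTF-16LE', 'UTF-8');

// Read it back with file(), the same way csvToAssocArray() does.
$tmp = tempnam(sys_get_temp_dir(), 'u16');
file_put_contents($tmp, $utf16le);
$lines = file($tmp);
unlink($tmp);

// file() cuts after each 0x0A byte, leaving the 0x00 half of every newline
// at the front of the following line (and alone as a final "line").
foreach ($lines as $i => $line) {
    printf("line %d: %d bytes, first two bytes %s\n", $i, strlen($line), bin2hex(substr($line, 0, 2)));
}

// Line 0 is still little-endian once its dangling 0x0A byte is dropped...
echo mb_convert_encoding(substr($lines[0], 0, -1), 'UTF-8', 'UTF-16LE'), "\n";
// ...while line 1 is shifted by one byte, so it only decodes as big-endian.
echo mb_convert_encoding($lines[1], 'UTF-8', 'UTF-16BE');
```

Decoding the whole file before splitting it into lines, rather than line by line, avoids the shift.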
ethor

I have the same problem. Did you manage to solve this?
