/home/josephspurrier

Parse CSV with PHP

A comma-separate values (CSV) file is a common file that contains data from a database or spreadsheet and separates out the data using commas. The first row of the file usually contains the header which is made up of the column names separated by commas. All subsequent rows will contain the actual records from the database or spreadsheet which are also separated by commas.

id,first_name,last_name
1,John,Doe
2,"Mary Jane",Doe

If you look at the second record (third row), you’ll notice there are quotes around the first name. Usually, any fields that contain spaces or commas will be in double quotes to ensure the structure is easy to parse. You’ve probably also noticed I’ve said “usually” a lot when talking about the CSV format. There is no standard CSV format so it will vary between applications and operating systems. Most of the time, it is not an issue if you are staying consistent with the types of applications. For instance, if you export a table from MySQL into a CSV format using the MySQL Workbench on a Windows operating system, you should have no problems importing the file into MySQL on a Linux operating system. The problems start to occur when you export data from one application and then want to import it into a different application.

If you can determine the format required by the application importing the CSV, you can write a quick script in PHP to modify the format to meet the requirements. One of the requirements I’ve had in the past was I needed the data in a format that would easily import into MySQL which meant I needed to escape characters like single and double quotes. I build a lot of debugging functionality into my applications so I can troubleshoot issues easier.

Below is the source code for a working PHP function that will read a CSV file line by line, strip any hidden characters out of the fields, escape any invalid characters for MySQL, and then return an associative array.

I generated a CSV filled with dummy data from Mockaroo and ran the script at the end of this article to generate this output:

Array
(
    [0] => Array
        (
            [id] => 1
            [first_name] => Louise
            [last_name] => Robinson
            [email] => lrobinson@skyble.name
            [country] => Tajikistan
            [ip_address] => 20.92.31.194
        )

    [1] => Array
        (
            [id] => 2
            [first_name] => Stephanie
            [last_name] => Smith
            [email] => ssmith@skynoodle.info
            [country] => Antigua and Barbuda
            [ip_address] => 183.128.227.44
        )
....
}

Below is the PHP code used to parse CSV files.

// Auto detect line endings
ini_set('auto_detect_line_endings', true);

function loadCSV($file)
{
    // Create an array to hold the data
    $arrData = array();

    // Create a variable to hold the header information
    $header = NULL;

    // Connect to the database
    $db = <a href="http://www.php.net/mysqli_connect">mysqli_connect</a>('localhost','username','password','database')
      or die("Failed to connect to MySQL: " . mysqli_connect_error());

    // If the file can be opened as readable, bind a named resource
    if (($handle = <a href="http://www.php.net/fopen">fopen</a>($file, 'r')) !== FALSE)
    {
        // Loop through each row
        while (($row = <a href="http://www.php.net/fgetcsv">fgetcsv</a>($handle)) !== FALSE)
        {
            // Loop through each field
            foreach($row as &$field)
            {
                // Remove any invalid or hidden characters
                $field = <a href="http://www.php.net/preg_replace">preg_replace</a>('/[\x00-\x1F\x80-\xFF]/', '', $field);

                // Escape characters for MySQL (single quotes, double quotes, etc.)
                $field = <a href="http://www.php.net/mysqli_escape_string">mysqli_escape_string</a>($db, $field);
            }

            // If the header has been stored
            if ($header)
            {
                // Create an associative array with the data
                $arrData[] = <a href="http://www.php.net/array_combine">array_combine</a>($header, $row);
            }
            // Else the header has not been stored
            else
            {
                // Store the current row as the header
                $header = $row;
            }
        }

        // Close the file pointer
        <a href="http://www.php.net/fclose">fclose</a>($handle);
    }

    // Close the MySQL connection
    <a href="http://jp2.php.net/mysqli_close">mysqli_close</a>($db);

    return $arrData;
}

// Load the CSV and return an associative array
$data = loadCSV('test.csv');

// Output the contents of the array
<a href="http://www.php.net/print_r">print_r</a>($data);

View loadCSV.php on GitHub

#php