Comments on: CSV parsing with awk
https://backreference.org/2010/04/17/csv-parsing-with-awk/
Proudly uncool and out of fashion

By: Marco Coletti, Fri, 11 Mar 2016 16:43:49 +0000
https://backreference.org/2010/04/17/csv-parsing-with-awk/#comment-25256

We should be using a recent version of gawk anyway, so there is a rather simple solution: use FPAT (field pattern) instead of FS (field separator), as documented in the gawk manual:
https://www.gnu.org/software/gawk/manual/html_node/Splitting-By-Content.html

However, the regexp shown there is not fully RFC 4180 compliant, because it does not account for an escaped quote [""] inside a quoted field ["], as in ["He shouted ""Hello"""].
Here is my solution:

-----------------------------------------------------------------------------
BEGIN {
  FPAT = "(\"([^\"]|\"\")*\")|([^,\"]*)"
}
{
  for (i = 1; i <= NF; i++) {
    $i = gensub(/\"\"/,"\"","g",gensub(/^\"|\"$/,"","g",$i))
  }
  # at this point $1, $2, $3, ... hold the original data, with the
  # enclosing quotes removed and doubled quotes unescaped
  # (note that assigning to $i also rebuilds $0 using OFS)
}
-----------------------------------------------------------------------------
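
A quick way to try the FPAT pattern above is to feed it a single line and print each field. This is only a sketch: it assumes gawk 4.0 or later is on the PATH as `gawk`, the sample line is made up, and it stores the cleaned value in a local variable `f` instead of assigning back to `$i`.

```shell
# Assumption: gawk >= 4.0 is installed as `gawk` (FPAT is a gawk extension).
# The sample line is hypothetical: one plain field, one field with escaped
# quotes, and one quoted field that contains a comma.
fields=$(printf '%s\n' 'a,"He shouted ""Hello""","x,y"' |
gawk '
BEGIN {
  # a field is either a quoted string (with "" as an escaped quote)
  # or a run of characters that are neither comma nor quote
  FPAT = "(\"([^\"]|\"\")*\")|([^,\"]*)"
}
{
  for (i = 1; i <= NF; i++) {
    # strip the enclosing quotes, then turn "" into "
    f = gensub(/""/, "\"", "g", gensub(/^"|"$/, "", "g", $i))
    printf "%d=[%s]\n", i, f
  }
}')
printf '%s\n' "$fields"
```

Printing the fields inside brackets makes it easy to see that the comma inside the third field was not treated as a separator.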
By: jee, Sat, 23 Jan 2016 10:15:19 +0000
https://backreference.org/2010/04/17/csv-parsing-with-awk/#comment-25244

In reply to Jarno Suni.

This solution only works with gawk 4.0 and above.

By: Jarno Suni, Wed, 12 Aug 2015 20:05:02 +0000
https://backreference.org/2010/04/17/csv-parsing-with-awk/#comment-25212

Here is a solution for parsing CSV data (in the format defined in RFC 4180, linked above) using gawk's patsplit function:

http://lists.gnu.org/archive/html/bug-gawk/2015-07/msg00002.html
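
For readers who have not used patsplit() before, here is a minimal sketch of the general idea. It is not the code from the linked message. It assumes gawk 4.0 or later is on the PATH as `gawk`, the sample line is made up, and the pattern deliberately requires at least one character in an unquoted field, so empty fields are skipped rather than preserved.

```shell
# Assumption: gawk >= 4.0 is installed as `gawk` (patsplit is a gawk extension).
# Limitation of this sketch: empty unquoted fields are not returned.
parts=$(printf '%s\n' 'id,"Smith, John","say ""hi"""' |
gawk '
{
  # patsplit fills array f with the pieces of $0 that match the pattern
  # and returns how many pieces it found
  n = patsplit($0, f, /("([^"]|"")*")|([^,"]+)/)
  for (i = 1; i <= n; i++) {
    v = f[i]
    if (v ~ /^"/) {
      v = substr(v, 2, length(v) - 2)   # drop the enclosing quotes
      gsub(/""/, "\"", v)               # unescape doubled quotes
    }
    printf "%d=[%s]\n", i, v
  }
}')
printf '%s\n' "$parts"
```

Unlike FPAT, patsplit() works on any string and leaves $0 and NF alone, which is handy when only one column of a record holds CSV-like data.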

By: Ralph Little, Tue, 24 Mar 2015 22:05:45 +0000
https://backreference.org/2010/04/17/csv-parsing-with-awk/#comment-25188

Here's a regex that can be used with the match() example above:

([^\\,"]|(\\.))*($|,)|(^"([^"\\]|(\\.))*"($|,))

It lets you parse a mixture of quoted ("text") and unquoted (text) fields in the CSV, and you can put a backslash before any character to have it treated as data.
The extracted data needs to be postprocessed to remove the quoting where appropriate.
Just another idea.

Example:
"Test",Example containing \",Example containing \,,Another example containing \\

By: Dan, Sat, 04 May 2013 23:51:55 +0000
https://backreference.org/2010/04/17/csv-parsing-with-awk/#comment-24892

For a different approach, see https://github.com/dbro/csvquote. It's a script I wrote that sanitizes the quoted data so that awk can work with it easily (no FPAT required, handles double quote marks), and then restores the special characters after awk is done.
