06 May 2013

Convert Comma Delimited CSV to Pipe Delimited


Problem Statement

I have a comma delimited csv file that contained quoted strings that could contain commas and newlines. Unfortunately the program needed to process this file could not handle quoted strings, so it was treating commas as new fields and newlines as creating a new record.

Solution

I came up with the below code that uses awk to convert the csv to pipe delimited, replace the newlines with a space, and strip out the quotes. You can change the delimiter to anything, but since my data did not contain pipes, I used that.

Disclaimer

This does not handle escaped quotes, but since my data did not have any it works for me.

Code

awk -F"\"" '
     BEGIN{
          j=0; OFS="\"";
     }
     {
     if (NF%2==1 && j == 0){
          ORS="\n";
          for(i=1;i<=NF;i++){
               if(i%2){gsub(",","|",$i);}
          }
     }
     else if(NF%2==0 && j == 0){
          ORS=" ";
          for(i=1;i<=NF;i++){
               if(i%2){gsub(",","|",$i);}
          }
          j=1;
     }
     else if(NF%2==0 && j == 1){
          ORS="\n";
          for(i=1;i<=NF;i++){
               if(i+1%2){gsub(",","|",$i);}
          }
          j=0;
     }
     else if(NF%2==1 && j == 1){
          ORS=" ";
          for(i=1;i<=NF;i++){
               if(i+1%2){gsub(",","|",$i);}
          }
          j=1;
     }
     print $0;}
     ' file.csv | tr -d '"' > file.pipe