How to parse a CSV file in Bash?
I’m working on a long Bash script. I want to read cells from a CSV file into Bash variables. I can parse lines and the first column, but not any other column. Here’s my code so far:
```bash
cat myfile.csv | while read line
do
    read -d, col1 col2 < <(echo $line)
    echo "I got:$col1|$col2"
done
```
It's only printing the first column. As an additional test, I tried the following:

```bash
read -d, x y < <(echo a,b,)
```

and $y is empty. So I tried:

```bash
read x y < <(echo a b)
```

and $y is b. Why?
6 Answers
You need to use IFS instead of -d :
```bash
while IFS=, read -r col1 col2
do
    echo "I got:$col1|$col2"
done < myfile.csv
```
To skip a given number of header lines:
```bash
skip_headers=3
while IFS=, read -r col1 col2
do
    if ((skip_headers))
    then
        ((skip_headers--))
    else
        echo "I got:$col1|$col2"
    fi
done < myfile.csv
```
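As for why the original `read -d,` attempt fails: `-d,` tells read to treat the comma as the *end-of-input* delimiter, not as a field separator. The first call therefore consumes only the text up to the first comma and word-splits that on whitespace, leaving nothing for the second variable. A quick sketch:

```shell
# -d, makes read stop at the FIRST comma; it does not split on commas:
read -d, x y < <(echo a,b,)
echo "x=$x y=$y"    # x=a y=   (only "a" was read; "b," is still unread)

# With the default whitespace IFS, both words are assigned:
read x y < <(echo a b)
echo "x=$x y=$y"    # x=a y=b
```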
Note that for general-purpose CSV parsing you should use a specialized tool that can handle quoted fields with internal commas, among other issues that Bash can't handle by itself. Examples of such tools are csvtool and csvkit.
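To see why the plain IFS approach falls over on quoted fields, here is a minimal sketch (the sample row is made up for illustration):

```shell
# A quoted field containing a comma gets split in the middle:
line='1,"Doe, John",NY'
IFS=, read -r f1 f2 f3 <<< "$line"
echo "$f2"    # prints: "Doe    -- the field was mangled at the embedded comma
```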
The proposed solution is fine for very simple CSV files, that is, if the headers and values are free of commas and embedded quotation marks. It is actually quite tricky to write a generic CSV parser (especially since there are several CSV "standards"). One approach to making CSV files more amenable to *nix tools is to convert them to TSV (tab-separated values), e.g. using Excel.
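One way to sketch that CSV-to-TSV conversion from the shell is with Python's standard csv module (this assumes python3 is installed; the file name is hypothetical):

```shell
# Hypothetical sample with a quoted, comma-containing field:
printf '%s\n' 'id,name' '1,"Doe, John"' > demo.csv

# Convert CSV to TSV; quoting is handled correctly by the csv module:
python3 -c '
import csv, sys
w = csv.writer(sys.stdout, delimiter="\t", lineterminator="\n")
w.writerows(csv.reader(sys.stdin))
' < demo.csv
```

The resulting tab-separated output is then safe to feed to cut, awk, sort, and friends.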
@Zsolt: There's no reason that should be the case. You must have a typo or a stray non-printing character.
@DennisWilliamson You should enclose the separator in quotes, e.g. when using ; : while IFS=";" read col1 col2; do .
@thomas.mc.work: That's true in the case of semicolons and other characters that are special to the shell. In the case of a comma, it's not necessary and I tend to prefer to omit characters that are unnecessary. For example, you could always specify variables for expansion using curly braces (e.g. ${var}), but I omit them when they're not necessary. To me, it looks cleaner.
@DennisWilliamson, for some time now, the bash source tree has offered a loadable builtin CSV parser! Have a look at my answer! Of course there are some limitations.
Coming late to this question. Since this question is about Bash, since newer Bash versions offer new features, and since none of the already-posted answers shows this powerful and standards-compliant way of doing precisely this:
Parsing CSV files under bash, using a loadable module
Conforming to RFC 4180, a sample CSV row like

```
12,22.45,"Hello, ""man"".","A, b.",42
```

must be split into these five fields:

```
1  12
2  22.45
3  Hello, "man".
4  A, b.
5  42
```
Bash loadable C compiled modules
Under bash, you can create, edit, and use loadable C compiled modules. Once loaded, they work like any other builtin! (You may find more information in the source tree. 😉)
The current source tree (Oct 15 2021, bash V5.1-rc3) contains a bunch of samples:
```
accept      listen for and accept a remote network connection on a given port
asort       Sort arrays in-place
basename    Return non-directory portion of pathname.
cat         cat(1) replacement with no options - the way cat was intended.
csv         process one line of csv data and populate an indexed array.
dirname     Return directory portion of pathname.
fdflags     Change the flag associated with one of bash's open file descriptors.
finfo       Print file info.
head        Copy first part of files.
hello       Obligatory "Hello World" / sample loadable.
...
tee         Duplicate standard input.
template    Example template for loadable builtin.
truefalse   True and false builtins.
tty         Return terminal name.
uname       Print system information.
unlink      Remove a directory entry.
whoami      Print out username of current user.
```
There is a full working CSV parser ready to use in the examples/loadables directory: csv.c!
Under Debian GNU/Linux based systems, you may have to first install the bash-builtins package (apt install bash-builtins).
Using loadable bash-builtins:
```bash
enable -f /usr/lib/bash/csv csv
```
From there, you could use csv as a bash builtin.
With my sample: 12,22.45,"Hello, ""man"".","A, b.",42
```
csv -a myArray '12,22.45,"Hello, ""man"".","A, b.",42'
printf "%s\n" "${myArray[@]}" | cat -n
     1  12
     2  22.45
     3  Hello, "man".
     4  A, b.
     5  42
```
Then in a loop, processing a file:

```bash
while IFS= read -r line; do
    csv -a aVar "$line"
    printf "First two columns are: [ '%s' - '%s' ]\n" "${aVar[0]}" "${aVar[1]}"
done < myfile.csv
```
This way is clearly the quickest and most robust, compared to using any other combination of bash builtins or forking to any external binary.
Unfortunately, depending on your system implementation, if your version of bash was compiled without loadable support, this may not work.
Complete sample with multiline CSV fields.
Conforming to RFC 4180, a string like this single CSV row:

```
12,22.45,"Hello ""man"", This is a good day, today!","A, b.",42
```

must be split into these five fields:

```
1  12
2  22.45
3  Hello "man", This is a good day, today!
4  A, b.
5  42
```
Full sample script for parsing CSV containing multilines fields
Here is a small sample file with one header line, 4 columns, and 3 rows. Because two fields contain newlines, the file is 6 lines long.
```
Id,Name,Desc,Value
1234,Cpt1023,"Energy counter",34213
2343,Sns2123,"Temperatur sensor
to trigg for alarm",48.4
42,Eye1412,"Solar sensor ""Day /
Night""",12199.21
```
And a small script able to parse this file correctly:
```bash
#!/bin/bash

enable -f /usr/lib/bash/csv csv

file="sample.csv"
exec {FD}<"$file"

# read the header line to build the output format and count columns
read -ru $FD line
csv -a headline "$line"
printf -v fieldfmt '%-8s: "%%q"\n' "${headline[@]}"
numcols=${#headline[@]}

while read -ru $FD line; do
    # while the row parses into fewer fields than the header has,
    # a quoted field must continue on the next physical line
    while csv -a row "$line"; (( ${#row[@]} < numcols )); do
        read -ru $FD sline || break
        line+=$'\n'"$sline"
    done
    printf "$fieldfmt\n" "${row[@]}"
done
```
This may render: (I've used printf "%q" to represent non-printable characters like newlines as $'\n')
```
Id      : "1234"
Name    : "Cpt1023"
Desc    : "Energy\ counter"
Value   : "34213"

Id      : "2343"
Name    : "Sns2123"
Desc    : "$'Temperatur sensor\nto trigg for alarm'"
Value   : "48.4"

Id      : "42"
Name    : "Eye1412"
Desc    : "$'Solar sensor "Day /\nNight"'"
Value   : "12199.21"
```
You could find a full working sample there: csvsample.sh.txt or csvsample.sh.
Note:
In this sample, I use the header line to determine row width (number of columns). If your header line could contain newlines (or if your CSV uses more than one header line), you will have to pass the number of columns (and the number of header lines) as arguments to your script.
Warning:
Of course, parsing CSV this way is not perfect! It works for many simple CSV files, but beware of encoding and security! For example, this module can't handle binary fields!
Note about quoted multi-line fields
In particular, if a multi-line field is located in the last column, this method won't loop correctly up to the second quote.

To handle this, you have to check quote parity in $line before parsing it with the csv module.
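A minimal sketch of such a quote-parity check (assuming RFC 4180 "" escaping, where an unclosed field always leaves an odd number of double quotes on the line):

```shell
# Count double quotes on the line; an odd count means a quoted field
# continues on the next physical line, so keep appending before parsing.
line='42,Eye1412,"Solar sensor ""Day /'
quotes=$(printf '%s' "$line" | tr -cd '"')
if (( ${#quotes} % 2 )); then
    echo "odd quote count: read and append the next line first"
fi
```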
Editing CSV files in Ubuntu [closed]
Questions seeking product, service, or learning material recommendations are off-topic because they become outdated quickly and attract opinion-based answers. Instead, describe your situation and the specific problem you're trying to solve. Share your research. Here are a few suggestions on how to properly ask this type of question.
```
This,is,data,with,a,header
2,2,3,4,,
1,,3,,6,6
,5,3,5,5,6
1,2.
1,2,3,4,8,6
1,,9,,5,9
-1,,3,4,5,6
1,2,0,4,5,6
```
I've been using OpenOffice, but it takes about 5 clicks to turn off the default behaviour of quoting all of the fields. I'd like to find something lightweight and easy to use that will allow inserting/deleting data, and column-based sorting.
8 Answers
For vim, there's a nice plugin csv.vim.
I just came from that plugin, looking for an alternative. It has huge performance issues when the CSV is larger; currently it hangs in a loop on a CSV with only 500 lines.
The Java CSV editors (e.g. csveditor, reCsvEditor) may be worth a look.
You might use gnumeric to this end. On my system (Crunchbang) and with a file as small as in your example, leafpad consumes about 2M of RAM; gnumeric, 4M; and scalc (from LibreOffice), 34M. Gnumeric clearly is on the lightweight end, and it should detect your separator correctly on opening the file.
But (there is a but...) gnumeric won't let you save the modified file without going through a hurdle of menus. What follows is a Bash script to fix this. The script relies on xsel (a lightweight command-line clipboard manager) to paste the modified spreadsheet content back into your file. If sourced (not run), this script gives you access to two functions: gn, to open the file in gnumeric, and gp, to paste the content back into the file and close gnumeric.
(Personally, I source this script in my .bashrc to have the gn and gp functions available whenever I open a terminal.)
```bash
#! /bin/bash
# once sourced by the shell, this script provides two functions:
#   gn to open a file with gnumeric
#   gp to update the file with gnumeric's selection
# requires grep, sed, awk, and the xsel utility

# name of the target file: used in gn () and gp ()
# ==================================================
gn_file=

# take note of target file and open it with gnumeric if not already opened
# ==================================================
gn () {
    # sanity checks
    if [[ -z $1 ]]; then
        echo 'Usage: gn file'
        return
    fi
    if ! [[ -f $1 && -r $1 ]]; then
        echo "Cannot find/use $1"
        return
    fi
    # yes, this is right; job report, if any, has "$gn_file" not expanded
    if jobs -l | grep 'Running.* gnumeric "$gn_file"' >/dev/null; then
        echo 'Already editing with gnumeric.'
        return
    fi
    echo 'Once done, select the part of the spreadsheet you want to save,'
    echo 'press Ctrl-C, go back to the command line, and type gp [ENTER].'
    # do the job
    gn_file=$1
    gnumeric "$gn_file" &
}

# paste selection into target file and close gnumeric
# ==================================================
gp () {
    # sanity checks
    if [[ -z $gn_file || ! -f $gn_file ]]; then
        echo 'Cannot find/use target file.'
        return
    fi
    local gnumeric_job=$( jobs -l | grep 'Running.* gnumeric "$gn_file"' )
    if [[ -z $gnumeric_job ]]; then
        echo 'No gnumeric instance to paste from.'
        return
    fi
    if [[ -z $( xsel -ob ) ]]; then
        echo 'Nothing to paste.'
        return
    fi
    local temp_file=$( mktemp "$PWD/temp.XXXXXX" )
    # paste X selection (o = output, b = clipboard mode)
    xsel -ob >"$temp_file"
    # replace tabs to get a CSV file
    local tab=$'\t'
    sed --in-place "s/$tab/,/g" "$temp_file"
    # must close gnumeric before updating file
    local job_id=$( echo "$gnumeric_job" | awk '{ print $2 }' )
    kill "$job_id"
    mv --backup "$temp_file" "$gn_file"
    echo "$gn_file updated."
}
```
As the script itself will tell you when opening your file with gnumeric: when you are done editing, you must select the portion of the spreadsheet you want to save before pressing Ctrl-C (to copy this portion to the clipboard). Back at the command line (Alt-Tab), entering gp will update your file with the content of the clipboard and close gnumeric. Your modified values won't have quotes around them, but they will be separated by tabs; hence, the script uses sed to replace the tabs with commas.
I have found this to be an efficient way to work on CSV data files from the command line. The script should save the file correctly as long as it does not contain tabs within the comma-separated fields (which seems to be the case in your data-analysis example).
Unix command-line CSV viewer [closed]
Is there a convenient command-line CSV viewer, possibly a Unix tool or a mod of some tool (e.g. vim or python)? I find it easy to simply edit CSV files by hand (since all you need to do is comma-delimit the columns), but is there a way to view them in a slightly nicer UI on the command line?
Can you be more specific about how you want the output formatted? CSVs are easy to manipulate with shell tools, so there's likely a string of four or five shell commands that can format them any way you like for convenient viewing.
Well, more or less like it's shown in Excel. Having the columns aligned and properly spaced with proper underlining, if possible.
6 Answers
sc is a command-line spreadsheet program that's been around a long time, likely available in your package manager. Here's a Linux Journal intro article to it:
It seems like this question overlaps (at least partially) with my similar question on StackOverflow:
The top answer there is currently:
(Please see the link for more details.)
There's a tool, CSVfix, which helps with viewing CSV files.
- Convert fixed format, multi-line and DSV files to CSV
- Reorder, remove, split and merge fields
- Convert case, trim leading & trailing spaces
- Search for specific content using regular expressions
- Filter out duplicate data or data on exclusion lists
- Perform sed/perl style editing
- Enrich with data from other sources
- Add sequence numbers and file source information
- Split large CSV files into smaller files based on field contents
- Perform arithmetic calculations on individual fields
- Validate CSV data against a collection of validation rules
- Convert between CSV and fixed format, XML, SQL and DSV
- Summarise CSV data, calculating averages, modes, frequencies etc.
A simple way to view CSV files on the command line is to pipe the .csv file into the column utility with the column delimiter set to a comma:
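For example (a common recipe; column is part of util-linux, with -s setting the input separator and -t producing a table; the sample file is hypothetical):

```shell
# Hypothetical sample file:
printf '%s\n' 'name,qty,price' 'apple,10,0.50' 'pear,3,0.75' > demo.csv

# Align the comma-separated columns for viewing. Note that column does
# not understand quoted fields, so this suits simple CSV files only.
column -s, -t < demo.csv

# For large files, page the result:  column -s, -t < demo.csv | less -N -S
```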