Home > awk > awk – Split columns into rows

awk – Split columns into rows

Problem:
I needed to get

from this

Sean Connery|Dr. No,From Russia with Love,Goldfinger,Thunderball,You Only Live Twice

to this

Sean Connery|Dr. No
Sean Connery|From Russia with Love
Sean Connery|Goldfinger
Sean Connery|Thunderball
Sean Connery|You Only Live Twice

First Solution:
One way to do this was using awk’s gsub.

$ less file.txt
Sean Connery|Dr. No,From Russia with Love,Goldfinger,Thunderball,You Only Live Twice
Rodger Moore|Diamonds Are Forever,Live and Let Die,The Man with the Golden Gun,The Spy Who Loved Me,Moonraker
Pierce Brosnan|GoldenEye,Tomorrow Never Dies,The World Is Not Enough
Daniel Craig|Casino Royale,Quantum of Solace,Skyfall,Spectre 2
George Lazenby|On Her Majesty's Secret Service

$ awk -F\| '{gsub(",","\n"$1"|")}1' file.txt
Sean Connery|Dr. No
Sean Connery|From Russia with Love
Sean Connery|Goldfinger
Sean Connery|Thunderball
Sean Connery|You Only Live Twice
Rodger Moore|Diamonds Are Forever
Rodger Moore|Live and Let Die
Rodger Moore|The Man with the Golden Gun
Rodger Moore|The Spy Who Loved Me
Rodger Moore|Moonraker
Pierce Brosnan|GoldenEye
Pierce Brosnan|Tomorrow Never Dies
Pierce Brosnan|The World Is Not Enough
Daniel Craig|Casino Royale
Daniel Craig|Quantum of Solace
Daniel Craig|Skyfall
Daniel Craig|Spectre 2
George Lazenby|On Her Majesty's Secret Service

From man awk

gsub(r, s [, t])

For each substring matching the regular expression r in the string t, substitute the string s, and return the number of substitutions. If t is not supplied, use $0. An & in the replacement text is replaced with the text that was actually matched. Use \& to get a literal &. (This must be typed as “\\&”; see GAWK: Effective AWK Programming for a fuller discussion of the rules for &’s and backslashes in the replacement text of sub(), gsub(), and gensub().)

awk -F\| ‘{gsub(“,”,”\n”$1″|”)}1’ file.txt

What this does is it substitutes all commas found with a “new line and the first field”. The ‘1’ at the end prints out the result. I think this is a default behaviour of awk that is being exploited here. It can be any number except 0 with results in a false and nothing gets printed.

Second Solution:
Another way with awk is to use the split function.

split(s, a [, r [, seps] ])

Split the string s into the array a and the separators array seps on the regular expression r, and return the number of fields. If r is omitted, FS is used instead. The arrays a and seps are cleared first. seps[i] is the field separator matched by r between a[i] and a[i+1]. If r is a single space, then leading whitespace in s goes into the extra array element seps[0] and trailing whitespace goes into the extra array element seps[n], where n is the return value of split(s, a, r, seps). Splitting behaves identically to field splitting, described above.

$ awk -v OFS="|" -v FS="|" '{num=split($NF, array,",");for(i=1;i<=num;i++){print $1 OFS array[i]}}' file.txt

The variable NF is set to the total number of fields in the input record.

It becomes ‘num=split($2, array, “,”)’ for records that have 2 fields as per FS.
Which means that for each record you split, num keeps track of num of comma separated items. You then use the for loop to print out the contents of the array.

$1 – First field
OFS – Pipe separator
array[i] – item that was put in the array by the split function

Source:
http://www.unix.com/shell-programming-and-scripting/265041-split-columns-into-rows.html

Advertisements
Categories: awk Tags: , ,
  1. No comments yet.
  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: