Archive

Archive for the ‘awk’ Category

awk – Split columns into rows

April 7, 2017

Problem:
I needed to get from this

Sean Connery|Dr. No,From Russia with Love,Goldfinger,Thunderball,You Only Live Twice

to this

Sean Connery|Dr. No
Sean Connery|From Russia with Love
Sean Connery|Goldfinger
Sean Connery|Thunderball
Sean Connery|You Only Live Twice

First Solution:
One way to do this is with awk's gsub().

$ less file.txt
Sean Connery|Dr. No,From Russia with Love,Goldfinger,Thunderball,You Only Live Twice
Roger Moore|Diamonds Are Forever,Live and Let Die,The Man with the Golden Gun,The Spy Who Loved Me,Moonraker
Pierce Brosnan|GoldenEye,Tomorrow Never Dies,The World Is Not Enough
Daniel Craig|Casino Royale,Quantum of Solace,Skyfall,Spectre 2
George Lazenby|On Her Majesty's Secret Service

$ awk -F\| '{gsub(",","\n"$1"|")}1' file.txt
Sean Connery|Dr. No
Sean Connery|From Russia with Love
Sean Connery|Goldfinger
Sean Connery|Thunderball
Sean Connery|You Only Live Twice
Roger Moore|Diamonds Are Forever
Roger Moore|Live and Let Die
Roger Moore|The Man with the Golden Gun
Roger Moore|The Spy Who Loved Me
Roger Moore|Moonraker
Pierce Brosnan|GoldenEye
Pierce Brosnan|Tomorrow Never Dies
Pierce Brosnan|The World Is Not Enough
Daniel Craig|Casino Royale
Daniel Craig|Quantum of Solace
Daniel Craig|Skyfall
Daniel Craig|Spectre 2
George Lazenby|On Her Majesty's Secret Service

From man awk

gsub(r, s [, t])

For each substring matching the regular expression r in the string t, substitute the string s, and return the number of substitutions. If t is not supplied, use $0. An & in the replacement text is replaced with the text that was actually matched. Use \& to get a literal &. (This must be typed as “\\&”; see GAWK: Effective AWK Programming for a fuller discussion of the rules for &’s and backslashes in the replacement text of sub(), gsub(), and gensub().)

awk -F\| '{gsub(",","\n"$1"|")}1' file.txt

What this does is substitute every comma with a newline followed by the first field and the pipe separator. The '1' at the end is a pattern that evaluates to true; with no action given, awk falls back to its default action of printing the record. Any non-zero number would do; 0 evaluates to false and nothing gets printed.
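To see that default action on its own: a pattern with no action prints every record it matches, so a constant non-zero pattern behaves like cat.

$ printf 'a\nb\n' | awk '1'
a
b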

Second Solution:
Another way with awk is to use the split function.

split(s, a [, r [, seps] ])

Split the string s into the array a and the separators array seps on the regular expression r, and return the number of fields. If r is omitted, FS is used instead. The arrays a and seps are cleared first. seps[i] is the field separator matched by r between a[i] and a[i+1]. If r is a single space, then leading whitespace in s goes into the extra array element seps[0] and trailing whitespace goes into the extra array element seps[n], where n is the return value of split(s, a, r, seps). Splitting behaves identically to field splitting, described above.

$ awk -v OFS="|" -v FS="|" '{num=split($NF, array,",");for(i=1;i<=num;i++){print $1 OFS array[i]}}' file.txt

The variable NF is set to the total number of fields in the input record.

For a record with two fields (as per FS), $NF is $2, so the call becomes 'num=split($2, array, ",")'.
That means that for each record, num holds the number of comma-separated items that split() put into the array. The for loop then prints out the contents of the array.

$1 – First field
OFS – Pipe separator
array[i] – item that was put in the array by the split function
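As a standalone illustration of split(), here is the same idea reduced to a single comma-separated string:

$ echo "Dr. No,Goldfinger" | awk '{n = split($0, arr, ","); for (i = 1; i <= n; i++) print i, arr[i]}'
1 Dr. No
2 Goldfinger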

Source:
http://www.unix.com/shell-programming-and-scripting/265041-split-columns-into-rows.html

Categories: awk

Generating summary data with awk

March 31, 2014

Problem: A colleague asked me to generate a summary based on the week numbers. File input

$ less file1.txt
2013-03-01, Week 1, 20
2013-03-05, Week 1, 30
2013-03-04, Week 1, 20
2013-03-17, Week 1, 30
2013-03-19, Week 1, 40
2013-03-21, Week 1, 50
2013-03-01, Week 2, 20
2013-03-05, Week 2, 30
2013-03-04, Week 2, 20
2013-03-17, Week 2, 30
2013-03-19, Week 2, 40

File output

2013-03-01, 20, Week 1
2013-03-04, 20, Week 1
2013-03-05, 30, Week 1
2013-03-17, 30, Week 1
2013-03-19, 40, Week 1
2013-03-21, 50, Week 1
Total,190, Week 1
2013-03-01, 20, Week 2
2013-03-04, 20, Week 2
2013-03-05, 30, Week 2
2013-03-17, 30, Week 2
2013-03-19, 40, Week 2
Total,140, Week 2

Solution: Using awk.

$ awk -v FS="," -v OFS="," '{sum[$2]+= $3; print $1, $3, $2}
END {for (var in sum){print "Total", sum[var], var}}' file1.txt | sort -t, -k3
2013-03-01, 20, Week 1
2013-03-04, 20, Week 1
2013-03-05, 30, Week 1
2013-03-17, 30, Week 1
2013-03-19, 40, Week 1
2013-03-21, 50, Week 1
Total,190, Week 1
2013-03-01, 20, Week 2
2013-03-04, 20, Week 2
2013-03-05, 30, Week 2
2013-03-17, 30, Week 2
2013-03-19, 40, Week 2
Total,140, Week 2
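The heart of the one-liner is the associative array: sum[$2] uses the week label as the key and accumulates $3 into it, and the END block then walks the array. Here is the same pattern in miniature (note that 'for (k in sum)' visits keys in an unspecified order, which is why the original pipes the result through sort):

$ printf 'a,1\na,2\nb,3\n' | awk -F, '{sum[$1] += $2} END {for (k in sum) print "Total", sum[k], k}'
Total 3 a
Total 3 b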

Source: http://www.unix.com/shell-programming-scripting/245209-generating-summary-data-use-awk.html

Categories: awk

Delete specific line numbers in a file – awk, sed

March 14, 2014

Problem:
You want to delete certain lines in a file, say lines 3, 6, 9, 12 and so on (every third line).

$ less file
first
second
third
fourth
fifth
sixth
seventh
eighth
ninth
tenth

Solution:
1. Using awk.

$ awk 'NR%3' file
first
second
fourth
fifth
seventh
eighth
tenth

My understanding of how it works.

$ man awk

NR The total number of input records seen so far.

$ awk 'NR' file 

This prints all the rows: NR is always a non-zero number, which awk treats as true, so every record gets printed.
If you want to print only the x-th row:

$ awk 'NR == 2' file
second
$ awk 'NR == 5' file
fifth

The example below prints nothing. The pattern 0 is false, so no record matches and nothing gets printed.

$ awk '0' file

So what happens with NR%3 is that any line whose remainder is not 0 gets printed.
1%3 is 1, so it gets printed.
2%3 is 2, so it gets printed.
3%3 is 0, so it does not get printed.
4%3 is 1, so it gets printed.
And so on.
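The same trick deletes every second line instead: NR % 2 is non-zero only for odd-numbered records.

$ awk 'NR % 2' file
first
third
fifth
seventh
ninth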

2. Using sed.

$ sed 'n;n;d' file
first
second
fourth
fifth
seventh
eighth
tenth

How I think it works.

$ sed 'n;n;d' file

n – print the pattern space (the current line), then load the next input line
n – same again: print, then load the next line
d – delete the line now in the pattern space and start the next cycle

Because d immediately starts the next cycle, and there are still lines in the file, the script starts over and repeats the same n;n;d sequence until EOF.
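The pattern generalizes: dropping one of the n commands deletes every second line instead of every third.

$ sed 'n;d' file
first
third
fifth
seventh
ninth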

$ info sed

`n'
     If auto-print is not disabled, print the pattern space, then,
     regardless, replace the pattern space with the next line of input.
     If there is no more input then `sed' exits without processing any
     more commands.

`d'
     Delete the pattern space; immediately start next cycle.

NB: The file must not contain blank lines, since both approaches count every input line, blank or not.

Source:
http://www.unix.com/shell-programming-scripting/245088-how-delete-line-number-3-6-9-12-15-so.html

Categories: awk, Interesting, sed

Interesting – Print certain lines in reverse (awk)

October 31, 2013

Problem:
You would like to print certain lines in a file in reverse.

Solution:
awk '{if (NR % 2) {print} else {r="rev"; print | r; close(r);}; }' myfile

$ less myfile
kitten
pig
horse
cow
donkey
$ awk '{if (NR % 2) {print} else {r="rev"; print | r; close(r);}; }' myfile
kitten
gip
horse
woc
donkey

The above awk prints every even-numbered line from the file in reverse.

Note:
NR % 2 – returns 0 for even-numbered lines, which is false; odd-numbered lines, where the result is 1 (true), are simply printed.
rev – command that reverses the characters of each line (man rev).
close(r) – the piped command must be closed explicitly so that its output is flushed.
r="rev"; print | r; close(r); – print the line into the rev pipe, then close the pipe so the reversed result is flushed and printed immediately.
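If spawning an external rev process per line is a concern, the reversal can also be done in awk itself. A minimal sketch with substr() in a loop, giving the same output as above:

$ awk 'NR % 2 {print; next} {out = ""; for (i = length($0); i >= 1; i--) out = out substr($0, i, 1); print out}' myfile
kitten
gip
horse
woc
donkey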

More on close()

close(filename)
or
close(command)

The argument filename or command can be any expression. Its value must exactly match the string that was used to open the file or start the command (spaces and other “irrelevant” characters included). For example, if you open a pipe with this:

"sort -r names" | getline foo

then you must close it with this:

close("sort -r names")

Here are some reasons why you might need to close an output file:

To write a file and read it back later on in the same awk program. Close the file when you are finished writing it; then you can start reading it with getline.

To write numerous files, successively, in the same awk program. If you don’t close the files, eventually you may exceed a system limit on the number of open files in one process. So close each one when you are finished writing it.

To make a command finish. When you redirect output through a pipe, the command reading the pipe normally continues to try to read input as long as the pipe is open. Often this means the command cannot really do its work until the pipe is closed. For example, if you redirect output to the mail program, the message is not actually sent until the pipe is closed.

To run the same program a second time, with the same arguments. This is not the same thing as giving more input to the first run! For example, suppose you pipe output to the mail program. If you output several lines redirected to this pipe without closing it, they make a single message of several lines. By contrast, if you close the pipe after each line of output, then each line makes a separate message.
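A minimal sketch of the first of these cases, writing a file and then reading it back with getline in the same program (tmp.txt is just an illustrative name):

$ awk 'BEGIN {
    f = "tmp.txt"                      # illustrative file name
    print "hello" > f
    close(f)                           # flush and close before reading
    while ((getline line < f) > 0)
        print "read back:", line
    close(f)
}'
read back: hello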

Source:
http://www.unix.com/showthread.php?p=302869631
http://www.chemie.fu-berlin.de/chemnet/use/info/gawk/gawk_7.html#SEC67

Categories: awk, Interesting

Quick Note – convert 1 to 4 as 1,2,3,4

February 5, 2013

With GNU bash version 4.1.5

$ seq 1 4 | tr '\n' ',' | sed 's/,$/\n/'
1,2,3,4
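For comparison, paste from GNU coreutils can do the join and drop the trailing comma in one step:

$ seq 1 4 | paste -sd, -
1,2,3,4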

With awk (mawk version 1.3.3)

$ seq 1 4 | awk -f transpose.awk | sed 's/^,//'
1,2,3,4

What transpose.awk looks like.

#!/usr/bin/awk -f
{
    # append each field of the current record to its column's running string
    for (i = 1; i <= NF; i++)
        _[i] = sprintf("%s,%s", _[i], $i)
}
END {
    # print one line per column; each line starts with a comma,
    # which the sed 's/^,//' in the pipeline strips off
    for (i in _)
        print _[i]
}
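One caveat: POSIX leaves the iteration order of 'for (i in _)' unspecified, so the columns could in principle come out shuffled. A variant that tracks the widest record and prints the columns in a guaranteed order (a sketch, otherwise the same pipeline):

#!/usr/bin/awk -f
{
    for (i = 1; i <= NF; i++)
        _[i] = sprintf("%s,%s", _[i], $i)
    if (NF > max) max = NF        # remember the widest record seen
}
END {
    for (i = 1; i <= max; i++)    # deterministic column order
        print _[i]
}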

Source:
http://www.unix.com/shell-programming-scripting/193989-transpose-whole-file-specific-columns.html

Categories: awk, bash

bash – Mimic MySQL GROUP BY function with awk

November 20, 2012

Problem:
I needed GROUP BY results and did not have the time to wait for the mysqldump restores to finish before running the queries. I had over 360 zipped files to work with, and restoring each database and running the query, even in a script, was just too long a process. I opted to just extract the data and use awk to mimic GROUP BY.

Solution:
I used sed to extract the desired results from the mysqldump files, then took the relevant date fields and used awk to do the group count.

Here is the SELECT…GROUP BY using mysql

$ mysql -u <user> -p<password> \
-e"SELECT date_format(timestamp,'%Y-%m-%d') as 'Date',count(distinct(<field>)) FROM <table>
WHERE <search criteria>
GROUP BY date_format(timestamp,'%Y-%m-%d')
ORDER BY date_format(timestamp,'%Y-%m-%d')" | sed 's/\t/,/g'
2012-10-10,4
2012-10-11,21
2012-10-12,19
2012-10-13,8
2012-10-14,22
2012-10-15,7
2012-10-16,13

Here is the same but now using awk. After extracting the data from the dumps I end up with the following

$ head -n 3 <file-with-extracted-content>
2012-10-01 06:08:54|377772327281
2012-10-01 06:08:55|371202225977
2012-10-01 06:08:55|061712877272
$ 
$ awk -v FS=" |\|" '{print $1, $3}' <file-with-extracted-content> | 
sort -u |  
awk -v OFS=, '{a[$1]+=1}END{for (i in a) print i,a[i]}' | 
sort -t, -k1
2012-10-10,4
2012-10-11,21
2012-10-12,19
2012-10-13,8
2012-10-14,22
2012-10-15,7
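The same result is possible in a single awk pass, deduplicating and counting in one go; a sketch, with extracted.txt standing in for the file with extracted content:

$ awk -F'[ |]' '!seen[$1, $3]++ {count[$1]++}
END {for (d in count) print d "," count[d]}' extracted.txt | sort -t, -k1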

Categories: awk, bash, MySQL

awk – Convert columns to rows

July 3, 2012

Problem:
You have a file with columns that you want to convert to rows.

From:

user@computer:~$ less infile
field1 field2 field3
field1 field2
field1
field1 field2 field3 field4
...

To:

user@computer:~$ less outfile
field1 field1 field1 field1
field2 field2 field2
field3 field3
field4

Solution:
awk

user@computer:~$ awk '{
      for (f = 1; f <= NF; f++)      # store every field by (row, column)
         a[NR, f] = $f
   }
   NF > nf { nf = NF }               # track the widest record
   END {
      # walk column by column, printing FS between cells and RS at line end
      for (f = 1; f <= nf; f++)
         for (r = 1; r <= NR; r++)
            printf "%s%s", a[r, f], (r==NR ? RS : FS)
   }' infile | tr -s " " " " | sed 's/^ //' > outfile

user@computer:~$ less outfile
field1 field1 field1 field1
field2 field2 field2
field3 field3
field4

Source:
http://www.unix.com/unix-advanced-expert-users/188705-convert-column-data-row-data-using-shell-script.html
and
http://www.unix.com/78463-post3.html

Categories: awk, bash