User Tools

Site Tools


wiki:awkadvance

Advance AWK scripts

Exercise:
Script: open by kate ~/ost4sem/exercise/basic_adv_awk/adv_awk.sh
Data: ~/ost4sem/exercise/basic_adv_awk/input.txt
Directory: ~/ost4sem/exercise/basicadvawk

kate ~/ost4sem/exercise/basic_adv_awk/adv_awk.sh &

Mathematical operation

Extract the maximum value:

cd ~/ost4sem/exercise/basic_adv_awk
awk '{ if (NR>1) {if ($4>max) max= $4 }} END {print max }'  input.txt

Sum the values in a column:

awk ' BEGIN { sum=0 } { sum=sum+$4} END {print sum }'  input.txt

Sum the values in a column under a specific condition:

awk '{ if($3==7) sum=sum+$5} END {print sum }'  input.txt

Sum the values in a column under two specific conditions:

awk '{ if($3==7 && $2==2005  ) sum=sum+$5} END {print sum }'  input.txt

A Bash/Awk tool for computing the average of columns based on CLASS/ID column

In this example we build a bash/awk tool able to calculate the average of columns based on CLASS/ID column. we use the sort bash function

more  ~/ost4sem/studycase/fire_risk/sh_script/average.sh

The file has to be sorted base on the CLASS/ID column!!! Sort the file, run the script, and replay to the questions:

sort -k 2,2  input.txt > input_s.txt 
sh ~/ost4sem/studycase/fire_risk/sh_script/average.sh input_s.txt output.txt

The same file can be run in a macro script using the following commands:

sh ~/ost4sem/studycase/fire_risk/sh_script/average.sh input_s.txt output.txt <<EOF
n
2
2
EOF

Even more complicated is building up a script to calculate the weighted average.

more  ~/ost4sem/studycase/fire_risk/sh_script/average_w.sh

Query a dataset on the basis of the header name

We create an example table saved as test.csv:

echo ID,column_1,colum_b,colum_c,TOTAL > test.csv
echo 1,3,5,6,14 >> test.csv
echo 2,5,4,6,15  >> test.csv
echo 3,8,4,6,18  >> test.csv
In this example we also deal with .csv file (comma separated values text file). For this reason we need to use the option -F, to define our input file as csv.

We create two subset files corresponding to the “ID” and to the “column_b” (first file) and “ID” and “TOTAL” (second file).

for COLNAME in  colum_b TOTAL ; do
awk -F, -v COLNAME=$COLNAME ' { if (NR==1)  { for (col=1;col<=NF;col++) { if ($col==COLNAME) {colprint=col}}}  else  {print $1 "," $colprint }}  ' test.csv > $COLNAME.csv 
done

Subset a limited number of column headers and query the dataset on the basis of the header subset selection

We have a dataset with 100 columns and 3 rows

echo `seq -s \, 1 1 100`  > test.csv
echo `seq -s \, 2 20 2000`  >> test.csv
echo `seq -s \, 2 200 20000`  >> test.csv
awk -F,  ' { print NF }' test.csv
awk -F,  ' { print NR }' test.csv

We would like to select the first 5, the 25th and the 90th to the 94th colums from the total 100 colums (Fields in AWK) in our dataset. We redirect the list of the selection in a “listofcolumns.csv” file

awk -F,  ' {  for(i=1;i<6; i++) if(NR==1) printf("%s",$i" " ) ; if(NR==1)  printf("%s",$25" " ) ; for(i=90;i<95; i++) if(NR==1) printf("%s"," "$i  )   }' test.csv > list_of_columns.csv

Finally we carry out a query of the original dataset on the basis of the columns previously selected.

for COLNAME in  `cat list_of_columns.csv` ; do
echo $COLNAME
awk -F, -v COLNAME=$COLNAME ' { if (NR==1)  { for (col=1;col<=NF;col++) { if ($col==COLNAME) {colprint=col}}}  else  {print $1 "," $colprint }} ' test.csv > OUTPUT_subset_$COLNAME.csv 
done

Merge multiple files in multiple columns of one file

pr -m -t -s, File1.txt File2.txt File3.txt  |  awk 'BEGIN{print "Column1,Column2,Column3"} {print$0}' > multiple_coulumn_output_file.csv
File 1 to 3 have only one coulmn
wiki/awkadvance.txt · Last modified: 2017/12/05 22:53 (external edit)