AWK Scripting Language
Awk is an interpreted scripting language used for text manipulation. It is available by default on most Linux and Unix distributions.

Awk divides the input file into records, and each record is divided into fields.
The awk command takes the following form:
> awk '<the script to run in quotes>' input_file
The quotes around the script make the shell treat the entire script as a single argument to the awk command. Awk runs the script on each record in the input_file unless specified otherwise.
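As a minimal sketch of the per-record behaviour, the snippet below pipes three made-up lines into awk instead of reading a file. The input lines and the message text are placeholders chosen for illustration.

```shell
# Feed three made-up lines to awk on standard input.
# The quoted script runs once for every record (line) it reads.
output=$(printf 'a\nb\nc\n' | awk '{print "seen a record"}')
echo "$output"
```

The message is printed three times, once per input line.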
Download the Sample Input File
The input file used in this blog, weather-data.csv, is available on GitHub. Download it to follow along; it won't take long.
> cd /tmp
> curl -O https://gist.githubusercontent.com/PraveenMathew92/fccec9b1f16fe4a776f4148e7bf82b03/raw/059c95dbee2366ff995ddd5a6b63c6d6c6cb037f/weather-data.csv
This should create a new file named weather-data.csv on your machine, containing the sample data.
Run awk Scripts on the Input File
For the csv file weather-data.csv, the contents can be displayed with the cat command.
> cat weather-data.csv
timestamp,Temperature,Precipitation
20200622T0000,23.907972,0.5
20200622T0100,24.277971,1.2
20200622T0200,24.517971,0.0
20200622T0300,23.917973,0.0
Let's try to do the same with awk.
Print the file (similar to cat) using awk
> awk '{print}' weather-data.csv
timestamp,Temperature,Precipitation
20200622T0000,23.907972,0.5
20200622T0100,24.277971,1.2
20200622T0200,24.517971,0.0
20200622T0300,23.917973,0.0
Now, let's try something more.
Removing headers in a csv
The headers may be removed to obtain only the contents of the file.
> awk 'FNR>1 {print}' weather-data.csv
20200622T0000,23.907972,0.5
20200622T0100,24.277971,1.2
20200622T0200,24.517971,0.0
20200622T0300,23.917973,0.0
FNR is the record number within the current file, so FNR>1 skips the first line of the file. A related variable, NR, is the total number of records read so far across all input files.
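The difference between FNR and NR only shows up with more than one input file. The sketch below creates two small throwaway files (the file names and contents are placeholders, not part of the sample data) and prints both counters for every record.

```shell
# Create two throwaway two-line files in a temporary directory.
tmpdir=$(mktemp -d)
printf 'a1\na2\n' > "$tmpdir/a.txt"
printf 'b1\nb2\n' > "$tmpdir/b.txt"

# Print FNR (per-file counter) and NR (running total) for each record.
output=$(awk '{print FNR, NR}' "$tmpdir/a.txt" "$tmpdir/b.txt")
echo "$output"

rm -r "$tmpdir"
```

FNR restarts at 1 when the second file begins, while NR keeps counting up to 4.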
Passing a value to print function
When print is given no argument, it prints $0, which refers to the entire record. The following command gives the same output as the one above:
> awk 'FNR>1 {print $0}' weather-data.csv
20200622T0000,23.907972,0.5
20200622T0100,24.277971,1.2
20200622T0200,24.517971,0.0
20200622T0300,23.917973,0.0
To refer to an individual field in the record, use a dollar sign followed by the position of the field in the record. For instance, the first field can be referenced with ‘$1’. More on this later.
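As a small illustration of field references, the sketch below uses a made-up line of three words separated by spaces, so that awk's default whitespace splitting applies.

```shell
# One made-up record with three whitespace-separated fields.
# $1 is the first field, $3 the third, and $0 the whole record.
output=$(echo "alpha beta gamma" | awk '{print $1; print $3; print $0}')
echo "$output"
```

This prints alpha, then gamma, then the full line.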
Setting the Field Separators and the Record Separators
In awk, the FS and RS variables store the field separator and the record separator respectively. The default value of RS is the newline character, while that of FS is whitespace. For CSV files, however, the field separator needs to be ",". The value of FS can be set with the -F option in awk.
> awk -F, 'FNR>1 {print $1}' weather-data.csv
20200622T0000
20200622T0100
20200622T0200
20200622T0300
Now the fields are split at "," instead of whitespace. The command below gives the same result:
> awk 'FNR>1 {print $1}' FS="," weather-data.csv
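RS can be set on the command line in the same way as FS. The sketch below assumes a hypothetical variant of the data in which records are separated by ";" rather than newlines; the temporary file and its layout are made up for illustration, reusing two rows from the sample file.

```shell
# Write two records separated by ";" (no newlines) to a temporary file.
tmpfile=$(mktemp)
printf '20200622T0000,23.907972;20200622T0100,24.277971;' > "$tmpfile"

# Split records at ";" and fields at ",", then print the second field.
output=$(awk '{print $2}' FS="," RS=";" "$tmpfile")
echo "$output"

rm "$tmpfile"
```

Each ";"-terminated chunk is treated as one record, so the two temperatures are printed on separate lines.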
Example: show only the Temperature for each timestamp
> awk -F, 'FNR>1 {print $1,$2}' weather-data.csv
20200622T0000 23.907972
20200622T0100 24.277971
20200622T0200 24.517971
20200622T0300 23.917973
Regex in Awk
The command to filter the records in the input_file that match the regex:
> awk '/<regex enclosed in forward slashes>/' input_file
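Further comparisons against regular expressions can be done with the "~" (matches) and "!~" (does not match) operators.
For instance, to get all the temperatures in the 23° range (the unanchored /23/ matches "23" anywhere in the field, which is sufficient for this data):
> awk -F, '$2 ~ /23/' weather-data.csv
20200622T0000,23.907972,0.5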
20200622T0300,23.917973,0.0
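On the other hand, to get the temperatures not in the 23° range. The FNR>1 condition is needed here because the header line does not contain 23 and would otherwise be printed.
> awk -F, 'FNR>1 && $2 !~ /23.*/' weather-data.csv
20200622T0100,24.277971,1.2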
20200622T0200,24.517971,0.0
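Because an unanchored pattern matches anywhere in the field, a value such as 24.237 would also match /23/. The sketch below illustrates this with a temporary file that reuses two rows of the sample data and adds one hypothetical row (20200622T0400, which is not in the real file), then compares /23/ with the anchored pattern /^23\./.

```shell
# Temporary file: header, two real sample rows, one made-up row (T0400).
tmpfile=$(mktemp)
printf 'timestamp,Temperature,Precipitation\n' > "$tmpfile"
printf '20200622T0000,23.907972,0.5\n' >> "$tmpfile"
printf '20200622T0100,24.277971,1.2\n' >> "$tmpfile"
printf '20200622T0400,24.237000,0.0\n' >> "$tmpfile"

# Unanchored: matches "23" anywhere in the temperature field.
unanchored=$(awk -F, '$2 ~ /23/ {print $2}' "$tmpfile")
# Anchored: matches only temperatures that start with "23.".
anchored=$(awk -F, '$2 ~ /^23\./ {print $2}' "$tmpfile")

echo "unanchored: $unanchored"
echo "anchored: $anchored"

rm "$tmpfile"
```

The unanchored pattern picks up both 23.907972 and the made-up 24.237000, while the anchored one returns only 23.907972.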
Begin and End Block
Unlike the rest of the script, which is executed for each record in the file, the BEGIN and END blocks are executed only once: BEGIN before the first record is read and END after the last one. These are the start-up and the clean-up blocks.
> awk 'BEGIN {print "Begin Block"} END {print "End Block"} {print}' weather-data.csv
Begin Block
timestamp,Temperature,Precipitation
20200622T0000,23.907972,0.5
20200622T0100,24.277971,1.2
20200622T0200,24.517971,0.0
20200622T0300,23.917973,0.0
End Block
The BEGIN and END blocks may also hold configuration for the awk script. For example, FS can be set inside the BEGIN block instead of with the -F option.
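A minimal sketch of this idea follows, assuming a temporary file holding the header and the first two rows of the sample data. The BEGIN block sets FS, and the END block prints a summary line whose wording is made up for illustration.

```shell
# Temporary file with the header and the first two sample rows.
tmpfile=$(mktemp)
printf 'timestamp,Temperature,Precipitation\n' > "$tmpfile"
printf '20200622T0000,23.907972,0.5\n' >> "$tmpfile"
printf '20200622T0100,24.277971,1.2\n' >> "$tmpfile"

# BEGIN configures the field separator before any record is read.
# The middle rule prints the temperature of every non-header record.
# END runs once at the end; NR-1 excludes the header from the count.
output=$(awk 'BEGIN {FS=","} FNR>1 {print $2} END {print NR-1, "data records"}' "$tmpfile")
echo "$output"

rm "$tmpfile"
```

Setting FS in BEGIN keeps the configuration inside the script itself, which can be convenient when the script is saved and reused.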