1. Foreword
awk comes in three main variants: awk, nawk and gawk. Unless otherwise specified, awk here refers to gawk. The most basic function of the awk language is to split files or strings and extract information from them according to specified rules; it can also output data according to specified rules. A complete awk script is typically used to format the information in a text file.
2. Basic syntax
awk [option] 'awk_script' input_file1 [input_file2 ...]
awk's common options are:
① -F fs: Use fs as the field separator for input records. If this option is omitted, awk uses the default value of its FS variable, a single space, which splits fields on runs of spaces and tabs
② -f filename: Read awk_script from the file filename
③ -v var=value: Assign value to the variable var before awk_script starts running
awk has three running modes:
First, put the awk script directly on the command line.
Second, put all of the awk script commands in a script file, then use the -f option to specify the script file to run.
Third, put awk_script into a script file with #!/bin/awk -f as its first line, give the script execute permission, and then call it by typing the script name in the shell.
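The first two running modes can be sketched as follows. This is a minimal illustration, not taken from the original: the colon-separated sample data, the variable name min and the temporary script file are all assumptions made for the example.

```shell
# Hypothetical sample input: colon-separated name:score records.
input='ann:90
bob:55'

# Mode 1: the script is given inline. -F sets the field separator,
# -v assigns the variable "min" before the script starts.
inline=$(printf '%s\n' "$input" | awk -F: -v min=60 '$2 >= min {print $1}')

# Mode 2: the same script is stored in a file and loaded with -f.
script=$(mktemp)
printf '%s\n' '$2 >= min {print $1}' > "$script"
from_file=$(printf '%s\n' "$input" | awk -F: -v min=60 -f "$script")
rm -f "$script"

echo "$inline"      # ann
echo "$from_file"   # ann
```

Both modes run exactly the same awk_script; only the place where awk finds it differs.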
3. Awk script
An awk script can be composed of one or more awk_cmd. For multiple awk_cmd, after one awk_cmd is completed, a new line should be started for separation.
awk_cmd consists of two parts: awk_pattern { actions }.
In addition, when awk_script is written directly in the awk command, it can span multiple lines, but the entire awk_script must be enclosed in single quotes. The general form of the awk command is:
awk ' BEGIN { actions }
awk_pattern1 { actions }
............
awk_patternN { actions }
END { actions }
' inputfile
where BEGIN { actions } and END { actions } are optional.
You can use AWK's own built-in variables in the awk script, as follows:
ARGC The number of command line arguments
ARGV The command line argument array
FILENAME The current input file name
FNR The record number in the current file
FS Input field delimiter, default is a space
RS Input record delimiter
NF Number of fields in the current record
NR Number of records so far
OFS Output field delimiter
ORS Output record delimiter
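As a small illustration of a few of these variables, the sketch below prints NR, NF and the last field of each record, joined by a custom OFS. The two-line sample input is made up for the example.

```shell
# Hypothetical input: two records with different numbers of fields.
input='a b c
d e'

# OFS is placed between the comma-separated arguments of print.
result=$(printf '%s\n' "$input" | awk 'BEGIN { OFS = "-" } { print NR, NF, $NF }')

echo "$result"
# 1-3-c
# 2-2-e
```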
The running process of the awk script:
① If the BEGIN block exists, awk executes the actions specified by it.
② awk reads a line from the input file, which is called an input record. (If the input file is omitted, it will be read from the standard input)
③ awk splits the record it has read into fields, putting the first field into the variable $1, the second field into $2, and so on. $0 represents the entire record. The field separator is given by awk's FS variable, which can be set with the -F option or by an assignment.
④ The current input record is compared with the awk_pattern of each awk_cmd in turn. If it matches, the corresponding actions are executed; if not, they are skipped. This continues until every awk_cmd has been tried.
⑤ Once an input record has been compared against every awk_cmd, awk reads the next line of input and repeats steps ③ and ④. This continues until awk reaches the end of the file.
⑥ When awk has read all the input lines, if END exists, the corresponding actions will be executed.
1) input_file can be a file list with more than one file, and awk will process each file in the list in order.
2) An awk_pattern of awk_cmd can be omitted. When omitted, the corresponding actions will be executed without matching and comparing the input records. An awk_cmd action can also be omitted. When omitted, the default action is to print the current input record, that is, {print $0}. Awk_pattern and actions in an awk_cmd cannot be omitted at the same time.
3) The BEGIN block and the END block are located at the beginning and end of awk_script respectively. Only END blocks or only BEGIN blocks are allowed in awk_script. If there is only BEGIN { actions } in awk_script, awk will not read input_file.
4) awk reads the data of the input file into the memory, and then operates the copy of the input data in the memory. awk will not modify the content of the input file.
5) By default awk writes its output to standard output. If you want the output in a file, you can use redirection.
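The running process and the notes above can be seen in one short run. The sketch below is only an illustration with invented sample data: it has a BEGIN block, one awk_cmd with the action omitted, one awk_cmd with the pattern omitted, and an END block.

```shell
# Hypothetical input: item name and quantity.
input='apple 3
pear 0
fig 5'

result=$(printf '%s\n' "$input" | awk '
BEGIN   { print "start" }          # runs once, before any input is read
$2 > 0                             # no action: the default {print $0} applies
        { total += $2 }            # no pattern: runs for every record
END     { print "total=" total }   # runs once, after the last record
')

echo "$result"
# start
# apple 3
# fig 5
# total=8
```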
3.1.awk_pattern
The awk_pattern part determines when the actions part is triggered.
awk_pattern can be of the following types:
1) Regular expression is used as awk_pattern: /regexp/
Note that the regular expression regexp must be enclosed in / characters.
Characters commonly used in awk regular expression matching:
^ $ . [] | () * // : general regexp metacharacters
+ : matches the single character before it one or more times. It belongs to the extended regular expression syntax that awk uses; grep and sed do not treat it as a metacharacter in their default basic mode.
? : matches the single character before it 0 or 1 times. Like +, it is part of awk's extended syntax and is not a metacharacter in the default mode of grep or sed.
For more information about regular expressions, please refer to "Regular Expressions"
Example:
awk '/ *\$0\.[0-9][0-9].*/' input_file
For example, a line containing $0.99, such as "$0.99 hello", matches the regular expression above.
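To make the difference between ? and + concrete, here is a small sketch. The word list is invented for the example; the two patterns differ only in the metacharacter after the letter u.

```shell
# Hypothetical input: spelling variants, one per line.
input='color
colour
colouur
clr'

result=$(printf '%s\n' "$input" | awk '
/^colou?r$/ { print "optional:", $0 }   # u may appear 0 or 1 times
/^colou+r$/ { print "repeated:", $0 }   # u must appear 1 or more times
')

echo "$result"
# optional: color
# optional: colour
# repeated: colour
# repeated: colouur
```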
2) Boolean expressions are used as awk_pattern. When the expression is true, the execution of corresponding actions is triggered.
① You can use variables (such as field variables $1, $2, etc.) and /regexp/
② Operators in Boolean expressions:
Relational operator: < > <= >= == !=
Matching operator: value ~ /regexp/ returns true if value matches /regexp/
value !~ /regexp/ returns true if value does not match /regexp/
Example: awk '$2 > 10 {print "ok"}' input_file
awk '$3 ~ /^d/ {print "ok"}' input_file
③ && (and) and || (or) can connect two /regexp/ or Boolean expressions to form a compound expression. ! (not) can be used in front of a Boolean expression or /regexp/.
Example: awk '($1 < 10 ) && ($2 > 10) {print $0 "ok"}' input_file
awk '/^d/ || /x$/ {print $0 "ok"}' input_file
④ Other expressions can also be used as awk_pattern, such as assignment expressions.
Example:
awk '(tot+=$6); END{print "total points :" tot }' input_file # The semicolon cannot be omitted
awk 'tot+=$6 {print $0} END{print "total points :" tot }' input_file # Equivalent to the above
When an assignment expression is used as a pattern, the assigned value decides the match: a number matches if it is non-zero, and a string matches if it is not empty; otherwise there is no match.
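The matching operators and the logical connectives can be combined freely. The sketch below uses invented three-column data and two rules; a record may trigger neither, one, or both of them.

```shell
# Hypothetical input: name and two numeric columns.
input='dan 5 20
eve 15 30
dax 8 4'

result=$(printf '%s\n' "$input" | awk '
$1 ~ /^d/ && $2 < 10  { print "both:", $1 }     # name starts with d AND column 2 is below 10
$1 !~ /^d/ || $3 < 10 { print "either:", $1 }   # name does not start with d OR column 3 is below 10
')

echo "$result"
# both: dan
# either: eve
# both: dax
# either: dax
```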
awk built-in string functions:
gsub(r, s) Replace every match of r with s throughout $0
awk 'gsub(/name/,"xingming") {print $0}' temp
gsub(r, s, t) Replace every match of r with s throughout t
index(s, t) Return the position of the first occurrence of the string t in s
awk 'BEGIN {print index("Sunny", "ny")}' temp Returns 4
length(s) Return the length of s
match(s, r) Test whether s contains a substring matching r; returns the position of the match, or 0 if there is none
awk '$1=="J.Lulu" {print match($1, "u")}' temp Returns 4
split(s, a, fs) Split s into the array a using the separator fs; returns the number of elements
awk 'BEGIN {print split("12#345#6789", myarray, "#")}'
Returns 3, with myarray[1]="12", myarray[2]="345", myarray[3]="6789"
sprintf(fmt, exp) Return exp formatted according to fmt
sub(r, s) Replace the leftmost longest match of r in $0 with s (only the first match is replaced)
substr(s, p) Return the suffix of the string s that starts at position p
substr(s, p, n) Return the substring of s that starts at position p and is n characters long
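Several of these functions can be tried without any input file by calling them in a BEGIN block. The following is a rough sketch; the sample strings are made up, and the variable names s, t and n are only for illustration.

```shell
result=$(awk 'BEGIN {
    s = "name=J.Lulu;name=P.Bunny"
    n = gsub(/name/, "id", s)          # replaces every match in s and returns the count
    print n, s

    t = "2024-01-15"
    sub(/-/, "/", t)                   # only the first match is replaced
    print t

    print substr("Sunny", 4)           # suffix starting at position 4
    print substr("Sunny", 2, 3)        # 3 characters starting at position 2
    print length("Sunny"), index("Sunny", "ny")
    print sprintf("%05.1f", 3.14159)   # formatted string, zero-padded to width 5
}')

echo "$result"
# 2 id=J.Lulu;id=P.Bunny
# 2024/01-15
# ny
# unn
# 5 4
# 003.1
```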
awk string concatenation operation
[chengmo@centos5 ~]$ awk 'BEGIN{a= "a";b="b";c=(a""b);print c}'
ab
2.7. Use of printf function:
Character conversion: echo "65" | awk '{printf "%c\n", $0}' Output A
awk 'BEGIN {printf "%f\n", 999}' Output 999.000000
Formatted output: awk '{printf "%-15s %s\n", $1, $3}' temp Left-aligns the first field in a column 15 characters wide
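Width, alignment and precision can be combined in one format string. The sketch below lines up three columns; the sample data and the choice of | as a visible column boundary are assumptions for the example.

```shell
# Hypothetical input: name, price, quantity.
input='apple 1.5 3
watermelon 12 1'

# %-12s left-aligns in 12 columns, %6.2f right-aligns with 2 decimals,
# %3d right-aligns an integer in 3 columns.
result=$(printf '%s\n' "$input" | awk '{ printf "%-12s|%6.2f|%3d\n", $1, $2, $3 }')

echo "$result"
# apple       |  1.50|  3
# watermelon  | 12.00|  1
```

Unlike print, printf does not add a newline by itself, so the format string must end with \n.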
2.8. Other awk usage:
Passing a value into a one-line awk command:
awk '{if ($5
who | awk '{if ($1==user) print $1 " are in " $2}' user=$LOGNAME Uses an environment variable
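There are two ways to hand a shell value to awk, and they differ in when the assignment takes effect. The sketch below is an illustration with invented names (limit, max, late): a variable set with -v is already visible in BEGIN, while an assignment written after the script is only applied when awk reaches it in the argument list, just before reading the input that follows.

```shell
limit=20

result=$(printf '%s\n' 12 25 | awk -v max="$limit" '
BEGIN    { print "in BEGIN: max=" max ", late=" late }   # late is still empty here
$1 > max { print $1, late }
' late=seen)

echo "$result"
# in BEGIN: max=20, late=
# 25 seen
```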
awk script file: put #!/bin/awk -f on the first line. Without this line, the self-contained script cannot be executed directly. Example:
#!/bin/awk -f
# all comment lines must start with a hash '#'
# name: student_tot.awk
# to call: student_tot.awk grade.txt
# prints total and average of club student points
# print a header first
BEGIN {
print "Student Date Member No. Grade Age Points Max"
print "Name Joined Gained Point Available"
print "========== ==============================================="
}
# let's add the scores of points gained
(tot+=$6);
# finished processing now let's print the total and average point
END {
print "Club student total points :" tot
print "Average Club Student points:" tot/NR
}
2.9. awk array:
awk's basic loop structure for arrays:
for (element in array) print array[element]
awk 'BEGIN {record="123#456#789";split(record, myarray, "#")}
END { for (i in myarray) {print myarray[i]} }' /dev/null
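awk arrays are indexed by strings, which makes them convenient for counting. The sketch below counts how often each value of the first field occurs; the sample data is invented. Because the order in which for (k in array) visits the keys is not defined, the output is passed through sort to make it stable.

```shell
# Hypothetical input: one team name per line.
input='ops
dev
ops
qa
ops'

result=$(printf '%s\n' "$input" | awk '
{ count[$1]++ }                               # the field value itself is the array index
END {
    for (k in count) print k, count[k]
    if (!("hr" in count)) print "hr", 0       # "in" tests for a key without creating it
}' | sort)

echo "$result"
# dev 1
# hr 0
# ops 3
# qa 1
```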
3.0 Control statements in awk
1. Conditional judgment statement (if)
if (expression) #if (Variable in Array)
Statement 1
else
Statement 2
In this format, "Statement 1" can be several statements. If it is more than one statement, enclose the statements in {} so that awk treats them as one block and so that the code is easier to read. awk's branch structure allows nesting; its format is:
if(expression)
{statement 1}
else if(expression)
{statement 2}
else
{statement 3}
[chengmo@localhost nginx]# awk 'BEGIN{
test=100;
if(test>90)
{
print "very good";
}
else if(test>60)
{
print "good";
}
else
{
print "no pass";
}
}'
very good
Each statement can be ended with ";".
2. Loop statement (while, for, do)
1. while statement
Format:
while(expression)
{statement}
Example:
[chengmo@localhost nginx]# awk 'BEGIN{
test=100;
total=0;
while(i<=test)
{
total+=i;
i++;
}
print total;
}'
5050
2.for loop
for loop has two formats:
Format 1:
for(variable in array)
{statement}
Example:
[chengmo@localhost nginx]# awk 'BEGIN{
for(k in ENVIRON)
{
print k"="ENVIRON[k];
}
}'
AWKPATH=.:/usr/share/awk
OLDPWD=/home/web97
SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass
SELINUX_LEVEL_REQUESTED=
SELINUX_ROLE_REQUESTED=
LANG=zh_CN.GB2312
. . . . . .
Explanation: ENVIRON is a built-in awk variable; it is a typical associative array whose indexes are the names of the environment variables.
Format 2:
for (variable; condition; expression)
{statement}
Example:
[chengmo@localhost nginx]# awk 'BEGIN{
total=0;
for(i=0;i<=100;i++)
{
total+=i;
}
print total;
}'
5050
3.do loop
Format:
do
{statement}while(condition)
Example:
[chengmo@localhost nginx]# awk 'BEGIN{
total=0;
i=0;
do
{
total+=i;
i++;
}while(i<=100)
print total;
}'
5050
Those are awk's flow-control statements. As the syntax shows, they are the same as in the C language. With these statements, much of the work of a shell program can be handed over to awk, which also runs fast.
break When the break statement is used in a while or for statement, it causes the loop to exit.
continue When the continue statement is used in a while or for statement, it causes the loop to move on to the next iteration.
next Causes the next line of input to be read and returns to the top of the script. This avoids performing additional operations on the current input line.
exit Causes the main input loop to exit and transfers control to END, if END exists. If no END rule is defined, or the exit statement is used inside END, execution of the script terminates.
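The four statements can be seen working together in one short run. This is a sketch with invented input and invented marker words ("-", "END" and "stop"), chosen only to trigger each statement once.

```shell
# Hypothetical input: a comment line, two data lines, a stop marker, one more line.
input='# header
a - b
c END d
stop
e f'

result=$(printf '%s\n' "$input" | awk '
/^#/         { next }                  # skip this record and read the next one
$1 == "stop" { exit }                  # leave the main loop; the END block still runs
{
    line = ""
    for (i = 1; i <= NF; i++) {
        if ($i == "-")   continue      # skip this field, keep looping
        if ($i == "END") break         # ignore the rest of the record
        line = line $i
    }
    print line
}
END { print "records seen: " NR }
')

echo "$result"
# ab
# c
# records seen: 4
```

The last input line is never read, because exit stops the main loop at the fourth record.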
NR and FNR:
A. When awk is given several input files, it applies the code to the first file (reading it line by line), then applies the same code to the second file, then to the third, and so on.
B. This execution order raises a line-numbering question. When the first file is finished and the second file is read, how is the first line of the second file numbered? If it counted as 1 again, there would be two line 1s (the first file also has a first line). This is what NR and FNR are about.
NR: the global line number (counted continuously from the first line of the first file to the last line of the last file)
FNR: the line number within the current file (regardless of how many lines the previous input files had)
For example, if data1.txt has 40 lines and data2.txt has 50 lines, then for awk '{}' data1.txt data2.txt
the values of NR are: 1, 2 ... 40, 41, 42 ... 90
the values of FNR are: 1, 2 ... 40, 1, 2 ... 50
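The same behaviour can be checked on a small scale. The sketch below creates two short temporary files (2 and 3 lines instead of 40 and 50; the file contents are invented) and prints FILENAME, NR and FNR for every line. It also shows the common NR == FNR test, which is true only while the first file is being read.

```shell
dir=$(mktemp -d)
printf '%s\n' a b   > "$dir/data1.txt"
printf '%s\n' c d e > "$dir/data2.txt"

# NR keeps counting across files, FNR restarts at 1 for each file.
result=$(cd "$dir" && awk '{ print FILENAME, NR, FNR }' data1.txt data2.txt)

# NR == FNR holds only for records of the first file.
first_only=$(cd "$dir" && awk 'NR == FNR { n++ } END { print n }' data1.txt data2.txt)

rm -rf "$dir"

echo "$result"
# data1.txt 1 1
# data1.txt 2 2
# data2.txt 3 1
# data2.txt 4 2
# data2.txt 5 3
echo "$first_only"   # 2
```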
Getline function description:
awk's getline statement simply reads one record. It is especially useful when one data record is spread over two physical records. In its plain form it performs the usual field splitting (setting $0, NF, NR and FNR). It returns 1 on success, 0 at end of file, and -1 on error.
A. Taken as a whole, getline behaves as follows:
When there is no redirection symbol | or < on either side, getline acts on the current file and reads the next line of the current file into the variable var or into $0 (when no variable is given). Note that because awk has already read a line before it processes getline, the lines getline returns are every other line.
When there is a redirection symbol | or <, getline acts on the redirected input file. Since that file has only just been opened and awk has not read any line from it, getline is the one that reads it, so getline returns the first line of the file rather than every other line.
B. The usages of getline fall into three broad categories (each with two sub-categories), six usages in total. The code is as follows:
nawk 'BEGIN{"cat data.txt"|getline d; print d}' data2.txt
nawk 'BEGIN{"cat data.txt"|getline; print $0}' data2.txt
nawk 'BEGIN{getline d < "data.txt"; print d}' data2.txt
nawk 'BEGIN{getline < "data.txt"; print $0}' data2.txt
All four lines of code above do the same thing: print only the first line of data.txt. To print all lines, use a loop, e.g.
nawk 'BEGIN{FS=":";while((getline<"/etc/passwd")>0){print $1}}' data.txt
With no redirection, getline reads the next line of the current input:
nawk '{getline d; print d"#"$3}' data.txt
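The three categories can be compared side by side on a small file. This is a sketch under stated assumptions: the four-line file and the variable names d, line, w and f are invented, awk is used instead of nawk, and the return value of getline is parenthesised before it is compared with 0 to avoid any ambiguity with the < redirection.

```shell
dir=$(mktemp -d)
printf '%s\n' one two three four > "$dir/data.txt"

# No redirection: awk has already read a record, so d receives the line after it.
pairs=$(awk '{ if ((getline d) > 0) print $0 "+" d }' "$dir/data.txt")

# Redirection from a file: loop while getline keeps returning 1.
count=$(awk -v f="$dir/data.txt" 'BEGIN { while ((getline line < f) > 0) n++; close(f); print n }')

# Redirection from a command: read the first line of its output.
first=$(awk 'BEGIN { "echo hello" | getline w; close("echo hello"); print w }')

rm -rf "$dir"

echo "$pairs"
# one+two
# three+four
echo "$count"   # 4
echo "$first"   # hello
```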