In-depth understanding of the grep command: application of regular expressions in grep

Introduction

How do you use regular expressions with the grep command on Linux and Unix-like systems? Linux distributions ship the GNU grep tool, which supports extended regular expressions, and most of them install it by default. The grep command is used to search for and locate information stored on your server or workstation.

Regular expression

A regular expression is a pattern that is matched against each line of input. A pattern is a sequence of characters. The following are examples:

^w1
w1|w2
[^ ]

grep regular expression example

Search for 'vivek' in the /etc/passwd file:

grep vivek /etc/passwd

Output example:

vivek:x:1000:1000:Vivek Gite,,,:/home/vivek:/bin/bash
vivekgite:x:1001:1001::/home/vivekgite:/bin/sh
gitevivek:x:1002:1002::/home/gitevivek:/bin/sh

Search for the word vivek in any case (that is, a case-insensitive, whole-word search):

grep -i -w vivek /etc/passwd

Search for the word vivek or raj in any case:

grep -E -i -w 'vivek|raj' /etc/passwd

The last example above shows an extended regular expression pattern.
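
The searches above can be tried safely without touching the real /etc/passwd. The following is a minimal sketch that assumes GNU grep and uses a temporary file with made-up passwd-style entries (the variable names and the raj entry are illustrative, not from the original examples). It counts matching lines with -c so the effect of each option is easy to compare.

```shell
# Hypothetical sample data in passwd format; a temp file stands in for /etc/passwd.
sample=$(mktemp)
printf '%s\n' \
  'vivek:x:1000:1000:Vivek Gite,,,:/home/vivek:/bin/bash' \
  'vivekgite:x:1001:1001::/home/vivekgite:/bin/sh' \
  'gitevivek:x:1002:1002::/home/gitevivek:/bin/sh' \
  'raj:x:1003:1003:Raj,,,:/home/raj:/bin/bash' > "$sample"

# Plain substring search: every line containing "vivek" counts.
plain_count=$(grep -c vivek "$sample")

# -w restricts matches to whole words, -i ignores case.
word_count=$(grep -c -i -w vivek "$sample")

# -E enables extended syntax, so | means "or".
either_count=$(grep -c -E -i -w 'vivek|raj' "$sample")

echo "plain=$plain_count word=$word_count either=$either_count"
rm -f "$sample"
```

Only the first entry contains vivek as a whole word, which is why the -w count is smaller than the plain substring count.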

Anchor

You can use the ^ and $ symbols to anchor a match to the beginning or the end of an input line, respectively. The following example displays only the input lines that start with vivek:

grep '^vivek' /etc/passwd

Output example:

vivek:x:1000:1000:Vivek Gite,,,:/home/vivek:/bin/bash
vivekgite:x:1001:1001::/home/vivekgite:/bin/sh

To display only lines that start with the whole word vivek, that is, to exclude vivekgit, vivekg, and so on, add the -w option. (LCTT translator's note: the word must be followed by a non-word character such as a space or punctuation.)

grep -w '^vivek' /etc/passwd

Find lines ending with foo:

grep 'foo$' filename

Match lines that contain only foo and nothing else:

grep '^foo$' filename

The following example finds empty lines:

grep '^$' filename

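The four anchor patterns above behave quite differently on the same input. The sketch below, which assumes a small made-up sample file, counts the matching lines for each pattern so the differences are visible side by side.

```shell
sample=$(mktemp)
# Five lines: "foo", "foo bar", "bar foo", an empty line, and "food".
printf '%s\n' 'foo' 'foo bar' 'bar foo' '' 'food' > "$sample"

starts=$(grep -c '^foo' "$sample")   # foo, foo bar, food
ends=$(grep -c 'foo$' "$sample")     # foo, bar foo
exact=$(grep -c '^foo$' "$sample")   # foo
empty=$(grep -c '^$' "$sample")      # the empty line

echo "starts=$starts ends=$ends exact=$exact empty=$empty"
rm -f "$sample"
```

Note that '^foo' still matches food, because an anchor only fixes the position of the match and says nothing about what follows it.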

Character class

Match Vivek or vivek:

grep '[vV]ivek' filename

Or match the name in any combination of upper and lower case:

grep '[vV][iI][Vv][Ee][kK]' filename

A character class can also match digits (for example, vivek1 or Vivek2):

grep -w '[vV]ivek[0-9]' filename

Match foo followed by two digits (for example, foo11 or foo12):

grep 'foo[0-9][0-9]' filename

Character classes are not limited to digits. The following matches lines that contain at least one letter:

grep '[A-Za-z]' filename

Display all lines containing the character "w" or "n" (the pattern is quoted so that the shell does not expand it as a filename glob):

grep '[wn]' filename

Within a bracket expression, the name of a character class enclosed in "[:" and ":]" stands for the list of all characters belonging to that class. The standard character class names include:

[:alnum:] - Alphanumeric characters
[:alpha:] - Alphabetic characters
[:blank:] - Blank characters: space and tab
[:digit:] - Digits: '0 1 2 3 4 5 6 7 8 9'
[:lower:] - Lowercase letters: 'a b c d e f g h i j k l m n o p q r s t u v w x y z'
[:space:] - Whitespace characters: tab, newline, vertical tab, form feed, carriage return and space
[:upper:] - Uppercase letters: 'A B C D E F G H I J K L M N O P Q R S T U V W X Y Z'

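The following example matches lines that contain at least one uppercase letter. The class name must itself be placed inside a bracket expression, hence the double brackets:

grep '[[:upper:]]' filename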

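As a rough illustration of bracket expressions and named classes together, the sketch below runs several of them against a made-up five-line sample file. It also uses -v, an option not covered above, which inverts the match and selects lines that do not match.

```shell
sample=$(mktemp)
printf '%s\n' 'vivek' 'Vivek2' 'VIVEK' '12345' 'hello world' > "$sample"

# Lines with at least one uppercase letter: Vivek2, VIVEK.
upper=$(grep -c '[[:upper:]]' "$sample")

# Lines with at least one digit: Vivek2, 12345.
digit=$(grep -c '[[:digit:]]' "$sample")

# Whole word made of vivek/Vivek plus one digit: Vivek2.
name_digit=$(grep -c -w '[vV]ivek[0-9]' "$sample")

# -v inverts the match: lines with no letters at all: 12345.
no_alpha=$(grep -c -v '[[:alpha:]]' "$sample")

echo "upper=$upper digit=$digit name_digit=$name_digit no_alpha=$no_alpha"
rm -f "$sample"
```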

Wildcard

You can use "." to match any single character. The following example matches 3-character words starting with "b" and ending with "t":

grep '\<b.t\>' filename

Here:

\< matches the empty string at the beginning of a word
\> matches the empty string at the end of a word

Print all lines that consist of exactly two characters:

grep '^..$' filename

Display lines starting with a dot followed by a digit:

grep '^\.[0-9]' filename

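The wildcard patterns in this section can be checked with a short experiment. The sketch below assumes GNU grep (the \< and \> word anchors are an extension to POSIX basic regular expressions) and a made-up sample file.

```shell
sample=$(mktemp)
printf '%s\n' 'bat' 'a bit odd' 'boot' 'habit' 'ab' '.5 units' 'x5' > "$sample"

# Three-letter words b?t: "bat" and the word "bit".
# "boot" is too long, and the "bit" inside "habit" is not at a word start.
three_letter=$(grep -c '\<b.t\>' "$sample")

# Lines of exactly two characters: "ab" and "x5".
two_chars=$(grep -c '^..$' "$sample")

# Lines starting with a literal dot and a digit: ".5 units".
dot_digit=$(grep -c '^\.[0-9]' "$sample")

echo "three_letter=$three_letter two_chars=$two_chars dot_digit=$dot_digit"
rm -f "$sample"
```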

Dot character escape

The following regular expression for matching the IP address 192.168.1.254 is imprecise, because each unescaped dot matches any character. (LCTT translator's note: it does match the IP address, but it can also match similar strings in which the separators are not dots.)

grep '192.168.1.254' /etc/hosts

All three dots need to be escaped:

grep '192\.168\.1\.254' /etc/hosts

The following example matches any string that has the form of an IPv4 address. (LCTT translator's note: because each number in an IP address is limited to a range, this regular expression is not exact.)

egrep '[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}\.[[:digit:]]{1,3}' filename

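To see why escaping the dots matters, the sketch below compares the unescaped and escaped patterns on a made-up hosts-style file. It also shows -F, an option not mentioned above, which treats the whole pattern as a fixed string and may be a simpler choice when no regular expression features are needed.

```shell
sample=$(mktemp)
printf '%s\n' \
  '192.168.1.254 gateway' \
  '192x168y1z254 not-an-ip' \
  '192.168.11254 typo' > "$sample"

# Unescaped dots match any character, so all three lines match.
loose=$(grep -c '192.168.1.254' "$sample")

# Escaped dots match only a literal ".", so only the first line matches.
strict=$(grep -c '192\.168\.1\.254' "$sample")

# -F disables regular expressions entirely: same result as escaping.
fixed=$(grep -c -F '192.168.1.254' "$sample")

echo "loose=$loose strict=$strict fixed=$fixed"
rm -f "$sample"
```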

How to search for matching patterns starting with the "-" symbol?

Use the -e option to search for a string matching '--test--'. If you do not use the -e option, the grep command will try to parse '--test--' as its own option parameter:

grep -e '--test--' filename

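A quick way to confirm this behaviour is shown in the sketch below, which uses a made-up two-line sample file. As an alternative to -e, it also uses the conventional "--" argument, which marks the end of options so that everything after it is treated as a pattern or file name.

```shell
sample=$(mktemp)
printf '%s\n' 'run with --test-- flag' 'no flags here' > "$sample"

# -e marks the next argument as the pattern, even if it starts with "-".
with_e=$(grep -c -e '--test--' "$sample")

# "--" ends option parsing, so the next argument is taken as the pattern.
with_dashes=$(grep -c -- '--test--' "$sample")

echo "with_e=$with_e with_dashes=$with_dashes"
rm -f "$sample"
```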

How to use grep's "or" matching?

Use the following syntax:

grep -E 'word1|word2' filename

or

egrep 'word1|word2' filename

or, using basic regular expression syntax:

grep 'word1\|word2' filename

How to use grep's "and" matching?

Use the following syntax to display all lines that contain both 'word1' and 'word2':

grep 'word1' filename | grep 'word2'

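The "or" and "and" forms can be compared on one small input. The sketch below assumes GNU grep (the \| alternation in basic syntax is a GNU extension) and a made-up sample file. It also shows repeating -e, which is another way to express "or".

```shell
sample=$(mktemp)
printf '%s\n' 'word1 only' 'word2 only' 'word1 and word2' 'neither' > "$sample"

# "or" with extended syntax.
either_ere=$(grep -c -E 'word1|word2' "$sample")

# "or" with basic syntax and a backslashed bar (GNU extension).
either_bre=$(grep -c 'word1\|word2' "$sample")

# "or" by giving several patterns with repeated -e options.
either_e=$(grep -c -e 'word1' -e 'word2' "$sample")

# "and" by filtering the output of one grep through another.
both=$(grep 'word1' "$sample" | grep -c 'word2')

echo "ere=$either_ere bre=$either_bre e=$either_e both=$both"
rm -f "$sample"
```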

How to match repeated sequences?

The following syntax specifies how many times the preceding character must repeat in sequence:

{N}
{N,}
{min,max}

To match the character "v" appearing twice in a row:

egrep "v{2}" filename

The following command matches "col" and "cool":

egrep 'co{1,2}l' filename

The following command matches all lines containing at least three consecutive 'c' characters:

egrep 'c{3,}' filename

The following example matches mobile phone numbers in the format 91-1234567890 (that is, two digits, an optional space or hyphen, then ten digits):

grep "[[:digit:]]\{2\}[ -]\?[[:digit:]]\{10\}" filename

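The interval patterns above can be exercised as follows. This is a sketch that assumes GNU grep (the \? operator in basic syntax is a GNU extension) and a made-up sample file. It runs the phone number pattern in both extended and basic syntax to show that the two spellings select the same lines.

```shell
sample=$(mktemp)
printf '%s\n' 'col' 'cool' 'coool' 'cl' \
  '91-1234567890' '91 1234567890' '911234567890' '9-123' > "$sample"

# One or two "o" between "c" and "l": col, cool (not coool, not cl).
col_cool=$(grep -c -E 'co{1,2}l' "$sample")

# Extended syntax: braces and ? need no backslashes.
phone_ere=$(grep -c -E '[[:digit:]]{2}[ -]?[[:digit:]]{10}' "$sample")

# Basic syntax: the same pattern with backslashed operators.
phone_bre=$(grep -c '[[:digit:]]\{2\}[ -]\?[[:digit:]]\{10\}' "$sample")

echo "col_cool=$col_cool phone_ere=$phone_ere phone_bre=$phone_bre"
rm -f "$sample"
```

Because the separator is optional, the twelve-digit line 911234567890 also matches the phone number pattern.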

How to highlight matches in grep output?

Use the following syntax:

grep --color regex filename

How to display only the matched text instead of the whole matching line?

Use the following syntax:

grep -o regex filename

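The -o option is convenient for extracting values from text. The sketch below pulls IPv4-shaped strings out of a made-up sample line; the pattern is deliberately loose and does not check that each number is in the range 0 to 255.

```shell
sample=$(mktemp)
printf '%s\n' 'host a 10.0.0.1 and host b 10.0.0.22' 'no address here' > "$sample"

# -o prints each match on its own line instead of the whole matching line.
matches=$(grep -o -E '[0-9]{1,3}(\.[0-9]{1,3}){3}' "$sample")

# Count the extracted matches (one per output line).
count=$(printf '%s\n' "$matches" | grep -c '')

printf '%s\n' "$matches"
echo "count=$count"
rm -f "$sample"
```

Both addresses come from the same input line, which shows that -o reports every match on a line rather than one result per line.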

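Regular expression operators

Operator	Description
.	Matches any single character.
?	Matches the preceding item at most once.
*	Matches the preceding item zero or more times.
+	Matches the preceding item one or more times.
{N}	Matches the preceding item exactly N times.
{N,}	Matches the preceding item N or more times.
{N,M}	Matches the preceding item at least N times and at most M times.
-	Represents a range, provided it is not the first or last character in a list or the end point of a range in a list.
^	Matches the empty string at the beginning of a line; also means "characters not in the list" when it is the first character of a list.
$	Matches the empty string at the end of a line.
\b	Matches the empty string at the edge of a word.
\B	Matches the empty string inside a word, that is, not at the edge of a word.
\<	Matches the empty string at the beginning of a word.
\>	Matches the empty string at the end of a word.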

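grep and egrep

egrep is equivalent to grep -E. It interprets the pattern as an extended regular expression. The following is from the grep man page:

In basic regular expressions the metacharacters ?, +, {, |, ( and ) lose their special meaning; use the backslashed versions \?, \+, \{, \|, \( and \) instead. Traditional egrep did not support the { metacharacter, and some egrep implementations support \{ instead, so portable scripts should avoid { in grep -E patterns and should use [{] to match a literal {.

GNU grep -E attempts to support traditional usage by assuming that { is not special if it would be the start of an invalid interval specification.

For example, the command grep -E '{1' searches for the two-character string {1 instead of reporting a syntax error in the regular expression.

POSIX.2 allows this behavior as an extension, but portable scripts should avoid it.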

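The difference between basic and extended syntax described in the man page excerpt can be observed directly. The sketch below assumes GNU grep (the \+ operator in basic syntax is a GNU extension) and a made-up three-line sample file. It uses the portable [{] form to match a literal brace.

```shell
sample=$(mktemp)
printf '%s\n' 'aaa' 'a+b' 'b {1' > "$sample"

# Basic syntax: "+" is an ordinary character, so only "a+b" matches.
bre_plus=$(grep -c 'a+' "$sample")

# Basic syntax with a backslash: one or more "a" (aaa and a+b).
bre_escaped=$(grep -c 'a\+' "$sample")

# Extended syntax: "+" is an operator without a backslash (aaa and a+b).
ere_plus=$(grep -c -E 'a+' "$sample")

# Portable way to match a literal "{" in extended syntax.
ere_brace=$(grep -c -E '[{]1' "$sample")

echo "bre_plus=$bre_plus bre_escaped=$bre_escaped ere_plus=$ere_plus ere_brace=$ere_brace"
rm -f "$sample"
```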