This article describes position matching in regular expressions, with examples, as follows:
Note: In all examples, the text matched by the regular expression is shown between 【 and 】 in the source text. Some examples are implemented in Java; where a behavior is specific to Java's own regular expression support, this is explained at that point. All Java examples were tested under JDK 1.6.0_13.
1. Introduction to the problem
If we want to match a particular word in a text (ignoring multiline mode for now; it is introduced later), we might try something like this:
Text: Yesterday is history, tomorrow is a mystery, but today is a gift.
Regular expression: is
Result: Yesterday 【is】 h【is】tory, tomorrow 【is】 a mystery, but today 【is】 a gift.
Analysis: The intent was to match only the word is, but the pattern also matched the is inside another word (history). To solve this problem, use boundaries: metacharacters in the regular expression that specify where (at which boundary) the match should occur.
2. Word Boundary
A commonly used boundary is the word boundary, specified by the metacharacter \b, which matches the beginning or end of a word. More precisely, it matches the position between a character that can form part of a word (a letter, digit, or underscore, i.e. a character matched by \w) and a character that cannot (a character matched by \W). It also matches at the start or end of the string when the first or last character is a word character. \b itself consumes no characters. Let's revisit the previous example:
Text: Yesterday is history, tomorrow is a mystery, but today is a gift.
Regular expression: \bis\b
Result: Yesterday 【is】 history, tomorrow 【is】 a mystery, but today 【is】 a gift.
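The two results in this section can be reproduced with a short Java sketch. It relies only on the standard String.replaceAll method; the class name WordBoundaryDemo and the helper bracketMatches are hypothetical, and the helper wraps each match in [ and ] in place of the 【 】 marks used in this article. In a Java string literal the regex \b has to be written as "\\b".

```java
public class WordBoundaryDemo {
    // Hypothetical helper: wraps every match in [ and ].
    // "$0" in the replacement string refers to the whole match.
    public static String bracketMatches(String regex, String text) {
        return text.replaceAll(regex, "[$0]");
    }

    public static void main(String[] args) {
        String text = "Yesterday is history, tomorrow is a mystery, but today is a gift.";
        // Plain "is": also matches inside "history"
        System.out.println(bracketMatches("is", text));
        // "\\b" in Java source is the regex \b (word boundary)
        System.out.println(bracketMatches("\\bis\\b", text));
    }
}
```

Running it prints the first line with four bracketed matches, including h[is]tory, and the second with only the three standalone occurrences of is.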
Analysis: In the original text, the word is has a space before and after it, so it matches the pattern \bis\b (a space is one of the characters that separate words). The word history also contains is, but that occurrence is not matched: it is surrounded by h and t, both of which are word characters, so there is no word boundary on either side of it. To match a position that is not a word boundary, use \B. For example:
Text: Please enter the nine-digit id as it appears on your color - coded pass-key.
Regular expression: \B-\B
Result: Please enter the nine-digit id as it appears on your color 【-】 coded pass-key.
Analysis: \B-\B matches a hyphen that has no word boundary on either side. In color - coded, the hyphen has a space before and after it; since neither the space nor the hyphen is a word character, neither side is a word boundary, so it matches. In nine-digit and pass-key, the hyphen sits directly between two letters, so both sides are word boundaries and \B fails there.
3. String Boundary
Word boundaries are used to match positions related to words (the beginning of a word, the end of a word, a whole word, and so on). String boundaries serve a similar purpose, but match positions related to the whole string (the beginning of the string, the end of the string, the entire string, and so on). Two metacharacters define string boundaries: ^ matches the beginning of the string and $ matches the end of the string. For example, to check whether an XML document is valid, you can check that it starts with an XML declaration of the form <?xml ... ?>:
Text:
<?xml version="1.0" encoding="UTF-8"?>
<project basedir="." default="ear">
</project>
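Regular expression: ^\s*<\?xml.*?\?>
Result:
【<?xml version="1.0" encoding="UTF-8"?>】
<project basedir="." default="ear">
</project>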
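Analysis: ^ matches the start of the string, so ^\s* matches the start of the string followed by zero or more whitespace characters; the \s* is there because spaces, tabs, newlines and other whitespace are allowed before the <?xml tag. Next, <\?xml matches the literal text <?xml (the ? is escaped because it is a metacharacter), .*? lazily matches any characters, and \?> matches the closing ?>.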
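A minimal Java sketch of this check follows. The class name XmlDeclarationCheck and the method startsWithXmlDeclaration are hypothetical; the only API used is java.util.regex.Pattern and Matcher. Because the pattern is compiled without (?m), ^ can only succeed at offset 0 of the input, even though Matcher.find() searches the whole string.

```java
import java.util.regex.Pattern;

public class XmlDeclarationCheck {
    // ^ anchors at the start of the input, \s* allows leading whitespace,
    // <\?xml is the literal "<?xml", and the lazy .*? stops at the first "?>".
    private static final Pattern XML_DECL = Pattern.compile("^\\s*<\\?xml.*?\\?>");

    // True if text begins (after optional whitespace) with an XML declaration.
    public static boolean startsWithXmlDeclaration(String text) {
        // find() scans the input, but without (?m) ^ can only succeed at offset 0.
        return XML_DECL.matcher(text).find();
    }

    public static void main(String[] args) {
        String doc = "  <?xml version=\"1.0\" encoding=\"UTF-8\"?>\n"
                + "<project basedir=\".\" default=\"ear\">\n</project>";
        System.out.println(startsWithXmlDeclaration(doc));                  // true
        System.out.println(startsWithXmlDeclaration("<project/>\n" + doc)); // false
    }
}
```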
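Apart from the position it matches, $ is used exactly like ^. For example, to check whether an HTML page ends with </html>, you can use the pattern </[Hh][Tt][Mm][Ll]>\s*$ (the \s* allows trailing whitespace after the closing tag).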
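The same idea works at the other end of the input. In this hypothetical HtmlEndCheck sketch the pattern is written as </[Hh][Tt][Mm][Ll]>\s*$, including the leading </ of the closing tag; without (?m), $ can only match at the end of the input, and the \s* lets trailing spaces and newlines through.

```java
import java.util.regex.Pattern;

public class HtmlEndCheck {
    // </html> with each letter in either case, then optional trailing whitespace, then end of input.
    private static final Pattern HTML_END = Pattern.compile("</[Hh][Tt][Mm][Ll]>\\s*$");

    // True if the page ends with a closing html tag (ignoring trailing whitespace).
    public static boolean endsWithHtmlClose(String page) {
        return HTML_END.matcher(page).find();
    }

    public static void main(String[] args) {
        System.out.println(endsWithHtmlClose("<html><body>hi</body></HTML>\n  "));          // true
        System.out.println(endsWithHtmlClose("<html><body>hi</body></html><!-- note -->")); // false
    }
}
```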
4. Multiline Mode
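Some special metacharacters in a regular expression change the behavior of other metacharacters. Multiline mode is enabled with (?m). In multiline mode, the regex engine treats line separators as string separators: ^ matches not only at the normal start of the string but also at the start of each line (right after a line separator, i.e. a newline), and $ matches not only at the normal end of the string but also at the end of each line (right before a line separator).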
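The effect of (?m) on ^ and $ is easy to see on a small three-line string. This hypothetical MultilineAnchorsDemo sketch runs the same anchors with and without multiline mode; it also uses the Pattern.MULTILINE flag, which the java.util.regex.Pattern documentation describes as equivalent to the embedded (?m).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineAnchorsDemo {
    // Returns every substring of text matched by p, in order.
    public static List<String> findAll(Pattern p, String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "first line\nsecond line\nthird line";
        // Default mode: ^ matches only at the start of the whole input.
        System.out.println(findAll(Pattern.compile("^\\w+"), text));
        // (?m): ^ also matches right after each newline.
        System.out.println(findAll(Pattern.compile("(?m)^\\w+"), text));
        // Pattern.MULTILINE does the same as (?m); here $ matches before each newline.
        System.out.println(findAll(Pattern.compile("\\w+$", Pattern.MULTILINE), text));
    }
}
```

The three lines printed are [first], [first, second, third] and [line, line, line].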
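When used, (?m) should appear at the very beginning of the pattern; in Java an embedded flag takes effect from the point where it appears, so putting it first makes it apply to the whole pattern. For example, suppose we want to use a regular expression to find all single-line comments (those starting with //) in a piece of Java code.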
Text:
public DownloadingDialog(Frame parent) {
    //Call super constructor, specifying that dialog box is modal.
    super(parent, true);
    //Set dialog box title.
    setTitle("E-mail Client");
    //Instruct window not to close when the "X" is clicked.
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
    //Put a message with a nice border in this dialog box.
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages..."));
    setContentPane(contentPanel);
    //Size dialog box to components.
    pack();
    //Center dialog box over application.
    setLocationRelativeTo(parent);
}
Regular expression: (?m)^\s*//.*$
Result:
public DownloadingDialog(Frame parent) {
【    //Call super constructor, specifying that dialog box is modal.】
    super(parent, true);
【    //Set dialog box title.】
    setTitle("E-mail Client");
【    //Instruct window not to close when the "X" is clicked.】
    setDefaultCloseOperation(DO_NOTHING_ON_CLOSE);
【    //Put a message with a nice border in this dialog box.】
    JPanel contentPanel = new JPanel();
    contentPanel.setBorder(BorderFactory.createEmptyBorder(5, 5, 5, 5));
    contentPanel.add(new JLabel("Downloading messages..."));
    setContentPane(contentPanel);
【    //Size dialog box to components.】
    pack();
【    //Center dialog box over application.】
    setLocationRelativeTo(parent);
}
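Analysis: ^\s*//.*$ matches the start of a string, then any number of whitespace characters, then //, then any text (. does not match a newline, so .* stops at the end of the line), and finally the end of a string. Without a prefix, however, ^ and $ match only at the start and end of the entire text, so the pattern cannot find the comments one line at a time; for the code above, which begins with public, it finds no match at all. Adding the (?m) prefix makes the engine treat each newline as a string separator, so every comment line is matched.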
The Java implementation is as follows (the text is saved in the file text.txt):
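import java.io.BufferedReader;
import java.io.File;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MultilineMatch {
    // Reads the whole file into a String.
    public static String getTextFromFile(String path) throws Exception {
        BufferedReader br = new BufferedReader(new FileReader(new File(path)));
        try {
            StringBuilder sb = new StringBuilder();
            char[] cbuf = new char[1024];
            int len;
            // read() returns -1 at end of file; append only the chars just read
            while ((len = br.read(cbuf)) != -1) {
                sb.append(cbuf, 0, len);
            }
            return sb.toString();
        } finally {
            br.close(); // close the file even if reading fails
        }
    }

    // Prints every single-line comment found in the file.
    public static void multilineMatch() throws Exception {
        String text = getTextFromFile("E:/text.txt");
        String regex = "(?m)^\\s*//.*$";
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            System.out.println(m.group());
        }
    }

    public static void main(String[] args) throws Exception {
        multilineMatch();
    }
}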
The output is as follows:
//Call super constructor, specifying that dialog box is modal.
//Set dialog box title.
//Instruct window not to close when the "X" is clicked.
//Put a message with a nice border in this dialog box.
//Size dialog box to components.
//Center dialog box over application.
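To see why the (?m) prefix matters, the following sketch runs the pattern with and without it. To stay self-contained it uses a hypothetical, shortened copy of the sample code as a string literal instead of reading E:/text.txt, and the class name CommentFinder is likewise illustrative. Without (?m), ^ can only match at the very start of the text, which begins with public, so no comment is found; with (?m), each comment line is matched.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CommentFinder {
    // Returns every substring of text matched by regex, in order.
    public static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<String>();
        Matcher m = Pattern.compile(regex).matcher(text);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Hypothetical shortened stand-in for the contents of text.txt.
        String code = "public DownloadingDialog(Frame parent) {\n"
                + "    //Call super constructor, specifying that dialog box is modal.\n"
                + "    super(parent, true);\n"
                + "    //Set dialog box title.\n"
                + "    setTitle(\"E-mail Client\");\n"
                + "}\n";

        // Without (?m): ^ only matches at offset 0, where the text starts with "public".
        System.out.println(findAll("^\\s*//.*$", code).size()); // 0

        // With (?m): ^ and $ match at every line start and end.
        for (String comment : findAll("(?m)^\\s*//.*$", code)) {
            System.out.println(comment.trim()); // trim() drops the indentation matched by \s*
        }
    }
}
```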
5. Summary
Regular expressions can be used not only to match blocks of text of any length, but also to match text that appears at specific positions in a string. \b specifies a word boundary (\B is the opposite: it matches any position that is not a word boundary). ^ and $ specify string boundaries (the start and end of the string). When used with (?m), ^ and $ also match at the start and end of every line, that is, right after or right before a newline. Subexpressions will be introduced in the next article.