Pattern Matching and Text Processing - 军军小站|张军博客

Regular expressions are provided by the System.Text.RegularExpressions namespace.

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        Regex reg = new Regex("the");
        string str1 = "the quick brown fox jumped over the lazy dog";
        Match matchSet;
        int matchPos;
        // Match returns only the first occurrence of the pattern.
        matchSet = reg.Match(str1);
        if (matchSet.Success)
        {
            matchPos = matchSet.Index;
            Console.WriteLine("found match at position:" + matchPos);
        }
        // The static IsMatch only reports whether the pattern occurs at all.
        if (Regex.IsMatch(str1, "the"))
        {
            Match aMatch;
            // aMatch now holds the first match.
            aMatch = reg.Match(str1);
        }
    }
}

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        Regex reg = new Regex("the");
        string str1 = "the quick brown fox jumped over the lazy dog";
        MatchCollection matchSet;
        // Matches returns every non-overlapping occurrence of the pattern.
        matchSet = reg.Matches(str1);
        if (matchSet.Count > 0)
            foreach (Match aMatch in matchSet)
                Console.WriteLine("found a match at: " + aMatch.Index);
        Console.Read();
    }
}

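Quantifiers

(+) This quantifier tells the regular expression to match one or more occurrences of the character immediately before it.

(*) This quantifier tells the regular expression to match zero or more occurrences of the character immediately before it. In practice it is hard to use well, because it tends to match too much.

(?) This quantifier tells the regular expression to match zero or one occurrence of the character immediately before it.

{n} This quantifier specifies the exact number of occurrences to match.

{m,n} This quantifier specifies the minimum and maximum number of occurrences to match. {m,} specifies only the minimum. .NET has no {,n} form for a maximum on its own; write {0,n} instead.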
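As a rough illustration of the quantifiers above, the following sketch runs several patterns over one sample string and prints what each one matches. The class name QuantifierDemo, the helper FindAll, and the sample text are made up for this example and are not part of the original listings.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class QuantifierDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }

    static void Main()
    {
        // Sample text chosen for illustration: "b" followed by 0 to 4 "a" characters.
        string text = "b ba baa baaa baaaa";
        string[] patterns = { "ba+", "ba*", "ba?", "ba{2}", "ba{2,3}", "ba{3,}" };
        foreach (string p in patterns)
            Console.WriteLine(p + " -> " + string.Join(", ", FindAll(text, p)));
    }
}
```

Comparing the "ba+" and "ba*" lines of the output shows the practical difference: only "ba*" also matches the lone "b".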

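using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string[] words = new string[]{"Part", "of", "this", "<b>string</b>", "is", "bold"};
        // Greedy: .* grabs as much text as it can. The period (.) matches any character.
        // Change the pattern to <.+?> to make it lazy; <.+> alone is still greedy.
        string regExp = "<.*>";
        MatchCollection aMatch;
        foreach (string word in words)
        {
            if (Regex.IsMatch(word, regExp))
            {
                aMatch = Regex.Matches(word, regExp);
                for (int i = 0; i < aMatch.Count; i++)
                    Console.WriteLine(aMatch[i].Value);
            }
        }
    }
}

The program was expected to return just the two tags, <b> and </b>, but because the quantifier is greedy the regular expression returned <b>string</b>. The lazy quantifier (?) solves this: use <.+?>. Using + alone is not enough; it must be followed by the lazy modifier ?.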
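The difference between the greedy and the lazy form can be sketched as follows, assuming the word being searched is <b>string</b>. The class name GreedyLazyDemo and the helper FindAll are invented for this illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class GreedyLazyDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }

    static void Main()
    {
        string word = "<b>string</b>";
        // Greedy: a single match that runs from the first '<' to the last '>'.
        Console.WriteLine(string.Join(" | ", FindAll(word, "<.+>")));
        // Lazy: each match stops at the nearest '>', so the two tags come back separately.
        Console.WriteLine(string.Join(" | ", FindAll(word, "<.+?>")));
    }
}
```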

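Using character classes

The period (.) is usually used to describe a run of characters inside a string that is bounded by known starting and/or ending characters.

The period matches any single character (by default, any character except a newline).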

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string str1 = "the quick brown fox jumped over the lazy dog one time";
        MatchCollection matchSet;
        // "t.e" matches a t and an e with exactly one character between them.
        matchSet = Regex.Matches(str1, "t.e");
        foreach (Match aMatch in matchSet)
            Console.WriteLine("Matches at: " + aMatch.Index);
    }
}

To test against a set of characters, use square brackets ([]). The characters inside the brackets are called a "character class".

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string str1 = "THE quick BROWN fox JUMPED over THE lazy DOG";
        MatchCollection matchSet;
        // [a-z] matches one lowercase letter, so every lowercase letter is reported.
        matchSet = Regex.Matches(str1, "[a-z]");
        foreach (Match aMatch in matchSet)
            Console.WriteLine("Matches at: " + aMatch.Index);
    }
}

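[A-Za-z] matches every English letter, upper and lower case.

A caret (^) placed at the start of a character class negates it. If [aeiou] matches the vowels, then [^aeiou] matches any character that is not one of those vowels.

[A-Za-z0-9] matches the characters that make up a word. The shorthand \w can be used instead (it also matches the underscore), and \W is its inverse, matching any non-word character.

[0-9] can be written as \d.

[^0-9] can be written as \D.

\s matches a whitespace character and \S matches a non-whitespace character.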
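To make the shorthand classes concrete, here is a small sketch that counts how many characters of a sample string each class matches. The class name CharClassDemo, the helper Count, and the sample text are assumptions made for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class CharClassDemo
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int Count(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    static void Main()
    {
        // Sample text chosen for illustration: letters, digits, an underscore, a comma, spaces.
        string text = "Room 42, floor_3";
        string[] patterns = { "[A-Za-z]", "[^aeiou]", "[A-Za-z0-9]", "\\w", "\\W", "\\d", "\\D", "\\s", "\\S" };
        foreach (string p in patterns)
            Console.WriteLine(p + " matches " + Count(text, p) + " characters");
    }
}
```

The counts for [A-Za-z0-9] and \w differ by one on this input, because \w also accepts the underscore.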

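Assertions

(^) matches at the beginning of the string.

($) matches at the end of the string.

\b matches at a word boundary, that is, at the beginning or end of a word.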

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string[] words = new string[] { "heal", "heel", "noah", "techno" };
        // ^h matches only words that begin with h.
        string regExp = "^h";
        Match aMatch;
        foreach (string word in words)
            if (Regex.IsMatch(word, regExp))
            {
                aMatch = Regex.Match(word, regExp);
                Console.WriteLine("Matched: " + word + " at position: " + aMatch.Index);
            }
    }
}

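To match words that end in h instead, change the pattern (of the four words, only "noah" matches):

string regExp = "h$";

To find an h at the start of a word inside a longer string, use \b. Here only "hark" matches, because matching is case-sensitive and "Harold" begins with a capital H:

string words = "hark, what doth thou say, Harold? ";

string regExp = "\\bh";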
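The three assertions can be tried side by side with a sketch like the one below. It reuses the word list and sentence from this section; the class name AnchorDemo and the helper WordsMatching are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class AnchorDemo
{
    // Hypothetical helper: returns the words in which pattern finds a match.
    public static List<string> WordsMatching(string[] words, string pattern)
    {
        var hits = new List<string>();
        foreach (string word in words)
            if (Regex.IsMatch(word, pattern))
                hits.Add(word);
        return hits;
    }

    static void Main()
    {
        string[] words = { "heal", "heel", "noah", "techno" };
        // ^h: the word must start with h.
        Console.WriteLine("^h : " + string.Join(", ", WordsMatching(words, "^h")));
        // h$: the word must end with h.
        Console.WriteLine("h$ : " + string.Join(", ", WordsMatching(words, "h$")));

        // \bh: an h that starts a word anywhere in the sentence.
        string sentence = "hark, what doth thou say, Harold? ";
        foreach (Match m in Regex.Matches(sentence, "\\bh"))
            Console.WriteLine("\\bh at index " + m.Index);
    }
}
```

Note that "techno" contains an h but is reported by neither ^h nor h$, since the h sits in the middle of the word.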

Using grouping constructs

1 Anonymous groups

A group is formed by enclosing part of a regular expression in parentheses.

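using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        // Note the space after "18": without it the two halves run together as "1803/12/88".
        string words = "08/14/57 46 02/25/59 45 06/05/85 18 " + "03/12/88 16 09/09/90 13";
        // Two digits with whitespace on both sides, i.e. the ages.
        string regExp1 = "(\\s\\d{2}\\s)";
        MatchCollection matchSet = Regex.Matches(words,regExp1);
        // Groups[0] is the whole match, which here is the same text as the parenthesized group.
        foreach (Match aMatch in matchSet)
            Console.WriteLine(aMatch.Groups[0].Captures[0]);
    }
}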

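2 Named groups

A named group is written by prefixing the group's expression with a question mark and a name enclosed in angle brackets.

For example, to give the group the name "ages",

the regular expression, written as a C# string, is "(?<ages>\\s\\d{2}\\s)"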

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string words = "08/14/57 46 02/25/59 45 06/05/85 18 " + "03/12/88 16 09/09/90 13";
        // The named group "dates" captures each date; the trailing \s is outside the group.
        string regExp1 = "(?<dates>(\\d{2}/\\d{2}/\\d{2}))\\s";
        MatchCollection matchSet = Regex.Matches(words,regExp1);
        foreach (Match aMatch in matchSet)
            Console.WriteLine("Date: {0}", aMatch.Groups["dates"]);
    }
}
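For comparison, the sketch below reads parts of a single date first through a numbered (anonymous) group and then through a named group. The class name GroupDemo, the helper names, and the group names month, day, and year are assumptions for this example; the text above does not say which field of the date is which.

```csharp
using System;
using System.Text.RegularExpressions;

static class GroupDemo
{
    // Hypothetical helper: anonymous groups are addressed by number, starting at 1.
    public static string FirstFieldByNumber(string date)
    {
        Match m = Regex.Match(date, "(\\d{2})/(\\d{2})/(\\d{2})");
        return m.Success ? m.Groups[1].Value : null;
    }

    // Hypothetical helper: named groups are addressed by name.
    public static string YearByName(string date)
    {
        Match m = Regex.Match(date, "(?<month>\\d{2})/(?<day>\\d{2})/(?<year>\\d{2})");
        return m.Success ? m.Groups["year"].Value : null;
    }

    static void Main()
    {
        string date = "08/14/57";
        Console.WriteLine("Groups[1]:      " + FirstFieldByNumber(date));
        Console.WriteLine("Groups[\"year\"]: " + YearByName(date));
    }
}
```

Named groups cost a few extra characters in the pattern, but the code that reads them keeps working if another group is later added in front.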

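Zero-width lookahead and lookbehind assertions

Assertions can also control how far a regular expression looks ahead of or behind the current position. Such an assertion can be positive (the subexpression must match) or negative (the subexpression must not match). In both cases the text it examines is not included in the match.

(?=reg-exp-char)

string words = "lions lion tigers tiger bears,bear";

string regExp1 = "\\w+(?=\\s)"; // positive lookahead: the match continues only if the subexpression matches to the right of the current position

With a negative lookahead, (?!reg-exp-char), the match continues only if the subexpression does not match to the right of the current position.

string words = "subroutine routine subprocedure procedure";

string regExp1 = "\\b(?!sub)\\w+\\b";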
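A runnable sketch of both lookahead forms, using the two sample strings and patterns from this section, might look like this. The class name LookaheadDemo and the helper FindAll are made up for the example.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookaheadDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }

    static void Main()
    {
        // Positive lookahead: words that are followed by whitespace.
        // The whitespace is checked but is not part of the match.
        string animals = "lions lion tigers tiger bears,bear";
        Console.WriteLine(string.Join(", ", FindAll(animals, "\\w+(?=\\s)")));

        // Negative lookahead: whole words that do not begin with "sub".
        string routines = "subroutine routine subprocedure procedure";
        Console.WriteLine(string.Join(", ", FindAll(routines, "\\b(?!sub)\\w+\\b")));
    }
}
```

In the first string, "bears" and "bear" are not reported: one is followed by a comma and the other by the end of the string, so neither is followed by whitespace.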

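Lookbehind assertions

A positive lookbehind, (?<=reg-exp-char), lets the match continue only if the subexpression matches to the left of the current position. A negative lookbehind, (?<!reg-exp-char), lets it continue only if the subexpression does not match there.

string words = "subroutines routine subprocedures procedure";

string regExp1 = "\\b\\w+(?<=s)\\b"; // positive lookbehind: words that end in s

string regExp1 = "\\b\\w+(?<!s)\\b"; // negative lookbehind: words that do not end in s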
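The two lookbehind patterns can be compared on the same input with a sketch such as the following. The class name LookbehindDemo and the helper FindAll are assumptions for this example; the sample string and patterns are the ones from this section.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class LookbehindDemo
{
    // Hypothetical helper: returns the text of every match of pattern in input.
    public static List<string> FindAll(string input, string pattern)
    {
        var results = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
            results.Add(m.Value);
        return results;
    }

    static void Main()
    {
        string words = "subroutines routine subprocedures procedure";
        // Positive lookbehind: the character just before the closing word boundary must be s.
        Console.WriteLine("ends in s:     " + string.Join(", ", FindAll(words, "\\b\\w+(?<=s)\\b")));
        // Negative lookbehind: that character must not be s.
        Console.WriteLine("no trailing s: " + string.Join(", ", FindAll(words, "\\b\\w+(?<!s)\\b")));
    }
}
```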

The CaptureCollection class

using System;
using System.Text.RegularExpressions;

class chapter8
{
    static void Main()
    {
        string dates = "08/14/57 46 02/25/59 45 06/05/85 18 " + "03/12/88 16 09/09/90 13";
        // Each match consists of a date, whitespace, an age, and whitespace.
        string regExp = "(?<dates>(\\d{2}/\\d{2}/\\d{2}))\\s(?<ages>(\\d{2}))\\s";
        MatchCollection matchSet;
        matchSet = Regex.Matches(dates, regExp);
        Console.WriteLine();
        foreach (Match aMatch in matchSet)
        {
            // Captures holds every capture made by the named group within this match.
            foreach (Capture aCapture in aMatch.Groups["dates"].Captures)
                Console.WriteLine("date capture: " + aCapture.ToString());
            foreach (Capture aCapture in aMatch.Groups["ages"].Captures)
                Console.WriteLine("age capture: " + aCapture.ToString());
        }
    }
}
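A group's Captures collection holds more than one entry when the group sits under a quantifier and therefore matches several times within a single match. The sketch below shows this with a pattern that is not from the original listing; the class name CaptureDemo, the helper Parts, and the group name part are invented for the illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

static class CaptureDemo
{
    // Hypothetical helper: every capture made by the repeated group "part".
    public static List<string> Parts(string date)
    {
        var parts = new List<string>();
        // The group matches two digits and an optional slash, and is repeated by +.
        Match m = Regex.Match(date, "(?<part>\\d{2}/?)+");
        if (m.Success)
            foreach (Capture c in m.Groups["part"].Captures)
                parts.Add(c.Value);
        return parts;
    }

    static void Main()
    {
        string date = "08/14/57";
        Match m = Regex.Match(date, "(?<part>\\d{2}/?)+");
        // The Group itself only exposes the last capture.
        Console.WriteLine("group value: " + m.Groups["part"].Value);
        // The CaptureCollection keeps every repetition.
        Console.WriteLine("captures:    " + string.Join(" ", Parts(date)));
    }
}
```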

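Regular expression options

Options are passed to the matching methods as an extra argument. For example, RegexOptions.Multiline makes ^ and $ match at the start and end of every line rather than only at the start and end of the whole string:

matchSet = Regex.Matches(dates, regExp, RegexOptions.Multiline);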
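As a brief illustration of how an option changes the result, the sketch below counts matches with and without RegexOptions.Multiline and RegexOptions.IgnoreCase. The class name OptionsDemo, the helper Count, and the multi-line sample string are assumptions for this example.

```csharp
using System;
using System.Text.RegularExpressions;

static class OptionsDemo
{
    // Hypothetical helper: number of matches of pattern in input under the given options.
    public static int Count(string input, string pattern, RegexOptions options)
    {
        return Regex.Matches(input, pattern, options).Count;
    }

    static void Main()
    {
        // One date per line (sample data laid out this way for illustration).
        string lines = "08/14/57\n02/25/59\n06/05/85";
        // Without Multiline, ^ matches only at the very start of the string.
        Console.WriteLine("default:    " + Count(lines, "^\\d{2}", RegexOptions.None));
        // With Multiline, ^ also matches right after each newline.
        Console.WriteLine("Multiline:  " + Count(lines, "^\\d{2}", RegexOptions.Multiline));

        string sentence = "hark, what doth thou say, Harold? ";
        // IgnoreCase lets \bh match the capital H in "Harold" as well.
        Console.WriteLine("case-sens.: " + Count(sentence, "\\bh", RegexOptions.None));
        Console.WriteLine("IgnoreCase: " + Count(sentence, "\\bh", RegexOptions.IgnoreCase));
    }
}
```

Several options can be combined with the | operator, for example RegexOptions.Multiline | RegexOptions.IgnoreCase.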
