C# Regular Expressions

Sometimes when getting input from the user, you need to check whether it follows a certain pattern. For example, consider a text box that expects an email address: it must check that the given text is actually an email address and not something else, such as a name or a phone number. The .NET Framework supports this kind of pattern checking through regular expressions, a special language for matching and manipulating text. The classes for handling regular expressions live in the System.Text.RegularExpressions namespace, which contains the Regex class that does most of the work. The following shows the simplest use of the Regex class.

Regex myRegEx = new Regex("sample");
string s1 = "This is a sample.";

if (myRegEx.IsMatch(s1))
   Console.WriteLine("Match found!");

The Regex constructor takes a string parameter, which is the pattern you are searching for. We use the IsMatch() method to test for a match. Since we passed “sample” as the pattern and the word sample can be found in s1, the IsMatch() method returns true.

You might want to find the exact position of the text “string” in the variable. You can use the Match() method, which returns a Match object whose Index property contains the zero-based index at which the match starts.

Regex re = new Regex("string");
string s1 = "This is a string";
Match match = re.Match(s1);

if (match.Success)
{
   Console.WriteLine("Match found at " + match.Index);
}
Match found at 10

When you have multiple matches in a string, you can use the Matches() method instead. It returns a MatchCollection object, which you can loop through to obtain the index of every match.

Regex re = new Regex("happy");
string s1 = "This is a happy happy happy day.";
MatchCollection matches = re.Matches(s1);

foreach(Match match in matches)
{
   Console.WriteLine("Match found at index {0}.", match.Index);
}
Match found at index 10. 
Match found at index 16. 
Match found at index 22.
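
Each Match in the collection also carries the matched text itself, not just its position. The following is a minimal sketch of that idea; the MatchesDemo class and its Describe helper are hypothetical names made up for this illustration. Note that the pattern “is” is also found inside the word “This”, because a plain pattern matches anywhere in the string.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class MatchesDemo
{
    // Hypothetical helper: formats every match as "value@index".
    public static List<string> Describe(string input, string pattern)
    {
        var result = new List<string>();
        foreach (Match m in Regex.Matches(input, pattern))
        {
            result.Add(m.Value + "@" + m.Index);
        }
        return result;
    }

    public static void Main()
    {
        // "is" occurs inside "This" (index 2) and as its own word (index 5).
        foreach (string line in Describe("This is a happy day", "is"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Running it should print is@2 followed by is@5.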

The Regex class also has a static Match() method which returns a Match object and accepts two parameters: the first one is the string to search, and the second one is the search pattern.

Match match = Regex.Match("This is a sample.", "sample");

if (match.Success)
   Console.WriteLine("Match found!");
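
The argument order of the static methods is easy to mix up, so here is a small sketch that shows what happens either way. The StaticRegexDemo class and the FirstIndex helper are hypothetical names used only for this example; Regex.Match, Regex.IsMatch and RegexOptions.IgnoreCase are part of System.Text.RegularExpressions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class StaticRegexDemo
{
    // Hypothetical helper: index of the first match, or -1 when there is none.
    public static int FirstIndex(string input, string pattern)
    {
        Match match = Regex.Match(input, pattern); // input first, pattern second
        return match.Success ? match.Index : -1;
    }

    public static void Main()
    {
        // Correct order: the text to search, then the pattern.
        Console.WriteLine(FirstIndex("This is a sample.", "sample"));

        // Swapped order: the long sentence is now the pattern, so nothing matches.
        Console.WriteLine(FirstIndex("sample", "This is a sample."));

        // The static IsMatch() overload also accepts options such as IgnoreCase.
        Console.WriteLine(Regex.IsMatch("This is a SAMPLE.", "sample", RegexOptions.IgnoreCase));
    }
}
```

The three lines should print 10, -1 and True.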

You can perform complex searches by using regular expression operators. Suppose you want to know whether a string contains either “Mr” or “Mrs”. You can use the | operator for this.

string name = "Mr. John Smith";
Regex r = new Regex("Mr|Mrs");
if (r.IsMatch(name))
{
   Console.WriteLine("Match found!");
}

Here, Regex will find “Mr” or “Mrs” in a string. If at least one of them is found in the string, then IsMatch() will yield true.
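
One detail to keep in mind is that the alternatives are tried from left to right, so the order matters once you look at the matched text rather than just a yes or no answer. The sketch below illustrates this; the TitleDemo class and its FirstMatch helper are hypothetical names for this example only.

```csharp
using System;
using System.Text.RegularExpressions;

public static class TitleDemo
{
    // Hypothetical helper: text of the first match, or "" when there is none.
    public static string FirstMatch(string input, string pattern)
    {
        return Regex.Match(input, pattern).Value;
    }

    public static void Main()
    {
        string name = "Mrs. Jane Smith";

        // "Mr" is tried first and succeeds, so "Mrs" is never attempted.
        Console.WriteLine(FirstMatch(name, "Mr|Mrs"));

        // Putting the longer alternative first captures the full title.
        Console.WriteLine(FirstMatch(name, "Mrs|Mr"));

        // The ? operator expresses the same idea: "Mr" with an optional "s".
        Console.WriteLine(FirstMatch(name, "Mrs?"));
    }
}
```

The output should be Mr, then Mrs, then Mrs. For IsMatch() the order makes no difference, since any string containing “Mrs” also contains “Mr”.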

The table that follows shows the regular expression operators commonly used in searching patterns:

Operator   Description
.          Matches any one character except a newline
[ ]        Matches any one character listed between the brackets
[^ ]       Matches any one character not listed between the brackets
?          Matches 0 or 1 occurrence of the preceding pattern
*          Matches 0 or more occurrences of the preceding pattern
+          Matches 1 or more occurrences of the preceding pattern
{n}        Matches the preceding pattern exactly n times
{n,}       Matches the preceding pattern at least n times
{n,N}      Matches the preceding pattern at least n times, but not more than N times
^          Matches at the beginning of the string (or of each line with RegexOptions.Multiline)
$          Matches at the end of the string (or of each line with RegexOptions.Multiline)
\b         Matches at a word boundary, that is, the beginning or end of a word
\B         Matches at any position that is not a word boundary
\d         Shorthand for a digit
\w         Shorthand for a word character (letter, digit, or underscore)
\s         Shorthand for a whitespace character

Some other tools use \< and \> for the beginning and end of a word. .NET does not support these operators; use \b instead.
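
To see a few of these operators in action, here is a short sketch that counts matches for different patterns. The OperatorDemo class and the CountMatches helper are hypothetical names invented for this example, and the sample sentences are made up as well.

```csharp
using System;
using System.Text.RegularExpressions;

public static class OperatorDemo
{
    // Hypothetical helper: number of matches of pattern in input.
    public static int CountMatches(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }

    public static void Main()
    {
        string text = "The cat sat in a category, not a bobcat.";

        Console.WriteLine(CountMatches(text, "cat"));        // cat, category, bobcat
        Console.WriteLine(CountMatches(text, @"\bcat\b"));   // only the whole word cat
        Console.WriteLine(CountMatches(text, @"\Bcat\b"));   // only the end of bobcat
        Console.WriteLine(CountMatches("Room 12, floor 3", @"\d+")); // 12 and 3

        // ^ matches only at the start of the string unless Multiline is set.
        string twoLines = "a1\nb2";
        Console.WriteLine(Regex.Matches(twoLines, @"^\w").Count);
        Console.WriteLine(Regex.Matches(twoLines, @"^\w", RegexOptions.Multiline).Count);
    }
}
```

The counts printed should be 3, 1, 1, 2, 1 and 2.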

Let’s take a look at more examples:

Pattern                                         Sample Matches
^[A-Z][a-zA-Z]*$                                John, Raymond, Allen
^[0-9]+\s+([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)$     123 Some Street, 567 Unknown
\d{5}                                           12345

The ^ indicates the start of the string and $ represents the end of the string. [A-Z] means a character must be in the range A to Z. Placing ^ as the first character inside the brackets negates the set: [^A-Z] matches only characters that do not fall within A-Z. [a-zA-Z] simply matches any letter of the alphabet, whether lowercase or uppercase.

The * operator matches 0 or more of the preceding pattern. Therefore, [0-9]* will match nothing, 1, 12, 123 and so on. The ? character matches the preceding pattern 0 or 1 time; therefore \d? will match nothing or a single digit from 0 to 9, since \d represents a numerical digit. The + matches 1 or more times; therefore [0-9A-Z]+ matches 9, A8, 87G, 9AT2 and so on.

The {n} operator repeats a pattern exactly n times, so A{5} matches AAAAA. {n,} matches at least n times of the preceding pattern; therefore, B{2,} matches BB, BBB, BBBB but not B. {n,N} matches at least n times but not more than N times of the preceding pattern; therefore, when anchored as ^C{3,5}$, it matches CCC, CCCC, CCCCC, but not C or CCCCCC. Do not put a space inside the braces: .NET does not recognize {2, } as a quantifier and treats it as literal text.
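
The difference between an anchored and an unanchored pattern is easiest to see by running a few checks. The sketch below is one way to do that; the QuantifierDemo class and the IsWholeMatch helper are hypothetical names, and the (?: ) group used inside the helper is a non-capturing group, which is not covered in this lesson and is assumed here only to keep the anchors applied to the whole pattern.

```csharp
using System;
using System.Text.RegularExpressions;

public static class QuantifierDemo
{
    // Hypothetical helper: wraps the pattern in ^ and $ so that it must cover
    // the entire input (note that $ still tolerates one trailing newline).
    public static bool IsWholeMatch(string input, string pattern)
    {
        return Regex.IsMatch(input, "^(?:" + pattern + ")$");
    }

    public static void Main()
    {
        Console.WriteLine(IsWholeMatch("CC", "C{3,5}"));      // too few
        Console.WriteLine(IsWholeMatch("CCCC", "C{3,5}"));    // within range
        Console.WriteLine(IsWholeMatch("CCCCCC", "C{3,5}"));  // too many

        // Without anchors, the first five C characters are enough for a match.
        Console.WriteLine(Regex.IsMatch("CCCCCC", "C{3,5}"));

        // A space inside the braces turns the quantifier into literal text.
        Console.WriteLine(IsWholeMatch("BBB", "B{2, }"));

        // Patterns from the examples table, anchored by the helper.
        Console.WriteLine(IsWholeMatch("Raymond", "[A-Z][a-zA-Z]*"));
        Console.WriteLine(IsWholeMatch("123 Some Street",
            @"[0-9]+\s+([a-zA-Z]+|[a-zA-Z]+\s[a-zA-Z]+)"));
    }
}
```

The checks should print False, True, False, True, False, True and True, in that order.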

There are many premade regular expressions on the internet. You can visit http://regexlib.com/ for some premade regular expression patterns. All you need to do is find the regular expression of your choice and pass it to the Regex constructor (or use it as the pattern argument of the static Match() method). Then you can use the Match() and IsMatch() methods to find a match.

Replacing Strings with Regular Expressions


You can use the Regex class to replace all the matching strings within another string. Regex offers two versions of the Replace method: an instance method and a static method. Let’s take a look at the static version first.

string contact = "My contact is someone@example.com"; // placeholder address
Console.WriteLine(contact);
contact = Regex.Replace(contact, @"[\w]+@[\w]+\.[\w]+", "123-4567");
Console.WriteLine(contact);
My contact is someone@example.com
My contact is 123-4567

The code above provides you with a simple email regular expression pattern. We use it to find any matching string and replace it with the replacement string. The static Replace method accepts three arguments: the input string, the pattern and the replacement string. When the email address was found, it was replaced with the replacement string. Notice that we don’t put ^ and $ at the beginning and end of the pattern, because the matching string could be found anywhere in the input string. The instance version of the Replace method is quite similar, except that it only has two parameters, the input string and the replacement string, since the pattern is already supplied to the Regex constructor.
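
For completeness, here is a minimal sketch of the instance version. The ReplaceDemo class, the HideEmails helper, the example addresses and the “[hidden]” replacement text are all placeholders made up for this illustration, and the pattern is a simple one meant for demonstration rather than full email validation.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // The pattern is supplied once, when the Regex object is created.
    private static readonly Regex EmailPattern = new Regex(@"[\w]+@[\w]+\.[\w]+");

    // Hypothetical helper: replaces every match of the pattern in input.
    public static string HideEmails(string input, string replacement)
    {
        // Instance Replace(): the input string, then the replacement string.
        return EmailPattern.Replace(input, replacement);
    }

    public static void Main()
    {
        string text = "Write to someone@example.com or other@example.org.";
        Console.WriteLine(HideEmails(text, "[hidden]"));
    }
}
```

This should print “Write to [hidden] or [hidden].”, showing that Replace() substitutes every match and not only the first one.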