The Complete Guide to Regex in VBA
Regular Expressions (Regex) are a pattern that is used to find matches within strings. This is very useful when you want to validate, replace, or extract data from strings. In Excel, regular expressions can be used when working with VBA.
Contents
Matching Characters
The Pattern is a schema that is used to match against a target string. It is composed of a sequence of expressions.
To start with, the simplest expression is a literal. To group expressions together, you can use ( ). The column ‘Example’ below shows examples of patterns.
Pattern Syntax | Description | Example | Matches Found |
. | Matches any single character except vbNewLine | r.y | ray, roy, r@y, rty |
[characters] | Matches any single character between brackets[] | [nb] | Would only match “n” or “b” in nba |
[^characters] | Matches any single character that is not between brackets[] | [^vb] | So would match “a” in “vba” |
[start-end] | Matches any character that is part of the range in brackets[] | [A-K] | Would match “O” in “zoo” |
\w | Matches alphanumeric characters and the underscore, but not the space character | \w | Would match “hello” in “hello!” |
\W | Matches any non-alphanumeric characters and the underscore | \W | Would match “@” in “username@company” |
\s | Matches any white space character such as spaces and tabs | \s | Would match ” ” in “Hello World” |
\S | Matches any non-white space character | \S | Would match “Hello” and “World” in “Hello World” |
\d | Matches any decimal digit | \d | Would match “2” in “2AM” |
\D | Matches any non-decimal digit | \D | Would match “AM” in “2AM” |
\ | Escapes special characters when it follows any special character which then allows you to search for them | \. | Would match “.” in “firstname.lastname” |
\t | Tab | \t | Would match a tab character (vbTab) |
\r | Carriage Return | \r | Would match a carriage return (vbCr) |
\n | New Line | \n | Would match a new line (vbNewLine) |
Quantifiers
Being able to match patterns using the previous table is great, but what about controlling how many times a match occurs? Quantifiers enable you to do just that.
Quantifier | Description | Example | Matches Found |
* | Matches zero or more occurrences | vb*a | vba, va, ba, vbba, vbbba, vbvbvbbba |
+ | Matches one or more occurrences | vb*a | vba, vbba, vbbba, vbvbvbbba (note that va and ba would not be matched as at least one occurrence of the pattern ‘vb’ must exist). |
? | Matches zero or one | vb?a | va, vba,ba (note that vbbba and vvvvba would not be matched as at most one occurrence of the pattern could be matched). |
{n} | Matches “n” many times | h\W{4} | Would match “hiiii” in “hiiii&2bye” |
{n,} | Matches at least “n” number of times | h\W{4,} | Would match “hiiii&” in “hiiii&2bye” |
{n,m} | Matches between n and m number of times | h\W{1,8} | Would match “hi..!!!!” in “hi..!!!!2bye” |
Greedy Vs Non-Greedy
By Greedy, we mean that the pattern would be matched as many times as possible, unless, of course, the RegEx expression we’re using indicates a maximum number of matches. For example, when we use {1,3}
we are saying that we want to have at least 1 match and at most 3 matches.
Greedy matching is the default behavior (we don’t need to add anything to the expression to make it Greedy). On the other hand, Non-Greedy means that we want to have as few matches as possible (in terms of the numbers of characters). This is done by adding ?
.
For example, let’s look at this RegEx: W.*W
(which should match any non-word character, followed by any number of characters except for new line, followed by any non-word character. If we apply it to the string “_firsName_lastName_”, then it will match “_firstName_lastName_” but if our RegEx is Non-Greedy (W.*?W
) then it will match “_firstName_” and “_lastName_” because it attempts to get the shortest matches.
Anchor
Anchor | Description | Example | Matches Found |
^ | Matches any string that starts with that value | ^Hello | Would match “Hello World!” in “Hello World!” |
$ | Matches any string that ends with that value | $World! | Would match “Hello World!” in “Hello World!” |
How to start using RegEx
To start using RegEx in VBA, you need to select and activate the RegEx object reference library from the references list. Start by going to the VBA IDE, select Tools –> References, then select Microsoft VBScript Regular Expressions 5.5, then click OK.
RegEx Object
To work with RegEx, you need to initiate a RegEx object. The RegEx object is the mechanism through which you can utilize the RegEx properties and operations. To use the RegEx object, you must declare it through the following syntax:
Dim regexObject As RegExp
Set regexObject = New RegExp
Another option, in case you don’t want to add the reference through the above “References – VBAProject” window, is to use the following syntax:
Dim regexObject As Object
Set regexObject = CreateObject(“VBScript.RegExp”)
RegEx Properties
The most important property is, of course, the Pattern property. The pattern property is the expression that we build using the rules we have learned in the previous sections (see the reference tables above).
An example of a pattern can be as simple as “(.*)”
. The syntax for setting the pattern property is RegEx_Object.Pattern = “(.*)”
. Once the pattern is set, you can use it on any string to verify the existence of a pattern (using Test), replace a pattern (using Replace), or extract all instances of matches of the pattern (using Match). We will see how to use these operations (Test, Replace, Match) in the coming sections.
Other properties of the RegEx object act as options that you can turn on or off. These are;
IgnoreCase
: Set to True if the matching is case insensitiveGlobal
: Set to True if you want to match all of the cases that fit our pattern in the string (in which you’re attempting to find the pattern). Set to False to if you want only the first match to be found.MultiLine
: Set to True if the string consists of multiple lines and you want to find cases that match the pattern in all the lines.
RegEx Operations
regexObject.Test (string)
: returns true if the pattern is matched (found) against the provided string.regexObject.Replace (search_string, replace_string)
: replaces the instances that match the search_string with the replace_string.regexObject.Execute (search_string)
: returns all the matches that were found of the pattern against the provided string.
Now that we have covered the RegEx operations, we are ready to write some VBA code that includes the utilization of RegEx objects and their operations.
Example 1: Test Operation
The Test operation, as mentioned above, is a RegEx operation that returns True if the pattern is matched in the provided string. Below we see an example testing whether a pattern ([W-Z]) is matched in the string “Hello World!”
Sub Test_Pattern() Dim Str As String Dim regexObject As RegExp Set regexObject = New RegExp With regexObject 'checks if any word in our string start with an alphabet between W and Z .Pattern = "[W-Z]" End With Str = "Hello World!" 'A message box displays true because we have a word starting with W ‘World’ MsgBox regexObject.Test(Str) End Sub
Example 2: Replace Operation
Use the Replace operation when you want to replace the instances of matches with a replacement string.
Sub Replace_Pattern() Dim Str As String Dim Replace_Str As String Dim regexObject As RegExp Set regexObject = New RegExp With regexObject .Pattern = "Hello" End With Str = "Hello World!" 'replace the matched pattern with 'Goodbye' Replace_Str = "Goodbye" 'A message box displays our string have the replacement has occured. MsgBox regexObject.Replace(Str, Replace_Str) End Sub
Example 3: Execute Operation
By using the Execute operation, you are able to get a list of all the matches for your pattern that was found in the search string. Note that it’s important to set the Global property of the Regex object to True if you want to get a list of matching instances, instead of just the first match.
To see all the matches presented to us as a list, we can use the immediate window by pressing CTRL + G while we have the VBA IDE active.
Sub Execute_Pattern() Dim Str As String Dim Replace_Str As String Dim regexObject As RegExp Set regexObject = New RegExp With regexObject .Pattern = "He(l+)o" ‘use + to find Hello with at least one ‘l’ .Global = True ‘use this to find all matches, not just the first match End With 'Search string contains multiple versions of 'Hello' Str = "Helo Hello Hellllo Heo World!" 'utilize the execute method and save the results in a new object that we call ‘matches’ Set matches = regexObject.Execute(Str) For Each Match In matches Debug.Print Match.Value 'Result: Helo Hello Hellllo Next Match End Sub
Conclusion
In this article, we covered the basic concepts related to RegEx and demonstrated how to get started with RegEx. We also discussed some of the most important operations that are used in RegEx. RegEx is an extremely powerful tool for working with strings of text. It’s also utilized in various other programming languages.
5 thoughts on “The Complete Guide to Regex in VBA”