Regular Expressions (RegEx) Basics

Using regular expressions (RegEx) in your scripts is a great way to find strings within a string. A regular expression is a sequence of characters that describes which strings to look for; that sequence is typically called a pattern. Regular expressions are available in most programming languages and in many software configurations. They can work a little differently from language to language, and a language may support some patterns that won’t work in others, but in general they behave very similarly. By default, patterns are case sensitive.
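As a quick illustration of the default case sensitivity, here is a minimal sketch in Python using the standard re module. The sample text is made up for the example, and other languages spell the ignore-case option differently.

```python
import re

text = "Hello World"  # sample text, chosen only for this illustration

# By default the pattern is case sensitive, so lowercase 'hello' finds nothing.
default_match = re.search("hello", text)

# The re.IGNORECASE flag is passed outside the pattern to relax that.
ignore_case_match = re.search("hello", text, re.IGNORECASE)

print(default_match)              # None
print(ignore_case_match.group())  # Hello
```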

Typically, a regular expression matches only the first occurrence of the pattern in a string unless a ‘global’ option is turned on. Depending on the language, that option is usually set outside the pattern, for example as a flag or by calling a different function.
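How "first match" versus "all matches" is selected varies by language. JavaScript, for instance, uses a g flag on the pattern. Python has no global flag; as a rough sketch, you pick a different function instead: re.search stops at the first match, while re.findall returns every match. The sample text is an assumption for the example.

```python
import re

text = "cat bat cat"  # sample text, chosen only for this illustration

# re.search returns only the first occurrence of the pattern.
first = re.search("cat", text)

# re.findall plays the role of a 'global' search and returns every occurrence.
all_matches = re.findall("cat", text)

print(first.start())  # 0
print(all_matches)    # ['cat', 'cat']
```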

In most languages, the characters a through z, A through Z, and 0 through 9 each represent themselves. A pattern expects those characters in the order in which they are written. For example, the pattern ‘ab’ will find ‘ab’ in a string but not ‘ba’. A backslash (\) either gives the next character a special meaning, as in \d, which represents a digit, or escapes a reserved character so that it is treated literally, so ‘\\’ represents a single ‘\’. A period (.) usually represents any character except a line break.
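The rules for literal characters, backslashes, and periods can be tried out with a short Python sketch. The sample strings are invented for the example, and the patterns are written as raw strings (r"...") so that Python itself does not interpret the backslashes.

```python
import re

# Literal characters must appear in the order written in the pattern.
in_order = re.search("ab", "cab")         # finds 'ab'
reversed_order = re.search("ab", "ba")    # None

# \d is a special sequence that stands for a single digit.
digits = re.findall(r"\d", "a1b22")       # ['1', '2', '2']

# A period stands for any character except a line break.
dot_any = re.search("a.c", "abc")         # finds 'abc'
dot_newline = re.search("a.c", "a\nc")    # None

# Escaping a reserved character makes it literal.
literal_dots = re.findall(r"\.", "a.b.c")   # ['.', '.']
backslash = re.search(r"\\", r"C:\temp")    # finds the single backslash

print(in_order.group(), reversed_order, digits)
print(dot_any.group(), dot_newline, literal_dots, backslash.start())
```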

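Ranges can be defined using a dash (-) inside square brackets. So [0-4] represents 0 through 4, and [A-P] represents all letters from A through P, including A and P. To cover all letters in both cases, use [A-Za-z]. Be careful with [A-z]: in ASCII-based engines it also matches the punctuation characters that sit between Z and a, such as [ and _.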
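A small Python sketch of ranges follows; in Python, a range is written inside square brackets. The sample strings are assumptions for the example. The third line shows why A-z is a risky shortcut: the range follows character codes, so it picks up punctuation that falls between Z and a.

```python
import re

# [0-4] matches any single digit from 0 through 4.
low_digits = re.findall("[0-4]", "0123456789")  # ['0', '1', '2', '3', '4']

# [A-P] includes both endpoints and is case sensitive.
a_to_p = re.findall("[A-P]", "APQap")           # ['A', 'P']

# [A-z] spans character codes 65 to 122, which includes '[' and '_'.
wide_range = re.findall("[A-z]", "a_Z[")        # ['a', '_', 'Z', '[']

# [A-Za-z] is the safer way to match letters of either case.
letters_only = re.findall("[A-Za-z]", "a_Z[")   # ['a', 'Z']

print(low_digits, a_to_p, wide_range, letters_only)
```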

To represent a character more than once without retyping it, use a quantifier: * means 0 or more of that character and + means 1 or more. For example, a+ means that 1 or more a’s in a row satisfy the pattern. Curly brackets state how many of that character are needed. So a{2,3} means that 2 or 3 a’s in a row satisfy the pattern, and a{4} means that exactly 4 a’s in a row satisfy it. Anything between square brackets represents 1 character. For example, [ab] matches one a or one b in a string.
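The quantifiers can be checked with a minimal Python sketch. It uses re.fullmatch, which only succeeds when the whole string satisfies the pattern; with re.search, a pattern such as a{2,3} would still find a match inside a longer run of a’s. The sample strings are made up for the example.

```python
import re

# * allows zero occurrences, so the empty string satisfies a*.
star_empty = re.fullmatch("a*", "")           # matches
# + needs at least one occurrence.
plus_empty = re.fullmatch("a+", "")           # None
plus_many = re.fullmatch("a+", "aaa")         # matches

# {2,3} accepts two or three in a row, {4} accepts exactly four.
two_or_three = re.fullmatch("a{2,3}", "aa")   # matches
too_many = re.fullmatch("a{2,3}", "aaaa")     # None
exactly_four = re.fullmatch("a{4}", "aaaa")   # matches

# Square brackets stand for one character out of the listed set.
either = re.findall("[ab]", "cabd")           # ['a', 'b']

print(star_empty is not None, plus_empty, plus_many.group())
print(two_or_three.group(), too_many, exactly_four.group(), either)
```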

The caret (^) represents the start of the string in question, and the dollar sign ($) represents the end. For example, if you apply ^aba to babab, the pattern will fail because the string doesn’t start with aba; applied to abab, however, it will match.
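The anchor example from the text can be reproduced with a short Python sketch. The $ cases use a pattern (bab$) that is not in the original and is added only to illustrate the end anchor.

```python
import re

# ^aba requires the string to start with 'aba'.
start_fail = re.search("^aba", "babab")   # None: the string starts with 'b'
start_ok = re.search("^aba", "abab")      # matches 'aba' at position 0

# bab$ (a hypothetical pattern for this example) requires the string to end with 'bab'.
end_ok = re.search("bab$", "babab")       # matches the final 'bab'
end_fail = re.search("bab$", "babaa")     # None: the string ends with 'aa'

print(start_fail, start_ok.group())
print(end_ok.start(), end_fail)
```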

Katherine Wu, Developer