# Regular Expression Syntax

Regular expressions are a more powerful (and therefore more complicated) form of wildcard pattern matching. Like [standard pattern matching](https://chaoses-ib.gitbook.io/directory-opus/manual/reference/wildcard_reference/pattern_matching_syntax), they can be used throughout Opus. Generally, you have to explicitly enable regular expressions in a given situation; by default, Opus assumes standard pattern matching. For example, the [**Advanced Rename**](https://chaoses-ib.gitbook.io/directory-opus/manual/file_operations/renaming_files/advanced_rename) dialog has a regular expression mode that you must select before regular expressions can be used.

One advantage regular expressions have over standard pattern matching is that they enable a form of search and replace in certain functions, such as the **Rename** command. The "search" string is specified as a pattern to match against the original names of files. That pattern can contain *capture groups*: expressions whose matched text in the original name is captured and can be carried over to the new string (which acts as the "replace" string). As an example, imagine the **Rename** dialog is set to regular expression mode, with the following patterns supplied:

**Old Name**: The (.\*) Backup\\.(.\*)\
**New Name**: \1.\2

The two **(.\*)** tokens in the *old name* string are capture groups: they "capture" whatever is matched by the expression within the parentheses. In this case, the expression inside the parentheses is **.\*** which simply means "match anything". So this pattern matches any filename beginning with *The* and ending in *Backup* (before the extension), and captures the middle of the filename for later use. The **\\.** matches the literal dot before the extension, and the second **(.\*)** captures the file extension. The *new name* string can then re-use the captured text, which is indicated with the **\1** and **\2** markers: **\1** refers to the first capture group, **\2** to the second, and so on. For example, the original filename *The Lord Of The Rings Backup.avi* would be renamed to *Lord Of The Rings.avi*.

If you need the *new name* string to contain a literal backslash (**\\**), use two together. For example, *abc\\\\xyz* will turn into *abc\\xyz*.

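The capture-group rename described above can be tried outside Opus with a short Python sketch. Python's `re` module is a different engine from the TR1 ECMAScript one Opus uses, so this only approximates the behaviour. The helper name `regex_rename` is hypothetical, and the sketch assumes the *old name* pattern has to match the whole filename. It reuses the *Lord Of The Rings* example and also shows a doubled backslash in the *new name* string producing a single literal backslash.

```python
import re

# Hypothetical helper that mimics a regular-expression rename.
# Assumption: the old-name pattern must match the whole filename,
# which is modelled here with re.fullmatch.
def regex_rename(filename, old_name, new_name):
    match = re.fullmatch(old_name, filename)
    if match is None:
        return filename  # pattern does not apply; keep the original name
    # expand() replaces \1, \2, ... with the captured text and turns a
    # doubled backslash into a single literal backslash.
    return match.expand(new_name)

renamed = regex_rename(
    "The Lord Of The Rings Backup.avi",
    r"The (.*) Backup\.(.*)",
    r"\1.\2",
)
print(renamed)  # Lord Of The Rings.avi

# Two backslashes in the new-name string give one in the result.
print(regex_rename("anything", r".*", r"abc\\xyz"))  # abc\xyz
```
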
When used with the **Rename** command only, the *old name* pattern can be followed with a **#** character to indicate that the search and replace operation should be repeated multiple times. For example, the following regular expression rename will remove all spaces from the filename:

**Old Name**: (.\*)\s(.\*)#\
**New Name**: \1\2

The **#** causes the search and replace to be repeated until the new name no longer changes. You can also specify a maximum repetition count by appending a number, for example **#5** at the end would repeat the operation no more than five times.

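The effect of the **#** repeat marker can be approximated with a loop. The following Python sketch is not how Opus implements it: `repeated_rename` and its `max_repeats` parameter are hypothetical names, Python's `re` module stands in for the TR1 ECMAScript engine, and the pattern is assumed to match the whole filename. It re-applies the space-removing rename until the name stops changing or the optional limit is reached.

```python
import re

# Hypothetical helper: re-applies a rename until the name stops changing,
# or until max_repeats passes have been made (None means no limit).
def repeated_rename(name, old_name, new_name, max_repeats=None):
    regex = re.compile(old_name)
    repeats = 0
    while max_repeats is None or repeats < max_repeats:
        match = regex.fullmatch(name)  # assumption: whole-name match
        if match is None:
            break
        result = match.expand(new_name)
        if result == name:
            break  # the new name no longer changes
        name = result
        repeats += 1
    return name

# A single pass removes only the last space, because the first (.*) is greedy.
print(repeated_rename("My Holiday Photo 01.jpg", r"(.*)\s(.*)", r"\1\2", max_repeats=1))
# My Holiday Photo01.jpg

# Repeating until nothing changes removes every space.
print(repeated_rename("My Holiday Photo 01.jpg", r"(.*)\s(.*)", r"\1\2"))
# MyHolidayPhoto01.jpg
```
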
There are many different variants of regular expression; by default Opus uses what's called *TR1 ECMAScript*. Microsoft has a [page on TR1](http://www.gpsoft.com.au/DScripts/redirect.asp?page=regex) that goes into far more detail than this help file can.

| Token | Description |
| --- | --- |
| **^** | <p><strong>Start of a string</strong>.<br>The caret is used to "anchor" the search to the start of the string. If the search is not anchored to either end, the pattern can match a sub-string of the target.</p><p>For example:<br><strong>^abc</strong> matches <em>abc</em>, <em>abcdefg</em>, <em>abc123</em>, but not <em>123abc</em><br><strong>abc</strong> also matches <em>123abc</em></p> |
| **$** | <p><strong>End of a string</strong>.<br>The dollar sign is used to "anchor" the search to the end of the string. If the search is not anchored to either end, the pattern can match a sub-string of the target.</p><p>For example:<br><strong>abc$</strong> matches <em>abc</em>, <em>endsinabc</em>, <em>123abc</em>, but not <em>abc123</em><br><strong>^abc(.\*)123$</strong> matches <em>abc123</em>, <em>abcxyz123</em>, but not <em>abcxyz123def</em></p> |
| **.** | <p><strong>Any single character</strong>.<br>The period (full stop) is used to match any single character.</p><p>For example:<br><strong>a.c</strong> matches <em>abc</em>, <em>aac</em>, <em>acc</em>, <em>adc</em> but not <em>acd</em></p> |
| **\*** | <p><strong>0 or more of previous expression</strong>.<br>Matches zero or more occurrences of the previous expression. Combine with <strong>.</strong> to form the "match anything" token (<strong>.\*</strong>).</p><p>For example:<br><strong>ab\*c</strong> matches <em>ac</em>, <em>abc</em>, <em>abbc</em>, <em>abbbc</em>, ...<br><strong>a.\*c</strong> matches <em>ac</em>, <em>abc</em>, <em>a123456c</em>, <em>aanythingc</em>, ...<br><strong>.\*</strong> matches anything</p> |
| **+** | <p><strong>1 or more of previous expression</strong>.<br>Matches one or more occurrences of the previous expression.</p><p>For example:<br><strong>ab+c</strong> matches <em>abc</em>, <em>abbc</em>, <em>abbbc</em> but not <em>ac</em></p> |
| **?** | <p><strong>0 or 1 of previous expression</strong>.<br>Matches either zero or one occurrence of the previous expression.</p><p>For example:<br><strong>ab?c</strong> matches <em>ac</em>, <em>abc</em> but not <em>abbc</em> or <em>abbbc</em></p> |
| **\|** | <p><strong>Alternation (logical <em>or</em>)</strong>.<br>The vertical bar is used to separate two or more characters or expressions, any of which may match.</p><p>For example:<br><strong>a\|b</strong> matches <em>a</em> or <em>b</em><br><strong>a(b\|c)d</strong> matches <em>abd</em> or <em>acd</em><br><strong>(bill\|ted)</strong> matches <em>bill</em> or <em>ted</em></p> |
| **{}** | <p><strong>Quantifier</strong>.<br>Braces are used to indicate that the preceding expression must match an exact number of times.</p><p>For example:<br><strong>ab{2}c</strong> matches <em>abbc</em>, but not <em>abc</em> or <em>abbbc</em><br><strong>a.{4}z</strong> matches <em>abcdez</em>, <em>a1234z</em>, <em>afourz</em>, <em>aaaaaz</em>, etc.</p> |
| **\[]** | <p><strong>Character set</strong>.<br>Matches any single character in the set of specified characters.<br>You can specify the character set as individual characters (e.g. <strong>\[abdfg]</strong>) or as a range of characters (e.g. <strong>\[a-j]</strong>) or as multiple ranges.</p><p>For example:<br><strong>\[abc]</strong> matches either <em>a</em>, <em>b</em> or <em>c</em><br><strong>\[af-j]</strong> matches either <em>a</em>, <em>f</em>, <em>g</em>, <em>h</em>, <em>i</em> or <em>j</em><br><strong>\[a-dh-kq-z]</strong> matches <em>a</em>, <em>b</em>, <em>c</em>, <em>d</em>, <em>h</em>, <em>i</em>, <em>j</em>, <em>k</em>, or any character from <em>q</em> to <em>z</em><br><strong>IMGP\[0-9]{4}.jpg</strong> matches <em>IMGP0158.jpg</em> (or any other four-digit number).</p> |
| **\[^]** | <p><strong>Negative character set</strong>.<br>Matches any character <strong>not</strong> in the set of specified characters. See <strong>\[]</strong> for information on how the set is defined.</p><p>For example:<br><strong>\[^pqr]</strong> matches any character except <em>p</em>, <em>q</em> or <em>r</em></p> |
| **()** | <p><strong>Expression / capture group</strong>.<br>Parentheses are used to combine multiple characters into an expression. When used in a "search and replace" like Advanced Rename, they also mark capture groups - see above for a discussion of these.</p><p>For example:<br><strong>(a\|b)c</strong> matches <em>ac</em> or <em>bc</em>, whereas<br><strong>a\|(bc)</strong> matches <em>a</em> or <em>bc</em></p> |
| **\\** | <p><strong>Escape character</strong>.<br>The backslash is used to escape token characters in order to match those characters literally.<br>When used before a non-token character, it is used to indicate the special escape characters listed in the rows directly below this one (<strong>\t</strong> through <strong>\u</strong>).</p><p>It is also used to mark several character classes, which are shorthand ways to specify various common <strong>\[]</strong> character sets (see below).</p><p>For example:<br><strong>a\|b</strong> matches <em>a</em> or <em>b</em>, whereas<br><strong>a\\\|b</strong> matches <em>a\|b</em><br><strong>a\t</strong> matches <em>a</em> followed by a tab character, whereas<br><strong>a\\\\t</strong> matches <em>a\\t</em></p> |
| **\t** | tab character ($09) |
| **\r** | carriage return ($0d) |
| **\v** | vertical tab ($0b) |
| **\f** | form feed ($0c) |
| **\n** | new line ($0a) |
| **\e** | escape ($1b) |
| **\x** | matches an ASCII character specified as a two-digit hexadecimal number, e.g. **\x20** matches a space |
| **\u** | matches a Unicode character specified as a four-digit hexadecimal number, e.g. **\u0020** matches a space |
| **\w** | <p><strong>Word character</strong>.<br>Matches any word character. Equivalent to <strong>\[a-zA-Z\_0-9]</strong>.</p><p>For example:<br><strong>^\w+\[0-9]{4}.jpg</strong> matches <em>IMGP0158.jpg</em> (or any other four-digit number preceded by at least one other word character).</p> |
| **\W** | <p><strong>Non-word character</strong>.<br>Matches any non-word character. Equivalent to <strong>\[^a-zA-Z\_0-9]</strong>.</p> |
| **\s** | <p><strong>Space character</strong>.<br>Matches any whitespace character. Equivalent to <strong>\[ \f\n\r\t\v]</strong>.</p> |
| **\S** | <p><strong>Non-space character</strong>.<br>Matches any non-whitespace character. Equivalent to <strong>\[^ \f\n\r\t\v]</strong>.</p> |
| **\d** | <p><strong>Digit character</strong>.<br>Matches any decimal digit. Equivalent to <strong>\[0-9]</strong>.</p> |
| **\D** | <p><strong>Non-digit character</strong>.<br>Matches any character that is not a decimal digit. Equivalent to <strong>\[^0-9]</strong>.</p><p>For example:<br><strong>^\D+\d{4}.jpg</strong> matches <em>IMGP0158.jpg</em> (or any other four-digit number preceded by at least one non-digit character).</p> |

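Several of the examples in the table can be checked mechanically. The sketch below uses Python's `re` module, which is not the TR1 ECMAScript engine Opus uses, so treat it as an approximation that happens to agree on these basic tokens. `CASES` and `check` are hypothetical names introduced only for this illustration.

```python
import re

# (pattern, names that should match, names that should not match),
# taken from the examples in the table above.
CASES = [
    (r"^abc", ["abc", "abcdefg", "abc123"], ["123abc"]),
    (r"abc$", ["abc", "endsinabc", "123abc"], ["abc123"]),
    (r"a.c", ["abc", "aac", "acc", "adc"], ["acd"]),
    (r"ab*c", ["ac", "abc", "abbc", "abbbc"], []),
    (r"ab+c", ["abc", "abbc", "abbbc"], ["ac"]),
    (r"ab?c", ["ac", "abc"], ["abbc", "abbbc"]),
    (r"ab{2}c", ["abbc"], ["abc", "abbbc"]),
    (r"a(b|c)d", ["abd", "acd"], []),
    (r"[^pqr]", ["a", "z"], ["p", "q", "r"]),
    (r"^\D+\d{4}.jpg", ["IMGP0158.jpg"], []),
]

def check(pattern, should_match, should_not_match):
    regex = re.compile(pattern)
    # re.search looks for the pattern anywhere in the name, so an
    # unanchored pattern may match a sub-string of the target.
    return (all(regex.search(name) for name in should_match)
            and not any(regex.search(name) for name in should_not_match))

for pattern, yes, no in CASES:
    print(pattern, "agrees" if check(pattern, yes, no) else "differs")
```

Tokens that differ between engines, such as **\e**, are deliberately left out of the list, since a mismatch there would say more about Python than about Opus.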