
Purdue University

Identity and Access Management

Basic UNIX Regular Expressions

The reflector can search for names using a key that is a UNIX basic regular expression. The PUIDNETD_DATA_RE ('r') field value gives the regular expression and some search limits. See Name Regular Expressions for more information on the composition of the regular expression field.

Case

The I2A2 reflector DBM ignores case in alphabetic characters when it matches a regular expression to the names it knows.
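The case rule can be illustrated with a short Python sketch. Python's re module is not the reflector's matching engine, and the names below are invented for illustration; the re.IGNORECASE flag simply stands in for the reflector's case-insensitive comparison.

```python
import re

# Hypothetical names; the real reflector searches its own name database.
names = ["John Smith", "JOHN SMITH", "john smith", "Joan Smith"]

# re.IGNORECASE approximates the case-insensitive matching described above.
pattern = re.compile(r"^john", re.IGNORECASE)

matches = [name for name in names if pattern.search(name)]
print(matches)
```

All three spellings of "John Smith" are selected, while "Joan Smith" is not.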

Formal Regular Expression Rules

These are the formal rules for forming basic UNIX regular expressions. For some simple examples, skip to Simple Examples. The rules come from the Solaris 7 regexp(5) manual page, and the I2A2 reflector DBM follows them. In the rules, RE stands for regular expression.

1.1 An ordinary character (not one of those discussed in 1.2 below) is a one-character RE that matches itself.

1.2 A backslash (\) followed by any special character is a one-character RE that matches the special character itself. The special characters are:

a. ., *, [, and \ (period, asterisk, left square bracket, and backslash, respectively), which are always special, except when they appear within square brackets ([]; see 1.4 below).

b. ^ (caret or circumflex), which is special at the beginning of an entire RE (see 4.1 and 4.3 below), or when it immediately follows the left of a pair of square brackets ([]) (see 1.4 below).

c. $ (dollar sign), which is special at the end of an entire RE (see 4.2 below).

1.3 A period (.) is a one-character RE that matches any character.

1.4 A non-empty string of characters enclosed in square brackets ([]) is a one-character RE that matches any one character in that string. If, however, the first character of the string is a circumflex (^), the one-character RE matches any character except the remaining characters in the string. The ^ has this special meaning only if it occurs first in the string. The minus (-) may be used to indicate a range of consecutive characters; for example, [0-9] is equivalent to [0123456789]. The - loses this special meaning if it occurs first (after an initial ^, if any) or last in the string. The right square bracket (]) does not terminate such a string when it is the first character within it (after an initial ^, if any); for example, []a-f] matches either a right square bracket (]) or one of the ASCII letters a through f inclusive. The four characters listed in 1.2.a above stand for themselves within such a string of characters.
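The bracket rules in 1.4 can be tried out with a small Python sketch. This is an illustration only: Python's re module is not a basic-RE engine, although it treats the bracket forms shown here the same way. One known difference is that a backslash inside brackets is still an escape character in Python, so that case is left out. The helper name matches_whole is hypothetical.

```python
import re

def matches_whole(pattern: str, text: str) -> bool:
    """Return True if the pattern matches the entire text."""
    return re.fullmatch(pattern, text) is not None

digit = r"[0-9]"            # range: equivalent to [0123456789]
not_digit = r"[^0-9]"       # leading ^ complements the set
bracket_or_af = r"[]a-f]"   # ] placed first is an ordinary character
dash_last = r"[a-]"         # - placed last is an ordinary character
dot_or_star = r"[.*]"       # . and * stand for themselves in brackets

print(matches_whole(digit, "7"), matches_whole(not_digit, "7"))
print(matches_whole(bracket_or_af, "]"), matches_whole(bracket_or_af, "g"))
print(matches_whole(dash_last, "-"), matches_whole(dot_or_star, "x"))
```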

The following rules may be used to construct REs from one-character REs:

2.1 A one-character RE is a RE that matches whatever the one-character RE matches.

2.2 A one-character RE followed by an asterisk (*) is a RE that matches 0 (zero) or more occurrences of the one-character RE. If there is any choice, the longest leftmost string that permits a match is chosen.

2.3 A one-character RE followed by \{m\}, \{m,\}, or \{m,n\} is a RE that matches a range of occurrences of the one-character RE. The values of m and n must be non-negative integers less than 256; \{m\} matches exactly m occurrences; \{m,\} matches at least m occurrences; \{m,n\} matches any number of occurrences between m and n inclusive. Whenever a choice exists, the RE matches as many occurrences as possible.

2.4 The concatenation of REs is a RE that matches the concatenation of the strings matched by each component of the RE.

2.5 A RE enclosed between the character sequences \( and \) is a RE that matches whatever the unadorned RE matches.

2.6 The expression \n matches the same string of characters as was matched by an expression enclosed between \( and \) earlier in the same RE. Here n is a digit; the sub-expression specified is that beginning with the n-th occurrence of \( counting from the left. For example, the expression ^\(.*\)\1$ matches a name consisting of two repeated appearances of the same string.
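Rules 2.2, 2.3, and 2.6 can be sketched in Python, with one caveat: Python's re module writes intervals and groups without backslashes, so the basic-RE forms \{m,n\} and \( \) appear below as {m,n} and ( ). The comment table lists the assumed correspondence; the sample strings are invented.

```python
import re

# Basic RE form      Python re form (assumed equivalent for these cases)
# o*                 o*
# o\{2\}             o{2}
# o\{2,\}            o{2,}
# o\{1,2\}           o{1,2}
# ^\(.*\)\1$         ^(.*)\1$

# Rule 2.2: * takes as many occurrences as it can.
greedy = re.search(r"fo*", "foooo").group(0)

# Rule 2.3: interval expressions bound the number of occurrences.
exactly_two = re.fullmatch(r"o{2}", "oo") is not None
at_least_two = re.fullmatch(r"o{2,}", "ooooo") is not None
one_to_two = re.fullmatch(r"o{1,2}", "ooo") is not None

# Rule 2.6: \1 must repeat whatever the first group matched.
doubled = re.compile(r"^(.*)\1$")
is_doubled = doubled.search("bonbon") is not None
not_doubled = doubled.search("bonbons") is not None

print(greedy, exactly_two, at_least_two, one_to_two, is_doubled, not_doubled)
```

Here "bonbon" is two appearances of "bon", so the back-reference succeeds; "bonbons" cannot be split into two identical halves, so it fails.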

An RE may be constrained to match words.

3.1 \< constrains a RE to match the beginning of a string or to follow a character that is not a digit, underscore, or letter. The first character matching the RE must be a digit, underscore, or letter.

3.2 \> constrains a RE to match the end of a string or to precede a character that is not a digit, underscore, or letter.
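Python's re module has no \< or \> operators, so the sketch below approximates rules 3.1 and 3.2 with lookaround assertions built from the wording of those rules. The constant names and sample names are hypothetical, and the reflector's own behavior in edge cases may differ.

```python
import re

WORD = "[0-9A-Za-z_]"  # a digit, underscore, or letter

# Stand-in for \<: not preceded by a word character, and the next
# character must be a word character.
WORD_START = rf"(?<!{WORD})(?={WORD})"

# Stand-in for \>: at the end of the string or before a non-word character.
WORD_END = rf"(?!{WORD})"

# Approximates the basic RE \<smith\>
pattern = re.compile(WORD_START + "smith" + WORD_END, re.IGNORECASE)

names = ["john smith", "john smithson", "blacksmith", "smith-jones"]
whole_word = [name for name in names if pattern.search(name)]
print(whole_word)
```

"smithson" fails the end constraint and "blacksmith" fails the start constraint, so only "john smith" and "smith-jones" are selected.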

An entire RE may be constrained to match only an initial segment or final segment of a name (or both).

4.1 A circumflex (^) at the beginning of an entire RE constrains that RE to match an initial segment of a name.

4.2 A dollar sign ($) at the end of an entire RE constrains that RE to match a final segment of a name.

4.3 The construction ^entire RE$ constrains the entire RE to match the entire name.
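The effect of the anchors in rules 4.1 through 4.3 can be seen by running the same text with and without them. The sketch uses Python's re module as a stand-in for the reflector, and the names and the search helper are invented for illustration.

```python
import re

# Hypothetical names for illustration only.
names = ["ann lee", "joann lee", "ann leeds", "ann"]

def search(expression):
    """Return the names in which the expression finds a match."""
    return [n for n in names if re.search(expression, n, re.IGNORECASE)]

unanchored = search(r"ann")       # may match anywhere in the name
initial = search(r"^ann")         # rule 4.1: initial segment
final = search(r"lee$")           # rule 4.2: final segment
entire = search(r"^ann lee$")     # rule 4.3: the entire name

print(unanchored, initial, final, entire, sep="\n")
```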

Characters with Special Meaning:

Characters that have special meaning except when they appear within square brackets ([]) or are preceded by \ are: ., *, [, \. Other special characters, such as $, have special meaning in more restricted contexts.

The character ^ at the beginning of an expression permits a successful match only at the beginning of the name, and the character $ at the end of an expression requires a match to the end of the name.

Two characters have special meaning only when used within square brackets. The character - denotes a range, [c-c], unless it is just after the open bracket or before the closing bracket, [-c] or [c-], in which case it has no special meaning. When used within brackets, the character ^ means "complement of" if it immediately follows the open bracket (example: [^c]); elsewhere between brackets (example: [c^]) it stands for the ordinary character ^.

The special meaning of the \ operator can be escaped only by preceding it with another \, for example \\.
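A brief Python sketch of backslash escaping follows. The escapes shown (\. and \\) behave the same way in Python's re module as in the rules above, but the engine is only a stand-in for the reflector and the names are invented.

```python
import re

names = ["J. Smith", "Jo Smith"]

# \. matches only a literal period; an unescaped . matches any character.
literal_dot = re.compile(r"^j\. smith$", re.IGNORECASE)
any_char = re.compile(r"^j. smith$", re.IGNORECASE)

with_literal = [n for n in names if literal_dot.search(n)]
with_any = [n for n in names if any_char.search(n)]

# \\ matches one literal backslash. The subject string "a\\b" is the
# three characters a, backslash, b.
has_backslash = re.search(r"a\\b", "a\\b") is not None

print(with_literal, with_any, has_backslash)
```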

Simple Examples

Here are some simple examples that attempt to explain some of the formal rules.

Match two characters at the beginning of the name, followed by any characters up to an ending "foobar".

^...*foobar$

Match a name that ends in "foobar" and has any beginning, including none.

foobar$

Match a name that begins with "John" and ends with "Smith".

^john.*smith$

Match a name that begins with 'j', contains a 't', and ends with a space followed by "jones".

^j.*t.* jones$

Match all names beginning with "Stanislaus".

^stanislaus
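The five expressions above use only ., *, ^, and $, which Python's re module interprets the same way, so they can be tried against a list of names as a rough check. The names and the search_names helper are hypothetical, and re.IGNORECASE stands in for the reflector's case-insensitive matching.

```python
import re

# Hypothetical names for illustration only.
names = [
    "xyfoobar", "xfoobar", "foobar",
    "John Smith", "John Q. Smith", "Johnson",
    "Janet Jones", "Jan Jones", "Stanislaus Kowalski",
]

def search_names(expression):
    """Return the names in which the expression finds a match."""
    pattern = re.compile(expression, re.IGNORECASE)
    return [name for name in names if pattern.search(name)]

expressions = [
    r"^...*foobar$",
    r"foobar$",
    r"^john.*smith$",
    r"^j.*t.* jones$",
    r"^stanislaus",
]

results = {expr: search_names(expr) for expr in expressions}
for expr, found in results.items():
    print(f"{expr:18} {found}")
```

Note that "xfoobar" fails the first expression because only one character precedes "foobar", and "Jan Jones" fails the fourth because it contains no 't'.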

Timing Considerations

Complicated regular expressions take longer to process and are more likely to encounter a search time limit.

For faster searches, anchor the start of the expression with a circumflex ('^') or its end with the dollar sign ('$').

Avoid the wildcard-matching asterisk ('*') where possible.

Maintained by: IAMO Team

© 2010 - 2013 Purdue University