Don't Fear the RegEx

Or how I learned to stop worrying and love the
[Rr]eg(ular)? ?[Ee]xp?(ression)?

Chris Kankiewicz
GitHub: PHLAK
Twitter: @PHLAK

Definition

A regular expression, or "regex" for short,
is a set of literal and/or meta characters
that compose a search pattern
for matching strings.

Syntax

# ... # ... @ ... @
... ... ...

Patterns

Literal Text

Match one or more literal characters

Pattern    Will Match              Won't Match
a          a, face                 A, b, 3, !, foo
bart       bart, Slartibartfast    Bart, lisa, 123
42         42, 134237              1337, 24, asdf
K-2SO      K-2SO                   C-3PO, R2-D2

Dot

Match any single character (except newline)

Pattern    Will Match                Won't Match
.          a, b, C, D, 1, 2, @, &    \r, \n

Set

Match any character in a group or range

Pattern     Will Match          Won't Match
[Ab3]       A, b, 3             a, B, 4
[0-9]       0, 1, 2, ... 9      a, b, C, D
[a-z]       a, b, c, ... z      A, B, C, 1, 2, 3
[A-Z]       A, B, C, ... Z      a, b, c, 1, 2, 3
gr[ae]y     gray, grey          groy

Negated Set

Match any character that is not in a group or range

Pattern      Will Match          Won't Match
[^Abc]       a, B, C, d, E       A, b, c
[^0-9]       a, b, c             0, 1, 2, ... 9
[^Ff]ool     cool, pool, Tool    Fool, fool

Alternation

Matches the first complete pattern from left to right

Pattern               Will Match              Won't Match
arthur|ford|zaphod    arthur, ford, zaphod    marvin
cat|dog food          cat, dog food           cat food
(cat|dog) food        cat food, dog food      hamster food
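
The difference between the last two rows can be sketched in PHP, the language used later in this deck. The anchors ^ and $ (covered on the next slide) are an assumption added here so that the whole subject has to match; the variable names are illustrative.

```php
<?php
// Ungrouped: the alternatives are "cat" and "dog food".
// The (?: ) wrapper only scopes the anchors; it is not part of the slide's pattern.
$ungrouped = '/^(?:cat|dog food)$/';

// Grouped: the alternatives are "cat" and "dog", each followed by " food".
$grouped = '/^(cat|dog) food$/';

$ungroupedCat     = preg_match($ungrouped, 'cat');          // 1
$ungroupedDogFood = preg_match($ungrouped, 'dog food');     // 1
$ungroupedCatFood = preg_match($ungrouped, 'cat food');     // 0

$groupedCatFood     = preg_match($grouped, 'cat food');     // 1
$groupedDogFood     = preg_match($grouped, 'dog food');     // 1
$groupedHamsterFood = preg_match($grouped, 'hamster food'); // 0
```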

Anchors

Match the start or end of a string

Pattern    Will Match     Won't Match
^art       art, arthur    dart, particle
art$       art, dart      arthur, particle
^art$      art            arthur, dart, particle
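
A small PHP sketch of the anchor rows above, using preg_match (introduced later in this deck). The variable names are illustrative only.

```php
<?php
// ^ pins the match to the start of the subject.
$startsWithArt = preg_match('/^art/', 'arthur');    // 1
$dartAtStart   = preg_match('/^art/', 'dart');      // 0

// $ pins the match to the end of the subject.
$endsWithArt   = preg_match('/art$/', 'dart');      // 1
$arthurAtEnd   = preg_match('/art$/', 'arthur');    // 0

// Both anchors together: the subject must be exactly "art".
$exactlyArt    = preg_match('/^art$/', 'art');      // 1
$particle      = preg_match('/^art$/', 'particle'); // 0
```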

Word Boundary

Match a position between a word character and non-word character

Pattern     Will Match                Won't Match
\bford      ford prefect, fordoing    affordable, oxford
ford\b      ford prefect, oxford      affordable, fordoing
\bford\b    ford, ford prefect        affordable, fordoing, oxford
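
The rows above can be checked with a short PHP sketch. It uses preg_grep, a standard PHP function that this deck does not otherwise cover, to filter a list of subjects by pattern; the $subjects list and variable names are illustrative.

```php
<?php
$subjects = ['ford prefect', 'fordoing', 'affordable', 'oxford'];

// preg_grep keeps the original array keys, so array_values re-indexes the result.
$startOfWord = array_values(preg_grep('/\bford/', $subjects));
$endOfWord   = array_values(preg_grep('/ford\b/', $subjects));
$wholeWord   = array_values(preg_grep('/\bford\b/', $subjects));

// $startOfWord = [ "ford prefect", "fordoing" ]
// $endOfWord   = [ "ford prefect", "oxford" ]
// $wholeWord   = [ "ford prefect" ]
```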

Quantifiers

Optional

Match zero or one of the preceding pattern

Pattern     Will Match
vogons?     vogon, vogons
colou?r     color, colour

Repetition

Match zero or more (*) or one or more (+) of the preceding pattern

Pattern         Will Match           Won't Match
[A-Z][a-z]+     Arthur, Ford         A, F
[A-Z][a-z]*     Arthur, Ford, A, F
0x[0-9A-F]+     0x2A, 0x1092         0x, 0x1Q7X
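
As a rough PHP sketch of the difference between + and *: the anchors ^ and $ are an assumption added so that the whole subject must match, and the variable names are illustrative.

```php
<?php
// + needs at least one lowercase letter after the capital.
$plusOnCapitalOnly = preg_match('/^[A-Z][a-z]+$/', 'A');      // 0
$plusOnName        = preg_match('/^[A-Z][a-z]+$/', 'Arthur'); // 1

// * is satisfied by zero lowercase letters.
$starOnCapitalOnly = preg_match('/^[A-Z][a-z]*$/', 'A');      // 1

// A hex literal needs at least one valid digit after "0x".
$hexValid   = preg_match('/^0x[0-9A-F]+$/', '0x2A');   // 1
$hexEmpty   = preg_match('/^0x[0-9A-F]+$/', '0x');     // 0
$hexInvalid = preg_match('/^0x[0-9A-F]+$/', '0x1Q7X'); // 0
```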

Specific Quantity

Match specific quantities of the preceding pattern

Pattern        Will Match           Won't Match
H{2}GT{2}G     HHGTTG               HGTG, HHHGTTTG
No{1,3}!       No!, Noo!, Nooo!     N!, Noooo!

Greediness

By default, quantifiers are greedy and will match as much text as they can.

Use ? after a quantifier to make it ungreedy (a.k.a. lazy).

Subject                   Pattern    Result
<strong>azPHP</strong>    <.+>       <strong>azPHP</strong>
                          <.+?>      <strong>
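
The same comparison as a minimal PHP sketch, using preg_match (introduced later in this deck). The variable names are illustrative.

```php
<?php
$subject = '<strong>azPHP</strong>';

// Greedy: .+ runs to the end of the subject, then backs off to the last ">".
preg_match('/<.+>/', $subject, $greedy);

// Lazy: .+? stops at the first ">" it can.
preg_match('/<.+?>/', $subject, $lazy);

// $greedy[0] = "<strong>azPHP</strong>"
// $lazy[0]   = "<strong>"
```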

Grouping

Groups

Groups allow you to apply quantifiers to a sub-pattern or to constrain alternation

Pattern            Will Match
([Hh]o ?){3}!      Ho ho ho!
(base|foot)ball    baseball, football

Capturing Groups

Captures the results of a sub-pattern as a numbered group

Pattern                    Subject        Group    Content
([A-Za-z]+) ([A-Za-z]+)    Arthur Dent    0        Arthur Dent
                                          1        Arthur
                                          2        Dent

Non-capturing Groups

Group a sub-pattern without capturing the result

Pattern                       Subject         Group    Content
(?:[A-Za-z]+) ([A-Za-z]+)     Ford Prefect    0        Ford Prefect
                                              1        Prefect
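
A minimal PHP sketch comparing capturing and non-capturing groups, using preg_match (introduced later in this deck). The variable names are illustrative.

```php
<?php
// Two capturing groups: each one gets its own numbered entry.
preg_match('/([A-Za-z]+) ([A-Za-z]+)/', 'Arthur Dent', $captured);

// The first group is non-capturing, so the second group becomes group 1.
preg_match('/(?:[A-Za-z]+) ([A-Za-z]+)/', 'Ford Prefect', $nonCaptured);

// $captured    = [ "Arthur Dent", "Arthur", "Dent" ]
// $nonCaptured = [ "Ford Prefect", "Prefect" ]
```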

Named Groups

Captures the results of a sub-pattern as a named group

Pattern                                   Subject              Group    Content
(?<first>[A-Za-z]+) (?<last>[A-Za-z]+)    Zaphod Beeblebrox    0        Zaphod Beeblebrox
                                                               first    Zaphod
                                                               last     Beeblebrox
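
In PHP specifically, a named group shows up in the matches array under its name and also under its number. A short sketch, with illustrative variable names:

```php
<?php
preg_match(
    '/(?<first>[A-Za-z]+) (?<last>[A-Za-z]+)/',
    'Zaphod Beeblebrox',
    $matches
);

// Named access
$first = $matches['first']; // "Zaphod"
$last  = $matches['last'];  // "Beeblebrox"

// The same groups are still available by number.
$firstByNumber = $matches[1]; // "Zaphod"
$lastByNumber  = $matches[2]; // "Beeblebrox"
```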

Back References

Use the result of a previous group as a pattern

Pattern                 Will Match                Won't Match
<([a-z]+)>.+?<\/\1>     <strong>azPHP</strong>    <strong>azPHP</stronk>
                        <small>NodeAZ</small>
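
A minimal PHP sketch of the back reference above: \1 must repeat whatever group 1 captured, so the closing tag has to match the opening tag. The variable names are illustrative.

```php
<?php
// Single quotes keep the backslashes in \/ and \1 intact for the regex engine.
$pattern = '/<([a-z]+)>.+?<\/\1>/';

$strong     = preg_match($pattern, '<strong>azPHP</strong>'); // 1
$small      = preg_match($pattern, '<small>NodeAZ</small>');  // 1
$mismatched = preg_match($pattern, '<strong>azPHP</stronk>'); // 0
```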

Lookarounds

Match a sub-pattern before/after the main expression without including it in the result.

Pattern                       Will Match        Won't Match       Result
[Aa]rthur(?= [Dd]ent)         Arthur Dent       Arthur Prefect    Arthur
[Aa]rthur(?! [Dd]ent)         Arthur Prefect    Arthur Dent       Arthur
(?<=[Ff]ord )[Pp]refect       Ford Prefect      Arthur Prefect    Prefect
(?<![Ff]ord )[Pp]refect       Arthur Prefect    Ford Prefect      Prefect
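
A short PHP sketch of the four lookarounds above. Note that the text inside the lookaround never appears in the result. The variable names are illustrative.

```php
<?php
// Positive lookahead: "Arthur" only when followed by " Dent".
preg_match('/[Aa]rthur(?= [Dd]ent)/', 'Arthur Dent', $followedByDent);

// Negative lookahead: "Arthur" only when NOT followed by " Dent".
$negativeLookahead = preg_match('/[Aa]rthur(?! [Dd]ent)/', 'Arthur Dent');

// Positive lookbehind: "Prefect" only when preceded by "Ford ".
preg_match('/(?<=[Ff]ord )[Pp]refect/', 'Ford Prefect', $precededByFord);

// Negative lookbehind: "Prefect" only when NOT preceded by "Ford ".
$negativeLookbehind = preg_match('/(?<![Ff]ord )[Pp]refect/', 'Ford Prefect');

// $followedByDent     = [ "Arthur" ]
// $negativeLookahead  = 0
// $precededByFord     = [ "Prefect" ]
// $negativeLookbehind = 0
```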

Regular ExPHPressions


                    preg_match( $pattern , $subject [, &$matches ] ): int
                

Searches $subject for a match to the regular expression given in $pattern.

If $matches is provided, then it is filled with the results of the search.

Returns 1 if $pattern matches the given $subject, 0 if it does not.


                    $subject = "Sirius Cybernetics Corporation";

                    preg_match('/[Cc]([a-z]+)/', $subject, $matches); // 1

                    // $matches = [ "Cybernetics", "ybernetics" ]
                

                    preg_match_all( $pattern , $subject [, &$matches ] ): int
                

Searches $subject for all matches to the regular expression given in $pattern.

If $matches is provided, then it is filled with the results of the search.

Returns the number of full pattern matches (could be 0).


                    $subject = 'How much wood would a woodchuck chuck if a woodchuck could chuck wood?';

                    preg_match_all('/wood([a-z]+)/', $subject, $matches); // 2

                    // $matches = [
                    //    [ "woodchuck", "woodchuck" ],
                    //    [ "chuck", "chuck" ],
                    // ]
                

                    preg_replace( $pattern , $replacement, $subject ): mixed
                

Searches $subject for matches to $pattern and replaces them with $replacement.

If matches are found, the new $subject will be returned;
otherwise $subject will be returned unchanged.


                    preg_replace('/([Pp])ink/', '\1urple', 'John Pinkerton'); // John Purpleerton
                

                    preg_replace('/[^\d]/', '', '123-456-7890'); // 1234567890
                

                    preg_replace('#(\d{2})/(\d{2})/(\d{4})#', '\3-\1-\2', '05/20/1986'); // 1986-05-20
                

                    $pattern = ['/maroon/', '/gr[ae]y/', '/yellow|brown/'];
                    $replacement = ['red', 'silver', 'ugly'];

                    preg_replace($pattern, $replacement, 'The car was gray'); // The car was silver
                

                    preg_split( $pattern, $subject ): array
                

Split the given $subject by a string matched by $pattern.


                    preg_split('/[ .-]/', '1 123.456-7890'); // [ "1", "123", "456", "7890" ]
                

                    preg_quote( $str [, $delimiter ]): string
                

Escapes regular expression characters in a string.

If $delimiter is specified, it will also be escaped.

Returns the escaped string.


                    preg_quote('[Foo]+Bar'); // \[Foo\]\+Bar
                

                    preg_quote('/path/to/file.txt', '/'); // \/path\/to\/file\.txt
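
One place preg_quote is useful is when a literal string is dropped into a pattern. A hedged sketch follows; the $needle value and variable names are made up for illustration.

```php
<?php
$needle = 'file.txt';

// Without escaping, the dot is a metacharacter and matches any character.
$unescaped = '/^' . $needle . '$/';

// With preg_quote, the dot is escaped and only matches a literal ".".
$escaped = '/^' . preg_quote($needle, '/') . '$/';

$looseMatch   = preg_match($unescaped, 'file-txt'); // 1
$strictMiss   = preg_match($escaped, 'file-txt');   // 0
$strictMatch  = preg_match($escaped, 'file.txt');   // 1
```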
                

Exercises

Phone Numbers

123-456-7890 1 (123) 456-7890 1234567890 +1 123 456 7890

456 7890 +42 12 34567890 123-456-7890 Ext. 1337

Time

5:30 AM 12:00PM 10:10 am 2:45pm 4 PM

13:00 AM 10:99 pm 13:37 1234 532 AM

URLs

www.google.com http://example.com/
https://regexr.com/5dubs https://www.azphp.org/#foobar https://api.example.net:1337/some/page?foo=bar&baz=qux ftp://user:password@fileserver.com

https://www.urlencoder.io/learn/

([a-z]+:\/\/)?([\w]+(:.+)?@)?[\w.~-]+(:\d+)?(\/[\w\/.~-]*)?(\?[\w=&.~-]*)?(#[\w.~-]*)?
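
One way to try the pattern above against the sample URLs in PHP. The / delimiters and the ^ and $ anchors are assumptions added for this sketch (without anchors, nearly any text would match, since most parts are optional); the 'not a url' subject and the variable names are also illustrative.

```php
<?php
$pattern = '/^([a-z]+:\/\/)?([\w]+(:.+)?@)?[\w.~-]+(:\d+)?(\/[\w\/.~-]*)?(\?[\w=&.~-]*)?(#[\w.~-]*)?$/';

$urls = [
    'www.google.com',
    'http://example.com/',
    'https://regexr.com/5dubs',
    'https://www.azphp.org/#foobar',
    'https://api.example.net:1337/some/page?foo=bar&baz=qux',
    'ftp://user:password@fileserver.com',
];

// Collect any sample URL the pattern fails to match.
$unmatched = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url) !== 1) {
        $unmatched[] = $url;
    }
}

// A subject containing spaces should be rejected by the anchored pattern.
$rejectsPlainText = preg_match($pattern, 'not a url') === 0;
```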

https://regexcrossword.com

References