Regular expressions, usually shortened to regex, are search patterns for matching text, and they are a powerful tool to have in your toolbelt.
any time you work with regular expressions, test each expression you're attempting with this tool
# re module in python deals with regular expressions
import re
pattern = r'i l\wve to l\wve'
word_one = 'i love to live'
word_two = 'i live to love'
mismatch = 'i l-ve to love'
re_pattern = re.compile(pattern)
print(re_pattern.match(word_one))
print(re_pattern.match(word_two))
print(re_pattern.match(mismatch))
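The examples in these notes mostly call `match`, which only tries the pattern at the very start of the string. A minimal sketch of how that differs from `search` and `fullmatch` (the sample strings and variable names are made up for illustration):

```python
import re

p = re.compile(r'l\wve')

# match() only succeeds if the pattern matches starting at position 0
at_start = p.match('love to live')
not_at_start = p.match('i love to live')

# search() scans the string and returns the first match anywhere in it
found = p.search('i love to live')

# fullmatch() requires the pattern to cover the entire string
whole = p.fullmatch('love')
partial = p.fullmatch('lovely')

print(at_start, not_at_start, found, whole, partial)
```

A `None` result from `match` therefore does not mean the pattern is absent from the string, only that it does not occur at the beginning.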
escaping '.' with '\' makes it match only a literal '.' instead of any character
# raw string so that the backslashes reach the regex engine unchanged
pattern = r'\d\d\d\.\d\d\d\.\d\d\d\d'
phone_one = '718.777.3143'
mismatch = '83s.382.sa32'
re_pattern = re.compile(pattern)
print(re_pattern.match(phone_one))
print(re_pattern.match(mismatch))
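To see what the escaping buys you, here is a small sketch that compares the same phone pattern with and without the backslash before the dot. The dashed phone number is an invented sample, and `re.escape` is shown only as one possible way to escape literal text.

```python
import re

unescaped = re.compile(r'\d\d\d.\d\d\d.\d\d\d\d')    # '.' matches any character
escaped = re.compile(r'\d\d\d\.\d\d\d\.\d\d\d\d')    # '\.' matches only a literal '.'

dotted = '718.777.3143'
dashed = '718-777-3143'   # hypothetical input with dashes instead of dots

print(unescaped.match(dotted) is not None)
print(unescaped.match(dashed) is not None)   # the unescaped dot accepts '-'
print(escaped.match(dotted) is not None)
print(escaped.match(dashed) is not None)

# re.escape() escapes special characters in literal text for you
literal = re.escape('3.14')
print(literal)
```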
"""\w\w\w\w\s\d\d\d\d"""
pattern = r"\w\w\w\w\s\d\d\d\d"
word_one = 'abcd 0321'
# \w matches digits (and the underscore) as well as letters
word_two = 'abc3 0321'
mismatch = 'a-dvd afd-3'
re_pattern = re.compile(pattern)
print(re_pattern.match(word_one))
print(re_pattern.match(word_two))
print(re_pattern.match(mismatch))
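As a quick check of what `\w` accepts, the sketch below tests a handful of single characters against it. The sample characters are arbitrary choices for illustration.

```python
import re

word_char = re.compile(r'\w')

samples = ['a', 'Z', '7', '_', '-', ' ']

# fullmatch() on a one-character string tells us whether \w accepts it
results = {ch: bool(word_char.fullmatch(ch)) for ch in samples}

for ch, ok in results.items():
    print(repr(ch), ok)
```

Letters, digits and the underscore are accepted, while the hyphen and the space are not, which is why 'a-dvd afd-3' fails above.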
# . matches any single character except a newline
pattern = "my favorite character is ."
word_one = 'my favorite character is x'
word_two = 'my favorite character is ?'
word_three = 'my favorite character is 3'
word_four = 'my favorite character is '
re_pattern = re.compile(pattern)
print(re_pattern.match(word_one))
print(re_pattern.match(word_two))
print(re_pattern.match(word_three))
print(re_pattern.match(word_four))
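The last case above fails because '.' has to consume exactly one character. A short sketch of the edge cases follows; the `re.DOTALL` flag is an addition not used elsewhere in these notes, included only to show how the newline exception can be lifted.

```python
import re

p = re.compile(r'my favorite character is .')
prefix = 'my favorite character is '

any_char = p.match(prefix + '?')    # '.' accepts punctuation too
nothing = p.match(prefix)           # no character left for '.' to consume
newline = p.match(prefix + '\n')    # by default '.' does not match a newline

# with re.DOTALL, '.' matches a newline as well
dotall = re.compile(r'my favorite character is .', re.DOTALL).match(prefix + '\n')

print(any_char, nothing, newline, dotall)
```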
character sets are sets of characters enclosed in square brackets [] that match any single character in the set.
note that inside a set most special characters lose their special meaning and are matched literally
"""[abcdefghijklmnopqrstuvwxyz0123456789'.,/!]"""
the same set can be written more compactly with ranges:
"""[a-z0-9'.,/!]"""
is equivalent to matching any lowercase letter from a to z, any digit from 0 to 9, or any of the characters '.,/!
pattern = "i love lock[es]"
word_one = 'i love locke'
word_two = 'i love locks'
word_three = 'i love locka'
re_pattern = re.compile(pattern)
print(re_pattern.match(word_one))
print(re_pattern.match(word_two))
print(re_pattern.match(word_three))
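The two points about sets (ranges as a shortcut, and special characters being taken literally) can be checked with a sketch like the one below. The list of sample characters is arbitrary, and the long form here spells out every digit from 0 to 9.

```python
import re

long_form = re.compile(r"[abcdefghijklmnopqrstuvwxyz0123456789'.,/!]")
short_form = re.compile(r"[a-z0-9'.,/!]")

samples = ['a', 'q', 'z', '0', '5', '9', "'", '.', ',', '/', '!', 'A', '-', ' ']

# both spellings should accept and reject exactly the same characters
agree = all(
    bool(long_form.fullmatch(c)) == bool(short_form.fullmatch(c))
    for c in samples
)
print(agree)

# inside a set, '.' is a literal dot rather than "any character"
dot_in_set = re.compile(r'[.]')
print(dot_in_set.fullmatch('.'))
print(dot_in_set.fullmatch('x'))
```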
{n,m} matches from n to m repetitions of the previous pattern/group (write it without a space after the comma, since Python treats {n, m} as literal text)
"""\w{4}\s\d{4}"""
is equivalent to
"""\w\w\w\w\s\d\d\d\d"""
+ is equivalent to {1,}
| separates alternatives inside a group, so
"(ab|ef)an"
matches "aban" or "efan"
pattern = r'\d{3}\.\d{3}\.\d{4}'
phone = '718.777.7777'
phone_two = '718.777.7d77'
p = re.compile(pattern)
print(p.match(phone))
print(p.match(phone_two))
pattern = "(iv|eth)an"
word = "ivan"
word_two = "ethan"
p = re.compile(pattern)
print(p.match(word))
print(p.match(word_two))
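The equivalences claimed for the quantifiers can be verified directly. This is a minimal sketch with made-up sample strings; the behaviour of the spaced form reflects how CPython's `re` module parses braces, as far as I know.

```python
import re

# {4} versus writing the token out four times
short = re.compile(r'\w{4}\s\d{4}')
spelled_out = re.compile(r'\w\w\w\w\s\d\d\d\d')
samples = ['abcd 0321', 'abc 0321', 'abcd 032']
same = [bool(short.fullmatch(s)) == bool(spelled_out.fullmatch(s)) for s in samples]
print(same)

# + versus {1,}
plus = re.compile(r'\d+')
at_least_one = re.compile(r'\d{1,}')
print(plus.fullmatch('2024'), at_least_one.fullmatch('2024'))
print(plus.fullmatch(''), at_least_one.fullmatch(''))

# a space inside the braces stops them from acting as a quantifier
spaced = re.compile(r'\d{1, }')
print(spaced.fullmatch('2024'))

# alternation inside a group
alt = re.compile(r'(ab|ef)an')
print(alt.fullmatch('aban'), alt.fullmatch('efan'), alt.fullmatch('cdan'))
```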
groups are parts of a pattern that you can refer back to when you are interested in a certain part of the string rather than the entire match
groups are "captured" with parentheses
"""(\w{4}) pin:(\d{4})"""
this captures the first four characters of a matched pattern of type
"""\w\w\w\w pin:\d\d\d\d"""
as well as the final four digits
# capture id of students
pattern = r'\w+?\s\w+?: (\d{4})'
s_one = 'salah ahmed: 1823'
s_two = 'jon stewart: 8421'
s_three = 'john mulaney: 3824'
p = re.compile(pattern)
print(p.match(s_one).groups())
print(p.match(s_two).groups())
print(p.match(s_three).groups())
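Here is a sketch of reading captured groups back out, using the pin pattern described above. The input strings are invented, and the `None` check is a suggested habit, since calling `.groups()` on a failed match raises an AttributeError.

```python
import re

p = re.compile(r'(\w{4}) pin:(\d{4})')

m = p.match('jane pin:4821')   # hypothetical input
if m is not None:
    print(m.group(0))   # the entire match
    print(m.group(1))   # first captured group: the four word characters
    print(m.group(2))   # second captured group: the four digits
    print(m.groups())   # all captured groups as a tuple

# when the pattern does not match, match() returns None
missing = p.match('jane pin:48')
print(missing)
```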
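# (?=!) is a positive lookahead: 'salah' only matches when a '!' follows,
# but the '!' itself is not part of the match
pattern = '(salah)(?=!)'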
w_one = 'salah?'
w_two = 'salah!'
p = re.compile(pattern)
print(p.match(w_one))
print(p.match(w_two))
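The `(?=...)` construct in the pattern above is a lookahead. As a rough sketch of what that means in practice, the code below compares it with a pattern that consumes the '!' instead; the variable names are illustrative.

```python
import re

lookahead = re.compile(r'(salah)(?=!)')
consuming = re.compile(r'(salah)!')

m_look = lookahead.match('salah!')
m_cons = consuming.match('salah!')

# the lookahead checks for '!' but leaves it out of the match
print(m_look.group(0), m_look.end())

# the plain pattern includes '!' in the match
print(m_cons.group(0), m_cons.end())

# without the '!' the lookahead fails and there is no match at all
no_match = lookahead.match('salah?')
print(no_match)
```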
\b matches at a word boundary, and \B matches anywhere that is not a word boundary
"""\bport\b"""
matches "port" but not "opportunity", and
"""\Bport\B"""
matches "opportunity" but not "port"
pattern = r"\Bport\B"
p = re.compile(pattern)
word = "port"
match = "opportunity"
print(p.search(word))
print(p.search(match))
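A sketch that puts `\b` and `\B` side by side, and also shows why raw strings matter here: in an ordinary Python string, `'\b'` is the backspace character, not a word boundary. The sample texts are made up.

```python
import re

whole_word = re.compile(r'\bport\b')    # 'port' as a standalone word
inside_word = re.compile(r'\Bport\B')   # 'port' strictly inside a longer word

texts = ['port', 'opportunity', 'the port is open']
results = {
    t: (bool(whole_word.search(t)), bool(inside_word.search(t)))
    for t in texts
}
for t, (whole, inside) in results.items():
    print(t, whole, inside)

# without the r prefix, Python turns '\b' into a backspace character,
# so the pattern looks for backspace + 'port' + backspace
not_raw = '\bport\b'
print(len(not_raw), len(r'\bport\b'))
print(re.search(not_raw, 'port'))
```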