Introduction
Python RegEx (Regular Expressions) is a powerful tool for pattern matching and text manipulation in the Python programming language. It allows you to search for, extract, and replace specific patterns in a string by using a set of rules and symbols. Python RegEx is widely used in web development, data analysis, and text processing, and learning it can greatly enhance your programming skills. In this article, we will explore the basics of Python RegEx and some of its most common use cases.
Python RegEx is important because it can help you process text data efficiently and accurately in your Python programs.
To use Python RegEx, you first need to import the re module, which provides functions and methods for working with regular expressions. Here’s an example of how to import the re module:
import re
Once you’ve imported the re module, you can use its functions and methods to work with regular expressions.
Here are some of the most commonly used functions and methods in the re module:
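re.search(pattern, string) – searches for the first occurrence of pattern in string and returns a match object if it finds one.
re.match(pattern, string) – matches pattern at the beginning of string and returns a match object if it finds one.
re.findall(pattern, string) – finds all non-overlapping occurrences of pattern in string and returns them as a list of strings (or a list of tuples when the pattern contains more than one capturing group).
re.sub(pattern, repl, string) – replaces all occurrences of pattern in string with repl and returns the new string.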
Here’s an example of how to use the re.search() function to search for a pattern in a string:
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
match = re.search(pattern, text)
if match:
    print("Found the pattern '{}' in the string: '{}'".format(pattern, text))
else:
    print("Did not find the pattern '{}' in the string: '{}'".format(pattern, text))
This code will output:
Found the pattern 'fox' in the string: 'The quick brown fox jumps over the lazy dog'
Here are some ways to use Python RegEx with examples:
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = "The"
match = re.match(pattern, text)
if match:
    print("Found the pattern '{}' at the beginning of the string: '{}'".format(pattern, text))
else:
    print("Did not find the pattern '{}' at the beginning of the string: '{}'".format(pattern, text))
Output:
Found the pattern 'The' at the beginning of the string: 'The quick brown fox jumps over the lazy dog'
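The difference between re.match() and re.search() is easiest to see when the pattern does not sit at the start of the string. The following is a minimal sketch using the same sample sentence; the variable names match_result and search_result are chosen here for illustration only.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# re.match() only tries the pattern at position 0, so "fox" is not found.
match_result = re.match("fox", text)

# re.search() scans through the string and finds "fox" further along.
search_result = re.search("fox", text)

print(match_result)           # None
print(search_result.group())  # fox
```

If a pattern may appear anywhere in the text, re.search() is usually the function to reach for; re.match() fits cases where the string must begin with the pattern.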
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = "o"
matches = re.findall(pattern, text)
if matches:
    print("Found the pattern '{}' {} times in the string: '{}'".format(pattern, len(matches), text))
else:
    print("Did not find the pattern '{}' in the string: '{}'".format(pattern, text))
Output:
Found the pattern 'o' 4 times in the string: 'The quick brown fox jumps over the lazy dog'
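One detail worth knowing about re.findall() is that its return value changes when the pattern contains capturing groups. The sketch below assumes a made-up "key=value" string purely for illustration.

```python
import re

text = "width=10, height=20"

# No capturing groups: each list item is the whole matched substring.
whole = re.findall(r"\w+=\d+", text)

# Two capturing groups: each list item is a tuple with one entry per group.
pairs = re.findall(r"(\w+)=(\d+)", text)

print(whole)  # ['width=10', 'height=20']
print(pairs)  # [('width', '10'), ('height', '20')]
```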
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
new_text = re.sub(pattern, "cat", text)
print("Old string: '{}'".format(text))
print("New string: '{}'".format(new_text))
Output:
Old string: 'The quick brown fox jumps over the lazy dog'
New string: 'The quick brown cat jumps over the lazy dog'
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = " "
words = re.split(pattern, text)
print("Words in the string: ", words)
Output:
Words in the string: ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
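Splitting on a single space gives the same result as str.split(" "), so the benefit of re.split() shows up mainly with irregular separators. The following sketch uses made-up input strings to illustrate this.

```python
import re

messy = "The  quick\tbrown   fox"

# Splitting on exactly one space leaves empty strings and keeps the tab.
naive = re.split(" ", messy)

# \s+ treats any run of whitespace (spaces, tabs, newlines) as one separator.
words = re.split(r"\s+", messy)

# A character class lets several delimiters act as separators at once.
fields = re.split(r"[,;]\s*", "a, b;c,  d")

print(naive)   # ['The', '', 'quick\tbrown', '', '', 'fox']
print(words)   # ['The', 'quick', 'brown', 'fox']
print(fields)  # ['a', 'b', 'c', 'd']
```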
import re
email = "example@example.com"
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"
if re.match(pattern, email):
    print("Valid email address:", email)
else:
    print("Invalid email address:", email)
Output:
Valid email address: example@example.com
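When validating a whole string, re.fullmatch() can be a slightly stricter alternative to re.match() with ^ and $ anchors, because $ also matches just before a trailing newline. The sketch below reuses the email pattern from the example without the anchors; EMAIL_PATTERN and is_valid_email are hypothetical names introduced here, and the pattern remains a simple approximation rather than a complete email validator.

```python
import re

# Same character classes as the example above, without the ^ and $ anchors.
EMAIL_PATTERN = re.compile(r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+")

def is_valid_email(candidate):
    """Return True only if the entire string matches the pattern."""
    return EMAIL_PATTERN.fullmatch(candidate) is not None

print(is_valid_email("example@example.com"))    # True
print(is_valid_email("not-an-email"))           # False
print(is_valid_email("example@example.com\n"))  # False
```

With re.match() and the anchored pattern, the last input would be accepted, since $ matches before the final newline.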
import re
text = "John Smith (john@example.com)"
pattern = r"(\w+)\s(\w+)\s\((\w+@\w+\.\w+)\)"
match = re.search(pattern, text)
if match:
    print("Name:", match.group(1), match.group(2))
    print("Email:", match.group(3))
else:
    print("No match found")
Output:
Name: John Smith
Email: john@example.com
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"
new_text = re.sub(pattern, "cat", text)
print("New string: ", new_text)
Output:
New string: The quick brown cat jumps over the lazy dog
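Beyond replacing one fixed word with another, re.sub() also accepts backreferences and a function as the replacement, and re.subn() reports how many replacements were made. The following is a small sketch with made-up input strings.

```python
import re

# Backreferences (\1, \2) reuse captured groups in the replacement text.
swapped = re.sub(r"(\w+) (\w+)", r"\2, \1", "John Smith")

# A function receives each match object and returns the replacement string.
doubled = re.sub(r"\d+", lambda m: str(int(m.group()) * 2), "3 apples and 10 pears")

# re.subn() returns a (new_string, number_of_replacements) tuple.
new_text, count = re.subn("o", "0", "The quick brown fox jumps over the lazy dog")

print(swapped)   # Smith, John
print(doubled)   # 6 apples and 20 pears
print(new_text)  # The quick br0wn f0x jumps 0ver the lazy d0g
print(count)     # 4
```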
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = "the"
matches = re.findall(pattern, text, re.IGNORECASE)
print("Number of matches: ", len(matches))
Output:
Number of matches: 2
import re
password = "MyPa55word"
pattern = r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)[A-Za-z\d@$!%*#?&]{8,}$"
if re.match(pattern, password):
    print("Valid password")
else:
    print("Invalid password")
Output:
Valid password
import re
text = "The quick brown fox jumps over the lazy dog"
pattern = r"(?<=quick\s)(\w+)"
match = re.search(pattern, text)
if match:
    print("Word following 'quick':", match.group(1))
else:
    print("No match found")
Output:
Word following 'quick': brown
In this example, the (?<=quick\s) portion of the pattern is a positive lookbehind assertion: it requires the match to be immediately preceded by “quick” and a whitespace character, without including them in the match. The (\w+) portion of the pattern then captures the one or more word characters that follow.
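Lookbehind has a counterpart, lookahead, and both come in negative forms. The sketch below illustrates a positive lookahead and a negative lookbehind; the second input string is made up for illustration.

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# Positive lookahead (?=...): match a word only if " fox" comes right after it.
before_fox = re.search(r"\w+(?=\sfox)", text).group()

# Negative lookbehind (?<!...): match numbers NOT preceded by "$" or a digit.
# Including \d stops the engine from matching the tail of "$25" as "5".
counts = re.findall(r"(?<![$\d])\d+", "cost $25 for 3 items")

print(before_fox)  # brown
print(counts)      # ['3']
```

Like the lookbehind above, these assertions check the surrounding text without consuming it, so it does not become part of the match.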
import re
text = "John Smith (john@example.com)"
pattern = r"(?P<first>\w+)\s(?P<last>\w+)\s\((?P<email>\w+@\w+\.\w+)\)"
match = re.search(pattern, text)
if match:
    print("Name:", match.group("first"), match.group("last"))
    print("Email:", match.group("email"))
else:
    print("No match found")
Output:
Name: John Smith
Email: john@example.com
In this example, the (?P<first>\w+), (?P<last>\w+), and (?P<email>\w+@\w+\.\w+) portions of the pattern are named capturing groups that capture the first name, last name, and email address, respectively. These named groups can be accessed using the group() method with the name of the group as the argument.
These are just a couple of the advanced techniques you can use with Python RegEx to make your regular expressions more powerful and expressive. With some creativity and practice, you can accomplish many tasks with regular expressions that would be difficult or cumbersome with other string manipulation methods.
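Named groups combine well with re.finditer(), which yields one match object per non-overlapping match. The following sketch extends the example to a string with two contacts; the second contact and the variable name contacts are made up for illustration.

```python
import re

text = "John Smith (john@example.com), Jane Doe (jane@example.org)"
pattern = re.compile(r"(?P<first>\w+)\s(?P<last>\w+)\s\((?P<email>\w+@\w+\.\w+)\)")

# finditer() yields a match object for each contact found in the text;
# groupdict() turns the named groups of each match into a dictionary.
contacts = [m.groupdict() for m in pattern.finditer(text)]

for contact in contacts:
    print(contact["first"], contact["last"], "->", contact["email"])
```

Compared with re.findall(), this keeps the full match object available, so positions and named groups can be read for every match.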
A Match object is returned by the match() and search() methods of the re module. It contains information about the search and the matched string.
Here are some attributes and methods of the Match object:
group() – Returns the part of the string that was matched by the regular expression.
import re
string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)
print(f"Original string: {string}")
print(f"Match: {match.group()}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Match: brown
start() – Returns the start index of the matched string in the original string.
import re
string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)
print(f"Original string: {string}")
print(f"Match start index: {match.start()}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Match start index: 10
end() – Returns the end index of the matched string in the original string (the index just past the last matched character).
import re
string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)
print(f"Original string: {string}")
print(f"Match end index: {match.end()}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Match end index: 15
span() – Returns a tuple containing the start and end index of the matched string in the original string.
import re
string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)
print(f"Original string: {string}")
print(f"Match start and end index: {match.span()}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Match start and end index: (10, 15)
groups() – Returns a tuple containing all the captured groups in the match.
import re
string = "John Smith: 555-555-5555"
match = re.search(r"(\w+) (\w+): (\d{3}-\d{3}-\d{4})", string)
print(f"Original string: {string}")
print(f"Match groups: {match.groups()}")
Output:
Original string: John Smith: 555-555-5555
Match groups: ('John', 'Smith', '555-555-5555')
groupdict() – Returns a dictionary containing all the named captured groups in the match.
import re
string = "John Smith: 555-555-5555"
match = re.search(r"(?P<first_name>\w+) (?P<last_name>\w+): (?P<phone>\d{3}-\d{3}-\d{4})", string)
print(f"Original string: {string}")
print(f"Match group dictionary: {match.groupdict()}")
Output:
Original string: John Smith: 555-555-5555
Match group dictionary: {'first_name': 'John', 'last_name': 'Smith', 'phone': '555-555-5555'}
string – Returns the string that was searched.
import re
string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)
print(f"Original string: {string}")
print(f"Searched string: {match.string}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Searched string: The quick brown fox jumps over the lazy dog
re – Returns the compiled pattern object that was used to create the Match object.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string)
print(f"Original string: {string}")
print(f"Regex object: {match.re}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Regex object: re.compile('brown')
pos – Returns the start position of the search.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string, 10)
print(f"Original string: {string}")
print(f"Start position of the search: {match.pos}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Start position of the search: 10
endpos – Returns the end position of the search.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string, 0, 20)
print(f"Original string: {string}")
print(f"End position of the search: {match.endpos}")
Output:
Original string: The quick brown fox jumps over the lazy dog
End position of the search: 20
lastindex – Returns the index of the last matched capturing group.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(\w+) (\w+) (\w+)")
match = regex.search(string)
print(f"Original string: {string}")
print(f"Index of the last matched capturing group: {match.lastindex}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Index of the last matched capturing group: 3
lastgroup – Returns the name of the last matched capturing group.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(?P<first>\w+) (?P<second>\w+) (?P<third>\w+)")
match = regex.search(string)
print(f"Original string: {string}")
print(f"Name of the last matched capturing group: {match.lastgroup}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Name of the last matched capturing group: third
group(0) – Returns the entire match; calling group() with no argument is equivalent.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string)
print(f"Original string: {string}")
print(f"Entire match: {match.group()}")
Output:
Original string: The quick brown fox jumps over the lazy dog
Entire match: brown
group(n) – Returns the nth capturing group.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(\w+) (\w+) (\w+)")
match = regex.search(string)
print(f"Original string: {string}")
print(f"First capturing group: {match.group(1)}")
print(f"Second capturing group: {match.group(2)}")
print(f"Third capturing group: {match.group(3)}")
Output:
Original string: The quick brown fox jumps over the lazy dog
First capturing group: The
Second capturing group: quick
Third capturing group: brown
group(name) – Returns the capturing group with the specified name.
import re
string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(?P<first>\w+) (?P<second>\w+) (?P<third>\w+)")
match = regex.search(string)
print(f"Original string: {string}")
print(f"First capturing group: {match.group('first')}")
print(f"Second capturing group: {match.group('second')}")
print(f"Third capturing group: {match.group('third')}")
Output:
Original string: The quick brown fox jumps over the lazy dog
First capturing group: The
Second capturing group: quick
Third capturing group: brown
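The Match object examples above call methods directly on the result of re.search(), which works because every pattern matches. When there is no match, re.search() returns None, and calling group() on it raises an AttributeError. The sketch below shows one way to guard against that; describe_match is a hypothetical helper written for this illustration, not part of the re module.

```python
import re

def describe_match(pattern, string):
    """Hypothetical helper: summarize the first match, or return None."""
    match = re.search(pattern, string)
    if match is None:
        # No match: there is no Match object to query.
        return None
    return {
        "text": match.group(),     # the matched substring
        "span": match.span(),      # (start, end) indices
        "groups": match.groups(),  # captured groups, empty tuple if none
    }

text = "The quick brown fox jumps over the lazy dog"
print(describe_match(r"brown", text))
# {'text': 'brown', 'span': (10, 15), 'groups': ()}
print(describe_match(r"cat", text))
# None
```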
What does the search() method in the RegEx module do?
a) It returns all matches in a string
b) It returns the first match in a string
c) It replaces matches in a string with a specified string
d) It splits a string into a list based on a specified separator
Answer: b) It returns the first match in a string
What does the findall() method in the RegEx module do?
a) It returns all matches in a string
b) It returns the first match in a string
c) It replaces matches in a string with a specified string
d) It splits a string into a list based on a specified separator
Answer: a) It returns all matches in a string
What does the sub() method in the RegEx module do?
a) It returns all matches in a string
b) It returns the first match in a string
c) It replaces matches in a string with a specified string
d) It splits a string into a list based on a specified separator
Answer: c) It replaces matches in a string with a specified string
a) * b) + c) . d) ?
Answer: c) .
a) $ b) ^ c) * d) +
Answer: a) $
a) \d b) \s c) \w d) \
Answer: a) \d
a) \d b) \s c) \w d) \
Answer: b) \s
a) \d b) \s c) \w d) \
Answer: c) \w
a) findall() b) search() c) split() d) sub()
Answer: b) search()
a) findall() b) search() c) split() d) sub()
Answer: c) split()
What does the compile() function in the RegEx module do?
a) It searches for a pattern in a string
b) It returns all matches in a string
c) It compiles a regular expression pattern into a pattern object
d) It splits a string into a list based on a specified separator
Answer: c) It compiles a regular expression pattern into a pattern object
What does the group() method of a match object in the RegEx module do?
a) It returns the entire matched string
b) It returns the position of the match in the original string
c) It returns a tuple containing all matched subgroups
d) It returns the number of matches in the string
Answer: a) It returns the entire matched string
What does the finditer() function in the RegEx module do?
a) It searches for a pattern in a string
b) It returns all matches in a string
c) It replaces matches in a string with a specified string
d) It returns an iterator yielding match objects for all non-overlapping matches
Answer: d) It returns an iterator yielding match objects for all non-overlapping matches
Answer: b) .
a) * b) + c) . d) ?
Answer: a) *
a) * b) + c) . d) ?
Answer: b) +
a) \D b) \S c) \W d) \
Answer: a) \D
a) \D b) \S c) \W d) \
Answer: b) \S
a) \D b) \S c) \W d) \
Answer: c) \W
a) findall() b) search() c) split() d) subn()
Answer: d) subn()
What does the search() function in the RegEx module do?
a) It searches for a pattern in a string
b) It returns all matches in a string
c) It compiles a regular expression pattern into a pattern object
d) It replaces matches in a string with a specified string
Answer: a) It searches for a pattern in a string
What does the split() function in the RegEx module do?
a) It searches for a pattern in a string
b) It returns all matches in a string
c) It splits a string into a list based on a specified separator
d) It compiles a regular expression pattern into a pattern object
Answer: c) It splits a string into a list based on a specified separator
What does the findall() function in the RegEx module do?
a) It searches for a pattern in a string
b) It returns all matches in a string
c) It compiles a regular expression pattern into a pattern object
d) It splits a string into a list based on a specified separator
Answer: b) It returns all matches in a string
Answer: a) ^
Answer: b) $
Answer: d) \s
Answer: a) \d
Answer: b) \w
Answer: a) \b
Answer: d) sub()
1-What function can be used to compile a regular expression pattern into a pattern object? a) findall() b) search() c) split() d) compile()
Answer: d) compile()
2-What does the match() function in the RegEx module do? a) It searches for a pattern in a string b) It returns all matches in a string c) It compiles a regular expression pattern into a pattern object d) It matches a pattern at the beginning of a string
Answer: d) It matches a pattern at the beginning of a string
3-What metacharacter is used to match zero or one occurrence of the preceding character? a) * b) + c) ? d) .
Answer: c) ?
4-What metacharacter is used to match one or more occurrences of the preceding character? a) * b) + c) ? d) .
Answer: b) +
5-What metacharacter is used to match zero or more occurrences of the preceding character? a) * b) + c) ? d) .
Answer: a) *
6-What set of characters can be used to match any character that is not a digit? a) \d b) \D c) \s d) \S
Answer: b) \D
7-What set of characters can be used to match any character that is not a whitespace character? a) \s b) \S c) \d d) \
Answer: b) \S
8-What metacharacter is used to match any character except a newline? a)
b) . c) * d) \
Answer: b) .
9-What function can be used to split a string into a list of substrings based on a regular expression pattern? a) search() b) match() c) split() d) findall()
Answer: c) split()
10-What method can be used to retrieve the start and end position of the match in a string? a) span() b) start() c) end() d) All of the above
Answer: d) All of the above