Python RegEx

Introduction

Python RegEx (regular expressions) is a powerful tool for pattern matching and text manipulation in the Python programming language. It lets you search for, extract, and replace specific patterns in a string using a set of rules and symbols. Regular expressions are widely used in web development, data analysis, and text processing, and learning them can greatly enhance your programming skills. In this article, we will explore the basics of Python RegEx and some of its most common use cases.

Importance of Python RegEx

Python RegEx is important for several reasons:

Powerful pattern matching: Regular expressions provide a powerful and flexible way to match patterns in text, which makes them an essential tool for text processing and data analysis.
Efficient text processing: Python's re module compiles each pattern before matching and caches recently used compiled patterns, which helps when the same pattern is applied to large amounts of data.
Flexible search and replace: Regular expressions allow you to search for and replace specific patterns in text, which can save time and effort in text editing and data cleaning tasks.
Standardization: Regular expressions are a standard feature in most programming languages, so knowing how to use them in Python can be valuable when working with other languages or collaborating on projects.
Built-in support: The re module ships with Python's standard library, so you can use regular expressions in your Python projects without installing anything.

Overall, Python RegEx is a powerful tool that can help you efficiently and accurately process text data in your Python programs.

How to use Python RegEx?

To use Python RegEx, you first need to import the re module, which provides functions and methods for working with regular expressions. Here’s an example of how to import the re module:


import re

Once you’ve imported the re module, you can use its functions and methods to work with regular expressions.

Here are some of the most commonly used functions and methods in the re module:

re.search(pattern, string):

– searches for the first occurrence of pattern in string and returns a match object if it finds one.

re.match(pattern, string):

– matches pattern at the beginning of string and returns a match object if it finds one.

re.findall(pattern, string):

– finds all non-overlapping occurrences of pattern in string and returns them as a list of strings (or a list of tuples if the pattern contains more than one capturing group).

re.sub(pattern, repl, string):

– searches for all occurrences of pattern in string and replaces them with repl.
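The difference between re.search() and re.match() is easy to miss. The following minimal sketch, which uses a made-up sample sentence, shows that re.match() succeeds only when the pattern occurs at the very start of the string:

```python
import re

text = "The quick brown fox"

# re.search() scans the whole string, so it finds "quick" in the middle.
found_by_search = re.search("quick", text)

# re.match() only tries position 0, so the same pattern yields None.
found_by_match = re.match("quick", text)

print(found_by_search is not None)  # True
print(found_by_match is not None)   # False
```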

Here’s an example of how to use the re.search() function to search for a pattern in a string:


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"

match = re.search(pattern, text)
if match:
    print("Found the pattern '{}' in the string: '{}'".format(pattern, text))
else:
    print("Did not find the pattern '{}' in the string: '{}'".format(pattern, text))

This code will output:


Found the pattern 'fox' in the string: 'The quick brown fox jumps over the lazy dog'

Ways to use Python RegEx with examples

Here are some ways to use Python RegEx with examples:

Search for a pattern in a string using re.search():


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"

match = re.search(pattern, text)
if match:
    print("Found the pattern '{}' in the string: '{}'".format(pattern, text))
else:
    print("Did not find the pattern '{}' in the string: '{}'".format(pattern, text))

Output:


Found the pattern 'fox' in the string: 'The quick brown fox jumps over the lazy dog'

Match a pattern at the beginning of a string using re.match():


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "The"

match = re.match(pattern, text)
if match:
    print("Found the pattern '{}' at the beginning of the string: '{}'".format(pattern, text))
else:
    print("Did not find the pattern '{}' at the beginning of the string: '{}'".format(pattern, text))

Output:



Found the pattern 'The' at the beginning of the string: 'The quick brown fox jumps over the lazy dog'

Find all occurrences of a pattern in a string using re.findall():


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "o"

matches = re.findall(pattern, text)
if matches:
    print("Found the pattern '{}' {} times in the string: '{}'".format(pattern, len(matches), text))
else:
    print("Did not find the pattern '{}' in the string: '{}'".format(pattern, text))

Output:


Found the pattern 'o' 4 times in the string: 'The quick brown fox jumps over the lazy dog'

Replace all occurrences of a pattern in a string using re.sub():


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"

new_text = re.sub(pattern, "cat", text)
print("Old string: '{}'".format(text))
print("New string: '{}'".format(new_text))

Output:


Old string: 'The quick brown fox jumps over the lazy dog'
New string: 'The quick brown cat jumps over the lazy dog'
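re.sub() can do more than swap one fixed word for another. As a small sketch (the sample dates are invented for illustration), the optional count argument limits how many matches are replaced, and backreferences such as \1 let the replacement reuse captured text:

```python
import re

text = "2023-01-15 and 2024-12-31"

# count=1 limits the substitution to the first match only.
first_only = re.sub(r"\d{4}", "YYYY", text, count=1)

# \1, \2, \3 in the replacement refer to the captured groups,
# so this rewrites each date from year-month-day to day/month/year.
reordered = re.sub(r"(\d{4})-(\d{2})-(\d{2})", r"\3/\2/\1", text)

print(first_only)  # YYYY-01-15 and 2024-12-31
print(reordered)   # 15/01/2023 and 31/12/2024
```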

Split a string into a list of substrings using re.split():


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = " "

words = re.split(pattern, text)
print("Words in the string: ", words)

Output:


Words in the string:  ['The', 'quick', 'brown', 'fox', 'jumps', 'over', 'the', 'lazy', 'dog']
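Splitting on a single space could also be done with str.split(); re.split() becomes useful when the separator varies. The sketch below assumes a made-up record that mixes commas, semicolons, and spaces:

```python
import re

record = "apple, banana;cherry  date"

# Split on a comma, a semicolon, or whitespace, treating any run of
# these separators as a single delimiter.
fields = re.split(r"[,;\s]+", record)

# maxsplit=1 stops after the first split and keeps the rest intact.
head_and_rest = re.split(r"[,;\s]+", record, maxsplit=1)

print(fields)         # ['apple', 'banana', 'cherry', 'date']
print(head_and_rest)  # ['apple', 'banana;cherry  date']
```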

Validate an email address using a regular expression:


import re

email = "example@example.com"
pattern = r"^[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+$"

if re.match(pattern, email):
    print("Valid email address:", email)
else:
    print("Invalid email address:", email)

Output:


Valid email address: example@example.com
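One detail worth knowing: $ also matches just before a trailing newline, so re.match() with ^...$ accepts "example@example.com\n". A possible alternative is re.fullmatch(), which requires the entire string to match. The helper name is_valid_email below is hypothetical, and the pattern remains a simplified check rather than a complete validation of every legal address:

```python
import re

# Same character classes as the pattern above, without the ^ and $ anchors.
pattern = r"[a-zA-Z0-9_.+-]+@[a-zA-Z0-9-]+\.[a-zA-Z0-9-.]+"

def is_valid_email(candidate):
    # fullmatch() succeeds only if the whole string matches the pattern.
    return re.fullmatch(pattern, candidate) is not None

print(is_valid_email("example@example.com"))    # True
print(is_valid_email("example@example.com\n"))  # False
print(is_valid_email("not-an-email"))           # False
```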

Extract specific parts of a string using capturing groups:


import re

text = "John Smith (john@example.com)"
pattern = r"(\w+)\s(\w+)\s\((\w+@\w+\.\w+)\)"

match = re.search(pattern, text)
if match:
    print("Name:", match.group(1), match.group(2))
    print("Email:", match.group(3))
else:
    print("No match found")

Output:


Name: John Smith
Email: john@example.com
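The same capturing groups also work with re.findall(). As a brief sketch with a second, invented contact added to the text, findall() returns one tuple per match when the pattern has several groups:

```python
import re

text = "John Smith (john@example.com), Jane Doe (jane@example.org)"
pattern = r"(\w+)\s(\w+)\s\((\w+@\w+\.\w+)\)"

# With several capturing groups, findall() returns one tuple per match.
contacts = re.findall(pattern, text)

for first, last, email in contacts:
    print(first, last, email)
```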

Replace matched patterns in a string using re.sub():


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "fox"

new_text = re.sub(pattern, "cat", text)
print("New string: ", new_text)

Output:


New string:  The quick brown cat jumps over the lazy dog

Find all occurrences of a pattern in a string, ignoring case, using re.findall():


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = "the"

matches = re.findall(pattern, text, re.IGNORECASE)
print("Number of matches: ", len(matches))

Output:


Number of matches:  2
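Note that a bare pattern such as "the" also matches inside longer words. A short sketch, using a different made-up sentence, shows how the \b word-boundary sequence restricts the search to whole words:

```python
import re

text = "The other day the cat slept"

# A bare "the" also matches inside "other".
loose = re.findall("the", text, re.IGNORECASE)

# \b anchors the pattern to word boundaries, so only whole words match.
whole_words = re.findall(r"\bthe\b", text, re.IGNORECASE)

print(loose)        # ['The', 'the', 'the']
print(whole_words)  # ['The', 'the']
```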

Validate a password with certain requirements using a regular expression:


import re

password = "MyPa55word"
pattern = r"^(?=.*[A-Z])(?=.*[a-z])(?=.*\d)[A-Za-z\d@$!%*#?&]{8,}$"

if re.match(pattern, password):
    print("Valid password")
else:
    print("Invalid password")

Output:


Valid password
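A single pattern with lookaheads only says whether the password passes. If you want to tell the user what is missing, one option is to check each requirement separately. The names RULES and missing_requirements below are hypothetical, chosen only for this sketch:

```python
import re

# Each rule mirrors one part of the combined pattern above.
RULES = [
    (r"[A-Z]", "an uppercase letter"),
    (r"[a-z]", "a lowercase letter"),
    (r"\d", "a digit"),
    (r"^[A-Za-z\d@$!%*#?&]{8,}$", "at least 8 allowed characters"),
]

def missing_requirements(password):
    # Collect the description of every rule the password fails.
    return [label for rule, label in RULES if not re.search(rule, password)]

print(missing_requirements("MyPa55word"))  # []
print(missing_requirements("password"))    # ['an uppercase letter', 'a digit']
```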

Use lookaround assertions to find patterns that precede or follow a specified pattern without including them in the match:


import re

text = "The quick brown fox jumps over the lazy dog"
pattern = r"(?<=quick\s)(\w+)"

match = re.search(pattern, text)
if match:
    print("Word following 'quick':", match.group(1))
else:
    print("No match found")

Output:

Word following 'quick': brown

In this example, the (?<=quick\s) portion of the pattern is a positive lookbehind assertion: it succeeds only at a position that is immediately preceded by “quick” and a whitespace character, and the preceding text does not become part of the match. The (\w+) portion then matches one or more word characters starting at that position.
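Lookahead assertions work the same way in the other direction. The following sketch, with an invented price string, uses a positive lookahead (?=...) and a negative lookahead (?!...):

```python
import re

text = "price: 100 USD, discount: 20 EUR, tax: 7 USD"

# Positive lookahead: digits that are followed by " USD".
usd_amounts = re.findall(r"\d+(?= USD)", text)

# Negative lookahead: whole numbers that are NOT followed by " USD".
# The \b on each side keeps the engine from matching part of a number.
other_amounts = re.findall(r"\b\d+\b(?! USD)", text)

print(usd_amounts)    # ['100', '7']
print(other_amounts)  # ['20']
```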

Use named capturing groups to give specific names to captured groups:


import re

text = "John Smith (john@example.com)"
pattern = r"(?P<first>\w+)\s(?P<last>\w+)\s\((?P<email>\w+@\w+\.\w+)\)"

match = re.search(pattern, text)
if match:
    print("Name:", match.group("first"), match.group("last"))
    print("Email:", match.group("email"))
else:
    print("No match found")

Output:


Name: John Smith
Email: john@example.com

In this example, (?P<first>\w+), (?P<last>\w+), and (?P<email>\w+@\w+\.\w+) are named capturing groups that capture the first name, last name, and email address, respectively. These named groups can be accessed by passing the group name to the group() method.

These are just a few of the more advanced techniques you can use to make your regular expressions more powerful and expressive. With some creativity and practice, you can accomplish many tasks with regular expressions that would be difficult or impossible with other string manipulation methods.
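Named groups can also be referenced inside a replacement string with the \g<name> syntax. As a small sketch reusing the same sample text, this reorders the captured pieces with re.sub():

```python
import re

text = "John Smith (john@example.com)"
pattern = r"(?P<first>\w+)\s(?P<last>\w+)\s\((?P<email>\w+@\w+\.\w+)\)"

# \g<name> in the replacement string inserts the text of a named group.
reformatted = re.sub(pattern, r"\g<last>, \g<first> <\g<email>>", text)

print(reformatted)  # Smith, John <john@example.com>
```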

Methods and attributes of the Match object

A Match object is returned by the match() and search() functions of the re module when the pattern matches; if it does not, they return None. The object contains information about the search and the matched text.
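Because a failed search produces None rather than a Match object, calling a method on the result without checking raises AttributeError. A minimal sketch of the usual guard, where first_number is a hypothetical helper name:

```python
import re

def first_number(text):
    # search() returns None when nothing matches, so test the result
    # before calling any Match method on it.
    match = re.search(r"\d+", text)
    if match is None:
        return None
    return match.group()

print(first_number("order 66 shipped"))  # 66
print(first_number("no digits here"))    # None
```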

Here are some attributes and methods of the Match object:

.group()

Returns the part of the string that was matched by the regular expression.

import re

string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)

print(f"Original string: {string}")
print(f"Match: {match.group()}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Match: brown

.start()

Returns the start index of the matched string in the original string.


import re

string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)

print(f"Original string: {string}")
print(f"Match start index: {match.start()}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Match start index: 10

.end()

Returns the index just past the last character of the matched string in the original string.


import re

string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)

print(f"Original string: {string}")
print(f"Match end index: {match.end()}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Match end index: 15

.span()

Returns a tuple containing the start and end index of the matched string in the original string.


import re

string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)

print(f"Original string: {string}")
print(f"Match start and end index: {match.span()}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Match start and end index: (10, 15)
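Since the end index is one past the last matched character, the two values from span() can be used directly as slice bounds. A short sketch with the same sample string:

```python
import re

string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)

# span() returns (start, end); end is one past the last matched character,
# so the pair can be used directly as slice bounds.
start, end = match.span()
before, matched, after = string[:start], string[start:end], string[end:]

print(repr(before))   # 'The quick '
print(repr(matched))  # 'brown'
print(repr(after))    # ' fox jumps over the lazy dog'
```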

.groups()

Returns a tuple containing all the captured groups in the match.


import re

string = "John Smith: 555-555-5555"
match = re.search(r"(\w+) (\w+): (\d{3}-\d{3}-\d{4})", string)

print(f"Original string: {string}")
print(f"Match groups: {match.groups()}")

Output:


Original string: John Smith: 555-555-5555
Match groups: ('John', 'Smith', '555-555-5555')

.groupdict()

Returns a dictionary containing all the named captured groups in the match.


import re

string = "John Smith: 555-555-5555"
match = re.search(r"(?P<first_name>\w+) (?P<last_name>\w+): (?P<phone>\d{3}-\d{3}-\d{4})", string)

print(f"Original string: {string}")
print(f"Match group dictionary: {match.groupdict()}")

Output:


Original string: John Smith: 555-555-5555
Match group dictionary: {'first_name': 'John', 'last_name': 'Smith', 'phone': '555-555-5555'}

.string

Returns the string that was searched.


import re

string = "The quick brown fox jumps over the lazy dog"
match = re.search(r"brown", string)

print(f"Original string: {string}")
print(f"Searched string: {match.string}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Searched string: The quick brown fox jumps over the lazy dog

.re

The compiled regular expression object whose search() or match() method produced this Match object.


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string)

print(f"Original string: {string}")
print(f"Regex object: {match.re}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Regex object: re.compile('brown')

.pos

The index in the string at which the search started, that is, the pos value passed to search() or match() on a compiled pattern.


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string, 10)

print(f"Original string: {string}")
print(f"Start position of the search: {match.pos}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Start position of the search: 10

.endpos

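The index in the string beyond which the search did not look, that is, the endpos value passed to search() or match() on a compiled pattern.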


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string, 0, 20)

print(f"Original string: {string}")
print(f"End position of the search: {match.endpos}")

Output:


Original string: The quick brown fox jumps over the lazy dog
End position of the search: 20
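Passing pos is not quite the same as slicing the string first. In particular, ^ still refers to the real start of the string. The sketch below illustrates this with a shortened sample string and an anchored pattern chosen for the demonstration:

```python
import re

string = "The quick brown fox"
anchored = re.compile(r"^brown")

# With pos=10 the search starts at "brown", but ^ still means the real
# beginning of the string, so there is no match.
with_pos = anchored.search(string, 10)

# Slicing creates a new string that begins with "brown", so ^ matches.
with_slice = anchored.search(string[10:])

print(with_pos)            # None
print(with_slice.group())  # brown
```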

.lastindex

The integer index of the last matched capturing group, or None if no group was matched.


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(\w+) (\w+) (\w+)")
match = regex.search(string)

print(f"Original string: {string}")
print(f"Index of the last matched capturing group: {match.lastindex}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Index of the last matched capturing group: 3
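With nested groups, lastindex is not simply the number of groups. A short sketch using tiny patterns chosen for illustration:

```python
import re

# Two sibling groups: the last one to match is group 2.
flat = re.search(r"(a)(b)", "ab")

# An outer group wrapping two inner groups: the outer group (index 1)
# closes last, so lastindex is 1 even though three groups matched.
nested = re.search(r"((a)(b))", "ab")

print(flat.lastindex)    # 2
print(nested.lastindex)  # 1
```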

.lastgroup

The name of the last matched capturing group, or None if that group has no name or no group was matched.


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(?P<first>\w+) (?P<second>\w+) (?P<third>\w+)")
match = regex.search(string)

print(f"Original string: {string}")
print(f"Name of the last matched capturing group: {match.lastgroup}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Name of the last matched capturing group: third

.group()

Called with no arguments (equivalent to .group(0)), returns the entire match.


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"brown")
match = regex.search(string)

print(f"Original string: {string}")
print(f"Entire match: {match.group()}")

Output:


Original string: The quick brown fox jumps over the lazy dog
Entire match: brown

.group(n)

Returns the nth capturing group.


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(\w+) (\w+) (\w+)")
match = regex.search(string)

print(f"Original string: {string}")
print(f"First capturing group: {match.group(1)}")
print(f"Second capturing group: {match.group(2)}")
print(f"Third capturing group: {match.group(3)}")

Output:



Original string: The quick brown fox jumps over the lazy dog
First capturing group: The
Second capturing group: quick
Third capturing group: brown

.group(name)

Returns the capturing group with the specified name.


import re

string = "The quick brown fox jumps over the lazy dog"
regex = re.compile(r"(?P<first>\w+) (?P<second>\w+) (?P<third>\w+)")
match = regex.search(string)

print(f"Original string: {string}")
print(f"First capturing group: {match.group('first')}")
print(f"Second capturing group: {match.group('second')}")
print(f"Third capturing group: {match.group('third')}")

Output:


Original string: The quick brown fox jumps over the lazy dog
First capturing group: The
Second capturing group: quick
Third capturing group: brown

Quiz about the Python RegEx module

Part 1

What does the search() method in the RegEx module do?

a) It returns all matches in a string

b) It returns the first match in a string

c) It replaces matches in a string with a specified string

d) It splits a string into a list based on a specified separator

Answer: b) It returns the first match in a string

What does the findall() method in the RegEx module do?

a) It returns all matches in a string

b) It returns the first match in a string

c) It replaces matches in a string with a specified string

d) It splits a string into a list based on a specified separator

Answer: a) It returns all matches in a string

What does the sub() method in the RegEx module do?

a) It returns all matches in a string

b) It returns the first match in a string

c) It replaces matches in a string with a specified string

d) It splits a string into a list based on a specified separator

Answer: c) It replaces matches in a string with a specified string

What metacharacter is used to match any single character?

a) * b) + c) . d) ?

Answer: c) .

What metacharacter is used to match the end of a string?

a) $ b) ^ c) * d) +

Answer: a) $

What set of characters can be used to match any digit?

a) \d b) \s c) \w d) \

Answer: a) \d

What set of characters can be used to match any whitespace character?

a) \d b) \s c) \w d) \

Answer: b) \s

What set of characters can be used to match any word character?

a) \d b) \s c) \w d) \

Answer: c) \w

What method returns a match object if there is a match anywhere in the string?

a) findall() b) search() c) split() d) sub()

Answer: b) search()

What method can be used to split a string into a list based on a specified separator?

a) findall() b) search() c) split() d) sub()

Answer: c) split()

Part 2

What does the compile() function in the RegEx module do?

a) It searches for a pattern in a string

b) It returns all matches in a string

c) It compiles a regular expression pattern into a pattern object

d) It splits a string into a list based on a specified separator

Answer: c) It compiles a regular expression pattern into a pattern object

What does the group() method of a match object in the RegEx module do?

a) It returns the entire matched string

b) It returns the position of the match in the original string

c) It returns a tuple containing all matched subgroups

d) It returns the number of matches in the string

Answer: a) It returns the entire matched string

What does the finditer() function in the RegEx module do?

a) It searches for a pattern in a string

b) It returns all matches in a string

c) It replaces matches in a string with a specified string

d) It returns an iterator yielding match objects for all non-overlapping matches

Answer: d) It returns an iterator yielding match objects for all non-overlapping matches
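For reference, a brief sketch of finditer() on the sample sentence used throughout this article; unlike findall(), each result carries its position as well as its text:

```python
import re

text = "The quick brown fox jumps over the lazy dog"

# finditer() yields one Match object per non-overlapping match, lazily,
# so each hit exposes start() and end() in addition to group().
hits = [(m.group(), m.start()) for m in re.finditer(r"o", text)]

print(hits)
```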

What metacharacter is used to match any character except a newline?

a) b) . c) * d) +

Answer: b) .

What metacharacter is used to match zero or more occurrences of the preceding character or group?

a) * b) + c) . d) ?

Answer: a) *

What metacharacter is used to match one or more occurrences of the preceding character or group?

a) * b) + c) . d) ?

Answer: b) +

What set of characters can be used to match any non-digit character?

a) \D b) \S c) \W d) \

Answer: a) \D

What set of characters can be used to match any non-whitespace character?

a) \D b) \S c) \W d) \

Answer: b) \S

What set of characters can be used to match any non-word character?

a) \D b) \S c) \W d) \

Answer: c) \W

What method replaces occurrences of a pattern in a string with a specified string and also reports how many replacements were made?

a) findall() b) search() c) split() d) subn()

Answer: d) subn()
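As a quick illustration (the sample sentence is invented), subn() returns both the new string and the number of substitutions. The optional count argument, which sub() accepts as well, caps how many replacements are made:

```python
import re

text = "one fish two fish red fish blue fish"

# subn() returns a tuple: (new_string, number_of_substitutions).
replaced_all, total = re.subn("fish", "cat", text)

# count=2 caps the number of replacements.
replaced_two, made = re.subn("fish", "cat", text, count=2)

print(replaced_all, total)  # one cat two cat red cat blue cat 4
print(replaced_two, made)   # one cat two cat red fish blue fish 2
```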

Part 3

What does the search() function in the RegEx module do?

a) It searches for a pattern in a string

b) It returns all matches in a string

c) It compiles a regular expression pattern into a pattern object

d) It replaces matches in a string with a specified string

Answer: a) It searches for a pattern in a string

What does the split() function in the RegEx module do?

a) It searches for a pattern in a string

b) It returns all matches in a string

c) It splits a string into a list based on a specified separator

d) It compiles a regular expression pattern into a pattern object

Answer: c) It splits a string into a list based on a specified separator

What does the findall() function in the RegEx module do?

a) It searches for a pattern in a string

b) It returns all matches in a string

c) It compiles a regular expression pattern into a pattern object

d) It splits a string into a list based on a specified separator

Answer: b) It returns all matches in a string

What metacharacter is used to match the start of a string? a) ^ b) $ c) d) .

Answer: a) ^

What metacharacter is used to match the end of a string? a) ^ b) $ c) d) .

Answer: b) $

What metacharacter is used to match any whitespace character? a) b) . c) * d) \s

Answer: d) \s

What metacharacter is used to match any digit character? a) \d b) \D c) \s d) \

Answer: a) \d

What set of characters can be used to match any word character? a) \d b) \w c) \W d) \s

Answer: b) \w

What special sequence can be used to match the position at the beginning or end of a word (a word boundary)? a) \b b) \B c) d) .

Answer: a) \b

What method can be used to replace all occurrences of a pattern in a string with a specified string? a) findall() b) search() c) split() d) sub()

Answer: d) sub()

1-What function can be used to compile a regular expression pattern into a pattern object? a) findall() b) search() c) split() d) compile()

Answer: d) compile()

2-What does the match() function in the RegEx module do? a) It searches for a pattern in a string b) It returns all matches in a string c) It compiles a regular expression pattern into a pattern object d) It matches a pattern at the beginning of a string

Answer: d) It matches a pattern at the beginning of a string

3-What metacharacter is used to match zero or one occurrence of the preceding character? a) * b) + c) ? d) .

Answer: c) ?

4-What metacharacter is used to match one or more occurrences of the preceding character? a) * b) + c) ? d) .

Answer: b) +

5-What metacharacter is used to match zero or more occurrences of the preceding character? a) * b) + c) ? d) .

Answer: a) *

6-What set of characters can be used to match any character that is not a digit? a) \d b) \D c) \s d) \S

Answer: b) \D

7-What set of characters can be used to match any character that is not a whitespace character? a) \s b) \S c) \d d) \

Answer: b) \S

8-What metacharacter is used to match any character except a newline? a) b) . c) * d) \

Answer: b) .

9-What function can be used to split a string into a list of substrings based on a regular expression pattern? a) search() b) match() c) split() d) findall()

Answer: c) split()

10-What method can be used to retrieve the start and end position of the match in a string? a) span() b) start() c) end() d) All of the above

Answer: d) All of the above
