Python Tutorial: re Module - How to Write and Match Regular Expressions (Regex)

50-60
Created By: Corey Schafer
Published At: 2017-10-24
Search Query: Python RegEx,Python Tutorial
Tags: python regular expressions regex re module python re python regular expressions python re module python regex python regex module regular expression re python tutorial python 3.6 corey schafer programming tutorials python programming match patterns software engineering
In this Python Programming Tutorial, we will be learning how to read, write, and match regular expressions with the re module. Regular expressions are extremely useful for matching common patterns of text such as email addresses, phone numbers, URLs, etc. Learning how to do this within Python will allow us to quickly parse files and text for the information we need. Let's get started...
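Here is a rough illustration of the kind of pattern matching described above, using Python's built-in re module. The sample text and both patterns are assumptions made up for this sketch. The phone pattern only accepts the ddd-ddd-dddd and ddd.ddd.dddd layouts. The email pattern is deliberately simple and is not a full validator.

```python
import re

# Made-up sample text for illustration only.
text = '''
Call 321-555-4321 or 123.555.1234 for details.
Email: someone@example.com
'''

# Three digits, a dash or dot, three digits, a dash or dot, four digits.
phone_pattern = re.compile(r'\d{3}[-.]\d{3}[-.]\d{4}')

# A simplified email pattern: name characters, '@', domain, a dot, then the rest.
email_pattern = re.compile(r'[\w.+-]+@[\w-]+\.[\w.-]+')

print(phone_pattern.findall(text))  # ['321-555-4321', '123.555.1234']
print(email_pattern.findall(text))  # ['someone@example.com']
```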

The code from this video can be found at:
https://github.com/CoreyMSchafer/code_snippets/tree/master/Python-Regular-Expressions

Python String Slicing Tutorial:
https://youtu.be/ajrtAuDg3yw

Python Files Tutorial:
https://youtu.be/Uh2ebFW8OYM


✅ Support My Channel Through Patreon:
https://www.patreon.com/coreyms

✅ Become a Channel Member:
https://www.youtube.com/channel/UCCezIgC97PvUuR4_gbFUs5g/join

✅ One-Time Contribution Through PayPal:
https://goo.gl/649HFY

✅ Cryptocurrency Donations:
Bitcoin Wallet - 3MPH8oY2EAgbLVy7RBMinwcBntggi7qeG3
Ethereum Wallet - 0x151649418616068fB46C3598083817101d3bCD33
Litecoin Wallet - MPvEBY5fxGkmPQgocfJbxP6EmTo5UUXMot

✅ Corey's Public Amazon Wishlist
http://a.co/inIyro1

✅ Equipment I Use and Books I Recommend:
https://www.amazon.com/shop/coreyschafer

▶️ You Can Find Me On:
My Website - http://coreyms.com/
My Second Channel - https://www.youtube.com/c/coreymschafer
Facebook - https://www.facebook.com/CoreyMSchafer
Twitter - https://twitter.com/CoreyMSchafer
Instagram - https://www.instagram.com/coreymschafer/

#Python
Comments with generated responses
@plxiznogoud3328
2017-11-10

I watch lots of videos related to programming and so far I haven't seen anyone upload high quality material as consistently as you do. There's a lot of content out there nowadays but most just scratches the surface. Your videos live up to the famous quote by Einstein "Everything should be made as simple as possible, but not simpler". :)

@coreyms

Thank you! I appreciate that.

VidTutorAssistant (GPT-4)

Thank you for the kind words, @plxiznogoud3328! I really try and stick to that Einstein quote when creating content in the hopes of making complex topics easily digestible without losing essential details. I'm grateful that you've noticed the effort. Stay tuned for more deep-diving content!


@hassenmorad8096
2017-12-06

I didn't quite grasp the function of some of the regex methods used. For example, Corey said the compile method allows us to separate patterns into a variable, which can then be used to perform multiple searches... I couldn't find much documentation on regex methods online to clarify what that means.

Can anyone suggest specific resources to learn more about them? Thanks in advance!

@coreyms

Sometimes you will see people pass the pattern into the re module directly instead of separating the pattern into its own variable. So for example, you might see some tutorials do it like so:
re.search('a.*', 'abcdef')

Notice how that uses the re module directly and then passes in the pattern as the first argument? Doing it like we did in this video lets us compile the pattern into its own variable and then call the search methods on that pattern directly.
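Here is a side-by-side sketch of the two styles Corey describes, using his 'a.*' example. Both calls find the same match. The compiled version keeps the pattern in a variable so its methods can be reused on other strings. The second input string, 'xyz abc', is a made-up example.

```python
import re

text = 'abcdef'

# Style 1: call re.search() directly, passing the pattern as the first argument.
direct = re.search(r'a.*', text)

# Style 2: compile the pattern once and store it in a variable.
pattern = re.compile(r'a.*')
compiled = pattern.search(text)

print(direct.group(0))    # abcdef
print(compiled.group(0))  # abcdef

# The same compiled pattern can be reused for further searches.
print(pattern.search('xyz abc').group(0))  # abc
```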

VidTutorAssistant (GPT-4)

Hi @hassenmorad8096, thank you for reaching out! It's not uncommon to find parts of regular expressions (regex) challenging when starting out.

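Regarding your query, 're.compile()' is a function in Python's regex module 're' that compiles a regex pattern into a pattern object. That object has its own methods, such as search(), match(), findall() and finditer(), so one compiled pattern can be reused for many searches. This helps readability and reuse, especially when the same pattern is used several times in your code. It can also avoid recompiling the pattern, although the re module already caches recently used patterns internally.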

Additional resources to learn more about Python's 're' methods would be the official Python documentation. Here's the link: https://docs.python.org/3/library/re.html. It provides a thorough breakdown of what each method does.
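Here is a short sketch of how the most common methods of a compiled pattern differ. The three-digit pattern and the sample string are assumptions chosen for illustration.

```python
import re

pattern = re.compile(r'\d{3}')  # three consecutive digits
text = 'abc 123 def 456'

# match() only checks at the very start of the string, where 'abc' is not digits.
print(pattern.match(text))           # None

# search() returns the first match found anywhere in the string.
print(pattern.search(text).group())  # 123

# findall() returns every matched substring as a list of strings.
print(pattern.findall(text))         # ['123', '456']

# finditer() yields match objects, which also carry position information.
for m in pattern.finditer(text):
    print(m.group(), m.span())       # 123 (4, 7), then 456 (12, 15)
```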

Furthermore, if you want to practice and understand regex patterns interactively, you can use websites like https://regex101.com/.

I hope this helps! Please feel free to ask if you have further questions. Happy learning!


@Lowzenza4
2018-01-24

Wait, why was 'abc' not indexed as 0-3? Does the string index in this case start at 1??

@coreyms

It starts at 0, but there is a newline character before "abc". I put that in so the text would be easier to read.

VidTutorAssistant (GPT-4)

Hi @Lowzenza4, that's a very insightful question! In Python, indexing always starts at 0. The span (1, 4) follows Python's slicing convention: the start index is inclusive and the end index is exclusive. It isn't (0, 3) because the text being searched begins with a newline character at index 0, which was added to make the multi-line string easier to read. That newline pushes 'a' to index 1, 'b' to 2 and 'c' to 3, so slicing the searched text with [1:4] gives exactly 'abc'. I hope this clears up the confusion. Let me know if you have any more questions!
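This sketch reproduces the situation. The sample string is an assumption modeled on the layout Corey describes, a triple-quoted string with a newline before 'abc'. It is not copied from the video.

```python
import re

# Assumed sample text: the first character is the newline right after the
# opening triple quotes.
text_to_search = '''
abcdefghijklmnopqrstuvwxyz
'''

match = re.search(r'abc', text_to_search)

print(repr(text_to_search[0]))  # '\n'  (index 0 is the newline)
print(match.span())             # (1, 4)
print(text_to_search[1:4])      # abc   (start inclusive, end exclusive)
```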


@waseemhussain1427
2018-02-28

Thank you Corey for this amazing lesson. I experimented with the code and realized that we can use our regular expressions without writing them as raw strings, and it worked. But when doing the pattern.sub('\2\3', urls) part, it didn't work. When I kept it as a raw string, it worked. So my question is, why are we writing our regular expressions as raw strings? Do we always have to do that?

@coreyms

You should always be using raw strings just so you're sure it will work. There are some characters that Python will interpret differently if it isn't a raw string, so you want to be sure it's your regular expression interpreting the string and not Python.

VidTutorAssistant (GPT-4)

Hi @waseemhussain1427, thanks for the great question! Raw strings in Python are extremely useful when working with regular expressions. A raw string is a string literal prefixed with 'r', which tells Python not to treat backslashes as escape characters. Normally, backslashes introduce special characters like tabs (\t) or newlines (\n).

Regular expressions use backslashes heavily, for example '\d' to match any digit or '\w' to match any word character (letters, digits and underscore). Python leaves unrecognized escapes such as '\d' and '\w' unchanged, which is why your patterns still worked without the 'r'. Recent Python versions do warn about these invalid escape sequences. However, '\2' and '\3' are valid Python octal escapes. In a normal string, Python converts them into the control characters chr(2) and chr(3) before the regex engine ever sees them. As a result, pattern.sub() never receives the group references \2 and \3. With a raw string, the backslashes reach the regex engine intact.

So it is good practice to always use raw strings for regular expressions in Python to prevent this kind of unexpected behavior. I hope this answers your question. Keep exploring Python and asking great questions!
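This sketch shows the difference. The URL pattern is written in the spirit of the video's URL example, and the sample URLs are assumptions, not the video's exact data.

```python
import re

# Non-raw string: Python turns '\2' and '\3' into control characters.
print(len('\2\3'))             # 2
print('\2\3' == '\x02\x03')    # True

# Raw string: backslash, '2', backslash, '3' all survive.
print(len(r'\2\3'))            # 4

# Assumed sample URLs, one per line.
urls = '''
https://www.google.com
http://coreyms.com
'''

# Group 1: optional 'www.', group 2: domain name, group 3: top-level domain.
pattern = re.compile(r'https?://(www\.)?(\w+)(\.\w+)')

# Raw replacement: \2\3 are group references, so each URL becomes domain + TLD.
print(repr(pattern.sub(r'\2\3', urls)))  # '\ngoogle.com\ncoreyms.com\n'

# Non-raw replacement: each URL is replaced by the two control characters.
print(repr(pattern.sub('\2\3', urls)))   # '\n\x02\x03\n\x02\x03\n'
```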


@iliyastrakovich
2017-10-24

It's really a great explanation!
However you didn't mention "greedy" and group escaping.

@coreyms

Thanks. And I'm going to go over greediness and group escaping in the advanced video. I was going to mention greediness in this video but it was getting a little long.

VidTutorAssistant (GPT-4)

Thank you for your feedback, @iliyastrakovich! I'm glad you found the video helpful. You raise a good point about "greedy" matching and group escaping. They indeed play a crucial role when dealing with larger and more complex regular expressions. Given the scope of the video, I couldn't fit those topics in, but I do plan on addressing these topics in my upcoming videos on advanced usage of regular expressions where I'll cover these in detail. Stay tuned, and don't hesitate to reach out if you have more questions! Happy coding!
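Since greediness comes up here, this sketch shows the difference on a made-up HTML-like string. A greedy quantifier such as .* matches as much as it can. Adding ? (as in .*?) makes it non-greedy, so it matches as little as it can.

```python
import re

text = '<b>bold</b> and <i>italic</i>'

# Greedy: .* runs all the way to the last '>' in the string.
print(re.search(r'<.*>', text).group())   # <b>bold</b> and <i>italic</i>

# Non-greedy: .*? stops at the first '>' that completes a match.
print(re.search(r'<.*?>', text).group())  # <b>
print(re.findall(r'<.*?>', text))         # ['<b>', '</b>', '<i>', '</i>']
```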


@CalibreAudio
2018-01-12

The output I am getting looks like <_sre.SRE_Match object at 0x1002b48b8>. Can someone indicate what I have to do to get a more meaningful output as shown in the video? In particular I am not getting the span.

@coreyms

Could you try using the group method on your match and see if that works? Something like:
print(match.group(0))

VidTutorAssistant (GPT-4)

Hi @CalibreAudio, thanks for reaching out with your question. It looks like you're seeing the default representation of the match object. Older Python versions print it as a bare <_sre.SRE_Match object at ...>. Newer Python 3 releases also include the span and the matched text, as shown in the video. To get that information explicitly, call the match object's span() method after finding a match, and group() for the matched text. This would look something like:

for match in matches:
    print(match.span())

This will output tuples indicating the start and end indices of each match. Hope that helps! Let me know if you have any more questions.
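Here is a self-contained version of that loop that also prints group(), start() and end(). The sample text and the 'abc' pattern are assumptions for illustration.

```python
import re

# Assumed sample text containing two occurrences of 'abc'.
text_to_search = 'abc 123 abc'
pattern = re.compile(r'abc')
matches = pattern.finditer(text_to_search)

for match in matches:
    # group(0): the matched text; span(): (start, end) with end exclusive;
    # start() and end(): the same two indices individually.
    print(match.group(0), match.span(), match.start(), match.end())
# abc (0, 3) 0 3
# abc (8, 11) 8 11
```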


@mralirezaaaaa
2017-10-24

This was what I really wanted. I searched for Python RE all over YouTube, but none of those videos were like this! Thanks Corey, you are the best Python teacher :)

@coreyms

Thanks! I'm glad you found it helpful

VidTutorAssistant (GPT-4)

Thanks a lot for your positive feedback, @mralirezaaaaa! I'm thrilled that you found the video on Python's built-in re module helpful. Your words of appreciation mean a lot! If you have any questions or topics in Python you'd like me to cover in the future, always feel free to let me know under the comments section. Don't forget to share this video with anyone else who might benefit. Happy coding!


@satoshinakamoto171
2020-02-13

I wish I had the money to contribute to your channel. I'm trying to get a job.

@coreyms

No worries :). Hope you found the videos helpful and best of luck with your job search!

VidTutorAssistant (GPT-4)

Hey @satoshinakamoto171, no worries at all! Your dedication to learning is contribution enough. Your engagement and encouragement are just as valuable as a monetary contribution, if not more. Remember, learning and applying these skills is an investment that will pay off at the right time. Keep at it and best of luck with your job search! Don't hesitate to ask if you have any questions on the content. Keep learning!


@hassenmorad8096
2017-12-09

Hey Corey, you mentioned that adding 'r' before the re pattern indicates to Python that backslashes should not be interpreted with any special meaning. But then you mentioned that metacharacters need to be escaped with a backslash and in the example you provided (r'\.') the re pattern was preceded by an 'r'.

I guess it's implied, but just to be crystal clear- are metacharacters the exception to this rule regarding 'r'?

@coreyms

Adding the ‘r’ prevents Python from interpreting the character in any special way, such as tabs and newlines like I showed in the example. But the regular expressions still interpret those characters. So escaping them just prevents them from being interpreted by the regular expression. For example, if you want the dot character to be interpreted by the regular expression literally then it needs to be escaped. So to answer your question, there is a difference between the python language interpreting your string and the regular expression interpreting the string. I hope that makes sense.

VidTutorAssistant (GPT-4)

Hey @hassenmorad8096, you have a great question there. No, metacharacters are not an exception in this case. Prefixing a string with 'r' tells Python to treat it as a raw string, so backslashes are not treated as escape characters. So r'\.' is a string that contains a backslash followed by a dot.

Now back to the regex behavior. The backslash has its own special meaning in regex. When it precedes a metacharacter, it tells the regex engine to treat that character as a literal. So you use '\.' to search for a literal dot, because a bare '.' in regex matches any character except a newline.

So in r'\.', the 'r' just ensures Python doesn't interpret the backslash (Python's own escape character) before the regex engine gets the string. The regex engine then sees the backslash as is and uses it for its intended purpose, which in this case is escaping the dot metacharacter. I hope that clears it up for you, and keep the questions coming. They always help everyone in their learning journey!
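This sketch separates the two layers: what Python puts in the string, and what the regex engine does with it. The sample text 'coreyms.com' is an assumption chosen for illustration.

```python
import re

text = 'coreyms.com'

# Python layer: the raw string r'\.' is two characters, a backslash and a dot.
print(len(r'\.'))               # 2
print(r'\.' == '\\.')           # True

# Regex layer: backslash + dot matches only a literal dot.
print(re.findall(r'\.', text))  # ['.']

# An unescaped '.' is a metacharacter matching any character except a newline.
print(len(re.findall(r'.', text)))  # 11
```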


@aashishmehtoliya5800
2019-06-14

Thanks Corey for this awesome video.
Every time I search for any Python topic, you are my first recommendation and you never failed to amuse me.
Hey, any plans for web scraping with Selenium?

thanks

@coreyms

I have it written down as a topic to cover but I don’t have any plans on when that will be. I’ll try to get around to it after my next data science series.

VidTutorAssistant (GPT-4)

Thank you so much for the kind words, @aashishmehtoliya5800. I'm thrilled to know that you find my videos helpful. About your question on web scraping with Selenium: yes, I do have plans to cover this topic in the future. Stay tuned to the channel for upcoming videos. And don't forget to subscribe and hit the notification bell to be alerted when new content is up. More exciting Python topics await us! Thanks for being a part of our learning community!