Basically, a regular expression is a pattern that describes a certain amount of text. For example, you know that email addresses generally look like this:
`username@domain.extension`
If we want to describe the pattern of an email, we will say something like this: it starts with a username (a combination of letters and numbers), followed by an at symbol (@), followed by the domain (another combination of letters and numbers), followed by the extension (which starts with a dot `.` followed by a combination of only letters).
The process of describing the pattern of an email is the same process you will follow when you want to create a regular expression. The only difference will be the syntax.
All major programming languages use regular expressions (C++, PHP, .NET, Java, JavaScript, Python, Ruby, and many others). As a web developer, you will constantly work with strings: validating the data entered by the user, validating URL formats, replacing words in paragraphs, etc. These are the main uses for regular expressions.
Never start creating a Regex without a live testing tool at hand, because it can get very complicated very easily. The best way is to use the "divide and conquer" strategy (again): split your Regex into several smaller Regexes, and then combine them all.
This is a regular expression that checks for an email pattern:
/[\w._%+-]+@[\w.-]+\.[a-zA-Z]{2,4}/
But don’t worry…you don’t have to understand it right now. The good news is that a complex regular expression is just a combination of several very simple regular expressions. "Divide and conquer!"
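To illustrate "divide and conquer", here is a minimal Python sketch that builds the email pattern above out of three smaller pieces. Using Python's `re` module is an assumption of this example, and the names `USERNAME`, `DOMAIN`, `EXTENSION`, and `looks_like_email` are made up for illustration. The pattern is the simplified one shown above, not a complete email validator.

```python
import re

# Hypothetical names for the three smaller pieces of the email pattern above.
USERNAME = r"[\w._%+-]+"        # one or more word characters or . _ % + -
DOMAIN = r"[\w.-]+"             # one or more word characters, dots, or hyphens
EXTENSION = r"\.[a-zA-Z]{2,4}"  # a literal dot followed by 2 to 4 letters

# Combine the small pieces into the full pattern.
EMAIL = re.compile(USERNAME + "@" + DOMAIN + EXTENSION)


def looks_like_email(text):
    """Return True if an email-like substring appears anywhere in text."""
    return EMAIL.search(text) is not None


print(looks_like_email("Contact: jane.doe@example.com"))
print(looks_like_email("no at sign here"))
```

Each piece can be tested on its own before it is combined with the others, which is the main benefit of splitting the pattern.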
So…let’s start with the basic regular expressions using the most basic operators:
A simple character is…
Any succession of characters is a simple regular expression. If we use the word "email" as a regular expression, the system will look for every occurrence of the word "email" inside the given text.
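As a small sketch of this idea in Python (the sample sentence is invented for the example), a plain word used as a pattern finds every occurrence of that word:

```python
import re

text = "Send me an email. My email address is in my email signature."

# A plain sequence of characters matches itself, wherever it appears.
matches = re.findall(r"email", text)

print(matches)       # one list entry per occurrence
print(len(matches))  # number of occurrences found
```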
The `.` Character
The `.` character represents…
Any character or symbol available (in most engines, anything except a line break). If you say `ab.ve`, you are saying anything that starts with `ab` and ends with `ve`, with exactly one character in between.
You can use the `.` as many times as you want; each `.` in the regular expression matches exactly one character of any kind.
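A minimal Python sketch of the dot, assuming a made-up word list and using `re.fullmatch` so that the whole word has to fit the pattern:

```python
import re

words = ["above", "abxve", "ab-ve", "abve", "abxxve"]

# Each "." stands for exactly one character (any character except a
# newline, by default in Python's re module).
one_dot = [w for w in words if re.fullmatch(r"ab.ve", w)]
two_dots = [w for w in words if re.fullmatch(r"ab..ve", w)]

print(one_dot)   # words with exactly one character between "ab" and "ve"
print(two_dots)  # words with exactly two characters between "ab" and "ve"
```

Note that `abve` is not accepted by `ab.ve`: the dot requires a character to be present.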
The `[ ]` Characters
The `[ ]` characters represent…
A group of possible characters. Sometimes, we would like to be a bit more specific…this is where ranges come in useful. We specify a range of characters by enclosing them within square brackets (`[ ]`).
You can also use the `[ ]` to specify ranges of numbers or letters with a dash in between. The dash represents a range of numbers or characters. For example:
`[0-9]` represents any digit between 0 and 9.
`[a-z]` represents any lowercase letter.
`[A-Z]` represents any uppercase letter.
You can also combine ranges of characters like this:
`[a-zA-Z]`
`[1-59]`
`[1-5a-fX]`
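The ranges above can be tried out with a short Python sketch. The sample string is invented for the example; each call collects every single character that falls inside the given brackets.

```python
import re

sample = "Room 7B, floor 3, seat X9f"

digits = re.findall(r"[0-9]", sample)     # any digit
lowercase = re.findall(r"[a-z]", sample)  # any lowercase letter
uppercase = re.findall(r"[A-Z]", sample)  # any uppercase letter

# Combined ranges: [1-59] means "1 to 5, or 9" (not "1 to 59").
one_to_five_or_nine = re.findall(r"[1-59]", sample)

# [1-5a-fX] means "1 to 5, or a to f, or a capital X".
mixed = re.findall(r"[1-5a-fX]", sample)

print(digits, uppercase, one_to_five_or_nine, mixed)
```

A bracket expression always matches a single character, no matter how many ranges it combines.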
The `^` (caret) Character: Negation or Beginning of a Term
If we place `^` at the beginning of a [range]:
We are negating the range. For example:
All terms that start with `li` and end with `e`, with exactly one character in between that is neither `i` nor `v`: `li[^iv]e`
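A minimal Python sketch of a negated range, assuming a made-up word list. To exclude both `i` and `v` from the middle position, both letters are listed after the caret, giving `li[^iv]e`:

```python
import re

words = ["like", "line", "live", "liie", "lie", "lime"]

# [^iv] matches any single character that is neither "i" nor "v".
kept = [w for w in words if re.fullmatch(r"li[^iv]e", w)]

print(kept)
```

The word `lie` is rejected as well, because a negated range still has to match one character.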
If we place `^` at the beginning of a regular expression:
We are saying that the Regex may only match at the beginning of the string (matches that would start later in the string are ignored):
A string starting with http: `^http`
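The difference the leading caret makes can be sketched in Python as follows (the list of strings is invented for the example):

```python
import re

urls = [
    "http://example.com",
    "https://example.com",
    "see http://example.com",
    "ftp://example.com",
]

# With the caret, "http" must be the very first thing in the string.
starts_with_http = [u for u in urls if re.search(r"^http", u)]

# Without it, "http" may appear anywhere in the string.
contains_http = [u for u in urls if re.search(r"http", u)]

print(starts_with_http)
print(contains_http)
```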
Digits `\d` and Words `\w`
If you prefer, you can use these shortcuts in your regular expressions:
Operator | Description |
---|---|
`\w` | Matches any word character (equal to `[a-zA-Z0-9_]`) |
`\W` | Matches anything other than a letter, digit, or underscore |
`\d` | Matches any decimal digit. Equivalent to `[0-9]` |
`\D` | Matches anything other than a decimal digit |
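Here is a short Python sketch of the four shortcuts, using an invented sample string. One caveat specific to Python: with text patterns, `\w` and `\d` also match non-ASCII letters and digits unless the `re.ASCII` flag is passed, so the bracket equivalents in the table hold exactly only for ASCII text.

```python
import re

text = "Order_42 shipped: 3 items!"

digits = re.findall(r"\d", text)       # every single digit
only_digits = re.sub(r"\D", "", text)  # delete everything that is not a digit
words = re.findall(r"\w+", text)       # runs of letters, digits, underscores
symbols = re.findall(r"\W", text)      # everything that is not a word character

print(digits, only_digits, words, symbols)
```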
`( )`
We always talk about "divide and conquer," right? Well, your best friend for that will be the parenthesis operator `( )`. We are now able to group any pattern just like we do in math.
Now that we can group, we can multiply (repeat) our patterns, negate our patterns, etc.
For example, this Regex accepts zero or many repetitions of the `ab` string followed by a `c` letter at the end: `(ab)*c`
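A minimal Python sketch of grouping, with an invented list of candidate strings. Note that `*` allows zero repetitions, so a lone `c` is accepted as well; `(ab)+c` is shown next to it for comparison, since `+` requires at least one `ab`.

```python
import re

candidates = ["c", "abc", "ababc", "abab", "ac"]

# The group (ab) is repeated as a unit, not letter by letter.
zero_or_many = [s for s in candidates if re.fullmatch(r"(ab)*c", s)]
one_or_many = [s for s in candidates if re.fullmatch(r"(ab)+c", s)]

print(zero_or_many)
print(one_or_many)
```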
Sometimes, you don’t want to specify the number of characters that a Regex can have. For example, a domain name can have between 1 and maybe 100 characters…who knows?
Quantifiers allow us to specify how many times a character or pattern may occur in our regular expression. Here is the basic set of quantifiers:
`*`
`+`
`?`
We can place the quantifier after the character pattern that we want to repeat. Here are some cases and examples:
Operator | Description |
---|---|
`+` | One or many. E.g.: Terms with the letter o at least one time: `o+` |
`*` | Zero or many. E.g.: Terms starting with the letter "a" (lowercase) followed by zero or many characters of any type but the white space: `a[^ ]*` |
`?` | Zero or one. E.g.: Finding the November string with or without the shortcut: `[nN]ov(ember)?` |
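The three quantifiers from the table can be sketched in Python like this. The sample sentence and word lists are invented for the example; `re.fullmatch` is used for the last two so that the whole word has to fit the pattern.

```python
import re

# "+": one or many. Every run of one or more "o" characters.
o_runs = re.findall(r"o+", "look at the moon, bob")

# "*": zero or many. An "a" followed by zero or more non-space characters.
words = ["a", "apple", "a-1", "banana", "a b"]
starts_with_a = [w for w in words if re.fullmatch(r"a[^ ]*", w)]

# "?": zero or one. "ember" is optional, so both spellings are accepted.
months = ["Nov", "nov", "November", "Novem", "Dec"]
november = [m for m in months if re.fullmatch(r"[nN]ov(ember)?", m)]

print(o_runs, starts_with_a, november)
```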
☝ Here are two amazing tools to build and test regular expressions: https://regex101.com/ and http://regexr.com/
☝ Here is an interactive tutorial to learn regular expressions: https://regexone.com/
Let’s face it: regular expressions are something you will use every once in a while (unless you specialize in a very particular area of the web development world). The syntax is easy to forget, and you are probably going to find your Regexes on the internet a lot of the time. The important thing here is that you understand them and that you are able to play with them whenever you need to.
Here are some pre-made Regexes:
We begin by telling the parser to find the beginning of the string (^).
Inside the first group, we match one or more lowercase letters, numbers, underscores, dots, or hyphens.
We have escaped the dot because a non-escaped dot means any character.
Directly after that, there must be an at sign (@).
Next is the domain name, which must be one or more lowercase letters, numbers, underscores, dots, or hyphens, followed by another (escaped) dot and an extension of two to six letters or dots. I have 2 to 6 because of the country-specific TLDs (.ny.us or .co.uk).
Finally, we want the end of the string ($).
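The pattern being described is not shown in this section, so the following Python sketch is only a reconstruction from the prose above, not necessarily the author's original expression. The name `EMAIL_RE` and the sample addresses are made up for illustration.

```python
import re

# Assumed reconstruction of the walkthrough above:
#   ^                   beginning of the string
#   ([a-z0-9_.-]+)      username: lowercase letters, numbers, _ . -
#   @                   the at sign
#   ([a-z0-9_.-]+)      domain name: same set of characters
#   \.                  an escaped (literal) dot
#   ([a-z.]{2,6})       extension: two to six letters or dots
#   $                   end of the string
EMAIL_RE = re.compile(r"^([a-z0-9_.-]+)@([a-z0-9_.-]+)\.([a-z.]{2,6})$")

ok = EMAIL_RE.match("john.doe@example.com")
print(ok.groups())                         # username, domain, extension
print(EMAIL_RE.match("John@Example.com"))  # uppercase letters are rejected
print(EMAIL_RE.match("john@example"))      # no extension, so no match
```

Inside square brackets the dot does not need to be escaped; it only has to be escaped outside of them, where it would otherwise mean any character.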
This Regex is almost like taking the ending part of the above Regex, slapping it between "http://" and some file structure at the end. It sounds a lot simpler than it really is. To start off, we must search for the beginning of the line with the caret.
The first capturing group is entirely optional. It allows the URL to begin with "http://", "https://", or neither of them. We have a question mark after the s to allow URLs that have http or https. In order to make this entire group optional, we just added a question mark to the end of it.
Next is the domain name: one or more numbers, letters, dots, or hyphens followed by another dot then two to six letters or dots. The following section is the optional files and directories. Inside the group, we want to match any number of forward slashes, letters, numbers, underscores, spaces, dots, or hyphens. Then we say that this group can be matched as many times as we want. This allows multiple directories to be matched along with a file at the end. We have used the star instead of the question mark because the star says zero or more, not zero or one. If a question mark were used there, only one file/directory could be matched.
Next, a trailing slash is matched, but it is optional.
Finally, we end with the end of the line.
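As with the email example, the URL pattern itself is not shown in this section. The Python sketch below is a reconstruction that follows the description step by step; treat it as an assumption rather than the author's exact expression. The name `URL_RE` and the sample URLs are invented for the example.

```python
import re

# Assumed reconstruction of the walkthrough above:
#   ^                   beginning of the line
#   (https?://)?        optional "http://" or "https://"
#   ([\da-z.-]+)        domain name: numbers, letters, dots, hyphens
#   \.([a-z.]{2,6})     a dot, then two to six letters or dots
#   ([/\w .-]*)*        optional files and directories, repeated
#   /?                  optional trailing slash
#   $                   end of the line
URL_RE = re.compile(r"^(https?://)?([\da-z.-]+)\.([a-z.]{2,6})([/\w .-]*)*/?$")

samples = [
    "https://www.example.com/path/to/file.html",
    "http://example.com/",
    "example.com",
    "ftp://example.com",
]
accepted = [s for s in samples if URL_RE.match(s)]
print(accepted)
```

A repeated group that itself contains a star, as in the path part here, can backtrack heavily on long inputs that do not match, so a pattern like this is better suited to short strings.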
©4Geeks Academy LLC 2019