Hello Folks.....!!!
Today Let us walk through the available methods in re package.
How to import the package ?
You have to write import re before beginning the other parts of the
code. This line is like #include < > in C/C++. The methods must
be called by writing the package_name.method_name(parameters).
Methods in re
1. re.findall ( ):
Returns a list that contains the matching strings. The parameters are the
pattern to be searched and the whole passage/string in which the pattern
should be searched. Let us look at an example.
Consider a sentence from which I want to extract the words that start with the
alphabet 'a', look at the code below:
Now let us see the explanation for the pattern.
- we all know that writing a '\b' matches the strings that has the specified characterset. so we write '\b[a]'.
- If you write like this alone, it will return only the matched part of the string in a list and for the example above, you will get ['a','a'] as there are only two words starting with 'a'.
- So placing a '\w' will match all the alphanumeric characters.
- But if you place just once, it will fetch the next character along with it and you will get ['ap','ar'].
- This is not our desired result. So a '*' or '+' should be placed, which denotes all the trailing characters except a newline or space.
- Now also you won't get the words starting with 'a', unless and until u place a 'r' before the pattern which tells the regular expression engine, that the following one should be treated specially.
- After doing all these, you will get the result ['apple','artist']- all the words in the sentence that starts with 'a'.
2. re.split( )
This splits the input string whenever, the pattern specified is encountered.
The parameters to be passed are the pattern and the string. Let us consider
the same example we tried for last one. Look at the code and output below.
The input sentence is split whenever a word that starts with 'a' is
encountered. That word is ignored and the rest we get them in a list. One of
the optional parameter is
maxsplit which means maximum number of split. Now if I specify
split only once, then we will get something that looks like the one given
below.
Now the string is split only at the first encounter and not at the upcoming
ones.
3. re.sub( ):
Used to replace the specified pattern in the string. It returns the whole
string with the replacements at specified places.The syntax is given below:
re.sub ( pattern, replace, string, count)
- pattern - The pattern to be matched.
- replace - The string which should be placed instead of the pattern specified.
- string - The input sentence/passage.
- count - (optional) Number of replacements. Default value '0' meaning replace all possibilities.
Let us look at an example. Consider the same one we took before.
4. re.subn( ):
It is similar to the re.sub ( ), differs only in the output returned. This
method returns a tuple containing the new string and the number of
replacements made. Look at the code and output below.
5. re.search( ):
Used to search for the specified pattern in the input string. It
returns an object named match object if the pattern search is successfully
found. The information can be extracted from the match object in three
ways.
- match_obj.span( ) - returns a tuple containing the starting and ending index of the pattern.
- match_obj.string - returns the input string passed.
- match_obj.group( ) - returns the match that occurred.
Let us look at an example. We can consider a new example, where a three
digit number should be followed by a space and then two alphabets.
Comments
Post a Comment