Methods in re package







Hello Folks.....!!!

Today Let us walk through the available methods in re package.

How to import the package ?

You have to write import re before beginning the other parts of the code. This line is like #include < > in C/C++. The methods must be called by writing the package_name.method_name(parameters).

Methods in re 

1. re.findall ( ):

Returns a list that contains the matching strings. The parameters are the pattern to be searched and the whole passage/string in which the pattern should be searched. Let us look at an example.

Consider a sentence from which I want to extract the words that start with the alphabet 'a', look at the code below:


Now let us see the explanation for the pattern. 
  • we all know that writing a '\b' matches the strings that has the specified characterset. so we write '\b[a]'.
  •  If you write like this alone, it will return only the matched part of the string in a list and for the example above, you will get ['a','a'] as there are only two words starting with 'a'. 
  • So placing a '\w' will match all the alphanumeric characters.
  • But if you place just once, it will fetch the next character along with it and you will get ['ap','ar']
  • This is not our desired result. So a '*' or '+' should be placed, which denotes all the trailing characters except a newline or space. 
  • Now also you won't get the words starting with 'a', unless and until u place a 'r' before the pattern which tells the regular expression engine, that the following one should be treated specially.
  • After doing all these, you will get the result ['apple','artist']- all the words in the sentence that starts with 'a'.

2. re.split( )

This splits the input string whenever, the pattern specified is encountered. The parameters to be passed are the pattern and the string. Let us consider the same example we tried for last one. Look at the code and output below.


The input sentence is split whenever a word that starts with 'a' is encountered. That word is ignored and the rest we get them in a list. One of the optional parameter is maxsplit which means maximum number of split. Now if I specify split only once, then we will get something that looks like the one given below.


Now the string is split only at the first encounter and not at the upcoming ones.

3. re.sub( ):

Used to replace the specified pattern in the string. It returns the whole string with the replacements at specified places.The syntax is given below:

re.sub ( pattern, replace, string, count)
  • pattern - The pattern to be matched.
  • replace - The string which should be placed instead of the pattern specified.
  • string - The input sentence/passage.
  • count - (optional) Number of replacements. Default value '0' meaning replace all possibilities.
Let us look at an example. Consider the same one we took before.


4. re.subn( ):

It is similar to the re.sub ( ), differs only in the output returned. This method returns a tuple containing the new string and the number of replacements made. Look at the code and output below.


5. re.search( ):

Used to search for the specified pattern  in the input string. It returns an object named match object if the pattern search is successfully found. The information can be extracted from the match object in three ways.
  1. match_obj.span( ) - returns a tuple containing the starting and ending index of the pattern.
  2. match_obj.string - returns the input string passed.
  3. match_obj.group( ) - returns the match that occurred.
Let us look at an example. We can consider a new example, where a three digit number should be followed by a space and then two alphabets.


Comments