re-package






Hello folks! Today let us look at Python's re module.

The re module in Python

re stands for regular expressions. This module provides various ways of matching regular expressions. We have all used the contacts list on our mobile phones: when we start typing a person's name, it matches the names with similar characters and suggests them to us. For this kind of matching, Python's re module provides several built-in functions that match or search for patterns in a given string or paragraph.
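As a small illustration of the contacts example, here is a minimal sketch that suggests names starting with the typed text. The contact names and variable names are hypothetical. It uses re.match (which only matches at the start of a string), re.escape (so the typed text is treated literally) and the re.IGNORECASE flag.

```python
import re

# Hypothetical contact list, used only for illustration
contacts = ["Anitha", "Anand", "Bala", "Nandhini"]
typed = "an"  # the text typed so far

# re.match only looks at the start of each name;
# re.escape treats the typed text literally, re.IGNORECASE ignores case
suggestions = [
    name for name in contacts
    if re.match(re.escape(typed), name, re.IGNORECASE)
]
print(suggestions)  # ['Anitha', 'Anand']
```

Swapping re.match for re.search would also suggest names that merely contain the typed text, such as Nandhini.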

Unicode error in file operations

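While handling files, we may come across a Unicode error: a Windows path that starts with C:\Users, written in an ordinary string literal, raises a SyntaxError mentioning the 'unicodeescape' codec, because Python reads \U as the start of a Unicode escape sequence.

To overcome this, we can use double backslashes in the path. Alternatively, we can place an r before the string literal.

Placing an r before the path makes it a raw string, which tells Python to treat backslashes as normal characters instead of the start of escape sequences.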
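A minimal sketch comparing both fixes. The path C:\Users\techie\notes.txt is hypothetical. The last part passes the unfixed path to the built-in compile() to show that it is rejected before the program even runs.

```python
# Hypothetical Windows path, used only for illustration
doubled = "C:\\Users\\techie\\notes.txt"  # every backslash escaped by doubling it
raw = r"C:\Users\techie\notes.txt"         # raw string: backslashes are kept as-is
print(doubled == raw)  # True

# The same path in an ordinary string literal is rejected by the compiler,
# because \U starts a Unicode escape sequence
bad_source = 'path = "C:\\Users\\techie\\notes.txt"'
try:
    compile(bad_source, "<example>", "exec")
    compiled = True
except SyntaxError as err:
    compiled = False
    print("SyntaxError:", err.msg)
```

The same r prefix is the usual way to write regular expression patterns, so that sequences such as \d or \b reach the re module unchanged (in an ordinary string, "\b" would be a backspace character).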


Metacharacters for pattern matching

Before getting into the methods for pattern matching and searching, let us look at the characters that have a special meaning inside a pattern. These characters are called metacharacters.
  1. [ ] (Square brackets) :
  • Specifies a set of characters; any single character from the set is a match.
  • For example, if the pattern is [ghz] and the string to be processed is techie girl, we get 2 matches.
  • This is because each character of the string is checked against every character in the brackets: 'h' in techie and 'g' in girl match, while 'z' does not occur at all.
  • You can also specify a range, like [k-q], which means the set [klmnopq].
  • You can do complementary matching by adding a caret (^) right after the opening bracket. For example, [^ghz] matches any single character that is not g, h or z.
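A quick sketch of character sets using re.findall, which returns every non-overlapping match as a list. The strings python and high are made up for illustration.

```python
import re

# Each character of the string is checked against the set [ghz]
set_matches = re.findall(r"[ghz]", "techie girl")
# A range: any single character from k to q
range_matches = re.findall(r"[k-q]", "python")
# A complemented set: any single character except g, h or z
not_matches = re.findall(r"[^ghz]", "high")

print(set_matches)    # ['h', 'g']
print(range_matches)  # ['p', 'o', 'n']
print(not_matches)    # ['i']
```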

  2. . (Period) :
  • Matches any single character except a newline.
  • A pattern with one period matches any string that has at least one such character; with two periods, the string must have at least two such characters in a row.
  • In general, n periods match exactly n consecutive characters.
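A short sketch of the period with re.search and re.findall; the sample strings are made up for illustration.

```python
import re

one_char = bool(re.search(r"..", "a"))    # only one character available
two_chars = bool(re.search(r"..", "ab"))  # two characters in a row
pairs = re.findall(r"..", "abcde")        # non-overlapping pairs
no_newline = re.findall(r".", "a\nb")     # the newline is not matched

print(one_char, two_chars)  # False True
print(pairs)                # ['ab', 'cd']
print(no_newline)           # ['a', 'b']
```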
   
  3. ^ (Caret) :
  • Checks whether the string starts with the specified pattern.
  • For example, ^de matches the string defgh but not adefgh. Similarly, ^def can also be searched or matched.
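The caret examples above, checked with re.search (which scans the whole string, so ^ is what restricts the match to the start):

```python
import re

starts_de = bool(re.search(r"^de", "defgh"))    # 'de' at the start
not_start = bool(re.search(r"^de", "adefgh"))   # 'de' is present, but not at the start
starts_def = bool(re.search(r"^def", "defgh"))  # a longer prefix works the same way

print(starts_de, not_start, starts_def)  # True False True
```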

  4. $ (Dollar) :
  • Placed at the end of the pattern to be searched.
  • A string that ends with the specified pattern will be matched.
  • For example, de$ matches the string abde but not abdef.
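The dollar examples above, checked the same way with re.search:

```python
import re

ends_de = bool(re.search(r"de$", "abde"))   # the string ends with 'de'
not_end = bool(re.search(r"de$", "abdef"))  # 'de' is present, but the string ends with 'f'

print(ends_de, not_end)  # True False
```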

  5. * (Star) :
  • Matches zero or more repetitions of the character (or group) immediately to its left. In ab*c, the * applies only to b, so the pattern means: a, then any number of b's, then c.
  • Even zero occurrences count.
  • For example, ab*c matches the strings abc, abcdef and abbbbbbc. There must be no other text between a, the b's and c, so atbc, abedc and abefgcy do not match. The string ab does not match either, since c is missing.
  • Since zero occurrences are allowed, the strings ac, acf and acfo also match.
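A sketch that runs ab*c against the strings mentioned above; re.search reports a match if the pattern occurs anywhere in the string.

```python
import re

samples = ["abc", "abcdef", "abbbbbbc", "ac", "acfo", "ab", "atbc", "abedc"]
# ab*c: 'a', then zero or more 'b's, then 'c'
star = {s: bool(re.search(r"ab*c", s)) for s in samples}
for s, matched in star.items():
    print(f"{s:10} {matched}")
```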

  6. + (Plus) :
  • Like *, but requires one or more repetitions of the character to its left.
  • Thus, for example, the pattern ab+c won't match ac, acf or acfo.
  • The pattern ab+c matches abcderwt, abcdef and abbbbcf.
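The same kind of check for ab+c, using the strings from the bullets above:

```python
import re

samples = ["abcderwt", "abcdef", "abbbbcf", "ac", "acf", "acfo"]
# ab+c: 'a', then one or more 'b's, then 'c'
plus = {s: bool(re.search(r"ab+c", s)) for s in samples}
for s, matched in plus.items():
    print(f"{s:10} {matched}")
```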
    
  7. ? (Question mark) :
  • Like * and +, but the character to the left of the ? may occur only zero times or once.
  • For example, ab?c matches ac, abc and abcdef, but not strings like abbbbcdef, where b occurs more than once in between.
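And for ab?c, again with the strings from the bullets above:

```python
import re

samples = ["ac", "abc", "abcdef", "abbbbcdef"]
# ab?c: 'a', then an optional single 'b', then 'c'
optional = {s: bool(re.search(r"ab?c", s)) for s in samples}
for s, matched in optional.items():
    print(f"{s:10} {matched}")
```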

  8. expr{n,m} :
  • Matches expr repeated at least n times and at most m times; n is the lower bound and m the upper bound on the repetition.
  • For example, a{2,3} matches two or three consecutive a's.
  • We can use it with ranges too. For example, [0-5]{2,3} matches two or three consecutive digits in the range 0 to 5.
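A sketch of {n,m}. One detail worth seeing in code: re.search and re.findall look for matches inside a longer string, so aaaa still contains a match for a{2,3} (its first three a's). To require that the whole string has at most m repetitions, re.fullmatch can be used. The sample strings are made up for illustration.

```python
import re

# findall scans the whole string; each match is 2 or 3 a's long
runs = re.findall(r"a{2,3}", "a aa aaa aaaa")
# fullmatch requires the entire string to fit the pattern
four_as = re.fullmatch(r"a{2,3}", "aaaa")
in_range = bool(re.fullmatch(r"[0-5]{2,3}", "345"))
out_of_range = bool(re.fullmatch(r"[0-5]{2,3}", "367"))

print(runs)                    # ['aa', 'aaa', 'aaa']
print(four_as)                 # None
print(in_range, out_of_range)  # True False
```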

  9. | (Alternation operator) :
  • Matches either the expression to the left of the operator or the one to its right.
  • This is like the logical 'or' operation.
  • For example, a|b matches any string that contains either a or b.
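A short sketch of alternation; the sample strings are made up for illustration. Note that each side of | can be a whole expression, not just one character.

```python
import re

either = re.findall(r"a|b", "cab")                      # every a or b
words = re.findall(r"cat|dog", "cat and dog and bird")  # whole alternatives
neither = bool(re.search(r"a|b", "xyz"))                # no a or b at all

print(either)   # ['a', 'b']
print(words)    # ['cat', 'dog']
print(neither)  # False
```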

We can group expressions using parentheses ( ). To match a special character literally, place a backslash before it. For example, to find strings that start with *, which is a special operator, we can write ^\* as the pattern to be searched.
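A minimal sketch of grouping and escaping; the sample strings are made up for illustration. It also shows re.escape, a standard function that adds the backslashes for you.

```python
import re

# Parentheses make + repeat the whole group "ab"
grouped = bool(re.fullmatch(r"(ab)+", "ababab"))  # True
ungrouped = bool(re.fullmatch(r"ab+", "ababab"))  # False: + applies only to b
# A backslash turns * into an ordinary character
star_first = bool(re.search(r"^\*", "*important"))      # True
star_later = bool(re.search(r"^\*", "not *important"))  # False
# re.escape adds the backslashes automatically
escaped = re.escape("*")

print(grouped, ungrouped)      # True False
print(star_first, star_later)  # True False
print(escaped)                 # \*
```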

Special sequences

  1. \A characterset - Matches only at the start of the string. For example, \Athe matches strings/sentences that start with the.
  2. \b characterset - \b matches at a word boundary. If \b is placed at the beginning (\bcharacterset), each word in the string is checked for whether it starts with the characterset; if placed at the end (characterset\b), each word is checked for whether it ends with it.
  3. \B characterset - The opposite of \b: it matches the characterset only where it does not start (or, when \B is placed at the end, does not end) a word.
  4. \d - Matches any decimal digit, such as 0-9. For example, the string ad345se2 gets 4 matches while the string apple gets none.
  5. \D - The opposite of \d: matches any character that is not a digit.
  6. \s - Matches any whitespace character, such as a space, tab or newline.
  7. \S - The opposite of \s: matches any character that is not whitespace.
  8. \w - Matches any alphanumeric ("word") character: letters such as a-z and A-Z, digits (0-9) and the underscore (_). For example, the string abc123#$% gives 6 matches and apple gives 5.
  9. \W - The opposite of \w: matches any character that is neither alphanumeric nor an underscore.
  10. \Z characterset - Matches only at the end of the string. For example, apple\Z matches I like apple but not I like apple and orange.
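A sketch that runs each special sequence from the list above. Most sample strings come from the list itself; "rain in Spain", "the cat sat" and "see the cat" are made up for illustration. The results are collected in a dictionary so they can be printed together.

```python
import re

text = "rain in Spain"
results = {
    # \A: only at the very start of the string
    "A_start": bool(re.search(r"\Athe", "the cat sat")),
    "A_middle": bool(re.search(r"\Athe", "see the cat")),
    # \b / \B: 'in' at the start of a word vs. inside a word
    "b_count": len(re.findall(r"\bin", text)),
    "B_count": len(re.findall(r"\Bin", text)),
    # \d / \D: digits and non-digits
    "d": re.findall(r"\d", "ad345se2"),
    "D": re.findall(r"\D", "ad345se2"),
    # \s / \S: whitespace, and runs of non-whitespace
    "s": re.findall(r"\s", "I like apple"),
    "S_runs": re.findall(r"\S+", "I like apple"),
    # \w / \W: word characters and everything else
    "w_count": len(re.findall(r"\w", "abc123#$%")),
    "w_apple": len(re.findall(r"\w", "apple")),
    "W": re.findall(r"\W", "abc123#$%"),
    # \Z: only at the very end of the string
    "Z_end": bool(re.search(r"apple\Z", "I like apple")),
    "Z_middle": bool(re.search(r"apple\Z", "I like apple and orange")),
}
for name, value in results.items():
    print(f"{name:9} {value}")
```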
