7.20. Regex Group Reference

  • \g<number> - backreferencing by position

  • \g<name> - backreferencing by name

  • (?P=name) - backreferencing by name

  • Note, that for backreference, must use raw-sting or double backslash

7.20.1. SetUp

>>> import re

7.20.2. Backreference

  • Recall the value of a group in the regular expression

  • (?P<myname>\w+?) - capture the pattern in a group named myname

  • (?P=myname) - backreference to the group named myname

  • string = '<p>Hello Alice</p>'

  • pattern = r'<(?P<tag>\w+)>(?:.+)</(?P=tag)>'

>>> string = '<p>Hello Alice</p>'
>>> pattern = r'<(?P<tag>\w+)>(?:.+)</(?P=tag)>'
>>>
>>> re.findall(pattern, string)
['p']

7.20.3. Recall Group by Position

  • string = '<p>Hello Alice</p>'

  • pattern = r'<p>(.+)</p>'

  • replace = r'<strong>\g<1></strong>'

  • \g<number> - backreferencing by position

>>> string = '<p>Hello Alice</p>'
>>> pattern = r'<p>(.+)</p>'
>>> replace = r'<strong>\g<1></strong>'
>>>
>>> re.sub(pattern, replace, string)
'<strong>Hello Alice</strong>'

7.20.4. Recall Group by Name

  • string = '<p>Hello Alice</p>'

  • pattern = r'<p>(?P<text>.+)</p>'

  • replace = r'<strong>\g<text></strong>'

  • \g<name> - backreferencing by name

>>> string = '<p>Hello Alice</p>'
>>> pattern = r'<p>(?P<text>.+)</p>'
>>> replace = r'<strong>\g<text></strong>'
>>>
>>> re.sub(pattern, replace, string)
'<strong>Hello Alice</strong>'

7.20.5. Use Case - 1

>>> import re
>>>
>>> string = 'Email from Alice Apricot <alice@example.com> received on: Jan 1st, 2000 at 12:00AM'
>>>
>>> year = r'(?P<year>\d{4})'
>>> month = r'(?P<month>[A-Z][a-z]{2})'
>>> day = r'(?P<day>\d{1,2})'
>>> pattern = f'{month} {day}(?:st|nd|rd|th), {year}'

Recall group by position:

>>> replace = r'\g<3> \g<1> \g<2>'
>>>
>>> re.sub(pattern, replace, string)
'Email from Alice Apricot <alice@example.com> received on: 2000 Jan 1 at 12:00AM'

Recall group by name:

>>> replace = r'\g<year> \g<month> \g<day>'
>>>
>>> re.sub(pattern, replace, string)
'Email from Alice Apricot <alice@example.com> received on: 2000 Jan 1 at 12:00AM'

7.20.6. Use Case - 2

>>> import re
>>>
>>>
>>> string = '<p>Hello Alice</p>'
>>> pattern = r'<(?P<tag>.*?)>(.*?)</(?P=tag)>'
>>>
>>> re.findall(pattern, string)
[('p', 'Hello Alice')]

7.20.7. Use Case - 3

>>> import re
>>>
>>>
>>> string = '<p>We choose to go to the <strong>Moon</strong></p>'
>>>
>>> pattern = r'<(?P<tagname>[a-z]+)>.*</(?P=tagname)>'
>>> re.findall(pattern, string)
['p']
>>>
>>> pattern = r'<(?P<tagname>[a-z]+)>(.*)</(?P=tagname)>'
>>> re.findall(pattern, string)
[('p', 'We choose to go to the <strong>Moon</strong>')]

7.20.8. Assignments