Wednesday, 11 September 2013

Python Regular Expression - right-to-left

I am trying to use regular expressions in Python to match the frame number
component of an image file name in a sequence of images. I want a solution
that covers a number of different naming conventions. Put into words, I am
trying to match the last run of one or more digits that sits between two
dots (e.g. .0100.). Below is an example of how my current logic falls down:
import os
import re


def sub_frame_number_for_frame_token(path, token='@'):
    folder = os.path.dirname(path)
    name = os.path.basename(path)
    pattern = r'\.(\d+)\.'
    matches = list(re.finditer(pattern, name) or [])
    if not matches:
        return path
    # Get last match.
    match = matches[-1]
    frame_token = token * len(match.group(1))
    start, end = match.span()
    apetail_name = '%s.%s.%s' % (name[:start], frame_token, name[end:])
    return os.path.join(folder, apetail_name)


# Success
eg1 = 'xx01_010_animation.0100.exr'
eg1 = sub_frame_number_for_frame_token(eg1)  # result: xx01_010_animation.@@@@.exr

# Failure
eg2 = 'xx01_010_animation.123.0100.exr'
eg2 = sub_frame_number_for_frame_token(eg2)  # result: xx01_010_animation.@@@.0100.exr

I realise there are other ways to solve this (I have already implemented a
solution that splits the name at the dots and takes the last item that is a
number), but I am taking this opportunity to learn something about regular
expressions. It appears that the regular expression finds matches from left
to right and cannot use a character of the string in more than one match.
Firstly, is there any way to search the string from right to left? Secondly,
why doesn't the pattern find two matches in eg2 (123 and 0100)?
Cheers
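
For reference, here is a minimal sketch of one possible way around the second problem, assuming the cause is what I suspect: matches do not overlap, so the dot that closes .123. is consumed and is no longer available to open .0100. In the sketch the trailing dot is checked with a lookahead instead of being consumed. The names FRAME_PATTERN, find_frame_candidates and replace_last_frame_number are made up for illustration and are not part of my original code.

```python
import os
import re

# (?=\.) is a lookahead: it requires a dot after the digits but does not
# consume it, so the same dot can still open the next match.
FRAME_PATTERN = re.compile(r'\.(\d+)(?=\.)')


def find_frame_candidates(name):
    """Return every run of digits that sits between two dots."""
    return [m.group(1) for m in FRAME_PATTERN.finditer(name)]


def replace_last_frame_number(path, token='@'):
    """Swap the last dot-delimited number in the file name for tokens."""
    folder, name = os.path.split(path)
    matches = list(FRAME_PATTERN.finditer(name))
    if not matches:
        return path
    # span(1) covers only the digits, so the surrounding dots are untouched.
    start, end = matches[-1].span(1)
    new_name = name[:start] + token * (end - start) + name[end:]
    return os.path.join(folder, new_name)


name = 'xx01_010_animation.123.0100.exr'
print(re.findall(r'\.(\d+)\.', name))   # original pattern: only '123'
print(find_frame_candidates(name))      # lookahead pattern: '123' and '0100'
print(replace_last_frame_number(name))
```

This still scans from left to right and simply keeps the last match. As far as I can tell the standard re module has no option to search from right to left, so taking the last result of finditer seems to be the usual workaround.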
