Commit ec13fac

Minor performance improvement to timestamp fix (#92)
I had wondered before about the performance implications of building an extra list when parsing timestamps, and @edsu’s recent patch in #89 got me to take a look. This is a minor 5-6% performance improvement, which may be worthwhile if you are iterating through extremely large result sets. Part of the gain comes from no longer allocating a few new lists, and part comes from checking the least significant digit first: few valid months or days end in 0 (only month 10 and days 10, 20, and 30), so for most timestamps the check stops after a single character comparison.
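The digit-order argument can be checked by enumerating every valid two-digit month and day and counting how often each digit position is `0`. This is an illustrative sketch, not code from the repository, and it simplifies by treating every month as having 31 days. When the ones digit is `0`, the first comparison passes and a second one is needed. Checking the tens digit first would hit that slow path far more often.

```python
# Illustrative sketch. Simplification: every month is treated as having 31 days.
months = [f'{m:02d}' for m in range(1, 13)]
days = [f'{d:02d}' for d in range(1, 32)]

# Ones digit is '0': the first comparison passes and a second one is needed.
months_ending_in_zero = [m for m in months if m[1] == '0']
days_ending_in_zero = [d for d in days if d[1] == '0']

# Tens digit is '0': how often a tens-digit-first check would need a second comparison.
months_starting_with_zero = [m for m in months if m[0] == '0']
days_starting_with_zero = [d for d in days if d[0] == '0']

# prints: ones digit is 0: 1/12 months, 3/31 days
print(f'ones digit is 0: {len(months_ending_in_zero)}/12 months, '
      f'{len(days_ending_in_zero)}/31 days')
# prints: tens digit is 0: 9/12 months, 9/31 days
print(f'tens digit is 0: {len(months_starting_with_zero)}/12 months, '
      f'{len(days_starting_with_zero)}/31 days')
```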
1 parent: 1098bbb

1 file changed (+5 -8 lines changed)

wayback/_utils.py

Lines changed: 5 additions & 8 deletions
@@ -71,19 +71,16 @@ def parse_timestamp(time_string):
     # see the raw data so this is as close as we can get.
     #
     # The issue seems to be limited to some crawls in the year 2000.
-    timestamp_chars = list(time_string)
-    if timestamp_chars[4:6] == ['0', '0']:
+    if time_string[5] == '0' and time_string[4] == '0':
         logger.warning("found invalid timestamp with month 00: %s", time_string)
-        del timestamp_chars[4:6]
-        timestamp_chars.extend(['0', '0'])
-    elif timestamp_chars[6:8] == ['0', '0']:
+        time_string = f'{time_string[0:4]}{time_string[6:]}00'
+    elif time_string[7] == '0' and time_string[6] == '0':
         logger.warning("found invalid timestamp with day 00: %s", time_string)
-        del timestamp_chars[6:8]
-        timestamp_chars.extend(['0', '0'])
+        time_string = f'{time_string[0:6]}{time_string[8:]}00'
 
     # Parse the cleaned-up result.
     return (datetime
-            .strptime(''.join(timestamp_chars), URL_DATE_FORMAT)
+            .strptime(time_string, URL_DATE_FORMAT)
             .replace(tzinfo=timezone.utc))
 
 
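To confirm that the string-slicing version behaves like the list-based one and to time the two, here is a self-contained sketch. The bodies are transcribed from the removed and added lines of the diff above, but the function names `parse_timestamp_old` and `parse_timestamp_new` are hypothetical, and both `URL_DATE_FORMAT = '%Y%m%d%H%M%S'` and the module-level `logger` are assumptions standing in for definitions elsewhere in `wayback/_utils.py`. Timings depend on the machine and Python version, so this harness does not guarantee that the 5-6% figure will reproduce.

```python
import logging
import timeit
from datetime import datetime, timezone

logger = logging.getLogger(__name__)

# Assumption: 14-digit Wayback timestamps (YYYYMMDDhhmmss).
URL_DATE_FORMAT = '%Y%m%d%H%M%S'


def parse_timestamp_old(time_string):
    """List-based fix-up (hypothetical name), from the removed lines."""
    timestamp_chars = list(time_string)
    if timestamp_chars[4:6] == ['0', '0']:
        logger.warning("found invalid timestamp with month 00: %s", time_string)
        del timestamp_chars[4:6]
        timestamp_chars.extend(['0', '0'])
    elif timestamp_chars[6:8] == ['0', '0']:
        logger.warning("found invalid timestamp with day 00: %s", time_string)
        del timestamp_chars[6:8]
        timestamp_chars.extend(['0', '0'])
    return (datetime
            .strptime(''.join(timestamp_chars), URL_DATE_FORMAT)
            .replace(tzinfo=timezone.utc))


def parse_timestamp_new(time_string):
    """String-slicing fix-up (hypothetical name), from the added lines."""
    # Ones digit first: it is rarely '0' in a valid month or day,
    # so `and` usually stops after one comparison.
    if time_string[5] == '0' and time_string[4] == '0':
        logger.warning("found invalid timestamp with month 00: %s", time_string)
        # Drop the bogus "00" month and pad the seconds back to 14 digits.
        time_string = f'{time_string[0:4]}{time_string[6:]}00'
    elif time_string[7] == '0' and time_string[6] == '0':
        logger.warning("found invalid timestamp with day 00: %s", time_string)
        time_string = f'{time_string[0:6]}{time_string[8:]}00'
    return (datetime
            .strptime(time_string, URL_DATE_FORMAT)
            .replace(tzinfo=timezone.utc))


if __name__ == '__main__':
    # A valid timestamp, a "month 00" one, and a "day 00" one.
    samples = ['20200315123456', '20000008220913', '20000100040913']
    for sample in samples:
        assert parse_timestamp_old(sample) == parse_timestamp_new(sample)

    # Time the common case: a valid timestamp (no warnings logged).
    for fn in (parse_timestamp_old, parse_timestamp_new):
        seconds = timeit.timeit(lambda: fn('20200315123456'), number=20_000)
        print(f'{fn.__name__}: {seconds:.3f}s')
```

For example, `parse_timestamp_new('20000008220913')` drops the bogus `00` month and parses `20000822091300`, i.e. 2000-08-22 09:13:00 UTC.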
