Merge pull request #5893 from avalonmediasystem/relax_transcript_parser
Improve handling of improperly formatted time cues in Transcript Parser
masaball authored Jul 2, 2024
2 parents 809be00 + eee3b1c commit dbb0d40
Showing 1 changed file with 2 additions and 1 deletion.
3 changes: 2 additions & 1 deletion lib/avalon/transcript_parser.rb
@@ -48,7 +48,8 @@ def normalized_text

 # Separate time cue and text content from a single line of timed text to facilitate result formatting of transcript searches.
 def self.extract_single_time_cue(timed_text)
-  split_text = timed_text.match(/(\d{2,}*:?\d{2}:\d{2}\.?,?\d{3} --> \d{2,}*:?\d{2}:\d{2}\.?,?\d{3})(.*)/)
+  split_text = timed_text.match(/(\d{0,}:?\d{2}:\d{2}\.?,?\d{3} --> \d{0,}:?\d{2}:\d{2}\.?,?\d{3})(.*)/)
+  return [nil, nil] if split_text.blank?
   time_cue = split_text[1].sub(',', '.').gsub(/\s-->\s/, ',')
   text = split_text[2].strip
   [time_cue, text]
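
For reviewers who want to try the relaxed pattern outside the application, here is a minimal standalone sketch. The hours group changes from `\d{2,}*` to `\d{0,}`, so a cue with a single-digit hours field can match, and a line with no recognizable cue yields `[nil, nil]` instead of calling `[]` on `nil`. The module name `CueSketch` and the constant `CUE_PATTERN` are hypothetical, and `nil?` stands in for ActiveSupport's `blank?` so the snippet runs in plain Ruby. It is an illustration under those assumptions, not the project's code.

```ruby
# Hypothetical standalone module; the real method lives in
# lib/avalon/transcript_parser.rb.
module CueSketch
  # Pattern copied from the added line of the diff.
  CUE_PATTERN = /(\d{0,}:?\d{2}:\d{2}\.?,?\d{3} --> \d{0,}:?\d{2}:\d{2}\.?,?\d{3})(.*)/

  def self.extract_single_time_cue(timed_text)
    split_text = timed_text.match(CUE_PATTERN)
    # Assumption: nil? replaces blank? here to avoid requiring ActiveSupport.
    return [nil, nil] if split_text.nil?

    time_cue = split_text[1].sub(',', '.').gsub(/\s-->\s/, ',')
    text = split_text[2].strip
    [time_cue, text]
  end
end

# Well-formed cue with two-digit hours.
p CueSketch.extract_single_time_cue('00:00:01.000 --> 00:00:05.000 Hello there')
# Cue with a single-digit hours field.
p CueSketch.extract_single_time_cue('0:00:01.000 --> 0:00:05.000 Single-digit hour')
# Line with no time cue at all.
p CueSketch.extract_single_time_cue('no cue here')
```

The three calls cover a well-formed cue, a single-digit hours field, and a line without a cue, which are the cases the relaxed pattern and the new guard clause appear intended to handle.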