Simple and fast functions for building VTT formatted transcripts.
The builder functions makes a few assumptions.
First, it assumes that the data is stored either in JSON files or as a Python dict with the following keys - transcript
and segments
.
It also assumes that the transcript is already broken up into segments with the appropriate timestamps in the following schema - a list of these objects is what should be in segments
:
key | type | description |
---|---|---|
id | integer | Unique, ordered identifier of the segment. |
start | number | Start time of the segment in seconds. |
end | number | End time of the segment in seconds. |
text | string | Text content of the segment. |
transcript
should store the plain text of the segments together. This allows for combining text from multiple files.
Builds one VTT formatted transcript from one or more JSON files. The function will assume the file paths are provided in order. In this way, the output VTT transcript will build from the files before it, adding the start
and end
times as it goes to make the output one, coherent transcript.
file_paths
(list[str]): The JSON file paths that'll make up the transcript.output_file
(str): The file path opf the output VTT formatted transcript.
From plain text transcripts within multiple JSON files, build a single, plain text transcript with each chunk separated by a newline.
file_paths
(list[str]): The JSON file paths that'll make up the transcript.output_file
(str): The file path opf the output VTT formatted transcript.
From a list of segments - a list of Python dictionaries - build the VTT transcript.
segments_list
(list[dict]): A list of transcript segments (see schema above).output_file
(str): The file path opf the output VTT formatted transcript.
Validate a VTT file.
vtt_file
(string): A complete file path to the vtt file.
- (boolean): A true or false value indicating whether it's a valid VTT formatted file or not.
test_fail = validate_vtt_file('/Users/user/not_vtt.json')
print(test_fail) # False
test_pass = validate_vtt_file('/Users/user/legit_vtt.vtt')
# The extension does not matter
print(test_pass) # True