Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
* update deps and Vite
* remove warning message
* adds the ability to fetch transcript from youtube
* refactor transcript serializer
* basic display of the transcript
* update vlitejs
* add seekTo functionality
* extract to partial
* add transcript to a tab
* add transcript to the search
* lint
* fix vlite imports for V6
* return empty transcript when missing
1 parent eec1306, commit d2779c1

Showing 25 changed files with 27,386 additions and 45 deletions.
`@@ -0,0 +1,47 @@`

```ruby
require "message_pb"

module Youtube
  class Transcript
    attr_reader :response

    def get(video_id)
      message = {one: "asr", two: "en"}
      typedef = MessageType
      two = get_base64_protobuf(message, typedef)

      message = {one: video_id, two: two}
      params = get_base64_protobuf(message, typedef)

      url = "https://www.youtube.com/youtubei/v1/get_transcript"
      headers = {"Content-Type" => "application/json"}
      body = {
        context: {
          client: {
            clientName: "WEB",
            clientVersion: "2.20240313"
          }
        },
        params: params
      }

      @response = HTTParty.post(url, headers: headers, body: body.to_json)
      JSON.parse(@response.body)
    end

    def self.get(video_id)
      new.get(video_id)
    end

    private

    def encode_message(message, typedef)
      encoded_message = typedef.new(message)
      encoded_message.to_proto
    end

    def get_base64_protobuf(message, typedef)
      encoded_data = encode_message(message, typedef)
      Base64.encode64(encoded_data).delete("\n")
    end
  end
end
```
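`Youtube::Transcript#get` builds `params` in two passes. First it encodes `{one: "asr", two: "en"}` as a `MessageType` protobuf and Base64-encodes it. Then it embeds that string as field `two` of a second message whose field `one` is the video id, and Base64-encodes the result again. The plain-Ruby sketch below hand-rolls that wire format without the `google-protobuf` gem so the bytes are visible. It assumes, without confirmation from this diff, that `MessageType` (from `message_pb`, which is not shown) declares `one` and `two` as string fields numbered 1 and 2. Under that assumption the output should match what `to_proto` emits for non-empty values. `ProtoSketch` and its methods are hypothetical names.

```ruby
# Hypothetical sketch: hand-encodes a protobuf message with two string fields.
# ASSUMPTION: MessageType (message_pb, not shown) declares `one` as field 1 and
# `two` as field 2, both of type string.
module ProtoSketch
  WIRE_TYPE_LEN = 2 # length-delimited wire type (strings, bytes, sub-messages)

  # Encode a non-negative integer as a base-128 varint, lowest 7-bit group first.
  def self.varint(n)
    bytes = []
    loop do
      byte = n & 0x7F
      n >>= 7
      if n.zero?
        bytes << byte
        break
      end
      bytes << (byte | 0x80) # high bit set: more bytes follow
    end
    bytes.pack("C*")
  end

  # One length-delimited field: tag (field number << 3 | wire type), length, payload.
  def self.string_field(number, value)
    payload = value.to_s.b
    varint((number << 3) | WIRE_TYPE_LEN) + varint(payload.bytesize) + payload
  end

  # Serialize {one:, two:} in field-number order.
  def self.encode(one:, two:)
    string_field(1, one) + string_field(2, two)
  end

  # Base64 without line breaks; same output as Base64.encode64(data).delete("\n").
  def self.base64(data)
    [data].pack("m0")
  end
end
```

For example, `ProtoSketch.base64(ProtoSketch.encode(one: "asr", two: "en"))` yields `"CgNhc3ISAmVu"`. In `get`, that string becomes `two:` in the outer message alongside the video id. `pack("m0")` is core Ruby, so the sketch does not need the `base64` library, which became a bundled gem in Ruby 3.4.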
`@@ -0,0 +1,33 @@`

```ruby
class Cue
  attr_reader :start_time, :end_time, :text

  def initialize(start_time, end_time, text)
    @start_time = start_time
    @end_time = end_time
    @text = text
  end

  def to_s
    "#{start_time} --> #{end_time}\n#{text}"
  end

  def to_h
    {
      start_time: start_time,
      end_time: end_time,
      text: text
    }
  end

  def start_time_in_seconds
    time_string_to_seconds(start_time)
  end

  def time_string_to_seconds(time_string)
    parts = time_string.split(":").map(&:to_f)
    hours = parts[0] * 3600
    minutes = parts[1] * 60
    seconds = parts[2]
    (hours + minutes + seconds).to_i
  end
end
```
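`Cue#time_string_to_seconds` assumes exactly three colon-separated parts (`HH:MM:SS.mmm`). That holds for cues built by `Transcript.format_time`, but WebVTT also allows the shorter `MM:SS.mmm` form. If cues ever came from an external `.vtt` file, a variant like the hypothetical `vtt_timestamp_to_seconds` below could pad the missing hours and reject malformed input. Like the original, it truncates to whole seconds.

```ruby
# Hypothetical variant of Cue#time_string_to_seconds that also accepts the
# shorter "MM:SS.mmm" timestamp form permitted by WebVTT.
def vtt_timestamp_to_seconds(time_string)
  parts = time_string.split(":").map(&:to_f)
  unless (2..3).cover?(parts.size)
    raise ArgumentError, "unsupported timestamp: #{time_string.inspect}"
  end

  parts.unshift(0.0) if parts.size == 2 # no hours component: treat as 0
  hours, minutes, seconds = parts
  (hours * 3600 + minutes * 60 + seconds).to_i # truncate, as the original does
end
```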
`@@ -0,0 +1,65 @@`

```ruby
class Transcript
  include Enumerable

  attr_reader :cues

  def initialize
    @cues = []
  end

  def add_cue(cue)
    @cues << cue
  end

  def to_h
    @cues.map { |cue| cue.to_h }
  end

  def to_json
    to_h.to_json
  end

  def to_text
    @cues.map { |cue| cue.text }.join("\n\n")
  end

  def to_vtt
    vtt_content = "WEBVTT\n\n"
    @cues.each_with_index do |cue, index|
      vtt_content += "#{index + 1}\n"
      vtt_content += "#{cue}\n\n"
    end
    vtt_content
  end

  def each(&)
    @cues.each(&)
  end

  class << self
    def create_from_youtube_transcript(youtube_transcript)
      transcript = Transcript.new
      events = youtube_transcript.dig("actions", 0, "updateEngagementPanelAction", "content", "transcriptRenderer", "content", "transcriptSearchPanelRenderer", "body", "transcriptSegmentListRenderer", "initialSegments")
      if events
        events.each do |event|
          segment = event["transcriptSegmentRenderer"]
          start_time = format_time(segment["startMs"].to_i)
          end_time = format_time(segment["endMs"].to_i)
          text = segment.dig("snippet", "runs")&.map { |run| run["text"] }&.join || ""
          transcript.add_cue(Cue.new(start_time, end_time, text))
        end
      else
        transcript.add_cue(Cue.new("00:00:00.000", "00:00:00.000", "NOTE No transcript data available"))
      end
      transcript
    end

    def format_time(ms)
      hours = ms / (1000 * 60 * 60)
      minutes = (ms % (1000 * 60 * 60)) / (1000 * 60)
      seconds = (ms % (1000 * 60)) / 1000
      milliseconds = ms % 1000
      format("%02d:%02d:%02d.%03d", hours, minutes, seconds, milliseconds)
    end
  end
end
```
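`Transcript.format_time` splits a millisecond offset into hours, minutes, seconds and milliseconds using integer division and modulo. For example, 3,723,004 ms is 1 h, 2 min, 3 s and 4 ms, which formats as `01:02:03.004`. The standalone sketch below copies `format_time` verbatim. It pairs it with a hypothetical `vtt_from_segments` helper that produces the same layout as `Transcript#to_vtt` (header, numbered cues, a blank line after each) from raw `[start_ms, end_ms, text]` tuples. The helper uses `Array#join` rather than repeated `String#+=`, which allocates a new string on every append.

```ruby
# Standalone copy of Transcript.format_time.
def format_time(ms)
  hours = ms / (1000 * 60 * 60)
  minutes = (ms % (1000 * 60 * 60)) / (1000 * 60)
  seconds = (ms % (1000 * 60)) / 1000
  milliseconds = ms % 1000
  format("%02d:%02d:%02d.%03d", hours, minutes, seconds, milliseconds)
end

# Hypothetical helper: same output layout as Transcript#to_vtt, built from
# [start_ms, end_ms, text] tuples instead of Cue objects.
def vtt_from_segments(segments)
  blocks = segments.each_with_index.map do |(start_ms, end_ms, text), index|
    "#{index + 1}\n#{format_time(start_ms)} --> #{format_time(end_ms)}\n#{text}\n\n"
  end
  "WEBVTT\n\n" + blocks.join
end
```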
`@@ -0,0 +1,16 @@`

```ruby
class TranscriptSerializer
  def self.dump(transcript)
    transcript.to_json
  end

  def self.load(transcript_json)
    transcript = Transcript.new
    return transcript if transcript_json.nil? || transcript_json.empty?

    cues_array = JSON.parse(transcript_json, symbolize_names: true)
    cues_array.each do |cue_hash|
      transcript.add_cue(Cue.new(cue_hash[:start_time], cue_hash[:end_time], cue_hash[:text]))
    end
    transcript
  end
end
```
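`TranscriptSerializer` exposes the `dump`/`load` pair that ActiveRecord expects from a custom coder passed to `serialize`. The model that uses it is not part of this diff, so that wiring is an assumption. One edge case: if a stored value is ever the JSON literal `null`, `JSON.parse` (json 2.x) returns `nil`, and `cues_array.each` would raise. The plain-Ruby sketch below mirrors the same contract and also treats a non-array payload as an empty transcript. `CueRecord` and `CueListCoder` are hypothetical stand-ins for the app's `Cue` and `TranscriptSerializer`.

```ruby
require "json"

# Hypothetical stand-in for the app's Cue class.
CueRecord = Struct.new(:start_time, :end_time, :text, keyword_init: true)

# Hypothetical coder with the same dump/load contract as TranscriptSerializer.
module CueListCoder
  # Cues -> JSON array of {start_time, end_time, text} objects.
  def self.dump(cues)
    JSON.generate(cues.map(&:to_h))
  end

  # JSON -> cues; nil, "", or a non-array payload such as "null" gives [].
  def self.load(json)
    return [] if json.nil? || json.empty?

    parsed = JSON.parse(json, symbolize_names: true)
    return [] unless parsed.is_a?(Array)

    parsed.map do |h|
      CueRecord.new(start_time: h[:start_time], end_time: h[:end_time], text: h[:text])
    end
  end
end
```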