digitalpalitools/audio-talks-search

License: CC BY-NC-SA 4.0

Ṭhānissaro Bhikkhu Talks Search Application

Search through Ṭhānissaro Bhikkhu's talks.

Front-end:

Backlog

  • dhammatalks [before 20211003]
    • structured data file: dateId, ytid, author, channel -> re-index
    • complete remaining 1k files
  • Core
    • AzSearch infra
    • Front end
      • Search results
      • Play
      • View subtitle
    • Add CI
  • Suttas
  • Core v1
    • Play from specific time
    • Index transcribed audio
  • Short dhammatalks
  • Lectures & longer talks

Instructions

YouTube Download

  1. Download metadata of all videos in channel: youtube-dl --write-info-json --skip-download "https://www.youtube.com/c/DhammatalksOrgShorts/videos" --playlist-start 1
  2. If the command above crashes or terminates prematurely, rerun it with --playlist-start set to the position where it stopped to resume from there
  3. Download all subtitles: dir *.json | % { $_.FullName.Substring($_.FullName.Length - 21, 11) } | % { youtube-dl --skip-download --write-auto-sub --sub-format srv1 "https://www.youtube.com/watch?v=$($_)" }
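The `Substring` call in step 3 takes the 11-character video ID from the end of each metadata file name. The same logic can be sketched in Python, assuming the files were written with youtube-dl's default output template (`%(title)s-%(id)s.%(ext)s`), so that every name ends in `-<id>.info.json`. The function names below are illustrative and are not part of this repository.

```python
ID_LENGTH = 11          # YouTube video IDs are 11 characters long
SUFFIX = ".info.json"   # written by youtube-dl --write-info-json


def video_id_from_info_json(file_name: str) -> str:
    """Return the video ID embedded at the end of an .info.json file name."""
    if not file_name.endswith(SUFFIX):
        raise ValueError(f"not a metadata file: {file_name!r}")
    stem = file_name[: -len(SUFFIX)]
    if len(stem) < ID_LENGTH:
        raise ValueError(f"file name too short to hold a video ID: {file_name!r}")
    # Same span as Substring(Length - 21, 11) in the PowerShell pipeline:
    # 11 ID characters followed by the 10-character suffix.
    return stem[-ID_LENGTH:]


def watch_url(video_id: str) -> str:
    """Build the watch URL that is passed to youtube-dl for subtitle download."""
    return f"https://www.youtube.com/watch?v={video_id}"
```

Checking the suffix first makes the failure mode explicit: a file with a different naming scheme raises an error instead of silently producing a wrong ID.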

Convert to text

dir -rec -filt *.srv1 .\yt\ | % { ..\format.ps1 $_.FullName -Author ydb -Channel main }
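The contents of `format.ps1` are not shown here, so the following is only a rough Python sketch of what such a conversion could look like. It assumes the srv1 subtitle layout is a `<transcript>` element containing `<text start="..." dur="...">` children, and it emits a record with the `ytid`, `author` and `channel` fields named in the backlog. The `content` and `segments` field names are hypothetical.

```python
import html
import xml.etree.ElementTree as ET
from typing import Union


def srv1_to_record(xml_data: Union[str, bytes], ytid: str, author: str, channel: str) -> dict:
    """Convert srv1 subtitle XML into a flat, indexable record (assumed schema)."""
    root = ET.fromstring(xml_data)
    segments = []
    for node in root.iter("text"):
        # Auto-generated captions are often HTML-escaped inside the XML,
        # so unescape once more after parsing and flatten line breaks.
        text = html.unescape(node.text or "").replace("\n", " ").strip()
        if not text:
            continue
        segments.append({
            "start": float(node.get("start", "0")),
            "dur": float(node.get("dur", "0")),
            "text": text,
        })
    return {
        "ytid": ytid,
        "author": author,
        "channel": channel,
        "content": " ".join(s["text"] for s in segments),
        "segments": segments,
    }
```

Keeping the per-segment start times alongside the joined text would make it possible to support the "Play from specific time" backlog item later. The result can be serialized with `json.dumps` to produce one `.json` file per talk.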

Upload to blob store

azcopy copy ".\talks\dt\*.json" "https://tsbtalks.blob.core.windows.net/tsbtalks/dt?sv=2019-12-12&st=2021-10-02T22%3A34%3A41Z&se=2021-10-03T22%3A34%3A41Z&sr=c&sp=racwdxlt&sig=E0dCSUbfWWEH%2B8j2YhN5rHZCjXrB7dii6aKMZpqnlqE%3D" --recursive

Test search

$apiKey = 'D0CA5AF719558AA344C3111934DA874D';
$headers = @{ "api-key" = $apiKey; "Accept" = "application/json; odata.metadata=none" }
$res = Invoke-RestMethod -Method Get -Uri 'https://tsbtalks.search.windows.net/indexes/azureblob-index/docs?api-version=2020-06-30&search=craving as companion&searchMode=all&searchFields=content&highlight=content&highlightPreTag=<XXX>&highlightPostTag=</XXX>' -Headers $headers
$res.value[0].'@search.highlights'.content
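The query above puts spaces and angle brackets directly into the URL. As a sketch, the same request URL can be assembled in Python with every parameter percent-encoded, and the highlights read from the parsed JSON response. The function names are illustrative, the response shape is assumed to match what the PowerShell snippet reads (`value[0].'@search.highlights'.content`), and no API key is embedded: it would be supplied in the `api-key` header at request time.

```python
from urllib.parse import quote, urlencode


def build_search_url(service: str, index: str, query: str,
                     api_version: str = "2020-06-30") -> str:
    """Build the document search URL with all query parameters percent-encoded."""
    params = {
        "api-version": api_version,
        "search": query,
        "searchMode": "all",
        "searchFields": "content",
        "highlight": "content",
        "highlightPreTag": "<XXX>",
        "highlightPostTag": "</XXX>",
    }
    # quote_via=quote encodes spaces as %20 rather than '+'.
    query_string = urlencode(params, quote_via=quote)
    return f"https://{service}.search.windows.net/indexes/{index}/docs?{query_string}"


def first_highlights(response: dict) -> list:
    """Return the highlighted content fragments of the top hit, or [] if none."""
    hits = response.get("value", [])
    if not hits:
        return []
    return hits[0].get("@search.highlights", {}).get("content", [])
```

For example, `build_search_url("tsbtalks", "azureblob-index", "craving as companion")` yields a URL that can be fetched with any HTTP client, passing the same `api-key` and `Accept` headers as in the PowerShell example.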

References
