This project uses the BERT model to perform extractive text summarization on lecture transcripts. It includes a RESTful API that serves these summaries and a command line interface (CLI) for easier interaction. More details about the specs of this service and the CLI can be found in our Documentation directory.
Paper: https://arxiv.org/abs/1906.04165
Docker is required to run the service locally. To start the service, run the command:
make docker-build-run
The first run of the service may take quite some time to complete.
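Because the first build can be slow, a script that depends on the local service may want to wait until it answers. The following is a minimal sketch using only Python's standard library. It assumes the service listens on http://localhost:5000 (the address used in the CLI examples) and that GET /lectures is a cheap request to probe. wait_for_service is a hypothetical helper, not part of the project.

```python
import http.client
import time
import urllib.error
import urllib.request


def wait_for_service(base_url: str = "http://localhost:5000",
                     timeout: float = 600.0, interval: float = 5.0) -> bool:
    """Poll GET {base_url}/lectures until the service answers; False if `timeout` elapses."""
    # Bypass any HTTP(S)_PROXY settings, since the target is a local service.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    url = base_url.rstrip("/") + "/lectures"  # probe path is an assumption
    deadline = time.monotonic() + timeout
    while True:
        try:
            with opener.open(url, timeout=interval):
                return True  # a successful response: the service is up
        except urllib.error.HTTPError:
            return True  # an HTTP error status still means the server is answering
        except (OSError, http.client.HTTPException):
            pass  # refused, reset, or timed out: not ready yet
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

After starting the container, calling wait_for_service() blocks until the API responds or ten minutes pass.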
The CLI tool can be installed with pip using the following command:
pip install git+https://github.com/dmmiller612/lecture-summarizer.git
To test the tool, try getting the current lectures in the service with the command:
lecture-summarizer get-lectures
Note that this tool uses our cloud-based service by default. To use your local service instead, supply the -base-path option, such as -base-path localhost:5000. As an example, to get lectures locally, you could run:
lecture-summarizer get-lectures -base-path localhost:5000
After installing the CLI, the service should be ready to use. The lecture-summarizer uses the API service as its backend, which defaults to the one currently hosted on AWS. If the service is hosted elsewhere, the user can supply a specific URL. Below is a brief overview of how to use the CLI tool.
Before one can do anything with summarizations, there needs to be at least one lecture in the system. Taking a Udacity lecture as an example, the raw_lecture.txt file in the parent of the lecture-summarizer directory can be uploaded with the following command:
lecture-summarizer create-lecture -path ./raw_lecture.txt -name example_first_lecture -course IHI
Currently, the lecture-summarizer can parse the sdp file format, which is common for Udacity-based lectures. Notice that one needs to supply a name and a course as metadata.
One can retrieve lectures with a couple of options, which are described in the Documentation/CLI_Documentation.md file at the root of the repo. Some example commands are shown below:
lecture-summarizer get-lectures -lecture-id 1
lecture-summarizer get-lectures
lecture-summarizer get-lectures -name example_first_lecture
lecture-summarizer get-lectures -course ihi
Just like creating a lecture, creating a summary is a painless process. Below is an example of creating a summary from a specified lecture.
lecture-summarizer create-summary -lecture-id 1 -name 'my summary name' -ratio 0.2
The ratio specifies approximately what fraction of the lecture's sentences to select for the summary; for example, a ratio of 0.2 keeps roughly 20% of them.
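To make "approximately" concrete, the sketch below estimates how many sentences a given ratio keeps. It is purely illustrative: estimated_summary_length is a hypothetical helper, and both the valid range of the ratio and the rounding rule are assumptions, not the service's actual selection logic.

```python
def estimated_summary_length(num_sentences: int, ratio: float) -> int:
    """Rough number of sentences a summary keeps for a given ratio (illustrative only)."""
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio is assumed to be a fraction in (0, 1]")
    # The service selects *approximately* this many; the exact rounding is assumed.
    return max(1, round(num_sentences * ratio))
```

Under these assumptions, -ratio 0.2 on a 50-sentence lecture yields a summary of about 10 sentences.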
Just like with retrieving lectures, one can also list summaries. Below are a couple of examples:
lecture-summarizer get-summaries -lecture-id 1 -summary-id 1
lecture-summarizer get-summaries -lecture-id 1
lecture-summarizer get-summaries -lecture-id 1 -name 'my summary name'
A summary can be deleted by supplying its lecture id and summary id:
lecture-summarizer delete-summary -lecture-id 1 -summary-id 1
This endpoint creates a lecture. The request body is a JSON object with the following fields:
{
"course": "course identifier",
"content": "Lecture String Content",
"name": "Lecture name"
}
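The sketch below shows how a client might build this request with Python's standard library. It assumes the endpoint is POST {base_url}/lectures and accepts a JSON body with the fields listed above; build_create_lecture_request is a hypothetical helper, and the request is only built, not sent.

```python
import json
import urllib.request


def build_create_lecture_request(base_url: str, name: str, course: str,
                                 content: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a lecture.

    Assumes POST {base_url}/lectures with a JSON body.
    """
    body = {"course": course, "content": content, "name": name}
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/lectures",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it, pass the returned request to urllib.request.urlopen; the content could be read from a transcript file, for example with Path("raw_lecture.txt").read_text().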
This endpoint is used to retrieve lectures. The user can filter the results with either of the two query params shown below:
/lectures?course=unique_identifier
/lectures?name=lecture_name
This endpoint is used to retrieve a single lecture by its id:
/lectures/{id}
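The sketch below builds the GET URLs for these two endpoints. lectures_url is a hypothetical helper; it assumes the server accepts course and name together when both are given, and it lets an explicit lecture id take precedence over the filters.

```python
from typing import Optional
from urllib.parse import urlencode


def lectures_url(base_url: str, lecture_id: Optional[int] = None,
                 course: Optional[str] = None, name: Optional[str] = None) -> str:
    """Build a GET URL for listing lectures or fetching one lecture by id."""
    base = base_url.rstrip("/") + "/lectures"
    if lecture_id is not None:
        return f"{base}/{lecture_id}"  # single-lecture endpoint; filters are ignored
    # Include only the filters that were supplied; urlencode escapes spaces and symbols.
    params = {k: v for k, v in (("course", course), ("name", name)) if v is not None}
    return f"{base}?{urlencode(params)}" if params else base
```

Using urlencode rather than string concatenation keeps names containing spaces, such as 'my lecture', valid in the query string.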
This endpoint is used to create a summarization from a lecture. The request body is a JSON object:
{
"name": "Summarization name",
"ratio": "Ratio of sentences to select"
}
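A client-side sketch of this request follows. It assumes the endpoint is POST {base_url}/lectures/{lecture_id}/summaries and that ratio is sent as a number (the field above is shown only as a placeholder string). build_create_summary_request is a hypothetical helper that builds the request without sending it.

```python
import json
import urllib.request


def build_create_summary_request(base_url: str, lecture_id: int, name: str,
                                 ratio: float) -> urllib.request.Request:
    """Build (but do not send) a request that summarizes lecture `lecture_id`.

    Assumes POST {base_url}/lectures/{lecture_id}/summaries and a numeric ratio.
    """
    if not 0.0 < ratio <= 1.0:
        raise ValueError("ratio is assumed to be a fraction in (0, 1]")
    body = {"name": name, "ratio": ratio}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/lectures/{lecture_id}/summaries",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

This mirrors the CLI call lecture-summarizer create-summary -lecture-id 1 -name 'my summary name' -ratio 0.2.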
This endpoint is used to retrieve the summarizations of a lecture, optionally filtered by name:
/lectures/{id}/summaries
/lectures/{id}/summaries?name=summary_name
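The sketch below builds the listing URL, with the optional name filter escaped by urlencode. summaries_url is a hypothetical helper; the base URL in the usage note assumes a local service on port 5000.

```python
from typing import Optional
from urllib.parse import urlencode


def summaries_url(base_url: str, lecture_id: int, name: Optional[str] = None) -> str:
    """Build a GET URL listing a lecture's summaries, optionally filtered by name."""
    url = f"{base_url.rstrip('/')}/lectures/{lecture_id}/summaries"
    if name is None:
        return url
    return url + "?" + urlencode({"name": name})
```

For example, summaries_url("http://localhost:5000", 1, "my summary name") corresponds to the CLI call get-summaries -lecture-id 1 -name 'my summary name'.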
This endpoint allows you to get or delete a summarization.
/lectures/{id}/summaries/{summarization_id}
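Since this endpoint accepts both GET and DELETE, a client only needs to vary the HTTP method. The sketch below builds either request without sending it; build_summary_request is a hypothetical helper, and the base URL is whatever host serves the API.

```python
import urllib.request


def build_summary_request(base_url: str, lecture_id: int, summary_id: int,
                          method: str = "GET") -> urllib.request.Request:
    """Build a GET or DELETE request for a single summarization."""
    method = method.upper()
    if method not in ("GET", "DELETE"):
        raise ValueError("this endpoint supports only GET and DELETE")
    url = f"{base_url.rstrip('/')}/lectures/{lecture_id}/summaries/{summary_id}"
    return urllib.request.Request(url, method=method)
```

Building a DELETE request this way is the REST counterpart of lecture-summarizer delete-summary -lecture-id 1 -summary-id 1.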