Commit b3d8d2d

Add files via upload
1 parent 1b8006e commit b3d8d2d

6 files changed: +905 -133 lines changed

README.md

Lines changed: 133 additions & 133 deletions
@@ -1,133 +1,133 @@
# PDF Question Answering System Using Retrieval-Augmented Generation (RAG)

This project is a question-answering system that extracts context-aware answers from PDF documents. It combines **Retrieval-Augmented Generation (RAG)** with pre-trained language models, so users can ask questions about a document in natural language instead of searching it manually.

---

## Use Cases

- **Academic Research**: Quickly extract insights from research papers, reports, or studies.
- **Professional Analysis**: Navigate lengthy contracts, whitepapers, or manuals with ease.
- **Everyday Use**: Simplify interactions with dense or complex PDF documents.

---

## Key Features

- **PDF Processing**: Upload and process PDF documents for analysis.
- **Interactive Q&A**: Enter natural-language questions and receive precise answers based on document content.
- **Advanced Retrieval**: Uses vector-based indexing and similarity scoring for accurate content retrieval.
- **User-Friendly Interface**: A web application built with Streamlit ensures ease of use and accessibility.

---

## Technologies Used

- **Frontend**: Streamlit
- **Backend**: Python
- **Machine Learning**:
  - HuggingFace Transformers for text generation
  - VectorStoreIndex for document indexing
  - Custom retriever and postprocessor for improved accuracy

## Installation and Setup

1. **Clone the Repository**:
   ```bash
   git clone https://github.com/your-repo-name.git
   cd your-repo-name
   ```

2. **Run the Application**: Start the Streamlit application:
   ```bash
   streamlit run app.py
   ```

## Upload a PDF and Start Querying

![Home Screen](images/home.png)

- Upload your desired PDF file through the application interface.

![Upload Screen](images/upload.png)

- Enter questions and retrieve contextually accurate responses.

![Answer Screen](images/answer.png)

---

## How It Works

1. **PDF Processing**:
   - The system reads the uploaded PDF and splits it into manageable chunks for indexing.

2. **Information Retrieval**:
   - The question is matched against the indexed chunks using embeddings and similarity scoring, and the closest chunks are retrieved.

3. **Answer Generation**:
   - A pre-trained language model generates a concise, context-aware response based on the retrieved content.

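The steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the project's implementation: the function names (`split_into_chunks`, `embed`, `retrieve`) and the default chunk sizes are invented for the example, and a bag-of-words count vector stands in for the neural embeddings a real index would use. Calling `retrieve(question, split_into_chunks(document_text))` returns the highest-scoring chunks, which would then be passed to the language model as context.

```python
import math
import re
from collections import Counter


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into overlapping chunks of at most chunk_size words."""
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be non-negative and smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding (assumption): word counts instead of a neural model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(question: str, chunks: list[str], top_k: int = 2) -> list[tuple[float, str]]:
    """Return the top_k (score, chunk) pairs most similar to the question."""
    query_vec = embed(question)
    scored = [(cosine_similarity(query_vec, embed(chunk)), chunk) for chunk in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_k]
```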
---

## Technology Stack

- **Frontend**: Streamlit for an interactive and intuitive user experience.
- **Backend**:
  - HuggingFace Transformers for natural language understanding and generation.
  - Vector-based retrieval using custom embeddings.
- **Programming Language**: Python.

---

## Code Overview

### `app.py`

- A Streamlit application that provides the user interface.
- Handles PDF uploads and question input, and displays answers.

### `rag.py`

- Implements the core RAG logic:
  - **PDF Processing**: Reads and splits the PDF into manageable chunks.
  - **Indexing**: Creates a vector index for efficient content retrieval.
  - **Query Engine**: Uses a retriever and postprocessor to answer queries.
  - **Response Generation**: Generates detailed responses using a transformer model.

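As a rough sketch of the query-engine stage described for `rag.py`, the snippet below shows how a postprocessor might drop weakly matching chunks before the prompt for the language model is assembled. Everything named here is hypothetical and chosen for illustration only: `ScoredChunk`, `apply_similarity_cutoff`, `build_prompt`, the prompt wording, and the 0.5 cutoff are assumptions, and the repository's actual code may work differently. Filtering before prompting keeps unrelated passages out of the context, which is one common reason to place a postprocessor between retrieval and generation.

```python
from dataclasses import dataclass


@dataclass
class ScoredChunk:
    """A retrieved passage with its similarity score (hypothetical type)."""
    text: str
    score: float  # similarity to the question; higher means more relevant


def apply_similarity_cutoff(chunks: list[ScoredChunk], cutoff: float = 0.5) -> list[ScoredChunk]:
    """Postprocessing step: keep only chunks scoring at or above the cutoff."""
    return [chunk for chunk in chunks if chunk.score >= cutoff]


def build_prompt(question: str, chunks: list[ScoredChunk]) -> str:
    """Assemble the text handed to the language model from the kept chunks."""
    if chunks:
        context = "\n\n".join(
            f"[{i}] {chunk.text}" for i, chunk in enumerate(chunks, start=1)
        )
    else:
        context = "(no relevant passages found)"
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```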
---

## Usage Instructions

1. Upload a PDF file.
2. Wait for the system to process the document.
3. Type your question and click "Get Answer".
4. View the answer generated by the system.

---

## Future Enhancements

- **Multi-Document Support**: Enable querying across multiple PDF files.
- **Multi-Language Support**: Add support for processing documents in multiple languages.
- **GPU Support**: Implement GPU acceleration for faster processing and response times.
- **Additional Formats**: Expand support to other document formats such as DOCX and TXT.
- **Enhanced UI**: Improve the user interface with advanced analytics and visualization features.

---


## Contributing

We welcome contributions from the community. To contribute:

1. Fork the repository.
2. Create a feature branch.
3. Submit a pull request detailing your contribution.

For any issues or suggestions, please open a discussion or issue on the repository.

---

## License

This project is licensed under the [MIT License](LICENSE). Feel free to use, modify, and distribute it in compliance with the terms of the license.

---

## Contact

For inquiries or further information, please reach out through the repository issue tracker or by email (if applicable).


app.py

Lines changed: 76 additions & 0 deletions
@@ -0,0 +1,76 @@
# app.py

import streamlit as st
from rag_model import RAGSystem

def main():
    # Page configuration
    st.set_page_config(
        page_title="PDF Question Answering System",
        page_icon="📚",
        layout="wide"
    )

    # Initialize session state
    if 'rag_system' not in st.session_state:
        st.session_state.rag_system = RAGSystem()
    if 'query_engine' not in st.session_state:
        st.session_state.query_engine = None
    if 'pdf_processed' not in st.session_state:
        st.session_state.pdf_processed = False

    # Main title
    st.title("📚 PDF Question Answering System")

    # Sidebar
    st.sidebar.header("Upload PDF")
    uploaded_file = st.sidebar.file_uploader("Choose a PDF file", type="pdf")

    # Process PDF when uploaded
    if uploaded_file:
        with st.spinner("Processing PDF... This might take a minute..."):
            try:
                success = st.session_state.rag_system.process_pdf(uploaded_file.getvalue())
                if success:
                    st.session_state.query_engine = st.session_state.rag_system.get_query_engine()
                    st.session_state.pdf_processed = True
                    st.sidebar.success("PDF processed successfully!")
                else:
                    st.sidebar.error("Error processing PDF!")
            except Exception as e:
                st.sidebar.error(f"Error: {str(e)}")

    # Main content area
    st.header("Ask a Question")
    question = st.text_input("Enter your question about the PDF content:")

    # Generate response
    if st.button("Get Answer"):
        if not question:
            st.warning("Please enter a question!")
        elif not st.session_state.pdf_processed:
            st.warning("Please upload a PDF first!")
        else:
            with st.spinner("Generating answer..."):
                try:
                    response = st.session_state.rag_system.generate_response(
                        st.session_state.query_engine,
                        question
                    )
                    st.subheader("Answer")
                    st.write(response)
                except Exception as e:
                    st.error(f"Error: {str(e)}")

    # Instructions
    with st.sidebar.expander("ℹ️ Usage Instructions"):
        st.write("""
        1. Upload a PDF file using the uploader above
        2. Wait for the PDF to be processed
        3. Type your question in the main panel
        4. Click 'Get Answer' to generate a response
        5. The system will analyze the PDF content and provide a relevant answer
        """)

if __name__ == "__main__":
    main()
