feat: add support for improved handling of jupyter notebooks #105
So I tested this with:
https://raw.githubusercontent.com/cyclotruc/test/refs/heads/main/Exploration%20of%20Airline%20On-Time%20Performance.ipynb and got:
File "/workspaces/gitingest/src/gitingest/notebook_utils.py", line 30, in process_notebook
for cell in notebook["cells"]:
~~~~~~~~^^^^^^^^^
KeyError: 'cells'
The use of
|
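The traceback shows the lookup `notebook["cells"]` failing because the file has no top-level `cells` key. One possible cause (an assumption; the thread does not confirm the format of the test notebook) is that the file uses the older nbformat 3 layout, where cells are nested under `worksheets`. A minimal sketch of a more tolerant lookup follows; `get_cells` is a hypothetical helper name, not something from this PR.

```python
from typing import Any


def get_cells(notebook: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the notebook's cells, tolerating a missing top-level "cells" key."""
    if "cells" in notebook:
        # nbformat 4 layout: cells sit at the top level.
        return notebook["cells"]
    # nbformat 3 layout: cells are grouped inside a list of worksheets.
    cells: list[dict[str, Any]] = []
    for worksheet in notebook.get("worksheets", []):
        cells.extend(worksheet.get("cells", []))
    return cells
```

With a helper like this, a notebook that has neither key yields an empty list instead of raising `KeyError`. Whether to skip such a file silently or report it is a separate design choice.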
7e61807
to
3a7e4b8
Compare
@cyclotruc Tests added for the |
…function, and add tests for notebook processing
f505f23
to
f6c3a0b
Compare
@filipchristiansen I liked your PR. Could you create a new PR so that the cell number and cell type are added as comments above each cell's source and, if present, cells[output][-1]['text'] is added as a comment below it? We could also make the result always start with "### Jupyter-Notebook". If you are busy, let me know and I will create the PR myself.
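The requested layout could look roughly like the sketch below. It is only an illustration of the suggestion, not code from the PR: `render_cell` and `render_notebook` are hypothetical names, and the exact comment wording (`# Cell N (type)`) is an assumption. The cell source is left unchanged here.

```python
def render_cell(index: int, cell: dict) -> str:
    """Render one cell with a '# Cell N (type)' comment above its source."""
    source = cell.get("source", "")
    if isinstance(source, list):
        # nbformat allows the source to be stored as a list of lines.
        source = "".join(source)
    cell_type = cell.get("cell_type", "unknown")
    return f"# Cell {index} ({cell_type})\n{source}"


def render_notebook(cells: list) -> str:
    """Join all rendered cells under a fixed notebook header line."""
    parts = ["### Jupyter-Notebook"]
    parts += [render_cell(i, cell) for i, cell in enumerate(cells, start=1)]
    return "\n\n".join(parts)
```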
What do you mean by For the second point, may I ask your use case for this? You would still identify that it is a notebook based on the |
What do you (@cyclotruc) say about the suggestion to start each notebook with
|
1st part
Else, if the cell ran and has output, then the cell contains something like:
"outputs":[{"name":"stderr","output_type":"stream","text":"/usr/local/lib/python3.10/site-packages/tqdm/auto.py:21: TqdmWarning: IProgress not found. Please update jupyter and ipywidgets. See https://ipywidgets.readthedocs.io/en/stable/user_install.html\n\n from .autonotebook import tqdm as notebook_tqdm\n"}]
for nth_cell in cells:
    outputs = nth_cell.get("outputs", [])
    if outputs:
        output = outputs[-1]["text"]  # [-1] always gets the last output that ran
2nd part |
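A slightly more defensive version of the lookup above, as a sketch: in nbformat 4 the `text` field of a stream output may be either a string or a list of strings, and outputs of type `execute_result` or `display_data` carry a `data` field instead of `text`. The function name `last_output_text` is hypothetical.

```python
from typing import Optional


def last_output_text(cell: dict) -> Optional[str]:
    """Return the text of the cell's last output, or None if there is none."""
    outputs = cell.get("outputs") or []
    if not outputs:
        return None
    text = outputs[-1].get("text")
    if text is None:
        # Non-stream outputs (e.g. execute_result) have "data", not "text".
        return None
    # "text" may be stored as a single string or as a list of lines.
    return "".join(text) if isinstance(text, list) else text
```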
|
This PR introduces the process_notebook function to process .ipynb files and return them as Python scripts, converting markdown and raw cells into multi-line string literals. It also refactors the function name ingest_from_query to run_ingest_query in ingest_from_query.py to avoid naming conflicts with the module, ensuring clearer code organization.

Changes include:
- process_notebook: function to handle Jupyter notebooks.
- ingest_from_query: function renamed to run_ingest_query to avoid naming conflicts with the module.
- _read_file_content: invokes process_notebook for .ipynb files.
- test_notebook_utils.py: tests for the notebook processing logic.
- test_ingest.py: verifies that .ipynb files trigger process_notebook.

These changes integrate Jupyter notebook processing into the file ingestion workflow, while also improving code clarity and test coverage.
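To make the described conversion concrete, here is a minimal sketch of turning a notebook into a Python script with markdown and raw cells wrapped in multi-line string literals. It is not the PR's actual process_notebook implementation: the name `notebook_to_script`, the escaping strategy, and the choice to skip empty cells are all assumptions made for illustration.

```python
import json


def notebook_to_script(raw_json: str) -> str:
    """Convert notebook JSON text into a Python script (illustrative sketch)."""
    notebook = json.loads(raw_json)
    chunks = []
    for cell in notebook.get("cells", []):
        source = cell.get("source", "")
        if isinstance(source, list):
            # The source may be stored as a list of lines.
            source = "".join(source)
        if not source.strip():
            continue  # Assumption: empty cells are dropped.
        if cell.get("cell_type") in ("markdown", "raw"):
            # Escape backslashes and triple quotes so the literal stays valid.
            escaped = source.replace("\\", "\\\\").replace('"""', '\\"\\"\\"')
            chunks.append(f'"""\n{escaped}\n"""')
        else:
            chunks.append(source)
    return "\n\n".join(chunks) + "\n"
```

Wrapping prose cells in string literals keeps the output parseable as Python as long as the code cells themselves are valid Python, which would not hold for cells that use IPython magics.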