This Streamlit-based app provides a visual interface for processing JSON files generated by Unstructured.io. It lets users interactively select element types and metadata for analysis and display. The app is designed for ease of use and serves both data analysis and automation needs.
- File Upload and Processing: Allows for the uploading and processing of JSON files to extract relevant elements and metadata.
- Interactive Element and Metadata Selection: Users can choose from a list of predefined categories and metadata types for detailed analysis.
- Flexible Display Options: Offers the choice to view raw text or text annotated with selected metadata, alongside built-in cleaning functions for data refinement.
- Script Generation: Facilitates the creation of a Python script to automate the processing of either a single file or multiple files in a directory based on user-defined selections.
- Download Functionality: Provides the option to download the current display view as text or the generated Python script for offline use.
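The filtering and annotation features above can be illustrated with a minimal sketch. It is not the app's actual code: the function name `filter_elements`, its parameters, and the sample data are hypothetical. It assumes the input follows the usual Unstructured JSON layout, a list of element objects with `type`, `text`, and `metadata` keys, and it stands in a simple whitespace cleaner for the app's cleaning functions.

```python
import json

# Hypothetical sample mimicking Unstructured's JSON output:
# a list of element dicts with "type", "text", and "metadata" keys.
SAMPLE = """[
  {"type": "Title", "text": "Quarterly Report",
   "metadata": {"page_number": 1, "filename": "report.pdf"}},
  {"type": "NarrativeText", "text": "Revenue   grew this quarter.",
   "metadata": {"page_number": 1, "filename": "report.pdf"}},
  {"type": "Footer", "text": "Page 1",
   "metadata": {"page_number": 1, "filename": "report.pdf"}}
]"""


def filter_elements(elements, categories, metadata_keys=(), clean=False):
    """Keep elements whose type is in `categories`; optionally annotate and clean."""
    lines = []
    for el in elements:
        if el.get("type") not in categories:
            continue
        text = el.get("text", "")
        if clean:
            # Assumed cleaning step: collapse runs of whitespace.
            text = " ".join(text.split())
        meta = el.get("metadata", {})
        # Build "key=value" tags only for metadata keys that are present.
        tags = [f"{key}={meta[key]}" for key in metadata_keys if key in meta]
        lines.append(f"[{', '.join(tags)}] {text}" if tags else text)
    return lines


elements = json.loads(SAMPLE)
result = filter_elements(
    elements, {"Title", "NarrativeText"}, ["page_number"], clean=True
)
print("\n".join(result))
```

Passing an empty `metadata_keys` yields the raw-text view; passing keys such as `page_number` yields the annotated view.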
- Install Requirements: First, ensure that all required packages listed in requirements.txt are installed. This can be done using:
pip install -r requirements.txt
Though not required, we recommend using a virtual environment to avoid package conflicts.
- Run the App: Start the app by navigating your terminal's working directory to the source folder (the folder that holds the app script) and then running:
streamlit run app.py
- Interact with the App:
- Upload a File: Use the file upload option to upload the file you want to process. The app supports JSON files.
- Select Categories and Metadata: On the app's sidebar, select the categories and metadata types you wish to include from the predefined lists.
- View Filtered Data: The app will process the uploaded file and display the filtered elements and metadata based on your selections.
- Download Data: If you wish to download the filtered text, use the provided download button.
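The same selections can also be applied outside the app, in the spirit of the generated script that processes a single file or a whole directory. The sketch below is an assumption about what such a script might look like, not the script the app actually produces; `extract_text` and `process_directory` are hypothetical names, and the input files are assumed to be lists of elements with `type` and `text` keys.

```python
import json
from pathlib import Path


def extract_text(json_path, categories):
    """Return the text of elements in one file whose type is in `categories`."""
    with open(json_path, encoding="utf-8") as f:
        elements = json.load(f)
    return "\n".join(
        el.get("text", "") for el in elements if el.get("type") in categories
    )


def process_directory(input_dir, categories):
    """Map each *.json filename in `input_dir` to its filtered text."""
    return {
        path.name: extract_text(path, categories)
        for path in sorted(Path(input_dir).glob("*.json"))
    }
```

For example, `process_directory("outputs", {"Title", "NarrativeText"})` would return a dictionary keyed by filename, where `outputs` is a placeholder for a folder of Unstructured JSON files.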
Example Of Loading And Viewing Text
- Navigate Terminal To Folder:
cd "/path/to_folder_that_holds_app"
- Create Virtual Environment:
python3 -m venv env
- Activate The Virtual Environment:
source env/bin/activate
- Install Necessary Packages:
pip install -r requirements.txt
- Run The App:
streamlit run app.py
- Stop The App: To stop the app, use the following keyboard shortcut in your terminal (on macOS):
Control + C
- Deactivate The Environment:
deactivate
- Please reach out with an email to [email protected]
- For an open-source library that allows users to create structured data from unstructured text, see Unstructured.IO's GitHub page.
- For their documentation and soon-to-be-released enterprise platform, see Unstructured.IO's website.