Authors: Neelima Prasad, Advait Deshmukh and Karthik Sairam
Class: CSCI 7000 - Topics in Neuro-symbolic NLP
This is our neuro-symbolic approach to SemEval 2025 Task 9: Food Hazard Detection.
Below is an overview of our repository.
The datasets folder contains CSV files of manually labeled English food recall titles. There are two files; we mainly work with incidents_train.csv, and the other file is a sample of the data.
The data analysis file goes through the data, collects all product and hazard categories, and filters out duplicates.
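As a rough illustration of the category-gathering step, the sketch below collects the unique values of the category columns from a CSV and drops duplicates. It is not the code from the data analysis file: the column names (`hazard-category`, `product-category`), the sample rows, the helper name `collect_unique_labels`, and the choice to lowercase values before deduplicating are all assumptions made for this example.

```python
import csv
import io

# Hypothetical rows standing in for incidents_train.csv.
# The column names here are assumptions, not confirmed by this README.
SAMPLE_CSV = """\
title,hazard-category,product-category,hazard,product
Chicken salad recalled due to Listeria,biological,"meat, egg and dairy products",listeria monocytogenes,chicken salad
Undeclared milk in chocolate bars,allergens,cocoa and cocoa preparations,milk and products thereof,chocolate bars
Smoked salmon recalled for Listeria,Biological,seafood,listeria monocytogenes,smoked salmon
"""


def collect_unique_labels(csv_file, columns):
    """Return the sorted unique values found in each requested column."""
    reader = csv.DictReader(csv_file)
    unique = {col: set() for col in columns}
    for row in reader:
        for col in columns:
            # Normalize case and whitespace so near-duplicates collapse (assumption).
            value = (row.get(col) or "").strip().lower()
            if value:
                unique[col].add(value)
    return {col: sorted(values) for col, values in unique.items()}


labels = collect_unique_labels(
    io.StringIO(SAMPLE_CSV), ["hazard-category", "product-category"]
)
for column, values in labels.items():
    print(column, values)
```

To run this against the real data, replace `io.StringIO(SAMPLE_CSV)` with an open file handle on incidents_train.csv and adjust the column names to match its header.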
This folder is where intermediate files for our pipeline are stored.
This folder contains TSV files obtained by processing the titles and labels through CoCo-Ex, one for each of the following: input titles, products, hazards, product categories, and hazard categories. These files are fed into Llama in the pipeline.
This folder contains the keywords that Llama extracted from each file output by CoCo-Ex, stored as JSON files: input titles, products, hazards, product categories, and hazard categories.
This folder contains three notebooks: LLAMA_Keyword_Extraction.ipynb, conceptnet_lite.ipynb, and NLP__Final_Pipeline.ipynb.
LLAMA_Keyword_Extraction is the code we use to extract the relevant keywords from the titles.
conceptnet_lite is where we build our own sub-ConceptNet, restricted to a limited set of relations and to English only.
NLP__Final_Pipeline is the code we use to tie everything together and generate our results.
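To give a sense of what restricting ConceptNet to English and a limited set of relations can look like, here is a minimal sketch that filters lines in the tab-separated format of the public ConceptNet assertions dump (edge URI, relation, start node, end node, JSON metadata). It does not reproduce the conceptnet_lite notebook: the relation set `KEEP_RELATIONS`, the helper names, and the sample edges are hypothetical choices for this example.

```python
# Hypothetical relation subset; the relations actually kept in the notebook may differ.
KEEP_RELATIONS = {"/r/IsA", "/r/RelatedTo", "/r/Synonym"}

# Hypothetical sample edges in the ConceptNet assertions dump layout:
# edge URI, relation, start node, end node, JSON metadata (tab-separated).
SAMPLE_LINES = [
    '/a/[/r/IsA/,/c/en/salmon/,/c/en/fish/]\t/r/IsA\t/c/en/salmon\t/c/en/fish\t{"weight": 2.0}',
    '/a/[/r/IsA/,/c/fr/saumon/,/c/fr/poisson/]\t/r/IsA\t/c/fr/saumon\t/c/fr/poisson\t{"weight": 1.0}',
    '/a/[/r/Antonym/,/c/en/hot/,/c/en/cold/]\t/r/Antonym\t/c/en/hot\t/c/en/cold\t{"weight": 1.0}',
    '/a/[/r/RelatedTo/,/c/en/listeria/n/,/c/en/bacteria/]\t/r/RelatedTo\t/c/en/listeria/n\t/c/en/bacteria\t{"weight": 1.0}',
]


def is_english(node_uri):
    """English concept URIs start with /c/en/."""
    return node_uri.startswith("/c/en/")


def filter_assertions(lines, keep_relations=KEEP_RELATIONS):
    """Keep (start, relation, end) triples that are English-only and use an allowed relation."""
    kept = []
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip malformed lines
        relation, start, end = fields[1], fields[2], fields[3]
        if relation in keep_relations and is_english(start) and is_english(end):
            kept.append((start, relation, end))
    return kept


sub_graph = filter_assertions(SAMPLE_LINES)
for triple in sub_graph:
    print(triple)
```

Filtering on both endpoints keeps the sub-graph monolingual, and a small relation set keeps lookups fast when matching extracted keywords against category labels.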