This project explores early machine learning approaches to Named Entity Recognition (NER), token co-occurrence modeling, and semantic association using SQL-based graph construction. Built in 2016, the system processes domain-specific text (e.g., Wikipedia articles on F. Scott Fitzgerald and The Great Gatsby) to identify, normalize, and associate named entities based on statistical tokenization and paragraph-level context.
- Goal: Extract and associate named entities through co-occurrence and weighting
- Data Source: Wikipedia articles and original texts by F. Scott Fitzgerald
- Tech Stack: SQL Server (T-SQL), custom stored procedures, early-stage neural embeddings (described, not included in this repo)

The pipeline runs in five stages:

- **Corpus Input**
  - Load paragraph-tagged Wikipedia articles into an input table.
- **Term Normalization** (`ProcessTermsList.sql`; sketch below)
  - Standardizes inflected forms ("writer", "writers", "writing")
  - Looks up or inserts `TermId`s
  - Assigns statistical weights based on historical word use
- **Proper Matching & Feature Engineering** (`ProcessPropers.sql`; sketch below)
  - Matches normalized terms to known "Propers" (entity names)
  - Computes metrics: word count, total and weighted overlap
  - Filters and flags strong associations for graph building
- **Named Entity Graph Construction** (`ProcessPropersAssociate.sql`, `AssociateTermsList.sql`; sketch below)
  - Each entity becomes a graph node (`AMM_Node`)
  - Nodes are linked via associations (`AMM_Association`) based on:
    - Term overlap
    - Co-occurrence in the same paragraph
- **Scoring** (`CalculateAssociationWeight.sql`; sketch below)
  - Computes association scores using weighted overlap:
    `Score = ((TotalWeight + 1) * CountedWeightSum / CountedWeightDistinct) / 10`
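
The sketches below are illustrative only: apart from the identifiers quoted in this README (`TermId`, `AMM_Node`, `AMM_Association`, `AMM_ProcessInputTerm`, `AMM_ProcessProper`), every table and column they use is an assumption, not the repository's actual schema (which lives in `AMM_Database_All_Objects_2016_04.sql`). The lookup-or-insert pattern behind `ProcessTermsList.sql` might look roughly like this:

```sql
-- Minimal sketch of term normalization. The Term table and its columns
-- (other than TermId) are assumptions for illustration.
CREATE TABLE Term (
    TermId   INT IDENTITY(1,1) PRIMARY KEY,
    TermText NVARCHAR(100) NOT NULL UNIQUE,
    Weight   FLOAT NOT NULL DEFAULT 1.0
);

DECLARE @Token NVARCHAR(100) = N'Writers';
DECLARE @Normalized NVARCHAR(100) = LOWER(@Token);

-- Crude stand-in for the inflection handling ("writers" -> "writer")
IF RIGHT(@Normalized, 1) = N's' AND LEN(@Normalized) > 3
    SET @Normalized = LEFT(@Normalized, LEN(@Normalized) - 1);

DECLARE @TermId INT = (SELECT TermId FROM Term WHERE TermText = @Normalized);

IF @TermId IS NULL
BEGIN
    -- First sighting: insert the term with the default starting weight
    INSERT INTO Term (TermText) VALUES (@Normalized);
    SET @TermId = SCOPE_IDENTITY();
END
ELSE
    -- Known term: adjust its weight to reflect accumulated usage
    UPDATE Term SET Weight = Weight + 1 WHERE TermId = @TermId;
```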
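
The overlap metrics from `ProcessPropers.sql` could be computed along these lines, assuming a hypothetical `ProperTerm` table mapping each entity to its constituent terms:

```sql
-- Minimal sketch of the entity-matching features: word count, total overlap,
-- and weighted overlap per candidate entity ("Proper").
SELECT
    pt.ProperId,
    COUNT(DISTINCT it.TermId) AS MatchedWordCount,   -- distinct entity terms found in the text
    COUNT(*)                  AS TotalOverlap,       -- total matching occurrences
    SUM(it.Weight)            AS WeightedOverlap     -- occurrences scaled by term weight
FROM ProperTerm AS pt                                -- assumed (ProperId, TermId) mapping
JOIN AMM_ProcessInputTerm AS it                      -- normalized tokens of the current document (columns assumed)
    ON it.TermId = pt.TermId
GROUP BY pt.ProperId
HAVING COUNT(DISTINCT it.TermId) >= 2;               -- keep only reasonably strong matches
```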
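
Paragraph-level graph construction, as performed by `ProcessPropersAssociate.sql`, might reduce to something like the following, with a hypothetical `ProperParagraph` staging table recording which entity was matched in which paragraph:

```sql
-- Minimal sketch of building the entity graph from paragraph co-occurrence.
-- AMM_Node and AMM_Association are named in this README; their columns are assumed.

-- One node per entity matched anywhere in the text
INSERT INTO AMM_Node (ProperId)
SELECT DISTINCT pp.ProperId
FROM ProperParagraph AS pp
WHERE NOT EXISTS (SELECT 1 FROM AMM_Node AS n WHERE n.ProperId = pp.ProperId);

-- One association per pair of entities mentioned in the same paragraph
INSERT INTO AMM_Association (NodeId1, NodeId2, CoOccurrenceCount)
SELECT n1.NodeId, n2.NodeId, COUNT(*)
FROM ProperParagraph AS a
JOIN ProperParagraph AS b
    ON  a.ParagraphId = b.ParagraphId
    AND a.ProperId    < b.ProperId            -- count each unordered pair once
JOIN AMM_Node AS n1 ON n1.ProperId = a.ProperId
JOIN AMM_Node AS n2 ON n2.ProperId = b.ProperId
GROUP BY n1.NodeId, n2.NodeId;
```

`AssociateTermsList.sql` applies the same paragraph-grouping idea to build the term-level graph, over terms rather than entities.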
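
The scoring formula above, applied as a set-based update (column names on `AMM_Association` assumed):

```sql
-- Minimal sketch of CalculateAssociationWeight.sql applying the quoted formula.
UPDATE AMM_Association
SET Score = ((TotalWeight + 1.0) * CountedWeightSum / CountedWeightDistinct) / 10.0
WHERE CountedWeightDistinct <> 0;

-- Worked example: TotalWeight = 4, CountedWeightSum = 30, CountedWeightDistinct = 6
--   ((4 + 1) * 30 / 6) / 10 = 25 / 10 = 2.5
```

The `1.0` and `10.0` literals force floating-point arithmetic in case the weight columns are integer-typed.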

This project is structured as a classic data pipeline:

- Modular Steps: Each SQL script performs one focused transformation, from raw text to tokens, from tokens to entities, from entities to graphs.
- Sequential Flow: Output from each step becomes input for the next (e.g., token normalization feeds entity matching); a driver sequence is sketched after this list.
- Intermediate Tables: Temporary and permanent tables store results between stages (`AMM_ProcessInputTerm`, `AMM_ProcessProper`, etc.).
- Automated Processing: Each stored procedure acts as a reusable module that can be applied to new input.
- End-to-End Transformation: Raw Wikipedia text is converted into structured, ranked graphs for downstream use.
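
A minimal driver for that sequential flow might look like this, assuming each script installs a stored procedure of the same name and that the procedures take no parameters:

```sql
EXEC ProcessTermsList;            -- raw paragraphs   -> normalized terms (AMM_ProcessInputTerm)
EXEC ProcessPropers;              -- normalized terms -> matched entities (AMM_ProcessProper)
EXEC ProcessPropersAssociate;     -- matched entities -> graph nodes and associations
EXEC AssociateTermsList;          -- term-level graph via paragraph grouping
EXEC CalculateAssociationWeight;  -- score the associations
```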

Repository layout:

```
SQL/
├── AMM/
│   ├── AMM_Database_All_Objects_2016_04.sql   # Full database schema and procedures
│   ├── LookupAssociationViewPart1.sql         # View for association lookups (part 1)
│   └── LookupAssociationViewPart2.sql         # View for association lookups (part 2)
│
└── WikipediaSql/
    ├── ProcessTermsList.sql                   # Normalize tokens and assign IDs/weights
    ├── ProcessPropers.sql                     # Match terms to entities and compute features
    ├── ProcessPropersAssociate.sql            # Build graph nodes and weighted associations
    ├── AssociateTermsList.sql                 # Build term-based graph using paragraph grouping
    ├── CalculateAssociationWeight.sql         # Score entity associations
    └── Delete Titles With Prefixes.sql        # Clean up noise from wiki namespaces
```
Using input from The Great Gatsby Wikipedia article, the model identifies and connects entities such as:
- F. Scott Fitzgerald
- Zelda Fitzgerald
- Jazz Age, Lost Generation, American Dream
- Nick Carraway, Jay Gatsby, Daisy Buchanan
These are clustered based on term overlap and paragraph proximity, yielding semantically relevant association graphs.
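
A query along these lines (column names assumed) would surface the strongest associations for a single entity:

```sql
-- Hypothetical inspection query over the finished graph.
SELECT TOP (10)
    n1.Name  AS EntityA,
    n2.Name  AS EntityB,
    a.Score
FROM AMM_Association AS a
JOIN AMM_Node AS n1 ON n1.NodeId = a.NodeId1
JOIN AMM_Node AS n2 ON n2.NodeId = a.NodeId2
WHERE n1.Name = N'Jay Gatsby'
ORDER BY a.Score DESC;
```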
- This project predates modern transformers and neural NLP frameworks.
- Most logic is embedded in T-SQL stored procedures.
- Neural embeddings were experimented with using exported node graphs (not included here).
Developed by Korova Mode at AmoryTech in 2016.
This repository is for archival and educational purposes.