Skip to content

namehash/namegraph

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation



NameGraph

Surf more than 21 million name ideas across more than 400,000 name collections,
or generate infinite related name suggestions.

Python Build Status Python CI Status status: beta

Project Status

NameGraph is currently in beta. We are excited to share our work with you and continue to build the greatest web of names in history!

Overview

NameGraph is a web service that generates name suggestions for a given input label. It is implemented using FastAPI and provides a variety of endpoints to generate suggestions in different modes and with different parameters.

Label Analysis

The input label is analyzed to determine the most relevant name suggestions. The analysis includes:

  • Defining all possible interpretations of the input label along with their probabilities (whether it is a sequence of common words, a person name, what is the language, etc.)
  • For each interpretation, determining most probable tokenizations (e.g. armstrong -> ["armstrong"], armstrong -> ["arm", "strong"])

The suggestions are later generated based on these interpretations, tokenizations being especially important, since many generators greatly rely on them. This is why the endpoints can handle pretokenized input.

Collections

Collections are curated sets of names that serve as a core component of NameGraph's name suggestion system. The system maintains a vast database of over 400,000 name collections containing more than 21 million unique names. Each collection is stored in Elasticsearch and contains:

  • A unique collection ID
  • Collection title and description
  • Collection rank and metadata
  • Member names with their normalized and tokenized forms
  • Collection types and categories
  • Related collections

Collections are used in several key ways:

  1. Direct Name Generation:

  2. Related Collections:

    • Finds collections with similar themes and content
    • Ensures diverse suggestions across different categories
  3. Membership Lookup:

    • Discovers collections containing specific names
    • Enables finding thematically related names

The collections are maintained and updated through our NameGraph Collections project, ensuring the suggestion database stays current and comprehensive.

Generators

Generators are core components that create name suggestions through different methods. Each generator inherits from the base NameGenerator class and implements specific name generation strategies. They can be grouped into the categories as shown in the diagram below:

NameKit

Modes

NameGraph supports three modes for processing requests:

  • Instant Mode (instant):

    • Fastest response time
    • More basic name generations
    • Some advanced generators like W2VGenerator are disabled (weight multiplier = 0)
    • Often used for real-time suggestions
  • Domain Detail Mode (domain_detail):

    • Intermediate between instant and full
    • More comprehensive than instant, but still optimized for performance
    • Some generators have reduced weights compared to full mode
    • Expanded search window for collection ranking and sampling
  • Full Mode (full):

    • Most comprehensive name generation
    • Includes all enabled generators
    • Uses full weights for most generators
    • Accesses advanced generators like Wikipedia2VGenerator and W2VGenerator
    • Takes longer to process, but provides the most diverse results

Different generators are enabled/disabled for each mode. Take a look at the generators diagram to see which generators are available in each mode.

Icon Mode Description
Instant Instant Fastest response, basic generators only
Domain Detail Domain Detail Balanced speed/quality, expanded search
Full Full Comprehensive generation with all generators

Sampler

The sampler is a sophisticated component that manages the selection and generation of name suggestions. It implements a probabilistic sampling algorithm that balances diversity, relevance, and efficiency while respecting various constraints.

Key Components

  • Request Parameters:

    • mode: Determines which generators are active (instant/domain_detail/full)
    • min_suggestions: Minimum number of suggestions to return
    • max_suggestions: Maximum number of suggestions to return
    • min_available_fraction: Minimum fraction of suggestions that must be available
  • Interpretations: Each input name can have multiple interpretations, characterized by:

    • Type (ngram, person, other)
    • Language
    • Probability score
    • Possible tokenizations

Sampling Algorithm

The sampler uses a probabilistic approach to generate diverse and relevant name suggestions:

flowchart TD
    A[Start] --> B{Enough suggestions?}
    B -->|Yes| Z[End]
    B -->|No| C{All probabilities = 0?}
    C -->|Yes| Z
    C -->|No| D[Sample type & language]
    D --> E["Sample tokenization"]
    E --> F[Sample pipeline]
    F --> G{Pipeline exceeds limit?}
    G -->|Yes| F
    G -->|No| H[Get suggestion from pipeline]
    H --> I{Any suggestions left?}
    I -->|Yes| J{Already sampled?}
    I -->|No| F
    J -->|Yes| H
    J -->|No| K{Available if required?}
    K -->|No| H
    K -->|Yes| L{Normalized?}
    L -->|No| H
    L -->|Yes| B
Loading

The algorithm works as follows:

  1. Initialization: For each type-language pair, pipeline probabilities are computed.

  2. Main Loop: The sampler iterates until either:

    • Enough suggestions are generated (max_suggestions met)
    • All pipeline probabilities become zero
  3. Sampling Process:

    • First samples a type and language pair
    • Then samples a specific tokenization within that pair
    • Selects a pipeline using probability-based sampling
    • First pass uses sampling without replacement for diversity
  4. Validation Checks:

    • Verifies pipeline hasn't exceeded its global limit
    • Ensures suggestions aren't duplicates
    • Checks availability status if required
    • Confirms normalization status
  5. Pipeline Management:

    • Exhausted pipelines are removed from the sampling pool
    • When a pipeline can't generate more suggestions, falls back to other pipelines

This approach ensures a balanced mix of suggestions while maintaining efficiency and respecting all configured constraints.

Usage

NameGraph uses Poetry for dependency management and packaging. Before getting started, make sure you have Poetry installed on your system.

Prerequisites

Install Poetry if you haven't already:

curl -sSL https://install.python-poetry.org | python3 -

Visit Poetry installation guide for more details.

Install

Clone the repository and install dependencies:

git clone https://github.com/namehash/namegraph.git
cd namegraph
poetry install

Download resources

Additional resources need to be downloaded. Run these commands within the Poetry environment:

poetry run python download.py  # dictionaries, embeddings
poetry run python download_names.py

Configuration

NameGraph uses Hydra - a framework for elegantly configuring complex applications. The configuration is stored in the conf/ directory and includes:

  • Main configuration files (prod_config_new.yaml, test_config_new.yaml) with core settings like connections, filters, limits, and paths
  • Pipeline configurations in conf/pipelines/ defining generators, modes, categories, and language settings

The configuration is highly modular and can be easily modified to adjust the behavior of name generation, filtering, and ranking systems.

REST API

Start server using Poetry:

poetry run uvicorn web_api:app --reload

Query with POST:

curl -d '{"label":"armstrong"}' -H "Content-Type: application/json" -X POST http://localhost:8000

Query with POST (pretokenized input):

curl -d '{"label":"\"arm strong\""}' -H "Content-Type: application/json" -X POST http://localhost:8000

Note: pretokenized input should be wrapped in double quotes.

Documentation

The API documentation is available at /docs or /redoc when the server is running. These are auto-generated Swagger/OpenAPI docs provided by FastAPI that allow you to:

  • View all available endpoints
  • See request/response schemas
  • See descriptions of each parameter and response field
  • Test API calls directly from the browser

Public API documentation is available at api.namegraph.dev/docs.

Tests

Run tests using Poetry:

poetry run pytest

Tests that interact with external services (Elasticsearch) are marked with integration_test marker and are disabled by default. Define environment variables needed to access Elasticsearch and run them using:

poetry run pytest -m "integration_test"

Learning-To-Rank

To access the LTR features, you need to configure it in the Elasticsearch instance (see here for more details).

About

Help your users discover ENS names they love with NameGraph.

Topics

Resources

License

Stars

Watchers

Forks

Languages