Surf more than 21 million name ideas across more than 400,000 name collections,
or generate infinite related name suggestions.
NameGraph is currently in beta. We are excited to share our work with you and continue to build the greatest web of names in history!
NameGraph is a web service that generates name suggestions for a given input label. It is implemented using FastAPI and provides a variety of endpoints to generate suggestions in different modes and with different parameters.
The input label is analyzed to determine the most relevant name suggestions. The analysis includes:
- Defining all possible interpretations of the input label along with their probabilities (whether it is a sequence of common words, a person name, what is the language, etc.)
- For each interpretation, determining most probable tokenizations (e.g. `armstrong` -> `["armstrong"]`, `armstrong` -> `["arm", "strong"]`)
The suggestions are then generated from these interpretations. Tokenizations are especially important because many generators rely heavily on them, which is why the endpoints also accept pretokenized input.
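To make the tokenization step concrete, here is a minimal sketch that enumerates every way to split a label into known words and orders the splits by probability. The vocabulary, its probabilities, and the `tokenizations` function are assumptions made for illustration; they are not NameGraph's actual tokenizer or dictionaries.

```python
import math

# Hypothetical toy vocabulary with made-up unigram probabilities.
VOCAB = {"armstrong": 1e-3, "arm": 1e-2, "strong": 1e-2, "arms": 1e-3}


def tokenizations(label, vocab=VOCAB):
    """Return every split of `label` into vocabulary words, most probable first."""
    results = []

    def walk(start, tokens, log_prob):
        # A split is complete once the whole label has been consumed.
        if start == len(label):
            results.append((tokens, log_prob))
            return
        # Try every vocabulary word that starts at position `start`.
        for end in range(start + 1, len(label) + 1):
            word = label[start:end]
            if word in vocab:
                walk(end, tokens + [word], log_prob + math.log(vocab[word]))

    walk(0, [], 0.0)
    # Score each split by the sum of its token log-probabilities.
    results.sort(key=lambda item: item[1], reverse=True)
    return [tokens for tokens, _ in results]


print(tokenizations("armstrong"))
```

With this toy vocabulary the single-token reading ranks above `["arm", "strong"]`, and a split such as `arms` + `trong` is dropped because `trong` is not a known word.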
Collections are curated sets of names that serve as a core component of NameGraph's name suggestion system. The system maintains a vast database of over 400,000 name collections containing more than 21 million unique names. Each collection is stored in Elasticsearch and contains:
- A unique collection ID
- Collection title and description
- Collection rank and metadata
- Member names with their normalized and tokenized forms
- Collection types and categories
- Related collections
Collections are used in several key ways:
- Direct Name Generation:
  - Searches collections based on input tokens
  - Uses learning-to-rank models to find relevant collections
- Related Collections:
  - Finds collections with similar themes and content
  - Ensures diverse suggestions across different categories
- Membership Lookup:
  - Discovers collections containing specific names
  - Enables finding thematically related names
The collections are maintained and updated through our NameGraph Collections project, ensuring the suggestion database stays current and comprehensive.
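The membership lookup described above can be pictured with a small in-memory sketch. NameGraph stores collections in Elasticsearch; the `Collection` dataclass, the helper functions, and the sample data below are hypothetical stand-ins that only illustrate the idea of going from a name to its collections and then to sibling names.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Collection:
    # Hypothetical, simplified record; real collections carry more metadata.
    collection_id: str
    title: str
    rank: int
    names: list = field(default_factory=list)


def build_membership_index(collections):
    """Map each member name to the IDs of the collections that contain it."""
    index = defaultdict(list)
    for collection in collections:
        for name in collection.names:
            index[name].append(collection.collection_id)
    return index


def related_names(name, collections, index):
    """Return other members of every collection that contains `name`."""
    by_id = {c.collection_id: c for c in collections}
    related = []
    for collection_id in index.get(name, []):
        for member in by_id[collection_id].names:
            if member != name and member not in related:
                related.append(member)
    return related


# Made-up sample data for illustration only.
COLLECTIONS = [
    Collection("c1", "Astronauts", 1, ["armstrong", "aldrin", "collins"]),
    Collection("c2", "Jazz musicians", 2, ["armstrong", "ellington"]),
]
INDEX = build_membership_index(COLLECTIONS)
print(related_names("armstrong", COLLECTIONS, INDEX))
```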
Generators are core components that create name suggestions through different methods. Each generator inherits from the base NameGenerator class and implements specific name generation strategies. They can be grouped into the categories as shown in the diagram below:
NameGraph supports three modes for processing requests:
- Instant Mode (`instant`):
  - Fastest response time
  - More basic name generations
  - Some advanced generators like `W2VGenerator` are disabled (weight multiplier = 0)
  - Often used for real-time suggestions
- Domain Detail Mode (`domain_detail`):
  - Intermediate between instant and full
  - More comprehensive than instant, but still optimized for performance
  - Some generators have reduced weights compared to full mode
  - Expanded search window for collection ranking and sampling
- Full Mode (`full`):
  - Most comprehensive name generation
  - Includes all enabled generators
  - Uses full weights for most generators
  - Accesses advanced generators like `Wikipedia2VGenerator` and `W2VGenerator`
  - Takes longer to process, but provides the most diverse results
Different generators are enabled/disabled for each mode. Take a look at the generators diagram to see which generators are available in each mode.
Mode | Description |
---|---|
Instant | Fastest response, basic generators only |
Domain Detail | Balanced speed/quality, expanded search |
Full | Comprehensive generation with all generators |
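One way to picture how a mode enables or disables generators is a per-mode weight multiplier, where a multiplier of 0 switches a generator off. The sketch below is an assumption about the mechanism, not the project's configuration: `ExampleBasicGenerator` is a hypothetical name, and every number except the zero for `W2VGenerator` in instant mode is made up.

```python
MODES = ("instant", "domain_detail", "full")

# Hypothetical per-mode multipliers. Only the 0 for W2VGenerator in instant
# mode is taken from the description above; all other values are invented.
MODE_MULTIPLIERS = {
    "ExampleBasicGenerator": {"instant": 1.0, "domain_detail": 1.0, "full": 1.0},
    "W2VGenerator": {"instant": 0.0, "domain_detail": 0.5, "full": 1.0},
    "Wikipedia2VGenerator": {"instant": 0.0, "domain_detail": 0.5, "full": 1.0},
}

# Hypothetical base weights before the mode multiplier is applied.
BASE_WEIGHTS = {
    "ExampleBasicGenerator": 1.0,
    "W2VGenerator": 2.0,
    "Wikipedia2VGenerator": 2.0,
}


def effective_weights(base_weights, mode):
    """Scale each generator's weight by its mode multiplier; drop zero weights."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode!r}")
    weights = {}
    for name, base in base_weights.items():
        weight = base * MODE_MULTIPLIERS[name][mode]
        if weight > 0:
            weights[name] = weight
    return weights


print(effective_weights(BASE_WEIGHTS, "instant"))
print(effective_weights(BASE_WEIGHTS, "full"))
```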
The sampler is a sophisticated component that manages the selection and generation of name suggestions. It implements a probabilistic sampling algorithm that balances diversity, relevance, and efficiency while respecting various constraints.
- Request Parameters:
  - `mode`: Determines which generators are active (`instant`/`domain_detail`/`full`)
  - `min_suggestions`: Minimum number of suggestions to return
  - `max_suggestions`: Maximum number of suggestions to return
  - `min_available_fraction`: Minimum fraction of suggestions that must be available
- Interpretations: Each input name can have multiple interpretations, characterized by:
  - Type (`ngram`, `person`, `other`)
  - Language
  - Probability score
  - Possible tokenizations
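The sampler inputs listed above can be modeled as two small data classes. This is a hypothetical sketch: the class names, the default values, and the validation rules are assumptions chosen for illustration, and only the field names and the allowed mode and type values come from the lists above.

```python
from dataclasses import dataclass, field

MODES = ("instant", "domain_detail", "full")
INTERPRETATION_TYPES = ("ngram", "person", "other")


@dataclass
class SamplerParams:
    # Default values are made up for this sketch.
    mode: str = "full"
    min_suggestions: int = 10
    max_suggestions: int = 100
    min_available_fraction: float = 0.1

    def __post_init__(self):
        if self.mode not in MODES:
            raise ValueError(f"unknown mode: {self.mode!r}")
        if not 0 <= self.min_suggestions <= self.max_suggestions:
            raise ValueError("need 0 <= min_suggestions <= max_suggestions")
        if not 0.0 <= self.min_available_fraction <= 1.0:
            raise ValueError("min_available_fraction must be within [0, 1]")


@dataclass
class Interpretation:
    type: str
    language: str
    probability: float
    tokenizations: list = field(default_factory=list)

    def __post_init__(self):
        if self.type not in INTERPRETATION_TYPES:
            raise ValueError(f"unknown interpretation type: {self.type!r}")


params = SamplerParams(mode="instant", min_suggestions=5, max_suggestions=20)
reading = Interpretation("ngram", "en", 0.8, [["arm", "strong"], ["armstrong"]])
print(params, reading)
```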
The sampler uses a probabilistic approach to generate diverse and relevant name suggestions:
flowchart TD
A[Start] --> B{Enough suggestions?}
B -->|Yes| Z[End]
B -->|No| C{All probabilities = 0?}
C -->|Yes| Z
C -->|No| D[Sample type & language]
D --> E["Sample tokenization"]
E --> F[Sample pipeline]
F --> G{Pipeline exceeds limit?}
G -->|Yes| F
G -->|No| H[Get suggestion from pipeline]
H --> I{Any suggestions left?}
I -->|Yes| J{Already sampled?}
I -->|No| F
J -->|Yes| H
J -->|No| K{Available if required?}
K -->|No| H
K -->|Yes| L{Normalized?}
L -->|No| H
L -->|Yes| B
The algorithm works as follows:
- Initialization: For each type-language pair, pipeline probabilities are computed.
- Main Loop: The sampler iterates until either:
  - Enough suggestions are generated (`max_suggestions` met)
  - All pipeline probabilities become zero
- Sampling Process:
  - First samples a type and language pair
  - Then samples a specific tokenization within that pair
  - Selects a pipeline using probability-based sampling
  - First pass uses sampling without replacement for diversity
- Validation Checks:
  - Verifies pipeline hasn't exceeded its global limit
  - Ensures suggestions aren't duplicates
  - Checks availability status if required
  - Confirms normalization status
- Pipeline Management:
  - Exhausted pipelines are removed from the sampling pool
  - When a pipeline can't generate more suggestions, falls back to other pipelines
This approach ensures a balanced mix of suggestions while maintaining efficiency and respecting all configured constraints.
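The loop in the flowchart can be sketched in a few dozen lines. This is a simplified illustration under stated assumptions, not the project's sampler: it skips tokenization sampling and the without-replacement first pass, and the data layout, `make_pipeline`, and `sample_suggestions` are hypothetical names. The sketch mutates the pipeline dictionaries it is given.

```python
import random


def make_pipeline(weight, limit, names):
    """Hypothetical pipeline: a weight, a global limit, and a stream of names."""
    return {"weight": weight, "limit": limit, "used": 0, "names": iter(names)}


def sample_suggestions(interpretations, max_suggestions,
                       is_available=lambda name: True,
                       is_normalized=lambda name: True,
                       rng=None):
    """interpretations maps (type, language) to {"prob": float, "pipelines": [...]}."""
    rng = rng or random.Random()
    results, seen = [], set()
    while len(results) < max_suggestions:
        # Keep only type-language pairs that still have a live pipeline.
        live = [interp for interp in interpretations.values()
                if interp["prob"] > 0
                and any(p["weight"] > 0 for p in interp["pipelines"])]
        if not live:
            break  # every probability is zero, nothing left to sample
        interp = rng.choices(live, weights=[i["prob"] for i in live])[0]
        candidates = [p for p in interp["pipelines"] if p["weight"] > 0]
        pipeline = rng.choices(candidates,
                               weights=[p["weight"] for p in candidates])[0]
        if pipeline["used"] >= pipeline["limit"]:
            pipeline["weight"] = 0.0  # global limit reached, drop from the pool
            continue
        # Pull names until one passes all validation checks.
        for name in pipeline["names"]:
            if name in seen or not is_available(name) or not is_normalized(name):
                continue
            seen.add(name)
            pipeline["used"] += 1
            results.append(name)
            break
        else:
            pipeline["weight"] = 0.0  # pipeline exhausted, fall back to others
    return results


def demo_interpretations():
    # Made-up probabilities, weights, limits, and names.
    return {
        ("ngram", "en"): {"prob": 0.7, "pipelines": [
            make_pipeline(1.0, 2, ["a", "b", "c"]),
            make_pipeline(2.0, 5, ["b", "d"]),
        ]},
        ("person", "en"): {"prob": 0.3, "pipelines": [
            make_pipeline(1.0, 5, ["e", "a"]),
        ]},
    }


print(sample_suggestions(demo_interpretations(), 3, rng=random.Random(0)))
```

Every iteration either accepts a suggestion or zeroes one pipeline's weight, so the loop always terminates even when the pipelines cannot supply `max_suggestions` names.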
NameGraph uses Poetry for dependency management and packaging. Before getting started, make sure you have Poetry installed on your system.
Install Poetry if you haven't already:
curl -sSL https://install.python-poetry.org | python3 -
See the Poetry installation guide for more details.
Clone the repository and install dependencies:
git clone https://github.com/namehash/namegraph.git
cd namegraph
poetry install
Additional resources need to be downloaded. Run these commands within the Poetry environment:
poetry run python download.py # dictionaries, embeddings
poetry run python download_names.py
NameGraph uses Hydra - a framework for elegantly configuring complex applications. The configuration is stored in the `conf/` directory and includes:
- Main configuration files (`prod_config_new.yaml`, `test_config_new.yaml`) with core settings like connections, filters, limits, and paths
- Pipeline configurations in `conf/pipelines/` defining generators, modes, categories, and language settings
The configuration is highly modular and can be easily modified to adjust the behavior of name generation, filtering, and ranking systems.
Start server using Poetry:
poetry run uvicorn web_api:app --reload
Query with POST:
curl -d '{"label":"armstrong"}' -H "Content-Type: application/json" -X POST http://localhost:8000
Query with POST (pretokenized input):
curl -d '{"label":"\"arm strong\""}' -H "Content-Type: application/json" -X POST http://localhost:8000
Note: pretokenized input should be wrapped in double quotes.
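The same two queries can be issued from Python using only the standard library. This is a sketch that mirrors the curl commands above; `build_request` and `send` are hypothetical helper names, and the request body carries only the `label` field shown in those commands.

```python
import json
import urllib.request


def build_request(label, pretokenized=False, base_url="http://localhost:8000"):
    """Build a POST request equivalent to the curl commands above."""
    if pretokenized:
        # Pretokenized input is wrapped in double quotes inside the JSON string.
        label = f'"{label}"'
    body = json.dumps({"label": label}).encode("utf-8")
    return urllib.request.Request(
        base_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request):
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(request) as response:
        return json.load(response)


request = build_request("arm strong", pretokenized=True)
print(request.data.decode("utf-8"))
```

With the server running locally, `send(build_request("armstrong"))` performs the first query and `send(request)` the pretokenized one.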
The API documentation is available at `/docs` or `/redoc` when the server is running. These are auto-generated Swagger/OpenAPI docs provided by FastAPI that allow you to:
- View all available endpoints
- See request/response schemas
- See descriptions of each parameter and response field
- Test API calls directly from the browser
Public API documentation is available at api.namegraph.dev/docs.
Run tests using Poetry:
poetry run pytest
Tests that interact with external services (Elasticsearch) are marked with the `integration_test` marker and are disabled by default. Define the environment variables needed to access Elasticsearch, then run them using:
poetry run pytest -m "integration_test"
To use the learning-to-rank (LTR) features, you need to configure LTR in the Elasticsearch instance (see here for more details).