Commit caf20ab

fixedd test 2 bed nn
1 parent 8b214a4 commit caf20ab

4 files changed: +22 -10 lines changed

autodoc.py

Lines changed: 1 addition & 1 deletion
@@ -99,4 +99,4 @@
     with open(out, "w") as stream:
         stream.write(md_result)
 else:
-    print("Skipping jupyter notebooks")
+    print("Skipping jupyter notebooks")

ci/scripts/count_records.py

Lines changed: 17 additions & 6 deletions
@@ -7,15 +7,26 @@
 
 parser = ArgumentParser(description="Count records in a PostgreSQL table and verify")
 
-parser.add_argument("-t", "--table", help="Table to count records in",
-                    required=True, type=str)
-parser.add_argument("-e", "--expected-count", help="Expected number of records",
-                    type=int, required=False, default=None)
+parser.add_argument(
+    "-t", "--table", help="Table to count records in", required=True, type=str
+)
+parser.add_argument(
+    "-e",
+    "--expected-count",
+    help="Expected number of records",
+    type=int,
+    required=False,
+    default=None,
+)
 
 args = parser.parse_args()
 
-bbc = BedBaseConf(get_bedbase_cfg('$GITHUB_WORKSPACE/ci/cfg/config_min.yaml'))
+bbc = BedBaseConf(get_bedbase_cfg("$GITHUB_WORKSPACE/ci/cfg/config_min.yaml"))
 row_count = bbc._count_rows(table_name=args.table)
 if args.expected_count:
-    assert row_count == args.expected_count, "Number of records in the '{}' table ({}) not equal {}".format(args.table, row_count, args.expected_count)
+    assert (
+        row_count == args.expected_count
+    ), "Number of records in the '{}' table ({}) not equal {}".format(
+        args.table, row_count, args.expected_count
+    )
 sys.exit(0)
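
Review note on the check in `count_records.py`: `if args.expected_count:` is a truthiness test, so an expected count of `0` is treated the same as no expectation, and an `assert` is skipped entirely when Python runs with `-O`. A minimal sketch of a stricter check follows. The helper name `verify_count` is hypothetical and not part of this commit; it only illustrates the comparison and does not touch `BedBaseConf`.

```python
from typing import Optional


def verify_count(table: str, row_count: int, expected: Optional[int]) -> None:
    """Raise ValueError when an expected count is given and does not match.

    Hypothetical helper for illustration; not part of the committed script.
    """
    # `is not None` keeps an expected count of 0 meaningful; a plain
    # truthiness test would treat 0 the same as "no expectation given".
    if expected is not None and row_count != expected:
        # An explicit exception still fires under `python -O`,
        # where `assert` statements are removed.
        raise ValueError(
            "Number of records in the '{}' table ({}) not equal {}".format(
                table, row_count, expected
            )
        )
```

In the script this would be called as `verify_count(args.table, row_count, args.expected_count)` before `sys.exit(0)`; an uncaught `ValueError` makes the process exit with a non-zero status, which is what a CI step needs.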

docs/geniml/tutorials/text2bednn-search-interface.md

Lines changed: 3 additions & 3 deletions
@@ -1,15 +1,15 @@
 # How to create a natural language search backend for BED files
 The metadata of each BED file is needed to build a natural language search backend. BED files embedding vectors are created by
-`Region2Vec`, and metadata embedding vectors are created by [`FastEmbed`](https://github.com/qdrant/fastembed), [`SentenceTransformers`](https://www.sbert.net/), or other text embedding models.
+`Region2Vec` model, and metadata embedding vectors are created by [`FastEmbed`](https://github.com/qdrant/fastembed), [`SentenceTransformers`](https://www.sbert.net/), or other text embedding models.
 
 `Vec2VecFNN`, a feedforward neural network (FNN), is trained to maps vectors from the embedding space of natural language to the embedding
-space of BED files. When a natural language query string is given, it will first be encoded to a vector by the text embedding model, and that
+space of BED files. When a natural language query string is given, it will first be encoded to a vector by the text embedding model, and then created
 vector will be encoded to a query vector by the FNN. `search` backend can perform k-nearest neighbors (KNN) search among the stored BED
 file embedding vectors, and the BED files whose embedding vectors are closest to that query vector are the search results.
 
 ## Store embedding vectors
 It is recommended to use `geniml.search.backend.HNSWBackend` to store embedding vectors. In the `HNSWBackend` that stores each BED file embedding
-vector, the `payload` should contain the name of BED file. In the `HNSWBackend` that stores the embedding vectors of each
+vector, the `payload` should contain the name or identifier of BED file. In the `HNSWBackend` that stores the embedding vectors of each
 metadata string, the `payload` should contain the original string text and the names of BED files that have that string in metadata.
 
 ## Train the model
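
The query flow described in the tutorial text above (text embedding, then the FNN mapping, then KNN over stored BED file vectors) can be sketched in plain Python. This is an illustration under stated assumptions, not the `geniml` API: `embed_text` and `map_to_bed_space` are hypothetical stand-ins for the text embedding model and `Vec2VecFNN`, the store is an ordinary dict rather than an `HNSWBackend`, and Euclidean distance is assumed as the closeness measure.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Vector = Sequence[float]


def euclidean(a: Vector, b: Vector) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_search(
    query: Vector, stored: Dict[str, Vector], k: int
) -> List[Tuple[str, float]]:
    """Exact KNN: return the k stored names closest to the query vector."""
    ranked = sorted(
        ((name, euclidean(query, vec)) for name, vec in stored.items()),
        key=lambda pair: pair[1],
    )
    return ranked[:k]


def text_to_bed_search(
    query_text: str,
    embed_text: Callable[[str], Vector],  # stand-in for the text embedding model
    map_to_bed_space: Callable[[Vector], Vector],  # stand-in for the trained FNN
    stored: Dict[str, Vector],  # BED file name -> BED file embedding vector
    k: int = 2,
) -> List[Tuple[str, float]]:
    """Encode the text, map it into BED embedding space, then run KNN."""
    text_vec = embed_text(query_text)
    query_vec = map_to_bed_space(text_vec)
    return knn_search(query_vec, stored, k)


# Toy stand-ins so the sketch runs end to end; real models replace these.
def toy_embed(text: str) -> Vector:
    return [float(len(text)), float(text.count(" "))]


def toy_map(vec: Vector) -> Vector:
    return [vec[0] / 10.0, vec[1]]


bed_vectors = {"a.bed": [1.0, 1.0], "b.bed": [5.0, 0.0], "c.bed": [1.5, 1.0]}
results = text_to_bed_search("liver cells", toy_embed, toy_map, bed_vectors, k=2)
```

An HNSW index returns approximate nearest neighbors and scales far better than this exhaustive sort; the sketch only shows the order of the three steps.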

mkdocs.yml

Lines changed: 1 addition & 0 deletions
@@ -102,6 +102,7 @@ nav:
       - Fine-tune embeddings: geniml/tutorials/fine-tune-region2vec-model.md
       - Randomize bed files: geniml/tutorials/bedshift.md
       - Create evaluation dataset with bedshift: geniml/tutorials/bedshift-evaluation-guide.md
+      - Create search backend: geniml/tutorials/text2bednn-search-interface.md
     - Reference:
       - How to cite: citations.md
       - API documentation: geniml/autodoc_build/geniml.md
