parse_many(paths)
Parse multiple bookmark files and merge into a single collection.
Parameters:
paths: List of file paths to parse (HTML or JSON)
Returns: BookmarkCollection with all parsed bookmarks
Example:
from bookmark_checker.core.parsers import parse_many
collection = parse_many(["bookmarks.html", "chrome.json"])

Parse Netscape HTML bookmark file.
Parameters:
path: Path to HTML file
Returns: BookmarkCollection with parsed bookmarks
Parse Chrome/Chromium JSON bookmark file.
Parameters:
path: Path to JSON file
Returns: BookmarkCollection with parsed bookmarks
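The JSON parser walks Chrome's nested folder tree. A minimal sketch of that traversal, assuming the standard Chrome `Bookmarks` file layout (`"roots"` mapping to folder nodes, each with `"type"`, `"name"`, and `"children"`); the helper names here are illustrative, not the library's own:

```python
import json

def walk_chrome_nodes(node, folder_path=""):
    """Recursively yield (url, title, folder_path) from a Chrome bookmark node."""
    if node.get("type") == "url":
        yield node.get("url", ""), node.get("name", ""), folder_path
    for child in node.get("children", []):
        yield from walk_chrome_nodes(child, f"{folder_path}/{node.get('name', '')}")

def parse_chrome_json_text(text):
    """Collect all bookmarks under the 'roots' section of a Chrome Bookmarks file."""
    data = json.loads(text)
    results = []
    for root in data.get("roots", {}).values():
        if isinstance(root, dict):  # skip non-folder metadata entries
            results.extend(walk_chrome_nodes(root))
    return results
```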
group_duplicates(collection: BookmarkCollection, similarity_threshold: int = 85, enable_fuzzy: bool = True) -> tuple[dict[str, list[Bookmark]], list[dict[str, Any]]]
Group duplicate bookmarks by canonical URL, with optional fuzzy title matching.
Parameters:
collection: Collection to deduplicate
similarity_threshold: Minimum similarity score (0-100) for fuzzy matching
enable_fuzzy: Whether to enable fuzzy title matching within same domain
Returns: Tuple of (grouped dict, report list)
Example:
from bookmark_checker.core.dedupe import group_duplicates
grouped, report = group_duplicates(collection, similarity_threshold=90)

Annotate bookmarks with canonical URLs and normalized titles.
Parameters:
collection: Collection to annotate in-place
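The exact-URL half of `group_duplicates` can be sketched as a dictionary keyed by canonical URL; the fuzzy half needs a 0-100 title score, shown here with stdlib `difflib` as a stand-in for whatever matcher the library actually uses. Both function names and the `(canonical_url, title)` tuple shape are illustrative assumptions:

```python
from collections import defaultdict
from difflib import SequenceMatcher

def group_by_canonical_url(bookmarks):
    """Group (canonical_url, title) pairs by exact canonical URL."""
    grouped = defaultdict(list)
    for url, title in bookmarks:
        grouped[url].append(title)
    return dict(grouped)

def title_similarity(a, b):
    """0-100 similarity score, standing in for the real fuzzy matcher."""
    return int(SequenceMatcher(None, a.lower(), b.lower()).ratio() * 100)
```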
merge_collections(collection: BookmarkCollection, similarity_threshold: int = 85, enable_fuzzy: bool = True) -> tuple[BookmarkCollection, list[dict[str, Any]]]
Merge duplicate bookmarks, selecting representatives and organizing by domain.
Parameters:
collection: Collection to merge
similarity_threshold: Minimum similarity score for fuzzy matching
enable_fuzzy: Whether to enable fuzzy title matching
Returns: Tuple of (merged collection, dedupe report)
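The merge step collapses each duplicate group to a single representative and records what was dropped. A structural sketch only: keeping the first member of each group is an assumption here, and the real implementation may prefer e.g. the oldest `added` date:

```python
def merge_groups(grouped):
    """Collapse each duplicate group to one representative and build a report.

    Keeping the first entry as representative is an assumption; the real
    selection rule may differ.
    """
    merged, report = [], []
    for key, members in grouped.items():
        representative = members[0]
        merged.append(representative)
        if len(members) > 1:
            report.append({"canonical_url": key, "kept": representative, "dropped": members[1:]})
    return merged, report
```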
export_netscape_html(collection, path)
Export collection to Netscape HTML format.
Parameters:
collection: Collection to export
path: Output file path
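The Netscape format is the `<!DOCTYPE NETSCAPE-Bookmark-file-1>` document that browsers accept for import. A minimal flat rendering, assuming `(url, title)` pairs as input; the real exporter also reproduces folder structure, which this sketch omits:

```python
from html import escape

def render_netscape_html(bookmarks):
    """Render (url, title) pairs as a minimal flat Netscape bookmark document."""
    lines = [
        "<!DOCTYPE NETSCAPE-Bookmark-file-1>",
        "<TITLE>Bookmarks</TITLE>",
        "<H1>Bookmarks</H1>",
        "<DL><p>",
    ]
    for url, title in bookmarks:
        # Escape attribute and text content so special characters survive import
        lines.append(f'    <DT><A HREF="{escape(url, quote=True)}">{escape(title)}</A>')
    lines.append("</DL><p>")
    return "\n".join(lines)
```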
export_dedupe_report_csv(report, path)
Export deduplication report to CSV.
Parameters:
report: List of report dictionaries
path: Output CSV file path
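Since the report is a list of dicts, the CSV export maps naturally onto `csv.DictWriter`. A sketch of that mapping, assuming every dict shares the keys of the first; the function name is illustrative:

```python
import csv
import io

def write_dedupe_report_csv(report, fileobj):
    """Write report dicts as CSV; keys of the first dict become the header row."""
    if not report:
        return
    writer = csv.DictWriter(fileobj, fieldnames=list(report[0]))
    writer.writeheader()
    writer.writerows(report)
```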
Canonicalize a URL by removing tracking parameters and normalizing.
Parameters:
url: URL string to canonicalize
Returns: Canonicalized URL string
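Canonicalization can be sketched with stdlib `urllib.parse`: lowercase the scheme and host, drop the fragment, and filter the query string. The particular tracking parameters removed here (`utm_*`, `fbclid`, `gclid`) are assumptions; the library's actual list may differ:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

TRACKING_PREFIXES = ("utm_",)          # assumed tracking-parameter convention
TRACKING_KEYS = {"fbclid", "gclid"}    # assumed click-ID parameters

def canonicalize_url(url):
    """Lowercase scheme/host, drop fragment and common tracking parameters."""
    parts = urlsplit(url)
    query = [
        (k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if not k.startswith(TRACKING_PREFIXES) and k not in TRACKING_KEYS
    ]
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path, urlencode(query), ""))
```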
Extract domain from URL.
Parameters:
url: URL string
Returns: Domain string (empty if invalid)
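Domain extraction is essentially the `netloc` of a parsed URL, with a guard so malformed input yields an empty string rather than an exception. A sketch with an illustrative function name:

```python
from urllib.parse import urlparse

def extract_domain(url):
    """Return the lowercased host part of a URL, or '' when there is none."""
    try:
        return urlparse(url).netloc.lower()
    except ValueError:
        return ""
```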
Normalize whitespace: collapse multiple spaces/tabs/newlines to single space.
Parameters:
s: String to normalize
Returns: Normalized string
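The collapse described above is a single regex substitution over the `\s` character class, plus a trim. A sketch with an illustrative function name:

```python
import re

def normalize_whitespace(s):
    """Collapse runs of spaces, tabs, and newlines into single spaces and trim."""
    return re.sub(r"\s+", " ", s).strip()
```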
Represents a single bookmark.
Attributes:
url: str - Original URL
title: str - Bookmark title
added: datetime | None - Date added
folder_path: str - Folder path
source_file: str - Source file path
canonical_url: str - Canonicalized URL
meta: dict[str, Any] - Additional metadata
Collection of bookmarks with metadata.
Methods:
add(bookmark: Bookmark) -> None - Add a bookmark
extend(bookmarks: list[Bookmark]) -> None - Add multiple bookmarks
__len__() -> int - Return number of bookmarks
__iter__() -> Iterator[Bookmark] - Iterate over bookmarks
from bookmark_checker.core.parsers import parse_many
from bookmark_checker.core.merge import merge_collections
from bookmark_checker.core.exporters import export_netscape_html, export_dedupe_report_csv
# Parse bookmarks
collection = parse_many(["chrome.html", "firefox.html"])
# Merge duplicates
merged, report = merge_collections(collection, similarity_threshold=90)
# Export results
export_netscape_html(merged, "merged.html")
export_dedupe_report_csv(report, "report.csv")