Commit c26951a

Increase scroll duration
1 parent 83677bd commit c26951a

1 file changed, +8 -1 lines changed

archive_query_log/downloaders/warc.py

Lines changed: 8 additions & 1 deletion
@@ -116,7 +116,14 @@ def download_serps_warc(config: Config) -> None:
         echo("No new/changed captures.")
         return
 
-    changed_serps: Iterable[Serp] = changed_serps_search.scan()
+    changed_serps: Iterable[Serp] = (
+        changed_serps_search
+        # Downloading WARCs is very slow, so we keep track
+        # of the Elasticsearch query for a full day, assuming that
+        # 1000 WARCs can be downloaded in 24h.
+        .params(scroll="24h")
+        .scan()
+    )
     changed_serps = safe_iter_scan(changed_serps)
     # noinspection PyTypeChecker
     changed_serps = tqdm(changed_serps, total=num_changed_serps,
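
The code comment in this change assumes that 1000 WARCs can be downloaded within the 24h scroll window. As a rough sanity check, the sketch below converts a scroll duration into the average time available per capture. The helper names `scroll_seconds` and `per_item_budget` are hypothetical and not part of this repository, and the default batch size of 1000 is an assumption taken from the figure in the comment, not a value read from the code.

```python
import re

# Seconds per time unit (subset of Elasticsearch's units: d, h, m, s).
_UNIT_SECONDS = {"d": 86400, "h": 3600, "m": 60, "s": 1}


def scroll_seconds(scroll: str) -> int:
    """Convert a duration string such as "24h" or "5m" to seconds."""
    match = re.fullmatch(r"(\d+)([dhms])", scroll)
    if match is None:
        raise ValueError(f"Unsupported scroll duration: {scroll!r}")
    value, unit = match.groups()
    return int(value) * _UNIT_SECONDS[unit]


def per_item_budget(scroll: str, batch_size: int = 1000) -> float:
    """Average seconds available per item before the scroll window ends.

    batch_size=1000 is an assumption based on the comment in the diff.
    """
    return scroll_seconds(scroll) / batch_size


if __name__ == "__main__":
    for duration in ("5m", "24h"):
        print(duration, per_item_budget(duration), "seconds per WARC")
```

Under these assumptions, a 24h window leaves on the order of a minute and a half per WARC, whereas a window of a few minutes would leave well under a second.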
