GitHub - kawalcovid19/wbw-gsheets-crawler: Google Sheets crawler for wargabantuwarga.com

WBW GSheets Crawler

Crawler engine that ingests WBW Google Sheets into typesense-server for real-time search.

INSTALLATION

You need a Typesense API key with write access to ingest data into the server.

cp .env.example .env
yarn install

Afterwards, set TYPESENSE_HOST, TYPESENSE_PORT, TYPESENSE_PROTOCOL and TYPESENSE_KEY in .env. Then you're good to go.
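As a rough illustration of how these four variables might be consumed, the sketch below builds a connection config in the shape the typesense JavaScript client accepts (a nodes list plus apiKey). The helper name buildTypesenseConfig, the check for missing variables and the placeholder values are assumptions made for illustration, not code from this repository.

```javascript
// Hypothetical helper: maps the TYPESENSE_* variables from .env onto the
// config object shape used by the typesense JavaScript client.
function buildTypesenseConfig(env) {
  const required = [
    "TYPESENSE_HOST",
    "TYPESENSE_PORT",
    "TYPESENSE_PROTOCOL",
    "TYPESENSE_KEY",
  ];
  // Fail early with a readable message instead of a late connection error.
  const missing = required.filter((name) => !env[name]);
  if (missing.length > 0) {
    throw new Error(`Missing environment variables: ${missing.join(", ")}`);
  }
  return {
    nodes: [
      {
        host: env.TYPESENSE_HOST,
        port: Number(env.TYPESENSE_PORT),
        protocol: env.TYPESENSE_PROTOCOL,
      },
    ],
    apiKey: env.TYPESENSE_KEY,
  };
}

// Placeholder values only; real values come from your own .env file.
const exampleConfig = buildTypesenseConfig({
  TYPESENSE_HOST: "localhost",
  TYPESENSE_PORT: "8108",
  TYPESENSE_PROTOCOL: "http",
  TYPESENSE_KEY: "replace-me",
});
console.log(exampleConfig.nodes[0].host);
```

In practice the argument would be process.env after the .env file has been loaded; how this repository loads it is not shown here.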

WRITING CRAWLER

IMPORTANT:
The Google Sheets document must be published to the web before it can be crawled.

The crawler reads all scripts in the metadata directory to interpret the sheet structure. Each script represents an index and must contain:

schema : Typesense schema object. See the Typesense documentation for reference.
sheetId : ID of a public Google Sheets document, i.e. the <SHEET_ID> part of https://docs.google.com/spreadsheets/u/1/d/<SHEET_ID>/view
indexId : Typesense index name. Must have the wbw- prefix.
worksheet : list of worksheets in the given Google Sheets document.

A field named order, with the int32 data type, must be defined manually in the metadata to serve as the sortable field.
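To make the required keys concrete, here is a minimal sketch of what a metadata script's contents might look like, followed by a small check of the rules listed above. Everything specific is an assumption for illustration: the file name, the index name wbw-example, the worksheet names, the name field, the choice to reuse indexId as the schema name, and the validateMetadata helper. How the real scripts export their object is also not shown in this README.

```javascript
// Hypothetical contents of a script such as metadata/example.js.
// A real script would export this object (for example via module.exports).
const exampleMetadata = {
  // Typesense index name; must carry the wbw- prefix.
  indexId: "wbw-example",
  // ID from the published Google Sheets URL (placeholder, fill in your own).
  sheetId: "<SHEET_ID>",
  // Worksheets in the document; these names are made up.
  worksheet: ["Sheet A", "Sheet B"],
  // Typesense schema object.
  schema: {
    name: "wbw-example", // assumed to match indexId
    fields: [
      { name: "name", type: "string" },
      // Manually defined int32 field used for sorting.
      { name: "order", type: "int32" },
    ],
    default_sorting_field: "order", // assumed use of the sortable field
  },
};

// Hypothetical check that mirrors the rules stated above.
function validateMetadata(meta) {
  const errors = [];
  for (const key of ["schema", "sheetId", "indexId", "worksheet"]) {
    if (meta[key] === undefined) errors.push(`missing ${key}`);
  }
  if (typeof meta.indexId === "string" && !meta.indexId.startsWith("wbw-")) {
    errors.push("indexId must start with wbw-");
  }
  const fields = (meta.schema && meta.schema.fields) || [];
  if (!fields.some((f) => f.name === "order" && f.type === "int32")) {
    errors.push("schema needs an int32 field named order");
  }
  return errors;
}

console.log(validateMetadata(exampleMetadata)); // prints []
```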

To every data row, id and sheet fields are added to mark which worksheet the row originated from.
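The row marking described above could look roughly like the following. This is a sketch only: the function name markRows and, in particular, the "<sheet>-<row number>" id format are assumptions, since the README does not say how the crawler actually builds the id.

```javascript
// Hypothetical sketch: tags each row with an id and its source worksheet.
// The id format used here is an assumption, not the repository's scheme.
function markRows(rows, sheetName) {
  return rows.map((row, index) => ({
    ...row,
    id: `${sheetName}-${index + 1}`, // Typesense document ids are strings
    sheet: sheetName,
  }));
}

const markedRows = markRows([{ name: "A" }, { name: "B" }], "Sheet A");
console.log(markedRows[1].id, markedRows[1].sheet);
```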

The index is created automatically if it does not exist yet. To avoid being rejected by the server, crawls are executed sequentially (not in parallel).
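Both behaviours can be sketched in a few lines. The example assumes an object shaped like the typesense JavaScript client (collections(name).retrieve() and collections().create(schema), with errors carrying an httpStatus); the function names ensureCollection and crawlSequentially and the crawlOne callback are hypothetical and are not taken from start.js.

```javascript
// Creates the collection only when the server reports it as missing.
// `client` is assumed to behave like a typesense JavaScript client.
async function ensureCollection(client, schema) {
  try {
    await client.collections(schema.name).retrieve();
    return false; // already present, nothing created
  } catch (err) {
    // Only a "not found" response means the index should be created.
    if (err.httpStatus !== 404) throw err;
    await client.collections().create(schema);
    return true;
  }
}

// Runs one crawl at a time: each await completes before the next one starts,
// unlike Promise.all, which would start every crawl at once.
async function crawlSequentially(scripts, crawlOne) {
  const results = [];
  for (const script of scripts) {
    results.push(await crawlOne(script));
  }
  return results;
}
```

A plain for...of loop with await is used here because it is the simplest way to guarantee that only one request sequence is in flight at any time.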

DEVELOPMENT NOTES

We have two flags to make development easier:

--test_script=SCRIPTNAME executes only the given script in the metadata/ directory
--dry_run runs in dry-run mode, without inserting data into Typesense
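For readers curious how two flags of this shape can be read from the command line, the sketch below parses them from an argument list. The parseFlags name and the returned field names are assumptions; the repository's own parsing in start.js may differ.

```javascript
// Hypothetical parser for the two development flags described above.
function parseFlags(argv) {
  const prefix = "--test_script=";
  const flags = { testScript: null, dryRun: false };
  for (const arg of argv) {
    if (arg === "--dry_run") {
      flags.dryRun = true;
    } else if (arg.startsWith(prefix)) {
      flags.testScript = arg.slice(prefix.length);
    }
  }
  return flags;
}

// In a real script the input would typically be process.argv.slice(2).
const exampleFlags = parseFlags(["--test_script=example", "--dry_run"]);
console.log(exampleFlags.testScript, exampleFlags.dryRun);
```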

CONTRIBUTING GUIDELINES

Please refer to the wargabantuwarga contributing guidelines.
