Connect data filtering, visualization, PII pipeline into datastore #262

Datastore is an extension of datasets that interoperates with sqlite and memmap. I created it because I wanted a low-resource way to do data processing on datasets: Arrow can be cumbersome for operations like updates, and sqlite gives us the power of SQL and full text search. Note that the full text search does not require a server. It is built on sqlite itself, and it is not as flexible as the indexsearch @ggdupont is working on.
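To make the "no server" point concrete, here is a minimal sketch of in-process full text search plus an in-place update using only Python's built-in `sqlite3` module. It is not the datastore API: the `docs` table, the sample rows, and the `search` helper are all made up for illustration, and it assumes the local sqlite build has the FTS5 extension enabled.

```python
import sqlite3

# In-memory database: everything runs inside this process, no server.
conn = sqlite3.connect(":memory:")

# FTS5 virtual table (assumes sqlite was compiled with FTS5).
conn.execute("CREATE VIRTUAL TABLE docs USING fts5(text)")
conn.executemany(
    "INSERT INTO docs(text) VALUES (?)",
    [
        ("The quick brown fox jumps over the lazy dog",),
        ("Arrow tables are efficient for columnar reads",),
        ("SQLite makes in-place changes straightforward",),
    ],
)

# An in-place update is a single SQL statement; the index follows along.
conn.execute(
    "UPDATE docs SET text = ? WHERE rowid = ?",
    ("SQLite makes in-place updates straightforward", 3),
)


def search(query):
    """Return (rowid, text) pairs matching an FTS5 query, best match first."""
    cur = conn.execute(
        "SELECT rowid, text FROM docs WHERE docs MATCH ? ORDER BY rank",
        (query,),
    )
    return cur.fetchall()


hits = search("updates")
```

The same pattern works against an on-disk file by passing a path instead of `":memory:"`.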

However, it would be cool to connect the ac_dc filtering and PII pipeline to datastore so we can do things like:
- load dataset X
- apply the ac_dc filter
- run the PII processing
- build a full text index in sqlite
- run the text through distiluse and store the resulting vectors in a memmap column
- visualize a subset based on a perplexity param, a registry param, and full text search
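The steps above could be wired together roughly as follows. This is only a sketch under loud assumptions: `passes_filter`, `redact_pii`, and `embed` are hypothetical stand-ins for the ac_dc filter, the PII pipeline, and distiluse (none of them call the real code), the vector store is a raw file accessed through Python's `mmap` module rather than datastore's memmap column, and the visualization step is left out. It also assumes an sqlite build with FTS5.

```python
import mmap
import os
import re
import sqlite3
import struct
import tempfile

DIM = 4  # toy embedding size; a real sentence encoder is much larger
ROW_BYTES = DIM * 4  # DIM float32 values per document
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")


def passes_filter(text):
    # Stand-in for the ac_dc filter: keep documents with at least 3 words.
    return len(text.split()) >= 3


def redact_pii(text):
    # Stand-in for the PII step: mask email addresses only.
    return EMAIL.sub("<EMAIL>", text)


def embed(text):
    # Stand-in for distiluse: a deterministic toy vector from character codes.
    vec = [0.0] * DIM
    for i, ch in enumerate(text):
        vec[i % DIM] += ord(ch) / 1000.0
    return vec


def run_pipeline(records, workdir):
    # Filter first, then redact what survives.
    kept = [redact_pii(t) for t in records if passes_filter(t)]

    # Full text index in a plain sqlite file.
    conn = sqlite3.connect(os.path.join(workdir, "data.db"))
    conn.execute("CREATE VIRTUAL TABLE docs USING fts5(text)")
    conn.executemany("INSERT INTO docs(text) VALUES (?)", [(t,) for t in kept])
    conn.commit()

    # One fixed-width vector per kept document; row i pairs with rowid i + 1.
    # Note: mmap cannot map an empty file, so this assumes kept is non-empty.
    vec_path = os.path.join(workdir, "vectors.bin")
    with open(vec_path, "wb") as f:
        f.truncate(len(kept) * ROW_BYTES)
    with open(vec_path, "r+b") as f:
        mm = mmap.mmap(f.fileno(), 0)
        for i, t in enumerate(kept):
            struct.pack_into(f"<{DIM}f", mm, i * ROW_BYTES, *embed(t))
        mm.flush()
        mm.close()
    return conn, vec_path


def read_vector(vec_path, row):
    # Read a single row back without loading the whole file into memory.
    with open(vec_path, "rb") as f:
        mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
        vec = struct.unpack_from(f"<{DIM}f", mm, row * ROW_BYTES)
        mm.close()
    return list(vec)


SAMPLE = [
    "hi",
    "Contact me at alice@example.com for the dataset",
    "Perplexity filtering removes noisy web text",
]
workdir = tempfile.mkdtemp()
conn, vec_path = run_pipeline(SAMPLE, workdir)
matches = conn.execute(
    "SELECT rowid, text FROM docs WHERE docs MATCH ?", ("dataset",)
).fetchall()
```

Keeping the vectors in a fixed-width file keyed by row position means a subset selected through SQL or full text search can be mapped straight to its embeddings by rowid, without loading the whole matrix.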

This is not an immediate need, but it would provide a low-compute, low-resource (no servers needed) tool that promotes equal access to language tech for researchers around the world.

