Datagrowth (module)

@fako fako released this 07 Feb 21:07
· 470 commits to master since this release
7ed9d8d

This release moves code shared between projects into a new module, which will be split off from the repo in a future release.

  • Adds logs under the project directory
  • Abandons Conda
  • Growth commands can log to "datagrowth.command" to separate them from task processing output.
  • Introduces a data directory that can be set per machine, so data can be shared between machines
  • Adds a QuerysetProcessor as a more performant alternative for output processors
  • Moves Resource, HttpResource and ShellResource into datagrowth
  • Adds file deletion handler that can be connected to Resources
  • Adds Collective.to_disk to easily use data in notebooks
  • Refactors ExtractProcessor to share code between HTML and XML
  • Adds ShellResourceProcessor to execute shell commands on task servers
  • Adds Tika as the first ShellResource
  • Adds ibatch and datetime formatting as tools to datagrowth
  • Moves all configuration to datagrowth and improves the flow of registering configuration defaults
  • Migrates ImageDownload into separate models per app
  • Adds TopicDetector and EntityDetector
  • Fixes some problems with Wikipedia, but disables the feeds on production for now due to performance issues
  • Migrates files to a structure that scales well
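
To illustrate the note about growth commands logging to "datagrowth.command", here is a minimal sketch using only Python's standard `logging` module. Only the logger name comes from this release. The handler setup, the in-memory stream (standing in for a log file under the project directory) and the `run_growth_command` function are assumptions made for illustration, not the project's actual configuration.

```python
import io
import logging

# Logger name taken from the release notes; everything else is illustrative.
command_log = logging.getLogger("datagrowth.command")
command_log.setLevel(logging.INFO)
# Stop records from bubbling up to root handlers that may carry task output.
command_log.propagate = False

# Assumption: an in-memory stream stands in for a log file in the project directory.
command_stream = io.StringIO()
handler = logging.StreamHandler(command_stream)
handler.setFormatter(logging.Formatter("%(name)s %(levelname)s %(message)s"))
command_log.addHandler(handler)


def run_growth_command(name):
    """Hypothetical command body that reports progress on the command logger."""
    command_log.info("started %s", name)
    command_log.info("finished %s", name)
    return command_stream.getvalue()
```

Because the logger does not propagate, anything written through it ends up only in its own handler, which is one way to keep command output apart from task processing output.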