
DeepDive 0.0.3-alpha.1

Pre-release
@zifeishan zifeishan released this 26 May 04:50

Changelog for version 0.0.3-alpha.1 (05/25/2014)

  • Updated example walkthrough and spouse_example code
  • Added Python utility ddlib for text manipulation (requires exporting PYTHONPATH; see its pydoc for usage)
  • Added utility script util/extractor_input_writer.py to sample extractor inputs
  • Updated nlp_extractor format (use sentence_offset, textual sentence_id)
  • Cleaned up unused datastore code
  • Updated templates
  • Bug fixes

Changelog for version 0.0.3-alpha (05/07/2014)

  • Non-backward-compatible syntax change: Developers must include an id column of type bigint in any table containing variables, but they MUST NOT use this column anywhere. This column is reserved for learning and inference, and all of its values will be erased and reassigned in the grounding phase.
  • Non-backward-compatible functionality change: DeepDive is no longer responsible for any automatic assignment of sequential variable IDs. You may use examples/spouse_example/scripts/fill_sequence.sh for this task.
  • Updated dependency requirement: requires JDK 7 or higher.
  • Added support for four new types of extractors. See the documentation for details.
  • Even faster factor graph grounding and serialization using better optimized SQL.
  • The previous default Java sampler is no longer supported. The C++ sampler is now the default sampler.
  • New configuration supported: pipeline.relearn_from to skip extraction and grounding, only perform learning and inference with a previous version. Useful for tuning sampler arguments.
  • New configuration supported: inference.skip_learning to use weights learned in the last execution.
  • New configuration supported: inference.weight_table to fix factor weights in a table and skip learning. The table is specified by factor description and weights. This table can be the result of one execution of DeepDive, manually assigned, or a combination of the two. It is useful for learning once and reusing the learned model for later inference tasks.
  • Added support for manual holdout via a holdout query.
  • Updated spouse_example with implementations in different styles of extractors.
  • The nlp_extractor example has changed table requirements and usage. See HERE.
  • In the db.default configuration, users should define dbname, host, port and user. If any of these is not defined, the system falls back to the environment variables DBNAME, PGHOST, PGPORT and PGUSER, respectively.
  • Fixed all examples.
  • Updated documentation.
  • Print SQL query execution plans for extractor inputs.
  • Skip grounding, learning and inference if no factors are active.
  • If using Greenplum, users should add a DISTRIBUTED BY clause to all CREATE TABLE commands. Do not use the variable id as the distribution key. Do not use a distribution key that is not initially assigned.
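
Two of the rules in this changelog are easy to get wrong, so a small illustration may help: the environment-variable fallback for db.default, and the schema rules for tables that contain variables (a reserved id bigint column, plus a DISTRIBUTED BY clause on Greenplum that does not use id). The Python below is only a sketch under stated assumptions. The helpers resolve_db_settings and variable_table_ddl, the table name has_spouse and its columns are hypothetical and are not part of DeepDive; the code mirrors the rules as worded above rather than DeepDive's actual implementation.

```python
import os

# db.default keys mapped to the fallback environment variables
# named in the changelog entry above.
ENV_FALLBACKS = {
    "dbname": "DBNAME",
    "host": "PGHOST",
    "port": "PGPORT",
    "user": "PGUSER",
}


def resolve_db_settings(configured, environ=None):
    """Hypothetical helper: fill missing db.default keys from the environment."""
    environ = os.environ if environ is None else environ
    resolved = {}
    for key, env_name in ENV_FALLBACKS.items():
        value = configured.get(key)
        if value is None:
            # Not set in the configuration: fall back to the environment.
            # The result is still None if the variable is unset.
            value = environ.get(env_name)
        resolved[key] = value
    return resolved


def variable_table_ddl(table, columns, distributed_by=None):
    """Hypothetical helper: build CREATE TABLE text for a table with variables.

    columns is a list of (name, sql_type) pairs, excluding the reserved
    id column, which is always added as "id bigint".
    """
    names = [name for name, _ in columns]
    if "id" in names:
        raise ValueError("the id column is reserved; do not declare it yourself")
    if distributed_by is not None:
        if distributed_by == "id":
            raise ValueError("do not use the variable id as the distribution key")
        if distributed_by not in names:
            raise ValueError("unknown distribution key: " + distributed_by)
    column_defs = ["id bigint"] + ["%s %s" % (n, t) for n, t in columns]
    ddl = "CREATE TABLE %s (%s)" % (table, ", ".join(column_defs))
    if distributed_by is not None:
        # Greenplum only: name an ordinary, initially assigned column.
        ddl += " DISTRIBUTED BY (%s)" % distributed_by
    return ddl + ";"
```

For example, calling variable_table_ddl("has_spouse", [("person1_id", "bigint"), ("is_true", "boolean")], distributed_by="person1_id") yields a statement that includes the id bigint column and distributes on person1_id, while passing distributed_by="id" raises an error. The id column still has to be left unused by application code, since its values are reassigned during grounding.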