
Releases: HazyResearch/deepdive

DeepDive UNSTABLE

25 Feb 11:20
Pre-release

This automatically updated prerelease includes all the latest features and fixes merged to master. It is mainly intended for development use and is not recommended for normal DeepDive users.

DeepDive 0.8.x

25 Feb 11:10

This automatically updated release includes the latest fixes to the most recent v0.8.x release.

DeepDive 0.8.0

19 Feb 03:32

A completely re-architected version of DeepDive is here.
Now the system compiles an execution plan ahead of time, checkpoints at a much finer granularity, and gives users full visibility and control of the execution, so any parts of the computation can be flexibly repeated, resumed, or optimized later.
The new architecture naturally enforces modularity and extensibility, which lets us improve most parts independently without having to understand every possible combination across the entire codebase.
The abstraction layers that encapsulate database operations and compute resources are now clearly established, providing stable ground for future extensions that support more types of database engines and compute clusters, such as Hadoop/YARN and clusters with traditional job schedulers.

As a result of this redesign, significant performance improvements are now observed:

  • The database drivers show more than 20x higher throughput (2MB/s -> 50MB/s, per connection) with zero storage footprint by streaming data in and out of UDFs.
  • The grounded factor graphs save up to 100x storage space (12GB -> 180MB) by employing compression during the factor graph's grounding and loading, incurring less than 10% overhead in time (400s -> 460s, measuring only the dumping and loading, hence a much smaller fraction in practice).

See the issues and pull requests for this milestone on GitHub (most notably #445) for further details.

New commands and features

A number of new commands have been added to deepdive, and existing ones such as deepdive initdb and deepdive run have been rewritten.

To learn more about an individual deepdive COMMAND, use the deepdive help command:

deepdive help COMMAND

Dropped and deprecated features

The Scala code base has been completely dropped and rewritten in Bash and jq.
Many superfluous features have been dropped outright or deprecated for future removal, as summarized below:

  • All extractor styles other than tsv_extractor, sql_extractor, and cmd_extractor have been dropped, namely:
    • plpy_extractor
    • piggy_extractor
    • json_extractor
  • Manually writing deepdive.conf is strongly discouraged, as filling in additional fields such as dependencies: and input_relations: is now mandatory.
    Rewriting applications in DDlog is strongly recommended.
  • Database configuration in deepdive.db.default is completely ignored.
    db.url must be used instead.
  • deepdive.extraction.extractors.*.input in deepdive.conf should always be a SQL query.
    TSV(filename.tsv) and CSV(filename.csv) inputs are no longer supported.
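To make the new requirements concrete, here is a minimal deepdive.conf fragment illustrating the points above. This is only a sketch: the relation, extractor, and UDF names are invented for illustration and are not part of DeepDive's shipped examples.

```
# Sketch only — writing deepdive.conf by hand is discouraged; prefer DDlog.
deepdive.extraction.extractors.ext_mentions {
  style: "tsv_extractor"
  # input must now be a SQL query; TSV(...)/CSV(...) inputs are gone
  input: "SELECT id, text FROM sentences"
  output_relation: "mentions"
  udf: "udf/extract_mentions.py"
  dependencies: []              # now mandatory when writing deepdive.conf by hand
  input_relations: [sentences]  # now mandatory when writing deepdive.conf by hand
}
```

The database is configured solely through the db.url file, e.g. a single line such as postgresql://localhost:5432/mydb; anything under deepdive.db.default is ignored.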

v0.7.1: DeepDive 0.7.1

28 Sep 18:56
  • Adds better support for applications written in DDlog.
    deepdive run now runs DDlog-based applications (app.ddlog).
  • Makes PL/Python extension no longer necessary for PostgreSQL.
    It is still needed for Greenplum and PostgreSQL-XL.
  • The deepdive sql eval command now supports format=json.
  • Adds deepdive load command for loading TSV and CSV data.
  • Adds deepdive help command for quick usage instructions for the deepdive command.
  • Includes the latest Mindbender with the Search GUI for browsing data produced by DeepDive.
  • Adds various bug fixes and improvements.

v0.7.0

13 Jul 23:06
  • Provides a new command-line interface deepdive with a new standard DeepDive application layout.
    • No more installation/configuration complications: users run everything through the single deepdive command, and everything just works in any environment. The only possible failure mode is not being able to run the deepdive command at all, e.g., because the PATH environment variable is not set up correctly.
    • No more pathname/environment clutter in apps: repeated settings for DEEPDIVE_HOME, APP_HOME, PYTHONPATH, LD_LIBRARY_PATH, PGHOST, PGPORT, ... in run.sh, env.sh, env_local.sh, env_db.sh, and the like are gone. Path names (e.g., extractor udf) in application.conf are all relative to the application root, and brittle relative paths are no longer used in any of the examples.
    • Clear separation of app code from infrastructure code, as well as source code from object code: the DeepDive source tree is no longer confused with the binary/executable/shared-library distribution or with temporary/log/output directories.
    • Binary releases can be built with make package.
  • Here is a summary of changes visible to users:
    • Application settings are now kept in the deepdive.conf file instead of application.conf.
    • Database settings are now specified by putting everything (host, port, user, password, database name) into a single URL in the file db.url.
    • Path names (e.g., extractor udf) in deepdive.conf are all relative to the application root unless they are absolute paths.
    • SQL queries against the database can be run easily with deepdive sql command when run under an application.
    • The database schema is now kept in the file schema.sql, and optional initial data loading can be done by a script input/init.sh. Keeping input data under input/ is recommended.
    • By passing the pipeline name as an extra argument to the deepdive run command, different pipelines can be run very easily: no more editing of application.conf.
    • Logs and outputs are placed under the application root, under snapshot/.
  • Adds the piggy extractor, which replaces the now-deprecated plpy extractor.
  • Includes the latest DDlog compiler with extended syntax support for writing more real world applications.
  • Includes the latest Mindbender with Dashboard GUI for producing summary reports after each DeepDive run and interactively analyzing data products.
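Under the new layout described above, a small application might look like the following sketch (the file names follow the conventions listed above; the udf/ directory and URL contents are illustrative assumptions, not taken from a shipped example):

```
app/
├── deepdive.conf   # application settings (formerly application.conf)
├── db.url          # a single database URL, e.g. postgresql://user@localhost:5432/mydb
├── schema.sql      # database schema
├── input/
│   └── init.sh     # optional initial data loading; input data kept under input/
├── udf/            # extractor UDFs, referenced by paths relative to the app root
└── snapshot/       # logs and outputs from each run
```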

v0.6.0

17 Jun 19:10
  • Adds DDlog for writing applications in Datalog-like syntax.
  • Adds support for incremental development cycles.
  • Adds preliminary support for Postgres-XL backend.
  • Simplifies installation on Ubuntu and Mac with a quick installer that takes care of all dependencies.
  • Drops maintenance of the AMI in favor of the new quick installer.
  • Fixes sampler correctness issues.
  • Drops "FeatureStatsView" view due to performance issues.
  • Corrects various issues.
  • Starts using Semantic Versioning for consistent and meaningful version numbers for all future releases.

0.05-RELEASE

09 Feb 06:11

Changelog for release 0.0.5-alpha (02/08/2015)

  • Added support to build Docker images for DeepDive. See the README.md for more.
  • Added the SQL "FeatureStatsView" view, populated with feature
    statistics; useful for debugging.
  • Added a few fixes to the Greenplum docs
  • Added parallel Greenplum loading for extractor data
  • A few miscellaneous bug fixes

0.04.1-RELEASE

25 Nov 07:07

Changelog for release 0.0.4.1-alpha (11/25/2014)

This release focuses mostly on bug fixing and minor new features.

  • Improve handling of failures in extractors and inference rules.
  • Add support for running tests on Greenplum.
  • Add support for -q, --quiet in the DimmWitted sampler. This reduces
    the verbosity of the output.
  • Remove some dead code.
  • Fix a small bug in the spouse_example test.

0.04-RELEASE

20 Nov 11:37

Changelog for release 0.0.4-alpha (11/19/2014)

This release focuses mostly on new features and bug fixing.

  • Added experimental support for MariaDB / MySQL / MySQL Cluster. See
    Using DeepDive with MySQL for
    details, including limitations of the current support. The code base was
    refactored to make it much easier to add support for additional DBMS in the
    future.
  • Ported Tuffy to DeepDive. It is now
    possible to run Tuffy programs for Markov Logic Networks on DeepDive. See
    Markov Logic Networks for details.
  • Added a graphical interface called Mindtagger to label data products for
    estimating precision/recall. See Labeling Data Products of
    DeepDive
    and files under examples/labeling/ in
    the source tree.
  • Added support for the DEEPDIVE_HOME environment variable. It's now possible
    to run applications from any location when this variable is set. See
    Installation for details.
  • Added support for -c datacopies to the DimmWitted sampler (Linux only!).
    This allows controlling the number of replications of the data. It is useful
    for performing inference on very large factor graphs while leveraging
    NUMA. See The DimmWitted High-Speed Sampler for
    details.
  • Fixed an integer overflow bug (and use of scientific notation) in tobinary.py.
    This allows using DeepDive for inference on very large factor graphs.
  • Fixed a bug when using multinomial variables with Greenplum: the mapping
    between weight ID and weight description was not consistent.
  • Fixed various bugs (including a known JDBC bug) that prevented DeepDive from
    performing inference on very large factor graphs.

DeepDive 0.0.3-alpha.1

26 May 04:50
Pre-release

Changelog for version 0.0.3-alpha.1 (05/25/2014)

  • Updated example walkthrough and spouse_example code
  • Added Python utility ddlib for text manipulation (requires exporting PYTHONPATH; see its pydoc for usage)
  • Added utility script util/extractor_input_writer.py to sample extractor inputs
  • Updated nlp_extractor format (use sentence_offset, textual sentence_id)
  • Cleaned up unused datastore code
  • Updated templates
  • Bug fixes

Changelog for version 0.0.3-alpha (05/07/2014)

  • Non-backward-compatible syntax change: developers must include an id column of type bigint in any table containing variables, but they MUST NOT use this column anywhere. This column is reserved for learning and inference, and all its values will be erased and reassigned in the grounding phase.
  • Non-backward-compatible functionality change: DeepDive is no longer responsible for any automatic assignment of sequential variable IDs. You may use examples/spouse_example/scripts/fill_sequence.sh for this task.
  • Updated dependency requirement: requires JDK 7 or higher.
  • Supported four new types of extractors. See the documentation for details.
  • Even faster factor graph grounding and serialization using better optimized SQL.
  • The previously default Java sampler is no longer supported. The C++ sampler is now the default.
  • New configuration supported: pipeline.relearn_from to skip extraction and grounding and perform only learning and inference with a previous version. Useful for tuning sampler arguments.
  • New configuration supported: inference.skip_learning to use weights learned in the last execution.
  • New configuration supported: inference.weight_table to fix factor weights in a table and skip learning. The table is specified by factor description and weights; it can contain results from a previous DeepDive execution, manually assigned weights, or a combination of the two. This is useful for learning once and reusing the learned model in later inference tasks.
  • Supported manual holdout by a holdout query.
  • Updated spouse_example with implementations in different styles of extractors.
  • The nlp_extractor example has changed table requirements and usage. See the documentation for details.
  • In the db.default configuration, users should define dbname, host, port, and user. If not defined, the system will fall back to the environment variables DBNAME, PGHOST, PGPORT, and PGUSER, respectively.
  • Fixed all examples.
  • Updated documentation.
  • Print SQL query execution plans for extractor inputs.
  • Skip grounding, learning and inference if no factors are active.
  • If using Greenplum, users should add a DISTRIBUTED BY clause to all CREATE TABLE commands. Do not use the variable id column as the distribution key, and do not use a distribution key that is not initially assigned.
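For illustration, a db.default block of this era might look like the following sketch (all values are placeholders invented for this example):

```
# application.conf — 0.0.3-era database configuration (placeholder values)
deepdive.db.default {
  dbname: "deepdive_app"
  host:   "localhost"
  port:   5432
  user:   "dduser"
  # any field left out falls back to DBNAME, PGHOST, PGPORT, or PGUSER
}
```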