
Commit

Merge branch 'master' into v0.7.x
netj committed Sep 28, 2015
2 parents f530d4b + 4dcce35 commit 86cf6b7
Showing 213 changed files with 4,655 additions and 2,304 deletions.
2 changes: 1 addition & 1 deletion .travis.yml
@@ -40,7 +40,7 @@ before_install:

install:
- make depends
- util/install.sh postgres
- sudo apt-get install -qq -y postgresql-plpython-`ls -1 /var/lib/postgresql | head -n 1` # XXX for piggy and plpy extractor tests
- make test-build # XXX doing it here to hide the noise from sbt and unzip

script:
15 changes: 9 additions & 6 deletions Makefile
@@ -88,10 +88,13 @@ include scala.mk # for scala-build, scala-test-build, scala-assembly-jar, scala

### test recipes #############################################################

test/%/scalatests.bats: test/postgresql/update-scalatests.bats.sh $(SCALA_TEST_SOURCES)
test/*/scalatests/%.bats: test/postgresql/update-scalatests.bats.sh $(SCALA_TEST_SOURCES)
# Regenerating .bats for Scala tests
$< >$@
chmod +x $@
$<

# make sure test is against the code built and staged by this Makefile
DEEPDIVE_HOME := $(realpath $(STAGE_DIR))
export DEEPDIVE_HOME

# make sure test is against the code built and staged by this Makefile
DEEPDIVE_HOME := $(realpath $(STAGE_DIR))
@@ -121,12 +124,12 @@ endif
.PHONY: build-mindbender
build-mindbender:
git submodule update --init mindbender
$(MAKE) -C mindbender
cp -f mindbender/mindbender-LATEST-*.sh util/mindbender
$(MAKE) -C mindbender clean-packages
$(MAKE) -C mindbender package

.PHONY: build-ddlog
build-ddlog:
git submodule update --init ddlog
$(MAKE) -C ddlog ddlog.jar
cp -f ddlog/ddlog.jar util/ddlog.jar

test-build build: build-ddlog
4 changes: 3 additions & 1 deletion README.md
@@ -1,5 +1,7 @@
# DeepDive [![Build Status](https://travis-ci.org/HazyResearch/deepdive.svg?branch=master)](https://travis-ci.org/HazyResearch/deepdive) [![Coverage Status](https://coveralls.io/repos/HazyResearch/deepdive/badge.svg?branch=master)](https://coveralls.io/r/HazyResearch/deepdive)

<strong><big>See [deepdive.stanford.edu](http://deepdive.stanford.edu).</big></strong>
<strong><big>See [deepdive.stanford.edu](http://deepdive.stanford.edu) to install and start writing DeepDive applications.</big></strong>

Refer to the [DeepDive Developer's Guide](https://github.com/HazyResearch/deepdive/blob/master/doc/doc/advanced/developer.md#readme) for details on working with this source tree.

Licensed under [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0.txt).
2 changes: 1 addition & 1 deletion build.sbt
@@ -1,6 +1,6 @@
name := "deepdive"

version := "0.7.0"
version := "0.7.1"

scalaVersion := "2.10.5"

2 changes: 1 addition & 1 deletion ddlog
Submodule ddlog updated 106 files
22 changes: 22 additions & 0 deletions doc/_includes/googleanalytics.html
@@ -0,0 +1,22 @@
<script>
(function(i,s,o,g,r,a,m){i['GoogleAnalyticsObject']=r;i[r]=i[r]||function(){
(i[r].q=i[r].q||[]).push(arguments)},i[r].l=1*new Date();a=s.createElement(o),
m=s.getElementsByTagName(o)[0];a.async=1;a.src=g;m.parentNode.insertBefore(a,m)
})(window,document,'script','//www.google-analytics.com/analytics.js','ga');

ga('create', 'UA-65379202-1', 'auto');
ga('send', 'pageview');

</script>

<script>
var trackOutboundLink = function(url) {
ga('send', 'event', 'outbound', 'click', url, {'hitCallback':
function () {
document.location = url;
}
});
}
</script>


5 changes: 0 additions & 5 deletions doc/_includes/js/segmentio.js

This file was deleted.

7 changes: 0 additions & 7 deletions doc/_includes/js/site.js
@@ -1,7 +0,0 @@
if (window.location.href.indexOf('http://dennybritz.github.io') === 0) {
window.location.href = 'http://deepdive.stanford.edu/';
}

$(function(){
analytics.trackLink($("a[href='https://github.com/hazyresearch/deepdive/archive/master.zip']"), "click_github_download");
})
4 changes: 2 additions & 2 deletions doc/_layouts/default.html
@@ -10,11 +10,11 @@
<script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
<script src="{{ site.baseurl }}/javascripts/application.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>DeepDive</title>
<title>{% if page.title %}{{ page.title }} - {% endif %}DeepDive</title>
</head>

<body>
{% include js/segmentio.js %}
{% include googleanalytics.html %}
<a href="https://github.com/hazyresearch/deepdive" target="_blank"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://s3.amazonaws.com/github/ribbons/forkme_right_red_aa0000.png" alt="Fork me on GitHub"></a>

<div id="header">
4 changes: 2 additions & 2 deletions doc/_layouts/homepage.html
@@ -10,11 +10,11 @@
<script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
<script src="{{ site.baseurl }}/javascripts/application.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>DeepDive</title>
<title>{% if page.title %}{{ page.title }} - {% endif %}DeepDive</title>
</head>

<body>
{% include js/segmentio.js %}
{% include googleanalytics.html %}
<a href="https://github.com/hazyresearch/deepdive" target="_blank"><img style="position: absolute; top: 0; right: 0; border: 0;" src="https://s3.amazonaws.com/github/ribbons/forkme_right_red_aa0000.png" alt="Fork me on GitHub"></a>

<div id="header">
4 changes: 2 additions & 2 deletions doc/_layouts/landing.html
@@ -8,11 +8,11 @@
<link rel="stylesheet" type="text/css" href="{{ site.baseurl }}/stylesheets/application.css" />
<script src="http://code.jquery.com/jquery-1.10.1.min.js"></script>
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Deepdive</title>
<title>{% if page.title %}{{ page.title }} - {% endif %}DeepDive</title>
</head>

<body>
{% include js/segmentio.js %}
{% include googleanalytics.html %}

<section id="landing">
<div class="container">
57 changes: 47 additions & 10 deletions doc/doc/advanced/deepdiveapp.md
@@ -1,5 +1,6 @@
---
layout: default
title: DeepDive Application's Structure and Operations
---

# DeepDive Application
@@ -8,25 +9,47 @@ layout: default

A DeepDive application is a directory that contains the following files and directories:

* `deepdive.conf`
* `app.ddlog`

Extractors, and inference rules are written in [HOCON][] syntax in this file.
See the [Configuration Reference](http://deepdive.stanford.edu/doc/basics/configuration.html) for full details.
Schema, extractors, and inference rules written in our higher-level language, [DDlog][], are put in this file.

* `db.url`

A URL representing the database configuration is supposed to be stored in this file.
For example, `postgresql://user:password@localhost:5432/database_name` can be the line stored in it.
For example, the following URL can be the line stored in it:

```
postgresql://user:password@localhost:5432/database_name
```

[SSL connections for PostgreSQL](https://jdbc.postgresql.org/documentation/91/ssl.html) can be enabled by setting parameter `ssl` as true in the URL, e.g.:

```
postgresql://user:password@localhost:5432/database_name?ssl=true
```

If you use a self-signed certificate, you may want to disable validation with an extra `sslfactory` parameter:

```
postgresql://user:password@localhost:5432/database_name?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory
```

* `deepdive.conf`

Extra configuration not expressed in the DDlog program is in this file.
Extractors and inference rules can also be written in [HOCON][] syntax in this file, although DDlog is the recommended way.
See the [Configuration Reference](http://deepdive.stanford.edu/doc/basics/configuration.html) for full details.

* `schema.sql`

Data-Definition Language (DDL) statements for setting up the underlying database tables should be kept in this file.
This may be omitted when the application is written in DDlog.

* `input/`

Any data to be processed by this application is suggested to be kept under this directory.

* `load.sh`
* `init.sh`

In addition to the data files, there should be an executable script that knows how to load the data here to the database once its tables are created.

@@ -37,10 +60,11 @@ A DeepDive application is a directory that contains the following files and dire

* `run/`

Each run of the DeepDive application has a corresponding subdirectory under this directory whose name contains the timestamp when the run was started, e.g., `run/20150618-223344.567890/`.
Each run of the DeepDive application has a corresponding subdirectory under this directory whose name contains the timestamp when the run was started, e.g., `run/20150618/223344.567890/`.
All output and log files that belong to the run are kept under that subdirectory.
There are a few symbolic links with mnemonic names to the most recently started run, last successful run, last failed run for handy access.

[DDlog]: ../basics/ddlog.html
[HOCON]: https://github.com/typesafehub/config/blob/master/HOCON.md#readme "Human Optimized Configuration Object Notation"
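Because the non-validating URL contains `&` and `?`, writing it into `db.url` from a shell requires quoting. A minimal sketch, assuming it is run from the application's root directory and using the same placeholder credentials as the examples above:

```bash
#!/usr/bin/env bash
set -euo pipefail

# Placeholder user, password, host, port, and database name; substitute your own.
url='postgresql://user:password@localhost:5432/database_name?ssl=true&sslfactory=org.postgresql.ssl.NonValidatingFactory'

# Single quotes stop the shell from treating `&` as a background operator
# and `?` as a glob character; printf writes the URL as a single line.
printf '%s\n' "$url" > db.url
cat db.url
```

Without the quotes, the shell would run everything before `&` in the background and treat `sslfactory=...` as a separate variable assignment.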


@@ -49,18 +73,30 @@ A DeepDive application is a directory that contains the following files and dire
There are several operations that are frequently performed on a DeepDive application.
Any of the following commands can be run under any subdirectory of a DeepDive application to perform a certain operation.

To see all options for each command, such as specifying alternative configuration file for running, see the online help message with the `deepdive help` command. For example:

```bash
deepdive help run
```

### Initializing Database

```bash
deepdive initdb
deepdive initdb [TABLE]
```

This command initializes the underlying database configured for the application by creating necessary tables and loading the initial data into them.
It makes sure the following:
If `TABLE` is not given, it makes sure the following:

1. The configured database is created.
2. The tables defined in `schema.sql` are created.
3. The data that exist under `input/` are loaded into the tables with the help of `load.sh`.
2. The tables defined in `schema.sql` (for deepdive application) or `app.ddlog` (for ddlog application) are created.
3. The data that exists under `input/` is loaded into the tables with the help of `init.sh`.

If `TABLE` is given, it will make sure the following:

1. The configured database is created.
2. The given table is created.
3. The data that exists under `input/` is loaded into the `TABLE` with the help of `init_TABLE.sh`.
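The choice between `init.sh` and `init_TABLE.sh` can be sketched as a small shell function. This is a hypothetical illustration of the documented behavior, not DeepDive's actual implementation. It assumes both scripts live under `input/`, and the function name `initdb_loader` is made up for this example:

```bash
#!/usr/bin/env bash

# Hypothetical helper: report which loader script `deepdive initdb [TABLE]`
# would rely on. It does not touch any database.
initdb_loader() {
  if [ $# -eq 0 ]; then
    echo "input/init.sh"        # no TABLE given: load data for all tables
  else
    echo "input/init_$1.sh"     # TABLE given: load data for that table only
  fi
}

initdb_loader               # the whole-database case
initdb_loader sentences     # a single table, e.g. `deepdive initdb sentences`
```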


### Running Pipelines
@@ -96,6 +132,7 @@ deepdive sql "SELECT doc_id, COUNT(*) FROM sentences GROUP BY doc_id"
```

To get the result as tab-separated values (TSV), or comma-separated values (CSV), use the following command:

```bash
deepdive sql eval "SELECT doc_id, COUNT(*) FROM sentences GROUP BY doc_id" format=tsv
deepdive sql eval "SELECT doc_id, COUNT(*) FROM sentences GROUP BY doc_id" format=csv header=1
1 change: 1 addition & 0 deletions doc/doc/advanced/developer.md
@@ -1,5 +1,6 @@
---
layout: default
title: DeepDive Developer's Guide
---

# DeepDive Developer's Guide
1 change: 1 addition & 0 deletions doc/doc/advanced/docker.md
@@ -1,5 +1,6 @@
---
layout: default
title: Using DeepDive with Docker
---

# Using DeepDive with Docker
16 changes: 5 additions & 11 deletions doc/doc/advanced/ec2.md
@@ -1,5 +1,6 @@
---
layout: default
title: Using DeepDive on EC2
---

# Using DeepDive on EC2
@@ -22,24 +23,17 @@ The following are the steps needed to launch an EC2 instance and start DeepDive
- Run the following command to install Postgres and DeepDive.

```bash
curl -fsSL deepdive.stanford.edu/install | bash -s postgres deepdive
bash <(curl -fsSL deepdive.stanford.edu/install) postgres deepdive
```

You will be asked to create a new password for Postgres user account.
Remember to set `PGPASSWORD` environment varible to that password, e.g., `pa$$w0rd`, to let DeepDive and `psql` command access the database.

```bash
export PGPASSWORD='pa$$w0rd'
```

- Navigate to `./deepdive` and run tests to confirm that the
- Optionally, you can run tests to confirm that the
installation was successful.

```bash
cd ./deepdive
make test
bash <(curl -fsSL deepdive.stanford.edu/install) run_deepdive_tests
```


### Notes

- For improved I/O performance the postgresql data directory is created on the
1 change: 1 addition & 0 deletions doc/doc/advanced/factor_graph_schema.md
@@ -1,5 +1,6 @@
---
layout: default
title: Factor Graph Grounding Output Reference
---

# Factor Graph Grounding Output Schema Reference
35 changes: 33 additions & 2 deletions doc/doc/advanced/greenplum.md
@@ -1,5 +1,6 @@
---
layout: default
title: Using DeepDive with Greenplum
---

# Using DeepDive with Greenplum
@@ -129,8 +130,7 @@ Be sure to **reboot** after changing these kernel parameters.

Download Greenplum for your operating system. For a free Community Edition, you
can find the download link and the official guide on the [GoPivotal
website](http://www.gopivotal.com/products/pivotal-greenplum-database), or you
can download it directly [here](http://downloads.cfapps.io/gpdb_db_el5_64).
website](http://www.gopivotal.com/products/pivotal-greenplum-database).

Install Greenplum using the downloaded package:

@@ -156,6 +156,7 @@ source /usr/local/greenplum-db/greenplum_path.sh
### Configure ssh with localhost

Now you need to generate ssh keys for `localhost`. Run:

```bash
$ gpssh-exkeys -h localhost
```
@@ -252,6 +253,36 @@ postgres=# \q

Use `gpstop` and `gpstart` to stop / start the Greenplum server at any time.

### <a name="parallelgrounding" href="#"></a> Parallel grounding
[Grounding](../basics/overview.html#grounding) is the process of building the
factor graph. You can enable parallel grounding to speed up the grounding phase,
which makes use of Greenplum's parallel file system (gpfdist). To use parallel
grounding, first make sure that Greenplum's file system server `gpfdist` is running
locally, i.e., on the machine where you will run the DeepDive applications.
If it is not running, you can use the following command to start gpfdist

gpfdist -d [directory] -p [port] &

where you specify the directory for storing the files and the HTTP port to run on.
The directory should be an **empty directory** since DeepDive will clean up
this directory or overwrite files.
Then, in `deepdive.conf`, specify the gpfdist settings in the `deepdive.db.default` as
follows

db.default {
gphost : [host of gpfdist]
gpport : [port of gpfdist]
gppath : [**absolute path** of gpfdist directory]
}

where gphost, gpport, gppath are the host, port, and absolute path
gpfdist is running on (specified when starting gpfdist server).

Finally, tell DeepDive to use parallel grounding by adding the following to
`deepdive.conf`:

inference.parallel_grounding: true
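The two `deepdive.conf` changes above can be generated together by a small shell function. This sketch rests on four assumptions. First, gpfdist was already started with the `gpfdist -d [directory] -p [port] &` command shown above. Second, as with `db.default`, both settings sit inside the top-level `deepdive { }` block. Third, the host, port, and directory are placeholders. Fourth, the function name `gpfdist_conf` is hypothetical. The function also rejects a relative `gppath`, because the directory must be given as an absolute path:

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a deepdive.conf fragment for parallel grounding.
# Arguments: gpfdist host, gpfdist port, absolute path of the gpfdist directory.
gpfdist_conf() {
  local host=$1 port=$2 path=$3
  case "$path" in
    /*) ;;  # absolute path: accepted
    *) echo "gppath must be an absolute path: $path" >&2; return 1 ;;
  esac
  cat <<EOF
deepdive {
  db.default {
    gphost: "$host"
    gpport: $port
    gppath: "$path"
  }
  inference.parallel_grounding: true
}
EOF
}

# Placeholder values; use the host, port, and directory gpfdist was started with.
gpfdist_conf localhost 8082 /data/gpfdist
```

The printed fragment can be merged into an existing `deepdive.conf` by hand. The helper only formats text and never starts or contacts gpfdist.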

## <a name="faq" href="#"></a> FAQs

- **When I use Greenplum, I see the error "ERROR: data line too long. likely due to